- * [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-27 10:46   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
                   ` (66 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM driver identifies caches by id for use with resctrl. It
needs to know the cache-id when probing, but the value isn't set
in cacheinfo until device_initcall().
Expose the code that generates the cache-id. The parts of the MPAM
driver that run early can use this to set up the resctrl structures
before cacheinfo is ready in device_initcall().
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Renamed cache_of_get_id() to cache_of_calculate_id().
---
 drivers/base/cacheinfo.c  | 19 +++++++++++++------
 include/linux/cacheinfo.h |  1 +
 2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 613410705a47..f6289d142ba9 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
 #define arch_compact_of_hwid(_x)	(_x)
 #endif
 
-static void cache_of_set_id(struct cacheinfo *this_leaf,
-			    struct device_node *cache_node)
+unsigned long cache_of_calculate_id(struct device_node *cache_node)
 {
 	struct device_node *cpu;
-	u32 min_id = ~0;
+	unsigned long min_id = ~0UL;
 
 	for_each_of_cpu_node(cpu) {
 		u64 id = of_get_cpu_hwid(cpu, 0);
@@ -219,15 +218,23 @@ static void cache_of_set_id(struct cacheinfo *this_leaf,
 		id = arch_compact_of_hwid(id);
 		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
 			of_node_put(cpu);
-			return;
+			return ~0UL;
 		}
 
 		if (match_cache_node(cpu, cache_node))
 			min_id = min(min_id, id);
 	}
 
-	if (min_id != ~0) {
-		this_leaf->id = min_id;
+	return min_id;
+}
+
+static void cache_of_set_id(struct cacheinfo *this_leaf,
+			    struct device_node *cache_node)
+{
+	unsigned long id = cache_of_calculate_id(cache_node);
+
+	if (id != ~0UL) {
+		this_leaf->id = id;
 		this_leaf->attributes |= CACHE_ID;
 	}
 }
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index c8f4f0a0b874..2dcbb69139e9 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,
 #endif
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
+unsigned long cache_of_calculate_id(struct device_node *np);
 
 /*
  * Get the cacheinfo structure for the cache associated with @cpu at
-- 
2.20.1
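
Editor's note: the rule the patch exposes can be read off the diff: a cache's id is the smallest hardware id among the CPUs that share it, and the calculation gives up if any CPU's hardware id does not fit in 32 bits. The following is a minimal userspace sketch of that rule, not the kernel code. The names mock_cpu, mock_calculate_id(), example_cpus and cache_id_demo() are hypothetical, and the integer "cache" field stands in for the device_node matching done by match_cache_node().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_CACHE_ID	(~0UL)

/* Hypothetical stand-in for a DT CPU node: its hwid and the cache it uses. */
struct mock_cpu {
	uint64_t hwid;
	int cache;	/* index of the cache this CPU is attached to */
};

/*
 * Same shape as cache_of_calculate_id() in the patch: walk every CPU,
 * bail out if any hwid has bits set above bit 31, otherwise return the
 * smallest hwid among the CPUs attached to @cache.
 */
static unsigned long mock_calculate_id(const struct mock_cpu *cpus, size_t n,
				       int cache)
{
	unsigned long min_id = INVALID_CACHE_ID;

	for (size_t i = 0; i < n; i++) {
		uint64_t id = cpus[i].hwid;

		if (id >> 32)
			return INVALID_CACHE_ID;

		if (cpus[i].cache == cache && id < min_id)
			min_id = (unsigned long)id;
	}

	return min_id;
}

/* Two CPUs share cache 0, a third CPU has cache 1 to itself. */
static const struct mock_cpu example_cpus[] = {
	{ .hwid = 0x100, .cache = 0 },
	{ .hwid = 0x101, .cache = 0 },
	{ .hwid = 0x0,   .cache = 1 },
};

static void cache_id_demo(void)
{
	assert(mock_calculate_id(example_cpus, 3, 0) == 0x100UL);
	assert(mock_calculate_id(example_cpus, 3, 1) == 0x0UL);
	/* No CPU is attached to cache 2, so there is no id to report. */
	assert(mock_calculate_id(example_cpus, 3, 2) == INVALID_CACHE_ID);
}
```

Because the result depends only on the firmware description, a caller can compute it before cacheinfo has populated its per-CPU leaves, which is the property the commit message relies on.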
- * Re: [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-08-27 10:46   ` Dave Martin
  2025-08-27 17:11     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 10:46 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:42PM +0000, James Morse wrote:
> The MPAM driver identifies caches by id for use with resctrl. It
> needs to know the cache-id when probing, but the value isn't set
> in cacheinfo until device_initcall().
> 
> Expose the code that generates the cache-id. The parts of the MPAM
> driver that run early can use this to set up the resctrl structures
> before cacheinfo is ready in device_initcall().
Why can't the MPAM driver just consume the precomputed cache-id
information?
Possible reasons are that the MPAM driver probes too early, or that it
must parse the PPTT directly (which is true) and needs to label caches
consistently with the way the kernel does it.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v1:
>  * Renamed cache_of_get_id() to cache_of_calculate_id().
> ---
>  drivers/base/cacheinfo.c  | 19 +++++++++++++------
>  include/linux/cacheinfo.h |  1 +
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 613410705a47..f6289d142ba9 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
>  #define arch_compact_of_hwid(_x)	(_x)
>  #endif
>  
> -static void cache_of_set_id(struct cacheinfo *this_leaf,
> -			    struct device_node *cache_node)
> +unsigned long cache_of_calculate_id(struct device_node *cache_node)
>  {
>  	struct device_node *cpu;
> -	u32 min_id = ~0;
> +	unsigned long min_id = ~0UL;
Why the change of type here?
This does mean that 0xffffffff can now be generated as a valid cache-id,
but if that is necessary then this patch is also fixing a bug in the
code -- but the commit message doesn't say anything about that.
For a patch that is just exposing an internal result, it may be
better to keep the original type.  ~(u32)0 is already used as an
exceptional value.
[...]
Otherwise, this looks reasonable to me.
Cheers
---Dave
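
Editor's note: the point about the sentinel value can be checked in isolation. The sketch below is a userspace illustration under the assumption that "no id found" is encoded as all ones in the accumulator's type, as in the diff. The macro and function names are hypothetical. It shows that a hardware id of 0xffffffff collides with the sentinel when the accumulator is 32 bits wide, and stops colliding only where unsigned long is wider than 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel of the original static helper: all ones in 32 bits. */
#define NO_ID_U32	((uint32_t)~0U)
/* Sentinel after the patch: all ones in unsigned long. */
#define NO_ID_UL	(~0UL)

static int valid_as_u32(uint32_t id)
{
	return id != NO_ID_U32;
}

static int valid_as_ul(unsigned long id)
{
	return id != NO_ID_UL;
}

static void sentinel_demo(void)
{
	uint32_t hwid = 0xffffffffU;

	/* With a u32 accumulator this hwid is indistinguishable from "none". */
	assert(!valid_as_u32(hwid));

	/* With unsigned long the outcome depends on the width of the type. */
	if (sizeof(unsigned long) > sizeof(uint32_t))
		assert(valid_as_ul(hwid));
	else
		assert(!valid_as_ul(hwid));
}
```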
- * Re: [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-08-27 10:46   ` Dave Martin
@ 2025-08-27 17:11     ` James Morse
  2025-08-28 14:08       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-27 17:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 11:46, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:42PM +0000, James Morse wrote:
>> The MPAM driver identifies caches by id for use with resctrl. It
>> needs to know the cache-id when probing, but the value isn't set
>> in cacheinfo until device_initcall().
>>
>> Expose the code that generates the cache-id. The parts of the MPAM
>> driver that run early can use this to set up the resctrl structures
>> before cacheinfo is ready in device_initcall().
> Why can't the MPAM driver just consume the precomputed cache-id
> information?
Because it would need to wait until cacheinfo was ready, and it would still
need a way of getting the cache-id for caches where all the CPUs are offline.
The resctrl glue code has a waitqueue to wait for device_initcall_sync(), but that is
asynchronous to driver probing; it's triggered by the schedule_work() from the cpuhp
callbacks. This bit is about the driver's use, which just gets probed whenever the core
code feels like it.
I toyed with always using cacheinfo for everything, and just waiting - but the MPAM driver
already has to parse the PPTT to find the information it needs on ACPI platforms, so the
wait would only happen on DT.
It seemed simpler to grab what the value would be, instead of waiting (or probe defer) -
especially as this is also needed for caches where all the CPUs are offline.
(I'll add the offline-cpus angle to the commit message)
> Possible reasons are that the MPAM driver probes too early,
yup,
> or that it
> must parse the PPTT directly (which is true) and needs to label caches
> consistently with the way the kernel does it.
It needs to match what will be exposed to user-space from cacheinfo.
This isn't about the PPTT; it's the value that is generated for DT systems.
The driver has to know whether it's ACPI or DT to call the appropriate thing to get cache-ids
before cacheinfo is ready.
>> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
>> index 613410705a47..f6289d142ba9 100644
>> --- a/drivers/base/cacheinfo.c
>> +++ b/drivers/base/cacheinfo.c
>> @@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
>>  #define arch_compact_of_hwid(_x)	(_x)
>>  #endif
>>  
>> -static void cache_of_set_id(struct cacheinfo *this_leaf,
>> -			    struct device_node *cache_node)
>> +unsigned long cache_of_calculate_id(struct device_node *cache_node)
>>  {
>>  	struct device_node *cpu;
>> -	u32 min_id = ~0;
>> +	unsigned long min_id = ~0UL;
> Why the change of type here?
This is a hangover from Rob's approach of making the cache-id 64 bit.
> This does mean that 0xffffffff can now be generated as a valid cache-id,
> but if that is necessary then this patch is also fixing a bug in the
> code -- but the commit message doesn't say anything about that.
> 
> For a patch that is just exposing an internal result, it may be
> better to keep the original type.  ~(u32)0 is already used as an
> exceptional value.
Yup, I'll fix that.
Thanks!
James
- * Re: [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
  2025-08-27 17:11     ` James Morse
@ 2025-08-28 14:08       ` Dave Martin
  0 siblings, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-08-28 14:08 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi James,
On Wed, Aug 27, 2025 at 06:11:25PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:46, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:42PM +0000, James Morse wrote:
> >> The MPAM driver identifies caches by id for use with resctrl. It
> >> needs to know the cache-id when probing, but the value isn't set
> >> in cacheinfo until device_initcall().
> >>
> >> Expose the code that generates the cache-id. The parts of the MPAM
> >> driver that run early can use this to set up the resctrl structures
> >> before cacheinfo is ready in device_initcall().
> 
> > Why can't the MPAM driver just consume the precomputed cache-id
> > information?
> 
> Because it would need to wait until cacheinfo was ready, and it would still
> need a way of getting the cache-id for caches where all the CPUs are offline.
> 
> The resctrl glue code has a waitqueue to wait for device_initcall_sync(), but that is
> asynchronous to driver probing; it's triggered by the schedule_work() from the cpuhp
> callbacks. This bit is about the driver's use, which just gets probed whenever the core
> code feels like it.
> 
> I toyed with always using cacheinfo for everything, and just waiting - but the MPAM driver
> already has to parse the PPTT to find the information it needs on ACPI platforms, so the
> wait would only happen on DT.
> 
> It seemed simpler to grab what the value would be, instead of waiting (or probe defer) -
> especially as this is also needed for caches where all the CPUs are offline.
> 
> (I'll add the offline-cpus angle to the commit message)
Ack
> > Possible reasons are that the MPAM driver probes too early,
> 
> yup,
> 
> > or that it
> > must parse the PPTT directly (which is true) and needs to label caches
> > consistently with the way the kernel does it.
> 
> It needs to match what will be exposed to user-space from cacheinfo.
> This isn't about the PPTT; it's the value that is generated for DT systems.
Right -- confused myself there.  From the point of view of this series,
the usage scenario isn't clear at this point.
> The driver has to know whether it's ACPI or DT to call the appropriate thing to get cache-ids
> before cacheinfo is ready.
I see.  This might be worth stating in the commit message.
> >> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> >> index 613410705a47..f6289d142ba9 100644
> >> --- a/drivers/base/cacheinfo.c
> >> +++ b/drivers/base/cacheinfo.c
> >> @@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
> >>  #define arch_compact_of_hwid(_x)	(_x)
> >>  #endif
> >>  
> >> -static void cache_of_set_id(struct cacheinfo *this_leaf,
> >> -			    struct device_node *cache_node)
> >> +unsigned long cache_of_calculate_id(struct device_node *cache_node)
> >>  {
> >>  	struct device_node *cpu;
> >> -	u32 min_id = ~0;
> >> +	unsigned long min_id = ~0UL;
> 
> > Why the change of type here?
> 
> This is a hangover from Rob's approach of making the cache-id 64 bit.
Ah, right.
(I have assumed that 0xffffffff is never going to clash with a valid
value.)
> > This does mean that 0xffffffff can now be generated as a valid cache-id,
> > but if that is necessary then this patch is also fixing a bug in the
> > code -- but the commit message doesn't say anything about that.
> > 
> > For a patch that is just exposing an internal result, it may be
> > better to keep the original type.  ~(u32)0 is already used as an
> > exceptional value.
> 
> Yup, I'll fix that.
OK -- it works either way, of course, but this should make the patch a
little less noisy.
Cheers
---Dave
 
 
 
- * [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
  2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-24 17:25   ` Krzysztof Kozlowski
  2025-08-27 10:46   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
                   ` (65 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM needs to know the size of a cache associated with a particular CPU.
The DT/ACPI agnostic way of doing this is to ask cacheinfo.
Add a helper to do this.
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Converted to kdoc.
 * Simplified helper to use get_cpu_cacheinfo_level().
---
 include/linux/cacheinfo.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2dcbb69139e9..e12d6f2c6a57 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
 	return ci ? ci->id : -1;
 }
 
+/**
+ * get_cpu_cacheinfo_size() - Get the size of the cache.
+ * @cpu:      The cpu that is associated with the cache.
+ * @level:    The level of the cache as seen by @cpu.
+ *
+ * Callers must hold the cpuhp lock.
+ * Returns the cache-size on success, or 0 for an error.
+ */
+static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
+{
+	struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
+
+	return ci ? ci->size : 0;
+}
+
 #if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
 #define use_arch_cache_info()	(true)
 #else
-- 
2.20.1
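
Editor's note: the helper added by this patch is a thin wrapper that maps "no cacheinfo leaf at this level" to a size of 0. The sketch below reproduces that pattern in userspace so it can be compiled on its own. The mock_* names and the table contents are hypothetical stand-ins for struct cacheinfo and get_cpu_cacheinfo_level(), and the mock ignores the cpuhp locking requirement stated in the kernel-doc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct cacheinfo. */
struct mock_cacheinfo {
	unsigned int level;
	unsigned int size;	/* bytes */
};

/* Hypothetical cache leaves for a single CPU. */
static const struct mock_cacheinfo mock_leaves[] = {
	{ .level = 1, .size = 64 * 1024 },
	{ .level = 2, .size = 1024 * 1024 },
};

/* Stand-in for get_cpu_cacheinfo_level(): NULL when the level is absent. */
static const struct mock_cacheinfo *mock_get_level(int cpu, int level)
{
	(void)cpu;	/* the mock describes one CPU only */

	for (size_t i = 0; i < sizeof(mock_leaves) / sizeof(mock_leaves[0]); i++)
		if (mock_leaves[i].level == (unsigned int)level)
			return &mock_leaves[i];

	return NULL;
}

/* Same pattern as get_cpu_cacheinfo_size(): the size, or 0 on failure. */
static unsigned int mock_get_size(int cpu, int level)
{
	const struct mock_cacheinfo *ci = mock_get_level(cpu, level);

	return ci ? ci->size : 0;
}

static void cache_size_demo(void)
{
	assert(mock_get_size(0, 1) == 64 * 1024);
	assert(mock_get_size(0, 2) == 1024 * 1024);
	/* There is no level 3 leaf, so the lookup reports 0. */
	assert(mock_get_size(0, 3) == 0);
}
```

A caller that gets 0 back cannot tell a missing level from any other lookup failure, so it has to treat 0 as "size unknown".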
- * Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-08-24 17:25   ` Krzysztof Kozlowski
  2025-08-27 17:11     ` James Morse
  2025-08-27 10:46   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-24 17:25 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
On 22/08/2025 17:29, James Morse wrote:
> MPAM needs to know the size of a cache associated with a particular CPU.
> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
> 
> Add a helper to do this.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v1:
You marked this as v1.
>  * Converted to kdoc.
>  * Simplified helper to use get_cpu_cacheinfo_level().
Please use consistent subject prefixes. Look at previous patch subject
prefix.
Best regards,
Krzysztof
- * Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-24 17:25   ` Krzysztof Kozlowski
@ 2025-08-27 17:11     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-27 17:11 UTC (permalink / raw)
  To: Krzysztof Kozlowski, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Krzysztof,
On 24/08/2025 18:25, Krzysztof Kozlowski wrote:
> On 22/08/2025 17:29, James Morse wrote:
>> MPAM needs to know the size of a cache associated with a particular CPU.
>> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>>
>> Add a helper to do this.
>> ---
>> Changes since v1:
> 
> You marked this as v1.
Oops - that should say RFC. I'll fix all those.
>>  * Converted to kdoc.
>>  * Simplified helper to use get_cpu_cacheinfo_level().
> Please use consistent subject prefixes. Look at previous patch subject
> prefix.
Presumably the previous patch in my series - this is a side effect of multiple branches
that were written at different times getting combined! I'll change it to 'cacheinfo:' as
that seems to be the most popular recently.
Thanks,
James
 
- * Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
  2025-08-24 17:25   ` Krzysztof Kozlowski
@ 2025-08-27 10:46   ` Dave Martin
  2025-08-27 17:11     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 10:46 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
> MPAM needs to know the size of a cache associated with a particular CPU.
> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
> 
> Add a helper to do this.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since v1:
>  * Converted to kdoc.
>  * Simplified helper to use get_cpu_cacheinfo_level().
> ---
>  include/linux/cacheinfo.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> index 2dcbb69139e9..e12d6f2c6a57 100644
> --- a/include/linux/cacheinfo.h
> +++ b/include/linux/cacheinfo.h
> @@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
>  	return ci ? ci->id : -1;
>  }
>  
> +/**
> + * get_cpu_cacheinfo_size() - Get the size of the cache.
> + * @cpu:      The cpu that is associated with the cache.
> + * @level:    The level of the cache as seen by @cpu.
> + *
> + * Callers must hold the cpuhp lock.
> + * Returns the cache-size on success, or 0 for an error.
> + */
Nit: Maybe use the wording
	cpuhp lock must be held.
in the kerneldoc here, to match the other helpers it sits alongside.
Otherwise, looks reasonable.
> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> +{
> +	struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
> +
> +	return ci ? ci->size : 0;
> +}
> +
Orphaned function?
Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
If so, this wouldn't just be dead code in this series.
Cheers
---Dave
- * Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-27 10:46   ` Dave Martin
@ 2025-08-27 17:11     ` James Morse
  2025-08-28 14:10       ` Dave Martin
  2025-09-05 16:19       ` Dave Martin
  0 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-27 17:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 11:46, Dave Martin wrote:
> Hi,
> 
> On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
>> MPAM needs to know the size of a cache associated with a particular CPU.
>> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
>>
>> Add a helper to do this.
>> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
>> index 2dcbb69139e9..e12d6f2c6a57 100644
>> --- a/include/linux/cacheinfo.h
>> +++ b/include/linux/cacheinfo.h
>> @@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
>>  	return ci ? ci->id : -1;
>>  }
>>  
>> +/**
>> + * get_cpu_cacheinfo_size() - Get the size of the cache.
>> + * @cpu:      The cpu that is associated with the cache.
>> + * @level:    The level of the cache as seen by @cpu.
>> + *
>> + * Callers must hold the cpuhp lock.
>> + * Returns the cache-size on success, or 0 for an error.
>> + */
> 
> Nit: Maybe use the wording
> 
> 	cpuhp lock must be held.
> 
> in the kerneldoc here, to match the other helpers it sits alongside.
> 
> Otherwise, looks reasonable.
Sure,
>> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
>> +{
>> +	struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
>> +
>> +	return ci ? ci->size : 0;
>> +}
>> +
> 
> Orphaned function?
> 
> Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
> If so, this wouldn't just be dead code in this series.
Ah - I thought the MPAM driver was pulling this value in, but it's the resctrl glue code.
I was trying to reduce the number of trees this touches - it's probably best to kick this
into the next series that adds the resctrl code as it's pretty trivial.
Thanks,
James
- * Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-27 17:11     ` James Morse
@ 2025-08-28 14:10       ` Dave Martin
  2025-09-05 16:19       ` Dave Martin
  1 sibling, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-08-28 14:10 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Wed, Aug 27, 2025 at 06:11:43PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:46, Dave Martin wrote:
> > Hi,
> > 
> > On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
> >> MPAM needs to know the size of a cache associated with a particular CPU.
> >> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
> >>
> >> Add a helper to do this.
> 
> >> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
> >> index 2dcbb69139e9..e12d6f2c6a57 100644
> >> --- a/include/linux/cacheinfo.h
> >> +++ b/include/linux/cacheinfo.h
> >> @@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
> >>  	return ci ? ci->id : -1;
> >>  }
> >>  
> >> +/**
> >> + * get_cpu_cacheinfo_size() - Get the size of the cache.
> >> + * @cpu:      The cpu that is associated with the cache.
> >> + * @level:    The level of the cache as seen by @cpu.
> >> + *
> >> + * Callers must hold the cpuhp lock.
> >> + * Returns the cache-size on success, or 0 for an error.
> >> + */
> > 
> > Nit: Maybe use the wording
> > 
> > 	cpuhp lock must be held.
> > 
> > in the kerneldoc here, to match the other helpers it sits alongside.
> > 
> > Otherwise, looks reasonable.
> 
> Sure,
> 
> 
> >> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> >> +{
> >> +	struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
> >> +
> >> +	return ci ? ci->size : 0;
> >> +}
> >> +
> > 
> > Orphaned function?
> > 
> > Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
> > If so, this wouldn't just be dead code in this series.
> 
> Ah - I thought the MPAM driver was pulling this value in, but it's the resctrl glue code.
> I was trying to reduce the number of trees this touches - it's probably best to kick this
> into the next series that adds the resctrl code as it's pretty trivial.
> 
> 
> Thanks,
> 
> James
Sure, that also works.
Cheers
---Dave
- * Re: [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
  2025-08-27 17:11     ` James Morse
  2025-08-28 14:10       ` Dave Martin
@ 2025-09-05 16:19       ` Dave Martin
  1 sibling, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-09-05 16:19 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi,
On Wed, Aug 27, 2025 at 06:11:43PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:46, Dave Martin wrote:
> > Hi,
> > 
> > On Fri, Aug 22, 2025 at 03:29:43PM +0000, James Morse wrote:
> >> MPAM needs to know the size of a cache associated with a particular CPU.
> >> The DT/ACPI agnostic way of doing this is to ask cacheinfo.
> >>
> >> Add a helper to do this.
> 
> >> diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
[...]
> >> +static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
> >> +{
> >> +	struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
> >> +
> >> +	return ci ? ci->size : 0;
> >> +}
> >> +
> > 
> > Orphaned function?
> > 
> > Can fs/resctrl/rdtgroup.c:rdtgroup_cbm_to_size() be ported to use this?
> > If so, this wouldn't just be dead code in this series.
> 
> Ah - I thought the MPAM driver was pulling this value in, but it's the resctrl glue code.
> I was trying to reduce the number of trees this touches - it's probably best to kick this
> into the next series that adds the resctrl code as it's pretty trivial.
> 
> 
> Thanks,
> 
> James
> 
Fair enough.
Cheers
---Dave
 
 
 
- * [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
  2025-08-22 15:29 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
  2025-08-22 15:29 ` [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-26 14:45   ` Ben Horgan
  2025-08-27 10:48   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
                   ` (64 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The PPTT describes CPUs and caches, as well as processor containers.
The ACPI table for MPAM describes the set of CPUs that can access an MSC
with the UID of a processor container.
Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.
CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
 * Added missing : in kernel-doc
 * Made helper return void as this never actually returns an error.
---
 drivers/acpi/pptt.c  | 86 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  3 ++
 2 files changed, 89 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..4791ca2bdfac 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
 	return NULL;
 }
 
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
+ * @table_hdr:		A reference to the PPTT table.
+ * @parent_node:	A pointer to the processor node in the @table_hdr.
+ * @cpus:		A cpumask to fill with the CPUs below @parent_node.
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+				     struct acpi_pptt_processor *parent_node,
+				     cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_id;
+	int cpu;
+
+	cpumask_clear(cpus);
+
+	for_each_possible_cpu(cpu) {
+		acpi_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+		while (cpu_node) {
+			if (cpu_node == parent_node) {
+				cpumask_set_cpu(cpu, cpus);
+				break;
+			}
+			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+		}
+	}
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ *                                       processor containers
+ * @acpi_cpu_id:	The UID of the processor container.
+ * @cpus:		The resulting CPU mask.
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
+ * Container, they may exist purely to describe a Private resource. CPUs
+ * have to be leaves, so a Processor Container is a non-leaf that has the
+ * 'ACPI Processor ID valid' flag set.
+ *
+ * Return: nothing. @cpus is left empty if the PPTT or container is not found.
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table_hdr;
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	acpi_status status;
+	bool leaf_flag;
+	u32 proc_sz;
+
+	cpumask_clear(cpus);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
+	if (ACPI_FAILURE(status))
+		return;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+	proc_sz = sizeof(struct acpi_pptt_processor);
+	while ((unsigned long)entry + proc_sz <= table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
+		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
+			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
+			if (!leaf_flag) {
+				if (cpu_node->acpi_processor_id == acpi_cpu_id)
+					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+			}
+		}
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+
+	acpi_put_table(table_hdr);
+}
+
 static u8 acpi_cache_type(enum cache_type type)
 {
 	switch (type) {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1c5bb1e887cd..f97a9ff678cc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
 int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 {
 	return -EINVAL;
 }
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+						     cpumask_t *cpus) { }
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1
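
For illustration only (not part of the posted patch): the parent walk in
acpi_pptt_get_child_cpus() can be modelled in user space. This is a minimal
sketch under stated assumptions. struct model_node, model_child_cpus(),
NO_PARENT and MAX_CPUS are hypothetical names, and parent links are array
indices here, whereas the kernel follows byte offsets with fetch_pptt_node().

```c
#include <assert.h>
#include <stdint.h>

#define NO_PARENT (-1)
#define MAX_CPUS 64

/* Hypothetical stand-in for a PPTT processor node: only the parent link. */
struct model_node {
	int parent;	/* index of the parent node, or NO_PARENT at the root */
};

/*
 * Return a bitmask of the CPUs that have @container somewhere on the path
 * from their own node to the root. cpu_leaf[cpu] is the index of the node
 * that describes @cpu.
 */
uint64_t model_child_cpus(const struct model_node *nodes,
			  const int *cpu_leaf, int nr_cpus, int container)
{
	uint64_t mask = 0;

	assert(nr_cpus <= MAX_CPUS);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		/* Walk up from the CPU's own node towards the root. */
		for (int n = cpu_leaf[cpu]; n != NO_PARENT; n = nodes[n].parent) {
			if (n == container) {
				mask |= UINT64_C(1) << cpu;
				break;	/* this CPU is below the container */
			}
		}
	}

	return mask;
}
```

As in the patch, the walk starts at the CPU's own node, so passing a CPU
node as the "container" yields a mask with just that CPU set.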
- * Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-08-26 14:45   ` Ben Horgan
  2025-08-28 15:56     ` James Morse
  2025-08-27 10:48   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-26 14:45 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
The patch logic update makes sense to me. Just a nit.
On 8/22/25 16:29, James Morse wrote:
> The PPTT describes CPUs and caches, as well as processor containers.
> The ACPI table for MPAM describes the set of CPUs that can access an MSC
> with the UID of a processor container.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
>  * Added missing : in kernel-doc
>  * Made helper return void as this never actually returns an error.
> ---
>  drivers/acpi/pptt.c  | 86 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  3 ++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..4791ca2bdfac 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>  	return NULL;
>  }
>  
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
> + * @table_hdr:		A reference to the PPTT table.
> + * @parent_node:	A pointer to the processor node in the @table_hdr.
> + * @cpus:		A cpumask to fill with the CPUs below @parent_node.
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> +				     struct acpi_pptt_processor *parent_node,
> +				     cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_id;
> +	int cpu;
> +
> +	cpumask_clear(cpus);
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
> +
> +		while (cpu_node) {
> +			if (cpu_node == parent_node) {
> +				cpumask_set_cpu(cpu, cpus);
> +				break;
> +			}
> +			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +		}
> +	}
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor containers
> + * @acpi_cpu_id:	The UID of the processor container.
> + * @cpus:		The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	acpi_status status;
> +	bool leaf_flag;
> +	u32 proc_sz;
> +
> +	cpumask_clear(cpus);
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
> +	if (ACPI_FAILURE(status))
> +		return;
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +	proc_sz = sizeof(struct acpi_pptt_processor);
> +	while ((unsigned long)entry + proc_sz <= table_end) {
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> +			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
nit: Consider dropping the boolean leaf_flag and just using
acpi_pptt_leaf_node() in the condition. The name leaf_flag is slightly
overloaded to include the case when the acpi leaf flag is not supported
and dropping it would make the code more succinct.
> +			if (!leaf_flag) {
> +				if (cpu_node->acpi_processor_id == acpi_cpu_id)
> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> +			}
> +		}
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +
> +	acpi_put_table(table_hdr);
> +}
> +
>  static u8 acpi_cache_type(enum cache_type type)
>  {
>  	switch (type) {
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 1c5bb1e887cd..f97a9ff678cc 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
>  int find_acpi_cpu_topology_cluster(unsigned int cpu);
>  int find_acpi_cpu_topology_package(unsigned int cpu);
>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>  #else
>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>  {
> @@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  {
>  	return -EINVAL;
>  }
> +static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
> +						     cpumask_t *cpus) { }
>  #endif
>  
>  void acpi_arch_init(void);
Thanks,
Ben
- * Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-08-26 14:45   ` Ben Horgan
@ 2025-08-28 15:56     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-28 15:56 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 26/08/2025 15:45, Ben Horgan wrote:
> The patch logic update makes sense to me. Just a nit.
> 
> On 8/22/25 16:29, James Morse wrote:
>> The PPTT describes CPUs and caches, as well as processor containers.
>> The ACPI table for MPAM describes the set of CPUs that can access an MSC
>> with the UID of a processor container.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 54676e3d82dd..4791ca2bdfac 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + *                                       processor containers
>> + * @acpi_cpu_id:	The UID of the processor container.
>> + * @cpus:		The resulting CPU mask.
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
>> + * Container, they may exist purely to describe a Private resource. CPUs
>> + * have to be leaves, so a Processor Container is a non-leaf that has the
>> + * 'ACPI Processor ID valid' flag set.
>> + *
>> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
>> + */
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>> +{
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table_hdr;
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	acpi_status status;
>> +	bool leaf_flag;
>> +	u32 proc_sz;
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
>> +	if (ACPI_FAILURE(status))
>> +		return;
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> +			     sizeof(struct acpi_table_pptt));
>> +	proc_sz = sizeof(struct acpi_pptt_processor);
>> +	while ((unsigned long)entry + proc_sz <= table_end) {
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
>> +			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
> nit: Consider dropping the boolean leaf_flag and just using
> acpi_pptt_leaf_node() in the condition. The name leaf_flag is slightly
> overloaded to include the case when the acpi leaf flag is not supported
> and dropping it would make the code more succinct.
Sure, this is a hangover from the earlier cleanup you suggested. It's readable enough
without giving the result a name.
Thanks,
James
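
For illustration only (not taken from any posted revision): with leaf_flag
gone, the tests in the loop collapse into a single condition. The sketch
below assumes a simplified stand-in record. struct model_entry, its is_leaf
field (standing in for the result of acpi_pptt_leaf_node()), the MODEL_*
constants and model_is_container() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model values for this sketch only. */
#define MODEL_TYPE_PROCESSOR	0
#define MODEL_ID_VALID		(1u << 1)

/* Hypothetical, flattened view of one PPTT processor entry. */
struct model_entry {
	uint8_t type;
	uint32_t flags;
	uint32_t acpi_processor_id;
	bool is_leaf;	/* stand-in for acpi_pptt_leaf_node()'s answer */
};

/*
 * True only for a processor entry with a valid ID that is not a leaf
 * (so it is a container, not a CPU) and whose ID equals @uid.
 */
bool model_is_container(const struct model_entry *e, uint32_t uid)
{
	assert(e);

	return e->type == MODEL_TYPE_PROCESSOR &&
	       (e->flags & MODEL_ID_VALID) &&
	       !e->is_leaf &&
	       e->acpi_processor_id == uid;
}
```

Keeping the !is_leaf test means a CPU whose ID happens to equal the
requested container UID is never returned by mistake.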
- * Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
  2025-08-26 14:45   ` Ben Horgan
@ 2025-08-27 10:48   ` Dave Martin
  2025-08-28 15:57     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 10:48 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:44PM +0000, James Morse wrote:
> The PPTT describes CPUs and caches, as well as processor containers.
> The ACPI table for MPAM describes the set of CPUs that can access an MSC
> with the UID of a processor container.
> 
> Add a helper to find the processor container by its id, then walk
> the possible CPUs to fill a cpumask with the CPUs that have this
> processor container as a parent.
Nit: The motivation for the change is not clear here.
I guess this boils down to the need to map the MSC topology information
in the ACPI MPAM table to a cpumask for each MSC.
If so, a possible rearrangement and rewording might be, say:
--8<--
The ACPI MPAM table uses the UID of a processor container specified in
the PPTT, to indicate the subset of CPUs and upstream cache topology
that can access each MPAM Memory System Component (MSC).
This information is not directly useful to the kernel.  The equivalent
cpumask is needed instead.
Add a helper to find the processor container by its id, then [...]
-->8--
> 
> CC: Dave Martin <dave.martin@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
>  * Added missing : in kernel-doc
>  * Made helper return void as this never actually returns an error.
> ---
>  drivers/acpi/pptt.c  | 86 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  3 ++
>  2 files changed, 89 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 54676e3d82dd..4791ca2bdfac 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>  	return NULL;
>  }
>  
> +/**
> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
> + * @table_hdr:		A reference to the PPTT table.
> + * @parent_node:	A pointer to the processor node in the @table_hdr.
> + * @cpus:		A cpumask to fill with the CPUs below @parent_node.
> + *
> + * Walks up the PPTT from every possible CPU to find if the provided
> + * @parent_node is a parent of this CPU.
> + */
> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> +				     struct acpi_pptt_processor *parent_node,
> +				     cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	u32 acpi_id;
> +	int cpu;
> +
> +	cpumask_clear(cpus);
> +
> +	for_each_possible_cpu(cpu) {
> +		acpi_id = get_acpi_id_for_cpu(cpu);
^ Presumably this can't fail?
> +		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
> +
> +		while (cpu_node) {
> +			if (cpu_node == parent_node) {
> +				cpumask_set_cpu(cpu, cpus);
> +				break;
> +			}
> +			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
> +		}
> +	}
> +}
> +
> +/**
> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
> + *                                       processor containers
Nit: "containers" -> "container" ?
> + * @acpi_cpu_id:	The UID of the processor container.
> + * @cpus:		The resulting CPU mask.
> + *
> + * Find the specified Processor Container, and fill @cpus with all the cpus
> + * below it.
> + *
> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
> + * Container, they may exist purely to describe a Private resource. CPUs
> + * have to be leaves, so a Processor Container is a non-leaf that has the
> + * 'ACPI Processor ID valid' flag set.
(Revise this if dropping the leaf/non-leaf distinction -- see below.)
> + *
> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
> + */
> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> +{
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table_hdr;
> +	struct acpi_subtable_header *entry;
> +	unsigned long table_end;
> +	acpi_status status;
> +	bool leaf_flag;
> +	u32 proc_sz;
> +
> +	cpumask_clear(cpus);
> +
> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
> +	if (ACPI_FAILURE(status))
> +		return;
Is acpi_get_pptt() applicable here?
(That function is not thread-safe, but then, perhaps most/all of these
functions are not thread safe.  If we are still on the boot CPU at this
point (?) then this wouldn't be a concern.)
> +
> +	table_end = (unsigned long)table_hdr + table_hdr->length;
> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
> +			     sizeof(struct acpi_table_pptt));
> +	proc_sz = sizeof(struct acpi_pptt_processor);
> +	while ((unsigned long)entry + proc_sz <= table_end) {
Ack that this matches the bounds check in functions that are already
present.
> +		cpu_node = (struct acpi_pptt_processor *)entry;
> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> +			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
> +			if (!leaf_flag) {
> +				if (cpu_node->acpi_processor_id == acpi_cpu_id)
Is there any need to distinguish processor containers from (leaf) CPU
nodes, here?  If not, dropping the distinction might simplify the code
here (even if callers do not care).
Otherwise, maybe eliminate leaf_flag and collapse these into a single
if(), as suggested by Ben [1].
> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
Can there ever be multiple matches?
The possibility of duplicate processor IDs in the PPTT sounds weird to
me, but then I'm not an ACPI expert.
If there can only be a single match, though, then we may as well break
out of the loop here, unless we want to be paranoid and report
duplicates as an error -- but that would require extra implementation,
so I'm not sure that would be worth it.
> +			}
> +		}
> +		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
> +				     entry->length);
> +	}
> +
> +	acpi_put_table(table_hdr);
> +}
[...]
[1] Ben Horgan, Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
https://lore.kernel.org/lkml/b032775e-1729-441a-8ec4-dd85f70055e8@arm.com/
Cheers
---Dave
- * Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-08-27 10:48   ` Dave Martin
@ 2025-08-28 15:57     ` James Morse
  2025-09-05 16:24       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 27/08/2025 11:48, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:44PM +0000, James Morse wrote:
>> The PPTT describes CPUs and caches, as well as processor containers.
>> The ACPI table for MPAM describes the set of CPUs that can access an MSC
>> with the UID of a processor container.
>>
>> Add a helper to find the processor container by its id, then walk
>> the possible CPUs to fill a cpumask with the CPUs that have this
>> processor container as a parent.
> Nit: The motivation for the change is not clear here.
> 
> I guess this boils down to the need to map the MSC topology information
> in the ACPI MPAM table to a cpumask for each MSC.
> 
> If so, a possible rearrangement and rewording might be, say:
> 
> --8<--
> 
> The ACPI MPAM table uses the UID of a processor container specified in
> the PPTT, to indicate the subset of CPUs and upstream cache topology
> that can access each MPAM Memory System Component (MSC).
> 
> This information is not directly useful to the kernel.  The equivalent
> cpumask is needed instead.
> 
> Add a helper to find the processor container by its id, then [...]
> 
> -->8--
Thanks, that is clearer!
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 54676e3d82dd..4791ca2bdfac 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>>  	return NULL;
>>  }
>>  
>> +/**
>> + * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
>> + * @table_hdr:		A reference to the PPTT table.
>> + * @parent_node:	A pointer to the processor node in the @table_hdr.
>> + * @cpus:		A cpumask to fill with the CPUs below @parent_node.
>> + *
>> + * Walks up the PPTT from every possible CPU to find if the provided
>> + * @parent_node is a parent of this CPU.
>> + */
>> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
>> +				     struct acpi_pptt_processor *parent_node,
>> +				     cpumask_t *cpus)
>> +{
>> +	struct acpi_pptt_processor *cpu_node;
>> +	u32 acpi_id;
>> +	int cpu;
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_id = get_acpi_id_for_cpu(cpu);
> ^ Presumably this can't fail?
It'll return something! This could only be a problem if this raced with a CPU becoming
impossible, and there is no mechanism to do that.
>> +		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
>> +
>> +		while (cpu_node) {
>> +			if (cpu_node == parent_node) {
>> +				cpumask_set_cpu(cpu, cpus);
>> +				break;
>> +			}
>> +			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>> +		}
>> +	}
>> +}
>> +
>> +/**
>> + * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
>> + *                                       processor containers
> Nit: "containers" -> "container" ?
Fixed,
>> + * @acpi_cpu_id:	The UID of the processor container.
>> + * @cpus:		The resulting CPU mask.
>> + *
>> + * Find the specified Processor Container, and fill @cpus with all the cpus
>> + * below it.
>> + *
>> + * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
>> + * Container, they may exist purely to describe a Private resource. CPUs
>> + * have to be leaves, so a Processor Container is a non-leaf that has the
>> + * 'ACPI Processor ID valid' flag set.
> 
> (Revise this if dropping the leaf/non-leaf distinction -- see below.)
> 
>> + *
>> + * Return: 0 for a complete walk, or an error if the mask is incomplete.
>> + */
>> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>> +{
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table_hdr;
>> +	struct acpi_subtable_header *entry;
>> +	unsigned long table_end;
>> +	acpi_status status;
>> +	bool leaf_flag;
>> +	u32 proc_sz;
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
>> +	if (ACPI_FAILURE(status))
>> +		return;
> Is acpi_get_pptt() applicable here?
Oh, that is new, and would let me chuck the reference counting.
I guess this replaces Jonathan's magic table freeing cleanup thing!
> (That function is not thread-safe, but then, perhaps most/all of these
> functions are not thread safe.  If we are still on the boot CPU at this
> point (?) then this wouldn't be a concern.)
I think that relies on the first caller being from somewhere that can't race.
In this case it's the architecture's smp_prepare_cpus() call to set up the ACPI topology.
That is sufficiently early that it's not a concern.
>> +
>> +	table_end = (unsigned long)table_hdr + table_hdr->length;
>> +	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
>> +			     sizeof(struct acpi_table_pptt));
>> +	proc_sz = sizeof(struct acpi_pptt_processor);
>> +	while ((unsigned long)entry + proc_sz <= table_end) {
> 
> Ack that this matches the bounds check in functions that are already
> present.
> 
>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
>> +			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
>> +			if (!leaf_flag) {
>> +				if (cpu_node->acpi_processor_id == acpi_cpu_id)
> Is there any need to distinguish processor containers from (leaf) CPU
> nodes, here?  If not, dropping the distinction might simplify the code
> here (even if callers do not care).
In the namespace the object types are different, so I assumed they have their own UID
space. The PPTT holds both - hence the check for which kind of thing it is. The risk is
looking for processor-container-4 and finding CPU-4 instead...
The relevant ACPI bit is "8.4.2.1 Processor Container Device", which says:
| A processor container declaration must supply a _UID method returning an ID that is
| unique in the processor container hierarchy.
Which doesn't quite let me combine them here.
> Otherwise, maybe eliminate leaf_flag and collapse these into a single
> if(), as suggested by Ben [1].
> 
>> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> 
> Can there ever be multiple matches?
> 
> The possibility of duplicate processor IDs in the PPTT sounds weird to
> me, but then I'm not an ACPI expert.
Multiple processor-containers with the same ID? That would be a corrupt table.
acpi_pptt_get_child_cpus() then walks the tree again to find the CPUs below this
processor-container - those have a different kind of id.
> If there can only be a single match, though, then we may as well break
> out of the loop here, unless we want to be paranoid and report
> duplicates as an error -- but that would require extra implementation,
> so I'm not sure that would be worth it.
Hmmm, the PPTT node should map to only one processor or processor-container.
I'll chuck the break in.
Thanks,
James
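
For illustration only (not taken from any posted revision): the bounds
check and the early exit discussed above can be modelled as a walk over
variable-length entries in a byte buffer. This is a sketch under
assumptions. struct model_hdr, struct model_proc, MODEL_TYPE_PROC and
model_find_proc() are hypothetical, and the zero-length guard is an extra
safety check in this sketch rather than something copied from the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_TYPE_PROC 0

/* Hypothetical header shared by every entry: type, then total length. */
struct model_hdr {
	uint8_t type;
	uint8_t length;	/* size of this entry in bytes, header included */
};

/* Hypothetical fixed-size record being searched for. */
struct model_proc {
	struct model_hdr hdr;
	uint8_t pad[2];
	uint32_t id;
};

/*
 * Scan @buf for the first MODEL_TYPE_PROC entry whose id equals @id.
 * Returns its byte offset, or -1 if there is none.
 */
long model_find_proc(const uint8_t *buf, size_t len, uint32_t id)
{
	size_t off = 0;

	assert(buf != NULL);

	/* Only inspect an entry if a whole model_proc fits before the end. */
	while (off + sizeof(struct model_proc) <= len) {
		struct model_proc p;

		memcpy(&p, buf + off, sizeof(p));	/* no unaligned access */
		if (p.hdr.type == MODEL_TYPE_PROC && p.id == id)
			return (long)off;	/* IDs are unique: stop here */
		if (p.hdr.length == 0)
			break;	/* malformed entry, would never advance */
		off += p.hdr.length;
	}

	return -1;
}
```

Returning at the first match plays the role of the break: once the one
entry with the requested ID is found, the rest of the table is not scanned.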
- * Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-08-28 15:57     ` James Morse
@ 2025-09-05 16:24       ` Dave Martin
  2025-09-10 19:29         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-05 16:24 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi James,
On Thu, Aug 28, 2025 at 04:57:06PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:48, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:44PM +0000, James Morse wrote:
> >> The PPTT describes CPUs and caches, as well as processor containers.
> >> The ACPI table for MPAM describes the set of CPUs that can access an MSC
> >> with the UID of a processor container.
> >>
> >> Add a helper to find the processor container by its id, then walk
> >> the possible CPUs to fill a cpumask with the CPUs that have this
> >> processor container as a parent.
> 
> > Nit: The motivation for the change is not clear here.
> > 
> > I guess this boils down to the need to map the MSC topology information
> > in the ACPI MPAM table to a cpumask for each MSC.
> > 
> > If so, a possible rearrangement and rewording might be, say:
> > 
> > --8<--
> > 
> > The ACPI MPAM table uses the UID of a processor container specified in
> > the PPTT, to indicate the subset of CPUs and upstream cache topology
> > that can access each MPAM Memory System Component (MSC).
> > 
> > This information is not directly useful to the kernel.  The equivalent
> > cpumask is needed instead.
> > 
> > Add a helper to find the processor container by its id, then [...]
> > 
> > -->8--
> 
> Thanks, that is clearer!
Thanks
> >> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
[...]
> >> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
[...]
> >> +static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
> >> +				     struct acpi_pptt_processor *parent_node,
> >> +				     cpumask_t *cpus)
> >> +{
> >> +	struct acpi_pptt_processor *cpu_node;
> >> +	u32 acpi_id;
> >> +	int cpu;
> >> +
> >> +	cpumask_clear(cpus);
> >> +
> >> +	for_each_possible_cpu(cpu) {
> >> +		acpi_id = get_acpi_id_for_cpu(cpu);
> 
> > ^ Presumably this can't fail?
> 
> It'll return something! This could only be a problem if this raced with a CPU becoming
> impossible, and there is no mechanism to do that.
Yep, now I go and look more closely at that function, my question looks
misguided.
[...]
> >> +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
> >> +{
> >> +	struct acpi_pptt_processor *cpu_node;
> >> +	struct acpi_table_header *table_hdr;
> >> +	struct acpi_subtable_header *entry;
> >> +	unsigned long table_end;
> >> +	acpi_status status;
> >> +	bool leaf_flag;
> >> +	u32 proc_sz;
> >> +
> >> +	cpumask_clear(cpus);
> >> +
> >> +	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
> >> +	if (ACPI_FAILURE(status))
> >> +		return;
> 
> > Is acpi_get_pptt() applicable here?
> 
> Oh, that is new, and would let me chuck the reference counting.
> I guess this replaces Jonathan's magic table free'ing cleanup thing!
Ah, rightho.
> > (That function is not thread-safe, but then, perhaps most/all of these
> > functions are not thread safe.  If we are still on the boot CPU at this
> > point (?) then this wouldn't be a concern.)
> 
> I think that relies on the first caller being from somewhere that can't race.
> In this case it's the architecture's smp_prepare_cpus() call to set up the acpi topology.
> That is sufficiently early that it's not a concern.
I guess so.
[...]
> >> +		cpu_node = (struct acpi_pptt_processor *)entry;
> >> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
> >> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
> >> +			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
> >> +			if (!leaf_flag) {
> >> +				if (cpu_node->acpi_processor_id == acpi_cpu_id)
> 
> 
> > Is there any need to distinguish processor containers from (leaf) CPU
> > nodes, here?  If not, dropping the distinction might simplify the code
> > here (even if callers do not care).
> 
> In the namespace the object types are different, so I assumed they have their own UID
> space. The PPTT holds both - hence the check for which kind of thing it is. The risk is
> looking for processor-container-4 and finding CPU-4 instead...
>
> The relevant ACPI bit is "8.4.2.1 Processor Container Device", it says:
> | A processor container declaration must supply a _UID method returning an ID that is
> | unique in the processor container hierarchy.
> 
> Which doesn't quite let me combine them here.
I was going by the PPTT spec, where the types are not distinct --
you're probably right, though.
According to that, isn't it the "ACPI Processor ID valid" flag, not the
"Node is a Leaf" flag, that says whether this field is meaningful?
It's reasonable not to bother to try to enumerate the children of a
node that claims to be a leaf (even if there actually are children),
but I wonder what happens if acpi_processor_id is not declared to be
valid and matches by accident.  That's probably not a valid table (?)
but does anything bad happen on the kernel side?
> > Otherwise, maybe eliminate leaf_flag and collapse these into a single
> > if(), as suggested by Ben [1].
> > 
> >> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
> > 
> > Can there ever be multiple matches?
> > 
> > The possibility of duplicate processor IDs in the PPTT sounds weird to
> > me, but then I'm not an ACPI expert.
> 
> Multiple processor-containers with the same ID? That would be a corrupt table.
> acpi_pptt_get_child_cpus() then walks the tree again to find the CPUs below this
> processor-container - those have a different kind of id.
Does anything bad happen if we encounter duplicates?
(Other than the MPAM driver never getting enabled, or not working as
advertised, that is.)
I haven't tried to think through all the implications, here.
> > If there can only be a single match, though, then we may as well break
> > out of the loop here, unless we want to be paranoid and report
> > duplicates as an error -- but that would require extra implementation,
> > so I'm not sure that would be worth it.
> 
> Hmmm, the PPTT node should map to only one processor or processor-container.
> I'll chuck the break in.
Ack
Cheers
---Dave
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  2025-09-05 16:24       ` Dave Martin
@ 2025-09-10 19:29         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:29 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 05/09/2025 17:24, Dave Martin wrote:
> On Thu, Aug 28, 2025 at 04:57:06PM +0100, James Morse wrote:
>> On 27/08/2025 11:48, Dave Martin wrote:
>>> On Fri, Aug 22, 2025 at 03:29:44PM +0000, James Morse wrote:
>>>> The PPTT describes CPUs and caches, as well as processor containers.
>>>> The ACPI table for MPAM describes the set of CPUs that can access an MSC
>>>> with the UID of a processor container.
>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>> @@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
>>>>  +void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
>>>>  +{
>>>> +		cpu_node = (struct acpi_pptt_processor *)entry;
>>>> +		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
>>>> +		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
>>>> +			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
>>>> +			if (!leaf_flag) {
>>>> +				if (cpu_node->acpi_processor_id == acpi_cpu_id)
>>
>>
>>> Is there any need to distinguish processor containers from (leaf) CPU
>>> nodes, here?  If not, dropping the distinction might simplify the code
>>> here (even if callers do not care).
>>
>> In the namespace the object types are different, so I assumed they have their own UID
>> space. The PPTT holds both - hence the check for which kind of thing it is. The risk is
>> looking for processor-container-4 and finding CPU-4 instead...
>>
>> The relevant ACPI bit is "8.4.2.1 Processor Container Device", it says:
>> | A processor container declaration must supply a _UID method returning an ID that is
>> | unique in the processor container hierarchy.
>>
>> Which doesn't quite let me combine them here.
> 
> I was going by the PPTT spec, where the types are not distinct --
> you're probably right, though.
This way round is at least robust to this happening.
> According to that, isn't it the "ACPI Processor ID valid" flag, not the
> "Node is a Leaf" flag, that says whether this field is meaningful?
ACPI_PPTT_ACPI_PROCESSOR_ID_VALID was checked a few lines earlier. We're looking for
processors, hence also checking the leaf.
> It's reasonable not to bother to try to enumerate the children of a
> node that claims to be a leaf (even if there actually are children),
> but I wonder what happens if acpi_processor_id is not declared to be
> valid and matches by accident.  That's probably not a valid table (?)
> but does anything bad happen on the kernel side?
The type and flag are both checked earlier, so this can't happen.
You could certainly put junk nodes in the table that would be skipped over, and those
could point to a parent that is a leaf - I can't spot anything in the table parsing code
that would care about that.
>>> Otherwise, maybe eliminate leaf_flag and collapse these into a single
>>> if(), as suggested by Ben [1].
>>>
>>>> +					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
>>>
>>> Can there ever be multiple matches?
>>>
>>> The possibility of duplicate processor IDs in the PPTT sounds weird to
>>> me, but then I'm not an ACPI expert.
>>
>> Multiple processor-containers with the same ID? That would be a corrupt table.
>> acpi_pptt_get_child_cpus() then walks the tree again to find the CPUs below this
>> processor-container - those have a different kind of id.
> Does anything bad happen if we encounter duplicates?
> 
> (Other than the MPAM driver never getting enabled, or not working as
> advertised, that is.)
> 
> I haven't tried to think through all the implications, here.
It would be unpredictable which node Linux finds when it goes looking for CPUs. I don't
think anything would notice. Messing up the cache hierarchy is a different story!
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
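For readers following the container-versus-CPU discussion above, here is a
minimal stand-alone sketch of the lookup being described. It is not the PPTT
code: struct node, ID_VALID, the flat nodes[] table and cpus_from_container()
are all hypothetical stand-ins, and a 64-bit mask stands in for a cpumask.
It only illustrates why the leaf check matters (container 4 and CPU 4 can
coexist) and where the break on first match goes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ID_VALID  0x1u	/* hypothetical flag: the id field is meaningful */
#define NO_PARENT (-1)

/* Hypothetical stand-in for a PPTT processor hierarchy node. */
struct node {
	int parent;	/* index of the parent in nodes[], or NO_PARENT */
	uint32_t flags;
	uint32_t id;	/* container UID or CPU id: separate id spaces */
	bool leaf;	/* true for CPUs, false for processor containers */
};

/* Container 4 holds CPUs 4 and 5; container 7 holds CPU 6. */
static const struct node nodes[] = {
	{ NO_PARENT, ID_VALID, 4, false },
	{ 0,         ID_VALID, 4, true  },
	{ 0,         ID_VALID, 5, true  },
	{ NO_PARENT, ID_VALID, 7, false },
	{ 3,         ID_VALID, 6, true  },
};
#define NR_NODES (sizeof(nodes) / sizeof(nodes[0]))

/* Walk parent links from node n and report whether we pass through ancestor. */
static bool is_descendant(size_t n, size_t ancestor)
{
	int p = nodes[n].parent;

	while (p != NO_PARENT) {
		if ((size_t)p == ancestor)
			return true;
		p = nodes[p].parent;
	}
	return false;
}

/* Return a bitmask of the CPU ids found below the container with this UID. */
static uint64_t cpus_from_container(uint32_t container_id)
{
	uint64_t mask = 0;
	size_t i, j;

	for (i = 0; i < NR_NODES; i++) {
		/* Skip nodes without a valid id, and skip leaf (CPU) nodes. */
		if (!(nodes[i].flags & ID_VALID) || nodes[i].leaf)
			continue;
		if (nodes[i].id != container_id)
			continue;

		for (j = 0; j < NR_NODES; j++) {
			if (nodes[j].leaf && is_descendant(j, i)) {
				assert(nodes[j].id < 64);
				mask |= (uint64_t)1 << nodes[j].id;
			}
		}
		break;	/* a sane table has only one container with this UID */
	}
	return mask;
}
```

With this table, asking for container 4 yields CPUs 4 and 5, while asking for
5 (which only exists as a CPU id) yields an empty mask rather than matching
the CPU node by accident.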
- * [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (2 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-27 10:49   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
                   ` (63 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
acpi_count_levels() passes the number of levels back via a pointer argument.
It also passes this to acpi_find_cache_level() as the starting_level, and
preserves this value as it walks up the cpu_node tree counting the levels.
This means the caller must initialise 'levels' due to acpi_count_levels()
internals. The only caller acpi_get_cache_info() happens to have already
initialised levels to zero, which acpi_count_levels() depends on to get the
correct result.
Two results are passed back from acpi_count_levels(). Unlike split_levels,
levels is not optional.
Split these two results up. The mandatory 'levels' is always returned,
which hides the internal details from the caller, and avoids having
duplicated initialisation in all callers. split_levels remains an
optional argument passed back.
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Made acpi_count_levels() return the levels value.
---
 drivers/acpi/pptt.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 4791ca2bdfac..8f9b9508acba 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * levels and split cache levels (data/instruction).
  * @table_hdr: Pointer to the head of the PPTT table
  * @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
  * @split_levels:	Number of split cache levels (data/instruction) if
- *			success. Can by NULL.
+ *			success. Can be NULL.
  *
+ * Returns number of levels.
  * Given a processor node containing a processing unit, walk into it and count
  * how many levels exist solely for it, and then walk up each level until we hit
  * the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * split cache levels (data/instruction) that exist at each level on the way
  * up.
  */
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
-			      struct acpi_pptt_processor *cpu_node,
-			      unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node,
+			     unsigned int *split_levels)
 {
+	int starting_level = 0;
+
 	do {
-		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
 		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
 	} while (cpu_node);
+
+	return starting_level;
 }
 
 /**
@@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
 	if (!cpu_node)
 		return -ENOENT;
 
-	acpi_count_levels(table, cpu_node, levels, split_levels);
+	*levels = acpi_count_levels(table, cpu_node, split_levels);
 
 	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
 		 *levels, split_levels ? *split_levels : -1);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
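The shape of this refactoring can be sketched outside the kernel. The
following is only an illustration under stated assumptions: struct node,
count_here(), count_levels_old() and count_levels() are hypothetical
stand-ins for the PPTT node, acpi_find_cache_level() and the before/after
forms of acpi_count_levels(). It shows how the old form silently depends on
the caller's initial value, and how a local starting level removes that
dependency.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a processor node: each node owns some number
 * of private cache levels and points at its parent (NULL at the root).
 */
struct node {
	const struct node *parent;
	int private_levels;
};

/* Stand-in for the per-node helper: accumulates into *level. */
static void count_here(const struct node *n, int *level)
{
	*level += n->private_levels;
}

/* Old shape: only correct if the caller zeroed *levels beforehand. */
static void count_levels_old(const struct node *n, int *levels)
{
	assert(levels != NULL);
	do {
		count_here(n, levels);
		n = n->parent;
	} while (n);
}

/* New shape: the starting value is local, the total is the return value. */
static int count_levels(const struct node *n)
{
	int starting_level = 0;

	do {
		count_here(n, &starting_level);
		n = n->parent;
	} while (n);

	return starting_level;
}

/* Example topology: a CPU with L1, a cluster with L2, a package with L3. */
static const struct node package = { NULL, 1 };
static const struct node cluster = { &package, 1 };
static const struct node cpu0 = { &cluster, 1 };
```

Calling count_levels(&cpu0) gives 3 regardless of what the caller had lying
around, whereas count_levels_old() adds 3 to whatever value it was handed.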
- * Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-08-27 10:49   ` Dave Martin
  2025-08-28 15:57     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 10:49 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:45PM +0000, James Morse wrote:
> acpi_count_levels() passes the number of levels back via a pointer argument.
> It also passes this to acpi_find_cache_level() as the starting_level, and
> preserves this value as it walks up the cpu_node tree counting the levels.
> 
> This means the caller must initialise 'levels' due to acpi_count_levels()
> internals. The only caller acpi_get_cache_info() happens to have already
> initialised levels to zero, which acpi_count_levels() depends on to get the
> correct result.
> 
> Two results are passed back from acpi_count_levels(). Unlike split_levels,
> levels is not optional.
> 
> Split these two results up. The mandatory 'levels' is always returned,
> which hides the internal details from the caller, and avoids having
> duplicated initialisation in all callers. split_levels remains an
> optional argument passed back.
Nit: I found all this a bit hard to follow.
This seems to boil down to:
--8<--
In acpi_count_levels(), the initial value of *levels passed by the
caller is really an implementation detail of acpi_count_levels(), so it
is unreasonable to expect the callers of this function to know what to
pass in for this parameter.  The only sensible initial value is 0,
which is what the only upstream caller (acpi_get_cache_info()) passes.
Use a local variable for the starting cache level in acpi_count_levels(),
and pass the result back to the caller via the function return value.
Get rid of the levels parameter, which has no remaining purpose.
Fix acpi_get_cache_info() to match.
-->8--
split_levels is orthogonal to this refactoring (as evinced by the diff).
I think mentioning it in the commit message at all may just add to the
confusion...
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since RFC:
>  * Made acpi_count_levels() return the levels value.
> ---
>  drivers/acpi/pptt.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 4791ca2bdfac..8f9b9508acba 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   * levels and split cache levels (data/instruction).
>   * @table_hdr: Pointer to the head of the PPTT table
>   * @cpu_node: processor node we wish to count caches for
> - * @levels: Number of levels if success.
>   * @split_levels:	Number of split cache levels (data/instruction) if
> - *			success. Can by NULL.
> + *			success. Can be NULL.
>   *
> + * Returns number of levels.
Nit: the prevailing convention in this file would be
	Return: number of levels
(I don't know whether kerneldoc cares.)
Maybe also say "total number of levels" in place of "number of levels", to make it
clearer that the split levels (if any) are included in this count.
>   * Given a processor node containing a processing unit, walk into it and count
>   * how many levels exist solely for it, and then walk up each level until we hit
>   * the root node (ignore the package level because it may be possible to have
> @@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   * split cache levels (data/instruction) that exist at each level on the way
>   * up.
>   */
> -static void acpi_count_levels(struct acpi_table_header *table_hdr,
> -			      struct acpi_pptt_processor *cpu_node,
> -			      unsigned int *levels, unsigned int *split_levels)
> +static int acpi_count_levels(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node,
> +			     unsigned int *split_levels)
>  {
> +	int starting_level = 0;
> +
>  	do {
> -		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
> +		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
>  		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>  	} while (cpu_node);
> +
> +	return starting_level;
>  }
>  
>  /**
> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>  	if (!cpu_node)
>  		return -ENOENT;
>  
> -	acpi_count_levels(table, cpu_node, levels, split_levels);
> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
>  
>  	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
>  		 *levels, split_levels ? *split_levels : -1);
Otherwise, looks reasonable to me.
(But see my comments on the next patches re whether we really need this.)
Cheers
---Dave
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-08-27 10:49   ` Dave Martin
@ 2025-08-28 15:57     ` James Morse
  2025-09-09 10:06       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 27/08/2025 11:49, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:45PM +0000, James Morse wrote:
>> acpi_count_levels() passes the number of levels back via a pointer argument.
>> It also passes this to acpi_find_cache_level() as the starting_level, and
>> preserves this value as it walks up the cpu_node tree counting the levels.
>>
>> This means the caller must initialise 'levels' due to acpi_count_levels()
>> internals. The only caller acpi_get_cache_info() happens to have already
>> initialised levels to zero, which acpi_count_levels() depends on to get the
>> correct result.
>>
>> Two results are passed back from acpi_count_levels(). Unlike split_levels,
>> levels is not optional.
>>
>> Split these two results up. The mandatory 'levels' is always returned,
>> which hides the internal details from the caller, and avoids having
>> duplicated initialisation in all callers. split_levels remains an
>> optional argument passed back.
> 
> Nit: I found all this a bit hard to follow.
> 
> This seems to boil down to:
> 
> --8<--
> 
> In acpi_count_levels(), the initial value of *levels passed by the
> caller is really an implementation detail of acpi_count_levels(), so it
> is unreasonable to expect the callers of this function to know what to
> pass in for this parameter.  The only sensible initial value is 0,
> which is what the only upstream caller (acpi_get_cache_info()) passes.
> 
> Use a local variable for the starting cache level in acpi_count_levels(),
> and pass the result back to the caller via the function return value.
> 
> Get rid of the levels parameter, which has no remaining purpose.
> 
> Fix acpi_get_cache_info() to match.
> 
> -->8--
I've taken this instead,
> split_levels is orthogonal to this refactoring (as evinced by the diff).
> I think mentioning it in the commit message at all may just add to the
> confusion...
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 4791ca2bdfac..8f9b9508acba 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>>   * levels and split cache levels (data/instruction).
>>   * @table_hdr: Pointer to the head of the PPTT table
>>   * @cpu_node: processor node we wish to count caches for
>> - * @levels: Number of levels if success.
>>   * @split_levels:	Number of split cache levels (data/instruction) if
>> - *			success. Can by NULL.
>> + *			success. Can be NULL.
>>   *
>> + * Returns number of levels.
> 
> Nit: the prevailing convention in this file would be
> 
> 	Return: number of levels
> 
> (I don't know whether kerneldoc cares.)
> 
> Maybe also say "total number of levels" in place of "number of levels", to make it
> clearer that the split levels (if any) are included in this count.
Sure,
>> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>>  	if (!cpu_node)
>>  		return -ENOENT;
>>  
>> -	acpi_count_levels(table, cpu_node, levels, split_levels);
>> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
>>  
>>  	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
>>  		 *levels, split_levels ? *split_levels : -1);
> 
> Otherwise, looks reasonable to me.
> 
> (But see my comments on the next patches re whether we really need this.)
It was enough fun to debug that I'd like to save anyone else the trouble!
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
  2025-08-28 15:57     ` James Morse
@ 2025-09-09 10:06       ` Dave Martin
  0 siblings, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-09-09 10:06 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi James,
On Thu, Aug 28, 2025 at 04:57:15PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:49, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:45PM +0000, James Morse wrote:
> >> acpi_count_levels() passes the number of levels back via a pointer argument.
> >> It also passes this to acpi_find_cache_level() as the starting_level, and
> >> preserves this value as it walks up the cpu_node tree counting the levels.
> >>
> >> This means the caller must initialise 'levels' due to acpi_count_levels()
> >> internals. The only caller acpi_get_cache_info() happens to have already
> >> initialised levels to zero, which acpi_count_levels() depends on to get the
> >> correct result.
> >>
> >> Two results are passed back from acpi_count_levels(). Unlike split_levels,
> >> levels is not optional.
> >>
> >> Split these two results up. The mandatory 'levels' is always returned,
> >> which hides the internal details from the caller, and avoids having
> >> duplicated initialisation in all callers. split_levels remains an
> >> optional argument passed back.
> > 
> > Nit: I found all this a bit hard to follow.
> > 
> > This seems to boil down to:
> > 
> > --8<--
> > 
> > In acpi_count_levels(), the initial value of *levels passed by the
> > caller is really an implementation detail of acpi_count_levels(), so it
> > is unreasonable to expect the callers of this function to know what to
> > pass in for this parameter.  The only sensible initial value is 0,
> > which is what the only upstream caller (acpi_get_cache_info()) passes.
> > 
> > Use a local variable for the starting cache level in acpi_count_levels(),
> > and pass the result back to the caller via the function return value.
> > 
> > Get rid of the levels parameter, which has no remaining purpose.
> > 
> > Fix acpi_get_cache_info() to match.
> > 
> > -->8--
> 
> I've taken this instead,
OK
[...]
> >> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
> >>  	if (!cpu_node)
> >>  		return -ENOENT;
> >>  
> >> -	acpi_count_levels(table, cpu_node, levels, split_levels);
> >> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
> >>  
> >>  	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
> >>  		 *levels, split_levels ? *split_levels : -1);
> > 
> > Otherwise, looks reasonable to me.
> > 
> > (But see my comments on the next patches re whether we really need this.)
> 
> It was enough fun to debug that I'd like to save anyone else the trouble!
Fair enough.
Cheers
---Dave
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (3 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-23 12:14   ` Markus Elfring
                     ` (2 more replies)
  2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
                   ` (62 subsequent siblings)
  67 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.
Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.
Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then return the level.
Add a cleanup-based freeing mechanism for acpi_get_table().
CC: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h | 17 ++++++++++++
 2 files changed, 81 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 8f9b9508acba..660457644a5b 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the unified cache
+ *
+ * Determine the level relative to any CPU for the unified cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group unified caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another as L3, this function will return
+ * an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+	if (IS_ERR(table))
+		return PTR_ERR(table);
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	/*
+	 * If we found the cache first, we'd still need to walk from each CPU
+	 * to find the level...
+	 */
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			return -ENOENT;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				return level;
+		}
+	}
+
+	return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f97a9ff678cc..30c10b1dcdb2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_ACPI_H
 #define _LINUX_ACPI_H
 
+#include <linux/cleanup.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>	/* for struct resource */
 #include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+	struct acpi_table_header *table;
+	int status = acpi_get_table(signature, instance, &table);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-ENOENT);
+	return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
 		unsigned long table_size, int entry_id,
@@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
 						     cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	return -EINVAL;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-08-23 12:14   ` Markus Elfring
  2025-08-28 15:57     ` James Morse
  2025-08-27  9:25   ` Ben Horgan
  2025-08-27 10:50   ` Dave Martin
  2 siblings, 1 reply; 200+ messages in thread
From: Markus Elfring @ 2025-08-23 12:14 UTC (permalink / raw)
  To: James Morse, linux-arm-kernel, linux-acpi, devicetree
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
	bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
	Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
	D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
	Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
	Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
	Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
	Will Deacon, Xin Hao
…
> +++ b/include/linux/acpi.h
…
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
…
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
…
What do you think about adding this special macro in a separate patch?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.17-rc2#n81
Regards,
Markus
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-23 12:14   ` Markus Elfring
@ 2025-08-28 15:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
  To: Markus Elfring, linux-arm-kernel, linux-acpi, devicetree
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
	bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
	Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
	D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
	Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
	Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
	Shanker Donthineni, Shaopeng Tan, Sudeep Holla, Will Deacon,
	Xin Hao
Hi Markus,
On 23/08/2025 13:14, Markus Elfring wrote:
> …
>> +++ b/include/linux/acpi.h
> …
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
> …
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
>> +
>>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
> …
> 
> What do you think about adding this special macro in a separate patch?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.17-rc2#n81
As it goes via the same tree I don't think there is a strong reason either way.
Dave points out on an earlier patch that the PPTT code doesn't care about the reference
counting anyway, so this stuff can go.
Thanks,
James
 
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
  2025-08-23 12:14   ` Markus Elfring
@ 2025-08-27  9:25   ` Ben Horgan
  2025-08-28 15:57     ` James Morse
  2025-08-27 10:50   ` Dave Martin
  2 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-27  9:25 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> Add a cleanup based free-ing mechanism for acpi_get_table().
> 
> CC: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h | 17 ++++++++++++
>  2 files changed, 81 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 8f9b9508acba..660457644a5b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>  					  ACPI_PPTT_ACPI_IDENTICAL);
>  }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * If one CPU's L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> +	if (IS_ERR(table))
> +		return PTR_ERR(table);
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	/*
> +	 * If we found the cache first, we'd still need to walk from each CPU
> +	 * to find the level...
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..30c10b1dcdb2 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -8,6 +8,7 @@
>  #ifndef _LINUX_ACPI_H
>  #define _LINUX_ACPI_H
>  
> +#include <linux/cleanup.h>
>  #include <linux/errno.h>
>  #include <linux/ioport.h>	/* for struct resource */
>  #include <linux/resource_ext.h>
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
nit: Is it useful to change the condition from !IS_ERR(_T) to
!IS_ERR_OR_NULL(_T)? This seems to be the common pattern. I do note that
acpi_put_table() can take NULL, so there is no real danger.
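(Illustration only, not part of the patch: a userspace sketch of the scope-based cleanup pattern with the IS_ERR_OR_NULL() guard. The ERR_PTR() helpers, struct table, table_get(), table_put() and use_table() are simplified stand-ins invented for this example, and the compiler's cleanup attribute is used directly where the kernel would use DEFINE_FREE()/__free(). Unlike acpi_put_table() as described above, the toy release function does not accept NULL, which is the case where the extra guard would matter.)

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR() family (simplified). */
#define MAX_ERRNO 4095
static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static int IS_ERR_OR_NULL(const void *ptr) { return !ptr || IS_ERR(ptr); }

/* Hypothetical refcounted table, standing in for struct acpi_table_header. */
struct table { int refs; };

struct table the_table;
int put_calls;	/* how many times the release function ran */

static struct table *table_get(int present)
{
	if (!present)
		return ERR_PTR(-ENOENT);
	the_table.refs++;
	return &the_table;
}

/* Toy release function: must not be given NULL or an error pointer. */
static void table_put(struct table *t)
{
	assert(t->refs > 0);
	t->refs--;
	put_calls++;
}

/* Runs automatically when the annotated variable goes out of scope. */
static void table_cleanup(struct table **tp)
{
	if (!IS_ERR_OR_NULL(*tp))
		table_put(*tp);
}
#define auto_put_table __attribute__((cleanup(table_cleanup)))

/* Returns the refcount seen while the table was held, or a negative errno. */
int use_table(int present)
{
	struct table *t auto_put_table = table_get(present);

	if (IS_ERR(t))
		return (int)PTR_ERR(t);
	return t->refs;
}
```

With this shape the error path returns without calling the release function, and the success path drops the reference on every return.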
> +
>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>  		unsigned long table_size, int entry_id,
> @@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
>  int find_acpi_cpu_topology_package(unsigned int cpu);
>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>  void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
> +int find_acpi_cache_level_from_id(u32 cache_id);
>  #else
>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>  {
> @@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  }
>  static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
>  						     cpumask_t *cpus) { }
> +static inline int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	return -EINVAL;
> +}
>  #endif
>  
>  void acpi_arch_init(void);
Thanks,
Ben
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-27  9:25   ` Ben Horgan
@ 2025-08-28 15:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-28 15:57 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Dave Martin,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 27/08/2025 10:25, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}
>> +DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
> nit: Is it useful to change the condition from !IS_ERR(_T) to
> !IS_ERR_OR_NULL(_T)? This seems to be the common pattern. I do note that
> acpi_put_table() can take NULL, so there is no real danger.
If it's the common pattern, sure.
But this code got dropped as Dave pointed out the PPTT doesn't care about the reference
counting anyway, its acpi_get_pptt() helper just uses the same reference for everything.
This might come back for the MPAM driver.
>> +
>>  int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
>>  int __init_or_acpilib acpi_table_parse_entries(char *id,
>>  		unsigned long table_size, int entry_id,
Thanks,
James
 
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
  2025-08-23 12:14   ` Markus Elfring
  2025-08-27  9:25   ` Ben Horgan
@ 2025-08-27 10:50   ` Dave Martin
  2025-08-28 15:58     ` James Morse
  2 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 10:50 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
> The MPAM table identifies caches by id. The MPAM driver also wants to know
> the cache level to determine if the platform is of the shape that can be
> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> are online.
> 
> Waiting for all CPUs to come online is a problem for platforms where
> CPUs are brought online late by user-space.
> 
> Add a helper that walks every possible cache, until it finds the one
> identified by cache-id, then return the level.
> Add a cleanup based free-ing mechanism for acpi_get_table().
Does this mean that the early secondaries must be spread out across the
whole topology so that everything can be probed?
(i.e., a random subset is no good?)
If so, is this documented somewhere, such as in booting.rst?
Maybe this is not a new requirement -- it's not an area that I'm very
familiar with.
> 
> CC: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h | 17 ++++++++++++
>  2 files changed, 81 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 8f9b9508acba..660457644a5b 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>  					  ACPI_PPTT_ACPI_IDENTICAL);
>  }
> +
> +/**
> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> + * @cache_id: The id field of the unified cache
> + *
> + * Determine the level relative to any CPU for the unified cache identified by
> + * cache_id. This allows the property to be found even if the CPUs are offline.
> + *
> + * The returned level can be used to group unified caches that are peers.
> + *
> + * The PPTT table must be rev 3 or later.
> + *
> + * If one CPU's L2 is shared with another as L3, this function will return
> + * an unpredictable value.
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
Nit: doesn't exist or its revision is too old.
> + * Otherwise returns a value which represents the level of the specified cache.
> + */
> +int find_acpi_cache_level_from_id(u32 cache_id)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
acpi_get_pptt() ? (See comment on patch 3.)
Comments there also suggest that the acpi_put_table() may be
unnecessary, at least on some paths.
I haven't tried to understand the ins and outs of this.
> +
> +	if (IS_ERR(table))
> +		return PTR_ERR(table);
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	/*
> +	 * If we found the cache first, we'd still need to walk from each CPU
> +	 * to find the level...
> +	 */
^ Possibly confusing comment?  The cache id is the starting point for
calling this function.  Is there a world in which we are at this point
without first having found the cache node?
(If the comment is just a restatement of part of the kerneldoc
description, maybe just drop it.)
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return -ENOENT;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
Is the initial call to acpi_count_levels() really needed here?
It feels a bit like we end up enumerating the whole topology two or
three times here; once to count how many levels there are, and then
again to examine the nodes, and once more inside acpi_find_cache_node().
Why can't we just walk until we run out of levels?
I may be missing some details of how these functions interact -- if
this is only run at probe time, compact, well-factored code is
more important than making things as fast as possible.
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				return level;
> +		}
> +	}
> +
> +	return -ENOENT;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index f97a9ff678cc..30c10b1dcdb2 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
[...]
> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>  void acpi_table_init_complete (void);
>  int acpi_table_init (void);
>  
> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> +{
> +	struct acpi_table_header *table;
> +	int status = acpi_get_table(signature, instance, &table);
> +
> +	if (ACPI_FAILURE(status))
> +		return ERR_PTR(-ENOENT);
> +	return table;
> +}
This feels like something that ought to exist already.  If not, why
not?  If so, are there open-coded versions of this spread around the
ACPI tree that should be ported to use it?
[...]
Cheers
---Dave
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-27 10:50   ` Dave Martin
@ 2025-08-28 15:58     ` James Morse
  2025-09-05 16:27       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-28 15:58 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 11:50, Dave Martin wrote:
> Hi,
> 
> On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>> the cache level to determine if the platform is of the shape that can be
>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>> are online.
>>
>> Waiting for all CPUs to come online is a problem for platforms where
>> CPUs are brought online late by user-space.
>>
>> Add a helper that walks every possible cache, until it finds the one
>> identified by cache-id, then return the level.
>> Add a cleanup based free-ing mechanism for acpi_get_table().
> Does this mean that the early secondaries must be spread out across the
> whole topology so that everything can be probed?
>
> (i.e., a random subset is no good?)
For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
associated with each cache needs to be online. Random is fine - provided you get lucky.
> If so, is this documented somewhere, such as in booting.rst?
booting.rst is for the bootloader.
Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
make much of a difference.
The concern is that if resctrl gets probed after user-space has started, whatever
user-space service is supposed to set it up will have concluded it's not supported. Working
with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
enough so that every piece of hardware is reachable.
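(Illustration only: a toy model of the "every piece of hardware is reachable" condition. The 8-CPU topology, the cpu_cache_id[] table and all_caches_reachable() are made up for this sketch and are not part of the MPAM driver.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical topology: the id of the MPAM-controlled cache behind each CPU. */
static const unsigned int cpu_cache_id[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* True if every cache in the table has at least one CPU set in online_mask. */
bool all_caches_reachable(uint32_t online_mask)
{
	assert(NR_CPUS <= 32);	/* one bit per CPU in the mask */

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		bool reachable = false;

		/* Look for any online CPU that sits behind the same cache. */
		for (int other = 0; other < NR_CPUS; other++) {
			if (cpu_cache_id[other] == cpu_cache_id[cpu] &&
			    (online_mask & (UINT32_C(1) << other))) {
				reachable = true;
				break;
			}
		}
		if (!reachable)
			return false;
	}
	return true;
}
```

In this model, bringing up CPU0 and CPU4 (mask 0x11) is enough, while CPU0 and CPU1 (mask 0x03) leaves cache 1 unreachable: the same number of CPUs, but a different outcome depending on which ones were picked.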
> Maybe this is not a new requirement -- it's not an area that I'm very
> familiar with.
Hard to say - it's a potentially surprising side effect of glomming OS accessible registers
onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).
I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
change while they are offline. (presumably due to RAS or a firmware update)
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 8f9b9508acba..660457644a5b 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
>>  					  ACPI_PPTT_ACPI_IDENTICAL);
>>  }
>> +
>> +/**
>> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
>> + * @cache_id: The id field of the unified cache
>> + *
>> + * Determine the level relative to any CPU for the unified cache identified by
>> + * cache_id. This allows the property to be found even if the CPUs are offline.
>> + *
>> + * The returned level can be used to group unified caches that are peers.
>> + *
>> + * The PPTT table must be rev 3 or later.
>> + *
>> + * If one CPU's L2 is shared with another as L3, this function will return
>> + * an unpredictable value.
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> 
> Nit: doesn't exist or its revision is too old.
... it's not old, but there is no published spec for that revision... unsupported?
>> + * Otherwise returns a value which represents the level of the specified cache.
>> + */
>> +int find_acpi_cache_level_from_id(u32 cache_id)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> acpi_get_pptt() ? (See comment on patch 3.)
Yup,
> Comments there also suggest that the acpi_put_table() may be
> unnecessary, at least on some paths.
> 
> I haven't tried to understand the ins and outs of this.
It's grabbing one reference and using it for everything, because it needs to 'map' the
table in atomic context due to cpuhp, but can't.
Given how frequently it's used, there is no problem just leaving it mapped.
>> +
>> +	if (IS_ERR(table))
>> +		return PTR_ERR(table);
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * If we found the cache first, we'd still need to walk from each CPU
>> +	 * to find the level...
>> +	 */
> ^ Possibly confusing comment?  The cache id is the starting point for
> calling this function.  Is there a world in which we are at this point
> without first having found the cache node?
> 
> (If the comment is just a restatement of part of the kerneldoc
> description, maybe just drop it.)
It's describing the alternate world where the table is searched to find the cache first,
but then we'd still need to walk the table another NR_CPUs times, which can't be avoided.
I'll drop it - it was justifying why it's done this way round...
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return -ENOENT;
>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> 
> Is the initial call to acpi_count_levels() really needed here?
> 
> It feels a bit like we end up enumerating the whole topology two or
> three times here; once to count how many levels there are, and then
> again to examine the nodes, and once more inside acpi_find_cache_node().
> 
> Why can't we just walk until we run out of levels?
This is looking for a unified cache - and we don't know where those start.
We could walk the first 100 caches, and stop once we start getting unified caches, then
they stop again ... but this seemed simpler.
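(Illustration only: a toy model of why a bounded walk is simpler than walking until the lookup fails. The hierarchy[] table and find_unified() are invented for this sketch; find_unified() only mimics the assumed behaviour that a lookup for a unified cache returns NULL at a level that has no unified cache. It is not acpi_find_cache_node().)

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum cache_type { CACHE_DATA, CACHE_INSTRUCTION, CACHE_UNIFIED };

struct cache {
	enum cache_type type;
	unsigned int id;
};

/* Hypothetical view from one CPU: split L1 (data side shown), unified L2 and L3. */
static const struct cache hierarchy[] = {
	{ CACHE_DATA,    10 },	/* L1 */
	{ CACHE_UNIFIED, 20 },	/* L2 */
	{ CACHE_UNIFIED, 30 },	/* L3 */
};
#define NUM_LEVELS ((int)(sizeof(hierarchy) / sizeof(hierarchy[0])))

/* Toy lookup: NULL when the level does not exist or is not a unified cache. */
static const struct cache *find_unified(int level)
{
	if (level < 1 || level > NUM_LEVELS)
		return NULL;
	if (hierarchy[level - 1].type != CACHE_UNIFIED)
		return NULL;
	return &hierarchy[level - 1];
}

/* Bounded walk, as in the patch: a miss at one level does not end the search. */
int level_with_count(unsigned int cache_id)
{
	for (int level = 1; level <= NUM_LEVELS; level++) {
		const struct cache *c = find_unified(level);

		if (!c)
			continue;
		assert(c->type == CACHE_UNIFIED);	/* lookup contract */
		if (c->id == cache_id)
			return level;
	}
	return -ENOENT;
}

/* Unbounded walk: stops at the first miss, which here is the split L1. */
int level_until_miss(unsigned int cache_id)
{
	for (int level = 1; ; level++) {
		const struct cache *c = find_unified(level);

		if (!c)
			break;
		if (c->id == cache_id)
			return level;
	}
	return -ENOENT;
}
```

In this model the bounded walk finds cache 20 at level 2, while the walk that stops on the first miss gives up at L1 and never sees it. A variant that skips leading misses would need its own rule for when to stop, which is the extra complexity mentioned above.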
> I may be missing some details of how these functions interact -- if
> this is only run at probe time, compact, well-factored code is
> more important than making things as fast as possible.
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index f97a9ff678cc..30c10b1dcdb2 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
> 
> [...]
> 
>> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
>>  void acpi_table_init_complete (void);
>>  int acpi_table_init (void);
>>  
>> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
>> +{
>> +	struct acpi_table_header *table;
>> +	int status = acpi_get_table(signature, instance, &table);
>> +
>> +	if (ACPI_FAILURE(status))
>> +		return ERR_PTR(-ENOENT);
>> +	return table;
>> +}
> This feels like something that ought to exist already.  If not, why
> not?  If so, are there open-coded versions of this spread around the
> ACPI tree that should be ported to use it?
It's a cleanup idiom helper that lets the compiler do this automagically - but it's moot as
it's not going to be needed in the pptt because of the acpi_get_pptt() thing.
Thanks,
James
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-08-28 15:58     ` James Morse
@ 2025-09-05 16:27       ` Dave Martin
  2025-09-10 19:29         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-05 16:27 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Thu, Aug 28, 2025 at 04:58:05PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:50, Dave Martin wrote:
> > Hi,
> > 
> > On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
> >> The MPAM table identifies caches by id. The MPAM driver also wants to know
> >> the cache level to determine if the platform is of the shape that can be
> >> managed via resctrl. Cacheinfo has this information, but only for CPUs that
> >> are online.
> >>
> >> Waiting for all CPUs to come online is a problem for platforms where
> >> CPUs are brought online late by user-space.
> >>
> >> Add a helper that walks every possible cache, until it finds the one
> >> identified by cache-id, then return the level.
> >> Add a cleanup based free-ing mechanism for acpi_get_table().
> 
> > Does this mean that the early secondaries must be spread out across the
> > whole topology so that everything can be probed?
> >
> > (i.e., a random subset is no good?)
> 
> For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
> associated with each cache needs to be online. Random is fine - provided you get lucky.
"Fine" = "not dependent on luck".  So, random is not fine.
> > If so, is this documented somewhere, such as in booting.rst?
> 
> booting.rst is for the bootloader.
> Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
> Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
> make much of a difference.
> 
> The concern is that if resctrl gets probed after user-space has started, whatever
> user-space service is supposed to set it up will have concluded it's not supported. Working
> with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
> enough so that every piece of hardware is reachable.
> 
> 
> > Maybe this is not a new requirement -- it's not an area that I'm very
> > familiar with.
> 
> Hard to say - it's a potentially surprising side effect of glomming OS accessible registers
> onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).
> 
> I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
> were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
> change while they are offline. (presumably due to RAS or a firmware update)
So, it sounds like there is a requirement, but we don't document it,
and if the requirement is not met then the user is presented with an
obscure failure in the MPAM driver.  This seems a bit unhelpful?
I'm not saying booting.rst is the right place for this -- maybe the
appropriate document doesn't exist yet.
I wonder whether the required property is reasonable and general enough
that it should be treated as a kernel boot requirement.
Or, we require caches to be symmetric for non-early CPUs and reject
those that don't match when they try to come online (similarly to
the way cpufeatures deals with mismatches).
> >> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> >> index 8f9b9508acba..660457644a5b 100644
> >> --- a/drivers/acpi/pptt.c
> >> +++ b/drivers/acpi/pptt.c
> >> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
> >>  	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
> >>  					  ACPI_PPTT_ACPI_IDENTICAL);
> >>  }
> >> +
> >> +/**
> >> + * find_acpi_cache_level_from_id() - Get the level of the specified cache
> >> + * @cache_id: The id field of the unified cache
> >> + *
> >> + * Determine the level relative to any CPU for the unified cache identified by
> >> + * cache_id. This allows the property to be found even if the CPUs are offline.
> >> + *
> >> + * The returned level can be used to group unified caches that are peers.
> >> + *
> >> + * The PPTT table must be rev 3 or later,
> >> + *
> >> + * If one CPUs L2 is shared with another as L3, this function will return
> >> + * an unpredictable value.
> >> + *
> >> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> > 
> > Nit: doesn't exist or its revision is too old.
> 
> ... it's not old, but there is no published spec for that revision... unsupported?
That seems OK, say,
	"... if the PPTT doesn't exist or has an unsupported format, or ..."
> >> + * Otherwise returns a value which represents the level of the specified cache.
> >> + */
> >> +int find_acpi_cache_level_from_id(u32 cache_id)
> >> +{
> >> +	u32 acpi_cpu_id;
> >> +	int level, cpu, num_levels;
> >> +	struct acpi_pptt_cache *cache;
> >> +	struct acpi_pptt_cache_v1 *cache_v1;
> >> +	struct acpi_pptt_processor *cpu_node;
> >> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> 
> > acpi_get_pptt() ? (See comment on patch 3.)
> 
> Yup,
> 
> > Comments there also suggest that the acpi_put_table() may be
> > unnecessary, at least on some paths.
> > 
> > I haven't tried to understand the ins and outs of this.
> 
> It's grabbing one reference and using it for everything, because it needs to 'map' the
> table in atomic context due to cpuhp, but can't.
> Given how frequently it's used, there is no problem just leaving it mapped.
That's rather what I thought -- in which case we can use it (as you
already concluded, by the looks of it).
> >> +
> >> +	if (IS_ERR(table))
> >> +		return PTR_ERR(table);
> >> +
> >> +	if (table->revision < 3)
> >> +		return -ENOENT;
> >> +
> >> +	/*
> >> +	 * If we found the cache first, we'd still need to walk from each CPU
> >> +	 * to find the level...
> >> +	 */
> 
> > ^ Possibly confusing comment?  The cache id is the starting point for
> > calling this function.  Is there a world in which we are at this point
> > without first having found the cache node?
> > 
> > (If the comment is just a restatement of part of the kerneldoc
> > description, maybe just drop it.)
> 
> It's describing the alternate world where the table is searched to find the cache first,
> but then we'd still need to walk the table another NR_CPUs times, which can't be avoided.
> I'll drop it - it was justifying why it's done this way round...
Oh, I see, this is "if the code had been written in such-and-such a way",
not "if such-and-such a runtime precondition is met" ?
The comment can be read both ways, as it stands.
> 
> >> +	for_each_possible_cpu(cpu) {
> >> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> >> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> >> +		if (!cpu_node)
> >> +			return -ENOENT;
> >> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> > 
> > Is the initial call to acpi_count_levels() really needed here?
> > 
> > It feels a bit like we end up enumerating the whole topology two or
> > three times here; once to count how many levels there are, and then
> > again to examine the nodes, and once more inside acpi_find_cache_node().
> > 
> > Why can't we just walk until we run out of levels?
> 
> This is looking for a unified cache - and we don't know where those start.
> We could walk the first 100 caches, and stop once we start getting unified caches, then
> they stop again ... but this seemed simpler.
I'm still a bit confused.
We start at level one, and then trace parents until we hit a unified
cache or run out of levels.
Why do we need to know a priori how many levels there are, when the
way to determine that is part of the same procedure we're already doing
(i.e., start at level one and trace parents until we run out of levels)?
> > I may be missing some details of how these functions interact -- if
> > this is only run at probe time, compact, well-factored code is
> > more important than making things as fast as possible.
(This still stands.)
> >> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
[...]
> >> @@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
> >>  void acpi_table_init_complete (void);
> >>  int acpi_table_init (void);
> >>  
> >> +static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
> >> +{
> >> +	struct acpi_table_header *table;
> >> +	int status = acpi_get_table(signature, instance, &table);
> >> +
> >> +	if (ACPI_FAILURE(status))
> >> +		return ERR_PTR(-ENOENT);
> >> +	return table;
> >> +}
> 
> > This feels like something that ought to exist already.  If not, why
> > not?  If so, are there open-coded versions of this spread around the
> > ACPI tree that should be ported to use it?
> 
> 
> It's a cleanup idiom helper that lets the compiler do this automagically - but it's moot as
> it's not going to be needed in the pptt because of the acpi_get_pptt() thing.
Ah, OK.
Cheers
---Dave
- * Re: [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
  2025-09-05 16:27       ` Dave Martin
@ 2025-09-10 19:29         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:29 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 05/09/2025 17:27, Dave Martin wrote:
> On Thu, Aug 28, 2025 at 04:58:05PM +0100, James Morse wrote:
>> On 27/08/2025 11:50, Dave Martin wrote:
>>> On Fri, Aug 22, 2025 at 03:29:46PM +0000, James Morse wrote:
>>>> The MPAM table identifies caches by id. The MPAM driver also wants to know
>>>> the cache level to determine if the platform is of the shape that can be
>>>> managed via resctrl. Cacheinfo has this information, but only for CPUs that
>>>> are online.
>>>>
>>>> Waiting for all CPUs to come online is a problem for platforms where
>>>> CPUs are brought online late by user-space.
>>>>
>>>> Add a helper that walks every possible cache, until it finds the one
>>>> identified by cache-id, then return the level.
>>>> Add a cleanup based free-ing mechanism for acpi_get_table().
>>
>>> Does this mean that the early secondaries must be spread out across the
>>> whole topology so that everything can be probed?
>>>
>>> (i.e., a random subset is no good?)
>>
>> For the mpam driver - it needs to see each cache with mpam hardware, which means a CPU
>> associated with each cache needs to be online. Random is fine - provided you get lucky.
> 
> "Fine" = "not dependent on luck".  So, random is not fine.
As in it doesn't matter which CPUs - as long as you manage to represent each cache.
I really don't care if people configure their platform such that the mpam driver can't
complete its probing. Everything continues to work, you just don't get to use the unusable
features.
I've only really seen it done in VMs, which are most likely to have a single global MSC
because the thing has to be emulated.
>>> If so, is this documented somewhere, such as in booting.rst?
>>
>> booting.rst is for the bootloader.
>> Late secondaries is a bit of a niche sport, I've only seen it commonly done in VMs.
>> Most platforms so far have their MPAM controls on a global L3, so this requirement doesn't
>> make much of a difference.
>>
>> The concern is that if resctrl gets probed after user-space has started, whatever
>> user-space service is supposed to set it up will have concluded it's not supported. Working
>> with cache-ids for offline CPUs means you don't have to bring all the CPUs online - only
>> enough so that every piece of hardware is reachable.
>>
>>
>>> Maybe this is not a new requirement -- it's not an area that I'm very
>>> familiar with.
>>
>> Hard to say - it's a potentially surprising side effect of glomming OS accessible registers
>> onto the side of hardware that can be automatically powered off. (PSCI CPU_SUSPEND).
>>
>> I did try getting cacheinfo to populate all the CPUs at boot, regardless of whether they
>> were online. Apparently that doesn't work for PowerPC where the properties of CPUs can
>> change while they are offline. (presumably due to RAS or a firmware update)
> So, it sounds like there is a requirement, but we don't document it,
> and if the requirement is not met then the user is presented with an
> obscure failure in the MPAM driver.  This seems a bit unhelpful?
Not a failure. MPAM isn't yet available. Bring the rest of the system up and it will
spring into life. The same goes for any device you keep turned off.
> I'm not saying booting.rst is the right place for this -- maybe the
> appropriate document doesn't exist yet.
> 
> I wonder whether the required property is reasonable and general enough
> that it should be treated as a kernel boot requirement.
It's not a boot requirement. Linux boots just fine.
> Or, we require caches to be symmetric for non-early CPUs and reject
> those that don't match when they try to come online (similarly to
> the way cpufeatures deals with mismatches).
MPAM isn't an interesting enough feature to reject CPUs!
We can't check the MSC properties until we can schedule (thank the 'hide it in
firmware' folk for that one), meaning the CPU has to be online before we realise the
MPAM properties are mismatched.
The best approach here is to wait until everything has been seen before declaring that
we know what is going on. The architecture even calls this out as something that needs
doing, meaning the firmware tables have to describe all possible MSC.
I agree it's not nice - but it is what MPAM is.
I think the people clever enough to manage late online-ing secondaries will work this out.
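To make the "wait until everything has been seen" idea concrete, here is a toy
sketch in C. It is not the driver's code: the toy_* names and the bitmask
representation (one bit per MSC described by the firmware tables) are made up
purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe state: one bit per MSC. */
struct toy_probe_state {
	uint32_t described;	/* every MSC the firmware tables describe */
	uint32_t probed;	/* MSCs reached via some online CPU so far */
};

/* Example platform with three MSCs, none probed yet. */
static struct toy_probe_state toy_example = { .described = 0x7, .probed = 0 };

/*
 * Record the MSCs that became reachable when a CPU came online.
 * Returns true only once every described MSC has been seen.
 */
static bool toy_cpu_online(struct toy_probe_state *s, uint32_t reachable)
{
	/* A CPU cannot reach an MSC that firmware never described. */
	assert((reachable & ~s->described) == 0);

	s->probed |= reachable;
	return s->probed == s->described;
}
```

In this model it doesn't matter which CPUs come online or in what order; the
result only becomes true once the union of what they can reach covers every
described MSC. Until then nothing has failed, the feature just isn't available.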
>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>> index 8f9b9508acba..660457644a5b 100644
>>>> --- a/drivers/acpi/pptt.c
>>>> +++ b/drivers/acpi/pptt.c
>>>> @@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
>>
>>>> +	for_each_possible_cpu(cpu) {
>>>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>>>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>>>> +		if (!cpu_node)
>>>> +			return -ENOENT;
>>>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>>>
>>> Is the initial call to acpi_count_levels() really needed here?
>>>
>>> It feels a bit like we end up enumerating the whole topology two or
>>> three times here; once to count how many levels there are, and then
>>> again to examine the nodes, and once more inside acpi_find_cache_node().
>>>
>>> Why can't we just walk until we run out of levels?
>>
>> This is looking for a unified cache - and we don't know where those start.
>> We could walk the first 100 caches, and stop once we start getting unified caches, then
>> they stop again ... but this seemed simpler.
> 
> I'm still a bit confused.
> 
> We start at level one, and then trace parents until we hit a unified
> cache or run out of levels.
> 
> Why do we need to know a priori how many levels there are, when the
> way to determine that is part of the same procedure we're already doing
> (i.e., start at level one and trace parents until we run out of levels)?
| level = 1;
| acpi_find_cache_node(table, acpi_cpu_id, ACPI_PPTT_CACHE_TYPE_UNIFIED,
|			level, &cpu_node);
Fails. Do we stop the loop, or try level=2?
level = 1 fails because the L1 (probably) isn't a unified cache. Is L2? We don't know.
It's simpler to know how many levels there are, and walk the lot, than it is to try and
work out where the unified bit of the hierarchy starts - and start walking from there.
Yes the PPTT parsing could be different, but this is the kind of shape it has. Doing it
like this is in keeping with the rest of it.
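A toy model of the two loop shapes, in C, in case it helps. This is only a
sketch: the toy_* types and helpers are invented for illustration and are much
simpler than the real PPTT walk. toy_find_unified() stands in for a lookup that
returns NULL both when the level doesn't exist and when the cache at that level
isn't unified.

```c
#include <assert.h>
#include <stddef.h>

enum toy_cache_type { TOY_DATA, TOY_INSTRUCTION, TOY_UNIFIED };

struct toy_cache {
	enum toy_cache_type type;
	unsigned int id;
};

/* Hypothetical per-CPU view of the hierarchy: levels[0] is L1. */
struct toy_cpu {
	int num_levels;
	struct toy_cache levels[4];
};

/* Example CPU: split L1 (data side shown), unified L2 and L3. */
static const struct toy_cpu toy_example_cpu = {
	.num_levels = 3,
	.levels = { { TOY_DATA, 1 }, { TOY_UNIFIED, 7 }, { TOY_UNIFIED, 9 } },
};

/* NULL if the level is out of range or the cache there is not unified. */
static const struct toy_cache *toy_find_unified(const struct toy_cpu *cpu,
						int level)
{
	if (level < 1 || level > cpu->num_levels)
		return NULL;
	if (cpu->levels[level - 1].type != TOY_UNIFIED)
		return NULL;
	return &cpu->levels[level - 1];
}

/* Know the depth up front, walk every level, skip the misses. */
static int toy_level_from_id(const struct toy_cpu *cpu, unsigned int id)
{
	int level;

	for (level = 1; level <= cpu->num_levels; level++) {
		const struct toy_cache *cache = toy_find_unified(cpu, level);

		if (!cache)
			continue;	/* not unified here, a higher level may be */
		if (cache->id == id)
			return level;
	}
	return -1;
}

/* Walk until the first failed lookup: gives up at a non-unified L1. */
static int toy_level_stop_on_miss(const struct toy_cpu *cpu, unsigned int id)
{
	int level;

	for (level = 1; ; level++) {
		const struct toy_cache *cache = toy_find_unified(cpu, level);

		if (!cache)
			return -1;
		if (cache->id == id)
			return level;
	}
}
```

With the example CPU, toy_level_from_id() finds id 9 at level 3, while
toy_level_stop_on_miss() returns -1 because the very first lookup fails and it
can't tell "not unified" from "ran out of levels".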
>>> I may be missing some details of how these functions interact -- if
>>> this is only run at probe time, compact, well-factored code is
>>> more important than making things as fast as possible.
> 
> (This still stands.)
This is all done at probe time, and never called again.
Nothing else in here caches values - it derives it all from the table every time so that
it's stateless.
James
- * [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (4 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-27 10:53   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
                   ` (61 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
MPAM identifies CPUs by the cache_id in the PPTT cache structure.
The driver needs to know which CPUs are associated with the cache,
the CPUs may not all be online, so cacheinfo does not have the
information.
Add a helper to pull this information out of the PPTT.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
---
Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  6 +++++
 2 files changed, 68 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 660457644a5b..cb93a9a7f9b6 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
 
 	return -ENOENT;
 }
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ *					   specified cache
+ * @cache_id: The id field of the unified cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later,
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+	cpumask_clear(cpus);
+
+	if (IS_ERR(table))
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	/*
+	 * If we found the cache first, we'd still need to walk from each cpu.
+	 */
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			return 0;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				cpumask_set_cpu(cpu, cpus);
+		}
+	}
+
+	return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30c10b1dcdb2..4ad08f5f1d83 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1555,6 +1555,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1582,6 +1583,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
 {
 	return -EINVAL;
 }
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+						      cpumask_t *cpus)
+{
+	return -EINVAL;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1
- * Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-08-27 10:53   ` Dave Martin
  2025-08-28 15:58     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 10:53 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:47PM +0000, James Morse wrote:
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache,
> the CPUs may not all be online, so cacheinfo does not have the
> information.
Nit: cacheinfo lacking the information is not a consequence of the
driver needing it.
Maybe split the sentence:
-> "[...] associated with the cache. The CPUs may not [...]"
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  6 +++++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 660457644a5b..cb93a9a7f9b6 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>  
>  	return -ENOENT;
>  }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the unified cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> +	cpumask_clear(cpus);
> +
> +	if (IS_ERR(table))
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	/*
> +	 * If we found the cache first, we'd still need to walk from each cpu.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return 0;
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				cpumask_set_cpu(cpu, cpus);
Again, it feels like we are repeating the same walk multiple times to
determine how deep the table is (on which point the table is self-
describing anyway), and then again to derive some static property, and
then we are doing all of that work multiple times to derive
different static properties, etc.
Can we not just walk over the tables once and stash the derived
properties somewhere?
I'm still getting my head around this parsing code, so I'm not saying
that the approach is incorrect here -- just wondering whether there is
a way to make it simpler.
[...]
Cheers
---Dave
- * Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-08-27 10:53   ` Dave Martin
@ 2025-08-28 15:58     ` James Morse
  2025-09-09 10:14       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-28 15:58 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 27/08/2025 11:53, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:47PM +0000, James Morse wrote:
>> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>>
>> The driver needs to know which CPUs are associated with the cache,
>> the CPUs may not all be online, so cacheinfo does not have the
>> information.
> 
> Nit: cacheinfo lacking the information is not a consequence of the
> driver needing it.
> 
> Maybe split the sentence:
> 
> -> "[...] associated with the cache. The CPUs may not [...]"
Sure,
>> Add a helper to pull this information out of the PPTT.
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 660457644a5b..cb93a9a7f9b6 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>>  
>>  	return -ENOENT;
>>  }
>> +
>> +/**
>> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
>> + *					   specified cache
>> + * @cache_id: The id field of the unified cache
>> + * @cpus: Where to build the cpumask
>> + *
>> + * Determine which CPUs are below this cache in the PPTT. This allows the property
>> + * to be found even if the CPUs are offline.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
>> + */
>> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	if (IS_ERR(table))
>> +		return -ENOENT;
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * If we found the cache first, we'd still need to walk from each cpu.
>> +	 */
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return 0;
>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> +		/* Start at 1 for L1 */
>> +		for (level = 1; level <= num_levels; level++) {
>> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> +						     level, &cpu_node);
>> +			if (!cache)
>> +				continue;
>> +
>> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> +						cache,
>> +						sizeof(struct acpi_pptt_cache));
>> +
>> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> +			    cache_v1->cache_id == cache_id)
>> +				cpumask_set_cpu(cpu, cpus);
> Again, it feels like we are repeating the same walk multiple times to
> determine how deep the table is (on which point the table is self-
> describing anyway), and then again to derive some static property, and
> then we are then doing all of that work multiple times to derive
> different static properties, etc.
> 
> Can we not just walk over the tables once and stash the derived
> properties somewhere?
That is possible - but it's a more invasive change to the PPTT parsing code.
Before the introduction of the leaf flag, the search for a processor also included a
search to check if the discovered node was a leaf.
I think this is trading time - walking over the table multiple times, against the memory
you'd need to de-serialise the tree to find the necessary properties quickly. I think the
reason Jeremy L went this way was because there may never be another request into this
code, so being ready with a quick answer was a waste of memory.
MPAM doesn't change this - all these things are done up front during driver probing, and
the values are cached by the driver.
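Roughly, the split looks like the toy C sketch below. None of this is the real
code: the toy_* names, the fixed table and the walk counter are assumptions
made up to show a stateless lookup on one side and a caller that keeps the
answer on the other. In this toy, an id of 0 means "no unified cache with a
valid id at this level".

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_CPUS	4
#define TOY_MAX_LEVELS	3

/* Toy topology: toy_cache_id[cpu][level - 1], 0 = nothing to match here. */
static const uint32_t toy_cache_id[TOY_NR_CPUS][TOY_MAX_LEVELS] = {
	{ 0, 10, 30 },
	{ 0, 10, 30 },
	{ 0, 11, 30 },
	{ 0, 11, 30 },
};

/* Counts how many times the whole table has been walked. */
static unsigned int toy_walks;

/* Stateless lookup: walks everything on each call, keeps nothing. */
static uint32_t toy_cpumask_from_cache_id(uint32_t cache_id)
{
	uint32_t mask = 0;
	int cpu, level;

	assert(cache_id != 0);	/* 0 is the toy's "no valid id" marker */

	toy_walks++;
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		for (level = 0; level < TOY_MAX_LEVELS; level++)
			if (toy_cache_id[cpu][level] == cache_id)
				mask |= UINT32_C(1) << cpu;

	return mask;
}

/* Hypothetical caller-side state: the result is kept after the first probe. */
struct toy_msc {
	uint32_t cache_id;
	uint32_t cpus;
	int probed;
};

static struct toy_msc toy_example_msc = { .cache_id = 11 };

static uint32_t toy_msc_cpus(struct toy_msc *msc)
{
	if (!msc->probed) {
		msc->cpus = toy_cpumask_from_cache_id(msc->cache_id);
		msc->probed = 1;
	}
	return msc->cpus;
}
```

Asking toy_msc_cpus() twice for the same MSC only walks the table once; the
memory cost of remembering the answer sits with the caller that actually needs
it, not with the parsing code.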
> I'm still getting my head around this parsing code, so I'm not saying
> that the approach is incorrect here -- just wondering whether there is
> a way to make it simpler.
It's walked at boot, and on cpu-hotplug. Neither is particularly performance critical.
I agree that as platforms get bigger, there will be a tipping point ... I don't think
anyone has complained yet!
Thanks,
James
- * Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-08-28 15:58     ` James Morse
@ 2025-09-09 10:14       ` Dave Martin
  2025-09-10 19:29         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-09 10:14 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi,
On Thu, Aug 28, 2025 at 04:58:16PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 11:53, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:47PM +0000, James Morse wrote:
> >> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> >>
> >> The driver needs to know which CPUs are associated with the cache,
> >> the CPUs may not all be online, so cacheinfo does not have the
> >> information.
> > 
> > Nit: cacheinfo lacking the information is not a consequence of the
> > driver needing it.
> > 
> > Maybe split the sentence:
> > 
> > -> "[...] associated with the cache. The CPUs may not [...]"
> 
> Sure,
OK
> >> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> >> index 660457644a5b..cb93a9a7f9b6 100644
> >> --- a/drivers/acpi/pptt.c
> >> +++ b/drivers/acpi/pptt.c
> >> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
[...]
> >> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> >> + *					   specified cache
> >> + * @cache_id: The id field of the unified cache
> >> + * @cpus: Where to build the cpumask
> >> + *
> >> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> >> + * to be found even if the CPUs are offline.
> >> + *
> >> + * The PPTT table must be rev 3 or later,
> >> + *
> >> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> >> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> >> + */
> >> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> >> +{
[...]
> >> +	/*
> >> +	 * If we found the cache first, we'd still need to walk from each cpu.
> >> +	 */
> >> +	for_each_possible_cpu(cpu) {
[...]
> > Again, it feels like we are repeating the same walk multiple times to
> > determine how deep the table is (on which point the table is self-
> > describing anyway), and then again to derive some static property, and
> > then we are then doing all of that work multiple times to derive
> > different static properties, etc.
> > 
> > Can we not just walk over the tables once and stash the derived
> > properties somewhere?
> 
> That is possible - but it's a more invasive change to the PPTT parsing code.
> Before the introduction of the leaf flag, the search for a processor also included a
> search to check if the discovered node was a leaf.
> 
> I think this is trading time - walking over the table multiple times, against the memory
> you'd need to de-serialise the tree to find the necessary properties quickly. I think the
> reason Jeremy L went this way was because there may never be another request into this
> code, so being ready with a quick answer was a waste of memory.
> 
> MPAM doesn't change this - all these things are done up front during driver probing, and
> the values are cached by the driver.
I guess that's true.
> > I'm still getting my head around this parsing code, so I'm not saying
> > that the approach is incorrect here -- just wondering whether there is
> > a way to make it simpler.
> 
> It's walked at boot, and on cpu-hotplug. Neither are particularly performance critical.
Do we do this only for unknown late secondaries (e.g., that haven't
previously come online?)  I haven't gone to track this down but, if not,
this cuts across the assertion that "there may never be another request
into this code".
CPU hotplug is slow in practice, but gratuitous cost on this path should
still be avoided where feasible.
> I agree that as platforms get bigger, there will be a tipping point ... I don't think
> anyone has complained yet!
Ack -- when in ACPI, do as the ACPI folks do, I guess.
Cheers
---Dave
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-09-09 10:14       ` Dave Martin
@ 2025-09-10 19:29         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:29 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 09/09/2025 11:14, Dave Martin wrote:
> On Thu, Aug 28, 2025 at 04:58:16PM +0100, James Morse wrote:
>> On 27/08/2025 11:53, Dave Martin wrote:
>>> On Fri, Aug 22, 2025 at 03:29:47PM +0000, James Morse wrote:
>>>> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>>>>
>>>> The driver needs to know which CPUs are associated with the cache,
>>>> the CPUs may not all be online, so cacheinfo does not have the
>>>> information.
>>>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>>>> index 660457644a5b..cb93a9a7f9b6 100644
>>>> --- a/drivers/acpi/pptt.c
>>>> +++ b/drivers/acpi/pptt.c
>>>> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
> 
> [...]
> 
>>>> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
>>>> + *					   specified cache
>>>> + * @cache_id: The id field of the unified cache
>>>> + * @cpus: Where to build the cpumask
>>>> + *
>>>> + * Determine which CPUs are below this cache in the PPTT. This allows the property
>>>> + * to be found even if the CPUs are offline.
>>>> + *
>>>> + * The PPTT table must be rev 3 or later,
>>>> + *
>>>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>>>> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
>>>> + */
>>>> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
>>>> +{
> 
> [...]
> 
>>>> +	/*
>>>> +	 * If we found the cache first, we'd still need to walk from each cpu.
>>>> +	 */
>>>> +	for_each_possible_cpu(cpu) {
> 
> [...]
> 
>>> Again, it feels like we are repeating the same walk multiple times to
>>> determine how deep the table is (on which point the table is self-
>>> describing anyway), and then again to derive some static property, and
>>> then we are doing all of that work multiple times to derive
>>> different static properties, etc.
>>>
>>> Can we not just walk over the tables once and stash the derived
>>> properties somewhere?
>>
>> That is possible - but it's a more invasive change to the PPTT parsing code.
>> Before the introduction of the leaf flag, the search for a processor also included a
>> search to check if the discovered node was a leaf.
>>
>> I think this is trading time - walking over the table multiple times, against the memory
>> you'd need to de-serialise the tree to find the necessary properties quickly. I think the
>> reason Jeremy L went this way was because there may never be another request into this
>> code, so being ready with a quick answer was a waste of memory.
>>
>> MPAM doesn't change this - all these things are done up front during driver probing, and
>> the values are cached by the driver.
> 
> I guess that's true.
> 
>>> I'm still getting my head around this parsing code, so I'm not saying
>>> that the approach is incorrect here -- just wondering whether there is
>>> a way to make it simpler.
>>
>> It's walked at boot, and on cpu-hotplug. Neither are particularly performance critical.
> Do we do this only for unknown late secondaries (e.g., that haven't
> previously come online?) 
No, each time a CPU comes online.
> I haven't gone to track this down but, if not,
> this cuts across the assertion that "there may never be another request
> into this code".
CPU hotplug is optional - you don't have to bounce CPUs. It's very common on mobile parts
for power saving. I think it's fairly unusual on server parts, once CPUs are online they
stay online.
The cacheinfo code doesn't cache this, it re-reads it every time. That turns out to be
because of PowerPC where some of these properties can be changed while a CPU is offline.
Sure, we could have a Kconfig thing to say ARCH_STATIC_TABLES_ARE_STATIC, but that would
be a different piece of work.
(I've had a couple of stabs at this, but cacheinfo is the shape it needs to be)
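
To make the time-versus-memory trade-off discussed above concrete, here is a
minimal userspace sketch. It is not the pptt.c code: it assumes a toy tree
with parent indices, a 32-bit mask standing in for cpumask_t, and a cache id
of 0 meaning "no cache at this node". Every toy_* name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_CPUS	4
#define TOY_NR_NODES	7
#define TOY_NO_PARENT	(-1)

/* One node of a toy topology tree: a container or a CPU leaf. */
struct toy_node {
	int parent;		/* index of the parent, or TOY_NO_PARENT */
	uint32_t cache_id;	/* id of the cache at this node, 0 if none */
};

/*
 * Assumed layout: node 0 is the package (cache id 10), nodes 1 and 2 are
 * clusters (cache ids 20 and 21), nodes 3..6 are CPUs 0..3.
 */
static const struct toy_node toy_nodes[TOY_NR_NODES] = {
	{ TOY_NO_PARENT, 10 },
	{ 0, 20 }, { 0, 21 },
	{ 1, 0 }, { 1, 0 }, { 2, 0 }, { 2, 0 },
};
static const int toy_cpu_node[TOY_NR_CPUS] = { 3, 4, 5, 6 };

/* Option 1: no extra memory, walk up from every CPU on every request. */
static int toy_cpumask_from_cache_id(uint32_t cache_id, uint32_t *mask)
{
	*mask = 0;
	if (!cache_id)
		return -ENOENT;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int n = toy_cpu_node[cpu];

		for (; n != TOY_NO_PARENT; n = toy_nodes[n].parent) {
			if (toy_nodes[n].cache_id == cache_id) {
				*mask |= 1u << cpu;
				break;
			}
		}
	}
	return *mask ? 0 : -ENOENT;
}

/* Option 2: one mask per node, filled by a single pass over the CPUs. */
static uint32_t toy_node_mask[TOY_NR_NODES];

static void toy_build_masks(void)
{
	memset(toy_node_mask, 0, sizeof(toy_node_mask));
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int n = toy_cpu_node[cpu];

		/* Each ancestor of this CPU gains the CPU's bit. */
		for (; n != TOY_NO_PARENT; n = toy_nodes[n].parent)
			toy_node_mask[n] |= 1u << cpu;
	}
}

/* Lookup after toy_build_masks(): a flat scan, no tree walk. */
static int toy_cpumask_lookup(uint32_t cache_id, uint32_t *mask)
{
	*mask = 0;
	for (int n = 0; n < TOY_NR_NODES; n++) {
		if (cache_id && toy_nodes[n].cache_id == cache_id) {
			*mask = toy_node_mask[n];
			return 0;
		}
	}
	return -ENOENT;
}
```

Option 1 costs a walk from each CPU per request but stores nothing. Option 2
pays for one mask per node up front, which only helps if the same questions
are asked again later.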
> CPU hotplug is slow in practice, but gratuitous cost on this path should
> still be avoided where feasible.
> 
>> I agree that as platforms get bigger, there will be a tipping point ... I don't think
>> anyone has complained yet!
> 
> Ack -- when in ACPI, do as the ACPI folks do, I guess.
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (5 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-27  8:53   ` Ben Horgan
  2025-08-27 11:01   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
                   ` (60 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it, as MPAM is only found on arm64
platforms, that is where the Kconfig option makes the most sense.
This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and registering the CPUs
properties with the MPAM driver.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/arm64/Kconfig | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..658e47fc0c5a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
 	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
 	  range of input addresses.
 
+config ARM64_MPAM
+	bool "Enable support for MPAM"
+	help
+	  Memory Partitioning and Monitoring is an optional extension
+	  that allows the CPUs to mark load and store transactions with
+	  labels for partition-id and performance-monitoring-group.
+	  System components, such as the caches, can use the partition-id
+	  to apply a performance policy. MPAM monitors can use the
+	  partition-id and performance-monitoring-group to measure the
+	  cache occupancy or data throughput.
+
+	  Use of this extension requires CPU support, support in the
+	  memory system components (MSC), and a description from firmware
+	  of where the MSC are in the address space.
+
+	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
-- 
2.20.1
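
The commit message above says the option will later gate the MPAM
context-switch code in the arch tree. One common way to do that is a real
hook when the symbol is set and an empty stub otherwise. The sketch below
illustrates that pattern only and is not taken from this series: the toy_*
names, the label variable and its bit layout are all assumptions, and
CONFIG_ARM64_MPAM is defined by hand so the file builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * In a kernel build Kconfig would provide this symbol. It is defined here
 * only so the sketch compiles standalone.
 */
#define CONFIG_ARM64_MPAM 1

/* Hypothetical per-task labels. */
struct toy_task {
	uint16_t partid;
	uint8_t pmg;
};

/* Stand-in for a per-CPU label register; the layout is invented. */
static uint64_t toy_cpu_label;

#ifdef CONFIG_ARM64_MPAM
/* With the option enabled, switching tasks installs the new labels. */
static inline void toy_mpam_switch_to(const struct toy_task *next)
{
	toy_cpu_label = ((uint64_t)next->pmg << 16) | next->partid;
}
#else
/* With the option disabled, the hook compiles away to nothing. */
static inline void toy_mpam_switch_to(const struct toy_task *next)
{
	(void)next;
}
#endif

/* The caller is identical in both configurations. */
static void toy_context_switch(const struct toy_task *next)
{
	toy_mpam_switch_to(next);
}
```

Keeping the stub next to the real hook means the caller needs no #ifdef of
its own.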
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-08-27  8:53   ` Ben Horgan
  2025-08-28 15:58     ` James Morse
  2025-08-27 11:01   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-27  8:53 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
> platforms, that is where the Kconfig option makes the most sense.
> 
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and registering the CPUs
> properties with the MPAM driver.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
>  arch/arm64/Kconfig | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a6..658e47fc0c5a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>  	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>  	  range of input addresses.
>  
> +config ARM64_MPAM
> +	bool "Enable support for MPAM"
> +	help
> +	  Memory Partitioning and Monitoring is an optional extension
> +	  that allows the CPUs to mark load and store transactions with
> +	  labels for partition-id and performance-monitoring-group.
> +	  System components, such as the caches, can use the partition-id
> +	  to apply a performance policy. MPAM monitors can use the
> +	  partition-id and performance-monitoring-group to measure the
> +	  cache occupancy or data throughput.
> +
> +	  Use of this extension requires CPU support, support in the
> +	  memory system components (MSC), and a description from firmware
> +	  of where the MSC are in the address space.
> +
> +	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
> +
>  endmenu # "ARMv8.4 architectural features"
Should this be moved to "ARMv8.2 architectural features" rather than the
8.4 menu? In the arm reference manual, version L.b, I see FEAT_MPAM
listed in the section A2.2.3.1 Features added to the Armv8.2 extension
in later releases.
>  
>  menu "ARMv8.5 architectural features"
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-27  8:53   ` Ben Horgan
@ 2025-08-28 15:58     ` James Morse
  2025-08-29  8:20       ` Ben Horgan
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-28 15:58 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 27/08/2025 09:53, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> The bulk of the MPAM driver lives outside the arch code because it
>> largely manages MMIO devices that generate interrupts. The driver
>> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
>> platforms, that is where the Kconfig option makes the most sense.
>>
>> This Kconfig option will later be used by the arch code to enable
>> or disable the MPAM context-switch code, and registering the CPUs
>> properties with the MPAM driver.
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index e9bbfacc35a6..658e47fc0c5a 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>>  	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>>  	  range of input addresses.
>>  
>> +config ARM64_MPAM
>> +	bool "Enable support for MPAM"
>> +	help
>> +	  Memory Partitioning and Monitoring is an optional extension
>> +	  that allows the CPUs to mark load and store transactions with
>> +	  labels for partition-id and performance-monitoring-group.
>> +	  System components, such as the caches, can use the partition-id
>> +	  to apply a performance policy. MPAM monitors can use the
>> +	  partition-id and performance-monitoring-group to measure the
>> +	  cache occupancy or data throughput.
>> +
>> +	  Use of this extension requires CPU support, support in the
>> +	  memory system components (MSC), and a description from firmware
>> +	  of where the MSC are in the address space.
>> +
>> +	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
>> +
>>  endmenu # "ARMv8.4 architectural features"
> Should this be moved to "ARMv8.2 architectural features" rather than the
> 8.4 menu? In the arm reference manual, version L.b, I see FEAT_MPAM
> listed in the section A2.2.3.1 Features added to the Armv8.2 extension
> in later releases.
Hmmm, I don't think we've done that anywhere else. I'm only aware of one v8.2 platform
that had it, and those are not widely available. As it was a headline v8.4 feature I'd
prefer to keep it there.
I think it's more confusing to put it under v8.2!
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-28 15:58     ` James Morse
@ 2025-08-29  8:20       ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-08-29  8:20 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/28/25 16:58, James Morse wrote:
> Hi Ben,
> 
> On 27/08/2025 09:53, Ben Horgan wrote:
>> On 8/22/25 16:29, James Morse wrote:
>>> The bulk of the MPAM driver lives outside the arch code because it
>>> largely manages MMIO devices that generate interrupts. The driver
>>> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
>>> platforms, that is where the Kconfig option makes the most sense.
>>>
>>> This Kconfig option will later be used by the arch code to enable
>>> or disable the MPAM context-switch code, and registering the CPUs
>>> properties with the MPAM driver.
> 
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index e9bbfacc35a6..658e47fc0c5a 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>>>  	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>>>  	  range of input addresses.
>>>  
>>> +config ARM64_MPAM
>>> +	bool "Enable support for MPAM"
>>> +	help
>>> +	  Memory Partitioning and Monitoring is an optional extension
>>> +	  that allows the CPUs to mark load and store transactions with
>>> +	  labels for partition-id and performance-monitoring-group.
>>> +	  System components, such as the caches, can use the partition-id
>>> +	  to apply a performance policy. MPAM monitors can use the
>>> +	  partition-id and performance-monitoring-group to measure the
>>> +	  cache occupancy or data throughput.
>>> +
>>> +	  Use of this extension requires CPU support, support in the
>>> +	  memory system components (MSC), and a description from firmware
>>> +	  of where the MSC are in the address space.
>>> +
>>> +	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
>>> +
>>>  endmenu # "ARMv8.4 architectural features"
> 
>> Should this be moved to "ARMv8.2 architectural features" rather than the
>> 8.4 menu? In the arm reference manual, version L.b, I see FEAT_MPAM
>> listed in the section A2.2.3.1 Features added to the Armv8.2 extension
>> in later releases.
> 
> Hmmm, I don't think we've done that anywhere else. I'm only aware of one v8.2 platform
> that had it, and those are not widely available. As it was a headline v8.4 feature I'd
> prefer to keep it there.
> 
> I think it's more confusing to put it under v8.2!
Ok, always best to minimise confusion. Keep it in v8.4.
>
> Thanks,
> 
> James
-- 
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
  2025-08-27  8:53   ` Ben Horgan
@ 2025-08-27 11:01   ` Dave Martin
  2025-09-04 17:28     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 11:01 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
<super-pedantic mode enabled>
(Since this will likely be people's go-to patch for understanding what MPAM
is, it is probably worth going the extra mile.)
On Fri, Aug 22, 2025 at 03:29:48PM +0000, James Morse wrote:
> The bulk of the MPAM driver lives outside the arch code because it
> largely manages MMIO devices that generate interrupts. The driver
> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
Prefer -> "[...] to enable it. As MPAM is only [...]"
> platforms, that is where the Kconfig option makes the most sense.
It could be clearer what "where" refers to, here.
Maybe reword from ", that is [...]" -> ", the arm64 tree is the most
natural home for the Kconfig option."
(Or something like that.)
> This Kconfig option will later be used by the arch code to enable
> or disable the MPAM context-switch code, and registering the CPUs
Nit: "registering" -> "to register"
> properties with the MPAM driver.
Nit: "CPUs properties" -> "properties of CPUs" ?
(Maybe there was just a missed apostrophe, but it may be more readable
here if written out longhand.)
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
>  arch/arm64/Kconfig | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e9bbfacc35a6..658e47fc0c5a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>  	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>  	  range of input addresses.
>  
> +config ARM64_MPAM
> +	bool "Enable support for MPAM"
> +	help
<pedantic mode on>
> +	  Memory Partitioning and Monitoring is an optional extension
> +	  that allows the CPUs to mark load and store transactions with
Nit: "memory transactions" ?
(I'm wondering whether there are some transactions such as atomic
exchanges that are not neatly characterised as "load" or "store".
Possibly MPAM labels some transactions that really are neither.)
> +	  labels for partition-id and performance-monitoring-group.
Nit: the hyphenation suggests that these are known terms (in this
specific, hyphenated, form) with specific definitions somewhere.
I don't think that this is the case?  At least, I have not seen the
terms presented in this way anywhere else.
Also, the partition ID is itself a label, so "label for partition-id"
is a tautology.
How about:
--8<--
	  Memory System Resource Partitioning and Monitoring (MPAM) is an
	  optional extension to the Arm architecture that allows each
	  transaction issued to the memory system to be labelled with a
	  Partition identifier (PARTID) and Performance Monitoring Group
	  identifier (PMG).
-->8--
(Yes, that really seems to be what MPAM stands for in the published
specs.  That's quite a mouthful, and news to me...  I can't say I paid
much attention to the document titles beyond "MPAM"!)
> +	  System components, such as the caches, can use the partition-id
> +	  to apply a performance policy. MPAM monitors can use the
What is a "performance policy"?
The MPAM specs talk about resource controls; it's probably best to
stick to the same terminology.
> +	  partition-id and performance-monitoring-group to measure the
> +	  cache occupancy or data throughput.
So, how about something like:
--8<--
	  Memory system components, such as the caches, can be configured with
	  policies to control how much of various physical resources (such as
	  memory bandwidth or cache memory) the transactions labelled with each
	  PARTID can consume.  Depending on the capabilities of the hardware,
	  the PARTID and PMG can also be used as filtering criteria to measure
	  the memory system resource consumption of different parts of a
	  workload.
-->8--
(Where "Memory system components" is used in a generic sense and so not
capitalised.)
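
To illustrate the "filtering criteria" wording suggested above, here is a
toy model in C. It is only a sketch: it assumes a transaction is a
(PARTID, PMG, size) tuple and that a monitor either matches on PARTID alone
or on PARTID and PMG together. The toy_* names and field widths are
hypothetical and do not reflect any register layout from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Label carried by each transaction in this toy model. */
struct toy_label {
	uint16_t partid;
	uint8_t pmg;
};

struct toy_txn {
	struct toy_label label;
	uint32_t bytes;
};

/* A monitor counts traffic for one PARTID, optionally narrowed by PMG. */
struct toy_monitor {
	uint16_t partid;
	uint8_t pmg;
	bool match_pmg;
	uint64_t bytes_seen;
};

/* Count the transaction only if its labels pass the monitor's filter. */
static void toy_monitor_observe(struct toy_monitor *mon,
				const struct toy_txn *txn)
{
	if (txn->label.partid != mon->partid)
		return;
	if (mon->match_pmg && txn->label.pmg != mon->pmg)
		return;
	mon->bytes_seen += txn->bytes;
}

/* Reset the counter, feed a batch of transactions, return the total. */
static uint64_t toy_monitor_run(struct toy_monitor *mon,
				const struct toy_txn *txns, size_t n)
{
	mon->bytes_seen = 0;
	for (size_t i = 0; i < n; i++)
		toy_monitor_observe(mon, &txns[i]);
	return mon->bytes_seen;
}
```

A monitor with match_pmg clear sees everything issued under a PARTID. Setting
match_pmg narrows that to one monitoring group within the partition, which
is how "different parts of a workload" can be measured separately.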
> +
> +	  Use of this extension requires CPU support, support in the
> +	  memory system components (MSC), and a description from firmware
But here, we are explicitly using an architectural term now, so
	"Memory System Components" (MSC)
makes sense.
> +	  of where the MSC are in the address space.
Prefer "MSCs" ?  (Not everyone agrees about whether TLAs are
pluralisable but it is easier on the reader if "are" has an obviously
plural noun to bind to.)
> +
> +	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
> +
>  endmenu # "ARMv8.4 architectural features"
>  
>  menu "ARMv8.5 architectural features"
Cheers
---Dave
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-27 11:01   ` Dave Martin
@ 2025-09-04 17:28     ` James Morse
  2025-09-09 10:26       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-04 17:28 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 27/08/2025 12:01, Dave Martin wrote:
> <super-pedantic mode enabled>
Uh oh!
> (Since this will likely be people's go-to patch for understanding what MPAM
> is, it is probably worth going the extra mile.)
> 
> On Fri, Aug 22, 2025 at 03:29:48PM +0000, James Morse wrote:
>> The bulk of the MPAM driver lives outside the arch code because it
>> largely manages MMIO devices that generate interrupts. The driver
>> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
> 
> Prefer -> "[...] to enable it. As MPAM is only [...]"
> 
>> platforms, that is where the Kconfig option makes the most sense.
> 
> It could be clearer what "where" refers to, here.
Sure,
> Maybe reword from ", that is [...]" -> ", the arm64 tree is the most
> natural home for the Kconfig option."
> 
> (Or something like that.)
Sure,
>> This Kconfig option will later be used by the arch code to enable
>> or disable the MPAM context-switch code, and registering the CPUs
> 
> Nit: "registering" -> "to register"
> 
>> properties with the MPAM driver.
> 
> Nit: "CPUs properties" -> "properties of CPUs" ?
> 
> (Maybe there was just a missed apostrophe, but it may be more readable
> here if written out longhand.)
Done, it just takes one person to think it's clearer!
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index e9bbfacc35a6..658e47fc0c5a 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
>>  	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
>>  	  range of input addresses.
>>  
>> +config ARM64_MPAM
>> +	bool "Enable support for MPAM"
>> +	help
> 
> <pedantic mode on>
> 
>> +	  Memory Partitioning and Monitoring is an optional extension
>> +	  that allows the CPUs to mark load and store transactions with
> 
> Nit: "memory transactions" ?
Sure,
> (I'm wondering whether there are some transactions such as atomic
> exchanges that are not neatly characterised as "load" or "store".
> Possibly MPAM labels some transactions that really are neither.)
Equally instruction fetch and possibly even CMOs get these labels.
I wanted something other than 'transactions' so it wasn't confused with
transactional memory - and traffic seemed too vague.
I don't think anyone expects a formal definition in the Kconfig text...
>> +	  labels for partition-id and performance-monitoring-group.
> 
> Nit: the hyphenation suggests that these are known terms (in this
> specific, hyphenated, form) with specific definitions somewhere.
> I don't think that this is the case?  At least, I have not seen the
> terms presented in this way anywhere else.
> 
> Also, the partition ID is itself a label, so "label for partition-id"
> is a tautology.
> 
> How about:
> 
> --8<--
> 
> 	  Memory System Resource Partitioning and Monitoring (MPAM) is an
> 	  optional extension to the Arm architecture that allows each
> 	  transaction issued to the memory system to be labelled with a
> 	  Partition identifier (PARTID) and Performance Monitoring Group
> 	  identifier (PMG).
> 
> -->8--
Done,
> (Yes, that really seems to be what MPAM stands for in the published
> specs.  That's quite a mouthful, and news to me...  I can't say I paid
> much attention to the document titles beyond "MPAM"!)
> 
>> +	  System components, such as the caches, can use the partition-id
>> +	  to apply a performance policy. MPAM monitors can use the
> 
> What is a "performance policy"?
A bunch of controls, the value of which reflect some kind of policy.
> The MPAM specs talk about resource controls; it's probably best to
> stick to the same terminology.
> 
>> +	  partition-id and performance-monitoring-group to measure the
>> +	  cache occupancy or data throughput.
> 
> So, how about something like:
> 
> --8<--
> 
> 	  Memory system components, such as the caches, can be configured with
> 	  policies to control how much of various physical resources (such as
> 	  memory bandwidth or cache memory) the transactions labelled with each
> 	  PARTID can consume.  Depending on the capabilities of the hardware,
> 	  the PARTID and PMG can also be used as filtering criteria to measure
> 	  the memory system resource consumption of different parts of a
> 	  workload.
> 
> -->8--
Done,
> (Where "Memory system components" is used in a generic sense and so not
> capitalised.)
(I can't wait for the Memory System Component on the Memory Side Cache!)
>> +
>> +	  Use of this extension requires CPU support, support in the
>> +	  memory system components (MSC), and a description from firmware
> 
> But here, we are explicitly using an architectural term now, so
> 
> 	"Memory System Components" (MSC)
> 
> makes sense.
> 
>> +	  of where the MSC are in the address space.
> 
> Prefer "MSCs" ?  (Not everyone agrees about whether TLAs are
> pluralisable but it is easier on the reader if "are" has an obviously
> plural noun to bind to.)
Sure,
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-09-04 17:28     ` James Morse
@ 2025-09-09 10:26       ` Dave Martin
  0 siblings, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-09-09 10:26 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Thu, Sep 04, 2025 at 06:28:14PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 12:01, Dave Martin wrote:
> > <super-pedantic mode enabled>
> 
> Uh oh!
> 
> > (Since this will likely be people's go-to patch for understanding what MPAM
> > is, it is probably worth going the extra mile.)
> > 
> > On Fri, Aug 22, 2025 at 03:29:48PM +0000, James Morse wrote:
> >> The bulk of the MPAM driver lives outside the arch code because it
> >> largely manages MMIO devices that generate interrupts. The driver
> >> needs a Kconfig symbol to enable it, as MPAM is only found on arm64
> > 
> > Prefer -> "[...] to enable it. As MPAM is only [...]"
> > 
> >> platforms, that is where the Kconfig option makes the most sense.
> > 
> > It could be clearer what "where" refers to, here.
> 
> Sure,
> 
> 
> > Maybe reword from ", that is [...]" -> ", the arm64 tree is the most
> > natural home for the Kconfig option."
> > 
> > (Or something like that.)
> 
> Sure,
[... etc., etc. ...]
> >> +	  partition-id and performance-monitoring-group to measure the
> >> +	  cache occupancy or data throughput.
> > 
> > So, how about something like:
> > 
> > --8<--
> > 
> > 	  Memory system components, such as the caches, can be configured with
> > 	  policies to control how much of various physical resources (such as
> > 	  memory bandwidth or cache memory) the transactions labelled with each
> > 	  PARTID can consume.  Depending on the capabilities of the hardware,
> > 	  the PARTID and PMG can also be used as filtering criteria to measure
> > 	  the memory system resource consumption of different parts of a
> > 	  workload.
> > 
> > -->8--
> 
> Done,
> 
> 
> > (Where "Memory system components" is used in a generic sense and so not
> > capitalised.)
> 
> (I can't wait for the Memory System Component on the Memory Side Cache!)
Urk.
MSC² ?
[...]
Cheers
---Dave
- * [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (6 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-23 10:55   ` Markus Elfring
  2025-08-27 16:05   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
                   ` (59 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.
CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Used DEFINE_RES_IRQ_NAMED() and friends macros.
 * Additional error handling.
 * Check for zero sized MSC.
 * Allow table revisions greater than 1. (no spec for revision 0!)
 * Use cleanup helpers to retrieve ACPI tables, which allows some functions
   to be folded together.
---
 arch/arm64/Kconfig          |   1 +
 drivers/acpi/arm64/Kconfig  |   3 +
 drivers/acpi/arm64/Makefile |   1 +
 drivers/acpi/arm64/mpam.c   | 331 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c       |   2 +-
 include/linux/arm_mpam.h    |  46 +++++
 6 files changed, 383 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 include/linux/arm_mpam.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 658e47fc0c5a..e51ccf1da102 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ACPI_MPAM if ACPI
 	help
 	  Memory Partitioning and Monitoring is an optional extension
 	  that allows the CPUs to mark load and store transactions with
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
 
 config ACPI_APMT
 	bool
+
+config ACPI_MPAM
+	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
 obj-$(CONFIG_ACPI_FFH)		+= ffh.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
+obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
 obj-$(CONFIG_ARM_AMBA)		+= amba.o
 obj-y				+= dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..e55fc2729ac5
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
+
+static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
+		     int *irq, u32 processor_container_uid)
+{
+	int sense;
+
+	if (!intid)
+		return false;
+
+	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
+	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+		return false;
+
+	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
+
+	/*
+	 * If the GSI is in the GIC's PPI range, try and create a partitioned
+	 * percpu interrupt.
+	 */
+	if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
+		pr_err_once("Partitioned interrupts not supported\n");
+		return false;
+	}
+
+	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
+	if (*irq <= 0) {
+		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
+			    intid);
+		return false;
+	}
+
+	return true;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+				 struct acpi_mpam_msc_node *tbl_msc,
+				 struct resource *res, int *res_idx)
+{
+	u32 flags, aff;
+	int irq;
+
+	flags = tbl_msc->overflow_interrupt_flags;
+	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+		aff = tbl_msc->overflow_interrupt_affinity;
+	else
+		aff = ~0;
+	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+	flags = tbl_msc->error_interrupt_flags;
+	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+		aff = tbl_msc->error_interrupt_affinity;
+	else
+		aff = ~0;
+	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+				    struct acpi_mpam_resource_node *res)
+{
+	int level, nid;
+	u32 cache_id;
+
+	switch (res->locator_type) {
+	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+		cache_id = res->locator.cache_locator.cache_reference;
+		level = find_acpi_cache_level_from_id(cache_id);
+		if (level <= 0) {
+			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
+			return -EINVAL;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+				       level, cache_id);
+	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+		if (nid == NUMA_NO_NODE)
+			nid = 0;
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+				       255, nid);
+	default:
+		/* These get discovered later and treated as unknown */
+		return 0;
+	}
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc)
+{
+	int i, err;
+	struct acpi_mpam_resource_node *resources;
+
+	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
+	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+		err = acpi_mpam_parse_resource(msc, &resources[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+				     struct platform_device *pdev,
+				     u32 *acpi_id)
+{
+	bool acpi_id_valid = false;
+	struct acpi_device *buddy;
+	char hid[16], uid[16];
+	int err;
+
+	memset(&hid, 0, sizeof(hid));
+	memcpy(hid, &tbl_msc->hardware_id_linked_device,
+	       sizeof(tbl_msc->hardware_id_linked_device));
+
+	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+		*acpi_id = tbl_msc->instance_id_linked_device;
+		acpi_id_valid = true;
+	}
+
+	err = snprintf(uid, sizeof(uid), "%u",
+		       tbl_msc->instance_id_linked_device);
+	if (err >= sizeof(uid))
+		return acpi_id_valid;
+
+	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+	if (buddy)
+		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+	return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+				 enum mpam_msc_iface *iface)
+{
+	switch (tbl_msc->interface_type) {
+	case 0:
+		*iface = MPAM_IFACE_MMIO;
+		return 0;
+	case 0xa:
+		*iface = MPAM_IFACE_PCC;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int __init acpi_mpam_parse(void)
+{
+        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct property_entry props[4]; /* needs a sentinel */
+	struct acpi_mpam_msc_node *tbl_msc;
+	int next_res, next_prop, err = 0;
+	struct acpi_device *companion;
+	struct platform_device *pdev;
+	enum mpam_msc_iface iface;
+	struct resource res[3];
+	char uid[16];
+	u32 acpi_id;
+
+	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
+		return 0;
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		table_offset += tbl_msc->length;
+
+		/*
+		 * If any of the reserved fields are set, make no attempt to
+		 * parse the msc structure. This will prevent the driver from
+		 * probing all the MSC, meaning it can't discover the system
+		 * wide supported partid and pmg ranges. This avoids whatever
+		 * this MSC is truncating the partids and creating a screaming
+		 * error interrupt.
+		 */
+		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
+			continue;
+
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		if (decode_interface_type(tbl_msc, &iface))
+			continue;
+
+		next_res = 0;
+		next_prop = 0;
+		memset(res, 0, sizeof(res));
+		memset(props, 0, sizeof(props));
+
+		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
+		if (!pdev) {
+			err = -ENOMEM;
+			break;
+		}
+
+		if (tbl_msc->length < sizeof(*tbl_msc)) {
+			err = -EINVAL;
+			break;
+		}
+
+		/* Some power management is described in the namespace: */
+		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+		if (err > 0 && err < sizeof(uid)) {
+			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+			if (companion)
+				ACPI_COMPANION_SET(&pdev->dev, companion);
+		}
+
+		if (iface == MPAM_IFACE_MMIO) {
+			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+							       tbl_msc->mmio_size,
+							       "MPAM:MSC");
+		} else if (iface == MPAM_IFACE_PCC) {
+			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+								tbl_msc->base_address);
+			next_prop++;
+		}
+
+		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+		err = platform_device_add_resources(pdev, res, next_res);
+		if (err)
+			break;
+
+		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+							tbl_msc->max_nrdy_usec);
+
+		/*
+		 * The MSC's CPU affinity is described via its linked power
+		 * management device, but only if it points at a Processor or
+		 * Processor Container.
+		 */
+		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
+			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
+								acpi_id);
+		}
+
+		err = device_create_managed_software_node(&pdev->dev, props,
+							  NULL);
+		if (err)
+			break;
+
+		/* Come back later if you want the RIS too */
+		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+		if (err)
+			break;
+
+		err = platform_device_add(pdev);
+		if (err)
+			break;
+	}
+
+	if (err)
+		platform_device_put(pdev);
+
+	return err;
+}
+
+int acpi_mpam_count_msc(void)
+{
+        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct acpi_mpam_msc_node *tbl_msc;
+	int count = 0;
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		if (tbl_msc->length < sizeof(*tbl_msc))
+			return -EINVAL;
+
+		count++;
+
+		table_offset += tbl_msc->length;
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+	}
+
+	return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index fa9bb8c8ce95..835e3795ede3 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
 	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
 	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
 	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
-	ACPI_SIG_NBFT };
+	ACPI_SIG_NBFT, ACPI_SIG_MPAM };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..0edefa6ba019
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
+	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
+	MPAM_CLASS_MEMORY,      /* Main memory */
+	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+/* Parse the ACPI description of resources entries for this MSC. */
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+					    struct acpi_mpam_msc_node *tbl_msc)
+{
+	return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
-- 
2.20.1
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-08-23 10:55   ` Markus Elfring
  2025-09-04 17:28     ` James Morse
  2025-08-27 16:05   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Markus Elfring @ 2025-08-23 10:55 UTC (permalink / raw)
  To: James Morse, linux-arm-kernel, linux-acpi, devicetree
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
	bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
	Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
	D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
	Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
	Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
	Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
	Will Deacon, Xin Hao
…
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,331 @@
…
> +static int __init acpi_mpam_parse(void)
> +{
> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
…
Please replace eight space characters by a tab character here.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v6.17-rc2#n18
Are further source code places similarly improvable?
Regards,
Markus
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-23 10:55   ` Markus Elfring
@ 2025-09-04 17:28     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-04 17:28 UTC (permalink / raw)
  To: Markus Elfring, linux-arm-kernel, linux-acpi, devicetree
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
	bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
	Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
	D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
	Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
	Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
	Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
	Will Deacon, Xin Hao
Hi Markus,
On 23/08/2025 11:55, Markus Elfring wrote:
> …
>> +++ b/drivers/acpi/arm64/mpam.c
>> @@ -0,0 +1,331 @@
> …
>> +static int __init acpi_mpam_parse(void)
>> +{
>> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +	char *table_end, *table_offset = (char *)(table + 1);
> …
> 
> Please replace eight space characters by a tab character here.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v6.17-rc2#n18
> 
> Are further source code places similarly improvable?
Yes - I just copied this rune from the email from Jonathan that suggested it.
'Fixed',
Thanks,
James
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
  2025-08-23 10:55   ` Markus Elfring
@ 2025-08-27 16:05   ` Dave Martin
  2025-09-04 17:28     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 16:05 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi,
[Note, looks like I crossed over with Rob here -- apologies for any
duplicate or conflicting comments.]
On Fri, Aug 22, 2025 at 03:29:49PM +0000, James Morse wrote:
> Add code to parse the arm64 specific MPAM table, looking up the cache
> level from the PPTT and feeding the end result into the MPAM driver.
Might be worth mentioning that the hook for feeding the parsed factoids
into the driver (mpam_ris_create()) is not implemented for now.
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since RFC:
>  * Used DEFINE_RES_IRQ_NAMED() and friends macros.
>  * Additional error handling.
>  * Check for zero sized MSC.
>  * Allow table revisions greater than 1. (no spec for revision 0!)
>  * Use cleanup helpers to retrieve ACPI tables, which allows some functions
>    to be folded together.
> ---
>  arch/arm64/Kconfig          |   1 +
>  drivers/acpi/arm64/Kconfig  |   3 +
>  drivers/acpi/arm64/Makefile |   1 +
>  drivers/acpi/arm64/mpam.c   | 331 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c       |   2 +-
>  include/linux/arm_mpam.h    |  46 +++++
>  6 files changed, 383 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/acpi/arm64/mpam.c
>  create mode 100644 include/linux/arm_mpam.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 658e47fc0c5a..e51ccf1da102 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>  	help
>  	  Memory Partitioning and Monitoring is an optional extension
>  	  that allows the CPUs to mark load and store transactions with
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
>  
>  config ACPI_APMT
>  	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>  obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>  obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>  obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>  obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>  obj-$(CONFIG_ARM_AMBA)		+= amba.o
>  obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..e55fc2729ac5
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,331 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
> +
> +static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
> +		     int *irq, u32 processor_container_uid)
Can this have a name, please?
> +{
> +	int sense;
> +
> +	if (!intid)
> +		return false;
> +
> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return false;
> +
> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
Do we handle cross-endian ACPI tables?
ACPI defers to the relevant specification regarding the endianness of
externally defined tables, but as of v3.0 (beta) of the MPAM ACPI
spec [1], no statement is made about this.
Following the spirit of the ACPI core specs, I suspect that the
"correct" answer is that MPAM tables are always little-endian, even if
it is not written down anywhere.
If the kernel is big-endian, we lose.
Maybe it is sufficient to make CONFIG_ACPI_MPAM depend on
!CONFIG_CPU_BIG_ENDIAN for now.
I haven't tried to understand how this is handled for other tables.
> +
> +	/*
> +	 * If the GSI is in the GIC's PPI range, try and create a partitioned
> +	 * percpu interrupt.
But actually we don't even try?  Or did I miss something?
> +	 */
> +	if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
checkpatch.pl says:
 | WARNING: Comparisons should place the constant on the right side of the test
 | #108: FILE: drivers/acpi/arm64/mpam.c:45:
 | +       if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
(Dubious whether this is "wrong" IMHO, but still probably best avoided
since it is not what people are used to seeing.)
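For illustration only, the range test could be written with the
constants on the right-hand side, which is the form checkpatch.pl
expects.  This is a minimal userspace sketch: the macro and function
names are hypothetical, and the 16..31 range is simply copied from the
quoted condition.

```c
#include <stdbool.h>

/* Hypothetical names; the bounds mirror the quoted "16 <= intid && intid < 32". */
#define PPI_INTID_FIRST	16
#define PPI_INTID_END	32	/* exclusive upper bound */

/* True if intid falls inside the range the quoted check treats as PPIs. */
static bool intid_is_ppi(int intid)
{
	return intid >= PPI_INTID_FIRST && intid < PPI_INTID_END;
}
```

A named helper like this would also document what the magic numbers
mean at the call site.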
> +		pr_err_once("Partitioned interrupts not supported\n");
> +		return false;
> +	}
> +
> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> +	if (*irq <= 0) {
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> +			    intid);
Are we going to get a lot of duplicate error messages with the same
interrupt?  If not, perhaps make this a pr_err() so that all the
affected interrupts are notified?
(Either way, hopefully the user will take the hint that they messed
something up though.)
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, aff;
> +	int irq;
We may still get in here if MPAMF_IDR.HAS_ERR_MSI and/or
MPAMF_MSMON_IDR.HAS_OFLW_MSI is set.  If so, there is no wired
interrupt.  Does it matter if we still parse and allocate the wired
interrupts here?
> +
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->overflow_interrupt_affinity;
> +	else
> +		aff = ~0;
(u32)~0 is used as an exceptional UID all over the place.  If this is
not a pre-existing convention, it could be worth having a #define for
this.  (grep '~0' drivers/acpi/ suggests that this is new.)
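As a rough sketch of what a named sentinel might look like, the
affinity selection that is currently repeated for the overflow and
error interrupts could be folded into one helper.  All names below are
hypothetical; only the bit positions (3 and 4) are taken from the
quoted #defines.

```c
#include <stdint.h>

/* Bit positions copied from the quoted flag definitions. */
#define IRQ_AFFINITY_PROCESSOR_CONTAINER	(1u << 3)
#define IRQ_AFFINITY_VALID			(1u << 4)

/* Hypothetical name for the "(u32)~0" exceptional UID. */
#define NO_PROCESSOR_CONTAINER_UID		UINT32_MAX

/*
 * Return the processor container UID if the flags mark the affinity as
 * both valid and of processor-container type, else the sentinel.
 */
static uint32_t irq_container_uid(uint32_t flags, uint32_t affinity)
{
	const uint32_t need = IRQ_AFFINITY_VALID |
			      IRQ_AFFINITY_PROCESSOR_CONTAINER;

	if ((flags & need) == need)
		return affinity;

	return NO_PROCESSOR_CONTAINER_UID;
}
```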
> +	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
I couldn't find a statement in the spec of how the table can specify
that there is no interrupt.
Are the interrupts always required for ACPI-based MPAM systems?
overflow_interrupt and error_interrupt are GSIVs, which seems to be an
ACPI thing.  The examples in the ACPI spec suggest that 0 can be a
valid value.  No exceptional value seems to be defined.
The flags fields have some invalid encodings, but no explicit "no
interrupt" encoding that I can see.
> +
> +	flags = tbl_msc->error_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->error_interrupt_affinity;
> +	else
> +		aff = ~0;
> +	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	struct acpi_mpam_resource_node *resources;
> +
> +	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
Should we check that we don't go out of the bounds of the MSC node
(or, at the very least, of the MPAM table)?
If tbl_msc->length was already validated, that can be used for the
bounds check.
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		err = acpi_mpam_parse_resource(msc, &resources[i]);
Isn't the length of each resource node variable?  According to [2],
the length depends on the num_functional_deps field.  It looks like the
functional dependency descriptors (if any) are appended contiguously to
the resource node, unless I've misunderstood something.
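To make the concern concrete, here is a userspace sketch of walking
variable-length nodes with a bounds check at every step.  The structs
are simplified stand-ins, not the real ACPICA layouts: I am assuming
only that each node carries a count of fixed-size dependency
descriptors that follow it contiguously.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the table structures. */
struct res_node_hdr {
	uint32_t identifier;
	uint32_t num_functional_deps;
};

struct func_dep {
	uint32_t producer;
	uint32_t reserved;
};

/*
 * Walk num_nodes variable-length nodes in buf[0..len).
 * Returns the number of nodes walked, or -1 if a node or its
 * dependency descriptors would extend past the end of the buffer.
 */
static int walk_resource_nodes(const uint8_t *buf, size_t len,
			       unsigned int num_nodes)
{
	size_t off = 0;
	unsigned int i;

	for (i = 0; i < num_nodes; i++) {
		struct res_node_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));
		off += sizeof(hdr);

		/* Compare as a count to avoid overflow in the multiply. */
		if (hdr.num_functional_deps >
		    (len - off) / sizeof(struct func_dep))
			return -1;
		off += (size_t)hdr.num_functional_deps *
		       sizeof(struct func_dep);
	}

	return (int)i;
}
```

Indexing a plain array of nodes, as the patch does, only gives the same
answer when every node has zero dependencies.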
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char hid[16], uid[16];
> +	int err;
> +
> +	memset(&hid, 0, sizeof(hid));
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
This is safe by semi-accident, since 16 > 8.
It might be cleaner to declare
	char hid[sizeof(tbl_msc->hardware_id_linked_device)];
which can never be wrong.
memset()+memcpy() might be better replaced with strscpy() (or just use
snprintf again, since this avoids having to think about multiple
different ways of avoiding buffer overflows at the same time.  This is
not a fast path.)
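A sketch of sizing the buffer from the field itself, in userspace C.
One assumption that goes slightly beyond the suggestion above: if the
ID can occupy all eight bytes of the field without a terminator, the
buffer needs one extra byte for the NUL.  The struct is a hypothetical
stub holding only the field of interest.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stub; only the linked-device HID field is modelled. */
struct msc_node_stub {
	uint64_t hardware_id_linked_device;
};

/* Field width plus one byte for the terminating NUL. */
#define HID_BUF_LEN \
	(sizeof(((struct msc_node_stub *)0)->hardware_id_linked_device) + 1)

/* Copy the fixed-width HID field into a NUL-terminated string. */
static void copy_hid(char hid[HID_BUF_LEN], const struct msc_node_stub *node)
{
	memcpy(hid, &node->hardware_id_linked_device,
	       sizeof(node->hardware_id_linked_device));
	hid[sizeof(node->hardware_id_linked_device)] = '\0';
}
```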
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
char uid[11]; would be sufficient, here.  The instance ID is strictly
32-bit.  Adding a safety margin is worthless here, since snprintf()
checks the bounds -- either the size is sufficient for all possible u32
values, or it isn't.
> +		       tbl_msc->instance_id_linked_device);
Can snprintf() return < 0 on error?
I don't know, but elsewhere you do check for this.  I tend to the view
that it is cleaner to assume that the kernel's snprintf() is just as
hostile as C's version (if not more so).
(-1 >= sizeof(foo) is always true thanks to the C arithmetic conversion
rules, but it's probably best not to rely on it.)
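Something along these lines would cover both the negative return and
the truncation case explicitly.  This is a userspace sketch against
the C library snprintf(); the helper name and the UID_STR_LEN macro
are hypothetical.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 10 decimal digits for UINT32_MAX plus the terminating NUL. */
#define UID_STR_LEN	11

/*
 * Format id as decimal into buf.  Returns false if snprintf() reported
 * an error or the output did not fit, without relying on the implicit
 * signed-to-unsigned conversion in "err >= sizeof(buf)".
 */
static bool format_uid(char *buf, size_t size, uint32_t id)
{
	int len = snprintf(buf, size, "%" PRIu32, id);

	return len >= 0 && (size_t)len < size;
}
```

With a buffer of UID_STR_LEN bytes the truncation branch cannot trigger
for any u32 value, so the check is purely defensive.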
> +	if (err >= sizeof(uid))
> +		return acpi_id_valid;
Possibly return true on error?  Why?
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case 0:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case 0xa:
> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int __init acpi_mpam_parse(void)
> +{
> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
checkpatch.pl says:
 | ERROR: code indent should use tabs where possible
 | #240: FILE: drivers/acpi/arm64/mpam.c:177:
 | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
 | 
 | WARNING: please, no spaces at the start of a line
 | #240: FILE: drivers/acpi/arm64/mpam.c:177:
 | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct property_entry props[4]; /* needs a sentinel */
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int next_res, next_prop, err = 0;
> +	struct acpi_device *companion;
> +	struct platform_device *pdev;
> +	enum mpam_msc_iface iface;
> +	struct resource res[3];
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> +		return 0;
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +		table_offset += tbl_msc->length;
> +
> +		/*
> +		 * If any of the reserved fields are set, make no attempt to
> +		 * parse the msc structure. This will prevent the driver from
> +		 * probing all the MSC, meaning it can't discover the system
> +		 * wide supported partid and pmg ranges. This avoids whatever
> +		 * this MSC is truncating the partids and creating a screaming
Mangled sentence?
I have not so far found any reference in [2] to the reset value of the
MPAMF_ECR.INTEN bit.  Do we rely on the error interrupt(s) for all MSCs
to be disabled at the interrupt controller?  If the same interrupt may
be shared by multiple MSCs, that's bad.
> +		 * error interrupt.
> +		 */
> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
> +			continue;
The specs are not clear about how backwards compatibility is supposed
to work.
I would feel a bit uneasy about silently throwing away MSCs based on
criteria that may not indicate incompatibility, and without even a
diagnostic.
> +
> +		if (!tbl_msc->mmio_size)
> +			continue;
> +
> +		if (decode_interface_type(tbl_msc, &iface))
> +			continue;
Ditto regarding diagnostics.
> +
> +		next_res = 0;
> +		next_prop = 0;
> +		memset(res, 0, sizeof(res));
> +		memset(props, 0, sizeof(props));
> +
> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
If the tbl_msc->identifier values contain duplicates, we will get a
platform device with a duplicate name here.  I don't know whether it
matters.
> +		if (!pdev) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
> +			err = -EINVAL;
> +			break;
> +		}
No check for oversized tbl_msc->length?  (See also
acpi_mpam_count_msc().)
> +
> +		/* Some power management is described in the namespace: */
> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> +		if (err > 0 && err < sizeof(uid)) {
> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
Diagnostic?
> +			if (companion)
> +				ACPI_COMPANION_SET(&pdev->dev, companion);
> +		}
> +
> +		if (iface == MPAM_IFACE_MMIO) {
> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
> +							       tbl_msc->mmio_size,
> +							       "MPAM:MSC");
> +		} else if (iface == MPAM_IFACE_PCC) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
> +								tbl_msc->base_address);
> +			next_prop++;
> +		}
> +
> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
> +		err = platform_device_add_resources(pdev, res, next_res);
> +		if (err)
> +			break;
> +
> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
> +							tbl_msc->max_nrdy_usec);
> +
> +		/*
> +		 * The MSC's CPU affinity is described via its linked power
> +		 * management device, but only if it points at a Processor or
> +		 * Processor Container.
> +		 */
> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
> +								acpi_id);
> +		}
> +
> +		err = device_create_managed_software_node(&pdev->dev, props,
> +							  NULL);
> +		if (err)
> +			break;
> +
> +		/* Come back later if you want the RIS too */
> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
> +		if (err)
> +			break;
> +
> +		err = platform_device_add(pdev);
> +		if (err)
> +			break;
> +	}
> +
> +	if (err)
> +		platform_device_put(pdev);
> +
> +	return err;
> +}
> +
> +int acpi_mpam_count_msc(void)
> +{
> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
checkpatch.pl says:
 | ERROR: code indent should use tabs where possible
 | #359: FILE: drivers/acpi/arm64/mpam.c:296:
 | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
 | 
 | WARNING: please, no spaces at the start of a line
 | #359: FILE: drivers/acpi/arm64/mpam.c:296:
 | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int count = 0;
> +
> +	if (IS_ERR(table))
> +		return 0;
> +
> +	if (table->revision < 1)
> +		return 0;
> +
> +	tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
Can this be moved into the loop?  It looks like it just duplicates the
update to tbl_msc at the end of the loop; the loop termination
condition does not depend on this variable.
> +	table_end = (char *)table + table->length;
> +
> +	while (table_offset < table_end) {
> +		if (!tbl_msc->mmio_size)
> +			continue;
This is 0 for non-usable PCC-based MSCs, right?
(Why explicitly unusable MSCs are listed in the table at all is a
mystery to me, but that's what the spec says.  I guess there must be a
reason.)
> +
> +		if (tbl_msc->length < sizeof(*tbl_msc))
> +			return -EINVAL;
Should we also have something like (not tested):
if (tbl_msc->length > table_end - table_offset)
	return -EINVAL;
Also, is it an error if a length is not a multiple of four bytes?
(I'm guessing that the core ACPI code doesn't try to understand the
contents of the MPAM table and so doesn't check this.)
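A fuller userspace sketch of such a bounds-checked walk follows (not tested against the real table layout). The struct, field names and the multiple-of-four policy are assumptions for illustration only; this is not the real struct acpi_mpam_msc_node. The sketch advances the offset before skipping a record, so a skipped record cannot stall the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record header; not the real MPAM MSC node. */
struct rec_hdr {
	uint16_t length;	/* total size of this record, header included */
	uint16_t reserved;
	uint32_t mmio_size;	/* 0 marks a record to skip */
};

/*
 * Count usable records in buf[0..len). Returns -1 if a length field is
 * smaller than the header, is not a multiple of four (a policy assumed
 * here for illustration), or runs past the end of the buffer.
 */
static int count_records(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct rec_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* memcpy avoids unaligned access through a cast pointer. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.length < sizeof(hdr) || hdr.length % 4)
			return -1;
		if (hdr.length > len - off)
			return -1;

		/* Advance before any skip so the loop always makes progress. */
		off += hdr.length;

		if (!hdr.mmio_size)
			continue;
		count++;
	}

	return count;
}
```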
> +
> +		count++;
> +
> +		table_offset += tbl_msc->length;
> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> +	}
> +
> +	return count;
> +}
> +
> +/*
> + * Call after ACPI devices have been created, which happens behind acpi_scan_init()
> + * called from subsys_initcall(). PCC requires the mailbox driver, which is
> + * initialised from postcore_initcall().
> + */
> +subsys_initcall_sync(acpi_mpam_parse);
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index fa9bb8c8ce95..835e3795ede3 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
>  	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
>  	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
>  	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
> -	ACPI_SIG_NBFT };
> +	ACPI_SIG_NBFT, ACPI_SIG_MPAM };
>  
>  #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
>  
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> new file mode 100644
> index 000000000000..0edefa6ba019
> --- /dev/null
> +++ b/include/linux/arm_mpam.h
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 Arm Ltd. */
checkpatch.pl says:
 | WARNING: Improper SPDX comment style for 'include/linux/arm_mpam.h', please use '/*' instead
 | #414: FILE: include/linux/arm_mpam.h:1:
 | +// SPDX-License-Identifier: GPL-2.0
 | 
 | WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
 | #414: FILE: include/linux/arm_mpam.h:1:
 | +// SPDX-License-Identifier: GPL-2.0
(That's probably the same error twice.)
(I never understood why the SPDX folks couldn't have allowed either
type of comment -- or at least, the same style in .c and .h files.
But I'm sure they had a reason that they believed was good.)
[...]
Cheers
---Dave
[1] ACPI for Memory System Resource Partitioning and Monitoring, 3.0 beta
https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
[2] Arm Memory System Resource Partitioning and Monitoring (MPAM)
System Component Specification, ARM IHI 0099A.a
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-27 16:05   ` Dave Martin
@ 2025-09-04 17:28     ` James Morse
  2025-09-05 16:38       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-04 17:28 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 27/08/2025 17:05, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:49PM +0000, James Morse wrote:
>> Add code to parse the arm64 specific MPAM table, looking up the cache
>> level from the PPTT and feeding the end result into the MPAM driver.
> 
> Might be worth mentioning that the hook for feeding the parsed factoids
> into the driver (mpam_ris_create()) is not implemented for now.
Sure,
>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..e55fc2729ac5
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c
>> @@ -0,0 +1,331 @@
>> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
>> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
>> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
>> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
>> +
>> +static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
>> +		     int *irq, u32 processor_container_uid)
> 
> Can this have a name, please?
parse_irq()? (but not all of it - only the bits that are duplicated)
How about parse_irq_common().
>> +{
>> +	int sense;
>> +
>> +	if (!intid)
>> +		return false;
>> +
>> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
>> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
>> +		return false;
>> +
>> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
> Do we handle cross-endian ACPI tables?
ACPI depends on UEFI, which is little endian only.
> ACPI defers to the relevant specification regarding the endianness of
> externally defined tables, but as of v3.0 (beta) of the MPAM ACPI
> spec [1], no statement is made about this.
> 
> Following the spirit of the ACPI core specs, I suspect that the
> "correct" answer is that MPAM tables are always little-endian, even if
> it is not written down anywhere.
> 
> If the kernel is big-endian, we lose.
> 
> Maybe it is sufficient to make CONFIG_ACPI_MPAM depend on
> !CONFIG_CPU_BIG_ENDIAN for now.
> 
> 
> I haven't tried to understand how this is handled for other tables.
There is no way to hand the tables to the OS. I don't think this needs explicit handling,
it all just falls out in the wash.
>> +
>> +	/*
>> +	 * If the GSI is in the GIC's PPI range, try and create a partitioned
>> +	 * percpu interrupt.
> 
> But actually we don't even try?  Or did I miss something?
Ah - I ripped that out on the assumption no-one was using it. I'll drop the comment.
No-one has squealed yet!
It works out of the box on DT systems, and can be described in the MPAM ACPI table - but
without a platform, there is no need for the irqchip juggling to make it work. Hence the
pr_err_once().
>> +	 */
>> +	if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
> 
> checkpatch.pl says:
> 
>  | WARNING: Comparisons should place the constant on the right side of the test
>  | #108: FILE: drivers/acpi/arm64/mpam.c:45:
>  | +       if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
> 
> (Dubious whether this is "wrong" IMHO, but still probably best avoided
> since it is not what people are used to seeing.)
If it were a single comparison I'd agree, but as it's two, I find this a lot easier to read
as 'intid is between 16 and 32' than the alternative.
>> +		pr_err_once("Partitioned interrupts not supported\n");
>> +		return false;
>> +	}
>> +
>> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
>> +	if (*irq <= 0) {
>> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
>> +			    intid);
> 
> Are we going to get a lot of duplicate error messages with the same
> interrupt?  If not, perhaps make this a pr_err() so that all the
> affected interrupts are notified?
> 
> (Either way, hopefully the user will take the hint that they messed
> something up though.)
The hardware people figured they could get the OS to distribute the configuration to all
the nodes in the mesh, so there are platforms with hundreds of MSC. They also like to
share the interrupts, so if you mess the interrupt description up - you get hundreds of
messages.
To avoid spamming the log for what is likely to be a frequently repeated mistake, these
are all 'once'.
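The effect of the 'once' variants can be sketched in userspace as below. This is only an illustration of the idea, not the kernel's pr_err_once() implementation; log_err_once(), messages_printed and register_irq_sketch() are made-up names.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many messages actually reached the log (for illustration). */
static int messages_printed;

/* Print at most once per call site, however often the site is reached. */
#define log_err_once(...)				\
	do {						\
		static bool printed_;			\
		if (!printed_) {			\
			printed_ = true;		\
			messages_printed++;		\
			fprintf(stderr, __VA_ARGS__);	\
		}					\
	} while (0)

/* Stand-in for a registration step that fails for every MSC. */
static bool register_irq_sketch(unsigned int intid)
{
	log_err_once("Failed to register interrupt 0x%x\n", intid);
	return false;
}
```

With hundreds of MSC sharing one badly described interrupt, the call site is hit hundreds of times but only the first failure is reported.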
>> +		return false;
>> +	}
>> +
>> +	return true;
>> +}
>> +
>> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
>> +				 struct acpi_mpam_msc_node *tbl_msc,
>> +				 struct resource *res, int *res_idx)
>> +{
>> +	u32 flags, aff;
>> +	int irq;
> We may still get in here if MPAMF_IDR.HAS_ERR_MSI and/or
> MPAMF_MSMON_IDR.HAS_OFLW_MSI is set.
(this code can't know that)
> If so, there is no wired
> interrupt.  Does it matter if we still parse and allocate the wired
> interrupts here?
I don't think we do - frob_irq() will return false if the 'wired' flag is not set in the
table, so the 'res[(*res_idx)++]' step will be skipped, and the caller's array is left
unmodified. The struct property_entry array is null-terminated, and gets copied anyway.
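A minimal userspace sketch of that behaviour: the bit positions mirror the ACPI_MPAM_MSC_IRQ_* masks quoted above, while the struct and function names are hypothetical. An entry is only appended when the interrupt is present and wired, so rejected interrupts leave the output array untouched.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_MODE_MASK	0x01u	/* bit 0 */
#define IRQ_TYPE_MASK	0x06u	/* bits 2:1 */
#define IRQ_TYPE_SHIFT	1
#define IRQ_TYPE_WIRED	0u

/* Hypothetical stand-in for the resource entry the caller collects. */
struct irq_entry {
	uint32_t intid;
	uint32_t mode;
};

/* Returns true and fills *mode only for a present, wired interrupt. */
static bool irq_usable(uint32_t intid, uint32_t flags, uint32_t *mode)
{
	if (!intid)
		return false;
	if (((flags & IRQ_TYPE_MASK) >> IRQ_TYPE_SHIFT) != IRQ_TYPE_WIRED)
		return false;

	*mode = flags & IRQ_MODE_MASK;
	return true;
}

/* Appends usable interrupts to out[]; returns how many were added. */
static int collect_irqs(const uint32_t *intids, const uint32_t *flags,
			int n, struct irq_entry *out)
{
	int used = 0;
	int i;

	for (i = 0; i < n; i++) {
		uint32_t mode;

		if (!irq_usable(intids[i], flags[i], &mode))
			continue;	/* out[] is left unmodified */

		out[used].intid = intids[i];
		out[used].mode = mode;
		used++;
	}

	return used;
}
```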
>> +	flags = tbl_msc->overflow_interrupt_flags;
>> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
>> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
>> +		aff = tbl_msc->overflow_interrupt_affinity;
>> +	else
>> +		aff = ~0;
> 
> (u32)~0 is used as an exceptional UID all over the place.  If this is
> not a pre-existing convention, it could be worth having a #define for
> this.  (grep '~0' drivers/acpi/ suggests that this is new.)
It's not normal to describe the affinity of an interrupt in tables like this...
The MPAM ACPI spec allows you to describe the PPI affinity because the MSC can be local to
a CPU, (e.g. the L2 cache), and the MPAM architecture spec says it can be a PPI.
We have support for this on DT systems, so it wasn't possible to argue "no one would ever
do that"!
>> +	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
>> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> 
> I couldn't find a statement in the spec of how the table can specify
> that there is no interrupt.
With zero in the GSI field, from the spec:
| If this MSC does not support overflow interrupts or monitors, this field must be set
| to 0.
Which is detected in frob_irq() as:
|	if (!intid)
|		return false;
> Are the interrupts always required for ACPI-based MPAM systems?
In practice they're optional.
> overflow_interrupt and error_interrupt are GSIVs, which seems to be an
> ACPI thing.
It's a flat space of interrupt IDs - no need to shift the values around for the
SGI/PPI/SPI range. (aka IPI, percpu interrupt or plain old wired interrupt)
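As a rough illustration of that flat numbering, the GIC ranges can be classified as below. The enum and function names are made up, and the sketch ignores the special and extended INTID ranges.

```c
#include <stdint.h>

enum intid_class {
	INTID_SGI,	/* 0-15: software generated (IPI) */
	INTID_PPI,	/* 16-31: private per-CPU interrupt */
	INTID_SPI,	/* 32 and up: shared (plain wired) interrupt */
};

/* Classify an interrupt ID using the basic GIC ranges only. */
static enum intid_class classify_intid(uint32_t intid)
{
	if (intid < 16)
		return INTID_SGI;
	if (intid < 32)
		return INTID_PPI;
	return INTID_SPI;
}
```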
> The examples in the ACPI spec suggest that 0 can be a valid value.
Pretty sure it's not - it would be a secure SGI. The MPAM spec uses it as 'invalid'.
> No exceptional value seems to be defined.
> 
> The flags fields have some invalid encodings, but no explicit "no
> interrupt" encoding that I can see.
I think you missed it hidden as 'and another thing' in the description of the field.
>> +
>> +	flags = tbl_msc->error_interrupt_flags;
>> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
>> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
>> +		aff = tbl_msc->error_interrupt_affinity;
>> +	else
>> +		aff = ~0;
>> +	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
>> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
>> +}
>> +
>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>> +				    struct acpi_mpam_resource_node *res)
>> +{
>> +	int level, nid;
>> +	u32 cache_id;
>> +
>> +	switch (res->locator_type) {
>> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>> +		cache_id = res->locator.cache_locator.cache_reference;
>> +		level = find_acpi_cache_level_from_id(cache_id);
>> +		if (level <= 0) {
>> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
>> +			return -EINVAL;
>> +		}
>> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>> +				       level, cache_id);
>> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>> +		if (nid == NUMA_NO_NODE)
>> +			nid = 0;
>> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>> +				       255, nid);
>> +	default:
>> +		/* These get discovered later and treated as unknown */
>> +		return 0;
>> +	}
>> +}
>> +
>> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
>> +			      struct acpi_mpam_msc_node *tbl_msc)
>> +{
>> +	int i, err;
>> +	struct acpi_mpam_resource_node *resources;
>> +
>> +	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
> Should we check that we don't go out of the bounds of the MSC node
> (or, at the very least, of the MPAM table)?
> 
> If tbl_msc->length was already validated, that can be used for the
> bounds check.
I'm not a fan of trying to validate the ACPI tables - it's extra work for something that
just has to be right. e.g. we can't check the base-address, we just have to trust it's correct.
I didn't want to pass the table in here, and grabbing another reference means I'd have to
work out if it's the same mapping...
But as tbl_msc->length is to hand...
>> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
>> +		err = acpi_mpam_parse_resource(msc, &resources[i]);
> 
> Isn't the length of each resource node variable?  According to [2],
> the length depends on the num_functional_deps field.  It looks like the
> functional dependency descriptors (if any) are appended contiguously to
> the resource node, unless I've misunderstood something.
No - it probably is that broken. They forgot to include a length field, painting themselves
into a corner if they ever want to modify this!
I was ignoring the 'functional dependencies' until someone comes up with a platform that
needs it. But yes, this assumes that field is zero.
I'm not entirely clear what the MPAM driver is supposed to do with functional
dependencies: 'make up a configuration' ... but when the control formats are different,
it's going to get exciting.
I've fixed that up:
| int acpi_mpam_parse_resources(struct mpam_msc *msc,
| 			      struct acpi_mpam_msc_node *tbl_msc)
| {
| 	int i, err;
|	char *ptr, *last_byte;
| 	struct acpi_mpam_resource_node *resource;
|
| 	ptr = (char *)(tbl_msc + 1);
| 	last_byte = ptr + tbl_msc->length;
| 	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
| 		if (ptr + sizeof(*resource) > last_byte)
| 			return -EINVAL;
|
| 		resource = (struct acpi_mpam_resource_node *)ptr;
| 		err = acpi_mpam_parse_resource(msc, resource);
| 		if (err)
| 			return err;
|
| 		ptr += sizeof(*resource);
| 		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
| 	}
|
| 	return 0;
| }
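The same walk as a standalone userspace sketch, for anyone who wants to poke at the bounds handling. The structs here are simplified stand-ins, not the real ACPI definitions. It assumes the node's length field covers the node header as well, so the end pointer is measured from the start of the node, and it also checks that the functional dependency descriptors fit before stepping over them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical layouts; the real structs have more fields. */
struct msc_hdr {
	uint16_t length;		/* whole node, header included */
	uint16_t reserved;
	uint32_t num_resource_nodes;
};

struct res_node {
	uint32_t ris_index;
	uint32_t num_functional_deps;
};

struct func_dep {
	uint32_t producer;
	uint32_t reserved;
};

/* Returns the number of resource nodes visited, or -1 on overrun. */
static int walk_resources(const unsigned char *msc, size_t avail)
{
	const unsigned char *ptr, *end;
	struct msc_hdr hdr;
	uint32_t i;

	if (avail < sizeof(hdr))
		return -1;
	memcpy(&hdr, msc, sizeof(hdr));
	if (hdr.length < sizeof(hdr) || hdr.length > avail)
		return -1;

	ptr = msc + sizeof(hdr);
	end = msc + hdr.length;		/* measured from the node start */

	for (i = 0; i < hdr.num_resource_nodes; i++) {
		struct res_node node;

		if ((size_t)(end - ptr) < sizeof(node))
			return -1;
		memcpy(&node, ptr, sizeof(node));
		ptr += sizeof(node);

		/* Bound the count before multiplying, so nothing can wrap. */
		if (node.num_functional_deps >
		    (size_t)(end - ptr) / sizeof(struct func_dep))
			return -1;
		ptr += node.num_functional_deps * sizeof(struct func_dep);
	}

	return (int)i;
}
```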
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>> +				     struct platform_device *pdev,
>> +				     u32 *acpi_id)
>> +{
>> +	bool acpi_id_valid = false;
>> +	struct acpi_device *buddy;
>> +	char hid[16], uid[16];
>> +	int err;
>> +
>> +	memset(&hid, 0, sizeof(hid));
>> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
>> +	       sizeof(tbl_msc->hardware_id_linked_device));
> 
> This is safe by semi-accident, since 16 > 8.
> 
> It might be cleaner to declare
> 
> 	char hid[sizeof(tbl_msc->hardware_id_linked_device)];
> 
> which can never be wrong.
But can be too precise!
> memset()+memcpy() might be better replaced with strscpy() (or just use
> snprintf again, since this avoids having to think about multiple
> different ways of avoiding buffer overflows at the same time.  This is
> not a fast path.)
snprintf() demands space for a NUL byte - but these ACPI table fields are fixed width,
so don't have them.
The array sizes picked are the next larger power of 2, and the memset is to ensure there is a
NUL byte after the fixed size region is copied in.
Where snprintf() is being used - it's to convert from integer to string. For the hid, what
we really want is a fixed size memcpy() ... it's possible to cook up a format string that
will do that, but it's going to be an eyesore.
I can change this to:
| char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
But I don't think this is better... How about a comment?!
|	/*
|	 * tbl_msc->hardware_id_linked_device is an 8 byte fixed width string.
|	 * hid[] is the next larger power of 2, and is zero'd to give us a
|	 * null terminated string for acpi_dev_get_first_match_dev().
|	 */
This is all because the MPAM stuff was forced into a static table, even though it needs to
refer to stuff in the namespace.
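The memset()+memcpy() pattern can be sketched in isolation like this. It is a userspace illustration only; hid_to_cstr() and HID_FIELD_LEN are hypothetical names, with the 8 byte width taken from the comment above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HID_FIELD_LEN 8	/* fixed-width table field, no terminator */

/*
 * Copy a fixed-width, unterminated field into a larger buffer so it can
 * be used as a C string. Returns -1 if the buffer cannot hold the field
 * plus a terminating NUL byte.
 */
static int hid_to_cstr(const uint8_t *field, char *out, size_t out_size)
{
	if (out_size < HID_FIELD_LEN + 1)
		return -1;

	/* Zero first, so whatever follows the copied bytes is NUL. */
	memset(out, 0, out_size);
	memcpy(out, field, HID_FIELD_LEN);

	return 0;
}
```

Any buffer of at least nine bytes works; sixteen is just the next power of 2.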
>> +
>> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>> +		*acpi_id = tbl_msc->instance_id_linked_device;
>> +		acpi_id_valid = true;
>> +	}
>> +
>> +	err = snprintf(uid, sizeof(uid), "%u",
> 
> char uid[11]; would be sufficient, here.  The instance ID is strictly
> 32-bit.  Adding a safety margin is worthless here, since snprintf()
> checks the bounds -- either the size is sufficient for all possible u32
> values, or it isn't.
I just plumped for the next largest power of 2.
>> +		       tbl_msc->instance_id_linked_device);
> 
> Can snprintf() return < 0 on error?
Documented as returning a positive number.
> I don't know, but elsewhere you do check for this.  I tend to the view
> that is cleaner to assume that the kernel's snprintf() is just as
> hostile as C's version (if not more so).
> 
> (-1 >= sizeof(foo) is always true thanks to the C arithmetic conversion
> rules, but it's probably best not to rely on it.)
> 
>> +	if (err >= sizeof(uid))
>> +		return acpi_id_valid;
> 
> Possibly return true on error?  Why?
(this is an error case that will never happen - but error handling is important!)
The acpi_id was parsed out of the table and returned to the caller. Hence true is
returned. This gets used to find the CPU-affinity, which is how the driver avoids
accessing caches that might not be on. There are sanity checks around this value. It can't
be present for some caches and not others.
Due to this error, the link to the buddy device was not added - which is silently ignored.
If it matters, it will show up as an access to a device that didn't get turned on by the
power-management code, because of that missing link.
Because I don't think this can happen, I didn't try to handle it explicitly - just carry
on like nothing is wrong.
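For reference, a check that covers both the negative-return and the truncation case can be sketched as below. This uses the C library snprintf() in userspace, not the kernel's; format_uid() and U32_DEC_BUF are made-up names.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* "4294967295" is ten digits, plus one byte for the terminator. */
#define U32_DEC_BUF 11

/*
 * Format a 32-bit id as decimal. Returns the string length, or -1 if
 * snprintf() reported an error or the output did not fit.
 */
static int format_uid(uint32_t id, char *buf, size_t size)
{
	int n = snprintf(buf, size, "%" PRIu32, id);

	/* Test the sign first, so the cast to size_t is always safe. */
	if (n < 0 || (size_t)n >= size)
		return -1;

	return n;
}
```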
>> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
>> +	if (buddy)
>> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
>> +
>> +	return acpi_id_valid;
>> +}
>> +
>> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
>> +				 enum mpam_msc_iface *iface)
>> +{
>> +	switch (tbl_msc->interface_type) {
>> +	case 0:
>> +		*iface = MPAM_IFACE_MMIO;
>> +		return 0;
>> +	case 0xa:
>> +		*iface = MPAM_IFACE_PCC;
>> +		return 0;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +static int __init acpi_mpam_parse(void)
>> +{
>> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> 
> checkpatch.pl says:
> 
>  | ERROR: code indent should use tabs where possible
>  | #240: FILE: drivers/acpi/arm64/mpam.c:177:
>  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
>  | 
>  | WARNING: please, no spaces at the start of a line
>  | #240: FILE: drivers/acpi/arm64/mpam.c:177:
>  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
Pff, I copied that from Jonathan's email because of its rune like nature.
>> +	char *table_end, *table_offset = (char *)(table + 1);
>> +	struct property_entry props[4]; /* needs a sentinel */
>> +	struct acpi_mpam_msc_node *tbl_msc;
>> +	int next_res, next_prop, err = 0;
>> +	struct acpi_device *companion;
>> +	struct platform_device *pdev;
>> +	enum mpam_msc_iface iface;
>> +	struct resource res[3];
>> +	char uid[16];
>> +	u32 acpi_id;
>> +
>> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
>> +		return 0;
>> +
>> +	if (IS_ERR(table))
>> +		return 0;
>> +
>> +	if (table->revision < 1)
>> +		return 0;
>> +
>> +	table_end = (char *)table + table->length;
>> +
>> +	while (table_offset < table_end) {
>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +		table_offset += tbl_msc->length;
>> +
>> +		/*
>> +		 * If any of the reserved fields are set, make no attempt to
>> +		 * parse the msc structure. This will prevent the driver from
>> +		 * probing all the MSC, meaning it can't discover the system
>> +		 * wide supported partid and pmg ranges. This avoids whatever
>> +		 * this MSC is truncating the partids and creating a screaming
> 
> Mangled sentence?
just hard to parse:
"whatever this MSC is" "truncating the partids..."
Fixed as
| This avoids this MSC truncating the partids and creating a screaming error interrupt.
(although it's not strictly the MSC that does that)
> I have not so far found any reference in [2] to the reset value of the
> MPAMF_ECR.INTEN bit.  Do we rely on the error interrupt(s) for all MSCs
> to be disabled at the interrupt controller?  If the same interrupt may
> be shared by multiple MSCs, that's bad.
They are anticipated as shared because the ESR register lets us know if this MSC triggered
the error.
The interrupt won't be delivered until the driver requests it - it'll just stay pending.
Yes - if some firmware component used an out-of-range PARTID then Linux will believe that
this was caused by Linux and disable MPAM. Not much we can do about that.
(I've even seen a platform where this happens!)
>> +		 * error interrupt.
>> +		 */
>> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
>> +			continue;
> The specs are not clear about how backwards compatibility is supposed
> to work.
How about that!
The spec people thought this would allow them to add things that aren't MSC in, re-using
one of these reserved fields as a type.
But - the architecture spec mandates the OS run around the MSC and collect the minimum
PARTID/PMG values (so people don't have to build an SoC where everything fits together
nicely). Because of that, the OS needs to know how many MSC there are, so that it knows it
has probed them all.
> I would feel a bit uneasy about silently throwing away MSCs based on
> criteria that may not indicate incompatibility, and without even a
> diagnostic.
Yes the MSC gets thrown away - but it was still counted by acpi_mpam_count_msc(). The MPAM
driver will never probe all the MSC, so it will never call mpam_enable().
(you used to be able to see which cpuhp callback was registered, to know it was stuck in
discovery, but now a helper does that which always uses the same name. I may fix that)
I think it's fine for MPAM to silently fail to probe like this. The system still works, all
the known hardware gets probed and none of the unknown hardware is touched. (so it can't
blow us up). You don't get resctrl - but we can't know if the unknown hardware was going
to be important to supporting MPAM. e.g. it lowers the minimum usable PARTID/PMG.
It's between a rock and a hard place - I think this behaviour is the best we can do.
Regarding it being silent - whatever happens it needs a kernel upgrade to learn about the
newly described hardware. I don't think it's worth printing anything out in this case.
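The counted-versus-probed gating can be sketched as below. This is a simplified model with hypothetical names, not the driver's actual data structures: every listed MSC is counted up front, each successful probe narrows the system-wide PARTID limit, and enabling is only allowed once every counted MSC has been probed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping; the real driver state looks different. */
struct mpam_probe_state {
	int msc_total;		/* every MSC counted from the table */
	int msc_probed;		/* MSC that actually probed */
	uint16_t partid_max;	/* smallest per-MSC maximum seen so far */
};

static void probe_state_init(struct mpam_probe_state *st, int total)
{
	st->msc_total = total;
	st->msc_probed = 0;
	st->partid_max = UINT16_MAX;
}

/* Record one probed MSC and narrow the system-wide limit if needed. */
static void msc_probed(struct mpam_probe_state *st, uint16_t msc_partid_max)
{
	if (msc_partid_max < st->partid_max)
		st->partid_max = msc_partid_max;
	st->msc_probed++;
}

/* An MSC skipped by the parser is counted but never probed: stays false. */
static bool all_msc_probed(const struct mpam_probe_state *st)
{
	return st->msc_total > 0 && st->msc_probed == st->msc_total;
}
```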
>> +
>> +		if (!tbl_msc->mmio_size)
>> +			continue;
>> +
>> +		if (decode_interface_type(tbl_msc, &iface))
>> +			continue;
> 
> Ditto regarding diagnostics.
A third interface type will result in those MSC being untouched, and MPAM unavailable.
I don't think printing "maybe upgrade your kernel?" is going to help anyone.
I'll chuck some pr_debug() in here so someone could find out which of these it is.
>> +
>> +		next_res = 0;
>> +		next_prop = 0;
>> +		memset(res, 0, sizeof(res));
>> +		memset(props, 0, sizeof(props));
>> +
>> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
> 
> If the tbl_msc->identifier values contain duplicates, we will get a
> platform device with a duplicate name here.  I don't know whether it
> matters.
The identifiers must be unique. I don't think this is the sort of table validation we need
to do for the firmware vendor.
>> +		if (!pdev) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
>> +			err = -EINVAL;
>> +			break;
>> +		}
> No check for oversized tbl_msc->length?
You want it to parse the whole thing to check if firmware got the length field right?
I don't think we can know that. If it's wrong, firmware may just as easily have got the number
of RIS, or some other field wrong.
Historically x86 used to rummage around in memory to see if it could find something that
looked like ACPI tables. I'd imagine it was important to parse those carefully, as they
may never have been ACPI tables. We don't have this problem on arm64 - the ACPI tables were
handed over from UEFI. If the value is wrong - the firmware vendor meant it to be wrong.
> (See also acpi_mpam_count_msc().)
> 
>> +
>> +		/* Some power management is described in the namespace: */
>> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
>> +		if (err > 0 && err < sizeof(uid)) {
>> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> 
> Diagnostic?
> 
>> +			if (companion)
>> +				ACPI_COMPANION_SET(&pdev->dev, companion);
>> +		}
>> +
>> +		if (iface == MPAM_IFACE_MMIO) {
>> +			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
>> +							       tbl_msc->mmio_size,
>> +							       "MPAM:MSC");
>> +		} else if (iface == MPAM_IFACE_PCC) {
>> +			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
>> +								tbl_msc->base_address);
>> +			next_prop++;
>> +		}
>> +
>> +		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
>> +		err = platform_device_add_resources(pdev, res, next_res);
>> +		if (err)
>> +			break;
>> +
>> +		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
>> +							tbl_msc->max_nrdy_usec);
>> +
>> +		/*
>> +		 * The MSC's CPU affinity is described via its linked power
>> +		 * management device, but only if it points at a Processor or
>> +		 * Processor Container.
>> +		 */
>> +		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
>> +			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
>> +								acpi_id);
>> +		}
>> +
>> +		err = device_create_managed_software_node(&pdev->dev, props,
>> +							  NULL);
>> +		if (err)
>> +			break;
>> +
>> +		/* Come back later if you want the RIS too */
>> +		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
>> +		if (err)
>> +			break;
>> +
>> +		err = platform_device_add(pdev);
>> +		if (err)
>> +			break;
>> +	}
>> +
>> +	if (err)
>> +		platform_device_put(pdev);
>> +
>> +	return err;
>> +}
>> +
>> +int acpi_mpam_count_msc(void)
>> +{
>> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> 
> checkpatch.pl says:
> 
>  | ERROR: code indent should use tabs where possible
>  | #359: FILE: drivers/acpi/arm64/mpam.c:296:
>  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
>  | 
>  | WARNING: please, no spaces at the start of a line
>  | #359: FILE: drivers/acpi/arm64/mpam.c:296:
>  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> 
Yup, runes copied from Jonathan's email.
>> +	char *table_end, *table_offset = (char *)(table + 1);
>> +	struct acpi_mpam_msc_node *tbl_msc;
>> +	int count = 0;
>> +
>> +	if (IS_ERR(table))
>> +		return 0;
>> +
>> +	if (table->revision < 1)
>> +		return 0;
>> +
>> +	tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> 
> Can this be moved into the loop?  It looks like it just duplicates the
> update to tbl_msc at the end of the loop; the loop termination
> condition does not depend on this variable.
Yup - looks like this used to be a do {} while () ; loop.
>> +	table_end = (char *)table + table->length;
>> +
>> +	while (table_offset < table_end) {
>> +		if (!tbl_msc->mmio_size)
>> +			continue;
> 
> This is 0 for non-usable PCC-based MSCs, right?
> 
> (Why explicitly unusable MSCs are listed in the table at all is a
> mystery to me, but that's what the spec says.  I guess there must be a
> reason.)
Some firmware stacks have pre-built ACPI tables, then go knock out the enabled bits based
on what the platform actually has. This is easier than rebuilding the table...
>> +
>> +		if (tbl_msc->length < sizeof(*tbl_msc))
>> +			return -EINVAL;
> 
> Should we also have something like (not tested):
> 
> if (tbl_msc->length > table_end - table_offset)
> 	return -EINVAL;
Sure,
> Also, is it an error if a length is not a multiple of four bytes?
Hmmm, because all the subtables in the spec are multiples of four bytes?
I think this falls in the bucket of 'table validation the OS shouldn't have to do'.
If the ACPI tables are that wrong, we're going to have bigger problems.
> (I'm guessing that the core ACPI code doesn't try to understand the
> contents of the MPAM table and so doesn't check this.)
It doesn't.
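For reference, a rough userspace sketch of the walk with both length checks discussed above (entry not smaller than the fixed header, entry not larger than what is left of the table). The struct layout and the names msc_stub and count_msc are stand-ins invented for illustration, not the real ACPICA definitions. The sketch also advances the offset before any continue, so a skipped entry cannot stall the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the fixed part of an MSC node; not the ACPICA layout. */
struct msc_stub {
	uint16_t length;	/* total length of this entry in bytes */
	uint16_t reserved;
	uint32_t mmio_size;	/* 0 means the MSC is not usable */
};

/* Returns the number of usable entries, or -1 if the table is unparseable. */
static int count_msc(const unsigned char *table_offset,
		     const unsigned char *table_end)
{
	struct msc_stub msc;
	size_t remaining;
	int count = 0;

	assert(table_offset <= table_end);

	while (table_offset < table_end) {
		remaining = (size_t)(table_end - table_offset);
		if (remaining < sizeof(msc))
			return -1;

		/* Copy out the header to avoid unaligned accesses. */
		memcpy(&msc, table_offset, sizeof(msc));

		/* Too short to hold the header, or runs off the table. */
		if (msc.length < sizeof(msc) || msc.length > remaining)
			return -1;

		/* Advance first, so 'continue' below still makes progress. */
		table_offset += msc.length;

		if (!msc.mmio_size)
			continue;

		count++;
	}

	return count;
}
```

Comparing lengths against the remaining byte count, rather than adding a length to a pointer and comparing pointers, keeps every intermediate value inside the table.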
>> +
>> +		count++;
>> +
>> +		table_offset += tbl_msc->length;
>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>> +	}
>> +
>> +	return count;
>> +}
>> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
>> new file mode 100644
>> index 000000000000..0edefa6ba019
>> --- /dev/null
>> +++ b/include/linux/arm_mpam.h
>> @@ -0,0 +1,46 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2025 Arm Ltd. */
> 
> checkpatch.pl says:
> 
>  | WARNING: Improper SPDX comment style for 'include/linux/arm_mpam.h', please use '/*' instead
>  | #414: FILE: include/linux/arm_mpam.h:1:
>  | +// SPDX-License-Identifier: GPL-2.0
>  | 
>  | WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
>  | #414: FILE: include/linux/arm_mpam.h:1:
>  | +// SPDX-License-Identifier: GPL-2.0
> 
> (That's probably the same error twice.)
> 
> (I never understood why the SPDX folks couldn't have allowed either
> type of comment -- or at least, the same style in .c and .h files.
> But I'm sure they had a reason that they believed was good.)
Fixed,
Thanks,
James
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-09-04 17:28     ` James Morse
@ 2025-09-05 16:38       ` Dave Martin
  2025-09-10 19:19         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-05 16:38 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi James,
On Thu, Sep 04, 2025 at 06:28:17PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 17:05, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:49PM +0000, James Morse wrote:
> >> Add code to parse the arm64 specific MPAM table, looking up the cache
> >> level from the PPTT and feeding the end result into the MPAM driver.
> > 
> > Might be worth mentioning that the hook for feeding the parsed factoids
> > into the driver (mpam_ris_create()) is not implemented for now.
> 
> Sure,
> 
> >> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
[...]
> >> +static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
> >> +		     int *irq, u32 processor_container_uid)
> > 
> > Can this have a name, please?
> 
> parse_irq()? (but not all of it - only the bits that are duplicated)
> How about parse_irq_common().
That is inoffensive, but since this is where the interrupt is validated
as usable and an attempt is made to register it -- and this is
basically all the function seems to do right now, would
	acpi_mpam_register_irq()
or similar make sense, here?
> >> +{
> >> +	int sense;
> >> +
> >> +	if (!intid)
> >> +		return false;
> >> +
> >> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> >> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> >> +		return false;
> >> +
> >> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
> 
> > Do we handle cross-endian ACPI tables?
> 
> ACPI depends on UEFI, which is little endian only.
> 
> 
> > ACPI defers to the relevant specification regarding the endianness of
> > externally defined tables, but as of v3.0 (beta) of the MPAM ACPI
> > spec [1], no statement is made about this.
> > 
> > Following the spirit of the ACPI core specs, I suspect that the
> > "correct" answer is that MPAM tables are always little-endian, even if
> > it is not written down anywhere.
> > 
> > If the kernel is big-endian, we lose.
> > 
> > Maybe it is sufficient to make CONFIG_ACPI_MPAM depend on
> > !CONFIG_CPU_BIG_ENDIAN for now.
> > 
> > 
> > I haven't tried to understand how this is handled for other tables.
> 
> There is no way to hand the tables to the OS. I don't think this needs explicit handling,
> it all just falls out in the wash.
(I'm not sure what you mean by not being able to hand tables to the OS.
How did Linux get them?)
But in any case, if the BE case is not handled by the Linux ACPI code
in general then it almost certainly doesn't make sense to handle it
here.
(There are a lot of BE-handling macros in the imported ACPICA headers
in the kernel tree, which made me wonder whether this was a thing.)
> >> +
> >> +	/*
> >> +	 * If the GSI is in the GIC's PPI range, try and create a partitioned
> >> +	 * percpu interrupt.
> > 
> > But actually we don't even try?  Or did I miss something?
> 
> Ah - I ripped that out on the assumption no-one was using it. I'll drop the comment.
> No-one has squealed yet!
>
> It works out of the box on DT systems, and can be described in the MPAM ACPI table - but
> without a platform, there is no need for the irqchip juggling to make it work. Hence the
> pr_err_once().
OK.  I've no strong feeling about that.
> >> +	 */
> >> +	if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
> > 
> > checkpatch.pl says:
> > 
> >  | WARNING: Comparisons should place the constant on the right side of the test
> >  | #108: FILE: drivers/acpi/arm64/mpam.c:45:
> >  | +       if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
> > 
> > (Dubious whether this is "wrong" IMHO, but still probably best avoided
> > since it is not what people are used to seeing.)
> 
> If it were a single comparison I'd agree, but as its two, I find this a lot easier to read
> as 'intid is between 16 and 32' than the alternative.
Sure, just flagging it.
(There is a slight temptation with this idiom to accidentally write
e.g., 16 <= intid < 32, which means something very different.
But although the (correct) idiom appears uncommon in the kernel, there
is some precedent.  Either way, it works.)
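To make the pitfall concrete, here is a small standalone sketch (plain userspace C, not part of the patch; all helper names are invented for illustration). C parses 16 <= intid < 32 as (16 <= intid) < 32, so the 0-or-1 result of the first comparison is what gets compared against 32. The second helper spells that evaluation out in two steps:

```c
#include <assert.h>
#include <stdbool.h>

/* The idiom used in the patch: a real range check. */
static bool in_ppi_range(int intid)
{
	return 16 <= intid && intid < 32;
}

/*
 * What "16 <= intid < 32" actually computes, written out step by step.
 * The first comparison yields 0 or 1, and both are less than 32, so the
 * result is true for every input.
 */
static bool chained_as_parsed(int intid)
{
	int first = (16 <= intid);	/* 0 or 1 */

	return first < 32;		/* always true */
}

void chained_comparison_demo(void)
{
	assert(in_ppi_range(16));
	assert(!in_ppi_range(15));
	assert(!in_ppi_range(32));

	/* The chained form accepts values well outside the range. */
	assert(chained_as_parsed(0));
	assert(chained_as_parsed(40));
}
```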
> >> +		pr_err_once("Partitioned interrupts not supported\n");
> >> +		return false;
> >> +	}
> >> +
> >> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> >> +	if (*irq <= 0) {
> >> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> >> +			    intid);
> > 
> > Are we going to get a lot of duplicate error messages with the same
> > interrupt?  If not, perhaps make this a pr_err() so that all the
> > affected interrupts are notified?
> > 
> > (Either way, hopefully the user will take the hint that they messed
> > something up though.)
> 
> The hardware people figured they could get the OS to distribute the configuration to all
> the nodes in the mesh, so there are platforms with hundreds of MSC. They also like to
> share the interrupts, so if you mess the interrupt description up - you get hundreds of
> messages.
> To avoid spamming the log for what is likely to be a frequently repeated mistake, these
> are all 'once'.
OK, fair enough.
1 error message is better than none, either way.
> 
> >> +		return false;
> >> +	}
> >> +
> >> +	return true;
> >> +}
> >> +
> >> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> >> +				 struct acpi_mpam_msc_node *tbl_msc,
> >> +				 struct resource *res, int *res_idx)
> >> +{
> >> +	u32 flags, aff;
> >> +	int irq;
> 
> > We may still get in here if MPAMF_IDR.HAS_ERR_MSI and/or
> > MPAMF_MSMON_IDR.HAS_OFLW_MSI is set.
> 
> (this code can't know that)
> 
> 
> > If so, there is no wired
> > interrupt.  Does it matter if we still parse and allocate the wired
> > interrupts here?
> 
> I don't think we do - frob_irq() will return false if the 'wired' flag is not set in the
> table, so the 'res[(*res_idx)++]' step will be skipped, and the callers array is left
> unmodified. The struct property_entry array is null-terminated, and gets copied anyway.
> 
> 
> >> +	flags = tbl_msc->overflow_interrupt_flags;
> >> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> >> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> >> +		aff = tbl_msc->overflow_interrupt_affinity;
> >> +	else
> >> +		aff = ~0;
> > 
> > (u32)~0 is used as an exceptional UID all over the place.  If this is
> > not a pre-existing convention, it could be worth having a #define for
> > this.  (grep '~0' drivers/acpi/ suggests that this is new.)
> 
> It's not normal to describe the affinity of an interrupt in tables like this...
> The MPAM ACPI spec allows you to describe the PPI affinity because the MSC can be local to
> a CPU, (e.g. the L2 cache), and the MPAM architecture spec says it can be a PPI.
> We have support for this on DT systems, so it wasn't possible to argue "no one would ever
> do that"!
I guess this is OK.
(My misgivings about ~0 are partly due to the way C evaluates expressions.
The conversion to the destination type occurs only after the ~ is evaluated,
so if the affected type is changed to a 64-bit type during maintenance
(or by copy-pasting into another context), then it's easy to end up
with wrong values in the high bits.  The definition
	u64 x = ~0;
does indeed set x to all ones, but I find the reason _why_ this works
counterintuitive, and superficially similar expressions can go wrong.
Having a #define only requires this to be got right in one place.)
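A standalone illustration of that point, assuming the usual 32-bit int (userspace C with stdint types standing in for u32/u64; the macro name MPAM_UID_NONE is hypothetical, nothing in the series defines it). ~0 has type int and value -1, which sign-extends to all ones, while the superficially similar ~0U is an unsigned int and zero-extends:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for the "no UID" value; gets the type right once. */
#define MPAM_UID_NONE	((uint32_t)~0U)

void all_ones_demo(void)
{
	uint64_t from_int = ~0;		/* int -1, sign-extended: all ones */
	uint64_t from_uint = ~0U;	/* 0xffffffff, zero-extended */
	uint32_t uid = MPAM_UID_NONE;

	assert(from_int == UINT64_MAX);
	assert(from_uint == 0xffffffffULL);	/* high 32 bits are clear */
	assert(uid == UINT32_MAX);
}
```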
> >> +	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> >> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> > 
> > I couldn't find a statement in the spec of how the table can specify
> > that there is no interrupt.
> 
> With zero in the GSI field, from the spec:
> | If this MSC does not support overflow interrupts or monitors, this field must be set
> | to 0.
> 
> Which is detected in frob_irq() as:
> |	if (!intid)
> |		return false;
> 
> 
> > Are the interrupts always required for ACPI-based MPAM systems?
> 
> In practice they're optional.
> 
> 
> > overflow_interrupt and error_interrupt are GSIVs, which seems to be an
> > ACPI thing.
> 
> It's a flat space of interrupt IDs - no need to shift the values around for the
> SGI/PPI/SPI range. (aka IPI, percpu interrupt or plain old wired interrupt)
> 
> 
> > The examples in the ACPI spec suggest that 0 can be a valid value.
> > No exceptional value seems to be defined.
> > 
> > The flags fields have some invalid encodings, but no explicit "no
> > interrupt" encoding that I can see.
> 
> I think you missed it hidden as 'and another thing' in the description of the field.
I was being diplomatic.  I meant: "explicitly uses this as an example
of a valid value."
> Pretty sure its not - it would be a secure SGI. The MPAM spec uses it as 'invalid'.
I guess that's an Arm-ism, then.  So long as this is standard for
ACPI on Arm systems, then I guess there is no problem -- and anyway,
the ACPI MPAM spec would take precedence for interpreting this field.
Now I look more closely, you're right: the ACPI MPAM spec says, e.g.:
"If the MSC supports MSI, as indicated by the
MPAMF_MSMON_IDR.HAS_OFLW_MSI bit, then this field must be set to 0 and
ignored by the OS.  If this MSC does not support overflow interrupts or
monitors, this field must be set to 0".
This does not say that the value 0 means there is no wired interrupt,
but that seems to be the intention.
It could be better worded, but the intention does seem to be that 0
means "no (wired) interrupt", here...
> >> +
> >> +	flags = tbl_msc->error_interrupt_flags;
> >> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> >> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> >> +		aff = tbl_msc->error_interrupt_affinity;
> >> +	else
> >> +		aff = ~0;
> >> +	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> >> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
intid 0 gets ignored by frob_irq(), so I guess we are OK in the MSI
case (the spec says we must ignore it, but also says that it must be
zero).
It would be onerous to have to map the MSC and examine its ID regs
here, so assuming that the intid really is 0 in the MSI case seems
reasonable.
> >> +}
> >> +
> >> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> >> +				    struct acpi_mpam_resource_node *res)
> >> +{
> >> +	int level, nid;
> >> +	u32 cache_id;
> >> +
> >> +	switch (res->locator_type) {
> >> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> >> +		cache_id = res->locator.cache_locator.cache_reference;
> >> +		level = find_acpi_cache_level_from_id(cache_id);
> >> +		if (level <= 0) {
> >> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> >> +			return -EINVAL;
> >> +		}
> >> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> >> +				       level, cache_id);
> >> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> >> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> >> +		if (nid == NUMA_NO_NODE)
> >> +			nid = 0;
> >> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> >> +				       255, nid);
> >> +	default:
> >> +		/* These get discovered later and treated as unknown */
> >> +		return 0;
> >> +	}
> >> +}
> >> +
> >> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> >> +			      struct acpi_mpam_msc_node *tbl_msc)
> >> +{
> >> +	int i, err;
> >> +	struct acpi_mpam_resource_node *resources;
> >> +
> >> +	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
> 
> > Should we check that we don't go out of the bounds of the MSC node
> > (or, at the very least, of the MPAM table)?
> > 
> > If tbl_msc->length was already validated, that can be used for the
> > bounds check.
> 
> I'm not a fan of trying to validate the ACPI tables - it's extra work for something that
> just has to be right. e.g. we can't check the base-address, we just have to trust it's correct.
> 
> I didn't want to pass the table in here, and grabbing another reference means I'd have to
> work out if its the same mapping...
> 
> But as tbl_msc->length is to hand...
That gets my vote, except that it would be trivial to validate length
in acpi_mpam_parse().  Checking against length without first
validating length would be completely pointless.
I agree that there are things that we can't validate and things that it
is redundant or pointless to check.  But it feels a bit lazy not to
care whether the parser wanders out of the table while trying to parse
it.
If the rest of the ACPI code takes a similarly relaxed attitude (I've
not checked) then I guess it is pointless to be more careful here.
But otherwise, I would prefer to see all the bounds checks done.
This is not hard, and there are not many things to check.  (In my
experience, it requires far fewer brain cycles to just make these things
correct by construction than to try to figure out the situations in
which the corner-cutting might or might not matter.)
> >> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> >> +		err = acpi_mpam_parse_resource(msc, &resources[i]);
> > 
> > Isn't the length of each resource node variable?  According to [2],
> > the length depends on the num_functional_deps field.  It looks like the
> > functional dependency descriptors (if any) are appended contiguously to
> > the resource node, unless I've misunderstood something.
> 
> No - it probably is that broken. They forgot to include a length field, painting themselves
> into a corner if they ever want to modify this!
The spec doesn't look particularly broken -- i.e., it can be compatibly
extended, albeit in a more awkward way than would be ideal.  The main
impact is that the list of resources has to be parsed sequentially, but
that's all code is ever likely to do with the table anyway.
> I was ignoring the 'functional dependencies' until someone comes up with a platform that
> needs it. But yes, this assumes that field is zero.
> 
> I'm not entirely clear what the MPAM driver is supposed to do with functional
> dependencies: 'make up a configuration' ... but when the controls formats are different,
> its going to get exciting.
> 
> I've fixed that up:
> | int acpi_mpam_parse_resources(struct mpam_msc *msc,
> | 			      struct acpi_mpam_msc_node *tbl_msc)
> | {
> | 	int i, err;
> |	char *ptr, *last_byte;
> | 	struct acpi_mpam_resource_node *resource;
> |
> | 	ptr = (char *)(tbl_msc + 1);
> | 	last_byte = ptr + tbl_msc->length;
Nit: misnomer.  This is not the address of the last byte.
(I tend to call similar pointers something like "limit".)
> | 	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> | 		if (ptr + sizeof(*resource) > last_byte)
> | 			return -EINVAL;
> |
> | 		resource = (struct acpi_mpam_resource_node *)ptr;
> | 		err = acpi_mpam_parse_resource(msc, resource);
> | 		if (err)
> | 			return err;
> |
> | 		ptr += sizeof(*resource);
> | 		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
This mostly looks sane.
But unless ptr is guaranteed to be less than 0xffffffff00000002 (?),
then the bounds check at the start of the loop may bogusly pass.
Would a check like this before advancing ptr across the functional deps
be reasonable? (untested)
	if (resource->num_functional_deps >
	    (last_byte - ptr) / sizeof(struct acpi_mpam_func_deps))
		return -EINVAL;
(Why the table allows more functional dependencies to be declared than
could possibly fit in an MSC node is a mystery.)
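A userspace sketch of the resource walk with that style of check, for illustration only. The structs res_node and func_dep and the function walk_resources are invented stand-ins, not the ACPICA types. The point is that the dependency count is compared against the space left divided by the descriptor size, so no multiplication or pointer addition can overflow before the check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Invented stand-ins for the resource node and dependency descriptor. */
struct func_dep {
	uint32_t producer;
	uint32_t reserved;
};

struct res_node {
	uint32_t identifier;
	uint32_t num_functional_deps;
};

/* Returns the number of nodes walked, or -1 if a node overruns limit. */
static int walk_resources(const unsigned char *ptr,
			  const unsigned char *limit,
			  unsigned int num_nodes)
{
	struct res_node node;
	unsigned int i;

	assert(ptr <= limit);

	for (i = 0; i < num_nodes; i++) {
		if ((size_t)(limit - ptr) < sizeof(node))
			return -1;

		memcpy(&node, ptr, sizeof(node));
		ptr += sizeof(node);

		/* Divide the space left rather than multiplying the count. */
		if (node.num_functional_deps >
		    (size_t)(limit - ptr) / sizeof(struct func_dep))
			return -1;

		ptr += (size_t)node.num_functional_deps *
		       sizeof(struct func_dep);
	}

	return (int)i;
}
```

With this ordering ptr never moves past limit, so the size check at the top of the loop cannot be fooled by wraparound.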
[...]
> >> +		if (err)
> >> +			return err;
> >> +	}
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> >> +				     struct platform_device *pdev,
> >> +				     u32 *acpi_id)
> >> +{
> >> +	bool acpi_id_valid = false;
> >> +	struct acpi_device *buddy;
> >> +	char hid[16], uid[16];
> >> +	int err;
> >> +
> >> +	memset(&hid, 0, sizeof(hid));
> >> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> >> +	       sizeof(tbl_msc->hardware_id_linked_device));
> > 
> > This is safe by semi-accident, since 16 > 8.
> > 
> > It might be cleaner to declare
> > 
> > 	char hid[sizeof(tbl_msc->hardware_id_linked_device)];
> > 
> > which can never be wrong.
> 
> But can be too precise!
But only when the correct length was unknown.  So you are saying that
this can only be correct when it is not known to be correct? ;)
Sometimes an oversized array has useful defensive value against future
mis-maintenance, but since the ACPI fields are all fixed-size, I don't
see any benefit here.
> > memset()+memcpy() might be better replaced with strscpy() (or just use
> > snprintf again, since this avoids having to think about multiple
> > different ways of avoiding buffer overflows at the same time.  This is
> > not a fast path.)
> 
> snprintf() demands space for a NUL byte - but these ACPI table fields are fixed width,
> so don't have them.
> 
> The array sizes picked are the next larger power of 2, and the memset is to ensure there is a
> NUL byte after the fixed size region is copied in.
> 
> Where snprintf() is being used - its to convert from integer to string. For the hid, what
> we really want is a fixed size memcpy() ... its possible to cook up a format string that
> will do that, but its going to be an eye sore.
Well, it's a matter of style.
> 
> I can change this to;
> | char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
Can do (and this would probably have shut me up), but:
> 
> But I don't think this is better... How about a comment?!
> |	/*
> |	 * tbl_msc->hardware_id_linked_device is an 8 byte fixed width string.
> |	 * hid[] is the next larger power of 2, and is zero'd to give us a
> |	 * null terminated string for acpi_dev_get_first_match_dev().
> |	 */
this doesn't explain why the size needs to be a power of two, or what
the wasted bytes are for.
(Either way, this is obviously no big deal...)
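For comparison, a minimal sketch of the exactly-sized variant: the buffer is one byte larger than the fixed-width field and the terminator is written explicitly, so no padding bytes need explaining. The struct msc_stub, the macro HID_FIELD_LEN and the helper hid_to_string are invented for illustration, and "ACPI0010" is only used as an example value:

```c
#include <assert.h>
#include <string.h>

#define HID_FIELD_LEN	8	/* fixed width of the table field */

/* Invented stand-in holding just the fixed-width, unterminated field. */
struct msc_stub {
	char hardware_id_linked_device[HID_FIELD_LEN];
};

/* Copy the fixed-width field and terminate it in the extra byte. */
static void hid_to_string(const struct msc_stub *msc,
			  char hid[HID_FIELD_LEN + 1])
{
	memcpy(hid, msc->hardware_id_linked_device, HID_FIELD_LEN);
	hid[HID_FIELD_LEN] = '\0';
}

void hid_demo(void)
{
	struct msc_stub msc;
	char hid[sizeof(msc.hardware_id_linked_device) + 1];

	/* Fill all eight bytes; the field itself has no terminator. */
	memcpy(msc.hardware_id_linked_device, "ACPI0010", HID_FIELD_LEN);

	hid_to_string(&msc, hid);
	assert(strcmp(hid, "ACPI0010") == 0);
}
```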
> 
> 
> This is all because the MPAM stuff was forced into a static table, even though it needs to
> refer to stuff in the namespace.
> 
> 
> >> +
> >> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> >> +		*acpi_id = tbl_msc->instance_id_linked_device;
> >> +		acpi_id_valid = true;
> >> +	}
> >> +
> >> +	err = snprintf(uid, sizeof(uid), "%u",
> > 
> > char uid[11]; would be sufficient, here.  The instance ID is strictly
> > 32-bit.  Adding a safety margin is worthless here, since snprintf()
> > checks the bounds -- either the size is sufficient for all possible u32
> > values, or it isn't.
> 
> I just plumped for the next largest power of 2.
This is harmless, but since the only way to know that this is
sufficient is to know what the correct value would have been, it seems
pointless (as above) to lie to the compiler about how much space can
be used.
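The arithmetic behind the 11, as a quick userspace check (the macro U32_DEC_BUF_LEN and the helper format_uid are made-up names): the largest u32 is 4294967295, which is ten digits, plus one byte for the terminator. One byte less and snprintf() reports truncation by returning a length that does not fit:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define U32_DEC_BUF_LEN	11	/* 10 decimal digits + terminator */

/* Returns the string length, or -1 on error or truncation. */
static int format_uid(char *buf, size_t size, uint32_t id)
{
	int n = snprintf(buf, size, "%" PRIu32, id);

	if (n < 0 || (size_t)n >= size)
		return -1;

	return n;
}

void uid_demo(void)
{
	char uid[U32_DEC_BUF_LEN];
	char too_small[U32_DEC_BUF_LEN - 1];

	assert(format_uid(uid, sizeof(uid), UINT32_MAX) == 10);
	assert(strcmp(uid, "4294967295") == 0);

	/* Ten bytes cannot hold ten digits and the terminator. */
	assert(format_uid(too_small, sizeof(too_small), UINT32_MAX) == -1);
}
```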
> 
> 
> >> +		       tbl_msc->instance_id_linked_device);
> > 
> > Can snprintf() return < 0 on error?
> 
> Documented as returning a positive number.
Not in C23, and never standard for the printf() family in general.  But
it does appear that this is true for the kernel's snprintf(), and this
assumption is widely relied upon in the kernel.  So, a check here would
be pointless complexity after all.
(The only documented error condition in C23 seems to be the case of a
multibyte encoding error, which is not relevant here although it does
not rule out other error conditions -- the wording is
characteristically ambiguous.)
[...]
> >> +	if (err >= sizeof(uid))
> >> +		return acpi_id_valid;
> > 
> > Possibly return true on error?  Why?
> 
> (this is an error case that will never happen - but error handling is important!)
> 
> The acpi_id was parsed out of the table and returned to the caller. Hence true is
> returned. This gets used to find the CPU-affinity, which is how the driver avoids
> accessing caches that might not be on. There are sanity checks around this value. It can't
> be present for some caches and not others.
> 
> Due to this error, the link to the buddy device was not added - which is silently ignored.
> If it matters, it will show up as an access to a device that didn't get turned on by the
> power-management code, because of that missing link.
> 
> Because I don't think this can happen, I didn't try to handle it explicitly - just carry
> on like nothing is wrong.
What I meant was, if the MPAM table contains garbage, why not just flag
this and give up?  That feels semantically simpler than trying to limp on,
with possibly undiagnosed invalid results.
[...]
> >> +static int __init acpi_mpam_parse(void)
> >> +{
> >> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> > 
> > checkpatch.pl says:
> > 
> >  | ERROR: code indent should use tabs where possible
> >  | #240: FILE: drivers/acpi/arm64/mpam.c:177:
> >  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> >  | 
> >  | WARNING: please, no spaces at the start of a line
> >  | #240: FILE: drivers/acpi/arm64/mpam.c:177:
> >  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> 
> Pff, I copied that from Jonathan's email because of its rune like nature.
[...]
> >> +	while (table_offset < table_end) {
> >> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> >> +		table_offset += tbl_msc->length;
> >> +
> >> +		/*
> >> +		 * If any of the reserved fields are set, make no attempt to
> >> +		 * parse the msc structure. This will prevent the driver from
> >> +		 * probing all the MSC, meaning it can't discover the system
> >> +		 * wide supported partid and pmg ranges. This avoids whatever
> >> +		 * this MSC is truncating the partids and creating a screaming
> > 
> > Mangled sentence?
> 
> just hard to parse:
> "whatever this MSC is" "truncating the partids..."
> 
> Fixed as
> | This avoids this MSC truncating the partids and creating a screaming error interrupt.
> 
> (although its not strictly the MSC that does that)
The critical thing here seems to be that we prevent the driver from
ever being enabled.  The comment doesn't seem to say that we are
actually disabling the MPAM driver by giving up here (it only explains
why).
For the benefit of people who are less familiar with the code, would
something like this be better: [*]
--8<--
If the reserved fields are set then the meaning of the rest of the
entry is unknown, so leave the MSC marked as unprobed and give up.
This means that the MPAM driver will never be enabled.  There is no way
to enable it safely, because we cannot determine safe system-wide
partid and pmg ranges in this situation.
-->8--
> 
> 
> > I have not so far found any reference in [2] to the reset value of the
> > MPAMF_ECR.INTEN bit.  Do we rely on the error interrupt(s) for all MSCs
> > to be disabled at the interrupt controller?  If the same interrupt may
> > be shared by multiple MSCs, that's bad.
> 
> They are anticipated as shared because the ESR register lets us know if this MSC triggered
> the error.
>
> The interrupt won't be delivered until the driver requests it - it'll just stay pending.
> 
> Yes - if some firmware component used an out-of-range PARTID then linux will believe that
> this was caused by linux and disable MPAM. Not much we can do about that.
> 
> (I've even seen a platform where this happens!)
Ah, right -- I'd missed the ESR.
This is inherently racy even without interrupt sharing, so I guess
sharing the interrupt doesn't make things worse (except for having to
poll multiple ESRs on interrupt -- but if that hardware vendor made us
do that, it's on them.)
> >> +		 * error interrupt.
> >> +		 */
> >> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
> >> +			continue;
> 
> > The specs are not clear about how backwards compatibility is supposed
> > to work.
> 
> How about that!
> 
> The spec people thought this would allow them to add things that aren't MSC in, re-using
> one of these reserved fields as a type.
> But - the architecture spec mandates the OS run around the MSC and collect the minimum
> PARTID/PMG values (so people don't have to build an SoC where everything fits together
> nicely). Because of that, the OS needs to know how many MSC there are, so that it knows it
> has probed them all.
See my suggestion at [*], above.
The reader doesn't need to understand about future architecture
directions, here.  The only thing that matters is that the kernel does
not make any assumptions about the meaning of the MSC entry if these
fields are set.  I'm not sure that we need to justify this position --
simply stating it is probably enough.
> > I would feel a bit uneasy about silently throwing away MSCs based on
> > criteria that may not indicate incompatibility, and without even a
> > diagnostic.
> 
> Yes the MSC gets thrown away - but it was still counted by acpi_mpam_count_msc(). The MPAM
> driver will never probe all the MSC, so it will never call mpam_enable().
> 
> (you used to be able to see which cpuhp callback was registered, to know it was stuck in
> discovery, but now a helper does that which always uses the same name. I may fix that)
> 
> 
> I think its fine for MPAM to silently fail to probe like this. The system still works, all
> the known hardware gets probed and none of the unknown hardware is touched. (so it can't
> blow us up). You don't get resctrl - but we can't know if the unknown hardware was going
> to be important to supporting MPAM. e.g. it lowers the minimum usable PARTID/PMG.
> 
> It's between a rock and a hard place - I think this  behaviour is the best we can do.
> 
> 
> Regarding it being silent - whatever happens it needs a kernel upgrade to learn about the
> newly described hardware. I don't think its worth printing anything out in this case.
I would still vote for a basic
	"Unrecognised MSC, MPAM not usable"
message or similar.  This costs us nothing, and at least gives a clue
that missing kernel support or a corrupt ACPI table are likely causes
for resctrl not showing up.
It partly depends on how likely we think this scenario is -- or how
likely it is that people will insist on using an old kernel with a
botched stack of backports on new hardware.  I don't have a strong
sense on this.
> >> +
> >> +		if (!tbl_msc->mmio_size)
> >> +			continue;
> >> +
> >> +		if (decode_interface_type(tbl_msc, &iface))
> >> +			continue;
> > 
> > Ditto regarding diagnostics.
> 
> A third interface type will result in those MSC being untouched, and MPAM unavailable.
> I don't think printing "maybe upgrade your kernel?" is going to help anyone.
> I'll chuck some pr_debug() in here so someone could find out which of these it is.
> 
pr_debug() for sub-diagnosing the error may be helpful, but these feel
like an addition to the basic "MPAM unusable" error rather than as a
replacement for it...
> >> +
> >> +		next_res = 0;
> >> +		next_prop = 0;
> >> +		memset(res, 0, sizeof(res));
> >> +		memset(props, 0, sizeof(props));
> >> +
> >> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
> > 
> > If the tbl_msc->identifier values contain duplicates, we will get a
> > platform device with a duplicate name here.  I don't know whether it
> > matters.
> 
> The identifiers must be unique. I don't think this is the sort of table validation we need
> to do for the firmware vendor.
My question was really: "what goes wrong?"  I suppose that if
platform_device_alloc() doesn't like this then
> 
> 
> >> +		if (!pdev) {
> >> +			err = -ENOMEM;
> >> +			break;
> >> +		}
we tell the caller we ran out of memory.  While not exactly helpful,
this is better than blowing up.  If this is the only consequence of
duplicate IDs here (and this anyway Should Not Happen), then I guess
this is adequate.
If platform_device_alloc() doesn't care about duplicate names, then I
guess we can just trust it to have done something sensible.
> >> +
> >> +		if (tbl_msc->length < sizeof(*tbl_msc)) {
> >> +			err = -EINVAL;
> >> +			break;
> >> +		}
> 
> > No check for oversized tbl_msc->length?
> 
> You want it to parse the whole thing to check if firmware got the length field right?
> I don't think we can know that. If its wrong, firmware may just have easily got the number
> of RIS, or some other field wrong.
We can't know that the table is "wrong" (since it is authoritative),
but we can know whether it so internally inconsistent that it is not
parseable.
> Historically x86 used to rummage around in memory to see if it could find something that
> looked like ACPI tables. I'd imagine it was important to parse those carefully, as they
> may never have been ACPI tables. We don't have this problem on arm64 - the ACPI tables were
> handed over from UEFI. If the value is wrong - the firmware vendor meant it to be wrong.
See above for my rationale.
This is more about maintainability and cleanliness than trying to
defend ourselves against broken firmware.
Note: if the ACPI code in general doesn't bother to do this kind of
check then we can probably follow the established culture.  I'm not so
familiar with this codebase.
[...]
> >> +
> >> +		/* Some power management is described in the namespace: */
> >> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
> >> +		if (err > 0 && err < sizeof(uid)) {
> >> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
> > 
> > Diagnostic?
> > 
Ping
[...]
> >> +int acpi_mpam_count_msc(void)
> >> +{
> >> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> > 
> > checkpatch.pl says:
> > 
> >  | ERROR: code indent should use tabs where possible
> >  | #359: FILE: drivers/acpi/arm64/mpam.c:296:
> >  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> >  | 
> >  | WARNING: please, no spaces at the start of a line
> >  | #359: FILE: drivers/acpi/arm64/mpam.c:296:
> >  | +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);$
> > 
> 
> Yup, runes copied from Jonathan's email.
Ack
> >> +	char *table_end, *table_offset = (char *)(table + 1);
> >> +	struct acpi_mpam_msc_node *tbl_msc;
> >> +	int count = 0;
> >> +
> >> +	if (IS_ERR(table))
> >> +		return 0;
> >> +
> >> +	if (table->revision < 1)
> >> +		return 0;
> >> +
> >> +	tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> > 
> > Can this be moved into the loop?  It looks like it just duplicates the
> > update to tbl_msc at the end of the loop; the loop termination
> > condition does not depend on this variable.
> 
> Yup - looks like this used to be a do {} while () ; loop.
OK
> >> +	table_end = (char *)table + table->length;
> >> +
> >> +	while (table_offset < table_end) {
> >> +		if (!tbl_msc->mmio_size)
> >> +			continue;
> > 
> > This is 0 for non-usable PCC-based MSCs, right?
> > 
> > (Why explicitly unusable MSCs are listed in the table at all is a
> > mystery to me, but that's what the spec says.  I guess there must be a
> > reason.)
(I guess PCC was added after the size field was named.  I suppose we
don't really need a comment for that -- it's fairly clear why this bit
of code is here.  We could rename the field in our struct, but this
would probably just move confusion around rather than solving it.)
> Some firmware stacks have pre-built ACPI tables, then go knock out the enabled bits based
> on what the platform actually has. This is easier than rebuilding the table...
Right, that makes sense.
> >> +
> >> +		if (tbl_msc->length < sizeof(*tbl_msc))
> >> +			return -EINVAL;
> > 
> > Should we also have something like (not tested):
> > 
> > if (tbl_msc->length > table_end - table_offset)
> > 	return -EINVAL;
> 
> Sure,
> 
> 
> > Also, is it an error if a length is not a multiple of four bytes?
> 
> Hmmm, because all the subtables in the spec are multiples of four bytes?
> I think this falls in the bucket of 'table validation the OS shouldn't have to do'.
> If the ACPI tables are that wrong, we're going to have bigger problems.
So long as we don't fault on unaligned access in the kernel (which I'm
pretty sure we don't do), then nothing goes wrong in the unaligned case
-- so, fine.  Otherwise we'll get a clean Oops, which would at least be
a decent place to start debugging.
> > (I'm guessing that the core ACPI code doesn't try to understand the
> > contents of the MPAM table and so doesn't check this.)
> 
> It doesn't.
Ack.
> >> +
> >> +		count++;
> >> +
> >> +		table_offset += tbl_msc->length;
> >> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
> >> +	}
> >> +
> >> +	return count;
> >> +}
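For reference, the two length checks discussed above can be sketched in
a self-contained userspace form. This is only an illustration, not the
patch: struct msc_hdr is a hypothetical stand-in for struct
acpi_mpam_msc_node with a made-up layout, and count_msc() is a
hypothetical name. The sketch also advances the offset before any
"continue", so a skipped entry cannot be revisited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real MSC node header; layout is made up. */
struct msc_hdr {
	uint16_t length;	/* total length of this entry in bytes */
	uint16_t flags;
	uint32_t mmio_size;	/* zero marks an unusable entry */
};

/*
 * Count usable entries in [start, end). Returns -1 if an entry's length
 * field is too small to hold the header or runs past the end of the table.
 */
static int count_msc(const unsigned char *start, const unsigned char *end)
{
	const unsigned char *p = start;
	int count = 0;

	assert(start <= end);

	while (p < end) {
		struct msc_hdr hdr;
		size_t remaining = (size_t)(end - p);

		if (remaining < sizeof(hdr))
			return -1;
		memcpy(&hdr, p, sizeof(hdr));

		/* Undersized and oversized lengths are both unparseable. */
		if (hdr.length < sizeof(hdr) || hdr.length > remaining)
			return -1;

		/* Advance first, so skipping an entry still makes progress. */
		p += hdr.length;

		if (!hdr.mmio_size)
			continue;

		count++;
	}

	return count;
}
```

Comparing the length against the remaining byte count, rather than
adding it to the pointer and comparing pointers, keeps the check free of
pointer overflow.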
> 
> >> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> >> new file mode 100644
> >> index 000000000000..0edefa6ba019
> >> --- /dev/null
> >> +++ b/include/linux/arm_mpam.h
> >> @@ -0,0 +1,46 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/* Copyright (C) 2025 Arm Ltd. */
> > 
> > checkpatch.pl says:
> > 
> >  | WARNING: Improper SPDX comment style for 'include/linux/arm_mpam.h', please use '/*' instead
> >  | #414: FILE: include/linux/arm_mpam.h:1:
> >  | +// SPDX-License-Identifier: GPL-2.0
> >  | 
> >  | WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
> >  | #414: FILE: include/linux/arm_mpam.h:1:
> >  | +// SPDX-License-Identifier: GPL-2.0
> > 
> > (That's probably the same error twice.)
> > 
> > (I never understood why the SPDX folks couldn't have allowed either
> > type of comment -- or at least, the same style in .c and .h files.
> > But I'm sure they had a reason that they believed was good.)
> 
> Fixed,
> 
> Thanks,
> 
> James
> 
OK
Cheers
---Dave
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-09-05 16:38       ` Dave Martin
@ 2025-09-10 19:19         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:19 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 05/09/2025 17:38, Dave Martin wrote:
> On Thu, Sep 04, 2025 at 06:28:17PM +0100, James Morse wrote:
>> On 27/08/2025 17:05, Dave Martin wrote:
>>> On Fri, Aug 22, 2025 at 03:29:49PM +0000, James Morse wrote:
>>>> Add code to parse the arm64 specific MPAM table, looking up the cache
>>>> level from the PPTT and feeding the end result into the MPAM driver.
>>>
>>> Might be worth mentioning that the hook for feeding the parsed factoids
>>> into the driver (mpam_ris_create()) is not implemented for now.
>>
>> Sure,
>>
>>>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> 
> [...]
> 
>>>> +static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
>>>> +		     int *irq, u32 processor_container_uid)
>>>
>>> Can this have a name, please?
>>
>> parse_irq()? (but not all of it - only the bits that are duplicated)
>> How about parse_irq_common().
> 
> That is inoffensive, but since this is where the interrupt is validated
> as usable and an attempt is made to register it -- and this is
> basically all the function seems to do right now, would
> 
> 	acpi_mpam_register_irq()
> 
> or similar make sense, here?
Sure,
>>>> +{
>>>> +	int sense;
>>>> +
>>>> +	if (!intid)
>>>> +		return false;
>>>> +
>>>> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
>>>> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
>>>> +		return false;
>>>> +
>>>> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
>>
>>> Do we handle cross-endian ACPI tables?
>>
>> ACPI depends on UEFI, which is little endian only.
>>
>>
>>> ACPI defers to the relevant specification regarding the endianness of
>>> externally defined tables, but as of v3.0 (beta) of the MPAM ACPI
>>> spec [1], no statement is made about this.
>>>
>>> Following the spirit of the ACPI core specs, I suspect that the
>>> "correct" answer is that MPAM tables are always little-endian, even if
>>> it is not written down anywhere.
>>>
>>> If the kernel is big-endian, we lose.
>>>
>>> Maybe it is sufficient to make CONFIG_ACPI_MPAM depend on
>>> !CONFIG_CPU_BIG_ENDIAN for now.
>>>
>>>
>>> I haven't tried to understand how this is handled for other tables.
>>
>> There is no way to hand the tables to the OS. I don't think this needs explicit handling,
>> it all just falls out in the wash.
> 
> (I'm not sure what you mean by not being able to hand tables to the OS.
> How did Linux get them?)
Your hypothetical cross-endian machine would have to be a big-endian kernel running with
little-endian ACPI tables. UEFI is little-endian, UEFI is the only mechanism for handing
ACPI tables to the kernel ... there is no mechanism for a big-endian kernel to get hold of
ACPI tables.
> But in any case, if the BE case is not handled by the Linux ACPI code
> in general then it almost certainly doesn't make sense to handle it
> here.
> 
> (There are a lot of BE-handling macros in the imported ACPICA headers
> in the kernel tree, which made me wonder whether this was a thing.)
Nothing defines ACPI_BIG_ENDIAN ... I don't know why acpica supports it, but linux doesn't
use it.
[...]
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	return true;
>>>> +}
>>>> +
>>>> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
>>>> +				 struct acpi_mpam_msc_node *tbl_msc,
>>>> +				 struct resource *res, int *res_idx)
>>>> +{
>>>> +	u32 flags, aff;
>>>> +	int irq;
>>
>>> We may still get in here if MPAMF_IDR.HAS_ERR_MSI and/or
>>> MPAMF_MSMON_IDR.HAS_OFLW_MSI is set.
>>
>> (this code can't know that)
>>
>>
>>> If so, there is no wired
>>> interrupt.  Does it matter if we still parse and allocate the wired
>>> interrupts here?
>>
>> I don't think we do - frob_irq() will return false if the 'wired' flag is not set in the
>> table, so the 'res[(*res_idx)++]' step will be skipped, and the caller's array is left
>> unmodified. The struct property_entry array is null-terminated, and gets copied anyway.
>>
>>
>>>> +	flags = tbl_msc->overflow_interrupt_flags;
>>>> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
>>>> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
>>>> +		aff = tbl_msc->overflow_interrupt_affinity;
>>>> +	else
>>>> +		aff = ~0;
>>>
>>> (u32)~0 is used as an exceptional UID all over the place.  If this is
>>> not a pre-existing convention, it could be worth having a #define for
>>> this.  (grep '~0' drivers/acpi/ suggests that this is new.)
>>
>> It's not normal to describe the affinity of an interrupt in tables like this...
>> The MPAM ACPI spec allows you to describe the PPI affinity because the MSC can be local to
>> a CPU, (e.g. the L2 cache), and the MPAM architecture spec says it can be a PPI.
>> We have support for this on DT systems, so it wasn't possible to argue "no one would ever
>> do that"!
> 
> I guess this is OK.
> 
> (My misgivings about ~0 are partly due to the way C evaluates expressions.
> The conversion to the destination type occurs only after the ~ is evaluated,
> so if the affected type is changed to a 64-bit type during maintenance
> (or by copy-pasting into another context), then it's easy to end up
> with wrong values in the high bits.  The definition
> 
> 	u64 x = ~0;
> 
> does indeed set x to all ones, but I find the reason _why_ this works
> counterintuitive, and superficially similar expressions can go wrong.
> Having a #define only requires this to be got right in one place.)
Thanks for the background.
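To make the widening behaviour concrete, here is a minimal userspace
sketch (not from the patch). It assumes a 32-bit int, uses <stdint.h>
types in place of the kernel's u32/u64, and MPAM_NO_AFFINITY_UID is a
hypothetical name for the kind of #define suggested above.

```c
#include <assert.h>
#include <stdint.h>

/* The behaviour described below only holds where int is 32 bits wide. */
static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

/* Hypothetical name; the patch itself uses a bare ~0 for "no affinity". */
#define MPAM_NO_AFFINITY_UID	((uint32_t)~0U)

/*
 * ~0 has type int and value -1. Converting -1 to a 64-bit unsigned
 * type sign-extends, so every bit ends up set.
 */
static uint64_t widen_signed_all_ones(void)
{
	return ~0;
}

/*
 * ~0U has type unsigned int and value 0xffffffff. Converting it to a
 * 64-bit unsigned type zero-extends, so the high 32 bits stay clear.
 */
static uint64_t widen_unsigned_all_ones(void)
{
	return ~0U;
}
```

The two helpers differ by a single character but return different
values, which is the argument for spelling the constant once, with an
explicit type, rather than repeating the expression at each use.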
>>>> +	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
>>>> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
>>>
>>> I couldn't find a statement in the spec of how the table can specify
>>> that there is no interrupt.
>>
>> With zero in the GSI field, from the spec:
>> | If this MSC does not support overflow interrupts or monitors, this field must be set
>> | to 0.
>>
>> Which is detected in frob_irq() as:
>> |	if (!intid)
>> |		return false;
>>
>>
>>> Are the interrupts always required for ACPI-based MPAM systems?
>>
>> In practice they're optional.
>>
>>
>>> overflow_interrupt and error_interrupt are GSIVs, which seems to be an
>>> ACPI thing.
>>
>> It's a flat space of interrupt IDs - no need to shift the values around for the
>> SGI/PPI/SPI range. (aka IPI, percpu interrupt or plain old wired interrupt)
>>
>>
>>> The examples in the ACPI spec suggest that 0 can be a valid value.
>>> No exceptional value seems to be defined.
>>>
>>> The flags fields have some invalid encodings, but no explicit "no
>>> interrupt" encoding that I can see.
>>
>> I think you missed it hidden as 'and another thing' in the description of the field.
> 
> I was being diplomatic.  I meant: "explicitly uses this as an example
> of a valid value."
I can ask the spec people to clarify things - but I'm not entirely sure what the request
is. An example entry in the appendix for a system with no interrupts?
>> Pretty sure it's not - it would be a secure SGI. The MPAM spec uses it as 'invalid'.
> 
> I guess that is an Arm-ism, then.  So long as this is standard for
> ACPI on Arm systems, then I guess there is no problem -- and anyway,
> the ACPI MPAM spec would take precedence for interpreting this field.
> 
> Now I look more closely, you're right: the ACPI MPAM spec says, e.g.:
> "If the MSC supports MSI, as indicated by the
> MPAMF_MSMON_IDR.HAS_OFLW_MSI bit, then this field must be set to 0 and
> ignored by the OS.  If this MSC does not support overflow interrupts or
> monitors, this field must be set to 0".
> 
> This does not say that the value 0 means there is no wired interrupt,
> but that seems to be the intention.
> 
> It could be better worded, but the intention does seem to be that 0
> means "no (wired) interrupt", here...
I'll copy you on the thread to the spec people.
>>>> +
>>>> +	flags = tbl_msc->error_interrupt_flags;
>>>> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
>>>> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
>>>> +		aff = tbl_msc->error_interrupt_affinity;
>>>> +	else
>>>> +		aff = ~0;
>>>> +	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
>>>> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> 
> intid 0 gets ignored by frob_irq(), so I guess we are OK in the MSI
> case (the spec says we must ignore it, but also says that it must be
> zero).
> 
> It would be onerous to have to map the MSC and examine its ID regs
> here; assuming that the intid really is 0 in the MSI case seems
> reasonable.
Registering+Requesting MSI would be up to the driver to do, there is no reason for this
ACPI parsing code to get involved with that.
>>>> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
>>>> +				    struct acpi_mpam_resource_node *res)
>>>> +{
>>>> +	int level, nid;
>>>> +	u32 cache_id;
>>>> +
>>>> +	switch (res->locator_type) {
>>>> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
>>>> +		cache_id = res->locator.cache_locator.cache_reference;
>>>> +		level = find_acpi_cache_level_from_id(cache_id);
>>>> +		if (level <= 0) {
>>>> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
>>>> +			return -EINVAL;
>>>> +		}
>>>> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
>>>> +				       level, cache_id);
>>>> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
>>>> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
>>>> +		if (nid == NUMA_NO_NODE)
>>>> +			nid = 0;
>>>> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
>>>> +				       255, nid);
>>>> +	default:
>>>> +		/* These get discovered later and treated as unknown */
>>>> +		return 0;
>>>> +	}
>>>> +}
>>>> +
>>>> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
>>>> +			      struct acpi_mpam_msc_node *tbl_msc)
>>>> +{
>>>> +	int i, err;
>>>> +	struct acpi_mpam_resource_node *resources;
>>>> +
>>>> +	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
>>
>>> Should we check that we don't go out of the bounds of the MSC node
>>> (or, at the very least, of the MPAM table)?
>>>
>>> If tbl_msc->length was already validated, that can be used for the
>>> bounds check.
>>
>> I'm not a fan of trying to validate the ACPI tables - it's extra work for something that
>> just has to be right. e.g. we can't check the base-address, we just have to trust it's correct.
>>
>> I didn't want to pass the table in here, and grabbing another reference means I'd have to
>> work out if it's the same mapping...
>>
>> But as tbl_msc->length is to hand...
> 
> That gets my vote, except that it would be trivial to validate length
> in acpi_mpam_parse().  Checking against length without first
> validating length would be completely pointless.
I've added something to do that...
> I agree that there are things that we can't validate and things that it
> is redundant or pointless to check.  But it feels a bit lazy not to
> care whether the parser wanders out of the table while trying to parse
> it.
> 
> If the rest of the ACPI code takes a similarly relaxed attitude (I've
> not checked) then I guess it is pointless to be more careful here.
> 
> But otherwise, I would prefer to see all the bounds checks done.
> This is not hard, and there are not many things to check.  (In my
> experience, it requires far fewer brain cycles to just make these things
> correct by construction than to try to figure out the situations in
> which the corner-cutting might or might not matter.)
> 
>>>> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
>>>> +		err = acpi_mpam_parse_resource(msc, &resources[i]);
>>>
>>> Isn't the length of each resource node variable?  According to [2],
>>> the length depends on the num_functional_deps field.  It looks like the
>>> functional dependency descriptors (if any) are appended contiguously to
>>> the resource node, unless I've misunderstood something.
>>
>> No - it probably is that broken. They forgot to include a length field, painting themselves
>> into a corner if they ever want to modify this!
> 
> The spec doesn't look particularly broken -- i.e., it can be compatibly
> extended, albeit in a more awkward way than would be ideal.  The main
> impact is that the list of resources has to be parsed sequentially, but
> that's all code is ever likely to do with the table anyway.
> 
>> I was ignoring the 'functional dependencies' until someone comes up with a platform that
>> needs it. But yes, this assumes that field is zero.
>>
>> I'm not entirely clear what the MPAM driver is supposed to do with functional
>> dependencies: 'make up a configuration' ... but when the controls formats are different,
>> it's going to get exciting.
>>
>> I've fixed that up:
>> | int acpi_mpam_parse_resources(struct mpam_msc *msc,
>> | 			      struct acpi_mpam_msc_node *tbl_msc)
>> | {
>> | 	int i, err;
>> |	char *ptr, *last_byte;
>> | 	struct acpi_mpam_resource_node *resource;
>> |
>> | 	ptr = (char *)(tbl_msc + 1);
>> | 	last_byte = ptr + tbl_msc->length;
> 
> Nit: misnomer.  This is not the address of the last byte.
> 
> (I tend to call similar pointers something like "limit".)
It's called table_end elsewhere - I've gone with that.
>> | 	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
>> | 		if (ptr + sizeof(*resource) > last_byte)
>> | 			return -EINVAL;
>> |
>> | 		resource = (struct acpi_mpam_resource_node *)ptr;
>> | 		err = acpi_mpam_parse_resource(msc, resource);
>> | 		if (err)
>> | 			return err;
>> |
>> | 		ptr += sizeof(*resource);
>> | 		ptr += resource->num_functional_deps * sizeof(struct acpi_mpam_func_deps);
> 
> This mostly looks sane.
> 
> But unless ptr is guaranteed to be less than 0xffffffff00000002 (?),
> then the bounds check at the start of the loop may bogusly pass.
> 
> Would a check like this before advancing ptr across the functional deps
> be reasonable? (untested)
> 
> 	if (resource->num_functional_deps >
> 	    (last_byte - ptr) / sizeof(struct acpi_mpam_func_deps))
> 		return -EINVAL;
> 
> (Why the table allows more functional dependencies to be declared than
> could possibly fit in an MSC node is a mystery.)
Giving bits of that names made it easier for me to read. (and fit on one line)
|	remaining_table = table_end - ptr;
|	max_deps = remaining_table / sizeof(struct acpi_mpam_func_deps);
|	if (resource->num_functional_deps > max_deps) {
|		pr_debug("MSC has impossible number of functional dependencies\n");
|		return -EINVAL;
|	}
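For illustration only, the same pair of checks in a self-contained
userspace form. struct res_node and struct func_dep are hypothetical
stand-ins for acpi_mpam_resource_node and acpi_mpam_func_deps, with
made-up layouts, and walk_resources() is a hypothetical name; only the
shape of the bounds checks is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real table structures. */
struct res_node {
	uint32_t id;
	uint32_t num_deps;	/* number of func_dep records that follow */
};

struct func_dep {
	uint32_t producer;
	uint32_t reserved;
};

/*
 * Walk num_nodes variable-length resource records in [ptr, end).
 * Returns the number of records walked, or -1 if any record (or its
 * trailing dependency list) would extend beyond end.
 */
static int walk_resources(const unsigned char *ptr, const unsigned char *end,
			  unsigned int num_nodes)
{
	unsigned int i;

	assert(ptr <= end);

	for (i = 0; i < num_nodes; i++) {
		struct res_node node;
		size_t remaining = (size_t)(end - ptr);

		/* The fixed-size part must fit before it is read. */
		if (remaining < sizeof(node))
			return -1;
		memcpy(&node, ptr, sizeof(node));
		ptr += sizeof(node);
		remaining -= sizeof(node);

		/*
		 * Divide the space that is left instead of multiplying the
		 * untrusted count, so the comparison cannot overflow.
		 */
		if (node.num_deps > remaining / sizeof(struct func_dep))
			return -1;
		ptr += (size_t)node.num_deps * sizeof(struct func_dep);
	}

	return (int)num_nodes;
}
```

Both checks compare against a remaining byte count rather than a
computed pointer, so neither depends on where the table happens to sit
in the address space.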
>>>> +		if (err)
>>>> +			return err;
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
>>>> +				     struct platform_device *pdev,
>>>> +				     u32 *acpi_id)
>>>> +{
>>>> +	bool acpi_id_valid = false;
>>>> +	struct acpi_device *buddy;
>>>> +	char hid[16], uid[16];
>>>> +	int err;
>>>> +
>>>> +	memset(&hid, 0, sizeof(hid));
>>>> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
>>>> +	       sizeof(tbl_msc->hardware_id_linked_device));
>>>
>>> This is safe by semi-accident, since 16 > 8.
>>>
>>> It might be cleaner to declare
>>>
>>> 	char hid[sizeof(tbl_msc->hardware_id_linked_device)];
>>>
>>> which can never be wrong.
>>
>> But can be too precise!
> 
> But only when the correct length was unknown.  So you are saying that
> this can only be correct when it is not known to be correct? ;)
> 
> Sometimes an oversized array has useful defensive value against future
> mis-maintenance, but since the ACPI fields are all fixed-size, I don't
> see any benefit here.
The field in the table is eight bytes, but the string on the stack must be nine because
the table isn't null terminated. I went for the next power of two.
>>> memset()+memcpy() might be better replaced with strscpy() (or just use
>>> snprintf again, since this avoids having to think about multiple
>>> different ways of avoiding buffer overflows at the same time.  This is
>>> not a fast path.)
>>
>> snprintf() demands space for a NUL byte - but these ACPI table fields are fixed width,
>> so don't have them.
>>
>> The array sizes picked are the next larger power of 2, and the memset is to ensure there is a
>> NUL byte after the fixed size region is copied in.
>>
>> Where snprintf() is being used - it's to convert from integer to string. For the hid, what
>> we really want is a fixed size memcpy() ... it's possible to cook up a format string that
>> will do that, but it's going to be an eyesore.
> 
> Well, it's a matter of style.
> 
>>
>> I can change this to;
>> | char hid[sizeof(tbl_msc->hardware_id_linked_device) + 1];
> 
> Can do (and this would probably have shut me up), but:
Fine, let's do that.
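As a sketch of what the "+ 1" sizing buys (userspace C, not the patch
itself): the field is eight bytes with no terminator, so a nine-byte
destination with the final byte set explicitly is enough to form a C
string. HID_FIELD_LEN and hid_to_cstr() are hypothetical names used
only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Width of the fixed-size hardware ID field, as discussed above. */
#define HID_FIELD_LEN	8

/*
 * Copy a fixed-width field that may lack a terminator into a buffer
 * that is exactly one byte larger, and terminate it explicitly.
 */
static void hid_to_cstr(char dst[HID_FIELD_LEN + 1],
			const uint8_t field[HID_FIELD_LEN])
{
	assert(dst && field);

	memcpy(dst, field, HID_FIELD_LEN);
	dst[HID_FIELD_LEN] = '\0';
}
```

Setting the last byte directly takes the place of the up-front
memset(), and the destination size is derived from the field width
rather than chosen independently of it.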
>>
>> But I don't think this is better... How about a comment?!
>> |	/*
>> |	 * tbl_msc->hardware_id_linked_device is an 8 byte fixed width string.
>> |	 * hid[] is the next larger power of 2, and is zero'd to give us a
>> |	 * null terminated string for acpi_dev_get_first_match_dev().
>> |	 */
> 
> this doesn't explain why the size needs to be a power of two, or what
> the wasted bytes are for.
> 
> (Either way, this is obviously no big deal...)
> 
>>
>>
>> This is all because the MPAM stuff was forced into a static table, even though it needs to
>> refer to stuff in the namespace.
>>
>>
>>>> +
>>>> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
>>>> +		*acpi_id = tbl_msc->instance_id_linked_device;
>>>> +		acpi_id_valid = true;
>>>> +	}
>>>> +
>>>> +	err = snprintf(uid, sizeof(uid), "%u",
>>>
>>> char uid[11]; would be sufficient, here.  The instance ID is strictly
>>> 32-bit.  Adding a safety margin is worthless here, since snprintf()
>>> checks the bounds -- either the size is sufficient for all possible u32
>>> values, or it isn't.
>>
>> I just plumped for the next largest power of 2.
> 
> This is harmless, but since the only way to know that this is
> sufficient is to know what the correct value would have been, it seems
> pointless (as above) to lie to the compiler about how much space can
> be used.
It's on the stack, the compiler is almost certain to align to register size anyway. Making
it obviously big enough saves any time thinking about whether it should be 11 or 12.
I've changed it to 11. (I'll take your word for it!)
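The arithmetic behind 11, as a small userspace sketch: the largest u32
is 4294967295, which is ten decimal digits, plus one byte for the
terminator. U32_DEC_BUFLEN and format_uid() are hypothetical names, and
this uses the C library snprintf() rather than the kernel's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Ten digits for 4294967295 plus the terminating NUL. */
#define U32_DEC_BUFLEN	11

/*
 * Format id as decimal into buf. Returns 1 if the whole string fitted,
 * 0 if snprintf() reported an error or the output was truncated.
 */
static int format_uid(char *buf, size_t len, uint32_t id)
{
	int n;

	assert(buf && len > 0);

	n = snprintf(buf, len, "%" PRIu32, id);
	return n > 0 && (size_t)n < len;
}
```

With an 11-byte buffer the truncation branch is unreachable for any
32-bit input; with 10 bytes the largest values no longer fit.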
>>>> +		       tbl_msc->instance_id_linked_device);
>>>
>>> Can snprintf() return < 0 on error?
>>
>> Documented as returning a positive number.
> 
> Not in C23, and never standard for the printf() family in general.  But
> it does appear that this is true for the kernel's snprintf(), and this
> assumption is widely relied upon in the kernel.  So, a check here would
> be pointless complexity after all.
> 
> (The only documented error condition in C23 seems to be the case of a
> multibyte encoding error, which is not relevant here although it does
> not rule out other error conditions -- the wording is
> characteristically ambiguous.)
> 
> [...]
> 
>>>> +	if (err >= sizeof(uid))
>>>> +		return acpi_id_valid;
>>>
>>> Possibly return true on error?  Why?
>>
>> (this is an error case that will never happen - but error handling is important!)
>>
>> The acpi_id was parsed out of the table and returned to the caller. Hence true is
>> returned. This gets used to find the CPU-affinity, which is how the driver avoids
>> accessing caches that might not be on. There are sanity checks around this value. It can't
>> be present for some caches and not others.
>>
>> Due to this error, the link to the buddy device was not added - which is silently ignored.
>> If it matters, it will show up as an access to a device that didn't get turned on by the
>> power-management code, because of that missing link.
>>
>> Because I don't think this can happen, I didn't try to handle it explicitly - just carry
>> on like nothing is wrong.
> What I meant was, if the MPAM table contains garbage, why not just flag
> this and give up?  That feels semantically simpler than trying to limp on,
> with possibly undiagnosed invalid results.
Not junk in the table - but the kernel was unable to convert a known-size integer to a
string in a correctly sized array.
I've added a debug statement for it:
|	pr_debug("Failed to convert uid of device for power management.");
>>>> +	while (table_offset < table_end) {
>>>> +		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
>>>> +		table_offset += tbl_msc->length;
>>>> +
>>>> +		/*
>>>> +		 * If any of the reserved fields are set, make no attempt to
>>>> +		 * parse the msc structure. This will prevent the driver from
>>>> +		 * probing all the MSC, meaning it can't discover the system
>>>> +		 * wide supported partid and pmg ranges. This avoids whatever
>>>> +		 * this MSC is truncating the partids and creating a screaming
>>>
>>> Mangled sentence?
>>
>> just hard to parse:
>> "whatever this MSC is" "truncating the partids..."
>>
>> Fixed as
>> | This avoids this MSC truncating the partids and creating a screaming error interrupt.
>>
>> (although it's not strictly the MSC that does that)
> 
> The critical thing here seems to be that we prevent the driver from
> ever being enabled.  The comment doesn't seem to say that we are
> actually disabling the MPAM driver by giving up here (it only explains
> why).
> 
> For the benefit of people who are less familiar with the code, would
> something like this be better: [*]
> 
> --8<--
> 
> If the reserved fields are set then the meaning of the rest of the
> entry is unknown,
I'm not sure that is true - use of the reserved fields should be backward compatible, and
there is no top level type because this thing isn't a subtable, so it isn't possible to
add other kinds of MSC - which is where you need the other fields to change meaning.
> so leave the MSC marked as unprobed and give up.
It skips creating the platform devices, so there is nothing for the driver to probe against.
> This means that the MPAM driver will never be enabled.  There is no way
> to enable it safely, because we cannot determine safe system-wide
> partid and pmg ranges in this situation.
> 
> -->8--
Combined as:
| * If any of the reserved fields are set, make no attempt to
| * parse the MSC structure. This MSC will still be counted,
| * meaning the MPAM driver can't probe against all MSC, and
| * will never be enabled. There is no way to enable it safely,
| * because we cannot determine safe system-wide partid and pmg
| * ranges in this situation.
>>> I have not so far found any reference in [2] to the reset value of the
>>> MPAMF_ECR.INTEN bit.  Do we rely on the error interrupt(s) for all MSCs
>>> to be disabled at the interrupt controller?  If the same interrupt may
>>> be shared by multiple MSCs, that's bad.
>>
>> They are anticipated as shared because the ESR register lets us know if this MSC triggered
>> the error.
>>
>> The interrupt won't be delivered until the driver requests it - it'll just stay pending.
>>
>> Yes - if some firmware component used an out-of-range PARTID then linux will believe that
>> this was caused by linux and disable MPAM. Not much we can do about that.
>>
>> (I've even seen a platform where this happens!)
> 
> Ah, right -- I'd missed the ESR.
> 
> This is inherently racy even without interrupt sharing, so I guess
> sharing the interrupt doesn't make things worse (except for having to
> poll multiple ESRs on interrupt -- but if that hardware vendor made us
> do that, it's on them.)
The irqchip core will call every handler for a shared interrupt because of that race.
>>>> +		 * error interrupt.
>>>> +		 */
>>>> +		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
>>>> +			continue;
>>
>>> The specs are not clear about how backwards compatibility is supposed
>>> to work.
>>
>> How about that!
>>
>> The spec people thought this would allow them to add things that aren't MSC in, re-using
>> one of these reserved fields as a type.
>> But - the architecture spec mandates the OS run around the MSC and collect the minimum
>> PARTID/PMG values (so people don't have to build an SoC where everything fits together
>> nicely). Because of that, the OS needs to know how many MSC there are, so that it knows it
>> has probed them all.
> 
> See my suggestion at [*], above.
> 
> The reader doesn't need to understand about future architecture
> directions, here.  The only thing that matters is that the kernel does
> not make any assumptions about the meaning of the MSC entry if these
> fields are set.  I'm not sure that we need to justify this position --
> simply stating it is probably enough.
> 
>>> I would feel a bit uneasy about silently throwing away MSCs based on
>>> criteria that may not indicate incompatibility, and without even a
>>> diagnostic.
>>
>> Yes the MSC gets thrown away - but it was still counted by acpi_mpam_count_msc(). The MPAM
>> driver will never probe all the MSC, so it will never call mpam_enable().
>>
>> (you used to be able to see which cpuhp callback was registered, to know it was stuck in
>> discovery, but now a helper does that which always uses the same name. I may fix that)
>>
>>
>> I think it's fine for MPAM to silently fail to probe like this. The system still works, all
>> the known hardware gets probed and none of the unknown hardware is touched. (so it can't
>> blow us up). You don't get resctrl - but we can't know if the unknown hardware was going
>> to be important to supporting MPAM. e.g. it lowers the minimum usable PARTID/PMG.
>>
>> It's between a rock and a hard place - I think this behaviour is the best we can do.
>>
>>
>> Regarding it being silent - whatever happens it needs a kernel upgrade to learn about the
>> newly described hardware. I don't think it's worth printing anything out in this case.
> 
> I would still vote for a basic
> 
> 	"Unrecognised MSC, MPAM not usable"
> 
> message or similar.  This costs us nothing, and at least gives a clue
> that missing kernel support or a corrupt ACPI table are likely causes
> for resctrl not showing up.
> 
> It partly depends on how likely we think this scenario is -- or how
> likely it is that people will insist on using an old kernel with a
> botched stack of backports on new hardware.  I don't have a strong
> sense on this.
> 
>>>> +
>>>> +		if (!tbl_msc->mmio_size)
>>>> +			continue;
>>>> +
>>>> +		if (decode_interface_type(tbl_msc, &iface))
>>>> +			continue;
>>>
>>> Ditto regarding diagnostics.
>>
>> A third interface type will result in those MSC being untouched, and MPAM unavailable.
>> I don't think printing "maybe upgrade your kernel?" is going to help anyone.
>> I'll chuck some pr_debug() in here so someone could find out which of these it is.
>>
> 
> pr_debug() for sub-diagnosing the error may be helpful, but these feel
> like an addition to the basic "MPAM unusable" error rather than as a
> replacement for it...
Sure, printing multiple levels of error message looks strange to me. But whatever.
>>>> +
>>>> +		next_res = 0;
>>>> +		next_prop = 0;
>>>> +		memset(res, 0, sizeof(res));
>>>> +		memset(props, 0, sizeof(props));
>>>> +
>>>> +		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
>>>
>>> If the tbl_msc->identifier values contain duplicates, we will get a
>>> platform device with a duplicate name here.  I don't know whether it
>>> matters.
>>
>> The identifiers must be unique. I don't think this is the sort of table validation we need
>> to do for the firmware vendor.
> 
> My question was really: "what goes wrong?"  I suppose that if
> platform_device_alloc() doesn't like this then
True - you'd get an error from the platform device allocation code about the duplicate id.
>>
>>>> +		if (!pdev) {
>>>> +			err = -ENOMEM;
>>>> +			break;
>>>> +		}
> 
> we tell the caller we ran out of memory.  While not exactly helpful,
> this is better than blowing up.  If this is the only consequence of
> duplicate IDs here (and this anyway Should Not Happen), then I guess
> this is adequate.
> 
> If platform_device_alloc() doesn't care about duplicate names, then I
> guess we can just trust it to have done something sensible.
I'm pretty sure it cares because the names have to be unique under sysfs.
[..]
>>>> +
>>>> +		/* Some power management is described in the namespace: */
>>>> +		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
>>>> +		if (err > 0 && err < sizeof(uid)) {
>>>> +			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
>>>
>>> Diagnostic?
>>>
> 
> Ping
In line with what I did for the previous ones, I shoved something here.
[...]
>>>> +	table_end = (char *)table + table->length;
>>>> +
>>>> +	while (table_offset < table_end) {
>>>> +		if (!tbl_msc->mmio_size)
>>>> +			continue;
>>>
>>> This is 0 for non-usable PCC-based MSCs, right?
>>>
>>> (Why explicitly unusable MSCs are listed in the table at all is a
>>> mystery to me, but that's what the spec says.  I guess there must be a
>>> reason.)
> 
> (I guess PCC was added after the size field was named.  I suppose we
> don't really need a comment for that -- it's fairly clear why this bit
> of code is here.  We could rename the field in our struct, but this
(it's imported from acpica - it would take quite a long time to change)
> would probably just move confusion around rather than solving it.)
[...]
>>> Also, is it an error if a length is not a multiple of four bytes?
>>
>> Hmmm, because all the subtables in the spec are multiples of four bytes?
>> I think this falls in the bucket of 'table validation the OS shouldn't have to do'.
>> If the ACPI tables are that wrong, we're going to have bigger problems.
> 
> So long as we don't fault on unaligned access in the kernel (which I'm
> pretty sure we don't do), then nothing goes wrong in the unaligned case
> -- so, fine.  Otherwise we'll get a clean Oops, which would at least be
> a decent place to start debugging.
UEFI gets to choose the attributes, but if it chooses Device, things will blow up
well before we get in here.
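For anyone curious what the length checks discussed above would look like,
here is a minimal user-space sketch. It is not what the driver does: struct
subtable_hdr and count_subtables() are made-up names, the real MSC node has
many more fields, and rejecting lengths that are not a multiple of four is
only the hypothetical stricter policy from this sub-thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header; the real MPAM MSC node has more fields. */
struct subtable_hdr {
	uint16_t length;	/* total length of this subtable, header included */
	uint16_t reserved;
};

/*
 * Walk the subtables in buf[0..len) and return how many were found,
 * or -1 if a length field is too small, overruns the buffer, or is
 * not a multiple of four bytes.
 */
static int count_subtables(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct subtable_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;

		/* memcpy() avoids relying on the buffer being aligned. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		if (hdr.length < sizeof(hdr) || hdr.length > len - off)
			return -1;
		if (hdr.length % 4)
			return -1;

		off += hdr.length;
		count++;
	}

	return count;
}
```

The zero-length check is the one that matters most for a walker like this,
since a zero length would otherwise never advance the offset.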
Thanks,
James
- * [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (7 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-27 16:22   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                   ` (58 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rob Herring <robh@kernel.org>
The binding is designed around the assumption that an MSC will be a
sub-block of something else such as a memory controller, cache controller,
or IOMMU. However, it's certainly possible a design does not have that
association or has a mixture of both, so the binding illustrates how we can
support that with RIS child nodes.
A key part of MPAM is we need to know about all of the MSCs in the system
before it can be enabled. This drives the need for the genericish
'arm,mpam-msc' compatible. Though we can't assume an MSC is accessible
until a h/w specific driver potentially enables the h/w.
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Syntax(?) corrections supplied by Rob.
 * Culled some context in the example.
---
 .../devicetree/bindings/arm/arm,mpam-msc.yaml | 200 ++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
new file mode 100644
index 000000000000..d984817b3385
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
@@ -0,0 +1,200 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/arm,mpam-msc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
+
+description: |
+  The Arm MPAM specification can be found here:
+
+  https://developer.arm.com/documentation/ddi0598/latest
+
+maintainers:
+  - Rob Herring <robh@kernel.org>
+
+properties:
+  compatible:
+    items:
+      - const: arm,mpam-msc                   # Further details are discoverable
+      - const: arm,mpam-memory-controller-msc
+
+  reg:
+    maxItems: 1
+    description: A memory region containing registers as defined in the MPAM
+      specification.
+
+  interrupts:
+    minItems: 1
+    items:
+      - description: error (optional)
+      - description: overflow (optional, only for monitoring)
+
+  interrupt-names:
+    oneOf:
+      - items:
+          - enum: [ error, overflow ]
+      - items:
+          - const: error
+          - const: overflow
+
+  arm,not-ready-us:
+    description: The maximum time in microseconds for monitoring data to be
+      accurate after a settings change. For more information, see the
+      Not-Ready (NRDY) bit description in the MPAM specification.
+
+  numa-node-id: true # see NUMA binding
+
+  '#address-cells':
+    const: 1
+
+  '#size-cells':
+    const: 0
+
+patternProperties:
+  '^ris@[0-9a-f]$':
+    type: object
+    additionalProperties: false
+    description:
+      RIS nodes for each RIS in an MSC. These nodes are required for each RIS
+      implementing known MPAM controls
+
+    properties:
+      compatible:
+        enum:
+            # Bulk storage for cache
+          - arm,mpam-cache
+            # Memory bandwidth
+          - arm,mpam-memory
+
+      reg:
+        minimum: 0
+        maximum: 0xf
+
+      cpus:
+        description:
+          Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
+          device's affinity is used.
+
+      arm,mpam-device:
+        $ref: /schemas/types.yaml#/definitions/phandle
+        description:
+          By default, the MPAM enabled device associated with a RIS is the MSC's
+          parent node. It is possible for each RIS to be associated with different
+          devices in which case 'arm,mpam-device' should be used.
+
+    required:
+      - compatible
+      - reg
+
+required:
+  - compatible
+  - reg
+
+dependencies:
+  interrupts: [ interrupt-names ]
+
+additionalProperties: false
+
+examples:
+  - |
+    L3: cache-controller@30000000 {
+        compatible = "arm,dsu-l3-cache", "cache";
+        cache-level = <3>;
+        cache-unified;
+
+        ranges = <0x0 0x30000000 0x800000>;
+        #address-cells = <1>;
+        #size-cells = <1>;
+
+        msc@10000 {
+            compatible = "arm,mpam-msc";
+
+            /* CPU affinity implied by parent cache node's  */
+            reg = <0x10000 0x2000>;
+            interrupts = <1>, <2>;
+            interrupt-names = "error", "overflow";
+            arm,not-ready-us = <1>;
+        };
+    };
+
+    mem: memory-controller@20000 {
+        compatible = "foo,a-memory-controller";
+        reg = <0x20000 0x1000>;
+
+        #address-cells = <1>;
+        #size-cells = <1>;
+        ranges;
+
+        msc@21000 {
+            compatible = "arm,mpam-memory-controller-msc", "arm,mpam-msc";
+            reg = <0x21000 0x1000>;
+            interrupts = <3>;
+            interrupt-names = "error";
+            arm,not-ready-us = <1>;
+            numa-node-id = <1>;
+        };
+    };
+
+    iommu@40000 {
+        reg = <0x40000 0x1000>;
+
+        ranges;
+        #address-cells = <1>;
+        #size-cells = <1>;
+
+        msc@41000 {
+            compatible = "arm,mpam-msc";
+            reg = <0 0x1000>;
+            interrupts = <5>, <6>;
+            interrupt-names = "error", "overflow";
+            arm,not-ready-us = <1>;
+
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            ris@2 {
+                compatible = "arm,mpam-cache";
+                reg = <0>;
+                // TODO: How to map to device(s)?
+            };
+        };
+    };
+
+    msc@80000 {
+        compatible = "foo,a-standalone-msc";
+        reg = <0x80000 0x1000>;
+
+        clocks = <&clks 123>;
+
+        ranges;
+        #address-cells = <1>;
+        #size-cells = <1>;
+
+        msc@10000 {
+            compatible = "arm,mpam-msc";
+
+            reg = <0x10000 0x2000>;
+            interrupts = <7>;
+            interrupt-names = "overflow";
+            arm,not-ready-us = <1>;
+
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            ris@0 {
+                compatible = "arm,mpam-cache";
+                reg = <0>;
+                arm,mpam-device = <&L2_0>;
+            };
+
+            ris@1 {
+                compatible = "arm,mpam-memory";
+                reg = <1>;
+                arm,mpam-device = <&mem>;
+            };
+        };
+    };
+
+...
-- 
2.20.1
- * Re: [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
  2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
@ 2025-08-27 16:22   ` Dave Martin
  2025-09-05  9:11     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-08-27 16:22 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:50PM +0000, James Morse wrote:
> From: Rob Herring <robh@kernel.org>
> 
> The binding is designed around the assumption that an MSC will be a
> sub-block of something else such as a memory controller, cache controller,
> or IOMMU. However, it's certainly possible a design does not have that
> association or has a mixture of both, so the binding illustrates how we can
> support that with RIS child nodes.
> 
> A key part of MPAM is we need to know about all of the MSCs in the system
> before it can be enabled. This drives the need for the genericish
> 'arm,mpam-msc' compatible. Though we can't assume an MSC is accessible
> until a h/w specific driver potentially enables the h/w.
I'll leave detailed review to other people for now, since I'm not so up
to speed on all things DT.
A few random comments, below.
[...]
> diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
[...]
> @@ -0,0 +1,200 @@
[...]
> +title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
> +
> +description: |
> +  The Arm MPAM specification can be found here:
> +
> +  https://developer.arm.com/documentation/ddi0598/latest
> +
> +maintainers:
> +  - Rob Herring <robh@kernel.org>
> +
> +properties:
> +  compatible:
> +    items:
> +      - const: arm,mpam-msc                   # Further details are discoverable
> +      - const: arm,mpam-memory-controller-msc
There seems to be no clear statement about how these differ.
> +  reg:
> +    maxItems: 1
> +    description: A memory region containing registers as defined in the MPAM
> +      specification.
There seems to be no handling of PCC-based MSCs here.  Should there be?
If this can be added later in a backwards-compatible way, I guess
that's not a problem (and this is what compatible strings are for, if
all else fails.)
An explicit statement that PCC is not supported here might be helpful,
though.
> +  interrupts:
> +    minItems: 1
> +    items:
> +      - description: error (optional)
> +      - description: overflow (optional, only for monitoring)
> +
> +  interrupt-names:
> +    oneOf:
> +      - items:
> +          - enum: [ error, overflow ]
> +      - items:
> +          - const: error
> +          - const: overflow
Yeugh.  Is this really the only way to say "one or both of foo"?
(I don't know the answer to this -- though I can believe that it's
true.  Perhaps just not describing this property is another option.
Many bindings seem not to bother.)
> +
> +  arm,not-ready-us:
> +    description: The maximum time in microseconds for monitoring data to be
> +      accurate after a settings change. For more information, see the
> +      Not-Ready (NRDY) bit description in the MPAM specification.
> +
> +  numa-node-id: true # see NUMA binding
> +
> +  '#address-cells':
> +    const: 1
> +
> +  '#size-cells':
> +    const: 0
> +
> +patternProperties:
> +  '^ris@[0-9a-f]$':
Is this supposed to be '^ris@[0-9a-f]+$'?
Currently MPAMF_IDR.RIS_MAX is only 4 bits in size and so cannot be
greater than 0xf.  But it is not inconceivable that a future revision
of the architecture might enable more -- and there are 4 RES0 bits
looming over the RIS_MAX field, just waiting to be used...
(In any case, it feels wrong to try to enforce numeric bounds with a
regex, even in the cases where it happens to work straightforwardly.)
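To illustrate the difference between the two patterns, here is a minimal
user-space sketch. It assumes POSIX extended regex semantics, which should
behave the same way as the schema's regex dialect for patterns this simple.
node_name_matches() is a made-up helper, not part of dt-schema or the kernel.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if name matches the POSIX extended regex pattern. */
static bool node_name_matches(const char *pattern, const char *name)
{
	regex_t re;
	bool match;

	/* Treat a pattern that fails to compile as "no match". */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB))
		return false;

	match = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);

	return match;
}
```

With this, "ris@f" matches both patterns, "ris@10" matches only the '+'
form, and "ris@ffff" matches the '+' form as well. So once the pattern is
relaxed, any numeric limit has to come from the constraint on 'reg' rather
than from the node-name pattern.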
> +    type: object
> +    additionalProperties: false
> +    description:
> +      RIS nodes for each RIS in an MSC. These nodes are required for each RIS
The architectural term is "resource instance", not "RIS".
But "RIS nodes" is fine for describing the DT nodes, since we can call
them what we like, and "ris" is widely used inside the MPAM driver.
People writing DTs should not need to be familiar with the driver's
internal naming conventions, though.
(There are other instances, but I won't comment on them all
individually.)
> +      implementing known MPAM controls
> +
> +    properties:
> +      compatible:
> +        enum:
> +            # Bulk storage for cache
Nit: What is "bulk storage"?
The MPAM spec just refers to "cache" or "cache memory".
> +          - arm,mpam-cache
> +            # Memory bandwidth
> +          - arm,mpam-memory
> +
> +      reg:
> +        minimum: 0
> +        maximum: 0xf
> +
> +      cpus:
> +        description:
> +          Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
> +          device's affinity is used.
> +
> +      arm,mpam-device:
> +        $ref: /schemas/types.yaml#/definitions/phandle
> +        description:
> +          By default, the MPAM enabled device associated with a RIS is the MSC's
Associated how?  Is this the device where the physical resources
managed by the MSC are located?
> +          parent node. It is possible for each RIS to be associated with different
> +          devices in which case 'arm,mpam-device' should be used.
[...]
> +examples:
> +  - |
> +    L3: cache-controller@30000000 {
> +        compatible = "arm,dsu-l3-cache", "cache";
> +        cache-level = <3>;
> +        cache-unified;
> +
> +        ranges = <0x0 0x30000000 0x800000>;
> +        #address-cells = <1>;
> +        #size-cells = <1>;
> +
> +        msc@10000 {
> +            compatible = "arm,mpam-msc";
> +
> +            /* CPU affinity implied by parent cache node's  */
"node's" -> "nodes".
(or is this supposed to be in the singular -- i.e., the immediately
parent cache node only?)
Anyway, it looks like this is commenting on the "reg" property, which
doesn't seem right.
Is this comment supposed instead to explain the omission of the "cpus"
property?  If so, that should be made clearer.
> +            reg = <0x10000 0x2000>;
> +            interrupts = <1>, <2>;
> +            interrupt-names = "error", "overflow";
> +            arm,not-ready-us = <1>;
> +        };
> +    };
[...]
(Examples otherwise not reviewed in detail.)
Cheers
---Dave
- * Re: [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
  2025-08-27 16:22   ` Dave Martin
@ 2025-09-05  9:11     ` James Morse
  2025-09-09 11:02       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-05  9:11 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 27/08/2025 17:22, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:50PM +0000, James Morse wrote:
>> From: Rob Herring <robh@kernel.org>
>>
>> The binding is designed around the assumption that an MSC will be a
>> sub-block of something else such as a memory controller, cache controller,
>> or IOMMU. However, it's certainly possible a design does not have that
>> association or has a mixture of both, so the binding illustrates how we can
>> support that with RIS child nodes.
>>
>> A key part of MPAM is we need to know about all of the MSCs in the system
>> before it can be enabled. This drives the need for the genericish
>> 'arm,mpam-msc' compatible. Though we can't assume an MSC is accessible
>> until a h/w specific driver potentially enables the h/w.
> I'll leave detailed review to other people for now, since I'm not so up
> to speed on all things DT.
Me neither!
>> diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
> 
> [...]
> 
>> @@ -0,0 +1,200 @@
> 
> [...]
> 
>> +title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
>> +
>> +description: |
>> +  The Arm MPAM specification can be found here:
>> +
>> +  https://developer.arm.com/documentation/ddi0598/latest
>> +
>> +maintainers:
>> +  - Rob Herring <robh@kernel.org>
>> +
>> +properties:
>> +  compatible:
>> +    items:
>> +      - const: arm,mpam-msc                   # Further details are discoverable
>> +      - const: arm,mpam-memory-controller-msc
> 
> There seems to be no clear statement about how these differ.
It's a more-specific compatible, I think these are usually things like:
| compatible = "acme,mega-cache-9000", "arm,mpam-msc"
Where the driver can key errata-workaround on the vendor specific bit when needed.
In this case - I think they're examples, but Rob said they were supposed to be in some
other list of compatibles (not sure what/where that is).
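As a rough sketch of what "keying an errata workaround on the vendor
specific bit" means: the generic string says "this is an MSC", and a
more-specific string earlier in the list selects quirks. This is plain
user-space C for illustration only. is_compatible(), msc_quirks() and
QUIRK_ACME_9000 are made-up names, and "acme,mega-cache-9000" is the
invented example from above, not a real compatible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'want' appears anywhere in the node's compatible list. */
static bool is_compatible(const char *const *compat, size_t n, const char *want)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(compat[i], want))
			return true;
	return false;
}

/* Hypothetical quirk flag for the made-up "acme,mega-cache-9000" part. */
#define QUIRK_ACME_9000	(1 << 0)

/*
 * Return -1 if the node is not an MSC at all; otherwise the set of
 * quirks selected by any more-specific compatible strings.
 */
static int msc_quirks(const char *const *compat, size_t n)
{
	int quirks = 0;

	if (!is_compatible(compat, n, "arm,mpam-msc"))
		return -1;
	if (is_compatible(compat, n, "acme,mega-cache-9000"))
		quirks |= QUIRK_ACME_9000;

	return quirks;
}
```

In the kernel the lookup itself would go through the usual OF helpers such
as of_device_is_compatible() rather than an open-coded loop.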
>> +  reg:
>> +    maxItems: 1
>> +    description: A memory region containing registers as defined in the MPAM
>> +      specification.
> There seems to be no handling of PCC-based MSCs here.  Should there be?
That is newer than this document. On DT platforms PCC is spelled SCMI, and is
discoverable. Andre P prototyped this, (patches in the extras branch) but no-one
has come out of the woodwork to say they actually need it yet.
ACPI PCC is a definite maybe.
> If this can be added later in a backwards-compatible way, I guess
> that's not a problem (and this is what compatible strings are for, if
> all else fails.)
> 
> An explicit statement that PCC is not supported here might be helpful,
> though.
I'm pretty sure it's discoverable on DT/SCMI platforms.
>> +  interrupts:
>> +    minItems: 1
>> +    items:
>> +      - description: error (optional)
>> +      - description: overflow (optional, only for monitoring)
>> +
>> +  interrupt-names:
>> +    oneOf:
>> +      - items:
>> +          - enum: [ error, overflow ]
>> +      - items:
>> +          - const: error
>> +          - const: overflow
> 
> Yeugh.  Is this really the only way to say "one or both of foo"?
> 
> (I don't know the answer to this -- though I can believe that it's
> true.  Perhaps just not describing this property is another option.
> Many bindings seem not to bother.)
> 
>> +
>> +  arm,not-ready-us:
>> +    description: The maximum time in microseconds for monitoring data to be
>> +      accurate after a settings change. For more information, see the
>> +      Not-Ready (NRDY) bit description in the MPAM specification.
>> +
>> +  numa-node-id: true # see NUMA binding
>> +
>> +  '#address-cells':
>> +    const: 1
>> +
>> +  '#size-cells':
>> +    const: 0
>> +
>> +patternProperties:
>> +  '^ris@[0-9a-f]$':
> 
> Is this supposed to be '^ris@[0-9a-f]+$'?
Looks like yes. Fixed.
> Currently MPAMF_IDR.RIS_MAX is only 4 bits in size and so cannot be
> greater than 0xf.  But it is not inconceivable that a future revision
> of the architecture might enable more -- and there are 4 RES0 bits
> looming over the RIS_MAX field, just waiting to be used...
> 
> (In any case, it feels wrong to try to enforce numeric bounds with a
> regex, even in the cases where it happens to work straightforwardly.)
> 
>> +    type: object
>> +    additionalProperties: false
>> +    description:
>> +      RIS nodes for each RIS in an MSC. These nodes are required for each RIS
> 
> The architectural term is "resource instance", not "RIS".
> 
> But "RIS nodes" is fine for describing the DT nodes, since we can call
> them what we like, and "ris" is widely used inside the MPAM driver.
> People writing DTs should not need to be familiar with the driver's
> internal naming conventions, though.
What about the architecture's name for fields?
This number goes in MPAMCFG_PART_SEL.RIS.
> (There are other instances, but I won't comment on them all
> individually.)
> 
>> +      implementing known MPAM controls
>> +
>> +    properties:
>> +      compatible:
>> +        enum:
>> +            # Bulk storage for cache
> 
> Nit: What is "bulk storage"?
Probably to distinguish it from other storage a cache may have, such as tag-ram.
> The MPAM spec just refers to "cache" or "cache memory".
I figure these are comments, I'll remove them...
>> +          - arm,mpam-cache
>> +            # Memory bandwidth
>> +          - arm,mpam-memory
>> +
>> +      reg:
>> +        minimum: 0
>> +        maximum: 0xf
>> +
>> +      cpus:
>> +        description:
>> +          Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
>> +          device's affinity is used.
>> +
>> +      arm,mpam-device:
>> +        $ref: /schemas/types.yaml#/definitions/phandle
>> +        description:
>> +          By default, the MPAM enabled device associated with a RIS is the MSC's
> 
> Associated how?
By the phandle this is a description for.
> Is this the device where the physical resources managed by the MSC are located?
Yes,
>> +          parent node. It is possible for each RIS to be associated with different
>> +          devices in which case 'arm,mpam-device' should be used.
> 
> [...]
> 
>> +examples:
>> +  - |
>> +    L3: cache-controller@30000000 {
>> +        compatible = "arm,dsu-l3-cache", "cache";
>> +        cache-level = <3>;
>> +        cache-unified;
>> +
>> +        ranges = <0x0 0x30000000 0x800000>;
>> +        #address-cells = <1>;
>> +        #size-cells = <1>;
>> +
>> +        msc@10000 {
>> +            compatible = "arm,mpam-msc";
>> +
>> +            /* CPU affinity implied by parent cache node's  */
> 
> "node's" -> "nodes".
> 
> (or is this supposed to be in the singular -- i.e., the immediately
> parent cache node only?)
The MSC's parent cache node can be used to find the affinity.
I'll make it singular and drop the 's
> Anyway, it looks like this is commenting on the "reg" property, which
> doesn't seem right.
> 
> Is this comment supposed instead to explain the omission of the "cpus"
> property?  If so, that should be made clearer.
I'll move it to the end of the list of properties so it doesn't look like it belongs to
the one below it.
>> +            reg = <0x10000 0x2000>;
>> +            interrupts = <1>, <2>;
>> +            interrupt-names = "error", "overflow";
>> +            arm,not-ready-us = <1>;
>> +        };
>> +    };
Thanks,
James
- * Re: [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
  2025-09-05  9:11     ` James Morse
@ 2025-09-09 11:02       ` Dave Martin
  0 siblings, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-09-09 11:02 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi,
On Fri, Sep 05, 2025 at 10:11:03AM +0100, James Morse wrote:
> Hi Dave,
> 
> On 27/08/2025 17:22, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:50PM +0000, James Morse wrote:
> >> From: Rob Herring <robh@kernel.org>
> >>
> >> The binding is designed around the assumption that an MSC will be a
> >> sub-block of something else such as a memory controller, cache controller,
> >> or IOMMU. However, it's certainly possible a design does not have that
> >> association or has a mixture of both, so the binding illustrates how we can
> >> support that with RIS child nodes.
> >>
> >> A key part of MPAM is we need to know about all of the MSCs in the system
> >> before it can be enabled. This drives the need for the genericish
> >> 'arm,mpam-msc' compatible. Though we can't assume an MSC is accessible
> >> until a h/w specific driver potentially enables the h/w.
> 
> > I'll leave detailed review to other people for now, since I'm not so up
> > to speed on all things DT.
> 
> Me neither!
> 
> 
> >> diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
> > 
> > [...]
> > 
> >> @@ -0,0 +1,200 @@
> > 
> > [...]
> > 
> >> +title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
> >> +
> >> +description: |
> >> +  The Arm MPAM specification can be found here:
> >> +
> >> +  https://developer.arm.com/documentation/ddi0598/latest
> >> +
> >> +maintainers:
> >> +  - Rob Herring <robh@kernel.org>
> >> +
> >> +properties:
> >> +  compatible:
> >> +    items:
> >> +      - const: arm,mpam-msc                   # Further details are discoverable
> >> +      - const: arm,mpam-memory-controller-msc
> > 
> > There seems to be no clear statement about how these differ.
> 
> It's a more-specific compatible, I think these are usually things like:
> | compatible = "acme,mega-cache-9000", "arm,mpam-msc"
> 
> Where the driver can key errata-workaround on the vendor specific bit when needed.
> 
> In this case - I think they're examples, but Rob said they were supposed to be in some
> other list of compatible. (not sure what/where that is)
I guess I'll defer to the DT folks about how this ought to be presented.
The DT bindings are a weird hybrid of informal and formal that I'm not
really used to.
> >> +  reg:
> >> +    maxItems: 1
> >> +    description: A memory region containing registers as defined in the MPAM
> >> +      specification.
> 
> > There seems to be no handling of PCC-based MSCs here.  Should there be?
> 
> That is newer than this document. On DT platforms PCC is spelled SCMI, and is
> discoverable. Andre P prototyped this, (patches in the extras branch) but no-one
> has come out of the woodwork to say they actually need it yet.
> 
> ACPI PCC is a definite maybe.
>
> > If this can be added later in a backwards-compatible way, I guess
> > that's not a problem (and this is what compatible strings are for, if
> > all else fails.)
> > 
> > An explicit statement that PCC is not supported here might be helpful,
> > though.
> 
> I'm pretty sure it's discoverable on DT/SCMI platforms.
OK.  If this may not be needed, is discoverable and/or can be bolted on
in a compatible way later, I guess we wouldn't need to panic about it
just now.
(At least we can do that much more easily than promulgating an update
to the ACPI tables.)
> >> +  interrupts:
> >> +    minItems: 1
> >> +    items:
> >> +      - description: error (optional)
> >> +      - description: overflow (optional, only for monitoring)
> >> +
> >> +  interrupt-names:
> >> +    oneOf:
> >> +      - items:
> >> +          - enum: [ error, overflow ]
> >> +      - items:
> >> +          - const: error
> >> +          - const: overflow
> > 
> > Yeugh.  Is this really the only way to say "one or both of foo"?
> > 
> > (I don't know the answer to this -- though I can believe that it's
> > true.  Perhaps just not describing this property is another option.
> > Many bindings seem not to bother.)
> > 
> >> +
> >> +  arm,not-ready-us:
> >> +    description: The maximum time in microseconds for monitoring data to be
> >> +      accurate after a settings change. For more information, see the
> >> +      Not-Ready (NRDY) bit description in the MPAM specification.
> >> +
> >> +  numa-node-id: true # see NUMA binding
> >> +
> >> +  '#address-cells':
> >> +    const: 1
> >> +
> >> +  '#size-cells':
> >> +    const: 0
> >> +
> >> +patternProperties:
> >> +  '^ris@[0-9a-f]$':
> > 
> > Is this supposed to be '^ris@[0-9a-f]+$'?
> 
> Looks like yes. Fixed.
OK
> > Currently MPAMF_IDR.RIS_MAX is only 4 bits in size and so cannot be
> > greater than 0xf.  But it is not inconceivable that a future revision
> > of the architecture might enable more -- and there are 4 RES0 bits
> > looming over the RIS_MAX field, just waiting to be used...
> > 
> > (In any case, it feels wrong to try to enforce numeric bounds with a
> > regex, even in the cases where it happens to work straightforwardly.)
> > 
> >> +    type: object
> >> +    additionalProperties: false
> >> +    description:
> >> +      RIS nodes for each RIS in an MSC. These nodes are required for each RIS
> > 
> > The architectural term is "resource instance", not "RIS".
> > 
> > But "RIS nodes" is fine for describing the DT nodes, since we can call
> > them what we like, and "ris" is widely used inside the MPAM driver.
> 
> 
> > People writing DTs should not need to be familiar with the driver's
> > internal naming conventions, though.
> 
> What about the architecture's name for fields?
> This number goes in MPAMCFG_PART_SEL.RIS.
That's the identifier for the resource instance (= "Resource Instance
Selector", see e.g., ARM IHI 0099A.a Section 9.4.14 "MPAMCFG_PART_SEL,
MPAM Partition Configuration Selection Register").  The way I read this,
the contents of MPAMCFG_PART_SEL.RIS are just a numeric identifier,
rather than the thing being identified.
(I guess I am bikeshedding, here.  The chance for actual confusion
remains low.  I just find this use of "RIS" a bit dissonant.)
> > (There are other instances, but I won't comment on them all
> > individually.)
> > 
> >> +      implementing known MPAM controls
> >> +
> >> +    properties:
> >> +      compatible:
> >> +        enum:
> >> +            # Bulk storage for cache
> > 
> > Nit: What is "bulk storage"?
> 
> Probably to distinguish it from other storage a cache may have, such as tag-ram.
> 
> > The MPAM spec just refers to "cache" or "cache memory".
> 
> I figure these are comments, I'll remove them...
> 
> 
> >> +          - arm,mpam-cache
> >> +            # Memory bandwidth
> >> +          - arm,mpam-memory
I think that the meaning of "mpam-cache" is pretty obvious without
benefiting from a comment, but "mpam-memory" is not an obvious name for
memory _bandwidth_.  That probably still wants clarification.
> >> +
> >> +      reg:
> >> +        minimum: 0
> >> +        maximum: 0xf
> >> +
> >> +      cpus:
> >> +        description:
> >> +          Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
> >> +          device's affinity is used.
> >> +
> >> +      arm,mpam-device:
> >> +        $ref: /schemas/types.yaml#/definitions/phandle
> >> +        description:
> >> +          By default, the MPAM enabled device associated with a RIS is the MSC's
> > 
> > Associated how?
> 
> By the phandle this is a description for.
> 
> 
> > Is this the device where the physical resources managed by the MSC are located?
> 
> Yes,
OK, that's not "associated by the phandle".  It's a physical hardware
property.
[...]
> >> +examples:
> >> +  - |
> >> +    L3: cache-controller@30000000 {
> >> +        compatible = "arm,dsu-l3-cache", "cache";
> >> +        cache-level = <3>;
> >> +        cache-unified;
> >> +
> >> +        ranges = <0x0 0x30000000 0x800000>;
> >> +        #address-cells = <1>;
> >> +        #size-cells = <1>;
> >> +
> >> +        msc@10000 {
> >> +            compatible = "arm,mpam-msc";
> >> +
> >> +            /* CPU affinity implied by parent cache node's  */
> > 
> > "node's" -> "nodes".
> > 
> > (or is this supposed to be in the singular -- i.e., the immediately
> > parent cache node only?)
> 
> The MSC's parent cache node can be used to find the affinity.
> I'll make it singular and drop the 's
OK
> > Anyway, it looks like this is commenting on the "reg" property, which
> > doesn't seem right.
> > 
> > Is this comment supposed instead to explain the omission of the "cpus"
> > property?  If so, that should be made clearer.
> 
> 
> I'll move it to the end of the list of properties so it doesn't look like it belongs to
> the one below it.
Ack, that works.
Cheers
---Dave
- * [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (8 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-22 19:15   ` Markus Elfring
                     ` (5 more replies)
  2025-08-22 15:29 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
                   ` (57 subsequent siblings)
  67 siblings, 6 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.
Start with driver probe/remove and mapping the MSC.
CC: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Check for status=broken DT devices.
 * Moved all the files around.
 * Made Kconfig symbols depend on EXPERT
---
 arch/arm64/Kconfig              |   1 +
 drivers/Kconfig                 |   2 +
 drivers/Makefile                |   1 +
 drivers/resctrl/Kconfig         |  11 ++
 drivers/resctrl/Makefile        |   4 +
 drivers/resctrl/mpam_devices.c  | 336 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  62 ++++++
 7 files changed, 417 insertions(+)
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e51ccf1da102..ea3c54e04275 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ARM64_MPAM_DRIVER
 	select ACPI_MPAM if ACPI
 	help
 	  Memory Partitioning and Monitoring is an optional extension
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
 
 source "drivers/cdx/Kconfig"
 
+source "drivers/resctrl/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index b5749cf67044..f41cf4eddeba 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)		+= hte/
 obj-$(CONFIG_DRM_ACCEL)		+= accel/
 obj-$(CONFIG_CDX_BUS)		+= cdx/
 obj-$(CONFIG_DPLL)		+= dpll/
+obj-y				+= resctrl/
 
 obj-$(CONFIG_S390)		+= s390/
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..dff7b87280ab
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,11 @@
+# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
+# CPU resources, not containers or cgroups etc.
+config ARM64_MPAM_DRIVER
+	bool "MPAM driver for System IP, e.g. caches and memory controllers"
+	depends on ARM64_MPAM && EXPERT
+
+config ARM64_MPAM_DRIVER_DEBUG
+	bool "Enable debug messages from the MPAM driver."
+	depends on ARM64_MPAM_DRIVER
+	help
+	  Say yes here to enable debug messages from the MPAM driver.
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..92b48fa20108
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
+mpam-y						+= mpam_devices.o
+
+cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..a0d9a699a6e7
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+#include <acpi/pcc.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+static struct srcu_struct mpam_srcu;
+
+/* MPAM isn't available until all the MSC have been probed. */
+static u32 mpam_num_msc;
+
+static void mpam_discovery_complete(void)
+{
+	pr_err("Discovered all MSC\n");
+}
+
+static int mpam_dt_count_msc(void)
+{
+	int count = 0;
+	struct device_node *np;
+
+	for_each_compatible_node(np, NULL, "arm,mpam-msc") {
+		if (of_device_is_available(np))
+			count++;
+	}
+
+	return count;
+}
+
+static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
+				  u32 ris_idx)
+{
+	int err = 0;
+	u32 level = 0;
+	unsigned long cache_id;
+	struct device_node *cache;
+
+	do {
+		if (of_device_is_compatible(np, "arm,mpam-cache")) {
+			cache = of_parse_phandle(np, "arm,mpam-device", 0);
+			if (!cache) {
+				pr_err("Failed to read phandle\n");
+				break;
+			}
+		} else if (of_device_is_compatible(np->parent, "cache")) {
+			cache = of_node_get(np->parent);
+		} else {
+			/* For now, only caches are supported */
+			cache = NULL;
+			break;
+		}
+
+		err = of_property_read_u32(cache, "cache-level", &level);
+		if (err) {
+			pr_err("Failed to read cache-level\n");
+			break;
+		}
+
+		cache_id = cache_of_calculate_id(cache);
+		if (cache_id == ~0UL) {
+			err = -ENOENT;
+			break;
+		}
+
+		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
+				      cache_id);
+	} while (0);
+	of_node_put(cache);
+
+	return err;
+}
+
+static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
+{
+	int err, num_ris = 0;
+	const u32 *ris_idx_p;
+	struct device_node *iter, *np;
+
+	np = msc->pdev->dev.of_node;
+	for_each_child_of_node(np, iter) {
+		ris_idx_p = of_get_property(iter, "reg", NULL);
+		if (ris_idx_p) {
+			num_ris++;
+			err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
+			if (err) {
+				of_node_put(iter);
+				return err;
+			}
+		}
+	}
+
+	if (!num_ris)
+		mpam_dt_parse_resource(msc, np, 0);
+
+	return err;
+}
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
+ * corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+	struct device_node *parent;
+	u32 affinity_id;
+	int err;
+
+	if (!acpi_disabled) {
+		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+					       &affinity_id);
+		if (err)
+			cpumask_copy(&msc->accessibility, cpu_possible_mask);
+		else
+			acpi_pptt_get_cpus_from_container(affinity_id,
+							  &msc->accessibility);
+
+		return 0;
+	}
+
+	/* This depends on the path to of_node */
+	parent = of_get_parent(msc->pdev->dev.of_node);
+	if (parent == of_root) {
+		cpumask_copy(&msc->accessibility, cpu_possible_mask);
+		err = 0;
+	} else {
+		err = -EINVAL;
+		pr_err("Cannot determine accessibility of MSC: %s\n",
+		       dev_name(&msc->pdev->dev));
+	}
+	of_node_put(parent);
+
+	return err;
+}
+
+static int fw_num_msc;
+
+static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
+{
+	/* TODO: wake up tasks blocked on this MSC's PCC channel */
+}
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+	struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+	if (!msc)
+		return;
+
+	mutex_lock(&mpam_list_lock);
+	mpam_num_msc--;
+	platform_set_drvdata(pdev, NULL);
+	list_del_rcu(&msc->glbl_list);
+	synchronize_srcu(&mpam_srcu);
+	devm_kfree(&pdev->dev, msc);
+	mutex_unlock(&mpam_list_lock);
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	struct mpam_msc *msc;
+	struct resource *msc_res;
+	void *plat_data = pdev->dev.platform_data;
+
+	mutex_lock(&mpam_list_lock);
+	do {
+		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+		if (!msc) {
+			err = -ENOMEM;
+			break;
+		}
+
+		mutex_init(&msc->probe_lock);
+		mutex_init(&msc->part_sel_lock);
+		mutex_init(&msc->outer_mon_sel_lock);
+		raw_spin_lock_init(&msc->inner_mon_sel_lock);
+		msc->id = mpam_num_msc++;
+		msc->pdev = pdev;
+		INIT_LIST_HEAD_RCU(&msc->glbl_list);
+		INIT_LIST_HEAD_RCU(&msc->ris);
+
+		err = update_msc_accessibility(msc);
+		if (err)
+			break;
+		if (cpumask_empty(&msc->accessibility)) {
+			pr_err_once("msc:%u is not accessible from any CPU!",
+				    msc->id);
+			err = -EINVAL;
+			break;
+		}
+
+		if (device_property_read_u32(&pdev->dev, "pcc-channel",
+					     &msc->pcc_subspace_id))
+			msc->iface = MPAM_IFACE_MMIO;
+		else
+			msc->iface = MPAM_IFACE_PCC;
+
+		if (msc->iface == MPAM_IFACE_MMIO) {
+			void __iomem *io;
+
+			io = devm_platform_get_and_ioremap_resource(pdev, 0,
+								    &msc_res);
+			if (IS_ERR(io)) {
+				pr_err("Failed to map MSC base address\n");
+				err = PTR_ERR(io);
+				break;
+			}
+			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
+			msc->mapped_hwpage = io;
+		} else if (msc->iface == MPAM_IFACE_PCC) {
+			msc->pcc_cl.dev = &pdev->dev;
+			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
+			msc->pcc_cl.tx_block = false;
+			msc->pcc_cl.tx_tout = 1000; /* 1s */
+			msc->pcc_cl.knows_txdone = false;
+
+			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
+								 msc->pcc_subspace_id);
+			if (IS_ERR(msc->pcc_chan)) {
+				pr_err("Failed to request MSC PCC channel\n");
+				err = PTR_ERR(msc->pcc_chan);
+				break;
+			}
+		}
+
+		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
+		platform_set_drvdata(pdev, msc);
+	} while (0);
+	mutex_unlock(&mpam_list_lock);
+
+	if (!err) {
+		/* Create RIS entries described by firmware */
+		if (!acpi_disabled)
+			err = acpi_mpam_parse_resources(msc, plat_data);
+		else
+			err = mpam_dt_parse_resources(msc, plat_data);
+	}
+
+	if (!err && fw_num_msc == mpam_num_msc)
+		mpam_discovery_complete();
+
+	if (err && msc)
+		mpam_msc_drv_remove(pdev);
+
+	return err;
+}
+
+static const struct of_device_id mpam_of_match[] = {
+	{ .compatible = "arm,mpam-msc", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, mpam_of_match);
+
+static struct platform_driver mpam_msc_driver = {
+	.driver = {
+		.name = "mpam_msc",
+		.of_match_table = of_match_ptr(mpam_of_match),
+	},
+	.probe = mpam_msc_drv_probe,
+	.remove = mpam_msc_drv_remove,
+};
+
+/*
+ * MSC that are hidden under caches are not created as platform devices
+ * as there is no cache driver. Caches are also special-cased in
+ * update_msc_accessibility().
+ */
+static void mpam_dt_create_foundling_msc(void)
+{
+	int err;
+	struct device_node *cache;
+
+	for_each_compatible_node(cache, NULL, "cache") {
+		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
+		if (err)
+			pr_err("Failed to create MSC devices under caches\n");
+	}
+}
+
+static int __init mpam_msc_driver_init(void)
+{
+	if (!system_supports_mpam())
+		return -EOPNOTSUPP;
+
+	init_srcu_struct(&mpam_srcu);
+
+	if (!acpi_disabled)
+		fw_num_msc = acpi_mpam_count_msc();
+	else
+		fw_num_msc = mpam_dt_count_msc();
+
+	if (fw_num_msc <= 0) {
+		pr_err("No MSC devices found in firmware\n");
+		return -EINVAL;
+	}
+
+	if (acpi_disabled)
+		mpam_dt_create_foundling_msc();
+
+	return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..07e0f240eaca
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2024 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/resctrl.h>
+#include <linux/sizes.h>
+
+struct mpam_msc {
+	/* member of mpam_all_msc */
+	struct list_head        glbl_list;
+
+	int			id;
+	struct platform_device *pdev;
+
+	/* Not modified after mpam_is_enabled() becomes true */
+	enum mpam_msc_iface	iface;
+	u32			pcc_subspace_id;
+	struct mbox_client	pcc_cl;
+	struct pcc_mbox_chan	*pcc_chan;
+	u32			nrdy_usec;
+	cpumask_t		accessibility;
+
+	/*
+	 * probe_lock is only taken during discovery. After discovery these
+	 * properties become read-only and the lists are protected by SRCU.
+	 */
+	struct mutex		probe_lock;
+	unsigned long		ris_idxs[128 / BITS_PER_LONG];
+	u32			ris_max;
+
+	/* mpam_msc_ris of this component */
+	struct list_head	ris;
+
+	/*
+	 * part_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
+	 * by RIS).
+	 * If needed, take msc->lock first.
+	 */
+	struct mutex		part_sel_lock;
+
+	/*
+	 * mon_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_MON_SEL.
+	 * If needed, take msc->lock first.
+	 */
+	struct mutex		outer_mon_sel_lock;
+	raw_spinlock_t		inner_mon_sel_lock;
+	unsigned long		inner_mon_sel_flags;
+
+	void __iomem		*mapped_hwpage;
+	size_t			mapped_hwpage_sz;
+};
+
+#endif /* MPAM_INTERNAL_H */
-- 
2.20.1
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-08-22 19:15   ` Markus Elfring
  2025-08-22 19:55   ` Markus Elfring
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 200+ messages in thread
From: Markus Elfring @ 2025-08-22 19:15 UTC (permalink / raw)
  To: James Morse, linux-arm-kernel, linux-acpi, devicetree
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
	bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
	Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
	D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
	Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
	Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
	Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
	Will Deacon, Xin Hao
…
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
…
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
…
> +	mutex_lock(&mpam_list_lock);
> +	mpam_num_msc--;
…
> +	devm_kfree(&pdev->dev, msc);
> +	mutex_unlock(&mpam_list_lock);
> +}
…
Would you consider using a scope-based lock guard here, with a statement
like “guard(mutex)(&mpam_list_lock);”?
https://elixir.bootlin.com/linux/v6.17-rc2/source/include/linux/mutex.h#L228
Regards,
Markus
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
  2025-08-22 19:15   ` Markus Elfring
@ 2025-08-22 19:55   ` Markus Elfring
  2025-08-23  6:41     ` Greg Kroah-Hartman
  2025-08-27 13:03   ` Ben Horgan
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 200+ messages in thread
From: Markus Elfring @ 2025-08-22 19:55 UTC (permalink / raw)
  To: James Morse, linux-arm-kernel, linux-acpi, devicetree
  Cc: LKML, Amit Singh Tomar, Baisheng Gao, Baolin Wang,
	bobo.shaobowang, Carl Worth, Catalin Marinas, Conor Dooley,
	Danilo Krummrich, Dave Martin, David Hildenbrand, Drew Fustini,
	D Scott Phillips, Fenghua Yu, Greg Kroah-Hartman, Hanjun Guo,
	Jamie Iles, Jonathan Cameron, Koba Ko, Krzysztof Kozlowski,
	Len Brown, Linu Cherian, Lorenzo Pieralisi, Peter Newman,
	Rafael J. Wysocki, Rex Nie, Rob Herring, Rohit Mathew,
	Shameer Kolothum, Shanker Donthineni, Shaopeng Tan, Sudeep Holla,
	Will Deacon, Xin Hao
…
…
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
…
> +	} while (0);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (!err) {
> +		/* Create RIS entries described by firmware */
> +		if (!acpi_disabled)
> +			err = acpi_mpam_parse_resources(msc, plat_data);
> +		else
> +			err = mpam_dt_parse_resources(msc, plat_data);
> +	}
> +
> +	if (!err && fw_num_msc == mpam_num_msc)
> +		mpam_discovery_complete();
> +
> +	if (err && msc)
> +		mpam_msc_drv_remove(pdev);
> +
> +	return err;
> +}
…
* Would you like to integrate anything from the following source code variant?
	if (!err)
		/* Create RIS entries described by firmware */
		err = acpi_disabled
		      ? mpam_dt_parse_resources(msc, plat_data)
		      : acpi_mpam_parse_resources(msc, plat_data);
	if (err) {
		if (msc)
			mpam_msc_drv_remove(pdev);
	} else {
		if (fw_num_msc == mpam_num_msc)
			mpam_discovery_complete();
	}
* What do you think about using scope-based resource management in
  more places?
Regards,
Markus
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 19:55   ` Markus Elfring
@ 2025-08-23  6:41     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 200+ messages in thread
From: Greg Kroah-Hartman @ 2025-08-23  6:41 UTC (permalink / raw)
  To: Markus Elfring
  Cc: James Morse, linux-arm-kernel, linux-acpi, devicetree, LKML,
	Amit Singh Tomar, Baisheng Gao, Baolin Wang, bobo.shaobowang,
	Carl Worth, Catalin Marinas, Conor Dooley, Danilo Krummrich,
	Dave Martin, David Hildenbrand, Drew Fustini, D Scott Phillips,
	Fenghua Yu, Hanjun Guo, Jamie Iles, Jonathan Cameron, Koba Ko,
	Krzysztof Kozlowski, Len Brown, Linu Cherian, Lorenzo Pieralisi,
	Peter Newman, Rafael J. Wysocki, Rex Nie, Rob Herring,
	Rohit Mathew, Shameer Kolothum, Shanker Donthineni, Shaopeng Tan,
	Sudeep Holla, Will Deacon, Xin Hao
On Fri, Aug 22, 2025 at 09:55:33PM +0200, Markus Elfring wrote:
> …
> …
> > +static int mpam_msc_drv_probe(struct platform_device *pdev)
> > +{
> …
> > +	} while (0);
> > +	mutex_unlock(&mpam_list_lock);
> > +
> > +	if (!err) {
> > +		/* Create RIS entries described by firmware */
> > +		if (!acpi_disabled)
> > +			err = acpi_mpam_parse_resources(msc, plat_data);
> > +		else
> > +			err = mpam_dt_parse_resources(msc, plat_data);
> > +	}
> > +
> > +	if (!err && fw_num_msc == mpam_num_msc)
> > +		mpam_discovery_complete();
> > +
> > +	if (err && msc)
> > +		mpam_msc_drv_remove(pdev);
> > +
> > +	return err;
> > +}
> …
> 
> * Would you like to integrate anything from the following source code variant?
> 
> 	if (!err)
> 		/* Create RIS entries described by firmware */
> 		err = acpi_disabled
> 		      ? mpam_dt_parse_resources(msc, plat_data)
> 		      : acpi_mpam_parse_resources(msc, plat_data);
> 
> 	if (err) {
> 		if (msc)
> 			mpam_msc_drv_remove(pdev);
> 	} else {
> 		if (fw_num_msc == mpam_num_msc)
> 			mpam_discovery_complete();
> 	}
> 
> * How do you think about to increase the application of scope-based resource management
>   at further places?
> 
> 
> Regards,
> Markus
Hi,
This is the semi-friendly patch-bot of Greg Kroah-Hartman.
Markus, you seem to have sent a nonsensical or otherwise pointless
review comment to a patch submission on a Linux kernel developer mailing
list.  I strongly suggest that you not do this anymore.  Please do not
bother developers who are actively working to produce patches and
features with comments that, in the end, are a waste of time.
Patch submitter, please ignore Markus's suggestion; you do not need to
follow it at all.  The person/bot/AI that sent it is being ignored by
almost all Linux kernel maintainers for having a persistent pattern of
behavior of producing distracting and pointless commentary, and
inability to adapt to feedback.  Please feel free to also ignore emails
from them.
thanks,
greg k-h's patch email bot
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
  2025-08-22 19:15   ` Markus Elfring
  2025-08-22 19:55   ` Markus Elfring
@ 2025-08-27 13:03   ` Ben Horgan
  2025-09-05 18:48     ` James Morse
  2025-08-27 15:39   ` Rob Herring
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-27 13:03 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/resctrl/Kconfig         |  11 ++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 336 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  62 ++++++
>  7 files changed, 417 insertions(+)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e51ccf1da102..ea3c54e04275 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER
>  	select ACPI_MPAM if ACPI
>  	help
>  	  Memory Partitioning and Monitoring is an optional extension
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>  
>  source "drivers/cdx/Kconfig"
>  
> +source "drivers/resctrl/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)		+= hte/
>  obj-$(CONFIG_DRM_ACCEL)		+= accel/
>  obj-$(CONFIG_CDX_BUS)		+= cdx/
>  obj-$(CONFIG_DPLL)		+= dpll/
> +obj-y				+= resctrl/
>  
>  obj-$(CONFIG_S390)		+= s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
> +config ARM64_MPAM_DRIVER
> +	bool "MPAM driver for System IP, e.g. caches and memory controllers"
> +	depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver."
> +	depends on ARM64_MPAM_DRIVER
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
> +static u32 mpam_num_msc;
> +
> +static void mpam_discovery_complete(void)
> +{
> +	pr_err("Discovered all MSC\n");
> +}
> +
> +static int mpam_dt_count_msc(void)
> +{
> +	int count = 0;
> +	struct device_node *np;
> +
> +	for_each_compatible_node(np, NULL, "arm,mpam-msc") {
> +		if (of_device_is_available(np))
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> +				  u32 ris_idx)
> +{
> +	int err = 0;
> +	u32 level = 0;
> +	unsigned long cache_id;
> +	struct device_node *cache;
> +
> +	do {
> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
> +			if (!cache) {
> +				pr_err("Failed to read phandle\n");
> +				break;
> +			}
This looks like this allows "arm,mpam-cache" and "arm,mpam-device" to be
used on an msc node when there are no ris children. This usage could be
reasonable but doesn't match the schema in the previous patch. Should
this usage be rejected or the schema extended?
> +		} else if (of_device_is_compatible(np->parent, "cache")) {
> +			cache = of_node_get(np->parent);
> +		} else {
> +			/* For now, only caches are supported */
> +			cache = NULL;
> +			break;
> +		}
> +
> +		err = of_property_read_u32(cache, "cache-level", &level);
> +		if (err) {
> +			pr_err("Failed to read cache-level\n");
> +			break;
> +		}
> +
> +		cache_id = cache_of_calculate_id(cache);
> +		if (cache_id == ~0UL) {
> +			err = -ENOENT;
> +			break;
> +		}
> +
> +		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> +				      cache_id);
> +	} while (0);
> +	of_node_put(cache);
> +
> +	return err;
> +}
> +
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> +	int err, num_ris = 0;
> +	const u32 *ris_idx_p;
> +	struct device_node *iter, *np;
> +
> +	np = msc->pdev->dev.of_node;
> +	for_each_child_of_node(np, iter) {
> +		ris_idx_p = of_get_property(iter, "reg", NULL);
> +		if (ris_idx_p) {
> +			num_ris++;
> +			err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> +			if (err) {
> +				of_node_put(iter);
> +				return err;
> +			}
> +		}
> +	}
> +
> +	if (!num_ris)
> +		mpam_dt_parse_resource(msc, np, 0);
> +
> +	return err;
> +}
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	struct device_node *parent;
> +	u32 affinity_id;
> +	int err;
> +
> +	if (!acpi_disabled) {
> +		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +					       &affinity_id);
> +		if (err)
> +			cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +		else
> +			acpi_pptt_get_cpus_from_container(affinity_id,
> +							  &msc->accessibility);
> +
> +		return 0;
> +	}
> +
> +	/* This depends on the path to of_node */
> +	parent = of_get_parent(msc->pdev->dev.of_node);
> +	if (parent == of_root) {
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +		err = 0;
> +	} else {
> +		err = -EINVAL;
> +		pr_err("Cannot determine accessibility of MSC: %s\n",
> +		       dev_name(&msc->pdev->dev));
> +	}
> +	of_node_put(parent);
> +
> +	return err;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> +	/* TODO: wake up tasks blocked on this MSC's PCC channel */
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;
> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_num_msc--;
> +	platform_set_drvdata(pdev, NULL);
> +	list_del_rcu(&msc->glbl_list);
> +	synchronize_srcu(&mpam_srcu);
> +	devm_kfree(&pdev->dev, msc);
> +	mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	do {
> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +		if (!msc) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		mutex_init(&msc->probe_lock);
> +		mutex_init(&msc->part_sel_lock);
> +		mutex_init(&msc->outer_mon_sel_lock);
> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		msc->id = mpam_num_msc++;
> +		msc->pdev = pdev;
> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
> +		INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +		err = update_msc_accessibility(msc);
> +		if (err)
> +			break;
> +		if (cpumask_empty(&msc->accessibility)) {
> +			pr_err_once("msc:%u is not accessible from any CPU!",
> +				    msc->id);
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
> +					     &msc->pcc_subspace_id))
> +			msc->iface = MPAM_IFACE_MMIO;
> +		else
> +			msc->iface = MPAM_IFACE_PCC;
> +
> +		if (msc->iface == MPAM_IFACE_MMIO) {
> +			void __iomem *io;
> +
> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +								    &msc_res);
> +			if (IS_ERR(io)) {
> +				pr_err("Failed to map MSC base address\n");
> +				err = PTR_ERR(io);
> +				break;
> +			}
> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +			msc->mapped_hwpage = io;
> +		} else if (msc->iface == MPAM_IFACE_PCC) {
> +			msc->pcc_cl.dev = &pdev->dev;
> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> +			msc->pcc_cl.tx_block = false;
> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
> +			msc->pcc_cl.knows_txdone = false;
> +
> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> +								 msc->pcc_subspace_id);
> +			if (IS_ERR(msc->pcc_chan)) {
> +				pr_err("Failed to request MSC PCC channel\n");
> +				err = PTR_ERR(msc->pcc_chan);
> +				break;
> +			}
I don't see pcc support added in this series. Should we fail the probe
if this interface is specified?
(If keeping, there is a missing pcc_mbox_free_channel() on the error path.)
> +		}
> +
> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> +		platform_set_drvdata(pdev, msc);
> +	} while (0);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (!err) {
> +		/* Create RIS entries described by firmware */
> +		if (!acpi_disabled)
> +			err = acpi_mpam_parse_resources(msc, plat_data);
> +		else
> +			err = mpam_dt_parse_resources(msc, plat_data);
> +	}
> +
> +	if (!err && fw_num_msc == mpam_num_msc)
> +		mpam_discovery_complete();
> +
> +	if (err && msc)
> +		mpam_msc_drv_remove(pdev);
> +
> +	return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> +	{ .compatible = "arm,mpam-msc", },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +		.of_match_table = of_match_ptr(mpam_of_match),
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
> +static void mpam_dt_create_foundling_msc(void)
> +{
> +	int err;
> +	struct device_node *cache;
> +
> +	for_each_compatible_node(cache, NULL, "cache") {
> +		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> +		if (err)
> +			pr_err("Failed to create MSC devices under caches\n");
> +	}
> +}
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	if (!acpi_disabled)
> +		fw_num_msc = acpi_mpam_count_msc();
> +	else
> +		fw_num_msc = mpam_dt_count_msc();
> +
> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	if (acpi_disabled)
> +		mpam_dt_create_foundling_msc();
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head        glbl_list;
> +
> +	int			id;
> +	struct platform_device *pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			pcc_subspace_id;
> +	struct mbox_client	pcc_cl;
> +	struct pcc_mbox_chan	*pcc_chan;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only take during discovery. After discovery these
nit: s/take/taken/
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	/*
> +	 * mon_sel_lock protects access to the MSC hardware registers that are
> +	 * affeted by MPAMCFG_MON_SEL.
nit: s/affeted/affected/
> +	 * If needed, take msc->lock first.
> +	 */
> +	struct mutex		outer_mon_sel_lock;
> +	raw_spinlock_t		inner_mon_sel_lock;
> +	unsigned long		inner_mon_sel_flags;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
Thanks,
Ben
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-27 13:03   ` Ben Horgan
@ 2025-09-05 18:48     ` James Morse
  2025-09-08 10:54       ` Ben Horgan
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-05 18:48 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 27/08/2025 14:03, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..a0d9a699a6e7
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -0,0 +1,336 @@
>> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
>> +				  u32 ris_idx)
>> +{
>> +	int err = 0;
>> +	u32 level = 0;
>> +	unsigned long cache_id;
>> +	struct device_node *cache;
>> +
>> +	do {
>> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
>> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
>> +			if (!cache) {
>> +				pr_err("Failed to read phandle\n");
>> +				break;
>> +			}
> This looks like this allows "arm,mpam-cache" and "arm,mpam-device" to be
> used on an msc node when there are no ris children. This usage could be
> reasonable but doesn't match the schema in the previous patch. Should
> this usage be rejected or the schema extended?
The DT/ACPI stuff is only going to describe the things that make sense at a high level,
e.g. the controls for the L3. There may be other controls for stuff that doesn't make
sense in the hardware - these get discovered, grouped as 'unknown' and left alone.
Another angle on this is where there is an MSC that the OS will never make use of, but
needs to know about to find the system wide minimum value. (there is a comment about
this in the ACPI spec...)
I don't think it's a problem if the magic dt-binding machinery is overly restrictive, that
is about validating DTB files...
>> +		} else if (of_device_is_compatible(np->parent, "cache")) {
>> +			cache = of_node_get(np->parent);
>> +		} else {
>> +			/* For now, only caches are supported */
>> +			cache = NULL;
>> +			break;
>> +		}
>> +
>> +		err = of_property_read_u32(cache, "cache-level", &level);
>> +		if (err) {
>> +			pr_err("Failed to read cache-level\n");
>> +			break;
>> +		}
>> +
>> +		cache_id = cache_of_calculate_id(cache);
>> +		if (cache_id == ~0UL) {
>> +			err = -ENOENT;
>> +			break;
>> +		}
>> +
>> +		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
>> +				      cache_id);
>> +	} while (0);
>> +	of_node_put(cache);
>> +
>> +	return err;
>> +}
>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +	int err;
>> +	struct mpam_msc *msc;
>> +	struct resource *msc_res;
>> +	void *plat_data = pdev->dev.platform_data;
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	do {
>> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +		if (!msc) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		mutex_init(&msc->probe_lock);
>> +		mutex_init(&msc->part_sel_lock);
>> +		mutex_init(&msc->outer_mon_sel_lock);
>> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
>> +		msc->id = mpam_num_msc++;
>> +		msc->pdev = pdev;
>> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
>> +		INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +		err = update_msc_accessibility(msc);
>> +		if (err)
>> +			break;
>> +		if (cpumask_empty(&msc->accessibility)) {
>> +			pr_err_once("msc:%u is not accessible from any CPU!",
>> +				    msc->id);
>> +			err = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>> +					     &msc->pcc_subspace_id))
>> +			msc->iface = MPAM_IFACE_MMIO;
>> +		else
>> +			msc->iface = MPAM_IFACE_PCC;
>> +
>> +		if (msc->iface == MPAM_IFACE_MMIO) {
>> +			void __iomem *io;
>> +
>> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +								    &msc_res);
>> +			if (IS_ERR(io)) {
>> +				pr_err("Failed to map MSC base address\n");
>> +				err = PTR_ERR(io);
>> +				break;
>> +			}
>> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +			msc->mapped_hwpage = io;
>> +		} else if (msc->iface == MPAM_IFACE_PCC) {
>> +			msc->pcc_cl.dev = &pdev->dev;
>> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
>> +			msc->pcc_cl.tx_block = false;
>> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
>> +			msc->pcc_cl.knows_txdone = false;
>> +
>> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
>> +								 msc->pcc_subspace_id);
>> +			if (IS_ERR(msc->pcc_chan)) {
>> +				pr_err("Failed to request MSC PCC channel\n");
>> +				err = PTR_ERR(msc->pcc_chan);
>> +				break;
>> +			}
> I don't see pcc support added in this series. Should we fail the probe
> if this interface is specified?
I've got patches from Andre P to support it on DT - but the platforms that need it keep
popping in and out of existence. I'll pull these bits out - they were intended to check
the ACPI table wasn't totally rotten...
> (If keeping, there is a missing pcc_mbox_free_channel() on the error path.)
When pcc_mbox_request_channel() fails? It already called mbox_free_channel() itself.
>> +		}
>> +
>> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
>> +		platform_set_drvdata(pdev, msc);
>> +	} while (0);
>> +	mutex_unlock(&mpam_list_lock);
>> +
>> +	if (!err) {
>> +		/* Create RIS entries described by firmware */
>> +		if (!acpi_disabled)
>> +			err = acpi_mpam_parse_resources(msc, plat_data);
>> +		else
>> +			err = mpam_dt_parse_resources(msc, plat_data);
>> +	}
>> +
>> +	if (!err && fw_num_msc == mpam_num_msc)
>> +		mpam_discovery_complete();
>> +
>> +	if (err && msc)
>> +		mpam_msc_drv_remove(pdev);
>> +
>> +	return err;
>> +}
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> new file mode 100644
>> index 000000000000..07e0f240eaca
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -0,0 +1,62 @@
>> +struct mpam_msc {
>> +	/* member of mpam_all_msc */
>> +	struct list_head        glbl_list;
>> +
>> +	int			id;
>> +	struct platform_device *pdev;
>> +
>> +	/* Not modified after mpam_is_enabled() becomes true */
>> +	enum mpam_msc_iface	iface;
>> +	u32			pcc_subspace_id;
>> +	struct mbox_client	pcc_cl;
>> +	struct pcc_mbox_chan	*pcc_chan;
>> +	u32			nrdy_usec;
>> +	cpumask_t		accessibility;
>> +
>> +	/*
>> +	 * probe_lock is only take during discovery. After discovery these
> nit: s/take/taken/
Fixed,
>> +	 * properties become read-only and the lists are protected by SRCU.
>> +	 */
>> +	struct mutex		probe_lock;
>> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
>> +	u32			ris_max;
>> +
>> +	/* mpam_msc_ris of this component */
>> +	struct list_head	ris;
>> +
>> +	/*
>> +	 * part_sel_lock protects access to the MSC hardware registers that are
>> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>> +	 * by RIS).
>> +	 * If needed, take msc->lock first.
>> +	 */
>> +	struct mutex		part_sel_lock;
>> +
>> +	/*
>> +	 * mon_sel_lock protects access to the MSC hardware registers that are
>> +	 * affeted by MPAMCFG_MON_SEL.
> nit: s/affeted/affected/
Fixed,
>> +	 * If needed, take msc->lock first.
>> +	 */
>> +	struct mutex		outer_mon_sel_lock;
>> +	raw_spinlock_t		inner_mon_sel_lock;
>> +	unsigned long		inner_mon_sel_flags;
>> +
>> +	void __iomem		*mapped_hwpage;
>> +	size_t			mapped_hwpage_sz;
>> +};
>> +
>> +#endif /* MPAM_INTERNAL_H */
Thanks,
James
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-05 18:48     ` James Morse
@ 2025-09-08 10:54       ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-08 10:54 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 9/5/25 19:48, James Morse wrote:
> Hi Ben,
> 
> On 27/08/2025 14:03, Ben Horgan wrote:
>> On 8/22/25 16:29, James Morse wrote:
>>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>>> only be accessible from those CPUs, and they may not be online.
>>> Touching the hardware early is pointless as MPAM can't be used until
>>> the system-wide common values for num_partid and num_pmg have been
>>> discovered.
>>>
>>> Start with driver probe/remove and mapping the MSC.
> 
>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>>> new file mode 100644
>>> index 000000000000..a0d9a699a6e7
>>> --- /dev/null
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> @@ -0,0 +1,336 @@
> 
>>> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
>>> +				  u32 ris_idx)
>>> +{
>>> +	int err = 0;
>>> +	u32 level = 0;
>>> +	unsigned long cache_id;
>>> +	struct device_node *cache;
>>> +
>>> +	do {
>>> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
>>> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
>>> +			if (!cache) {
>>> +				pr_err("Failed to read phandle\n");
>>> +				break;
>>> +			}
>> This looks like this allows "arm,mpam-cache" and "arm,mpam-device" to be
>> used on an msc node when there are no ris children. This usage could be
>> reasonable but doesn't match the schema in the previous patch. Should
>> this usage be rejected or the schema extended?
> 
> The DT/ACPI stuff is only going to describe the things that make sense at a high level,
> e.g. the controls for the L3. There may be other controls for stuff that doesn't make
> sense in the hardware - these get discovered, grouped as 'unknown' and left alone.
> 
> Another angle on this is where there is an MSC that the OS will never make use of, but
> needs to know about to find the system wide minimum value. (there is a comment about
> this in the ACPI spec...)
> 
> I don't think it's a problem if the magic dt-binding machinery is overly restrictive, that
> is about validating DTB files...
I agree with your points. However, I was rather thinking that the code
allows more ways to describe the same thing than the schema does. In
that, you could write something like:
msc@80000 {
        compatible = "foo,a-standalone-msc";
        reg = <0x80000 0x1000>;
        ...
        msc@10000 {
                compatible = "arm,mpam-msc", "arm,mpam-cache";
                arm,mpam-device = <&mem>;
                ...
        };
};
Although, now I've written this out, it doesn't seem sensible to worry
about this. Using ris compatibles on an msc, as in my example, is
clearly an error.
> 
>
[snip]
>>> +		} else if (msc->iface == MPAM_IFACE_PCC) {
>>> +			msc->pcc_cl.dev = &pdev->dev;
>>> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
>>> +			msc->pcc_cl.tx_block = false;
>>> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
>>> +			msc->pcc_cl.knows_txdone = false;
>>> +
>>> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
>>> +								 msc->pcc_subspace_id);
>>> +			if (IS_ERR(msc->pcc_chan)) {
>>> +				pr_err("Failed to request MSC PCC channel\n");
>>> +				err = PTR_ERR(msc->pcc_chan);
>>> +				break;
>>> +			}
>> I don't see pcc support added in this series. Should we fail the probe
>> if this interface is specified?
> 
> I've got patches from Andre P to support it on DT - but the platforms that need it keep
> popping in and out of existence. I'll pull these bits out - they were intended to check
> the ACPI table wasn't totally rotten...
> 
> 
>> (If keeping, there is a missing pcc_mbox_free_channel() on the error path.)
> 
> When pcc_mbox_request_channel() fails? It already called mbox_free_channel() itself.
Apologies, I was referring to the case where the *_parse_resources() call
below fails.
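For illustration only, the unwind being discussed can be sketched in userspace C. All mock_* names below are hypothetical stand-ins (for pcc_mbox_request_channel(), pcc_mbox_free_channel() and the probe function), and the parse_err parameter simply simulates a failure from the later resource-parsing step. This is a sketch of the pattern, not the driver's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a PCC mailbox channel. */
struct mock_chan {
	int in_use;
};

static struct mock_chan the_chan;

/* Stand-in for requesting the channel: marks it as held. */
static struct mock_chan *mock_request_channel(void)
{
	the_chan.in_use = 1;
	return &the_chan;
}

/* Stand-in for freeing the channel: marks it as released. */
static void mock_free_channel(struct mock_chan *chan)
{
	if (chan)
		chan->in_use = 0;
}

/*
 * parse_err simulates the result of the resource-parsing step that
 * runs after the channel has been acquired. On failure the channel
 * is released before the error is returned.
 */
static int mock_probe(int parse_err)
{
	struct mock_chan *chan = mock_request_channel();
	int err = parse_err;

	if (err)
		goto err_free_chan;

	return 0;

err_free_chan:
	mock_free_channel(chan);
	return err;
}
```

The point of the sketch is only that anything acquired before the parsing step has to be released when that step fails.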
> 
> 
>>> +		}
>>> +
>>> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
>>> +		platform_set_drvdata(pdev, msc);
>>> +	} while (0);
>>> +	mutex_unlock(&mpam_list_lock);
>>> +
>>> +	if (!err) {
>>> +		/* Create RIS entries described by firmware */
>>> +		if (!acpi_disabled)
>>> +			err = acpi_mpam_parse_resources(msc, plat_data);
>>> +		else
>>> +			err = mpam_dt_parse_resources(msc, plat_data);
>>> +	}
>>> +
>>> +	if (!err && fw_num_msc == mpam_num_msc)
>>> +		mpam_discovery_complete();
>>> +
>>> +	if (err && msc)
>>> +		mpam_msc_drv_remove(pdev);
>>> +
>>> +	return err;
>>> +}
[snip]
>
> Thanks,
> 
> James
Thanks,
Ben
 
 
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                     ` (2 preceding siblings ...)
  2025-08-27 13:03   ` Ben Horgan
@ 2025-08-27 15:39   ` Rob Herring
  2025-08-27 16:16     ` Rob Herring
  2025-09-05 18:52     ` James Morse
  2025-09-01  9:11   ` Ben Horgan
  2025-09-01 11:21   ` Dave Martin
  5 siblings, 2 replies; 200+ messages in thread
From: Rob Herring @ 2025-08-27 15:39 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
>
> Start with driver probe/remove and mapping the MSC.
>
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Check for status=broken DT devices.
There is no such status. A node can be 'disabled' for a variety of reasons.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/resctrl/Kconfig         |  11 ++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 336 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  62 ++++++
>  7 files changed, 417 insertions(+)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e51ccf1da102..ea3c54e04275 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>
>  config ARM64_MPAM
>         bool "Enable support for MPAM"
> +       select ARM64_MPAM_DRIVER
>         select ACPI_MPAM if ACPI
>         help
>           Memory Partitioning and Monitoring is an optional extension
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>
>  source "drivers/cdx/Kconfig"
>
> +source "drivers/resctrl/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)           += hte/
>  obj-$(CONFIG_DRM_ACCEL)                += accel/
>  obj-$(CONFIG_CDX_BUS)          += cdx/
>  obj-$(CONFIG_DPLL)             += dpll/
> +obj-y                          += resctrl/
>
>  obj-$(CONFIG_S390)             += s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
> +config ARM64_MPAM_DRIVER
> +       bool "MPAM driver for System IP, e,g. caches and memory controllers"
> +       depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> +       bool "Enable debug messages from the MPAM driver."
> +       depends on ARM64_MPAM_DRIVER
> +       help
> +         Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)                        += mpam.o
> +mpam-y                                         += mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)       += -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
Given the 2024 below, should this be 2024-2025?
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
> +static u32 mpam_num_msc;
> +
> +static void mpam_discovery_complete(void)
> +{
> +       pr_err("Discovered all MSC\n");
Perhaps print out how many MSCs.
> +}
> +
> +static int mpam_dt_count_msc(void)
> +{
> +       int count = 0;
> +       struct device_node *np;
> +
> +       for_each_compatible_node(np, NULL, "arm,mpam-msc") {
> +               if (of_device_is_available(np))
> +                       count++;
> +       }
> +
> +       return count;
> +}
> +
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> +                                 u32 ris_idx)
> +{
> +       int err = 0;
> +       u32 level = 0;
> +       unsigned long cache_id;
> +       struct device_node *cache;
> +
> +       do {
> +               if (of_device_is_compatible(np, "arm,mpam-cache")) {
> +                       cache = of_parse_phandle(np, "arm,mpam-device", 0);
> +                       if (!cache) {
> +                               pr_err("Failed to read phandle\n");
> +                               break;
> +                       }
> +               } else if (of_device_is_compatible(np->parent, "cache")) {
Don't access device_node members. I'm trying to make it opaque. And
technically it can be racy to access parent ptr when/if nodes are
dynamic. I think this should suffice:
else {
  cache = of_get_parent(np);
  if (!of_device_is_compatible(cache, "cache")) {
    cache = NULL;
    break;
  }
}
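A userspace sketch of the reference counting involved, for illustration. of_get_parent() returns the parent with its refcount raised, so the caller has to balance it with of_node_put(). The mock_* types and helpers below are hypothetical stand-ins for the OF API, not kernel code. The sketch also drops the reference before giving up on a non-cache parent, which is an assumption about what a final version would want.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct device_node. */
struct mock_node {
	struct mock_node *parent;
	const char *compatible;
	int refcount;
};

/* Like of_get_parent(): returns the parent with a reference taken. */
static struct mock_node *mock_get_parent(struct mock_node *np)
{
	if (!np || !np->parent)
		return NULL;
	np->parent->refcount++;
	return np->parent;
}

/* Like of_node_put(): drops one reference, NULL is a no-op. */
static void mock_node_put(struct mock_node *np)
{
	if (np)
		np->refcount--;
}

/* Like of_device_is_compatible(), for a single compatible string. */
static int mock_is_compatible(const struct mock_node *np, const char *compat)
{
	return np && np->compatible && strcmp(np->compatible, compat) == 0;
}

/*
 * Returns the parent cache node with a reference held, or NULL with
 * no reference left behind when the parent is not a cache.
 */
static struct mock_node *mock_find_parent_cache(struct mock_node *np)
{
	struct mock_node *cache = mock_get_parent(np);

	if (!mock_is_compatible(cache, "cache")) {
		mock_node_put(cache);
		return NULL;
	}
	return cache;
}
```

The caller still owns the returned reference and releases it once it has finished with the cache node.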
> +                       cache = of_node_get(np->parent);
> +               } else {
> +                       /* For now, only caches are supported */
> +                       cache = NULL;
> +                       break;
> +               }
> +
> +               err = of_property_read_u32(cache, "cache-level", &level);
> +               if (err) {
> +                       pr_err("Failed to read cache-level\n");
> +                       break;
> +               }
> +
> +               cache_id = cache_of_calculate_id(cache);
> +               if (cache_id == ~0UL) {
> +                       err = -ENOENT;
> +                       break;
> +               }
> +
> +               err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> +                                     cache_id);
> +       } while (0);
> +       of_node_put(cache);
> +
> +       return err;
> +}
> +
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> +       int err, num_ris = 0;
> +       const u32 *ris_idx_p;
> +       struct device_node *iter, *np;
> +
> +       np = msc->pdev->dev.of_node;
> +       for_each_child_of_node(np, iter) {
Use for_each_available_child_of_node_scoped()
> +               ris_idx_p = of_get_property(iter, "reg", NULL);
This is broken on big endian and new users of of_get_property() are
discouraged. Use of_property_read_reg().
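For background, a userspace sketch of the byte-order issue. Devicetree property cells are stored big-endian, and of_get_property() hands back the raw bytes, so a plain dereference is only correct when the host byte order happens to match. The of_property_read_*() helpers do the conversion. cell_to_cpu() below is a hypothetical name used for illustration, not a kernel function.

```c
#include <stdint.h>

/*
 * Decode one 32-bit devicetree cell. The bytes are assembled
 * explicitly, most significant first, so the result does not
 * depend on the byte order of the machine running the code.
 */
static uint32_t cell_to_cpu(const uint8_t cell[4])
{
	return ((uint32_t)cell[0] << 24) |
	       ((uint32_t)cell[1] << 16) |
	       ((uint32_t)cell[2] << 8) |
	       (uint32_t)cell[3];
}
```

A "reg = <1>" property is the byte sequence 00 00 00 01, which this decodes to 1 on any host.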
> +               if (ris_idx_p) {
> +                       num_ris++;
> +                       err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> +                       if (err) {
> +                               of_node_put(iter);
And then drop the put.
> +                               return err;
> +                       }
> +               }
> +       }
> +
> +       if (!num_ris)
> +               mpam_dt_parse_resource(msc, np, 0);
> +
> +       return err;
> +}
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +       struct device_node *parent;
> +       u32 affinity_id;
> +       int err;
> +
> +       if (!acpi_disabled) {
> +               err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +                                              &affinity_id);
> +               if (err)
> +                       cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +               else
> +                       acpi_pptt_get_cpus_from_container(affinity_id,
> +                                                         &msc->accessibility);
> +
> +               return 0;
> +       }
> +
> +       /* This depends on the path to of_node */
I'm failing to understand what has to be at the root node?
> +       parent = of_get_parent(msc->pdev->dev.of_node);
> +       if (parent == of_root) {
> +               cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +               err = 0;
> +       } else {
> +               err = -EINVAL;
> +               pr_err("Cannot determine accessibility of MSC: %s\n",
> +                      dev_name(&msc->pdev->dev));
> +       }
> +       of_node_put(parent);
> +
> +       return err;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> +       /* TODO: wake up tasks blocked on this MSC's PCC channel */
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +       struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +       if (!msc)
> +               return;
> +
> +       mutex_lock(&mpam_list_lock);
> +       mpam_num_msc--;
> +       platform_set_drvdata(pdev, NULL);
> +       list_del_rcu(&msc->glbl_list);
> +       synchronize_srcu(&mpam_srcu);
> +       devm_kfree(&pdev->dev, msc);
This should happen automagically.
> +       mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +       int err;
> +       struct mpam_msc *msc;
> +       struct resource *msc_res;
> +       void *plat_data = pdev->dev.platform_data;
> +
> +       mutex_lock(&mpam_list_lock);
> +       do {
> +               msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +               if (!msc) {
> +                       err = -ENOMEM;
> +                       break;
> +               }
> +
> +               mutex_init(&msc->probe_lock);
> +               mutex_init(&msc->part_sel_lock);
> +               mutex_init(&msc->outer_mon_sel_lock);
> +               raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +               msc->id = mpam_num_msc++;
Multiple probe functions can run in parallel, so this needs to be
atomic. Maybe it is with mpam_list_lock, but then the name of the
mutex is misleading given this is not the list. It's not really clear
to me what all needs the mutex here. Certainly a lot of it doesn't.
Like everything else above here except the increment.
> +               msc->pdev = pdev;
> +               INIT_LIST_HEAD_RCU(&msc->glbl_list);
> +               INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +               err = update_msc_accessibility(msc);
> +               if (err)
> +                       break;
> +               if (cpumask_empty(&msc->accessibility)) {
> +                       pr_err_once("msc:%u is not accessible from any CPU!",
> +                                   msc->id);
> +                       err = -EINVAL;
> +                       break;
> +               }
> +
> +               if (device_property_read_u32(&pdev->dev, "pcc-channel",
Does this property apply to DT? It would as the code is written. It is
not documented though.
> +                                            &msc->pcc_subspace_id))
> +                       msc->iface = MPAM_IFACE_MMIO;
> +               else
> +                       msc->iface = MPAM_IFACE_PCC;
> +
> +               if (msc->iface == MPAM_IFACE_MMIO) {
> +                       void __iomem *io;
> +
> +                       io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +                                                                   &msc_res);
> +                       if (IS_ERR(io)) {
> +                               pr_err("Failed to map MSC base address\n");
> +                               err = PTR_ERR(io);
> +                               break;
> +                       }
> +                       msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +                       msc->mapped_hwpage = io;
> +               } else if (msc->iface == MPAM_IFACE_PCC) {
> +                       msc->pcc_cl.dev = &pdev->dev;
> +                       msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> +                       msc->pcc_cl.tx_block = false;
> +                       msc->pcc_cl.tx_tout = 1000; /* 1s */
> +                       msc->pcc_cl.knows_txdone = false;
> +
> +                       msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> +                                                                msc->pcc_subspace_id);
> +                       if (IS_ERR(msc->pcc_chan)) {
> +                               pr_err("Failed to request MSC PCC channel\n");
> +                               err = PTR_ERR(msc->pcc_chan);
> +                               break;
> +                       }
> +               }
> +
> +               list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> +               platform_set_drvdata(pdev, msc);
> +       } while (0);
> +       mutex_unlock(&mpam_list_lock);
> +
> +       if (!err) {
> +               /* Create RIS entries described by firmware */
> +               if (!acpi_disabled)
> +                       err = acpi_mpam_parse_resources(msc, plat_data);
> +               else
> +                       err = mpam_dt_parse_resources(msc, plat_data);
Isn't there a race here if an error occurs since you already added the
MSC to the list? Something like this sequence with 2 MSCs:
device 1 probe
device 1 added
    device 2 probe
    device 2 added
device 1 calls mpam_discovery_complete()
    device 2 error on parse_resources
    device 2 removed
> +       }
> +
> +       if (!err && fw_num_msc == mpam_num_msc)
> +               mpam_discovery_complete();
> +
> +       if (err && msc)
> +               mpam_msc_drv_remove(pdev);
> +
> +       return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> +       { .compatible = "arm,mpam-msc", },
> +       {},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> +       .driver = {
> +               .name = "mpam_msc",
> +               .of_match_table = of_match_ptr(mpam_of_match),
> +       },
> +       .probe = mpam_msc_drv_probe,
> +       .remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
> +static void mpam_dt_create_foundling_msc(void)
> +{
> +       int err;
> +       struct device_node *cache;
> +
> +       for_each_compatible_node(cache, NULL, "cache") {
> +               err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
This is going to create platform devices for all caches (except L1)
regardless of whether they support MPAM or not. Isn't it likely or
possible that only L3 or SLC caches support MPAM?
> +               if (err)
> +                       pr_err("Failed to create MSC devices under caches\n");
> +       }
> +}
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +       if (!system_supports_mpam())
> +               return -EOPNOTSUPP;
> +
> +       init_srcu_struct(&mpam_srcu);
> +
> +       if (!acpi_disabled)
> +               fw_num_msc = acpi_mpam_count_msc();
> +       else
> +               fw_num_msc = mpam_dt_count_msc();
> +
> +       if (fw_num_msc <= 0) {
> +               pr_err("No MSC devices found in firmware\n");
> +               return -EINVAL;
> +       }
> +
> +       if (acpi_disabled)
> +               mpam_dt_create_foundling_msc();
> +
> +       return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
It's 2025.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> +       /* member of mpam_all_msc */
> +       struct list_head        glbl_list;
> +
> +       int                     id;
> +       struct platform_device *pdev;
> +
> +       /* Not modified after mpam_is_enabled() becomes true */
> +       enum mpam_msc_iface     iface;
> +       u32                     pcc_subspace_id;
> +       struct mbox_client      pcc_cl;
> +       struct pcc_mbox_chan    *pcc_chan;
> +       u32                     nrdy_usec;
> +       cpumask_t               accessibility;
> +
> +       /*
> +        * probe_lock is only take during discovery. After discovery these
s/take/taken/
> +        * properties become read-only and the lists are protected by SRCU.
> +        */
> +       struct mutex            probe_lock;
> +       unsigned long           ris_idxs[128 / BITS_PER_LONG];
> +       u32                     ris_max;
> +
> +       /* mpam_msc_ris of this component */
> +       struct list_head        ris;
> +
> +       /*
> +        * part_sel_lock protects access to the MSC hardware registers that are
> +        * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +        * by RIS).
> +        * If needed, take msc->lock first.
Stale comment? I don't see any 'lock' member.
> +        */
> +       struct mutex            part_sel_lock;
> +
> +       /*
> +        * mon_sel_lock protects access to the MSC hardware registers that are
> +        * affeted by MPAMCFG_MON_SEL.
> +        * If needed, take msc->lock first.
> +        */
> +       struct mutex            outer_mon_sel_lock;
> +       raw_spinlock_t          inner_mon_sel_lock;
> +       unsigned long           inner_mon_sel_flags;
> +
> +       void __iomem            *mapped_hwpage;
> +       size_t                  mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
> --
> 2.20.1
>
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-27 15:39   ` Rob Herring
@ 2025-08-27 16:16     ` Rob Herring
  2025-09-05 18:52       ` James Morse
  2025-09-05 18:52     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Rob Herring @ 2025-08-27 16:16 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
On Wed, Aug 27, 2025 at 10:39 AM Rob Herring <robh@kernel.org> wrote:
>
> On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
> >
> > Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> > only be accessible from those CPUs, and they may not be online.
> > Touching the hardware early is pointless as MPAM can't be used until
> > the system-wide common values for num_partid and num_pmg have been
> > discovered.
[...]
> > +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> > +{
> > +       int err, num_ris = 0;
> > +       const u32 *ris_idx_p;
> > +       struct device_node *iter, *np;
> > +
> > +       np = msc->pdev->dev.of_node;
> > +       for_each_child_of_node(np, iter) {
>
> Use for_each_available_child_of_node_scoped()
>
> > +               ris_idx_p = of_get_property(iter, "reg", NULL);
>
> This is broken on big endian and new users of of_get_property() are
> discouraged. Use of_property_read_reg().
Err, this is broken on little endian as the DT is big endian.
So this was obviously not tested as I'm confident you didn't test on BE.
Rob
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-27 16:16     ` Rob Herring
@ 2025-09-05 18:52       ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-05 18:52 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Rob,
On 27/08/2025 17:16, Rob Herring wrote:
> On Wed, Aug 27, 2025 at 10:39 AM Rob Herring <robh@kernel.org> wrote:
>>
>> On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>>>
>>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>>> only be accessible from those CPUs, and they may not be online.
>>> Touching the hardware early is pointless as MPAM can't be used until
>>> the system-wide common values for num_partid and num_pmg have been
>>> discovered.
>
> [...]
>
>>> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
>>> +{
>>> +       int err, num_ris = 0;
>>> +       const u32 *ris_idx_p;
>>> +       struct device_node *iter, *np;
>>> +
>>> +       np = msc->pdev->dev.of_node;
>>> +       for_each_child_of_node(np, iter) {
>>
>> Use for_each_available_child_of_node_scoped()
>>
>>> +               ris_idx_p = of_get_property(iter, "reg", NULL);
>>
>> This is broken on big endian and new users of of_get_property() are
>> discouraged. Use of_property_read_reg().
>
> Err, this is broken on little endian as the DT is big endian.
>
> So this was obviously not tested as I'm confident you didn't test on BE.
'not tested' is shades of grey. I fed the FVP ~6 different DTB files to hit the different
paths through the driver. The FVP only has controls under RIS-0, so all of those only
defined RIS-0, and unsurprisingly didn't notice the helper isn't endian safe.
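To illustrate why an all-RIS-0 test set hides the problem, here is a minimal userspace sketch (not the driver code). It assumes a "reg" cell stored the way a DTB stores it, as four bytes with the most significant byte first. The names reg_cell, zero_cell, read_cell_raw() and read_cell_be32() are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for "reg = <1>;": one 32-bit cell, big endian. */
static const uint8_t reg_cell[4] = { 0x00, 0x00, 0x00, 0x01 };

/* Hypothetical stand-in for "reg = <0>;". */
static const uint8_t zero_cell[4] = { 0x00, 0x00, 0x00, 0x00 };

/* Roughly what dereferencing the raw property pointer does: host byte order. */
static uint32_t read_cell_raw(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Endian-safe read: assemble the value from the big-endian bytes explicitly. */
static uint32_t read_cell_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

On a little-endian host read_cell_raw(reg_cell) yields 0x01000000 rather than 1, while a cell holding 0 reads as 0 either way, which is consistent with RIS-0-only DTBs not tripping over it.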
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-27 15:39   ` Rob Herring
  2025-08-27 16:16     ` Rob Herring
@ 2025-09-05 18:52     ` James Morse
  1 sibling, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-05 18:52 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Rob,
On 27/08/2025 16:39, Rob Herring wrote:
> On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>>
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..a0d9a699a6e7
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -0,0 +1,336 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2025 Arm Ltd.
>
> Given the 2024 below, should this be 2024-2025?
I've never known what this year is really for. People clearly expect it to be this year
... I've evidently missed one somewhere. I'll fix that. ... I wrote some of this code in
2018, so there are a range of options on what it 'should' be ...
>> +
>> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>> +
>> +#include <linux/acpi.h>
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cacheinfo.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/device.h>
>> +#include <linux/errno.h>
>> +#include <linux/gfp.h>
>> +#include <linux/list.h>
>> +#include <linux/lockdep.h>
>> +#include <linux/mutex.h>
>> +#include <linux/of.h>
>> +#include <linux/of_platform.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/printk.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/srcu.h>
>> +#include <linux/types.h>
>> +
>> +#include <acpi/pcc.h>
>> +
>> +#include "mpam_internal.h"
>> +
>> +/*
>> + * mpam_list_lock protects the SRCU lists when writing. Once the
>> + * mpam_enabled key is enabled these lists are read-only,
>> + * unless the error interrupt disables the driver.
>> + */
>> +static DEFINE_MUTEX(mpam_list_lock);
>> +static LIST_HEAD(mpam_all_msc);
>> +
>> +static struct srcu_struct mpam_srcu;
>> +
>> +/* MPAM isn't available until all the MSC have been probed. */
>> +static u32 mpam_num_msc;
>> +
>> +static void mpam_discovery_complete(void)
>> +{
>> +       pr_err("Discovered all MSC\n");
> Perhaps print out how many MSCs.
Once the whole thing is assembled the mpam_enable() call prints the number of PARTID/PMG,
which is something user-space can do something with. I don't think the number of MSC is
useful to anyone as some of them are grouped together and can't be configured independently.
>> +}
>> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
>> +                                 u32 ris_idx)
>> +{
>> +       int err = 0;
>> +       u32 level = 0;
>> +       unsigned long cache_id;
>> +       struct device_node *cache;
>> +
>> +       do {
>> +               if (of_device_is_compatible(np, "arm,mpam-cache")) {
>> +                       cache = of_parse_phandle(np, "arm,mpam-device", 0);
>> +                       if (!cache) {
>> +                               pr_err("Failed to read phandle\n");
>> +                               break;
>> +                       }
>> +               } else if (of_device_is_compatible(np->parent, "cache")) {
>
> Don't access device_node members. I'm trying to make it opaque. And
> technically it can be racy to access parent ptr when/if nodes are
> dynamic. I think this should suffice:
>
> else {
>   cache = of_get_parent(np);
>   if (!of_device_is_compatible(cache, "cache")) {
>     cache = NULL;
>     break;
>   }
> }
Thanks!
The if/else-if ladder needs to stay, I've grabbed the parent earlier, and added a
of_node_put() of it after the do/while.
>> +                       cache = of_node_get(np->parent);
>> +               } else {
>> +                       /* For now, only caches are supported */
>> +                       cache = NULL;
>> +                       break;
>> +               }
>> +
>> +               err = of_property_read_u32(cache, "cache-level", &level);
>> +               if (err) {
>> +                       pr_err("Failed to read cache-level\n");
>> +                       break;
>> +               }
>> +
>> +               cache_id = cache_of_calculate_id(cache);
>> +               if (cache_id == ~0UL) {
>> +                       err = -ENOENT;
>> +                       break;
>> +               }
>> +
>> +               err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
>> +                                     cache_id);
>> +       } while (0);
>> +       of_node_put(cache);
>> +
>> +       return err;
>> +}
>> +
>> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
>> +{
>> +       int err, num_ris = 0;
>> +       const u32 *ris_idx_p;
>> +       struct device_node *iter, *np;
>> +
>> +       np = msc->pdev->dev.of_node;
>> +       for_each_child_of_node(np, iter) {
>
> Use for_each_available_child_of_node_scoped()
Sure,
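For readers unfamiliar with the _scoped() iterators: they rely on the compiler's cleanup attribute, so the reference on the iterator is dropped whenever it goes out of scope, which is why the explicit of_node_put() on the error path can go. The following is a userspace sketch of that mechanism only, assuming GCC or Clang. struct node, node_get(), node_put(), SCOPED_PUT and parse_child() are hypothetical names, not kernel API.

```c
#include <stddef.h>

/* Toy refcounted node; a hypothetical stand-in for struct device_node. */
struct node {
	int refcount;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void node_put_cleanup(struct node **np)
{
	node_put(*np);
}

/* GCC/Clang extension: call node_put_cleanup() when the variable leaves scope. */
#define SCOPED_PUT __attribute__((cleanup(node_put_cleanup)))

/* Early return on "error" with no explicit node_put(). */
static int parse_child(struct node *child, int fail)
{
	struct node *iter SCOPED_PUT = node_get(child);

	if (!iter || fail)
		return -1;	/* reference dropped automatically here */

	return 0;		/* ... and here */
}
```

Whichever return is taken, the caller's refcount ends up where it started, so there is no put to forget on the error path.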
>> +               ris_idx_p = of_get_property(iter, "reg", NULL);
>
> This is broken on big endian and new users of of_get_property() are
> discouraged. Use of_property_read_reg().
Done,
>> +               if (ris_idx_p) {
>> +                       num_ris++;
>> +                       err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
>> +                       if (err) {
>> +                               of_node_put(iter);
>
> And then drop the put.
>
>> +                               return err;
>> +                       }
>> +               }
>> +       }
>> +
>> +       if (!num_ris)
>> +               mpam_dt_parse_resource(msc, np, 0);
>> +
>> +       return err;
>> +}
>> +/*
>> + * An MSC can control traffic from a set of CPUs, but may only be accessible
>> + * from a (hopefully wider) set of CPUs. The common reason for this is power
>> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
>> + * the corresponding cache may also be powered off. By making accesses from
>> + * one of those CPUs, we ensure this isn't the case.
>> + */
>> +static int update_msc_accessibility(struct mpam_msc *msc)
>> +{
>> +       struct device_node *parent;
>> +       u32 affinity_id;
>> +       int err;
>> +
>> +       if (!acpi_disabled) {
>> +               err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
>> +                                              &affinity_id);
>> +               if (err)
>> +                       cpumask_copy(&msc->accessibility, cpu_possible_mask);
>> +               else
>> +                       acpi_pptt_get_cpus_from_container(affinity_id,
>> +                                                         &msc->accessibility);
>> +
>> +               return 0;
>> +       }
>> +
>> +       /* This depends on the path to of_node */
> I'm failing to understand what has to be at the root node?
The accessibility bitmap. If the MSC is at the root of the tree - it's assumed to be
global. If it's buried in a device, it's assumed to be in the same power domain as that
device. Practically this only matters for caches where PSCI's CPU_SUSPEND can turn the
cache off, so the cache MSC can only be accessed from the CPUs local to it, that way we
know it's not about to be turned off by PSCI.
I'll rephrase the comment - it's trying to explain why it's not explicitly encoded.
| /* Where an MSC can be accessed from depends on the path to of_node. */
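As a rough model of the rule described above, and of what the quoted version of update_msc_accessibility() does on the DT path: an MSC whose parent is the root node is treated as accessible from every possible CPU, and anything else is rejected for now. This is a userspace sketch; struct dt_node, ALL_POSSIBLE_CPUS and msc_accessibility() are hypothetical stand-ins for struct device_node, cpu_possible_mask and the driver function.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device-tree node; hypothetical stand-in for struct device_node. */
struct dt_node {
	const struct dt_node *parent;
};

/* One bit per possible CPU; stands in for cpu_possible_mask. */
#define ALL_POSSIBLE_CPUS 0xffu

static int msc_accessibility(const struct dt_node *msc,
			     const struct dt_node *root, uint32_t *mask)
{
	if (msc->parent == root) {
		/* Directly under the root: assumed reachable from every CPU. */
		*mask = ALL_POSSIBLE_CPUS;
		return 0;
	}

	/* Nested under a device: no mask is derived in this model. */
	*mask = 0;
	return -EINVAL;
}
```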
>> +       parent = of_get_parent(msc->pdev->dev.of_node);
>> +       if (parent == of_root) {
>> +               cpumask_copy(&msc->accessibility, cpu_possible_mask);
>> +               err = 0;
>> +       } else {
>> +               err = -EINVAL;
>> +               pr_err("Cannot determine accessibility of MSC: %s\n",
>> +                      dev_name(&msc->pdev->dev));
>> +       }
>> +       of_node_put(parent);
>> +
>> +       return err;
>> +}
>> +static void mpam_msc_drv_remove(struct platform_device *pdev)
>> +{
>> +       struct mpam_msc *msc = platform_get_drvdata(pdev);
>> +
>> +       if (!msc)
>> +               return;
>> +
>> +       mutex_lock(&mpam_list_lock);
>> +       mpam_num_msc--;
>> +       platform_set_drvdata(pdev, NULL);
>> +       list_del_rcu(&msc->glbl_list);
>> +       synchronize_srcu(&mpam_srcu);
>> +       devm_kfree(&pdev->dev, msc);
>
> This should happen automagically.
>
>> +       mutex_unlock(&mpam_list_lock);
>> +}
>> +
>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +       int err;
>> +       struct mpam_msc *msc;
>> +       struct resource *msc_res;
>> +       void *plat_data = pdev->dev.platform_data;
>> +
>> +       mutex_lock(&mpam_list_lock);
>> +       do {
>> +               msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +               if (!msc) {
>> +                       err = -ENOMEM;
>> +                       break;
>> +               }
>> +
>> +               mutex_init(&msc->probe_lock);
>> +               mutex_init(&msc->part_sel_lock);
>> +               mutex_init(&msc->outer_mon_sel_lock);
>> +               raw_spin_lock_init(&msc->inner_mon_sel_lock);
>> +               msc->id = mpam_num_msc++;
>
> Multiple probe functions can run in parallel, so this needs to be
> atomic. Maybe it is with mpam_list_lock, but then the name of the
> mutex is misleading given this is not the list. It's not really clear
> to me what all needs the mutex here. Certainly a lot of it doesn't.
> Like everything else above here except the increment.
It's more about the class/component/device lists that get added later - but more on this
mpam_num_msc thing below...
>> +               msc->pdev = pdev;
>> +               INIT_LIST_HEAD_RCU(&msc->glbl_list);
>> +               INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +               err = update_msc_accessibility(msc);
>> +               if (err)
>> +                       break;
>> +               if (cpumask_empty(&msc->accessibility)) {
>> +                       pr_err_once("msc:%u is not accessible from any CPU!",
>> +                                   msc->id);
>> +                       err = -EINVAL;
>> +                       break;
>> +               }
>> +
>> +               if (device_property_read_u32(&pdev->dev, "pcc-channel",
> Does this property apply to DT? It would as the code is written. It is
> not documented though.
I don't think so - on DT PCC is going to be spelled SCMI which comes with some kind of
discovery instead. This property is added by the ACPI table 'driver'.
>> +                                            &msc->pcc_subspace_id))
>> +                       msc->iface = MPAM_IFACE_MMIO;
>> +               else
>> +                       msc->iface = MPAM_IFACE_PCC;
>> +
>> +               if (msc->iface == MPAM_IFACE_MMIO) {
>> +                       void __iomem *io;
>> +
>> +                       io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +                                                                   &msc_res);
>> +                       if (IS_ERR(io)) {
>> +                               pr_err("Failed to map MSC base address\n");
>> +                               err = PTR_ERR(io);
>> +                               break;
>> +                       }
>> +                       msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +                       msc->mapped_hwpage = io;
>> +               } else if (msc->iface == MPAM_IFACE_PCC) {
>> +                       msc->pcc_cl.dev = &pdev->dev;
>> +                       msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
>> +                       msc->pcc_cl.tx_block = false;
>> +                       msc->pcc_cl.tx_tout = 1000; /* 1s */
>> +                       msc->pcc_cl.knows_txdone = false;
>> +
>> +                       msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
>> +                                                                msc->pcc_subspace_id);
>> +                       if (IS_ERR(msc->pcc_chan)) {
>> +                               pr_err("Failed to request MSC PCC channel\n");
>> +                               err = PTR_ERR(msc->pcc_chan);
>> +                               break;
>> +                       }
>> +               }
>> +
>> +               list_add_rcu(&msc->glbl_list, &mpam_all_msc);
>> +               platform_set_drvdata(pdev, msc);
>> +       } while (0);
>> +       mutex_unlock(&mpam_list_lock);
>> +
>> +       if (!err) {
>> +               /* Create RIS entries described by firmware */
>> +               if (!acpi_disabled)
>> +                       err = acpi_mpam_parse_resources(msc, plat_data);
>> +               else
>> +                       err = mpam_dt_parse_resources(msc, plat_data);
>
> Isn't there a race here if an error occurs since you already added the
> MSC to the list? Something like this sequence with 2 MSCs:
>
> device 1 probe
> device 1 added
>     device 2 probe
>     device 2 added
> device 1 calls mpam_discovery_complete()
>     device 2 error on parse_resources
>     device 2 removed
By the time the whole thing is assembled the 'discovery complete' work is scheduled by a
cpuhp callback and has better protection against this. That discovery-complete call is
just to help illustrate that there is stuff that happens once all the MSC have been
discovered, as that code gets much more complicated later in the series.
Combined with your comment about the msc_id increment above - I'll stop using that as both
a count and an id, re-using the pdev->id as it's more likely to be stable from boot to
boot. mpam_num_msc can then become an atomic_t that is incremented once we know we're not
going to remove the MSC from the list.
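A userspace sketch of that counting scheme, using C11 atomics in place of the kernel's atomic_t. It is only an illustration of the idea, not the planned driver change: probe_one(), num_msc_probed, discovery_calls and the fixed fw_num_msc value are all assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of MSCs firmware described; fixed value assumed for the sketch. */
static const int fw_num_msc = 2;

/* Counts only MSCs whose probe fully succeeded. */
static atomic_int num_msc_probed;

/* Stand-in for mpam_discovery_complete(); records how often it ran. */
static int discovery_calls;

static void discovery_complete(void)
{
	discovery_calls++;
}

/* Model of a probe: 'parse_ok' represents the firmware-parsing result. */
static int probe_one(bool parse_ok)
{
	if (!parse_ok)
		return -1;	/* failed probes never touch the counter */

	/*
	 * atomic_fetch_add() returns the previous value, so exactly one
	 * caller observes the final count, even with parallel probes.
	 */
	if (atomic_fetch_add(&num_msc_probed, 1) + 1 == fw_num_msc)
		discovery_complete();

	return 0;
}
```

Because the counter is bumped only once nothing else in the probe can fail, a later error on another device cannot leave the count matching fw_num_msc by accident.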
>> +       }
>> +
>> +       if (!err && fw_num_msc == mpam_num_msc)
>> +               mpam_discovery_complete();
>> +
>> +       if (err && msc)
>> +               mpam_msc_drv_remove(pdev);
>> +
>> +       return err;
>> +}
>> +/*
>> + * MSC that are hidden under caches are not created as platform devices
>> + * as there is no cache driver. Caches are also special-cased in
>> + * update_msc_accessibility().
>> + */
>> +static void mpam_dt_create_foundling_msc(void)
>> +{
>> +       int err;
>> +       struct device_node *cache;
>> +
>> +       for_each_compatible_node(cache, NULL, "cache") {
>> +               err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
>
> This is going to create platform devices for all caches (except L1)
> regardless of whether they support MPAM or not. Isn't it likely or
> possible that only L3 or SLC caches support MPAM?
Likely - but all things are possible. You could put an MPAM MSC in your L1-I cache.
(or even in the CPU - but let's not go there)
I'll attempt to fix that up, what I have doesn't quite work, but I'll keep picking at it:
|        for_each_compatible_node(cache, NULL, "cache") {
|               struct device_node *cache_device;
|
|               if (of_node_check_flag(cache, OF_POPULATED))
|                       continue;
|
|               cache_device = of_find_matching_node_and_match(cache, mpam_of_match, NULL);
|               if (!cache_device)
|                       continue;
|               of_node_put(cache_device);
|
|               pdev = of_platform_device_create(cache, "cache", NULL);
|               if (!pdev)
|                       pr_err_once("Failed to create MSC devices under caches\n");
|          }
>> +               if (err)
>> +                       pr_err("Failed to create MSC devices under caches\n");
>> +       }
>> +}
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> new file mode 100644
>> index 000000000000..07e0f240eaca
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -0,0 +1,62 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +// Copyright (C) 2024 Arm Ltd.
>
> It's 2025.
... and I wrote this in 2018, but missed the annual update. I'll fix it.
>> +
>> +struct mpam_msc {
>> +       /* member of mpam_all_msc */
>> +       struct list_head        glbl_list;
>> +
>> +       int                     id;
>> +       struct platform_device *pdev;
>> +
>> +       /* Not modified after mpam_is_enabled() becomes true */
>> +       enum mpam_msc_iface     iface;
>> +       u32                     pcc_subspace_id;
>> +       struct mbox_client      pcc_cl;
>> +       struct pcc_mbox_chan    *pcc_chan;
>> +       u32                     nrdy_usec;
>> +       cpumask_t               accessibility;
>> +
>> +       /*
>> +        * probe_lock is only take during discovery. After discovery these
> s/take/taken/
Fixed,
>> +        * properties become read-only and the lists are protected by SRCU.
>> +        */
>> +       struct mutex            probe_lock;
>> +       unsigned long           ris_idxs[128 / BITS_PER_LONG];
>> +       u32                     ris_max;
>> +
>> +       /* mpam_msc_ris of this component */
>> +       struct list_head        ris;
>> +
>> +       /*
>> +        * part_sel_lock protects access to the MSC hardware registers that are
>> +        * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>> +        * by RIS).
>> +        * If needed, take msc->lock first.
> Stale comment? I don't see any 'lock' member.
Yes, it should say probe_lock .. renamed after more locks got added in here, and the
comment got fixed up in the wrong patch. Fixed.
>> +        */
>> +       struct mutex            part_sel_lock;
>> +
>> +       /*
>> +        * mon_sel_lock protects access to the MSC hardware registers that are
>> +        * affeted by MPAMCFG_MON_SEL.
>> +        * If needed, take msc->lock first.
>> +        */
>> +       struct mutex            outer_mon_sel_lock;
>> +       raw_spinlock_t          inner_mon_sel_lock;
>> +       unsigned long           inner_mon_sel_flags;
>> +
>> +       void __iomem            *mapped_hwpage;
>> +       size_t                  mapped_hwpage_sz;
>> +};
>> +
>> +#endif /* MPAM_INTERNAL_H */
>> --
>> 2.20.1
>>
Thanks,
James
 
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                     ` (3 preceding siblings ...)
  2025-08-27 15:39   ` Rob Herring
@ 2025-09-01  9:11   ` Ben Horgan
  2025-09-05 18:49     ` James Morse
  2025-09-01 11:21   ` Dave Martin
  5 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-09-01  9:11 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
>  arch/arm64/Kconfig              |   1 +
>  drivers/Kconfig                 |   2 +
>  drivers/Makefile                |   1 +
>  drivers/resctrl/Kconfig         |  11 ++
>  drivers/resctrl/Makefile        |   4 +
>  drivers/resctrl/mpam_devices.c  | 336 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  62 ++++++
>  7 files changed, 417 insertions(+)
>  create mode 100644 drivers/resctrl/Kconfig
>  create mode 100644 drivers/resctrl/Makefile
>  create mode 100644 drivers/resctrl/mpam_devices.c
>  create mode 100644 drivers/resctrl/mpam_internal.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e51ccf1da102..ea3c54e04275 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
>  
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ARM64_MPAM_DRIVER
>  	select ACPI_MPAM if ACPI
>  	help
>  	  Memory Partitioning and Monitoring is an optional extension
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 4915a63866b0..3054b50a2f4c 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
>  
>  source "drivers/cdx/Kconfig"
>  
> +source "drivers/resctrl/Kconfig"
> +
>  endmenu
> diff --git a/drivers/Makefile b/drivers/Makefile
> index b5749cf67044..f41cf4eddeba 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)		+= hte/
>  obj-$(CONFIG_DRM_ACCEL)		+= accel/
>  obj-$(CONFIG_CDX_BUS)		+= cdx/
>  obj-$(CONFIG_DPLL)		+= dpll/
> +obj-y				+= resctrl/
>  
>  obj-$(CONFIG_S390)		+= s390/
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
> +config ARM64_MPAM_DRIVER
> +	bool "MPAM driver for System IP, e,g. caches and memory controllers"
> +	depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver."
> +	depends on ARM64_MPAM_DRIVER
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
> +static u32 mpam_num_msc;
> +
> +static void mpam_discovery_complete(void)
> +{
> +	pr_err("Discovered all MSC\n");
> +}
> +
> +static int mpam_dt_count_msc(void)
> +{
> +	int count = 0;
> +	struct device_node *np;
> +
> +	for_each_compatible_node(np, NULL, "arm,mpam-msc") {
> +		if (of_device_is_available(np))
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> +				  u32 ris_idx)
> +{
> +	int err = 0;
> +	u32 level = 0;
> +	unsigned long cache_id;
> +	struct device_node *cache;
> +
> +	do {
> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
> +			if (!cache) {
> +				pr_err("Failed to read phandle\n");
> +				break;
> +			}
> +		} else if (of_device_is_compatible(np->parent, "cache")) {
> +			cache = of_node_get(np->parent);
> +		} else {
> +			/* For now, only caches are supported */
> +			cache = NULL;
> +			break;
> +		}
> +
> +		err = of_property_read_u32(cache, "cache-level", &level);
> +		if (err) {
> +			pr_err("Failed to read cache-level\n");
> +			break;
> +		}
> +
> +		cache_id = cache_of_calculate_id(cache);
> +		if (cache_id == ~0UL) {
> +			err = -ENOENT;
> +			break;
> +		}
> +
> +		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> +				      cache_id);
> +	} while (0);
> +	of_node_put(cache);
> +
> +	return err;
> +}
> +
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> +	int err, num_ris = 0;
> +	const u32 *ris_idx_p;
> +	struct device_node *iter, *np;
> +
> +	np = msc->pdev->dev.of_node;
> +	for_each_child_of_node(np, iter) {
> +		ris_idx_p = of_get_property(iter, "reg", NULL);
> +		if (ris_idx_p) {
> +			num_ris++;
> +			err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> +			if (err) {
> +				of_node_put(iter);
> +				return err;
> +			}
> +		}
> +	}
> +
> +	if (!num_ris)
> +		mpam_dt_parse_resource(msc, np, 0);
> +
> +	return err;
> +}
> +
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	struct device_node *parent;
> +	u32 affinity_id;
> +	int err;
> +
> +	if (!acpi_disabled) {
> +		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +					       &affinity_id);
> +		if (err)
> +			cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +		else
> +			acpi_pptt_get_cpus_from_container(affinity_id,
> +							  &msc->accessibility);
> +
> +		return 0;
> +	}
> +
> +	/* This depends on the path to of_node */
> +	parent = of_get_parent(msc->pdev->dev.of_node);
> +	if (parent == of_root) {
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +		err = 0;
> +	} else {
> +		err = -EINVAL;
> +		pr_err("Cannot determine accessibility of MSC: %s\n",
> +		       dev_name(&msc->pdev->dev));
> +	}
> +	of_node_put(parent);
> +
> +	return err;
> +}
> +
> +static int fw_num_msc;
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> +	/* TODO: wake up tasks blocked on this MSC's PCC channel */
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;
> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_num_msc--;
> +	platform_set_drvdata(pdev, NULL);
> +	list_del_rcu(&msc->glbl_list);
> +	synchronize_srcu(&mpam_srcu);
> +	devm_kfree(&pdev->dev, msc);
> +	mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	do {
> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +		if (!msc) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		mutex_init(&msc->probe_lock);
> +		mutex_init(&msc->part_sel_lock);
> +		mutex_init(&msc->outer_mon_sel_lock);
> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		msc->id = mpam_num_msc++;
> +		msc->pdev = pdev;
> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
> +		INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +		err = update_msc_accessibility(msc);
> +		if (err)
> +			break;
> +		if (cpumask_empty(&msc->accessibility)) {
> +			pr_err_once("msc:%u is not accessible from any CPU!",
> +				    msc->id);
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
> +					     &msc->pcc_subspace_id))
> +			msc->iface = MPAM_IFACE_MMIO;
> +		else
> +			msc->iface = MPAM_IFACE_PCC;
> +
> +		if (msc->iface == MPAM_IFACE_MMIO) {
> +			void __iomem *io;
> +
> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +								    &msc_res);
> +			if (IS_ERR(io)) {
> +				pr_err("Failed to map MSC base address\n");
> +				err = PTR_ERR(io);
> +				break;
> +			}
> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +			msc->mapped_hwpage = io;
> +		} else if (msc->iface == MPAM_IFACE_PCC) {
> +			msc->pcc_cl.dev = &pdev->dev;
> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> +			msc->pcc_cl.tx_block = false;
> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
> +			msc->pcc_cl.knows_txdone = false;
> +
> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> +								 msc->pcc_subspace_id);
> +			if (IS_ERR(msc->pcc_chan)) {
> +				pr_err("Failed to request MSC PCC channel\n");
> +				err = PTR_ERR(msc->pcc_chan);
> +				break;
> +			}
> +		}
> +
> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> +		platform_set_drvdata(pdev, msc);
> +	} while (0);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (!err) {
> +		/* Create RIS entries described by firmware */
> +		if (!acpi_disabled)
> +			err = acpi_mpam_parse_resources(msc, plat_data);
> +		else
> +			err = mpam_dt_parse_resources(msc, plat_data);
> +	}
> +
> +	if (!err && fw_num_msc == mpam_num_msc)
> +		mpam_discovery_complete();
> +
> +	if (err && msc)
> +		mpam_msc_drv_remove(pdev);
> +
> +	return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> +	{ .compatible = "arm,mpam-msc", },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +		.of_match_table = of_match_ptr(mpam_of_match),
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
> +static void mpam_dt_create_foundling_msc(void)
> +{
> +	int err;
> +	struct device_node *cache;
> +
> +	for_each_compatible_node(cache, NULL, "cache") {
> +		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> +		if (err)
> +			pr_err("Failed to create MSC devices under caches\n");
> +	}
> +}
> +
> +static int __init mpam_msc_driver_init(void)
> +{
> +	if (!system_supports_mpam())
> +		return -EOPNOTSUPP;
> +
> +	init_srcu_struct(&mpam_srcu);
> +
> +	if (!acpi_disabled)
> +		fw_num_msc = acpi_mpam_count_msc();
> +	else
> +		fw_num_msc = mpam_dt_count_msc();
> +
> +	if (fw_num_msc <= 0) {
> +		pr_err("No MSC devices found in firmware\n");
> +		return -EINVAL;
> +	}
> +
> +	if (acpi_disabled)
> +		mpam_dt_create_foundling_msc();
> +
> +	return platform_driver_register(&mpam_msc_driver);
> +}
> +subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head        glbl_list;
> +
> +	int			id;
> +	struct platform_device *pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			pcc_subspace_id;
> +	struct mbox_client	pcc_cl;
> +	struct pcc_mbox_chan	*pcc_chan;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only take during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
Why is this sized this way? RIS_MAX is 4 bits and so there are at most
16 RIS per msc.
> +	u32			ris_max;
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->lock first.
> +	 */
> +	struct mutex		part_sel_lock;
> +
> +	/*
> +	 * mon_sel_lock protects access to the MSC hardware registers that are
> +	 * affeted by MPAMCFG_MON_SEL.
> +	 * If needed, take msc->lock first.
> +	 */
> +	struct mutex		outer_mon_sel_lock;
> +	raw_spinlock_t		inner_mon_sel_lock;
> +	unsigned long		inner_mon_sel_flags;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
Thanks,
Ben
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-01  9:11   ` Ben Horgan
@ 2025-09-05 18:49     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-05 18:49 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 01/09/2025 10:11, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> new file mode 100644
>> index 000000000000..07e0f240eaca
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -0,0 +1,62 @@
>> +struct mpam_msc {
>> +	/* member of mpam_all_msc */
>> +	struct list_head        glbl_list;
>> +
>> +	int			id;
>> +	struct platform_device *pdev;
>> +
>> +	/* Not modified after mpam_is_enabled() becomes true */
>> +	enum mpam_msc_iface	iface;
>> +	u32			pcc_subspace_id;
>> +	struct mbox_client	pcc_cl;
>> +	struct pcc_mbox_chan	*pcc_chan;
>> +	u32			nrdy_usec;
>> +	cpumask_t		accessibility;
>> +
>> +	/*
>> +	 * probe_lock is only take during discovery. After discovery these
>> +	 * properties become read-only and the lists are protected by SRCU.
>> +	 */
>> +	struct mutex		probe_lock;
>> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
> Why is this sized this way? RIS_MAX is 4 bits and so there are at most
> 16 RIS per msc.
Hmmm, lost in time - I agree with the 16 reasoning. Fixed.
(It's likely due to RES0 space above the field - but that has been filled in with other
 stuff since then. RIS was added as a 'backward compatible feature' - I was wary of them
 extending it)
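For reference, a minimal userspace sketch of the sizing being agreed on here. It assumes the bitmap only needs one bit per RIS and that a 4-bit RIS_MAX field limits an MSC to 16 RIS, as stated above. All SKETCH_/sketch_ names are hypothetical stand-ins, not code from the series; in the kernel the equivalent would likely be DECLARE_BITMAP() with a named constant.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's BITS_PER_LONG and BITS_TO_LONGS(). */
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical name: a 4-bit RIS_MAX field allows at most 16 RIS per MSC. */
#define SKETCH_MAX_RIS		16

struct sketch_ris_bitmap {
	/* Rounds up, so 16 bits still gets one whole unsigned long. */
	unsigned long idxs[SKETCH_BITS_TO_LONGS(SKETCH_MAX_RIS)];
};

/* Mark a RIS index as present. */
static void sketch_set_ris(struct sketch_ris_bitmap *b, unsigned int ris)
{
	assert(ris < SKETCH_MAX_RIS);
	b->idxs[ris / SKETCH_BITS_PER_LONG] |= 1UL << (ris % SKETCH_BITS_PER_LONG);
}

/* Return 1 if the RIS index was marked, 0 otherwise. */
static int sketch_test_ris(const struct sketch_ris_bitmap *b, unsigned int ris)
{
	assert(ris < SKETCH_MAX_RIS);
	return (b->idxs[ris / SKETCH_BITS_PER_LONG] >>
		(ris % SKETCH_BITS_PER_LONG)) & 1;
}
```

Unlike the plain division in 128 / BITS_PER_LONG, the round-up form never yields a zero-length array when the bit count is smaller than a long.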
>> +	u32			ris_max;
>> +
>> +	/* mpam_msc_ris of this component */
>> +	struct list_head	ris;
>> +
>> +	/*
>> +	 * part_sel_lock protects access to the MSC hardware registers that are
>> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>> +	 * by RIS).
>> +	 * If needed, take msc->lock first.
>> +	 */
>> +	struct mutex		part_sel_lock;
>> +
>> +	/*
>> +	 * mon_sel_lock protects access to the MSC hardware registers that are
>> +	 * affeted by MPAMCFG_MON_SEL.
>> +	 * If needed, take msc->lock first.
>> +	 */
>> +	struct mutex		outer_mon_sel_lock;
>> +	raw_spinlock_t		inner_mon_sel_lock;
>> +	unsigned long		inner_mon_sel_flags;
>> +
>> +	void __iomem		*mapped_hwpage;
>> +	size_t			mapped_hwpage_sz;
>> +};
>> +
>> +#endif /* MPAM_INTERNAL_H */
Thanks,
James
 
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                     ` (4 preceding siblings ...)
  2025-09-01  9:11   ` Ben Horgan
@ 2025-09-01 11:21   ` Dave Martin
  2025-09-05 18:49     ` James Morse
  5 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-01 11:21 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi James,
On Fri, Aug 22, 2025 at 03:29:51PM +0000, James Morse wrote:
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> only be accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until
> the system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Check for status=broken DT devices.
>  * Moved all the files around.
>  * Made Kconfig symbols depend on EXPERT
> ---
[...]
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> new file mode 100644
> index 000000000000..dff7b87280ab
> --- /dev/null
> +++ b/drivers/resctrl/Kconfig
> @@ -0,0 +1,11 @@
> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> +# CPU resources, not containers or cgroups etc.
Drop confusing comment?
CPUs are not mentioned other than in the comment -- I think the
descriptions are sufficiently self-explanatory that they don't read
onto CPUs.
> +config ARM64_MPAM_DRIVER
> +	bool "MPAM driver for System IP, e,g. caches and memory controllers"
> +	depends on ARM64_MPAM && EXPERT
> +
> +config ARM64_MPAM_DRIVER_DEBUG
> +	bool "Enable debug messages from the MPAM driver."
Nit: spurious full stop.
(i.e., people don't add one in these one-line descriptions.
They are title-like and self-delimiting, even when the text is a valid
sentence.)
> +	depends on ARM64_MPAM_DRIVER
> +	help
> +	  Say yes here to enable debug messages from the MPAM driver.
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> new file mode 100644
> index 000000000000..92b48fa20108
> --- /dev/null
> +++ b/drivers/resctrl/Makefile
> @@ -0,0 +1,4 @@
> +obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
> +mpam-y						+= mpam_devices.o
> +
> +cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> new file mode 100644
> index 000000000000..a0d9a699a6e7
> --- /dev/null
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -0,0 +1,336 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/gfp.h>
> +#include <linux/list.h>
> +#include <linux/lockdep.h>
> +#include <linux/mutex.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/srcu.h>
> +#include <linux/types.h>
> +
> +#include <acpi/pcc.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * mpam_list_lock protects the SRCU lists when writing. Once the
> + * mpam_enabled key is enabled these lists are read-only,
> + * unless the error interrupt disables the driver.
> + */
> +static DEFINE_MUTEX(mpam_list_lock);
> +static LIST_HEAD(mpam_all_msc);
> +
> +static struct srcu_struct mpam_srcu;
> +
> +/* MPAM isn't available until all the MSC have been probed. */
Comment doesn't really explain the variable.
Maybe something like "Number of MSCs that need to be probed for MPAM
to be usable" ?
> +static u32 mpam_num_msc;
Any particular reason this is u32 and not unsigned int?
How are accesses to this protected against data races?
If there are supposed to be locks to protect globals in the MPAM driver,
is it worth wrapping them in access functions with a lockdep assert?
Otherwise, it feels rather easy to get this wrong -- I think I've found
at least one bug (see mpam_msc_drv_probe().)
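To illustrate the accessor idea, here is a minimal single-threaded userspace sketch. It is an assumption about what such wrappers might look like, not code from the series: the sketch_ names are hypothetical, and the "held" flag stands in for a struct mutex checked with lockdep_assert_held().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for a mutex plus its lockdep state. In the kernel
 * this would be a struct mutex and lockdep_assert_held(&lock).
 */
struct sketch_lock {
	bool held;
};

static struct sketch_lock sketch_list_lock;
static unsigned int sketch_num_msc;	/* protected by sketch_list_lock */

static void sketch_lock_acquire(struct sketch_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void sketch_lock_release(struct sketch_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* All accesses go through helpers, so a missing lock trips the assert. */
static unsigned int sketch_num_msc_read(void)
{
	assert(sketch_list_lock.held);
	return sketch_num_msc;
}

/* Returns the old value, like the post-increment in the probe path. */
static unsigned int sketch_num_msc_inc(void)
{
	assert(sketch_list_lock.held);
	return sketch_num_msc++;
}
```

With this shape, a read of the counter outside the locked region fails loudly in a debug build instead of racing silently.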
> +
> +static void mpam_discovery_complete(void)
> +{
> +	pr_err("Discovered all MSC\n");
> +}
As others have commented, if this is non-functional code that gets
removed later on, it's probably best to drop this up-front?
[...]
> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> +				  u32 ris_idx)
> +{
> +	int err = 0;
> +	u32 level = 0;
> +	unsigned long cache_id;
> +	struct device_node *cache;
> +
> +	do {
> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
> +			if (!cache) {
> +				pr_err("Failed to read phandle\n");
> +				break;
> +			}
> +		} else if (of_device_is_compatible(np->parent, "cache")) {
> +			cache = of_node_get(np->parent);
> +		} else {
> +			/* For now, only caches are supported */
> +			cache = NULL;
> +			break;
> +		}
> +
> +		err = of_property_read_u32(cache, "cache-level", &level);
> +		if (err) {
> +			pr_err("Failed to read cache-level\n");
> +			break;
> +		}
> +
> +		cache_id = cache_of_calculate_id(cache);
> +		if (cache_id == ~0UL) {
The type of cache_id may change if the return type of
cache_of_calculate_id() changes (see comments on patch 1).
Possible #define for the exceptional value.
> +			err = -ENOENT;
> +			break;
The lack of a diagnostic here is inconsistent with the level of
diagnostics in the rest of the loop.
> +		}
> +
> +		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> +				      cache_id);
> +	} while (0);
Abuse of do ... while () here?
There is no loop.  The breaks are stealth "goto"s to this statement:
> +	of_node_put(cache);
(It works either way, but maybe gotos to an explicit label would be
more readable, as well as avoiding an unnecessary level of indentation.)
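A minimal userspace sketch of the goto form, for comparison. The sketch_ types and helpers are hypothetical stand-ins for struct device_node, of_node_get() and of_node_put(); only the control flow is meant to carry over.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted struct device_node. */
struct sketch_node {
	int refcount;
	int has_level;		/* non-zero if a cache-level value is present */
	unsigned int level;
};

static struct sketch_node *sketch_node_get(struct sketch_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Tolerates NULL, so the exit path needs no extra check. */
static void sketch_node_put(struct sketch_node *n)
{
	if (n)
		n->refcount--;
}

static int sketch_parse_resource(struct sketch_node *parent,
				 unsigned int *level)
{
	struct sketch_node *cache;
	int err = 0;

	cache = sketch_node_get(parent);
	if (!cache)
		goto out;	/* nothing to parse, not an error here */

	if (!cache->has_level) {
		err = -EINVAL;
		goto out;
	}

	*level = cache->level;
out:
	/* Single exit: the reference is dropped on every path. */
	sketch_node_put(cache);
	return err;
}
```

The behaviour is the same as the do ... while (0) version; the label just names the cleanup point and removes one level of indentation.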
> +
> +	return err;
> +}
[...]
> +/*
> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> + * the corresponding cache may also be powered off. By making accesses from
Nit: the the
> + * one of those CPUs, we ensure this isn't the case.
> + */
> +static int update_msc_accessibility(struct mpam_msc *msc)
> +{
> +	struct device_node *parent;
> +	u32 affinity_id;
> +	int err;
> +
> +	if (!acpi_disabled) {
> +		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
> +					       &affinity_id);
> +		if (err)
> +			cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +		else
> +			acpi_pptt_get_cpus_from_container(affinity_id,
> +							  &msc->accessibility);
> +
> +		return 0;
> +	}
> +
> +	/* This depends on the path to of_node */
> +	parent = of_get_parent(msc->pdev->dev.of_node);
> +	if (parent == of_root) {
> +		cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +		err = 0;
> +	} else {
> +		err = -EINVAL;
> +		pr_err("Cannot determine accessibility of MSC: %s\n",
> +		       dev_name(&msc->pdev->dev));
> +	}
> +	of_node_put(parent);
> +
> +	return err;
> +}
> +
> +static int fw_num_msc;
Does this need to be protected against data races?
If individual mpam_msc_drv_probe() calls may execute on different CPUs
from mpam_msc_driver_init(), then there seem to be potential races here.
> +
> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> +{
> +	/* TODO: wake up tasks blocked on this MSC's PCC channel */
So, is this broken in this commit?
(If the series does not get broken up or applied piecemeal, that's not
such a concern, though.)
> +}
> +
> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> +{
The MPAM driver cannot currently be built as a module.
Is it possible to exercise the driver remove paths, today?
> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> +
> +	if (!msc)
> +		return;
> +
> +	mutex_lock(&mpam_list_lock);
> +	mpam_num_msc--;
> +	platform_set_drvdata(pdev, NULL);
> +	list_del_rcu(&msc->glbl_list);
> +	synchronize_srcu(&mpam_srcu);
> +	devm_kfree(&pdev->dev, msc);
> +	mutex_unlock(&mpam_list_lock);
> +}
> +
> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> +{
> +	int err;
> +	struct mpam_msc *msc;
> +	struct resource *msc_res;
> +	void *plat_data = pdev->dev.platform_data;
> +
> +	mutex_lock(&mpam_list_lock);
> +	do {
> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> +		if (!msc) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		mutex_init(&msc->probe_lock);
> +		mutex_init(&msc->part_sel_lock);
> +		mutex_init(&msc->outer_mon_sel_lock);
> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> +		msc->id = mpam_num_msc++;
> +		msc->pdev = pdev;
> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
> +		INIT_LIST_HEAD_RCU(&msc->ris);
> +
> +		err = update_msc_accessibility(msc);
> +		if (err)
> +			break;
> +		if (cpumask_empty(&msc->accessibility)) {
> +			pr_err_once("msc:%u is not accessible from any CPU!",
> +				    msc->id);
> +			err = -EINVAL;
> +			break;
> +		}
> +
> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
> +					     &msc->pcc_subspace_id))
> +			msc->iface = MPAM_IFACE_MMIO;
> +		else
> +			msc->iface = MPAM_IFACE_PCC;
> +
> +		if (msc->iface == MPAM_IFACE_MMIO) {
> +			void __iomem *io;
> +
> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
> +								    &msc_res);
> +			if (IS_ERR(io)) {
> +				pr_err("Failed to map MSC base address\n");
> +				err = PTR_ERR(io);
> +				break;
> +			}
> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> +			msc->mapped_hwpage = io;
> +		} else if (msc->iface == MPAM_IFACE_PCC) {
> +			msc->pcc_cl.dev = &pdev->dev;
> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> +			msc->pcc_cl.tx_block = false;
> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
> +			msc->pcc_cl.knows_txdone = false;
> +
> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> +								 msc->pcc_subspace_id);
> +			if (IS_ERR(msc->pcc_chan)) {
> +				pr_err("Failed to request MSC PCC channel\n");
> +				err = PTR_ERR(msc->pcc_chan);
> +				break;
> +			}
> +		}
Should the lock be held across initialisation of the msc fields?
list_add_rcu() might imply sufficient barriers to ensure that the
initialisations are visible to other threads that obtain the msc
pointer by iterating over mpam_all_msc.
It's probably cleaner to hold the lock explicitly, though.
What other ways of obtaining the msc pointer exist?
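For readers unfamiliar with the barrier argument, a minimal userspace sketch of the initialise-then-publish pattern using C11 atomics. This is only an analogy: list_add_rcu() publishes through rcu_assign_pointer(), which has release semantics, while kernel readers use rcu_dereference() rather than the acquire load used here as a conservative stand-in. The sketch_ names are hypothetical, and writers are assumed to be serialised by an external lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_msc {
	int id;
	struct sketch_msc *next;
};

/* Head of a singly linked list; starts out NULL (static storage). */
static _Atomic(struct sketch_msc *) sketch_head;

/*
 * Writer: fully initialise the object, then publish it with release
 * ordering so the initialisation is visible to anyone who sees the pointer.
 * Assumes callers serialise writers, e.g. with a mutex.
 */
static void sketch_publish(struct sketch_msc *msc, int id)
{
	msc->id = id;
	msc->next = atomic_load_explicit(&sketch_head, memory_order_relaxed);
	atomic_store_explicit(&sketch_head, msc, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
static struct sketch_msc *sketch_first(void)
{
	return atomic_load_explicit(&sketch_head, memory_order_acquire);
}
```

This covers only readers that find the object through the list. Any other route to the pointer, such as driver data set on the device, would need its own ordering guarantee, which is the point of the question above.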
> +
> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> +		platform_set_drvdata(pdev, msc);
> +	} while (0);
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (!err) {
> +		/* Create RIS entries described by firmware */
> +		if (!acpi_disabled)
> +			err = acpi_mpam_parse_resources(msc, plat_data);
> +		else
> +			err = mpam_dt_parse_resources(msc, plat_data);
> +	}
> +
> +	if (!err && fw_num_msc == mpam_num_msc)
Unlocked read of mpam_num_msc?
> +		mpam_discovery_complete();
> +
> +	if (err && msc)
> +		mpam_msc_drv_remove(pdev);
> +
> +	return err;
> +}
> +
> +static const struct of_device_id mpam_of_match[] = {
> +	{ .compatible = "arm,mpam-msc", },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, mpam_of_match);
> +
> +static struct platform_driver mpam_msc_driver = {
> +	.driver = {
> +		.name = "mpam_msc",
> +		.of_match_table = of_match_ptr(mpam_of_match),
> +	},
> +	.probe = mpam_msc_drv_probe,
> +	.remove = mpam_msc_drv_remove,
> +};
> +
> +/*
> + * MSC that are hidden under caches are not created as platform devices
> + * as there is no cache driver. Caches are also special-cased in
> + * update_msc_accessibility().
> + */
Can you elaborate?  I don't understand quite what this is doing.
> +static void mpam_dt_create_foundling_msc(void)
> +{
> +	int err;
> +	struct device_node *cache;
> +
> +	for_each_compatible_node(cache, NULL, "cache") {
> +		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> +		if (err)
> +			pr_err("Failed to create MSC devices under caches\n");
> +	}
> +}
[...]
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> new file mode 100644
> index 000000000000..07e0f240eaca
> --- /dev/null
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +// Copyright (C) 2024 Arm Ltd.
> +
> +#ifndef MPAM_INTERNAL_H
> +#define MPAM_INTERNAL_H
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cpumask.h>
> +#include <linux/io.h>
> +#include <linux/mailbox_client.h>
> +#include <linux/mutex.h>
> +#include <linux/resctrl.h>
> +#include <linux/sizes.h>
> +
> +struct mpam_msc {
> +	/* member of mpam_all_msc */
> +	struct list_head        glbl_list;
Is it worth making these names less mismatched?
> +
> +	int			id;
> +	struct platform_device *pdev;
> +
> +	/* Not modified after mpam_is_enabled() becomes true */
> +	enum mpam_msc_iface	iface;
> +	u32			pcc_subspace_id;
> +	struct mbox_client	pcc_cl;
> +	struct pcc_mbox_chan	*pcc_chan;
> +	u32			nrdy_usec;
> +	cpumask_t		accessibility;
> +
> +	/*
> +	 * probe_lock is only take during discovery. After discovery these
> +	 * properties become read-only and the lists are protected by SRCU.
> +	 */
> +	struct mutex		probe_lock;
Can we have more clarity about the locking strategy, including details
of which things each lock is supposed to apply to and when, and how (if
at all) the locks are intended to nest?
(Similarly for the global locks.)
> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
> +	u32			ris_max;
nrdy_usec, ris_idxs and ris_max appear unused in this patch (though I
suppose they get initialised by virtue of kzalloc()).  Is this
intentional?
> +
> +	/* mpam_msc_ris of this component */
> +	struct list_head	ris;
> +
> +	/*
> +	 * part_sel_lock protects access to the MSC hardware registers that are
> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> +	 * by RIS).
> +	 * If needed, take msc->lock first.
> +	 */
What's msc->lock ?
> +	struct mutex		part_sel_lock;
> +
> +	/*
> +	 * mon_sel_lock protects access to the MSC hardware registers that are
> +	 * affeted by MPAMCFG_MON_SEL.
> +	 * If needed, take msc->lock first.
> +	 */
Same here.
> +	struct mutex		outer_mon_sel_lock;
> +	raw_spinlock_t		inner_mon_sel_lock;
> +	unsigned long		inner_mon_sel_flags;
> +
> +	void __iomem		*mapped_hwpage;
> +	size_t			mapped_hwpage_sz;
> +};
> +
> +#endif /* MPAM_INTERNAL_H */
[...]
Cheers
---Dave
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-01 11:21   ` Dave Martin
@ 2025-09-05 18:49     ` James Morse
  2025-09-08 15:25       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-05 18:49 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 01/09/2025 12:21, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:51PM +0000, James Morse wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>> only be accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until
>> the system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
>> new file mode 100644
>> index 000000000000..dff7b87280ab
>> --- /dev/null
>> +++ b/drivers/resctrl/Kconfig
>> @@ -0,0 +1,11 @@
>> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
>> +# CPU resources, not containers or cgroups etc.
>
> Drop confusing comment?
>
> CPUs are not mentioned other than in the comment -- I think the
> descriptions are sufficiently self-explanatory that they don't read
> onto CPUs.
This used to add ARM_CPU_RESCTRL, to mirror X86_CPU_RESCTRL.
It's been tidied up since then, but the comment remains.
I'll remove it.
>> +config ARM64_MPAM_DRIVER
>> +	bool "MPAM driver for System IP, e,g. caches and memory controllers"
>> +	depends on ARM64_MPAM && EXPERT
>> +
>> +config ARM64_MPAM_DRIVER_DEBUG
>> +	bool "Enable debug messages from the MPAM driver."
>
> Nit: spurious full stop.
>
> (i.e., people don't add one in these one-line descriptions.
> They are title-like and self-delimiting, even when the text is a valid
> sentence.)
/me waves hands around
>> +	depends on ARM64_MPAM_DRIVER
>> +	help
>> +	  Say yes here to enable debug messages from the MPAM driver.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644
>> index 000000000000..a0d9a699a6e7
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -0,0 +1,336 @@
>> +/*
>> + * mpam_list_lock protects the SRCU lists when writing. Once the
>> + * mpam_enabled key is enabled these lists are read-only,
>> + * unless the error interrupt disables the driver.
>> + */
>> +static DEFINE_MUTEX(mpam_list_lock);
>> +static LIST_HEAD(mpam_all_msc);
>> +
>> +static struct srcu_struct mpam_srcu;
>> +
>> +/* MPAM isn't available until all the MSC have been probed. */
>
> Comment doesn't really explain the variable.
>
> Maybe something like "Number of MSCs that need to be probed for MPAM
> to be usable" ?
It's the count, not the remainder. I went with:
| * Number of MSCs that have been probed. Once all MSC have been probed MPAM
| * can be enabled.
>> +static u32 mpam_num_msc;
>
> Any particular reason this is u32 and not unsigned int?
u32 is less typing!
> How are accesses to this protected against data races?
It's under the list-lock, but after Rob's feedback I've made it an atomic_t and stopped
using it as an id for all the print messages.
> If there are supposed to be locks to protect globals in the MPAM driver,
> is it worth wrapping them in access functions with a lockdep assert?
> Otherwise, it feels rather easy to get this wrong -- I think I've found
> at least one bug (see mpam_msc_drv_probe().)
Broadly: everything is protected by the list_lock when things are being discovered.
Once everything has been discovered, these things can become read-only.
It's not until everything has been discovered that interrupts get registered, and things
like a potential PMU driver could make calls in a strange context.
Adding helpers would need some global state variable, if (state == foo) lockdep_assert...
I had that early on, but figured it was overkill.
>> +static void mpam_discovery_complete(void)
>> +{
>> +	pr_err("Discovered all MSC\n");
>> +}
> As others have commented, if this is non-functional code that gets
> removed later on, it's probably best to drop this up-front?
It's illustrating that something happens after all the MSC have been discovered.
Knowing that from the beginning of the series is supposed to make the insertion of the
cpuhp notifiers in the middle easier to think about...
>> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
>> +				  u32 ris_idx)
>> +{
>> +	int err = 0;
>> +	u32 level = 0;
>> +	unsigned long cache_id;
>> +	struct device_node *cache;
>> +
>> +	do {
>> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
>> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
>> +			if (!cache) {
>> +				pr_err("Failed to read phandle\n");
>> +				break;
>> +			}
>> +		} else if (of_device_is_compatible(np->parent, "cache")) {
>> +			cache = of_node_get(np->parent);
>> +		} else {
>> +			/* For now, only caches are supported */
>> +			cache = NULL;
>> +			break;
>> +		}
>> +
>> +		err = of_property_read_u32(cache, "cache-level", &level);
>> +		if (err) {
>> +			pr_err("Failed to read cache-level\n");
>> +			break;
>> +		}
>> +
>> +		cache_id = cache_of_calculate_id(cache);
>> +		if (cache_id == ~0UL) {
>
> The type of cache_id may change if the return type of
> cache_of_calculate_id() changes (see comments on patch 1).
Yup,
> Possible #define for the exceptional value.
I don't think it's any more surprising than '-1' as an error value, and it's only got one
caller, which is pretty obviously an error path.
>> +			err = -ENOENT;
>> +			break;
>
> The lack of a diagnostic here is inconsistent with the level of
> diagnostics in the rest of the loop.
I've never needed to debug that one because it's already visible to user-space. If the
cache-ids are missing, you can tell that in sysfs, you don't need to instrument the kernel.
I'll add one if you think it's important. They can all be _once, and as it's related to
the probing of a particular device, can use the dev_ print helpers.
>> +		}
>> +
>> +		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
>> +				      cache_id);
>> +	} while (0);
>
> Abuse of do ... while () here?
>
> There is no loop.  The breaks are stealth "goto"s to this statement:
Yes. The alternative would be actual gotos - which is surely worse!
It just wasn't worth pulling this out as a separate function.
>> +	of_node_put(cache);
>
> (It works either way, but maybe gotos to an explicit label would be
> more readable, as well as avoiding an unnecessary level of indentation.)
As the cleanup magic has become fashionable, I'll switch to using that...
>> +
>> +	return err;
>> +}
>
> [...]
>
>> +/*
>> + * An MSC can control traffic from a set of CPUs, but may only be accessible
>> + * from a (hopefully wider) set of CPUs. The common reason for this is power
>> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
>> + * the corresponding cache may also be powered off. By making accesses from
>
> Nit: the the
Fixed,
>> + * one of those CPUs, we ensure this isn't the case.
>> + */
>> +static int fw_num_msc;
>
> Does this need to be protected against data races?
>
> If individual mpam_msc_drv_probe() calls may execute on different CPUs
> from mpam_msc_driver_init(), then seem to be potential races here.
Incrementing was under the list-lock, but not the last 'are they all done' read. Following
Rob's comments I've made this an atomic_t.
>> +
>> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
>> +{
>> +	/* TODO: wake up tasks blocked on this MSC's PCC channel */
>
> So, is this broken in this commit?
>
> (If the series does not get broken up or applied piecemail, that's not
> such a concern, though.)
Unsupported - or at least only enough to not mistake them for MMIO devices.
I've pulled this out to a later patch in the tree that isn't in this series.
The platforms that need this haven't yet materialised, (and may not!).
There is a prototype for DT/SCMI, but nothing I've seen yet for ACPI.
>> +}
>> +
>> +static void mpam_msc_drv_remove(struct platform_device *pdev)
>> +{
>
> The MPAM driver cannot currently be built as a module.
>
> Is it possible to exercise the driver remove paths, today?
Yes, through the sysfs unbind interface.
It doesn't make a lot of sense for MPAM as the moment you unbind the driver from one MSC
it has to work out if it needs to teardown resctrl...
>> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
>> +
>> +	if (!msc)
>> +		return;
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	mpam_num_msc--;
>> +	platform_set_drvdata(pdev, NULL);
>> +	list_del_rcu(&msc->glbl_list);
>> +	synchronize_srcu(&mpam_srcu);
>> +	devm_kfree(&pdev->dev, msc);
>> +	mutex_unlock(&mpam_list_lock);
>> +}
>> +
>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>> +{
>> +	int err;
>> +	struct mpam_msc *msc;
>> +	struct resource *msc_res;
>> +	void *plat_data = pdev->dev.platform_data;
>> +
>> +	mutex_lock(&mpam_list_lock);
>> +	do {
>> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>> +		if (!msc) {
>> +			err = -ENOMEM;
>> +			break;
>> +		}
>> +
>> +		mutex_init(&msc->probe_lock);
>> +		mutex_init(&msc->part_sel_lock);
>> +		mutex_init(&msc->outer_mon_sel_lock);
>> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
>> +		msc->id = mpam_num_msc++;
>> +		msc->pdev = pdev;
>> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
>> +		INIT_LIST_HEAD_RCU(&msc->ris);
>> +
>> +		err = update_msc_accessibility(msc);
>> +		if (err)
>> +			break;
>> +		if (cpumask_empty(&msc->accessibility)) {
>> +			pr_err_once("msc:%u is not accessible from any CPU!",
>> +				    msc->id);
>> +			err = -EINVAL;
>> +			break;
>> +		}
>> +
>> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>> +					     &msc->pcc_subspace_id))
>> +			msc->iface = MPAM_IFACE_MMIO;
>> +		else
>> +			msc->iface = MPAM_IFACE_PCC;
>> +
>> +		if (msc->iface == MPAM_IFACE_MMIO) {
>> +			void __iomem *io;
>> +
>> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
>> +								    &msc_res);
>> +			if (IS_ERR(io)) {
>> +				pr_err("Failed to map MSC base address\n");
>> +				err = PTR_ERR(io);
>> +				break;
>> +			}
>> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>> +			msc->mapped_hwpage = io;
>> +		} else if (msc->iface == MPAM_IFACE_PCC) {
>> +			msc->pcc_cl.dev = &pdev->dev;
>> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
>> +			msc->pcc_cl.tx_block = false;
>> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
>> +			msc->pcc_cl.knows_txdone = false;
>> +
>> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
>> +								 msc->pcc_subspace_id);
>> +			if (IS_ERR(msc->pcc_chan)) {
>> +				pr_err("Failed to request MSC PCC channel\n");
>> +				err = PTR_ERR(msc->pcc_chan);
>> +				break;
>> +			}
>> +		}
> Should the lock be held across initialisation of the msc fields?
The msc isn't visible until it's added to the list, so provided all that initialisation is
done 'before' it's added to the list, it doesn't matter.
> list_add_rcu() might imply sufficient barriers to ensure that the
> initialisations are visible to other threads that obtain the msc
> pointer by iterating over mpam_all_msc.
>
> It's probably cleaner to hold the lock explicitly, though.
The list lock? We do.
But the readers don't need to take the list lock; it's only there to prevent concurrent
writers.
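To illustrate the "initialise first, publish last, lockless readers" pattern described above, here is a minimal userspace sketch using C11 atomics. It is not the driver's code: struct msc, all_msc, msc_publish() and msc_find() are made-up names, and the release/acquire pair only stands in for the ordering that list_add_rcu() and the RCU list iterators are expected to provide in the kernel.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mpam_msc; only the fields used here. */
struct msc {
	int id;
	int iface;
	struct msc *next;
};

/* Head of a singly linked list that readers walk without taking a lock. */
static _Atomic(struct msc *) all_msc;

/*
 * Writer side. The caller is assumed to hold the writers' lock, so there is
 * never more than one writer. Every field is set before the release store.
 */
static void msc_publish(struct msc *m, int id, int iface)
{
	m->id = id;
	m->iface = iface;
	m->next = atomic_load_explicit(&all_msc, memory_order_relaxed);
	/* Release: the stores above are visible to anyone who can see 'm'. */
	atomic_store_explicit(&all_msc, m, memory_order_release);
}

/* Reader side: no lock. The acquire load pairs with the release store. */
static struct msc *msc_find(int id)
{
	struct msc *m = atomic_load_explicit(&all_msc, memory_order_acquire);

	for (; m; m = m->next)
		if (m->id == id)
			return m;
	return NULL;
}
```

A reader that obtains a pointer from msc_find() sees the fully initialised entry; the lock on the writer side only serialises writers against each other.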
> What other ways of obtaining the msc pointer exist?
The class/component/device structures in a subsequent patch, protected in the same way.
Once MPAM is enabled all that can be sprayed through resctrl - at which point no
modifications are allowed, and teardown for fatal errors depends on the static-key.
>> +
>> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
>> +		platform_set_drvdata(pdev, msc);
>> +	} while (0);
>> +	mutex_unlock(&mpam_list_lock);
>> +
>> +	if (!err) {
>> +		/* Create RIS entries described by firmware */
>> +		if (!acpi_disabled)
>> +			err = acpi_mpam_parse_resources(msc, plat_data);
>> +		else
>> +			err = mpam_dt_parse_resources(msc, plat_data);
>> +	}
>> +
>> +	if (!err && fw_num_msc == mpam_num_msc)
> Unlocked read of mpam_num_msc?
Fixed as an atomic_t flavoured thing.
>> +		mpam_discovery_complete();
>> +
>> +	if (err && msc)
>> +		mpam_msc_drv_remove(pdev);
>> +
>> +	return err;
>> +}
>> +/*
>> + * MSC that are hidden under caches are not created as platform devices
>> + * as there is no cache driver. Caches are also special-cased in
>> + * update_msc_accessibility().
>> + */
>
> Can you elaborate?  I don't understand quite what this is doing.
/ {
    my_thing {
        compatible = "my_thing";
        msc {
           compatible = "arm,mpam-msc";
        };
    };
    other_thing {
        compatible = "other_thing";
    };
    msc {
       compatible = "arm,mpam-msc";
       arm,mpam-device = <&other_thing>;
    };
    l2-cache {
      compatible = "cache";
        msc {
           compatible = "arm,mpam-msc";
        };
    };
};
my_thing and other_thing's MSC will have devices created - but the cache will not, because
it's a cache not a device, and anything below it is ignored.
>> +static void mpam_dt_create_foundling_msc(void)
>> +{
>> +	int err;
>> +	struct device_node *cache;
>> +
>> +	for_each_compatible_node(cache, NULL, "cache") {
>> +		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
>> +		if (err)
>> +			pr_err("Failed to create MSC devices under caches\n");
>> +	}
>> +}
>
> [...]
>
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> new file mode 100644
>> index 000000000000..07e0f240eaca
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -0,0 +1,62 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +// Copyright (C) 2024 Arm Ltd.
>> +
>> +#ifndef MPAM_INTERNAL_H
>> +#define MPAM_INTERNAL_H
>> +
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/io.h>
>> +#include <linux/mailbox_client.h>
>> +#include <linux/mutex.h>
>> +#include <linux/resctrl.h>
>> +#include <linux/sizes.h>
>> +
>> +struct mpam_msc {
>> +	/* member of mpam_all_msc */
>> +	struct list_head        glbl_list;
> It is worth making these names less mismatched?
all_msc_list ?
It's because it's global. The pattern otherwise is parent has a list foo, and all the
children have a member 'foo_list'.
>> +
>> +	int			id;
>> +	struct platform_device *pdev;
>> +
>> +	/* Not modified after mpam_is_enabled() becomes true */
>> +	enum mpam_msc_iface	iface;
>> +	u32			pcc_subspace_id;
>> +	struct mbox_client	pcc_cl;
>> +	struct pcc_mbox_chan	*pcc_chan;
>> +	u32			nrdy_usec;
>> +	cpumask_t		accessibility;
>> +
>> +	/*
>> +	 * probe_lock is only take during discovery. After discovery these
>> +	 * properties become read-only and the lists are protected by SRCU.
>> +	 */
>> +	struct mutex		probe_lock;
>
> Can we have more clarify about the locking strategy, including details
> of which things each lock is supposed to apply to and when, and how (if
> at all) the locks are intended to nest?
The comment above is supposed to describe that. As a rule of thumb, it covers all the stuff
'below' it in the struct; each lock also has a comment about ordering. This one is a bit of a
catch-all for the values in this struct that can be written.
The purpose of the part_sel_lock and mon_sel_lock arrangement becomes much more obvious in
subsequent patches when the driver starts accessing the registers of those names.
> (Similarly for the global locks.)
>
>> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
>> +	u32			ris_max;
> nrdy_usec, ris_idxs and ris_max appear unused in this patch (though I
> suppose they get initialised by virtue of kzalloc()).  Is this
> intentional?
To reduce the amount of churn in the series the bulk of the structure is added here; the
stuff that builds more complicated structures, accesses hardware, deals with interrupts
etc. comes in later patches.
>> +
>> +	/* mpam_msc_ris of this component */
>> +	struct list_head	ris;
>> +
>> +	/*
>> +	 * part_sel_lock protects access to the MSC hardware registers that are
>> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>> +	 * by RIS).
>> +	 * If needed, take msc->lock first.
>> +	 */
> What's msc->lock ?
The old name for probe_lock, before it was necessary to lock the hardware registers
separately. I've fixed the name.
Thanks,
James
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-05 18:49     ` James Morse
@ 2025-09-08 15:25       ` Dave Martin
  2025-09-10 19:19         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-08 15:25 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
On Fri, Sep 05, 2025 at 07:49:37PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 01/09/2025 12:21, Dave Martin wrote:
> > On Fri, Aug 22, 2025 at 03:29:51PM +0000, James Morse wrote:
> >> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
> >> only be accessible from those CPUs, and they may not be online.
> >> Touching the hardware early is pointless as MPAM can't be used until
> >> the system-wide common values for num_partid and num_pmg have been
> >> discovered.
> >>
> >> Start with driver probe/remove and mapping the MSC.
> >> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> >> new file mode 100644
> >> index 000000000000..dff7b87280ab
> >> --- /dev/null
> >> +++ b/drivers/resctrl/Kconfig
> >> @@ -0,0 +1,11 @@
> >> +# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
> >> +# CPU resources, not containers or cgroups etc.
> >
> > Drop confusing comment?
> >
> > CPUs are not mentioned other than in the comment -- I think the
> > descriptions are sufficiently self-explanatory that they don't read
> > onto CPUs.
> 
> This used to add ARM_CPU_RESCTRL, to mirror X86_CPU_RESCTRL.
> It's been tidied up since then, but the comment remains.
> I'll remove it.
OK
> >> +config ARM64_MPAM_DRIVER
> >> +	bool "MPAM driver for System IP, e,g. caches and memory controllers"
> >> +	depends on ARM64_MPAM && EXPERT
> >> +
> >> +config ARM64_MPAM_DRIVER_DEBUG
> >> +	bool "Enable debug messages from the MPAM driver."
> >
> > Nit: spurious full stop.
> >
> > (i.e., people don't add one in these one-line descriptions.
> > They are title-like and self-delimiting, even when the text is a valid
> > sentence.)
> 
> /me waves hands around
I did say "Nit" ;)  (This was mainly a "hmm, this doesn't look quite
like the rest" thing.)
> >> +	depends on ARM64_MPAM_DRIVER
> >> +	help
> >> +	  Say yes here to enable debug messages from the MPAM driver.
> 
> >> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> >> new file mode 100644
> >> index 000000000000..a0d9a699a6e7
> >> --- /dev/null
> >> +++ b/drivers/resctrl/mpam_devices.c
> >> @@ -0,0 +1,336 @@
> 
> >> +/*
> >> + * mpam_list_lock protects the SRCU lists when writing. Once the
> >> + * mpam_enabled key is enabled these lists are read-only,
> >> + * unless the error interrupt disables the driver.
> >> + */
> >> +static DEFINE_MUTEX(mpam_list_lock);
> >> +static LIST_HEAD(mpam_all_msc);
> >> +
> >> +static struct srcu_struct mpam_srcu;
> >> +
> >> +/* MPAM isn't available until all the MSC have been probed. */
> >
> > Comment doesn't really explain the variable.
> >
> > Maybe something like "Number of MSCs that need to be probed for MPAM
> > to be usable" ?
> 
> Its the count not the remainder. I went with:
> | * Number of MSCs that have been probed. Once all MSC have been probed MPAM
> | * can be enabled.
Thanks, that's clearer.
> >> +static u32 mpam_num_msc;
> >
> > Any particular reason this is u32 and not unsigned int?
> u32 is less typing!
But more effort for the reader / reviewer -- it's semantic noise.
(It works either way though, obviously.  It looks like this code may
have caught fixed-size-itis off the resctrl code, to some degree.)
> > How are accesses to this protected against data races?
> 
> It's under the list-lock, but after Rob's feedback I've made it an atomic_t and stopped
> using it as an id for all the print messages.
Converting to atomic_t reduces the chance of people asking the
question, but doesn't really answer the question.
Since mpam_num_msc shadows the contents of the lists and msc data
structures, it may matter whether the two can be seen out of sync.
Does it definitely not matter?
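One way to make the question moot is to update the counter and take the "are we done?" decision inside the same critical section that modifies the list. The userspace sketch below assumes that this is acceptable for the driver, which the thread does not confirm; list_lock, num_probed, list_len, fw_expected and probe_one() are illustrative names only.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int num_probed;   /* protected by list_lock */
static unsigned int list_len;     /* stands in for the list; same lock */
static unsigned int fw_expected;  /* hypothetical: set once, before probing */

/* Returns true only for the caller whose probe completes discovery. */
static bool probe_one(void)
{
	bool last;

	pthread_mutex_lock(&list_lock);
	list_len++;	/* the "list add" */
	num_probed++;	/* same critical section, so never out of sync */
	last = (num_probed == fw_expected);
	pthread_mutex_unlock(&list_lock);

	return last;
}
```

Because the comparison happens under the lock, exactly one caller can observe the counter reaching the expected value, and no observer holding the lock can see the counter disagree with the list.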
> > If there are supposed to be locks to protect globals in the MPAM driver,
> > is it worth wrapping them in access functions with a lockdep assert?
> > Otherwise, it feels rather easy to get this wrong -- I think I've found
> > at least one bug (see mpam_msc_drv_probe().)
> Broadly: everything is protected by the list_lock when things are being discovered.
> Once everything has been discovered, these things can become read-only.
> 
> It's not until everything has been discovered that interrupts get registered, and things
> like a potential PMU driver could make calls in a strange context.
> 
> Adding helpers would need some global state variable, if (state == foo) lockdep_assert...
> I had that early on, but figured it was overkill.
I wonder whether it would be worth migrating this, so that the probe-
time variables (which are read-write) can be kept separate from the
run-time system description variables (which are mostly write-once).
This would avoid having to support two different locking scenarios with
a single mechanism.
I have a bit of a concern that there are too many synchronisation
mechanisms in use, with purposes that overlap and are in some cases
not clearly described and not obvious -- at least, not to me.
I don't think the fact that there have been few comments on this area
necessarily indicates that other reviewers have fully understood the
locking.
> >> +static void mpam_discovery_complete(void)
> >> +{
> >> +	pr_err("Discovered all MSC\n");
> >> +}
> 
> > As others have commented, if this is non-functional code that gets
> > removed later on, it's probably best to drop this up-front?
> It's illustrating that something happens after all the MSC have been discovered.
> Knowing that from the beginning of the series is supposed to make the insertion of the
> cpuhp notifiers in the middle easier to think about...
So long as this was not unintentionally left behind when splitting the
series, I guess it's OK to have it here -- as you say, it does motivate
the shape that the code will eventually need to have.
> >> +static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
> >> +				  u32 ris_idx)
> >> +{
> >> +	int err = 0;
> >> +	u32 level = 0;
> >> +	unsigned long cache_id;
> >> +	struct device_node *cache;
> >> +
> >> +	do {
> >> +		if (of_device_is_compatible(np, "arm,mpam-cache")) {
> >> +			cache = of_parse_phandle(np, "arm,mpam-device", 0);
> >> +			if (!cache) {
> >> +				pr_err("Failed to read phandle\n");
> >> +				break;
> >> +			}
> >> +		} else if (of_device_is_compatible(np->parent, "cache")) {
> >> +			cache = of_node_get(np->parent);
> >> +		} else {
> >> +			/* For now, only caches are supported */
> >> +			cache = NULL;
> >> +			break;
> >> +		}
> >> +
> >> +		err = of_property_read_u32(cache, "cache-level", &level);
> >> +		if (err) {
> >> +			pr_err("Failed to read cache-level\n");
> >> +			break;
> >> +		}
> >> +
> >> +		cache_id = cache_of_calculate_id(cache);
> >> +		if (cache_id == ~0UL) {
> >
> > The type of cache_id may change if the return type of
> > cache_of_calculate_id() changes (see comments on patch 1).
> 
> Yup,
> 
> > Possible #define for the exceptional value.
> 
> I don't think its any more surprising than '-1' as an error value, and its only got one
> caller, which is pretty obviously an error path.
A #define is not essential.  (As I say elsewhere, I don't entirely
trust uses of ~ where the types may be mixed.  But this case looks
low-risk for future maintenance.)
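For concreteness, the mixed-type hazard with ~ looks like this in a standalone sketch. CACHE_ID_INVALID, id_is_invalid() and widen_u32_sentinel() are hypothetical names; the patch itself compares against ~0UL directly.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical sentinel name, equal to ~0UL. */
#define CACHE_ID_INVALID	ULONG_MAX

static bool id_is_invalid(unsigned long id)
{
	return id == CACHE_ID_INVALID;
}

/* What happens if a 32-bit "all ones" value leaks into the comparison. */
static unsigned long widen_u32_sentinel(void)
{
	unsigned int narrow = ~0U;	/* all ones in 32 bits */

	return narrow;			/* zero-extended, not sign-extended */
}
```

Where unsigned long is wider than unsigned int, id_is_invalid(widen_u32_sentinel()) is false: the narrow sentinel no longer matches once it has been widened. With a single caller and a single type, as here, the risk is small.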
> >> +			err = -ENOENT;
> >> +			break;
> >
> > The lack of a diagnostic here is inconsistent with the level of
> > diagnostics in the rest of the loop.
> 
> I've never needed to debug that one because its already visible to user-space. If the
> cache-id's are missing, you can tell that in sysfs, you don't need to instrument the kernel.
> 
> I'll add one if you think its important. They can all be _once, and as its related to
> the probing of a particular device, can use the dev_ print helpers.
No, I guess that's fine.  I don't have a strong feel for which things a
user is likely to have to debug in practice, and I accept that we
shouldn't try to cover absolutely everything.
> >> +		}
> >> +
> >> +		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> >> +				      cache_id);
> >> +	} while (0);
> >
> > Abuse of do ... while () here?
> >
> > There is no loop.  The breaks are stealth "goto"s to this statement:
> Yes. The alternative would be actual gotos - which is surely worse!
> It just wasn't worth pulling this out as a separate function.
Again, it's semantic noise.  do ... while is a looping construct.
If you write "do {", the reader expects a loop and has to analyse the
code in order to conclude that there isn't one.
Using a loop in order to avoid giving a label to the sequence point at
the closing bracket is a neat trick, but it's not helpful to someone
reading the code.  I can live with it, but as an idiom this seems rare.
Use of gotos for doing cleanup seems to be the most common idiom for
this in the kernel.  It may be inelegant, but it is likely to be readily
understood.  However...
> 
> 
> >> +	of_node_put(cache);
> >
> > (It works either way, but maybe gotos to an explicit label would be
> > more readable, as well as avoiding an unnecessary level of indentation.)
> As the cleanup magic has become fashionable, I'll switch to using that...
...I guess that works, too.
(Not that I much like that clunky bolt-on extension to the language,
but it should do the job and at least avoids arguments about precisely
how or where the cleanup happens.)
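For comparison, here are the two alternatives to do ... while (0) side by side, as a userspace sketch. The refcounted struct node and its helpers are made up to stand in for of_node_get()/of_node_put(); the second function uses the GCC/Clang cleanup variable attribute, which is the mechanism the kernel's scope-based cleanup helpers in <linux/cleanup.h> are built on.

```c
#include <stddef.h>

/* Hypothetical refcounted object, standing in for struct device_node. */
struct node {
	int refs;
	int level;
};

static struct node *node_get(struct node *n)
{
	if (n)
		n->refs++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refs--;
}

/* The cleanup attribute passes a pointer to the annotated variable. */
static void node_putp(struct node **n)
{
	node_put(*n);
}

/* Explicit label: every exit path funnels through 'out'. */
static int read_level_goto(struct node *parent, int *level)
{
	int err = 0;
	struct node *cache = node_get(parent);

	if (!cache) {
		err = -1;
		goto out;
	}
	if (cache->level < 1) {
		err = -2;
		goto out;
	}
	*level = cache->level;
out:
	node_put(cache);
	return err;
}

/* Scope-based: node_putp() runs automatically on every return. */
static int read_level_cleanup(struct node *parent, int *level)
{
	struct node *cache __attribute__((cleanup(node_putp))) = node_get(parent);

	if (!cache)
		return -1;
	if (cache->level < 1)
		return -2;
	*level = cache->level;
	return 0;
}
```

Both versions drop the reference on every path; the second does so without a label or an extra level of indentation, at the cost of relying on a compiler extension.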
> >> +
> >> +	return err;
> >> +}
> >
> > [...]
> >
> >> +/*
> >> + * An MSC can control traffic from a set of CPUs, but may only be accessible
> >> + * from a (hopefully wider) set of CPUs. The common reason for this is power
> >> + * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
> >> + * the corresponding cache may also be powered off. By making accesses from
> >
> > Nit: the the
> 
> Fixed,
> 
> 
> >> + * one of those CPUs, we ensure this isn't the case.
> >> + */
> 
> >> +static int fw_num_msc;
> >
> > Does this need to be protected against data races?
> >
> > If individual mpam_msc_drv_probe() calls may execute on different CPUs
> > from mpam_msc_driver_init(), then seem to be potential races here.
> 
> Incrementing was under the list-lock, but not the last 'are they all done' read. Following
> Rob's comments I've made this an atomic_t.
As with mpam_num_msc, this eliminates data races on fw_num_msc, but
races between this variable and other data structures may remain.
Can you explain what prevents such races, or why they don't
matter?
> 
> 
> >> +
> >> +static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
> >> +{
> >> +	/* TODO: wake up tasks blocked on this MSC's PCC channel */
> >
> > > So, is this broken in this commit?
> > >
> > (If the series does not get broken up or applied piecemail, that's not
> > such a concern, though.)
> Unsupported - or at least only enough to not mistake them for MMIO devices.
> I've pulled this out to a later patch in the tree that isn't in this series.
> 
> The platforms that need this haven't yet materialised, (and may not!).
> There is a prototype for DT/SCMI, but nothing I've seen yet for ACPI.
OK; I guess deferring the introduction of this until there is more
context for it makes sense.
> >> +}
> >> +
> >> +static void mpam_msc_drv_remove(struct platform_device *pdev)
> >> +{
> >
> > The MPAM driver cannot currently be built as a module.
> >
> > Is it possible to exercise the driver remove paths, today?
> Yes, through the sysfs unbind interface.
> 
> It doesn't make a lot of sense for MPAM as the moment you unbind the driver from one MSC
> it has to work out if it needs to teardown resctrl...
Has to, but doesn't?  Have I missed something?

Are we supposed to put the MSC back into a sane state, for e.g. the
kexec path?

For resctrl, one option would be to stub out the backend -- i.e.,
we don't tell resctrl that the affected resources disappeared, but
attempts to manipulate the affected MSC(s) are stubbed out (similarly
to what happens to an open tty after a hangup).

(Something along these lines may be done somewhere, but I'm not
currently aware of it.)

Is there an outstanding get on the device that prevents us from getting
here until resctrl is shut down?  Thanks to the wisdom and restraint of
systemd I'd expect resctrl to be tangled up in some rat's nest of
unremovable mounts by the time we try to shut down, but I hope I'm
being pessimistic.  (Arguably that's not our bug, if so.)
> >> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
> >> +
> >> +	if (!msc)
> >> +		return;
> >> +
> >> +	mutex_lock(&mpam_list_lock);
> >> +	mpam_num_msc--;
> >> +	platform_set_drvdata(pdev, NULL);
> >> +	list_del_rcu(&msc->glbl_list);
> >> +	synchronize_srcu(&mpam_srcu);
> >> +	devm_kfree(&pdev->dev, msc);
> >> +	mutex_unlock(&mpam_list_lock);
> >> +}
> >> +
> >> +static int mpam_msc_drv_probe(struct platform_device *pdev)
> >> +{
> >> +	int err;
> >> +	struct mpam_msc *msc;
> >> +	struct resource *msc_res;
> >> +	void *plat_data = pdev->dev.platform_data;
> >> +
> >> +	mutex_lock(&mpam_list_lock);
> >> +	do {
> >> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
> >> +		if (!msc) {
> >> +			err = -ENOMEM;
> >> +			break;
> >> +		}
> >> +
> >> +		mutex_init(&msc->probe_lock);
> >> +		mutex_init(&msc->part_sel_lock);
> >> +		mutex_init(&msc->outer_mon_sel_lock);
> >> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
> >> +		msc->id = mpam_num_msc++;
> >> +		msc->pdev = pdev;
> >> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
msc->glbl_list is not a list head?  Does it need to be initialised at all?
list_add_rcu() will just splat it, by the looks of it.
> >> +		INIT_LIST_HEAD_RCU(&msc->ris);
Maybe INIT_LIST_HEAD_RCU() isn't needed here.  Do we ever access this
list without holding one of the MSC locks?
This list is not used until a cpuhp hook comes in to probe the MSC, and
then mpam_discovery_cpu_online() obtains a pointer via
list_for_each_entry() -- but this is not RCU-protected.  The MSC probe
lock is taken, there...
> >> +
> >> +		err = update_msc_accessibility(msc);
> >> +		if (err)
> >> +			break;
> >> +		if (cpumask_empty(&msc->accessibility)) {
> >> +			pr_err_once("msc:%u is not accessible from any CPU!",
> >> +				    msc->id);
> >> +			err = -EINVAL;
> >> +			break;
> >> +		}
> >> +
> >> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
> >> +					     &msc->pcc_subspace_id))
> >> +			msc->iface = MPAM_IFACE_MMIO;
> >> +		else
> >> +			msc->iface = MPAM_IFACE_PCC;
> >> +
> >> +		if (msc->iface == MPAM_IFACE_MMIO) {
> >> +			void __iomem *io;
> >> +
> >> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
> >> +								    &msc_res);
> >> +			if (IS_ERR(io)) {
> >> +				pr_err("Failed to map MSC base address\n");
> >> +				err = PTR_ERR(io);
> >> +				break;
> >> +			}
> >> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
> >> +			msc->mapped_hwpage = io;
> >> +		} else if (msc->iface == MPAM_IFACE_PCC) {
> >> +			msc->pcc_cl.dev = &pdev->dev;
> >> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
> >> +			msc->pcc_cl.tx_block = false;
> >> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
> >> +			msc->pcc_cl.knows_txdone = false;
... so, it feels like we may need to hold the probe lock, or ensure
that all iterations over the msc list are RCU-protected (see below for
counterexamples), or both.
> >> +
> >> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
> >> +								 msc->pcc_subspace_id);
> >> +			if (IS_ERR(msc->pcc_chan)) {
> >> +				pr_err("Failed to request MSC PCC channel\n");
> >> +				err = PTR_ERR(msc->pcc_chan);
> >> +				break;
> >> +			}
> >> +		}
> 
> > Should the lock be held across initialisation of the msc fields?
> 
> The msc isn't visible until its added to the list, so provided all that inialisation is
> done 'before' its added to the list, it doesn't matter.
Is it possible for other code to get a pointer to the new msc after
this, other than by dereferencing the list?

(The obvious case is the interrupt handlers, but it looks like the msc
pointers used for registering the interrupts are indeed obtained
through an RCU-protected iteration over mpam_all_msc.)

Note, there seem to be non-RCU-protected iterations over mpam_all_msc
in the mpam_discovery_cpu_online() (patch 14) and
mpam_enable_merge_features() paths (patch 18).  The lack of symmetry
between list maintenance and consumption looks a little suspect for
those -- is safety ensured in some other way?
> > list_add_rcu() might imply sufficient barriers to ensure that the
> > initialisations are visible to other threads that obtain the msc
> > pointer by iterating over mpam_all_msc.
> >
> > It's probably cleaner to hold the lock explicitly, though.
> 
> The list lock? We do.
> But the readers don't need to take the list lock, its only there to prevent concurrent
> writers.
I meant whatever lock is supposed to protect the fields of the specific
msc.  The list lock is not that lock.
> 
> 
> > What other ways of obtaining the msc pointer exist?
> The class/component/device structures in a subsequent patch, protected in the same way.
> Once MPAM is enabled all that can be sprayed through resctrl - at which point no
> modifications are allowed, and teardown for fatal errors depends on the static-key.
I'll bear this in mind as I review.
> >> +
> >> +		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
> >> +		platform_set_drvdata(pdev, msc);
> >> +	} while (0);
> >> +	mutex_unlock(&mpam_list_lock);
> >> +
> >> +	if (!err) {
> >> +		/* Create RIS entries described by firmware */
> >> +		if (!acpi_disabled)
> >> +			err = acpi_mpam_parse_resources(msc, plat_data);
> >> +		else
> >> +			err = mpam_dt_parse_resources(msc, plat_data);
> >> +	}
> >> +
> >> +	if (!err && fw_num_msc == mpam_num_msc)
> 
> > Unlocked read of mpam_num_msc?
> Fixed as an atomic_t flavoured thing.
Ditto comments above about whether it is a problem that mpam_num_msc
can now be seen out of sync with the list.
> >> +		mpam_discovery_complete();
> >> +
> >> +	if (err && msc)
> >> +		mpam_msc_drv_remove(pdev);
> >> +
> >> +	return err;
> >> +}
> >> +/*
> >> + * MSC that are hidden under caches are not created as platform devices
> >> + * as there is no cache driver. Caches are also special-cased in
> >> + * update_msc_accessibility().
> >> + */
> >
> > Can you elaborate?  I don't understand quite what this is doing.
> 
> / {
>     my_thing {
>         compatible = "my_thing";
>         msc {
>            compatible = "arm,mpam-msc";
>         };
>     };
> 
>     other_thing {
>         compatible = "other_thing";
>     };
> 
>     msc {
>        compatible = "arm,mpam-msc";
>        arm,mpam-device = <&other_thing>;
>     };
> 
> 
>     l2-cache {
>       compatible = "cache";
>         msc {
>            compatible = "arm,mpam-msc";
>         };
>     };
> };
> 
> my_thing and other_thing's MSC will have devices created - but the cache will not, because
> it's a cache not a device, and anything below it is ignored.
OK.  Maybe reword as something like:
--8<--
MSCs that are declared by the firmware as being part of a cache may not
be created automatically as platform devices, since there is no
dedicated cache driver.
Deal with those MSCs here.
-->8--
Maybe add a comment at update_msc_accessibility() that references this
comment, instead of a reader of that function just needing to know that
this comment is here?
> 
> 
> >> +static void mpam_dt_create_foundling_msc(void)
> >> +{
> >> +	int err;
> >> +	struct device_node *cache;
> >> +
> >> +	for_each_compatible_node(cache, NULL, "cache") {
> >> +		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
> >> +		if (err)
> >> +			pr_err("Failed to create MSC devices under caches\n");
> >> +	}
> >> +}
> >
> > [...]
> >
> >> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> >> new file mode 100644
> >> index 000000000000..07e0f240eaca
> >> --- /dev/null
> >> +++ b/drivers/resctrl/mpam_internal.h
> >> @@ -0,0 +1,62 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +// Copyright (C) 2024 Arm Ltd.
> >> +
> >> +#ifndef MPAM_INTERNAL_H
> >> +#define MPAM_INTERNAL_H
> >> +
> >> +#include <linux/arm_mpam.h>
> >> +#include <linux/cpumask.h>
> >> +#include <linux/io.h>
> >> +#include <linux/mailbox_client.h>
> >> +#include <linux/mutex.h>
> >> +#include <linux/resctrl.h>
> >> +#include <linux/sizes.h>
> >> +
> >> +struct mpam_msc {
> >> +	/* member of mpam_all_msc */
> >> +	struct list_head        glbl_list;
> 
> > It is worth making these names less mismatched?
> 
> all_msc_list ?
> 
> It's because its global. The pattern otherwise is parent has a list foo, and all the
> children have a member 'foo_list'.
"all_msc", then?

I'm not sure that it's essential for the list head's name to have
"list" in it: that's clear from the type and from how it is used.
"all" seems sufficient to imply that this is a list (?)

It doesn't get much more global than "all".

(This is purely cosmetic, of course.)
> 
> 
> >> +
> >> +	int			id;
> >> +	struct platform_device *pdev;
> >> +
> >> +	/* Not modified after mpam_is_enabled() becomes true */
> >> +	enum mpam_msc_iface	iface;
> >> +	u32			pcc_subspace_id;
> >> +	struct mbox_client	pcc_cl;
> >> +	struct pcc_mbox_chan	*pcc_chan;
> >> +	u32			nrdy_usec;
> >> +	cpumask_t		accessibility;
> >> +
> >> +	/*
> >> +	 * probe_lock is only take during discovery. After discovery these
> >> +	 * properties become read-only and the lists are protected by SRCU.
> >> +	 */
> >> +	struct mutex		probe_lock;
> >
> > Can we have more clarify about the locking strategy, including details
> > of which things each lock is supposed to apply to and when, and how (if
> > at all) the locks are intended to nest?
> 
> The comment above is supposed to describe that. A rule of thumb is its all the stuff
> 'below' it in the struct, there is also a comment for each order. This one is a bit of a
> catch all for the values in this struct that can be written.
I get that, but the devil is in the detail.

A lock that is "a bit of a catch-all" would always need to be taken,
if safety is the goal.

I guess we should try to close out the other discussions about locking
before working out whether anything else is needed, here.
> The purpose of the part_sel_lock and mon_sel_lock arrangement becomes much more obvious in
> subsequent patches when the driver starts accessing the registers of those names.
It's fair enough to defer that context until later -- though arguably
these could be added later when they are actually used.  Not really
worth resplitting the series just for that, though.
> 
> 
> > (Similarly for the global locks.)
> >
> >> +	unsigned long		ris_idxs[128 / BITS_PER_LONG];
> >> +	u32			ris_max;
> 
> > nrdy_usec, ris_idxs and ris_max appear unused in this patch (though I
> > suppose they get initialised by virtue of kzalloc()).  Is this
> > intentional?
> 
> To reduce the amount of churn in the series the bulk of the structure is added here, then
> the stuff that build more complicated structures, accesses hardware, deals with interrupts
> etc.
Ack, just checking that something wasn't unintentionally dropped.
> >> +
> >> +	/* mpam_msc_ris of this component */
> >> +	struct list_head	ris;
> >> +
> >> +	/*
> >> +	 * part_sel_lock protects access to the MSC hardware registers that are
> >> +	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
> >> +	 * by RIS).
> >> +	 * If needed, take msc->lock first.
> >> +	 */
> 
> > What's msc->lock ?
> 
> The old name for probe_lock, before it was necessary to lock the hardware registers
> separately. I've fixed the name.
> 
> Thanks,
> 
> James
Ack
Cheers
---Dave
> 
> 
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-08 15:25       ` Dave Martin
@ 2025-09-10 19:19         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:19 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Dave,
On 08/09/2025 16:25, Dave Martin wrote:
> On Fri, Sep 05, 2025 at 07:49:37PM +0100, James Morse wrote:
>> On 01/09/2025 12:21, Dave Martin wrote:
>>> On Fri, Aug 22, 2025 at 03:29:51PM +0000, James Morse wrote:
>>>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may
>>>> only be accessible from those CPUs, and they may not be online.
>>>> Touching the hardware early is pointless as MPAM can't be used until
>>>> the system-wide common values for num_partid and num_pmg have been
>>>> discovered.
>>>>
>>>> Start with driver probe/remove and mapping the MSC.
>>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>>>> new file mode 100644
>>>> index 000000000000..a0d9a699a6e7
>>>> --- /dev/null
>>>> +++ b/drivers/resctrl/mpam_devices.c
>>>> @@ -0,0 +1,336 @@
>>> How are accesses to this protected against data races?
>>
>> It's under the list-lock, but after Rob's feedback I've made it an atomic_t and stopped
>> using it as an id for all the print messages.
> 
> Converting to atomic_t reduces the chance of people asking the
> question, but doesn't really answer the question.
> 
> Since mpam_num_msc shadows the contents of the lists and msc data
> structures, it may matter whether the two can be seen out of sync.
> 
> Does it definitely not matter?
v2 uses the firmware-table id as the id. The only thing mpam_num_msc needs to do is spot
which MSC was last to be probed so that the cpuhp calls can be registered.
The amount of skid that happens on the way there doesn't matter.
[..]
>>>> +static int fw_num_msc;
>>>
>>> Does this need to be protected against data races?
>>>
>>> If individual mpam_msc_drv_probe() calls may execute on different CPUs
>>> from mpam_msc_driver_init(), then seem to be potential races here.
>>
>> Incrementing was under the list-lock, but not the last 'are they all done' read. Following
>> Rob's comments I've made this an atomic_t.
> 
> As with mpam_num_msc, this eliminates data races on fw_num_msc, but
> races between this variable and other data structures may remain.
> 
> Can you explain what prevents such races, or why they don't
> matter?
See mpam_msc_driver_init(). The value is set before the driver is registered. It can't
probe before it's registered, so all the readers must happen after the writer.
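
A userspace sketch of that ordering, for illustration only: the firmware
count is written once before anything can probe, each probe bumps an
atomic counter, and whichever probe brings the counter up to the firmware
count is the one that kicks off the next stage. All the names ending in
_sketch are made up for this example and are not the driver's; C11 atomics
stand in for the kernel's atomic_t, and the "driver registration" step is
only a comment here.

```c
#include <assert.h>
#include <stdatomic.h>

static int fw_total_sketch;        /* written once, before any probe can run */
static atomic_int probed_sketch;   /* incremented by each successful probe */
static int discovery_done_sketch;  /* set by whichever probe finishes last */

/* Stand-in for driver init: record the firmware count, then "register". */
static void driver_init_sketch(int fw_count)
{
	fw_total_sketch = fw_count;
	atomic_store(&probed_sketch, 0);
	discovery_done_sketch = 0;
	/* Driver registration would go here; probes can only start after it. */
}

/* Stand-in for a successful probe: returns 1 for the probe that completes the set. */
static int probe_sketch(void)
{
	/* fetch_add returns the old value, so +1 is this probe's ordinal. */
	int ordinal = atomic_fetch_add(&probed_sketch, 1) + 1;

	if (ordinal == fw_total_sketch) {
		discovery_done_sketch = 1;
		return 1;
	}
	return 0;
}
```

With a firmware count of 3, only the third call to probe_sketch() returns 1,
whatever order the probes arrive in. The counter is not kept in step with
any list; it only has to identify the last probe.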
[..]
>>>> +}
>>>> +
>>>> +static void mpam_msc_drv_remove(struct platform_device *pdev)
>>>> +{
>>>
>>> The MPAM driver cannot currently be built as a module.
>>>
>>> Is it possible to exercise the driver remove paths, today?
>> Yes, through the sysfs unbind interface.
>>
>> It doesn't make a lot of sense for MPAM as the moment you unbind the driver from one MSC
>> it has to work out if it needs to teardown resctrl...
> 
> Has to, but doesn't?  Have I missed something?
Has to, and does:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/tree/drivers/resctrl/mpam_devices.c?h=mpam/snapshot%2bextras/v6.17-rc2&id=41c768c1705d3fa0814bb1cb92c83280c872cb3e#n474
It's probably not what the user wants, but it is what they're asking for if they do this.
> Are we supposed to put the MSC back into a sane state, for e.g. the
> kexec path?
kdump on panic() means you couldn't trust it to do this.
The driver resets all the configurations during probing. It will initially run with
whatever configuration was left in partid-0; there is very little we can do about this.
> For resctrl, one option would be to stub out the backend -- i.e.,
> we don't tell resctrl that the affected resources disappeared, but
> attempts to manipulate the affected MSC(s) are stubbed out (similarly
> to what happens to an open tty after a hangup).
> 
> (Something along these lines may be done somewhere, but I'm not
> currently aware of it.)
On teardown the cpuhp callbacks are unregistered, which makes all the CPUs and
resctrl:domains appear offline, and the static key behind mpam_enabled() is disabled,
which makes a bunch of paths return an error.
The result is no more access to the hardware after an error has occurred. The aim
is to prevent accidentally muddling important/unimportant tasks due to PARTID truncation.
> Is there an outstanding get on the device that prevents us from getting
> here until resctrl is shut down?  Thanks to the wisdom and restraint of
> systemd I'd expect resctrl to be tangled up in some rat's nest of
> unremovable mounts by the time we try to shut down, but I hope I'm
> being pessimistic.  (Arguably that's not our bug, if so.)
The resctrl_exit() path removes the kernfs internals from the mount point. Systemd is able
to keep zombie mount points, but they end up being empty. (to reproduce this, put resctrl
in fstab!)
>>>> +	struct mpam_msc *msc = platform_get_drvdata(pdev);
>>>> +
>>>> +	if (!msc)
>>>> +		return;
>>>> +
>>>> +	mutex_lock(&mpam_list_lock);
>>>> +	mpam_num_msc--;
>>>> +	platform_set_drvdata(pdev, NULL);
>>>> +	list_del_rcu(&msc->glbl_list);
>>>> +	synchronize_srcu(&mpam_srcu);
>>>> +	devm_kfree(&pdev->dev, msc);
>>>> +	mutex_unlock(&mpam_list_lock);
>>>> +}
>>>> +
>>>> +static int mpam_msc_drv_probe(struct platform_device *pdev)
>>>> +{
>>>> +	int err;
>>>> +	struct mpam_msc *msc;
>>>> +	struct resource *msc_res;
>>>> +	void *plat_data = pdev->dev.platform_data;
>>>> +
>>>> +	mutex_lock(&mpam_list_lock);
>>>> +	do {
>>>> +		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
>>>> +		if (!msc) {
>>>> +			err = -ENOMEM;
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		mutex_init(&msc->probe_lock);
>>>> +		mutex_init(&msc->part_sel_lock);
>>>> +		mutex_init(&msc->outer_mon_sel_lock);
>>>> +		raw_spin_lock_init(&msc->inner_mon_sel_lock);
>>>> +		msc->id = mpam_num_msc++;
>>>> +		msc->pdev = pdev;
>>>> +		INIT_LIST_HEAD_RCU(&msc->glbl_list);
> msc->glbl_list is not a list head?  Does it need to be initialised at all?
> list_add_rcu() will just splat it, by the looks of it.
It's the member of the global list, hence the comment above it in the struct.
I've never known whether the member 'list head' needs to be initialised or not; better
safe than sorry.
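
For reference, a small userspace sketch of why initialising the member is
harmless but not needed: adding an entry writes both of the entry's own
pointers, so whatever they held before is overwritten. This is a stand-in
that mirrors the pointer updates of the kernel's list_add(); it is not the
kernel code, it has none of the RCU publication ordering of list_add_rcu(),
and list_add_sketch() and member_needs_no_init() are names made up for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Same pointer updates as a plain list_add(): insert 'entry' right after 'head'. */
static void list_add_sketch(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;	/* entry's old 'next' is overwritten */
	entry->prev = head;	/* entry's old 'prev' is overwritten */
	head->next = entry;
}

/* Returns 1 if a member holding stale pointers still ends up correctly linked. */
static int member_needs_no_init(void)
{
	struct list_head all, member, junk;

	init_list_head(&all);
	/* Leave stale values in the member, as if it had never been initialised. */
	member.next = &junk;
	member.prev = &junk;

	list_add_sketch(&member, &all);

	return all.next == &member && all.prev == &member &&
	       member.next == &all && member.prev == &all;
}
```

The head of the list does need initialising, since list_add_sketch() reads
head->next; only the entry being added is write-only.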
>>>> +		INIT_LIST_HEAD_RCU(&msc->ris);
> 
> Maybe INIT_LIST_HEAD_RCU() isn't needed here.  Do we ever access this
> list without holding one of the MSC locks?
> 
> This list is not used until a cpuhp hook comes in to probe the MSC, and
> then mpam_discovery_cpu_online() obtains a pointer via
> list_for_each_entry() -- but this is not RCU-protected.  The MSC probe
> lock is taken, there...
mpam_reprogram_msc(), mpam_msmon_reset_all_mbwu() and mpam_reset_msc() all do this
under (s)rcu.
The aim is to not take a lock on the read side. Once the cpuhp callbacks are registered
the only thing that writes to these lists is the error interrupt teardown.
>>>> +
>>>> +		err = update_msc_accessibility(msc);
>>>> +		if (err)
>>>> +			break;
>>>> +		if (cpumask_empty(&msc->accessibility)) {
>>>> +			pr_err_once("msc:%u is not accessible from any CPU!",
>>>> +				    msc->id);
>>>> +			err = -EINVAL;
>>>> +			break;
>>>> +		}
>>>> +
>>>> +		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>>>> +					     &msc->pcc_subspace_id))
>>>> +			msc->iface = MPAM_IFACE_MMIO;
>>>> +		else
>>>> +			msc->iface = MPAM_IFACE_PCC;
>>>> +
>>>> +		if (msc->iface == MPAM_IFACE_MMIO) {
>>>> +			void __iomem *io;
>>>> +
>>>> +			io = devm_platform_get_and_ioremap_resource(pdev, 0,
>>>> +								    &msc_res);
>>>> +			if (IS_ERR(io)) {
>>>> +				pr_err("Failed to map MSC base address\n");
>>>> +				err = PTR_ERR(io);
>>>> +				break;
>>>> +			}
>>>> +			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
>>>> +			msc->mapped_hwpage = io;
>>>> +		} else if (msc->iface == MPAM_IFACE_PCC) {
>>>> +			msc->pcc_cl.dev = &pdev->dev;
>>>> +			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
>>>> +			msc->pcc_cl.tx_block = false;
>>>> +			msc->pcc_cl.tx_tout = 1000; /* 1s */
>>>> +			msc->pcc_cl.knows_txdone = false;
> 
> ... so, it feels like we may need to hold the probe lock, or ensure
> that all iterations over the msc list are RCU-protected (see below for
> counterexamples), or both.
In here, the msc hasn't yet been added to the global list, so it can't be found by anyone.
Anyone that does find it, found it by taking the list_lock and walking the global list.
>>>> +
>>>> +			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
>>>> +								 msc->pcc_subspace_id);
>>>> +			if (IS_ERR(msc->pcc_chan)) {
>>>> +				pr_err("Failed to request MSC PCC channel\n");
>>>> +				err = PTR_ERR(msc->pcc_chan);
>>>> +				break;
>>>> +			}
>>>> +		}
>>
>>> Should the lock be held across initialisation of the msc fields?
>>
>> The msc isn't visible until its added to the list, so provided all that inialisation is
>> done 'before' its added to the list, it doesn't matter.
> Is it possible for other code to get a pointer to the new msc after
> this, other than by dereferencing the list?
While MSC are still being created - nothing walks the list. There are no interrupts or
hotplug callbacks.
Once all the MSC have been found, hardware probing starts.
During hardware probing, the cpuhp callback walks the global list under srcu.
Once all the hardware has been probed, the cpuhp callbacks are swapped out, and
mpam_enable_once() does things with static_keys and registering interrupts.
After this point, the mpam_class list/tree is used under srcu, and yes interrupts know
which MSC they are for.
> (The obvious case is the interrupt handlers, but it looks like the msc
> pointers used for registering the interrupts are indeed obtained
> through an RCU-protected iteration over mpam_all_msc.)
> 
> Note, there seem to be non-RCU-protected iterations over mpam_all_msc
> in the mpam_discovery_cpu_online() (patch 14) and
That was taking the write side lock unnecessarily. I've fixed it to walk the list under srcu.
> mpam_enable_merge_features() paths (patch 18). 
Holds the write side lock.
> The lack of symmetry
> between list maintenance and consumption look a little suspect for
> those -- is safety ensured in some other way?
Readers must use the RCU primitives to be safe against a concurrent writer.
Writers must take some write side lock to be safe against each other - they don't need to
use the RCU list-walking primitives.
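
A userspace sketch of that split, assuming C11 atomics as a stand-in for
the kernel primitives: the writer serialises against other writers, fully
initialises the object, and only then publishes it with a release store
(the role played by list_add_rcu()); the reader takes no lock and uses
acquire loads (stronger than what rcu_dereference() needs, but simple to
reason about). struct msc_sketch, msc_publish() and msc_sum_ids() are
made-up names. Removal and the grace period (synchronize_srcu()) are
deliberately left out, so this shows only the publish/read half.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mpam_msc: one payload field plus a link. */
struct msc_sketch {
	int id;
	struct msc_sketch *_Atomic next;
};

static struct msc_sketch *_Atomic all_msc_sketch;	/* list head */
static atomic_flag writer_lock = ATOMIC_FLAG_INIT;	/* stand-in for the list lock */

/* Writer: exclude other writers, initialise fully, then publish. */
static void msc_publish(struct msc_sketch *msc, int id)
{
	while (atomic_flag_test_and_set_explicit(&writer_lock, memory_order_acquire))
		;	/* spin: only writers ever contend here */

	msc->id = id;
	atomic_store_explicit(&msc->next,
			      atomic_load_explicit(&all_msc_sketch, memory_order_relaxed),
			      memory_order_relaxed);
	/* Release store: everything written above is visible to a reader that sees msc. */
	atomic_store_explicit(&all_msc_sketch, msc, memory_order_release);

	atomic_flag_clear_explicit(&writer_lock, memory_order_release);
}

/* Reader: no lock taken; acquire loads pair with the writer's release store. */
static int msc_sum_ids(void)
{
	int sum = 0;
	struct msc_sketch *pos;

	for (pos = atomic_load_explicit(&all_msc_sketch, memory_order_acquire);
	     pos;
	     pos = atomic_load_explicit(&pos->next, memory_order_acquire))
		sum += pos->id;

	return sum;
}
```

A reader running concurrently with msc_publish() sees either the old list
or the new one with a fully initialised entry, never a half-initialised
entry. That is why initialising the fields before the add is enough, even
though the reader never takes the writer's lock.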
>>> list_add_rcu() might imply sufficient barriers to ensure that the
>>> initialisations are visible to other threads that obtain the msc
>>> pointer by iterating over mpam_all_msc.
>>>
>>> It's probably cleaner to hold the lock explicitly, though.
>>
>> The list lock? We do.
>> But the readers don't need to take the list lock, its only there to prevent concurrent
>> writers.
> 
> I meant whatever lock is supposed to protect the fields of the specific
> msc.  The list lock is not that lock.
We could bundle that under the probe_lock - but there would be no need to hold that here,
the struct mpam_msc isn't reachable.
[..]
>>>> +		mpam_discovery_complete();
>>>> +
>>>> +	if (err && msc)
>>>> +		mpam_msc_drv_remove(pdev);
>>>> +
>>>> +	return err;
>>>> +}
>>>> +/*
>>>> + * MSC that are hidden under caches are not created as platform devices
>>>> + * as there is no cache driver. Caches are also special-cased in
>>>> + * update_msc_accessibility().
>>>> + */
>>>
>>> Can you elaborate?  I don't understand quite what this is doing.
>>
>> / {
>>     my_thing {
>>         compatible = "my_thing";
>>         msc {
>>            compatible = "arm,mpam-msc";
>>         };
>>     };
>>
>>     other_thing {
>>         compatible = "other_thing";
>>     };
>>
>>     msc {
>>        compatible = "arm,mpam-msc";
>>        arm,mpam-device = <&other_thing>;
>>     };
>>
>>
>>     l2-cache {
>>       compatible = "cache";
>>         msc {
>>            compatible = "arm,mpam-msc";
>>         };
>>     };
>> };
>>
>> my_thing and other_thing's MSC will have devices created - but the cache will not, because
>> it's a cache not a device, and anything below it is ignored.
> 
> OK.  Maybe reword as something like:
> 
> --8<--
> 
> MSCs that are declared by the firmware as being part of a cache may not
> be created automatically as platform devices, since there is no
> dedicated cache driver.
> 
> Deal with those MSCs here.
> 
> -->8--
> 
> Maybe add a comment at update_msc_accessibility() that references this
> comment, instead of a reader of that function just needing to know that
> this comment is here?
I'll just drop the bit about update_msc_accessibility(); it's less relevant once the next
patch adds the memory node in a similar way. (and even less relevant once I rip out DT
support)
>>>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>>>> new file mode 100644
>>>> index 000000000000..07e0f240eaca
>>>> --- /dev/null
>>>> +++ b/drivers/resctrl/mpam_internal.h
>>>> @@ -0,0 +1,62 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +// Copyright (C) 2024 Arm Ltd.
>>>> +
>>>> +#ifndef MPAM_INTERNAL_H
>>>> +#define MPAM_INTERNAL_H
>>>> +
>>>> +#include <linux/arm_mpam.h>
>>>> +#include <linux/cpumask.h>
>>>> +#include <linux/io.h>
>>>> +#include <linux/mailbox_client.h>
>>>> +#include <linux/mutex.h>
>>>> +#include <linux/resctrl.h>
>>>> +#include <linux/sizes.h>
>>>> +
>>>> +struct mpam_msc {
>>>> +	/* member of mpam_all_msc */
>>>> +	struct list_head        glbl_list;
>>
>>> It is worth making these names less mismatched?
>>
>> all_msc_list ?
>>
>> It's because its global. The pattern otherwise is parent has a list foo, and all the
>> children have a member 'foo_list'.
> 
> "all_msc", then?
> 
> I'm not sure that it's essential for the list head's name to have
> "list" in it: that's clear from the type and from how it is used.
> "all" seems sufficient to imply that this is a list (?)
> 
> It doesn't get much more global than "all".
> 
> (This is purely cosmetic, of course.)
Ending in _list is part of a pattern with the class->component -->
component->component_list relationship. We can discuss whatever suffix is best on v2 - as
long as they all follow the same pattern!
Thanks,
James
- * [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (9 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
                   ` (56 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Shanker Donthineni <sdonthineni@nvidia.com>
The device-tree binding has two examples for MSC associated with
memory controllers. Add the support to discover the component_id
from the device-tree and create 'memory' RIS.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: split out of a bigger patch, added affinity piece ]
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c | 67 ++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 20 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a0d9a699a6e7..71a1fb1a9c75 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -62,41 +62,63 @@ static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
 				  u32 ris_idx)
 {
 	int err = 0;
-	u32 level = 0;
-	unsigned long cache_id;
-	struct device_node *cache;
+	u32 class_id = 0, component_id = 0;
+	struct device_node *cache = NULL, *memory = NULL;
+	enum mpam_class_types type = MPAM_CLASS_UNKNOWN;
 
 	do {
+		/* What kind of MSC is this? */
 		if (of_device_is_compatible(np, "arm,mpam-cache")) {
 			cache = of_parse_phandle(np, "arm,mpam-device", 0);
 			if (!cache) {
 				pr_err("Failed to read phandle\n");
 				break;
 			}
+			type = MPAM_CLASS_CACHE;
 		} else if (of_device_is_compatible(np->parent, "cache")) {
 			cache = of_node_get(np->parent);
+			type = MPAM_CLASS_CACHE;
+		} else if (of_device_is_compatible(np, "arm,mpam-memory")) {
+			memory = of_parse_phandle(np, "arm,mpam-device", 0);
+			if (!memory) {
+				pr_err("Failed to read phandle\n");
+				break;
+			}
+			type = MPAM_CLASS_MEMORY;
+		} else if (of_device_is_compatible(np, "arm,mpam-memory-controller-msc")) {
+			memory = of_node_get(np->parent);
+			type = MPAM_CLASS_MEMORY;
 		} else {
-			/* For now, only caches are supported */
-			cache = NULL;
+			/*
+			 * For now, only caches and memory controllers are
+			 * supported.
+			 */
 			break;
 		}
 
-		err = of_property_read_u32(cache, "cache-level", &level);
-		if (err) {
-			pr_err("Failed to read cache-level\n");
-			break;
-		}
-
-		cache_id = cache_of_calculate_id(cache);
-		if (cache_id == ~0UL) {
-			err = -ENOENT;
-			break;
+		/* Determine the class and component ids, based on type. */
+		if (type == MPAM_CLASS_CACHE) {
+			err = of_property_read_u32(cache, "cache-level", &class_id);
+			if (err) {
+				pr_err("Failed to read cache-level\n");
+				break;
+			}
+			component_id = cache_of_calculate_id(cache);
+			if (component_id == ~0UL) {
+				err = -ENOENT;
+				break;
+			}
+		} else if (type == MPAM_CLASS_MEMORY) {
+			err = of_node_to_nid(np);
+			component_id = (err == NUMA_NO_NODE) ? 0 : err;
+			class_id = 255;
 		}
 
-		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
-				      cache_id);
+		err = mpam_ris_create(msc, ris_idx, type, class_id,
+				      component_id);
 	} while (0);
 	of_node_put(cache);
+	of_node_put(memory);
 
 	return err;
 }
@@ -157,9 +179,14 @@ static int update_msc_accessibility(struct mpam_msc *msc)
 		cpumask_copy(&msc->accessibility, cpu_possible_mask);
 		err = 0;
 	} else {
-		err = -EINVAL;
-		pr_err("Cannot determine accessibility of MSC: %s\n",
-		       dev_name(&msc->pdev->dev));
+		if (of_device_is_compatible(parent, "memory")) {
+			cpumask_copy(&msc->accessibility, cpu_possible_mask);
+			err = 0;
+		} else {
+			err = -EINVAL;
+			pr_err("Cannot determine accessibility of MSC: %s\n",
+			       dev_name(&msc->pdev->dev));
+		}
 	}
 	of_node_put(parent);
 
-- 
2.20.1
- * [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (10 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-28  1:29   ` Fenghua Yu
  2025-09-01 11:09   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
                   ` (55 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.
To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping, a class is a set of components, which
are visible to user-space as there are likely to be multiple instances
of the L2 cache. (e.g. one per cluster or package)
struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
This is to allow hardware implementations where two controls are presented
as different RIS. Re-combining these RIS allows their feature bits to
be or-ed. This structure is not visible outside mpam_devices.c
struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
visible as each L2 cache may be composed of individual slices which need
to be configured the same as the hardware is not able to distribute the
configuration.
Add support for creating and destroying these structures.
A gfp is passed as the structures may need creating when a new RIS entry
is discovered when probing the MSC.
CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * removed a pr_err() debug message that crept in.
---
 drivers/resctrl/mpam_devices.c  | 488 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  91 ++++++
 include/linux/arm_mpam.h        |   8 +-
 3 files changed, 574 insertions(+), 13 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 71a1fb1a9c75..5baf2a8786fb 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -20,7 +20,6 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
-#include <linux/srcu.h>
 #include <linux/types.h>
 
 #include <acpi/pcc.h>
@@ -35,11 +34,483 @@
 static DEFINE_MUTEX(mpam_list_lock);
 static LIST_HEAD(mpam_all_msc);
 
-static struct srcu_struct mpam_srcu;
+struct srcu_struct mpam_srcu;
 
 /* MPAM isn't available until all the MSC have been probed. */
 static u32 mpam_num_msc;
 
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set components of the same type.
+ *
+ * The features of a vMSC is the union of the RIS it contains.
+ * The features of a Class and Component are the common subset of the vMSC
+ * they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
+ * then linked via struct mpam_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronise_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	vmsc = kzalloc(sizeof(*vmsc), gfp);
+	if (!comp)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(vmsc);
+
+	INIT_LIST_HEAD_RCU(&vmsc->ris);
+	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+	vmsc->comp = comp;
+	vmsc->msc = msc;
+
+	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+	return vmsc;
+}
+
+static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
+				       struct mpam_msc *msc, bool alloc,
+				       gfp_t gfp)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (vmsc->msc->id == msc->id)
+			return vmsc;
+	}
+
+	if (!alloc)
+		return ERR_PTR(-ENOENT);
+
+	return mpam_vmsc_alloc(comp, msc, gfp);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	comp = kzalloc(sizeof(*comp), gfp);
+	if (!comp)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(comp);
+
+	comp->comp_id = id;
+	INIT_LIST_HEAD_RCU(&comp->vmsc);
+	/* affinity is updated when ris are added */
+	INIT_LIST_HEAD_RCU(&comp->class_list);
+	comp->class = class;
+
+	list_add_rcu(&comp->class_list, &class->components);
+
+	return comp;
+}
+
+static struct mpam_component *
+mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(comp, &class->components, class_list) {
+		if (comp->comp_id == id)
+			return comp;
+	}
+
+	if (!alloc)
+		return ERR_PTR(-ENOENT);
+
+	return mpam_component_alloc(class, id, gfp);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	class = kzalloc(sizeof(*class), gfp);
+	if (!class)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(class);
+
+	INIT_LIST_HEAD_RCU(&class->components);
+	/* affinity is updated when ris are added */
+	class->level = level_idx;
+	class->type = type;
+	INIT_LIST_HEAD_RCU(&class->classes_list);
+
+	list_add_rcu(&class->classes_list, &mpam_classes);
+
+	return class;
+}
+
+static struct mpam_class *
+mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
+{
+	bool found = false;
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		if (class->type == type && class->level == level_idx) {
+			found = true;
+			break;
+		}
+	}
+
+	if (found)
+		return class;
+
+	if (!alloc)
+		return ERR_PTR(-ENOENT);
+
+	return mpam_class_alloc(level_idx, type, gfp);
+}
+
+#define add_to_garbage(x)				\
+do {							\
+	__typeof__(x) _x = x;				\
+	(_x)->garbage.to_free = (_x);			\
+	llist_add(&(_x)->garbage.llist, &mpam_garbage);	\
+} while (0)
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&class->classes_list);
+	add_to_garbage(class);
+}
+
+static void mpam_comp_destroy(struct mpam_component *comp)
+{
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&comp->class_list);
+	add_to_garbage(comp);
+
+	if (list_empty(&class->components))
+		mpam_class_destroy(class);
+}
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+	struct mpam_component *comp = vmsc->comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&vmsc->comp_list);
+	add_to_garbage(vmsc);
+
+	if (list_empty(&comp->vmsc))
+		mpam_comp_destroy(comp);
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+	struct mpam_vmsc *vmsc = ris->vmsc;
+	struct mpam_msc *msc = vmsc->msc;
+	struct platform_device *pdev = msc->pdev;
+	struct mpam_component *comp = vmsc->comp;
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+	clear_bit(ris->ris_idx, msc->ris_idxs);
+	list_del_rcu(&ris->vmsc_list);
+	list_del_rcu(&ris->msc_list);
+	add_to_garbage(ris);
+	ris->garbage.pdev = pdev;
+
+	if (list_empty(&vmsc->ris))
+		mpam_vmsc_destroy(vmsc);
+}
+
+/*
+ * There are two ways of reaching a struct mpam_msc_ris. Via the
+ * class->component->vmsc->ris, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+	struct platform_device *pdev = msc->pdev;
+	struct mpam_msc_ris *ris, *tmp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&msc->glbl_list);
+	platform_set_drvdata(pdev, NULL);
+
+	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+		mpam_ris_destroy(ris);
+
+	add_to_garbage(msc);
+	msc->garbage.pdev = pdev;
+}
+
+static void mpam_free_garbage(void)
+{
+	struct mpam_garbage *iter, *tmp;
+	struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+	if (!to_free)
+		return;
+
+	synchronize_srcu(&mpam_srcu);
+
+	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+		if (iter->pdev)
+			devm_kfree(&iter->pdev->dev, iter->to_free);
+		else
+			kfree(iter->to_free);
+	}
+}
+
+/* Called recursively to walk the list of caches from a particular CPU */
+static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
+					     unsigned long cache_id,
+					     u32 cache_level,
+					     cpumask_t *affinity)
+{
+	int err;
+	u32 iter_level;
+	unsigned long iter_cache_id;
+	struct device_node *iter_node __free(device_node) = of_find_next_cache_node(cache_node);
+
+	if (!iter_node)
+		return;
+
+	err = of_property_read_u32(iter_node, "cache-level", &iter_level);
+	if (err)
+		return;
+
+	/*
+	 * get_cpu_cacheinfo_id() isn't ready until sometime
+	 * during device_initcall(). Use cache_of_calculate_id().
+	 */
+	iter_cache_id = cache_of_calculate_id(iter_node);
+	if (cache_id == ~0UL)
+		return;
+
+	if (iter_level == cache_level && iter_cache_id == cache_id)
+		cpumask_set_cpu(cpu, affinity);
+
+	__mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
+					 affinity);
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online.
+ * This helper walks the device tree to include offline CPUs too.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity)
+{
+	int cpu;
+
+	if (!acpi_disabled)
+		return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+
+	for_each_possible_cpu(cpu) {
+		struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
+		if (!cpu_node) {
+			pr_err("Failed to find cpu%d device node\n", cpu);
+			return -ENOENT;
+		}
+
+		__mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
+						 cache_level, affinity);
+			continue;
+	}
+
+	return 0;
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (node_id == cpu_to_node(cpu))
+			cpumask_set_cpu(cpu, affinity);
+	}
+}
+
+static int get_cpumask_from_cache(struct device_node *cache,
+				  cpumask_t *affinity)
+{
+	int err;
+	u32 cache_level;
+	unsigned long cache_id;
+
+	err = of_property_read_u32(cache, "cache-level", &cache_level);
+	if (err) {
+		pr_err("Failed to read cache-level from cache node\n");
+		return -ENOENT;
+	}
+
+	cache_id = cache_of_calculate_id(cache);
+	if (cache_id == ~0UL) {
+		pr_err("Failed to calculate cache-id from cache node\n");
+		return -ENOENT;
+	}
+
+	return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+				 enum mpam_class_types type,
+				 struct mpam_class *class,
+				 struct mpam_component *comp)
+{
+	int err;
+
+	switch (type) {
+	case MPAM_CLASS_CACHE:
+		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+						     affinity);
+		if (err)
+			return err;
+
+		if (cpumask_empty(affinity))
+			pr_warn_once("%s no CPUs associated with cache node",
+				     dev_name(&msc->pdev->dev));
+
+		break;
+	case MPAM_CLASS_MEMORY:
+		get_cpumask_from_node_id(comp->comp_id, affinity);
+		/* affinity may be empty for CPU-less memory nodes */
+		break;
+	case MPAM_CLASS_UNKNOWN:
+		return 0;
+	}
+
+	cpumask_and(affinity, affinity, &msc->accessibility);
+
+	return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id, gfp_t gfp)
+{
+	int err;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (test_and_set_bit(ris_idx, msc->ris_idxs))
+		return -EBUSY;
+
+	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
+	if (!ris)
+		return -ENOMEM;
+	init_garbage(ris);
+
+	class = mpam_class_get(class_id, type, true, gfp);
+	if (IS_ERR(class))
+		return PTR_ERR(class);
+
+	comp = mpam_component_get(class, component_id, true, gfp);
+	if (IS_ERR(comp)) {
+		if (list_empty(&class->components))
+			mpam_class_destroy(class);
+		return PTR_ERR(comp);
+	}
+
+	vmsc = mpam_vmsc_get(comp, msc, true, gfp);
+	if (IS_ERR(vmsc)) {
+		if (list_empty(&comp->vmsc))
+			mpam_comp_destroy(comp);
+		return PTR_ERR(vmsc);
+	}
+
+	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+	if (err) {
+		if (list_empty(&vmsc->ris))
+			mpam_vmsc_destroy(vmsc);
+		return err;
+	}
+
+	ris->ris_idx = ris_idx;
+	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+	ris->vmsc = vmsc;
+
+	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+
+	return 0;
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id)
+{
+	int err;
+
+	mutex_lock(&mpam_list_lock);
+	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+				     component_id, GFP_KERNEL);
+	mutex_unlock(&mpam_list_lock);
+	if (err)
+		mpam_free_garbage();
+
+	return err;
+}
+
 static void mpam_discovery_complete(void)
 {
 	pr_err("Discovered all MSC\n");
@@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
 		cpumask_copy(&msc->accessibility, cpu_possible_mask);
 		err = 0;
 	} else {
-		if (of_device_is_compatible(parent, "memory")) {
+		if (of_device_is_compatible(parent, "cache")) {
+			err = get_cpumask_from_cache(parent,
+						     &msc->accessibility);
+		} else if (of_device_is_compatible(parent, "memory")) {
 			cpumask_copy(&msc->accessibility, cpu_possible_mask);
 			err = 0;
 		} else {
@@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
 
 	mutex_lock(&mpam_list_lock);
 	mpam_num_msc--;
-	platform_set_drvdata(pdev, NULL);
-	list_del_rcu(&msc->glbl_list);
-	synchronize_srcu(&mpam_srcu);
-	devm_kfree(&pdev->dev, msc);
+	mpam_msc_destroy(msc);
 	mutex_unlock(&mpam_list_lock);
+
+	mpam_free_garbage();
 }
 
 static int mpam_msc_drv_probe(struct platform_device *pdev)
@@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 			err = -ENOMEM;
 			break;
 		}
+		init_garbage(msc);
 
 		mutex_init(&msc->probe_lock);
 		mutex_init(&msc->part_sel_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 07e0f240eaca..d49bb884b433 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,10 +7,27 @@
 #include <linux/arm_mpam.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
 #include <linux/resctrl.h>
 #include <linux/sizes.h>
+#include <linux/srcu.h>
+
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+	/* member of mpam_garbage */
+	struct llist_node	llist;
+
+	void			*to_free;
+	struct platform_device	*pdev;
+};
 
 struct mpam_msc {
 	/* member of mpam_all_msc */
@@ -57,6 +74,80 @@ struct mpam_msc {
 
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_class {
+	/* mpam_components in this class */
+	struct list_head	components;
+
+	cpumask_t		affinity;
+
+	u8			level;
+	enum mpam_class_types	type;
+
+	/* member of mpam_classes */
+	struct list_head	classes_list;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_component {
+	u32			comp_id;
+
+	/* mpam_vmsc in this component */
+	struct list_head	vmsc;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_class:components */
+	struct list_head	class_list;
+
+	/* parent: */
+	struct mpam_class	*class;
+
+	struct mpam_garbage	garbage;
 };
 
+struct mpam_vmsc {
+	/* member of mpam_component:vmsc_list */
+	struct list_head	comp_list;
+
+	/* mpam_msc_ris in this vmsc */
+	struct list_head	ris;
+
+	/* All RIS in this vMSC are members of this MSC */
+	struct mpam_msc		*msc;
+
+	/* parent: */
+	struct mpam_component	*comp;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_msc_ris {
+	u8			ris_idx;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_vmsc:ris */
+	struct list_head	vmsc_list;
+
+	/* member of mpam_msc:ris */
+	struct list_head	msc_list;
+
+	/* parent: */
+	struct mpam_vmsc	*vmsc;
+
+	struct mpam_garbage	garbage;
+};
+
+/* List of all classes - protected by srcu*/
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity);
+
 #endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 0edefa6ba019..406a77be68cb 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
 static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 #endif
 
-static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
-				  enum mpam_class_types type, u8 class_id,
-				  int component_id)
-{
-	return -EINVAL;
-}
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id);
 
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.20.1
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-28  1:29   ` Fenghua Yu
  2025-09-08 17:57     ` James Morse
  2025-09-01 11:09   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-08-28  1:29 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
Hi, James,
On 8/22/25 08:29, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
>
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
>
> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented
> as different RIS. Re-combining these RIS allows their feature bits to
> be or-ed. This structure is not visible outside mpam_devices.c
>
> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
> visible as each L2 cache may be composed of individual slices which need
> to be configured the same as the hardware is not able to distribute the
> configuration.
>
> Add support for creating and destroying these structures.
>
> A gfp is passed as the structures may need creating when a new RIS entry
> is discovered when probing the MSC.
>
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>   * removed a pr_err() debug message that crept in.
> ---
>   drivers/resctrl/mpam_devices.c  | 488 +++++++++++++++++++++++++++++++-
>   drivers/resctrl/mpam_internal.h |  91 ++++++
>   include/linux/arm_mpam.h        |   8 +-
>   3 files changed, 574 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
[SNIP]
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), gfp);
> +	if (!comp)
s/if (!comp)/if (!vmsc)/
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(vmsc);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
Thanks.
-Fenghua
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-28  1:29   ` Fenghua Yu
@ 2025-09-08 17:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-08 17:57 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
Hi Fenghua,
On 28/08/2025 02:29, Fenghua Yu wrote:
> On 8/22/25 08:29, James Morse wrote:
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
>> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
>> This is to allow hardware implementations where two controls are presented
>> as different RIS. Re-combining these RIS allows their feature bits to
>> be or-ed. This structure is not visible outside mpam_devices.c
>>
>> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
>> visible as each L2 cache may be composed of individual slices which need
>> to be configured the same as the hardware is not able to distribute the
>> configuration.
>>
>> Add support for creating and destroying these structures.
>>
>> A gfp is passed as the structures may need creating when a new RIS entry
>> is discovered when probing the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 71a1fb1a9c75..5baf2a8786fb 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
> [SNIP]
>> +static struct mpam_vmsc *
>> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
>> +{
>> +    struct mpam_vmsc *vmsc;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>> +    vmsc = kzalloc(sizeof(*vmsc), gfp);
>> +    if (!comp)
> 
> s/if (!comp)/if (!vmsc)/
> 
> 
>> +        return ERR_PTR(-ENOMEM);
Yup, that's a copy-and-paste typo. Fixed,
Thanks,
James
 
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
  2025-08-28  1:29   ` Fenghua Yu
@ 2025-09-01 11:09   ` Dave Martin
  2025-09-08 17:57     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-01 11:09 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
Hi,
> Subject: arm_mpam: Add the class and component structures for ris firmware described
Mangled subject line?
There is a fair intersection between the commit message and what the
patch does, but they don't quite seem to match up.
Some key issues like locking / object lifecycle management
and DT parsing (a bit of which, it appears, lives here too) are not
mentioned at all.
In lieu of a complete rewrite, it might be best to discard the
explanation of the various object types.  The comment in the code
speaks for itself, and looks clearer.
[...]
On Fri, Aug 22, 2025 at 03:29:53PM +0000, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented
> as different RIS. Re-combining these RIS allows their feature bits to
> be or-ed. This structure is not visible outside mpam_devices.c
> 
> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
> visible as each L2 cache may be composed of individual slices which need
> to be configured the same as the hardware is not able to distribute the
> configuration.
> 
> Add support for creating and destroying these structures.
> A gfp is passed as the structures may need creating when a new RIS entry
> is discovered when probing the MSC.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * removed a pr_err() debug message that crept in.
> ---
>  drivers/resctrl/mpam_devices.c  | 488 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  91 ++++++
>  include/linux/arm_mpam.h        |   8 +-
>  3 files changed, 574 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -20,7 +20,6 @@
[...]
> @@ -35,11 +34,483 @@
>  static DEFINE_MUTEX(mpam_list_lock);
>  static LIST_HEAD(mpam_all_msc);
>  
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
Why expose this here?  This patch makes no use of the exposed symbol.
>  
>  /* MPAM isn't available until all the MSC have been probed. */
>  static u32 mpam_num_msc;
>  
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
This description of the structures and how they relate to each other
seems OK (bearing in mind that I am already familiar with this stuff --
I can't speak for other people).
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
[...]
> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = x;				\
Nit:
= (x)
(for the paranoid)
> +	(_x)->garbage.to_free = (_x);			\
_x->garbage.to_free = _x;
(_x is an identifier, not a macro argument.  It can't get re-parsed as
something else -- assuming that there is not a #define for _x, but then
all bets would be off anyway.)
> +	llist_add(&(_x)->garbage.llist, &mpam_garbage);	\
&_x->...
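A minimal userspace sketch of the macro with both nits applied. The struct layout, the plain singly linked list and queue_one() are hypothetical stand-ins for illustration, not the driver's definitions (the patch uses llist_add() and mpam_garbage). The macro argument is parenthesised defensively, while the local _x needs no parentheses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real definitions. */
struct garbage {
	struct garbage *next;
	void *to_free;
};

struct obj {
	int id;
	struct garbage garbage;
};

static struct garbage *garbage_head;

/*
 * (x) is parenthesised because it is a macro argument and may be an
 * arbitrary expression. _x is a plain local identifier, so it needs none.
 */
#define add_to_garbage(x)				\
do {							\
	__typeof__(x) _x = (x);				\
	_x->garbage.to_free = _x;			\
	_x->garbage.next = garbage_head;		\
	garbage_head = &_x->garbage;			\
} while (0)

/* Queue one of two objects, chosen by a conditional expression argument. */
static struct obj *queue_one(struct obj *a, struct obj *b, int pick_a)
{
	add_to_garbage(pick_a ? a : b);
	return garbage_head->to_free;
}
```

Because the argument is evaluated exactly once into _x, an expression with side effects would also be safe to pass here.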
[...]
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
This is not the inverse of the cpumask_or()s in mpam_ris_create_locked(),
unless the ris associated with each class and each component have
strictly disjoint affinity masks.  Is that checked anywhere, or should
it be impossible by construction?
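To illustrate the point, here is a small userspace sketch using a 64-bit integer as a stand-in for a cpumask. mask_or(), mask_andnot() and add_two_remove_first() are hypothetical helpers, not the kernel cpumask API. It builds a parent mask from two children and then removes the first one again.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in: one bit per CPU, up to 64 CPUs. Not the kernel API. */
typedef uint64_t mask_t;

static mask_t mask_or(mask_t a, mask_t b)     { return a | b; }
static mask_t mask_andnot(mask_t a, mask_t b) { return a & ~b; }

/*
 * Build a parent mask from two children, then remove the first child again.
 * Returns what is left in the parent.
 */
static mask_t add_two_remove_first(mask_t ris_a, mask_t ris_b)
{
	mask_t comp = 0;

	comp = mask_or(comp, ris_a);
	comp = mask_or(comp, ris_b);
	comp = mask_andnot(comp, ris_a);

	return comp;
}
```

With disjoint children (0x0F and 0xF0) the result is exactly the second child, 0xF0. With overlapping children (0x0F and 0x3C) the result is 0x30, so the CPUs in bits 2 and 3 are dropped from the parent even though the second child still covers them.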
But, thinking about it:
I wonder why we ever really need to do the teardown.  If we get an
error interrupt then we can just go into a sulk, spam dmesg a bit, put
the hardware into the most vanilla state that we can, and refuse to
manipulate it further.  But this only happens in the case of a software
or hardware *bug* (or, in a future world where we might implement
virtualisation, an uncontainable MPAM error triggered by a guest -- for
which tearing down the host MPAM would be an overreaction).
Trying to cleanly tear the MPAM driver down after such an error seems a
bit futile.
The MPAM resctrl glue could eventually be made into a module (though
not needed from day 1) -- which would allow for unloading resctrlfs if
that is eventually module-ised.  I think this wouldn't require the MPAM
devices backend to be torn down at any point, though (?)
If we can simplify or eliminate the teardown, does it simplify the
locking at all?  The garbage collection logic can also be dispensed
with if there is never any garbage.
Since MSCs etc. never disappear from the hardware, it feels like it
ought not to be necessary ever to remove items from any of these lists
except when trying to do a teardown (?)
(Putting the hardware into a quiescent state is not the same thing as
tearing down the data structures -- we do want to quiesce MPAM when
shutting down the kernel, at least for the kexec scenario.)
> +	clear_bit(ris->ris_idx, msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
> +	add_to_garbage(ris);
> +	ris->garbage.pdev = pdev;
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
[...]
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id, gfp_t gfp)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (test_and_set_bit(ris_idx, msc->ris_idxs))
> +		return -EBUSY;
Is it impossible by construction to get in here with an out-of-range
ris_idx?
To avoid the callers (i.e., ACPI) needing to understand the internal
limitations of this code, maybe it is worth having a check here (even
if technically redundant).
[...]
Cheers
---Dave
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-09-01 11:09   ` Dave Martin
@ 2025-09-08 17:57     ` James Morse
  2025-09-09 11:28       ` Dave Martin
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-08 17:57 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan
Hi Dave,
On 01/09/2025 12:09, Dave Martin wrote:
>> Subject: arm_mpam: Add the class and component structures for ris firmware described
> 
> Mangled subject line?
order words hard is.
> There is a fair intersection between the commit message and what the
> patch does, but they don't quite seem to match up.
> 
> Some key issues like locking / object lifecycle management
> and DT parsing (a bit of which, it appears, lives here too) are not
> mentioned at all.
I don't think everything needs mentioning - you have the diff for that. This should capture
the motivation, what you have and what you need to find, the grouping etc.
> In lieu of a complete rewrite, it might be best to discard the
> explanation of the various object types.  The comment in the code
> speaks for itself, and looks clearer.
Fair enough,
> [...]
> 
> On Fri, Aug 22, 2025 at 03:29:53PM +0000, James Morse wrote:
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
>> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
>> This is to allow hardware implementations where two controls are presented
>> as different RIS. Re-combining these RIS allows their feature bits to
>> be or-ed. This structure is not visible outside mpam_devices.c
>>
>> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
>> visible as each L2 cache may be composed of individual slices which need
>> to be configured the same as the hardware is not able to distribute the
>> configuration.
>>
>> Add support for creating and destroying these structures.
>> A gfp is passed as the structures may need creating when a new RIS entry
>> is discovered when probing the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 71a1fb1a9c75..5baf2a8786fb 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -20,7 +20,6 @@
> 
> [...]
> 
>> @@ -35,11 +34,483 @@
>>  static DEFINE_MUTEX(mpam_list_lock);
>>  static LIST_HEAD(mpam_all_msc);
>>  
>> -static struct srcu_struct mpam_srcu;
>> +struct srcu_struct mpam_srcu;
> Why expose this here?  This patch makes no use of the exposed symbol.
The mpam_resctrl code needs to take it when it walks these lists. I don't want to change
it then because it's just additional churn.
>>  /* MPAM isn't available until all the MSC have been probed. */
>>  static u32 mpam_num_msc;
>>  
>> +/*
>> + * An MSC is a physical container for controls and monitors, each identified by
>> + * their RIS index. These share a base-address, interrupts and some MMIO
>> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
>> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
>> + * not all RIS in an MSC share a vMSC.
>> + * Components are a group of vMSC that control or monitor the same thing but
>> + * are from different MSC, so have different base-address, interrupts etc.
>> + * Classes are the set components of the same type.
>> + *
>> + * The features of a vMSC is the union of the RIS it contains.
>> + * The features of a Class and Component are the common subset of the vMSC
>> + * they contain.
>> + *
>> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
>> + * for regulating traffic from devices independently of traffic from CPUs.
>> + * If these are two RIS in one MSC, they will be treated as controlling
>> + * different things, and will not share a vMSC/component/class.
>> + *
>> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
>> + * for bandwidth. These two RIS are members of the same vMSC.
>> + *
>> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
>> + * are sometimes termed slices. They should be configured the same, as if there
>> + * were only one.
>> + *
>> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
>> + * of CPUs. All the L2 components are grouped as a class.
>> + *
>> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
>> + * then linked via struct mpam_ris to a vmsc, component and class.
>> + * The same MSC may exist under different class->component->vmsc paths, but the
>> + * RIS index will be unique.
>> + */
> 
> This description of the structures and how they relate to each other
> seems OK (bearing in mind that I am already familiar with this stuff --
> I can't speak for other people).
Great!
> [...]
> 
>> +#define add_to_garbage(x)				\
>> +do {							\
>> +	__typeof__(x) _x = x;				\
> 
> Nit:
> 
> = (x)
> 
> (for the paranoid)
Fixed,
>> +	(_x)->garbage.to_free = (_x);			\
> 
> _x->garbage.to_free = _x;
> 
> (_x is an identifier, not a macro argument.  It can't get re-parsed as
> something else -- assuming that there is not a #define for _x, but then
> all bets would be off anyway.)
>> +	llist_add(&(_x)->garbage.llist, &mpam_garbage);	\
> 
> &_x->...
Fixed,
> [...]
> 
>> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
>> +{
>> +	struct mpam_vmsc *vmsc = ris->vmsc;
>> +	struct mpam_msc *msc = vmsc->msc;
>> +	struct platform_device *pdev = msc->pdev;
>> +	struct mpam_component *comp = vmsc->comp;
>> +	struct mpam_class *class = comp->class;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
>> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> 
> This is not the inverse of the cpumask_or()s in mpam_ris_create_locked(),
> unless the ris associated with each class and each component have
> strictly disjoint affinity masks.  Is that checked anywhere, or should
> it be impossible by construction?
They should be disjoint. These bitmaps are built from firmware description of the cache
hierarchy. I don't think it's possible to describe a situation where there are overlaps.
You can build a nonsense cache hierarchy, e.g. where CPU-0's L3 is CPU-6's L2, but if you
do, the scheduler is going to complain when it tries to choose the scheduler domains. I
think this should be filed under "you've got bigger problems".  There is a check that
catches this in mpam_resctrl_pick_caches(), to see that all the CPUs are accounted for,
which is to avoid tasks that get lucky with task-placement managing to escape their
resource limit.
> But, thinking about it:
> 
> I wonder why we ever really need to do the teardown.  If we get an
> error interrupt then we can just go into a sulk, spam dmesg a bit, put
> the hardware into the most vanilla state that we can, and refuse to
> manipulate it further.  But this only happens in the case of a software
> or hardware *bug* (or, in a future world where we might implement
> virtualisation, an uncontainable MPAM error triggered by a guest -- for
> which tearing down the host MPAM would be an overreaction).
The good news is guests can't escape the PARTID virtualisation that the CPU does, so any
mess a guest manages to make is confined to that guest's PARTID range.
> Trying to cleanly tear the MPAM driver down after such an error seems a
> bit futile.
> 
> The MPAM resctrl glue could eventually be made into a module (though
> not needed from day 1) -- which would allow for unloading resctrlfs if
> that is eventually module-ised.  I think this wouldn't require the MPAM
> devices backend to be torn down at any point, though (?)
It would certainly be optional. kernfs->resctrl->mpam is the reason all this has to be
built-in. If that changes I'd aim for this to be a module.
All this free()ing was added so that the driver doesn't end up sitting on memory when it
isn't providing any usable feature. I have seen a platform where the error interrupt goes
off during boot, (I suspect firmware configures an out-of-range PARTID). On such a
platform any memory that isn't free()d is a waste.
But I agree it's a small amount of memory.
> If we can simplify or eliminate the teardown, does it simplify the
> locking at all?  The garbage collection logic can also be dispensed
> with if there is never any garbage.
It wouldn't simplify the locking, only remove that deferred free()ing which is needed
because of SRCU.
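A rough userspace sketch of the deferred-free pattern being discussed: an object is first unlinked from every reader-visible list, queued as garbage, and only freed once pre-existing readers are known to have finished. All names here are hypothetical, and sketch_wait_for_readers() is only a placeholder for the grace-period wait (synchronize_srcu() in the kernel); the driver itself queues garbage on an llist.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the driver uses llist and an SRCU grace period. */
struct sketch_node {
	struct sketch_node *next_garbage;
	int payload;
};

static struct sketch_node *sketch_garbage;
static int sketch_grace_periods;

/* Placeholder for waiting until all pre-existing readers have finished. */
static void sketch_wait_for_readers(void)
{
	sketch_grace_periods++;
}

/* Called after the node has been unlinked from every reader-visible list. */
static void sketch_defer_free(struct sketch_node *n)
{
	n->next_garbage = sketch_garbage;
	sketch_garbage = n;
}

/* Free everything queued so far; returns the number of nodes freed. */
static int sketch_free_garbage(void)
{
	struct sketch_node *n = sketch_garbage;
	int freed = 0;

	sketch_garbage = NULL;
	sketch_wait_for_readers();

	while (n) {
		struct sketch_node *next = n->next_garbage;

		free(n);
		n = next;
		freed++;
	}

	return freed;
}
```

If nothing is ever unlinked, sketch_defer_free() has no callers and the whole garbage list can go, which is the simplification being weighed here. The reader-side locking is unaffected either way.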
> Since MSCs etc. never disappear from the hardware, it feels like it
> ought not to be necessary ever to remove items from any of these lists
> except when trying to do a teardown (?)
Unbinding the driver from an MSC is another case where this may be triggered via
mpam_msc_drv_remove(). If you look at the whole thing, mpam_ris_destroy() pokes
mpam_resctrl_teardown_class() to see if resctrl needs to be torn down.
I don't anticipate folk actually needing to do that. One reason is VFIO - but this
kind of stuff has a performance impact on the hypervisor, so it's unlikely to ever allow a
guest direct access to this kind of thing. Another reason is to load a more specific
driver, which sounds unlikely.
Ultimately this memory free-ing code is here because it's the right thing to do.
I'd prefer to keep it as making this a loadable module would mean we have to do this.
> (Putting the hardware into a quiescent state is not the same thing as
> tearing down the data structures -- we do want to quiesce MPAM when
> shutting down the kernel, at least for the kexec scenario.)
> [...]
> 
>> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>> +				  enum mpam_class_types type, u8 class_id,
>> +				  int component_id, gfp_t gfp)
>> +{
>> +	int err;
>> +	struct mpam_vmsc *vmsc;
>> +	struct mpam_msc_ris *ris;
>> +	struct mpam_class *class;
>> +	struct mpam_component *comp;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	if (test_and_set_bit(ris_idx, msc->ris_idxs))
>> +		return -EBUSY;
> 
> Is it impossible by construction to get in here with an out-of-range
> ris_idx?
> 
> To avoid the callers (i.e., ACPI) needing to understand the internal
> limitations of this code, maybe it is worth having a check here (even
> if technically redundant).
It's possible - but only if you mess up the firmware tables.
I'll add a check for this as it's easy enough.
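A minimal sketch of what such a check could look like, written as userspace C. SKETCH_MAX_RIS, struct sketch_msc and both helpers are assumptions made for illustration; the real bound, the bitmap type and the error code chosen in the driver may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit for illustration; the driver's real bound may differ. */
#define SKETCH_MAX_RIS	16

struct sketch_msc {
	uint32_t ris_idxs;	/* one bit per claimed RIS index */
};

/* Userspace stand-in for test_and_set_bit(): returns the old bit value. */
static bool sketch_test_and_set(unsigned int nr, uint32_t *bits)
{
	bool old = *bits & (UINT32_C(1) << nr);

	*bits |= UINT32_C(1) << nr;
	return old;
}

static int sketch_ris_claim(struct sketch_msc *msc, uint8_t ris_idx)
{
	/* Reject indexes the bitmap cannot represent before touching it. */
	if (ris_idx >= SKETCH_MAX_RIS)
		return -EINVAL;

	if (sketch_test_and_set(ris_idx, &msc->ris_idxs))
		return -EBUSY;

	return 0;
}
```

Doing the range check first means a bad firmware table produces a distinct error rather than a write past the end of the bitmap.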
Thanks,
James
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-09-08 17:57     ` James Morse
@ 2025-09-09 11:28       ` Dave Martin
  2025-09-10 19:19         ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-09 11:28 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan
Hi,
On Mon, Sep 08, 2025 at 06:57:41PM +0100, James Morse wrote:
> Hi Dave,
> 
> On 01/09/2025 12:09, Dave Martin wrote:
> >> Subject: arm_mpam: Add the class and component structures for ris firmware described
> > 
> > Mangled subject line?
> 
> order words hard is.
> 
> 
> > There is a fair intersection between the commit message and what the
> > patch does, but they don't quite seem to match up.
> > 
> > Some key issues like locking / object lifecycle management
> > and DT parsing (a bit of which, it appears, lives here too) are not
> > mentioned at all.
> 
> I don't think everything needs mentioning - you have the diff for that. This should capture
> the motivation, what you have and what you need to find, the grouping etc.
> 
> 
> > In lieu of a complete rewrite, it might be best to discard the
> > explanation of the various object types.  The comment in the code
> > speaks for itself, and looks clearer.
> 
> Fair enough,
OK
[...]
> >> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> >> index 71a1fb1a9c75..5baf2a8786fb 100644
> >> --- a/drivers/resctrl/mpam_devices.c
> >> +++ b/drivers/resctrl/mpam_devices.c
> >> @@ -20,7 +20,6 @@
> > 
> > [...]
> > 
> >> @@ -35,11 +34,483 @@
> >>  static DEFINE_MUTEX(mpam_list_lock);
> >>  static LIST_HEAD(mpam_all_msc);
> >>  
> >> -static struct srcu_struct mpam_srcu;
> >> +struct srcu_struct mpam_srcu;
> 
> > Why expose this here?  This patch makes no use of the exposed symbol.
> 
> The mpam_resctrl code needs to take it when it walks these lists. I don't want to change
> it then because it's just additional churn.
I guess this is harmless, but it's no help to the kernel, or to
reviewers...
[...]
> >> +/*
> >> + * An MSC is a physical container for controls and monitors, each identified by
[...]
> >> + * The same MSC may exist under different class->component->vmsc paths, but the
> >> + * RIS index will be unique.
> >> + */
> > 
> > This description of the structures and how they relate to each other
> > seems OK (bearing in mind that I am already familiar with this stuff --
> > I can't speak for other people).
> 
> Great!
OK
[...]
> >> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> >> +{
> >> +	struct mpam_vmsc *vmsc = ris->vmsc;
> >> +	struct mpam_msc *msc = vmsc->msc;
> >> +	struct platform_device *pdev = msc->pdev;
> >> +	struct mpam_component *comp = vmsc->comp;
> >> +	struct mpam_class *class = comp->class;
> >> +
> >> +	lockdep_assert_held(&mpam_list_lock);
> >> +
> >> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> >> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> > 
> > This is not the inverse of the cpumask_or()s in mpam_ris_create_locked(),
> > unless the ris associated with each class and each component have
> > strictly disjoint affinity masks.  Is that checked anywhere, or should
> > it be impossible by construction?
> 
> They should be disjoint. These bitmaps are built from firmware description of the cache
> hierarchy. I don't think it's possible to describe a situation where there are overlaps.
> 
> You can build a nonsense cache hierarchy, e.g. where CPU-0's L3 is CPU-6's L2, but if you
> do, the scheduler is going to complain when it tries to choose the scheduler domains. I
> think this should be filed under "you've got bigger problems".  There is a check that
> catches this in mpam_resctrl_pick_caches(), to see that all the CPUs are accounted for,
> which is to avoid tasks that get lucky with task-placement managing to escape their
> resource limit.
I guess that makes sense.
If the firmware description is formally a tree structure then it should
be impossible to end up with overlapping affinity masks.
Since this doesn't bite us until teardown-time in any case, I think
this probably doesn't need to be checked explicitly, unless we observe
actual problems.
A comment documenting this assumption may be worth having.
> > But, thinking about it:
> > 
> > I wonder why we ever really need to do the teardown.  If we get an
> > error interrupt then we can just go into a sulk, spam dmesg a bit, put
> > the hardware into the most vanilla state that we can, and refuse to
> > manipulate it further.  But this only happens in the case of a software
> > or hardware *bug* (or, in a future world where we might implement
> > virtualisation, an uncontainable MPAM error triggered by a guest -- for
> > which tearing down the host MPAM would be an overreaction).
> 
> The good news is guests can't escape the PARTID virtualisation that the CPU does, so any
> mess a guest manages to make is confined to that guest's PARTID range.
> 
> 
> > Trying to cleanly tear the MPAM driver down after such an error seems a
> > bit futile.
> > 
> > The MPAM resctrl glue could eventually be made into a module (though
> > not needed from day 1) -- which would allow for unloading resctrlfs if
> > that is eventually module-ised.  I think this wouldn't require the MPAM
> > devices backend to be torn down at any point, though (?)
> 
> It would certainly be optional. kernfs->resctrl->mpam is the reason all this has to be
> built-in. If that changes I'd aim for this to be a module.
> 
> All this free()ing was added so that the driver doesn't end up sitting on memory when it
> isn't providing any usable feature. I have seen a platform where the error interrupt goes
I guess that's reasonable, but this only applies to hardware that
has MPAM but where it is either broken, or where it is unsuitable for
running Linux but Linux has been deployed on it anyway while still
leaving the ACPI tables intact.  This does not violate any
specification, but it seems of marginal benefit to introduce a load of
complexity just to save a few K in this situation.  (Or do we get stuck,
unable to free the config and mbwu_state arrays?  Those don't count as
large on a server-class system, but they are about the "a few K"
magnitude.)
(Note that I'm not saying that teardown is something we shouldn't do --
rather, my point is: do we really need to do it now if it is subtle and
complex to make it work, or can this be a later addition?)
> off during boot, (I suspect firmware configures an out-of-range PARTID). On such a
> platform any memory that isn't free()d is a waste.
> 
> But I agree it's a small amount of memory.
> 
> 
> > If we can simplify or eliminate the teardown, does it simplify the
> > locking at all?  The garbage collection logic can also be dispensed
> > with if there is never any garbage.
> 
> It wouldn't simplify the locking, only remove that deferred free()ing which is needed
> because of SRCU.
My point was that there is no need to defend against concurrent removal
of list entries if list entries are never removed.
> > Since MSCs etc. never disappear from the hardware, it feels like it
> > ought not to be necessary ever to remove items from any of these lists
> > except when trying to do a teardown (?)
> 
> Unbinding the driver from an MSC is another case where this may be triggered via
> mpam_msc_drv_remove(). If you look at the whole thing, mpam_ris_destroy() pokes
> mpam_resctrl_teardown_class() to see if resctrl needs to be torn down.
> 
> I don't anticipate folk actually needing to do that. One reason is VFIO - but this
> kind of stuff has a performance impact on the hypervisor, so it's unlikely to ever allow a
> guest direct access to this kind of thing. Another reason is to load a more specific
> driver, which sounds unlikely.
> 
> 
> Ultimately this memory free-ing code is here because it's the right thing to do.
> I'd prefer to keep it as making this a loadable module would mean we have to do this.
I don't disagree with that: it is messy to retrofit teardown if it was
never considered in the initial design.
I guess that this all comes from my uncertainty about the object
lifecycles and locking behaviour.
I would still prefer to see this documented.  If the documentation
would be too unwieldy or infeasible to write, this would suggest that
the code would benefit from simplification...
For the probe phase, or for teardown, I'm really not sure why it would
break anything to have a single Big MPAM Lock (however inelegant).
For the run phase (when resctrl and other clients of the driver are
able to use the driver), the discovered system properties and the
mappings onto resctrl resources are all static, and we don't seem to
need all this RCU stuff.
> > (Putting the hardware into a quiescent state is not the same thing as
> > tearing down the data structures -- we do want to quiesce MPAM when
> > shutting down the kernel, at least for the kexec scenario.)
> 
> > [...]
> > 
> >> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> >> +				  enum mpam_class_types type, u8 class_id,
> >> +				  int component_id, gfp_t gfp)
> >> +{
> >> +	int err;
> >> +	struct mpam_vmsc *vmsc;
> >> +	struct mpam_msc_ris *ris;
> >> +	struct mpam_class *class;
> >> +	struct mpam_component *comp;
> >> +
> >> +	lockdep_assert_held(&mpam_list_lock);
> >> +
> >> +	if (test_and_set_bit(ris_idx, msc->ris_idxs))
> >> +		return -EBUSY;
> > 
> > Is it impossible by construction to get in here with an out-of-range
> > ris_idx?
> > 
> > To avoid the callers (i.e., ACPI) needing to understand the internal
> > limitations of this code, maybe it is worth having a check here (even
> > if technically redundant).
> 
> It's possible - but only if you mess up the firmware tables.
> I'll add a check for this as its easy enough.
OK, suits me.
Cheers
---Dave
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-09-09 11:28       ` Dave Martin
@ 2025-09-10 19:19         ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:19 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan
Hi Dave,
On 09/09/2025 12:28, Dave Martin wrote:
> On Mon, Sep 08, 2025 at 06:57:41PM +0100, James Morse wrote:
>> On 01/09/2025 12:09, Dave Martin wrote:
>>>> Subject: arm_mpam: Add the class and component structures for ris firmware described
>>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>>>> index 71a1fb1a9c75..5baf2a8786fb 100644
>>>> --- a/drivers/resctrl/mpam_devices.c
>>>> +++ b/drivers/resctrl/mpam_devices.c
>>>> @@ -20,7 +20,6 @@
>>>
>>> [...]
>>>
>>>> @@ -35,11 +34,483 @@
>>>>  static DEFINE_MUTEX(mpam_list_lock);
>>>>  static LIST_HEAD(mpam_all_msc);
>>>>  
>>>> -static struct srcu_struct mpam_srcu;
>>>> +struct srcu_struct mpam_srcu;
>>
>>> Why expose this here?  This patch makes no use of the exposed symbol.
>>
>> The mpam_resctrl code needs to take it when it walks these lists. I don't want to change
>> it then because it's just additional churn.
> 
> I guess this is harmless, but it's no help to the kernel, or to
> reviewers...
A trade-off has to be made here. The series is too big to post in one go. driver/resctrl
is the obvious split - but until both arrive then there is no need for mpam_internal.h, or
really any of the driver as it doesn't have a user-space interface.
I can barf the other series on the list as an illustration - but I think that would just
frustrate people.
[...]
>>>> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
>>>> +{
>>>> +	struct mpam_vmsc *vmsc = ris->vmsc;
>>>> +	struct mpam_msc *msc = vmsc->msc;
>>>> +	struct platform_device *pdev = msc->pdev;
>>>> +	struct mpam_component *comp = vmsc->comp;
>>>> +	struct mpam_class *class = comp->class;
>>>> +
>>>> +	lockdep_assert_held(&mpam_list_lock);
>>>> +
>>>> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
>>>> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
>>>
>>> This is not the inverse of the cpumask_or()s in mpam_ris_create_locked(),
>>> unless the ris associated with each class and each component have
>>> strictly disjoint affinity masks.  Is that checked anywhere, or should
>>> it be impossible by construction?
>>
>> They should be disjoint. These bitmaps are built from firmware description of the cache
>> hierarchy. I don't think it's possible to describe a situation where there are overlaps.
>>
>> You can build a nonsense cache hierarchy, e.g. where CPU-0's L3 is CPU-6's L2, but if you
>> do, the scheduler is going to complain when it tries to choose the scheduler domains. I
>> think this should be filed under "you've got bigger problems".  There is a check that
>> catches this in mpam_resctrl_pick_caches(), to see that all the CPUs are accounted for,
>> which is to avoid tasks that get lucky with task-placement managing to escape their
>> resource limit.
> 
> I guess that makes sense.
> 
> If the firmware description is formally a tree structure then it should
> be impossible to end up with overlapping affinity masks.
> 
> Since this doesn't bite us until teardown-time in any case, I think
> this probably doesn't need to be checked explicitly, unless we observe
> actual problems.
> 
> A comment documenting this assumption may be worth having.
Sure,
>>> But, thinking about it:
>>>
>>> I wonder why we ever really need to do the teardown.  If we get an
>>> error interrupt then we can just go into a sulk, spam dmesg a bit, put
>>> the hardware into the most vanilla state that we can, and refuse to
>>> manipulate it further.  But this only happens in the case of a software
>>> or hardware *bug* (or, in a future world where we might implement
>>> virtualisation, an uncontainable MPAM error triggered by a guest -- for
>>> which tearing down the host MPAM would be an overreaction).
>>
>> The good news is guests can't escape the PARTID virtualisation that the CPU does, so any
>> mess a guest manages to make is confined to that guest's PARTID range.
>>
>>
>>> Trying to cleanly tear the MPAM driver down after such an error seems a
>>> bit futile.
>>>
>>> The MPAM resctrl glue could eventually be made into a module (though
>>> not needed from day 1) -- which would allow for unloading resctrlfs if
>>> that is eventually module-ised.  I think this wouldn't require the MPAM
>>> devices backend to be torn down at any point, though (?)
>>
>> It would certainly be optional. kernfs->resctrl->mpam is the reason all this has to be
>> built-in. If that changes I'd aim for this to be a module.
>>
>> All this free()ing was added so that the driver doesn't end up sitting on memory when it
>> isn't providing any usable feature. I have seen a platform where the error interrupt goes
> 
> I guess that's reasonable, but this is only applies to hardware that
> has MPAM but where it is either broken, or where it is unsuitable for
> running Linux but Linux has been deployed on it anyway while still
> leaving the ACPI tables intact.  This does not violate any
> specification, but it seems of marginal benefit to introduce a load of
> complexity just to safe a few K in this situation.  (Or do we get stuck,
> unable to free the config and mbwu_state arrays?  Those don't count as
> large on a server-class system, but they are about the "a few K"
> magnitude.)
> 
> (Note that I'm not saying that teardown is something we shouldn't do --
> rather, my point is: do we really need to do it now if it is subtle and
> complex to make it work, or can this be a later addition?)
Equally, can't someone say this memory has been leaked once the MPAM driver has given up?
As alloc/free were done together it seems odd to do them at separate times - that will
certainly make it more subtle.
>> off during boot, (I suspect firmware configures an out-of-range PARTID). On such a
>> platform any memory that isn't free()d is a waste.
>>
>> But I agree it's a small amount of memory.
>>
>>
>>> If we can simplify or eliminate the teardown, does it simplify the
>>> locking at all?  The garbage collection logic can also be dispensed
>>> with if there is never any garbage.
>>
>> It wouldn't simplify the locking, only remove that deferred free()ing which is needed
>> because of SRCU.
> 
> My point was that there is no need to defend against concurrent removal
> of list entries if list entries are never removed.
You can eyeball the writers and recognise the pattern as srcu. If it's an "oh that list is
read only" - then it's much more of a driver specific hack.
I'd prefer to keep close to the srcu pattern - even if it is a bit complex.
>>> Since MSCs etc. never disappear from the hardware, it feels like it
>>> ought not to be necessary ever to remove items from any of these lists
>>> except when trying to do a teardown (?)
>>
>> Unbinding the driver from an MSC is another case where this may be triggered via
>> mpam_msc_drv_remove(). If you look at the whole thing, mpam_ris_destroy() pokes
>> mpam_resctrl_teardown_class() to see if resctrl needs to be torn down.
>>
>> I don't anticipate folk actually needing to do that. One reason is for VFIO - but this
>> kind of stuff has a performance impact on the hypervisor, so it's unlikely to ever allow a
>> guest direct access to this kind of thing. Another reason is to load a more specific
>> driver, which sounds unlikely.
>>
>>
>> Ultimately this memory free-ing code is here because it's the right thing to do.
>> I'd prefer to keep it as making this a loadable module would mean we have to do this.
> 
> I don't disagree with that: it is messy to retrofit teardown if it was
> never considered in the initial design.
> 
> I guess that this all comes from my uncertainty about the object
> lifecycles and locking behaviour.
> 
> I would still prefer to see this documented.  If the documentation
> would be too unwieldy or infeasible to write, this would suggest that
> the code would benefit from simplification...
Right - nothing describes the 'phases' the driver has, they just emerge.
I'll try and add that, but it won't be in time for v2.
> For the probe phase, or for teardown, I'm really not sure why it would
> break anything to have a single Big MPAM Lock (however inelegant).
That is broadly what mpam_list_lock is doing before the cpuhp calls are registered.
> For the run phase (when resctrl and other clients of the driver are
> able to use the driver), the discovered system properties and the
> mappings onto resctrl resources are all static, and we don't seem to
> need all this RCU stuff.
Iff we say "driver specific hack - read only list" - I think that is worse.
Making it srcu makes it recognisable, and lets us free the memory instead of leaking it.
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
 
 
 
- * [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (11 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-29  8:42   ` Ben Horgan
  2025-09-09 11:36   ` Shaopeng Tan (Fujitsu)
  2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
                   ` (54 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.
Add the definitions for these registers as offset within the page(s).
Link: https://developer.arm.com/documentation/ihi0099/latest/
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
 * Whitespace churn.
 * Cite a more recent document.
 * Removed some stale feature, fixed some names etc.
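
For illustration only (this is not part of the patch): a minimal userspace
sketch of how mask definitions like the MPAMF_IDR ones in the diff below
might be used to pull fields out of a register value. BIT_ULL(),
GENMASK_ULL() and field_get() here are local stand-ins written for this
example, assumed to behave like the kernel's <linux/bits.h> and
<linux/bitfield.h> helpers; struct mpam_idr_fields and mpam_idr_decode()
are hypothetical names, and the register value would come from an MMIO
read in a real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT_ULL()/GENMASK_ULL() macros */
#define BIT_ULL(n)		(1ULL << (n))
#define GENMASK_ULL(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Field layout copied from the patch; ULL variants as MPAMF_IDR is 64 bit */
#define MPAMF_IDR_PARTID_MAX	GENMASK_ULL(15, 0)
#define MPAMF_IDR_PMG_MAX	GENMASK_ULL(23, 16)
#define MPAMF_IDR_HAS_MSMON	BIT_ULL(30)
#define MPAMF_IDR_HAS_RIS	BIT_ULL(32)
#define MPAMF_IDR_RIS_MAX	GENMASK_ULL(59, 56)

/* Stand-in for FIELD_GET(): mask the value, then shift it down to bit 0 */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	uint64_t lowest_bit = mask & (~mask + 1);

	return (reg & mask) / lowest_bit;
}

/* Hypothetical container for the decoded fields */
struct mpam_idr_fields {
	uint16_t partid_max;
	uint8_t pmg_max;
	uint8_t ris_max;
	bool has_msmon;
	bool has_ris;
};

/* Hypothetical helper: decode a raw MPAMF_IDR value into its fields */
static struct mpam_idr_fields mpam_idr_decode(uint64_t idr)
{
	struct mpam_idr_fields f = {
		.partid_max = (uint16_t)field_get(MPAMF_IDR_PARTID_MAX, idr),
		.pmg_max = (uint8_t)field_get(MPAMF_IDR_PMG_MAX, idr),
		.ris_max = (uint8_t)field_get(MPAMF_IDR_RIS_MAX, idr),
		.has_msmon = (idr & MPAMF_IDR_HAS_MSMON) != 0,
		.has_ris = (idr & MPAMF_IDR_HAS_RIS) != 0,
	};

	return f;
}
```

Keeping each field as a mask rather than a shift/width pair means the same
definition can serve both for extracting a field and for building a value
to write back, which is presumably why the header only carries _SHIFT
definitions for MPAMF_IIDR.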
---
 drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
 1 file changed, 266 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d49bb884b433..6e0982a1a9ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/latest/
+ */
+#define MPAM_ARCHITECTURE_V1    0x10
+
+/* Memory mapped control pages: */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR		0x0000  /* features id register */
+#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
+#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
+#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
+#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
+#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
+#define MPAMF_IIDR		0x0018  /* implementer id register */
+#define MPAMF_AIDR		0x0020  /* architectural id register */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL	0x0100  /* partid to configure: */
+#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
+#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
+#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
+#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
+#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
+#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
+#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
+#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
+#define MSMON_CSU		0x0840  /* current cache-usage */
+#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
+#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
+#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
+#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
+#define MPAMF_ESR		0x00F8  /* error status register */
+#define MPAMF_ECR		0x00F0  /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
+#define MPAMF_IDR_EXT			BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
+#define MPAMF_IDR_HAS_MSMON		BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
+#define MPAMF_IDR_HAS_RIS		BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
+#define MPAMF_IDR_HAS_ESR		BIT(39)
+#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
+#define MPAMF_MBW_IDR_WINDWR		BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX      GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
+#define MPAMF_IIDR_PRODUCTID_SHIFT	20
+#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
+#define MPAMF_IIDR_VARIANT_SHIFT	16
+#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
+#define MPAMF_IIDR_REVISON_SHIFT	12
+#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
+#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
+#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
+#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
+#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ *                     register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ *                    configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN		BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
+#define MPAMF_ESR_PMG		GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR		BIT(31)
+#define MPAMF_ESR_RIS		GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN		BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE			0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
+#define MPAM_ERRCODE_REQ_PMG_RANGE		4
+#define MPAM_ERRCODE_MONITOR_RANGE		5
+#define MPAM_ERRCODE_INTPARTID_RANGE		6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
+
+/*
+ * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
+ *                    usage monitor filter register
+ */
+#define MSMON_CFG_CSU_FLT_PARTID	GENMASK(15, 0)
+#define MSMON_CFG_CSU_FLT_PMG		GENMASK(23, 16)
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ *                    usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
+#define MSMON_CFG_x_CTL_SCLEN			BIT(19)
+#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN			BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU			0
+
+/*
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_MBWU_FLT_PARTID		GENMASK(15, 0)
+#define MSMON_CFG_MBWU_FLT_PMG			GENMASK(23, 16)
+#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ *            register
+ * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
+ *                     capture register
+ * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
+ *               monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ *                     capture register
+ */
+#define MSMON___VALUE		GENMASK(30, 0)
+#define MSMON___NRDY		BIT(31)
+#define MSMON___NRDY_L		BIT(63)
+#define MSMON___L_VALUE		GENMASK(43, 0)
+#define MSMON___LWD_VALUE	GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ *                  generation register
+ */
+#define MSMON_CAPT_EVNT_NOW	BIT(0)
+
 #endif /* MPAM_INTERNAL_H */
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-08-29  8:42   ` Ben Horgan
  2025-09-08 17:57     ` James Morse
  2025-09-09 11:36   ` Shaopeng Tan (Fujitsu)
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29  8:42 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
> 
> Link: https://developer.arm.com/documentation/ihi0099/latest/
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
>  * Whitespace churn.
>  * Cite a more recent document.
>  * Removed some stale feature, fixed some names etc.
> ---
>  drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
>  1 file changed, 266 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d49bb884b433..6e0982a1a9ac 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
>  
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/
> + */
> +#define MPAM_ARCHITECTURE_V1    0x10
> +
> +/* Memory mapped control pages: */
> +/* ID Register offsets in the memory mapped page */
> +#define MPAMF_IDR		0x0000  /* features id register */
> +#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
> +#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
> +#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
> +#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
> +#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
> +#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
> +#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
> +#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
> +#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
> +#define MPAMF_IIDR		0x0018  /* implementer id register */
> +#define MPAMF_AIDR		0x0020  /* architectural id register */
> +
> +/* Configuration and Status Register offsets in the memory mapped page */
> +#define MPAMCFG_PART_SEL	0x0100  /* partid to configure: */
> +#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
> +#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
> +#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
> +#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
> +#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
> +#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
> +#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
> +#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
> +#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
> +#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
> +
> +#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
> +#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
> +#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
> +#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
> +#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
> +#define MSMON_CSU		0x0840  /* current cache-usage */
> +#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
> +#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
> +#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
> +#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
> +#define MPAMF_ESR		0x00F8  /* error status register */
> +#define MPAMF_ECR		0x00F0  /* error control register */
> +
> +/* MPAMF_IDR - MPAM features ID register */
> +#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
> +#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
> +#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
> +#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
> +#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
> +#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
> +#define MPAMF_IDR_EXT			BIT(28)
> +#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
> +#define MPAMF_IDR_HAS_MSMON		BIT(30)
> +#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
> +#define MPAMF_IDR_HAS_RIS		BIT(32)
> +#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
> +#define MPAMF_IDR_HAS_ESR		BIT(39)
> +#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
> +
> +/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
> +#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
> +#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
> +#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
> +
> +/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
> +#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
> +
> +/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
> +#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
> +#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
> +#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
> +#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
> +#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
> +#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
> +
> +/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
> +#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
> +#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
> +#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
> +#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
> +#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
> +#define MPAMF_MBW_IDR_WINDWR		BIT(14)
> +#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
> +
> +/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
> +#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
> +#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
> +#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
> +#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
> +#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
> +#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
> +
> +/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
> +#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
> +#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
> +#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
> +#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
> +#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
> +#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
> +
> +/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
> +#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
> +#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
> +#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
> +#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
> +#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
> +
> +/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
> +#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX      GENMASK(15, 0)
nit: spaces used instead of tabs
> +
> +/* MPAMF_IIDR - MPAM implementation ID register */
> +#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
> +#define MPAMF_IIDR_PRODUCTID_SHIFT	20
> +#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
> +#define MPAMF_IIDR_VARIANT_SHIFT	16
> +#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
> +#define MPAMF_IIDR_REVISON_SHIFT	12
> +#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
> +#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0
> +
> +/* MPAMF_AIDR - MPAM architecture ID register */
> +#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
> +#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
> +
> +/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
> +#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
> +#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
> +#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
> +
> +/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
> +#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
> +#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
> +
> +/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
> +#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
> +
> +/*
> + * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
> + *                   register
> + */
> +#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
> +
> +/*
> + * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
> + *                   register
> + */
> +#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
> +#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
> +
> +/*
> + * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
> + *                     register
> + */
> +#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
> +#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
> +
> +/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
> +#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
> +#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
> +
> +/*
> + * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
> + *                    configuration register
> + */
> +#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
> +#define MPAMCFG_MBW_PROP_EN		BIT(31)
> +
> +/*
> + * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
> + */
> +#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
> +#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
> +
> +/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
> +#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
> +#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
> +
> +/* MPAMF_ESR - MPAM Error Status Register */
> +#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
> +#define MPAMF_ESR_PMG		GENMASK(23, 16)
> +#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
> +#define MPAMF_ESR_OVRWR		BIT(31)
> +#define MPAMF_ESR_RIS		GENMASK(35, 32)
> +
> +/* MPAMF_ECR - MPAM Error Control Register */
> +#define MPAMF_ECR_INTEN		BIT(0)
> +
> +/* Error conditions in accessing memory mapped registers */
> +#define MPAM_ERRCODE_NONE			0
> +#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
> +#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
> +#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
> +#define MPAM_ERRCODE_REQ_PMG_RANGE		4
> +#define MPAM_ERRCODE_MONITOR_RANGE		5
> +#define MPAM_ERRCODE_INTPARTID_RANGE		6
> +#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
> +
> +/*
> + * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
> + *                    usage monitor filter register
> + */
> +#define MSMON_CFG_CSU_FLT_PARTID	GENMASK(15, 0)
> +#define MSMON_CFG_CSU_FLT_PMG		GENMASK(23, 16)
> +
> +/*
> + * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
> + *                    usage monitor control register
> + * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
> + *                     bandwidth usage monitor control register
> + */
> +#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
> +#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
> +#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
> +#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
> +#define MSMON_CFG_x_CTL_SCLEN			BIT(19)
> +#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
> +#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
> +#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
> +#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
> +#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
> +#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
> +#define MSMON_CFG_x_CTL_EN			BIT(31)
> +
> +#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
> +#define MSMON_CFG_CSU_CTL_TYPE_CSU			0
> +
> +/*
> + * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
> + *                     bandwidth usage monitor filter register
> + */
> +#define MSMON_CFG_MBWU_FLT_PARTID		GENMASK(15, 0)
> +#define MSMON_CFG_MBWU_FLT_PMG			GENMASK(23, 16)
> +#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
> +
> +/*
> + * MSMON_CSU - Memory system performance monitor cache storage usage monitor
> + *            register
> + * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
> + *                     capture register
> + * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
> + *               monitor register
> + * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
> + *                     capture register
> + */
> +#define MSMON___VALUE		GENMASK(30, 0)
> +#define MSMON___NRDY		BIT(31)
> +#define MSMON___NRDY_L		BIT(63)
> +#define MSMON___L_VALUE		GENMASK(43, 0)
> +#define MSMON___LWD_VALUE	GENMASK(62, 0)
> +
> +/*
> + * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
> + *                  generation register
> + */
> +#define MSMON_CAPT_EVNT_NOW	BIT(0)
> +
>  #endif /* MPAM_INTERNAL_H */
The names and values match the specification.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-08-29  8:42   ` Ben Horgan
@ 2025-09-08 17:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-08 17:57 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 09:42, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
>> (MSCs) with an identity/configuration page.
>>
>> Add the definitions for these registers as offset within the page(s).
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index d49bb884b433..6e0982a1a9ac 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
>> +/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
>> +#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX      GENMASK(15, 0)
> 
> nit: spaces used instead of tabs
Fixed,
~
> The names and values match the specification.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks!
James
^ permalink raw reply	[flat|nested] 200+ messages in thread 
 
- * RE: [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
  2025-08-29  8:42   ` Ben Horgan
@ 2025-09-09 11:36   ` Shaopeng Tan (Fujitsu)
  2025-09-10 19:31     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-09 11:36 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hello James,
> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
> (MSCs) with an identity/configuration page.
> 
> Add the definitions for these registers as offset within the page(s).
> 
> Link: https://developer.arm.com/documentation/ihi0099/latest/
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as
> MSMON_CFG_CSU_CTL_TYPE_CSU
>  * Whitespace churn.
>  * Cite a more recent document.
>  * Removed some stale feature, fixed some names etc.
> ---
>  drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
>  1 file changed, 266 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d49bb884b433..6e0982a1a9ac 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
> 
> +/*
> + * MPAM MSCs have the following register layout. See:
> + * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
> + * Component Specification.
> + * https://developer.arm.com/documentation/ihi0099/latest/
> + */
> +#define MPAM_ARCHITECTURE_V1    0x10
> +
> +/* Memory mapped control pages: */
> +/* ID Register offsets in the memory mapped page */
> +#define MPAMF_IDR		0x0000  /* features id register */
> +#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
> +#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
> +#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
> +#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
> +#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
> +#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
> +#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
> +#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
> +#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
> +#define MPAMF_IIDR		0x0018  /* implementer id register */
> +#define MPAMF_AIDR		0x0020  /* architectural id register */
> +
> +/* Configuration and Status Register offsets in the memory mapped page */
> +#define MPAMCFG_PART_SEL	0x0100  /* partid to configure: */
> +#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
> +#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
> +#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
> +#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
> +#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
> +#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
> +#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
> +#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
> +#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
> +#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
> +
> +#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
> +#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
> +#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
> +#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
> +#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
> +#define MSMON_CSU		0x0840  /* current cache-usage */
> +#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
> +#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
> +#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
> +#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
> +#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
> +#define MPAMF_ESR		0x00F8  /* error status register */
> +#define MPAMF_ECR		0x00F0  /* error control register */
> +
> +/* MPAMF_IDR - MPAM features ID register */
> +#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
> +#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
> +#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
> +#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
> +#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
> +#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
> +#define MPAMF_IDR_EXT			BIT(28)
> +#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
> +#define MPAMF_IDR_HAS_MSMON		BIT(30)
> +#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
> +#define MPAMF_IDR_HAS_RIS		BIT(32)
> +#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
> +#define MPAMF_IDR_HAS_ESR		BIT(39)
> +#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
> +
> +/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
> +#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
> +#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
> +#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
> +
> +/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
> +#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
> +
> +/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
> +#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
> +#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
> +#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
> +#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
> +#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
> +#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
> +
> +/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
> +#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
> +#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
> +#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
> +#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
> +#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
> +#define MPAMF_MBW_IDR_WINDWR		BIT(14)
> +#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
> +
> +/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
> +#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
> +#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
> +#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
> +#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
> +#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
> +#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
> +
> +/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
> +#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
> +#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
> +#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
> +#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
> +#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
> +#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
> +#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
> +
> +/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
> +#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
> +#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
> +#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
> +#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
> +#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
> +
> +/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
> +#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX      GENMASK(15, 0)
> +
> +/* MPAMF_IIDR - MPAM implementation ID register */
> +#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
> +#define MPAMF_IIDR_PRODUCTID_SHIFT	20
> +#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
> +#define MPAMF_IIDR_VARIANT_SHIFT	16
> +#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
> +#define MPAMF_IIDR_REVISON_SHIFT	12
> +#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
> +#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0
> +
> +/* MPAMF_AIDR - MPAM architecture ID register */
> +#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
> +#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
> +
> +/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
> +#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
> +#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
> +#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
> +
> +/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
> +#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
> +#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
> +
> +/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
> +#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
> +
> +/*
> + * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
> + *                   register
> + */
> +#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
> +
> +/*
> + * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
> + *                   register
> + */
> +#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
> +#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
> +
> +/*
> + * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
> + *                     register
> + */
> +#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
> +#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
> +
> +/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
> +#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
> +#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
> +
> +/*
> + * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
> + *                    configuration register
> + */
> +#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
> +#define MPAMCFG_MBW_PROP_EN		BIT(31)
> +
> +/*
> + * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
> + */
> +#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
> +#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
> +
> +/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
> +#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
> +#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
> +
> +/* MPAMF_ESR - MPAM Error Status Register */
> +#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
> +#define MPAMF_ESR_PMG		GENMASK(23, 16)
> +#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
> +#define MPAMF_ESR_OVRWR		BIT(31)
> +#define MPAMF_ESR_RIS		GENMASK(35, 32)
> +
> +/* MPAMF_ECR - MPAM Error Control Register */
> +#define MPAMF_ECR_INTEN		BIT(0)
> +
> +/* Error conditions in accessing memory mapped registers */
> +#define MPAM_ERRCODE_NONE			0
> +#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
> +#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
> +#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
> +#define MPAM_ERRCODE_REQ_PMG_RANGE		4
> +#define MPAM_ERRCODE_MONITOR_RANGE		5
> +#define MPAM_ERRCODE_INTPARTID_RANGE		6
> +#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
> +
> +/*
> + * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
> + *                    usage monitor filter register
> + */
> +#define MSMON_CFG_CSU_FLT_PARTID	GENMASK(15, 0)
> +#define MSMON_CFG_CSU_FLT_PMG		GENMASK(23, 16)
> +
> +/*
> + * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
> + *                    usage monitor control register
> + * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
> + *                     bandwidth usage monitor control register
> + */
> +#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
> +#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
> +#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
> +#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
> +#define MSMON_CFG_x_CTL_SCLEN			BIT(19)
> +#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
> +#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
> +#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
> +#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
> +#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
> +#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
> +#define MSMON_CFG_x_CTL_EN			BIT(31)
> +
> +#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
> +#define MSMON_CFG_CSU_CTL_TYPE_CSU			0
+#define MSMON_CFG_CSU_CTL_TYPE_CSU			0x43?
Best regards,
Shaopeng TAN
> +/*
> + * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
> + *                     bandwidth usage monitor filter register
> + */
> +#define MSMON_CFG_MBWU_FLT_PARTID		GENMASK(15, 0)
> +#define MSMON_CFG_MBWU_FLT_PMG			GENMASK(23, 16)
> +#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
> +
> +/*
> + * MSMON_CSU - Memory system performance monitor cache storage usage monitor
> + *            register
> + * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
> + *                     capture register
> + * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
> + *               monitor register
> + * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
> + *                     capture register
> + */
> +#define MSMON___VALUE		GENMASK(30, 0)
> +#define MSMON___NRDY		BIT(31)
> +#define MSMON___NRDY_L		BIT(63)
> +#define MSMON___L_VALUE		GENMASK(43, 0)
> +#define MSMON___LWD_VALUE	GENMASK(62, 0)
> +
> +/*
> + * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
> + *                  generation register
> + */
> +#define MSMON_CAPT_EVNT_NOW	BIT(0)
> +
>  #endif /* MPAM_INTERNAL_H */
> --
> 2.20.1
^ permalink raw reply	[flat|nested] 200+ messages in thread 
- * Re: [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-09-09 11:36   ` Shaopeng Tan (Fujitsu)
@ 2025-09-10 19:31     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:31 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Shaopeng,
On 09/09/2025 12:36, Shaopeng Tan (Fujitsu) wrote:
>> Memory Partitioning and Monitoring (MPAM) has memory mapped devices
>> (MSCs) with an identity/configuration page.
>>
>> Add the definitions for these registers as offset within the page(s).
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index d49bb884b433..6e0982a1a9ac 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
>> +#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
>> +#define MSMON_CFG_CSU_CTL_TYPE_CSU			0
> 
> +#define MSMON_CFG_CSU_CTL_TYPE_CSU			0x43?
Yes, that looks like the line got truncated.
This would have caused the counter type to be mismatched and reprogrammed each time.
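To illustrate the effect, here is a minimal userspace sketch, not the driver's code. It assumes the hardware reports 0x43 in the TYPE field (bits 7:0) of a CSU monitor's control register, as suggested above, and that software decides whether to reprogram a monitor by comparing that field with the expected type. The helper csu_needs_reprogram() and the CTL_TYPE_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions; illustration only. */
#define CTL_TYPE_MASK		0xffu	/* MSMON_CFG_x_CTL_TYPE, bits 7:0 */
#define CTL_TYPE_CSU_TRUNCATED	0x00u	/* value in the posted patch */
#define CTL_TYPE_CSU_SUGGESTED	0x43u	/* value suggested in review */

/*
 * Hypothetical helper: report whether the control register read back from
 * the hardware disagrees with the type software expects.
 */
static bool csu_needs_reprogram(uint32_t ctl_readback, uint32_t expected_type)
{
	return (ctl_readback & CTL_TYPE_MASK) != expected_type;
}
```

With a readback of 0x80000043 (EN set, TYPE 0x43, under the assumption above), the truncated value 0 never matches, so the monitor would look misconfigured on every check; 0x43 matches and the monitor is left alone.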
Thanks for spotting it!
James
^ permalink raw reply	[flat|nested] 200+ messages in thread 
 
 
- * [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (12 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-27 16:08   ` Rob Herring
                     ` (2 more replies)
  2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
                   ` (53 subsequent siblings)
  67 siblings, 3 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen
Because an MSC can only be accessed from the CPUs in its cpu-affinity
set, we need to be running on one of those CPUs to probe the MSC
hardware.
Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled: walk all the MSCs and probe those we can
reach that haven't already been probed.
Later, once MPAM is enabled, this cpuhp callback will be replaced by
one that avoids the global list.
Enabling a static key will also take the cpuhp lock, so it can't be done
from the cpuhp callback. Whenever a new MSC has been probed, schedule
work to test if all the MSCs have now been probed.
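The deferred check can be modelled in plain userspace C. This is a simplified sketch, assuming a flat array in place of the MSC list and leaving out the locking and the workqueue; struct msc, try_enable() and enable_count are hypothetical names for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an MSC: only the 'probed' flag is modelled. */
struct msc {
	bool probed;
};

static atomic_int once;		/* zero until the first successful enable */
static int enable_count;	/* how many times the enable step has run */

/*
 * Model of the scheduled work: do nothing unless every MSC has been probed,
 * and let only the first caller that sees them all probed perform the enable.
 * Returns true if this call performed the enable.
 */
static bool try_enable(const struct msc *mscs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!mscs[i].probed)
			return false;

	/* atomic_fetch_add() returns the old value: only one caller sees 0. */
	if (atomic_fetch_add(&once, 1) != 0)
		return false;

	enable_count++;
	return true;
}
```

Calling try_enable() after each newly probed MSC is harmless: it returns false until the last MSC has been probed, true exactly once, and false on every later call.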
CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 144 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   8 +-
 2 files changed, 147 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 5baf2a8786fb..9d6516f98acf 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,6 +4,7 @@
 #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
 
 #include <linux/acpi.h>
+#include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
@@ -21,6 +22,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include <acpi/pcc.h>
 
@@ -39,6 +41,16 @@ struct srcu_struct mpam_srcu;
 /* MPAM isn't available until all the MSC have been probed. */
 static u32 mpam_num_msc;
 
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
 /* List of all objects that can be free()d after synchronise_srcu() */
 static LLIST_HEAD(mpam_garbage);
 
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
+
 #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
 
 static struct mpam_vmsc *
@@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
-static void mpam_discovery_complete(void)
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+	u64 idr;
+	int err;
+
+	lockdep_assert_held(&msc->probe_lock);
+
+	mutex_lock(&msc->part_sel_lock);
+	idr = mpam_read_partsel_reg(msc, AIDR);
+	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+		pr_err_once("%s does not match MPAM architecture v1.x\n",
+			    dev_name(&msc->pdev->dev));
+		err = -EIO;
+	} else {
+		msc->probed = true;
+		err = 0;
+	}
+	mutex_unlock(&msc->part_sel_lock);
+
+	return err;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
 {
-	pr_err("Discovered all MSC\n");
+	return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+	int err = 0;
+	struct mpam_msc *msc;
+	bool new_device_probed = false;
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			err = mpam_msc_hw_probe(msc);
+		mutex_unlock(&msc->probe_lock);
+
+		if (!err)
+			new_device_probed = true;
+		else
+			break; // mpam_broken
+	}
+	mutex_unlock(&mpam_list_lock);
+
+	if (new_device_probed && !err)
+		schedule_work(&mpam_enable_work);
+
+	return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+	return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+					  int (*offline)(unsigned int offline))
+{
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+
+	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
+					     online, offline);
+	if (mpam_cpuhp_state <= 0) {
+		pr_err("Failed to register cpuhp callbacks");
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
 static int mpam_dt_count_msc(void)
@@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 	}
 
 	if (!err && fw_num_msc == mpam_num_msc)
-		mpam_discovery_complete();
+		mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
 
 	if (err && msc)
 		mpam_msc_drv_remove(pdev);
@@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+static void mpam_enable_once(void)
+{
+	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
+
+	pr_info("MPAM enabled\n");
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+	static atomic_t once;
+	struct mpam_msc *msc;
+	bool all_devices_probed = true;
+
+	/* Have we probed all the hw devices? */
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			all_devices_probed = false;
+		mutex_unlock(&msc->probe_lock);
+
+		if (!all_devices_probed)
+			break;
+	}
+	mutex_unlock(&mpam_list_lock);
+
+	if (all_devices_probed && !atomic_fetch_inc(&once))
+		mpam_enable_once();
+}
+
 /*
  * MSC that are hidden under caches are not created as platform devices
  * as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6e0982a1a9ac..a98cca08a2ef 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -49,6 +49,7 @@ struct mpam_msc {
 	 * properties become read-only and the lists are protected by SRCU.
 	 */
 	struct mutex		probe_lock;
+	bool			probed;
 	unsigned long		ris_idxs[128 / BITS_PER_LONG];
 	u32			ris_max;
 
@@ -59,14 +60,14 @@ struct mpam_msc {
 	 * part_sel_lock protects access to the MSC hardware registers that are
 	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
 	 * by RIS).
-	 * If needed, take msc->lock first.
+	 * If needed, take msc->probe_lock first.
 	 */
 	struct mutex		part_sel_lock;
 
 	/*
 	 * mon_sel_lock protects access to the MSC hardware registers that are
 	 * affeted by MPAMCFG_MON_SEL.
-	 * If needed, take msc->lock first.
+	 * If needed, take msc->probe_lock first.
 	 */
 	struct mutex		outer_mon_sel_lock;
 	raw_spinlock_t		inner_mon_sel_lock;
@@ -147,6 +148,9 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-08-27 16:08   ` Rob Herring
  2025-09-08 17:58     ` James Morse
  2025-09-05 16:40   ` Dave Martin
  2025-09-09 14:23   ` Dave Martin
  2 siblings, 1 reply; 200+ messages in thread
From: Rob Herring @ 2025-08-27 16:08 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen
On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>
> Because an MSC can only be accessed from the CPUs in its cpu-affinity
> set, we need to be running on one of those CPUs to probe the MSC
> hardware.
>
> Do this work in the cpuhp callback. Probing the hardware only happens
> before MPAM is enabled: walk all the MSCs and probe those we can
> reach that haven't already been probed.
>
> Later, once MPAM is enabled, this cpuhp callback will be replaced by
> one that avoids the global list.
>
> Enabling a static key will also take the cpuhp lock, so it can't be done
> from the cpuhp callback. Whenever a new MSC has been probed, schedule
> work to test if all the MSCs have now been probed.
>
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 144 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |   8 +-
>  2 files changed, 147 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 5baf2a8786fb..9d6516f98acf 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -4,6 +4,7 @@
>  #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>
>  #include <linux/acpi.h>
> +#include <linux/atomic.h>
>  #include <linux/arm_mpam.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
> @@ -21,6 +22,7 @@
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
> +#include <linux/workqueue.h>
>
>  #include <acpi/pcc.h>
>
> @@ -39,6 +41,16 @@ struct srcu_struct mpam_srcu;
>  /* MPAM isn't available until all the MSC have been probed. */
>  static u32 mpam_num_msc;
>
> +static int mpam_cpuhp_state;
> +static DEFINE_MUTEX(mpam_cpuhp_state_lock);
> +
> +/*
> + * mpam is enabled once all devices have been probed from CPU online callbacks,
> + * scheduled via this work_struct. If access to an MSC depends on a CPU that
> + * was not brought online at boot, this can happen surprisingly late.
> + */
> +static DECLARE_WORK(mpam_enable_work, &mpam_enable);
> +
>  /*
>   * An MSC is a physical container for controls and monitors, each identified by
>   * their RIS index. These share a base-address, interrupts and some MMIO
> @@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
>  /* List of all objects that can be free()d after synchronise_srcu() */
>  static LLIST_HEAD(mpam_garbage);
>
> +static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
> +{
> +       WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
> +       WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
These either make __mpam_read_reg uninlined or add 2 checks to every
register read. Neither seems very good.
> +
> +       return readl_relaxed(msc->mapped_hwpage + reg);
> +}
> +
> +static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
> +{
> +       lockdep_assert_held_once(&msc->part_sel_lock);
Similar thing here.
> +       return __mpam_read_reg(msc, reg);
> +}
> +
> +#define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
> +
>  #define init_garbage(x)        init_llist_node(&(x)->garbage.llist)
>
>  static struct mpam_vmsc *
> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>         return err;
>  }
>
> -static void mpam_discovery_complete(void)
It is annoying to review things which disappear in later patches...
> +static int mpam_msc_hw_probe(struct mpam_msc *msc)
> +{
> +       u64 idr;
> +       int err;
> +
> +       lockdep_assert_held(&msc->probe_lock);
> +
> +       mutex_lock(&msc->part_sel_lock);
> +       idr = mpam_read_partsel_reg(msc, AIDR);
I don't think AIDR access depends on PART_SEL.
> +       if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
> +               pr_err_once("%s does not match MPAM architecture v1.x\n",
> +                           dev_name(&msc->pdev->dev));
> +               err = -EIO;
> +       } else {
> +               msc->probed = true;
> +               err = 0;
> +       }
> +       mutex_unlock(&msc->part_sel_lock);
> +
> +       return err;
> +}
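As a side note on the version check quoted above: it compares the masked field in place, without shifting it down, which works because MPAM_ARCHITECTURE_V1 is defined as 0x10, i.e. major revision 1 already positioned at bits 7:4. A minimal userspace sketch of the arithmetic follows; the constants mirror the quoted definitions, while aidr_is_v1() is a hypothetical helper for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins mirroring the quoted definitions. */
#define AIDR_ARCH_MAJOR_REV	0xf0u	/* GENMASK(7, 4) */
#define AIDR_ARCH_MINOR_REV	0x0fu	/* GENMASK(3, 0) */
#define ARCHITECTURE_V1		0x10u	/* major 1, left in place at bits 7:4 */

/* True for any v1.x: the minor revision is masked off before comparing. */
static bool aidr_is_v1(uint32_t aidr)
{
	return (aidr & AIDR_ARCH_MAJOR_REV) == ARCHITECTURE_V1;
}
```

Under this reading 0x10 (v1.0) and 0x11 (v1.1) both pass, while 0x20 (major 2) and 0x01 (major 0) are rejected.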
> +
> +static int mpam_cpu_online(unsigned int cpu)
>  {
> -       pr_err("Discovered all MSC\n");
> +       return 0;
> +}
> +
> +/* Before mpam is enabled, try to probe new MSC */
> +static int mpam_discovery_cpu_online(unsigned int cpu)
> +{
> +       int err = 0;
> +       struct mpam_msc *msc;
> +       bool new_device_probed = false;
> +
> +       mutex_lock(&mpam_list_lock);
> +       list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> +               if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +                       continue;
> +
> +               mutex_lock(&msc->probe_lock);
> +               if (!msc->probed)
> +                       err = mpam_msc_hw_probe(msc);
> +               mutex_unlock(&msc->probe_lock);
> +
> +               if (!err)
> +                       new_device_probed = true;
> +               else
> +                       break; // mpam_broken
> +       }
> +       mutex_unlock(&mpam_list_lock);
> +
> +       if (new_device_probed && !err)
> +               schedule_work(&mpam_enable_work);
> +
> +       return err;
> +}
> +
> +static int mpam_cpu_offline(unsigned int cpu)
> +{
> +       return 0;
> +}
> +
> +static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
> +                                         int (*offline)(unsigned int offline))
> +{
> +       mutex_lock(&mpam_cpuhp_state_lock);
> +       if (mpam_cpuhp_state) {
> +               cpuhp_remove_state(mpam_cpuhp_state);
> +               mpam_cpuhp_state = 0;
> +       }
> +
> +       mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
> +                                            online, offline);
> +       if (mpam_cpuhp_state <= 0) {
> +               pr_err("Failed to register cpuhp callbacks");
> +               mpam_cpuhp_state = 0;
> +       }
> +       mutex_unlock(&mpam_cpuhp_state_lock);
>  }
>
>  static int mpam_dt_count_msc(void)
> @@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>         }
>
>         if (!err && fw_num_msc == mpam_num_msc)
> -               mpam_discovery_complete();
> +               mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
>
>         if (err && msc)
>                 mpam_msc_drv_remove(pdev);
> @@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
>         .remove = mpam_msc_drv_remove,
>  };
>
> +static void mpam_enable_once(void)
> +{
> +       mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
> +
> +       pr_info("MPAM enabled\n");
> +}
> +
> +/*
> + * Enable mpam once all devices have been probed.
> + * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
> + * Also scheduled when new devices are probed when new CPUs come online.
> + */
> +void mpam_enable(struct work_struct *work)
> +{
> +       static atomic_t once;
> +       struct mpam_msc *msc;
> +       bool all_devices_probed = true;
> +
> +       /* Have we probed all the hw devices? */
> +       mutex_lock(&mpam_list_lock);
> +       list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> +               mutex_lock(&msc->probe_lock);
> +               if (!msc->probed)
> +                       all_devices_probed = false;
> +               mutex_unlock(&msc->probe_lock);
> +
> +               if (!all_devices_probed)
> +                       break;
> +       }
> +       mutex_unlock(&mpam_list_lock);
> +
> +       if (all_devices_probed && !atomic_fetch_inc(&once))
> +               mpam_enable_once();
> +}
> +
>  /*
>   * MSC that are hidden under caches are not created as platform devices
>   * as there is no cache driver. Caches are also special-cased in
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 6e0982a1a9ac..a98cca08a2ef 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -49,6 +49,7 @@ struct mpam_msc {
>          * properties become read-only and the lists are protected by SRCU.
>          */
>         struct mutex            probe_lock;
> +       bool                    probed;
>         unsigned long           ris_idxs[128 / BITS_PER_LONG];
>         u32                     ris_max;
>
> @@ -59,14 +60,14 @@ struct mpam_msc {
>          * part_sel_lock protects access to the MSC hardware registers that are
>          * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>          * by RIS).
> -        * If needed, take msc->lock first.
> +        * If needed, take msc->probe_lock first.
Humm. I think this belongs in patch 10.
>          */
>         struct mutex            part_sel_lock;
>
>         /*
>          * mon_sel_lock protects access to the MSC hardware registers that are
>          * affeted by MPAMCFG_MON_SEL.
> -        * If needed, take msc->lock first.
> +        * If needed, take msc->probe_lock first.
>          */
>         struct mutex            outer_mon_sel_lock;
>         raw_spinlock_t          inner_mon_sel_lock;
> @@ -147,6 +148,9 @@ struct mpam_msc_ris {
>  extern struct srcu_struct mpam_srcu;
>  extern struct list_head mpam_classes;
>
> +/* Scheduled work callback to enable mpam once all MSC have been probed */
> +void mpam_enable(struct work_struct *work);
> +
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>                                    cpumask_t *affinity);
>
> --
> 2.20.1
>
- * Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-08-27 16:08   ` Rob Herring
@ 2025-09-08 17:58     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-08 17:58 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen
Hi Rob,
On 27/08/2025 17:08, Rob Herring wrote:
> On Fri, Aug 22, 2025 at 10:32 AM James Morse <james.morse@arm.com> wrote:
>>
>> Because an MSC can only by accessed from the CPUs in its cpu-affinity
>> set we need to be running on one of those CPUs to probe the MSC
>> hardware.
>>
>> Do this work in the cpuhp callback. Probing the hardware will only
>> happen before MPAM is enabled, walk all the MSCs and probe those we can
>> reach that haven't already been probed.
>>
>> Later once MPAM is enabled, this cpuhp callback will be replaced by
>> one that avoids the global list.
>>
>> Enabling a static key will also take the cpuhp lock, so can't be done
>> from the cpuhp callback. Whenever a new MSC has been probed schedule
>> work to test if all the MSCs have now been probed.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 5baf2a8786fb..9d6516f98acf 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
>>  /* List of all objects that can be free()d after synchronise_srcu() */
>>  static LLIST_HEAD(mpam_garbage);
>>
>> +static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>> +{
>> +       WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
>> +       WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> 
> These either make __mpam_read_reg uninlined or add 2 checks to every
> register read. Neither seems very good.
The mapping-bounds one is from before the ACPI table had the size of the mapping. I can
remove that one now.
The accessibility one really needs checking as getting this wrong will only occasionally
cause a deadlock if you get unlucky with power-management. I don't think we'd ever manage
to debug that, hence the check. The server platforms we'll see first aren't going to
bother with PSCI CPU_SUSPEND - but mobile devices will.
If the compiler chooses not to inline this, I'm fine with that. Accesses to the device
mapped configuration are rare, and always via a resctrl filesystem access. I don't think
the performance matters.
>> +
>> +       return readl_relaxed(msc->mapped_hwpage + reg);
>> +}
>> +
>> +static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>> +{
>> +       lockdep_assert_held_once(&msc->part_sel_lock);
> 
> Similar thing here.
If you don't build with lockdep this costs you nothing.
>> +       return __mpam_read_reg(msc, reg);
>> +}
>> +
>> +#define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
>> +
>>  #define init_garbage(x)        init_llist_node(&(x)->garbage.llist)
>>
>>  static struct mpam_vmsc *
>> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>         return err;
>>  }
>>
>> -static void mpam_discovery_complete(void)
> 
> It is annoying to review things which disappear in later patches...
It's a printk - its purpose was to show something happens once all the MSC have been
probed. That was supposed to make it easier to review as it has always had this shape from
the beginning. This patch adds the hardware accesses that do the probing, which happen
from cpuhp calls - which in turn moves this 'complete' work to be scheduled.
As this seems to be causing confusion I'll inline it so it doesn't look strange.
>> +static int mpam_msc_hw_probe(struct mpam_msc *msc)
>> +{
>> +       u64 idr;
>> +       int err;
>> +
>> +       lockdep_assert_held(&msc->probe_lock);
>> +
>> +       mutex_lock(&msc->part_sel_lock);
>> +       idr = mpam_read_partsel_reg(msc, AIDR);
> 
> I don't think AIDR access depends on PART_SEL.
It doesn't, but as most registers do, it was just simpler to pretend it does.
I'll shove the __version in here instead, which will save taking the lock.
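As a rough illustration of that change, here is a minimal userspace sketch, not the driver code. It assumes the major revision lives in bits [7:4] of MPAMF_AIDR and that v1.x is encoded as 0x10 once the minor revision is masked off; the names probe_version(), AIDR_MAJOR_REV_MASK and ARCH_V1 are hypothetical. The point is only that the version check needs a plain register read and no part_sel_lock.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: major revision occupies bits [7:4] of MPAMF_AIDR. */
#define AIDR_MAJOR_REV_MASK	0xf0u
/* Assumption: "architecture v1.x" reads as 0x10 with the minor bits masked. */
#define ARCH_V1			0x10u

/*
 * Hypothetical model of the probe: regs[] stands in for the mapped MSC
 * page and aidr_idx for the AIDR offset. The read is a plain load, so no
 * PART_SEL-related lock is taken around it.
 */
static int probe_version(const uint32_t *regs, size_t nregs, size_t aidr_idx)
{
	uint32_t aidr;

	assert(aidr_idx < nregs);	/* stay inside the fake mapping */
	aidr = regs[aidr_idx];

	if ((aidr & AIDR_MAJOR_REV_MASK) != ARCH_V1)
		return -EIO;		/* not an MPAM v1.x MSC */

	return 0;
}
```

In the real function the caller would still hold msc->probe_lock; only the part_sel_lock pair around the read goes away.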
>> +       if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
>> +               pr_err_once("%s does not match MPAM architecture v1.x\n",
>> +                           dev_name(&msc->pdev->dev));
>> +               err = -EIO;
>> +       } else {
>> +               msc->probed = true;
>> +               err = 0;
>> +       }
>> +       mutex_unlock(&msc->part_sel_lock);
>> +
>> +       return err;
>> +}
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 6e0982a1a9ac..a98cca08a2ef 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -59,14 +60,14 @@ struct mpam_msc {
>>          * part_sel_lock protects access to the MSC hardware registers that are
>>          * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
>>          * by RIS).
>> -        * If needed, take msc->lock first.
>> +        * If needed, take msc->probe_lock first.
> 
> Humm. I think this belongs in patch 10.
Yup, fixed.
Thanks,
James
 
- * Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
  2025-08-27 16:08   ` Rob Herring
@ 2025-09-05 16:40   ` Dave Martin
  2025-09-09 16:56     ` James Morse
  2025-09-09 14:23   ` Dave Martin
  2 siblings, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-05 16:40 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen
Hi James,
On Fri, Aug 22, 2025 at 03:29:55PM +0000, James Morse wrote:
> Because an MSC can only by accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
> 
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed.
It may be worth mentioning that the low-level MSC register accessors
are added by this patch.
> Later once MPAM is enabled, this cpuhp callback will be replaced by
> one that avoids the global list.
I misread this as meaning "later in the patch series" and got
confused.
Perhaps, something like the following? (though this got a bit verbose)
--8<--
Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced.  The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.
-->8--
> Enabling a static key will also take the cpuhp lock, so can't be done
What static key?
None in this patch that I can see.
> from the cpuhp callback. Whenever a new MSC has been probed schedule
> work to test if all the MSCs have now been probed.
> 
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 144 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |   8 +-
>  2 files changed, 147 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
[...]
> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
[...]
> +static int mpam_cpu_online(unsigned int cpu)
>  {
> -	pr_err("Discovered all MSC\n");
I guess this disappears later?
If we print anything, it feels like it should be in the
mpam_enable_once() path, otherwise it looks like dmesg is going to get
spammed on every hotplug.  I might have missed something, here.
> +	return 0;
> +}
> +
> +/* Before mpam is enabled, try to probe new MSC */
> +static int mpam_discovery_cpu_online(unsigned int cpu)
> +{
> +	int err = 0;
> +	struct mpam_msc *msc;
> +	bool new_device_probed = false;
> +
> +	mutex_lock(&mpam_list_lock);
I take it nothing breaks if we sleep here?
Pending cpuhp callbacks for this CPU look to be blocked while we sleep,
at the very least.
Since this only happens during the probing phase, maybe that's not such
a big deal.
Is it likely that some late CPUs might be left offline indefinitely?
If so, we might end up doing futile work here forever.
> +	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		mutex_lock(&msc->probe_lock);
> +		if (!msc->probed)
> +			err = mpam_msc_hw_probe(msc);
> +		mutex_unlock(&msc->probe_lock);
> +
> +		if (!err)
> +			new_device_probed = true;
> +		else
> +			break; // mpam_broken
What's the effect of returning a non-zero value to the CPU hotplug
callback dispatcher here?
Do we want to tear anything down if MPAM is unusable?
> +	}
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (new_device_probed && !err)
> +		schedule_work(&mpam_enable_work);
> +
> +	return err;
> +}
> +
> +static int mpam_cpu_offline(unsigned int cpu)
> +{
> +	return 0;
> +}
> +
> +static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
> +					  int (*offline)(unsigned int offline))
> +{
> +	mutex_lock(&mpam_cpuhp_state_lock);
> +	if (mpam_cpuhp_state) {
> +		cpuhp_remove_state(mpam_cpuhp_state);
> +		mpam_cpuhp_state = 0;
> +	}
> +
> +	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
> +					     online, offline);
> +	if (mpam_cpuhp_state <= 0) {
> +		pr_err("Failed to register cpuhp callbacks");
Should an error code be returned to the caller if this fails?
> +		mpam_cpuhp_state = 0;
> +	}
> +	mutex_unlock(&mpam_cpuhp_state_lock);
>  }
>  
>  static int mpam_dt_count_msc(void)
> @@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  	}
>  
>  	if (!err && fw_num_msc == mpam_num_msc)
> -		mpam_discovery_complete();
> +		mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
Abandon probing the MSC if this fails?
(However, the next phase of probing hangs off CPU hotplug, so it just
won't happen if the callbacks can't be registered -- but it looks like
MPAM may be left in a half-probed state.  I'm not entirely convinced
that this matters if the MPAM driver is not unloadable anyway...)
Nit: redundant &
(You don't have it in the similar call in mpam_enable_once().)
>  
>  	if (err && msc)
>  		mpam_msc_drv_remove(pdev);
> @@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
>  	.remove = mpam_msc_drv_remove,
>  };
>  
> +static void mpam_enable_once(void)
> +{
> +	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
Should it be fatal if this fails?
> +
> +	pr_info("MPAM enabled\n");
> +}
> +
> +/*
> + * Enable mpam once all devices have been probed.
> + * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
> + * Also scheduled when new devices are probed when new CPUs come online.
> + */
> +void mpam_enable(struct work_struct *work)
> +{
> +	static atomic_t once;
Nit: possibly unnecessary atomic_t?  This is slow-path code, and we
already have to take mpam_list_lock.  Harmless, though.
> +	struct mpam_msc *msc;
> +	bool all_devices_probed = true;
> +
> +	/* Have we probed all the hw devices? */
> +	mutex_lock(&mpam_list_lock);
> +	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> +		mutex_lock(&msc->probe_lock);
> +		if (!msc->probed)
> +			all_devices_probed = false;
> +		mutex_unlock(&msc->probe_lock);
> +
> +		if (!all_devices_probed)
> +			break;
WARN()?
We counted the MSCs in via the mpam_discovery_cpu_online(), so I think
we shouldn't get in here if some failed to probe?
> +	}
> +	mutex_unlock(&mpam_list_lock);
> +
> +	if (all_devices_probed && !atomic_fetch_inc(&once))
> +		mpam_enable_once();
> +}
[...]
Cheers
---Dave
- * Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-09-05 16:40   ` Dave Martin
@ 2025-09-09 16:56     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:56 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen
Hi Dave,
On 05/09/2025 17:40, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:55PM +0000, James Morse wrote:
>> Because an MSC can only by accessed from the CPUs in its cpu-affinity
>> set we need to be running on one of those CPUs to probe the MSC
>> hardware.
>>
>> Do this work in the cpuhp callback. Probing the hardware will only
>> happen before MPAM is enabled, walk all the MSCs and probe those we can
>> reach that haven't already been probed.
> 
> It may be worth mentioning that the low-level MSC register accessors
> are added by this patch.
Sure,
>> Later once MPAM is enabled, this cpuhp callback will be replaced by
>> one that avoids the global list.
> 
> I misread this is as meaning "later in the patch series" and got
> confused.
> 
> Perhaps, something like the following? (though this got a bit verbose)
> 
> --8<--
> 
> Once all MSCs reported by the firmware have been probed from a CPU in
> their respective cpu-affinity set, the probe-time cpuhp callbacks are
> replaced.  The replacement callbacks will ultimately need to handle
> save/restore of the runtime MSC state across power transitions, but for
> now there is nothing to do in them: so do nothing.
> 
> -->8--
Done.
>> Enabling a static key will also take the cpuhp lock, so can't be done
> 
> What static key?
The one that enables the architecture's context-switch code. That was added by an
earlier patch, but got moved later to reduce the number of trees that this series touches.
(also, there is no point having the context switch code if you can't have different values
until the resctrl code shows up.)
This is trying to describe why mpam_enable() is scheduled, instead of just being called in
cpuhp context. (again - to reduce the churn caused by changing that later).
I'll rephrase it as:
| The architecture's context switch code will be enabled by a static-key, this can be set
| by mpam_enable(), but must be done from process context, not a cpuhp callback because
| both take the cpuhp lock.
| Whenever a new MSC has been probed, the mpam_enable() work is scheduled to test if all
| the MSCs have been probed.
> None in this patch that I can see.
> 
>> from the cpuhp callback. Whenever a new MSC has been probed schedule
>> work to test if all the MSCs have now been probed.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> 
> [...]
> 
>> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> 
> [...]
> 
>> +static int mpam_cpu_online(unsigned int cpu)
>>  {
>> -	pr_err("Discovered all MSC\n");
> 
> I guess this disappears later?
> 
> If we print anything, it feels like it should be in the
> mpam_enable_once() path, otherwise it looks like dmesg is going to get
> spammed on every hotplug.  I might have missed something, here.
Yes - there is a print that happens on the mpam_enable_once() path that shows the number
of PARTID/PMG discovered. That "Discovered all MSC" message was so that probing had this
shape from the very beginning - it is unfortunately not as simple as probing a stand-alone
driver.
>> +	return 0;
>> +}
>> +
>> +/* Before mpam is enabled, try to probe new MSC */
>> +static int mpam_discovery_cpu_online(unsigned int cpu)
>> +{
>> +	int err = 0;
>> +	struct mpam_msc *msc;
>> +	bool new_device_probed = false;
>> +
>> +	mutex_lock(&mpam_list_lock);
> I take it nothing breaks if we sleep here?
From memory, callbacks registered at CPUHP_AP_ONLINE_DYN are allowed to sleep. There is
some point in the state machine where you can't. I can't find where this comes from right
now...
e.g. resctrl does this in x86's resctrl_arch_online_cpu() and friends.
> Pending cpuhp callbacks for this CPU look to be blocked while we sleep,
> at the very least.
> Since this only happens during the probing phase, maybe that's not such
> a big deal.
> Is it likely that some late CPUs might be left offline indefinitely?
Offlined and never come back is certainly something that can happen.
> If so, we might end up doing futile work here forever.
It may never probe all the MSC? Yes, that can certainly happen.
But the work only happens when CPUs come online, which is already a major serialising
event. There is no cost to run in this 'not done yet' state forever. It's not retrying on
a timer or something like that.
>> +	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
>> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
>> +			continue;
>> +
>> +		mutex_lock(&msc->probe_lock);
>> +		if (!msc->probed)
>> +			err = mpam_msc_hw_probe(msc);
>> +		mutex_unlock(&msc->probe_lock);
>> +
>> +		if (!err)
>> +			new_device_probed = true;
>> +		else
>> +			break; // mpam_broken
> What's the effect of returning a non-zero value to the CPU hotplug
> callback dispatcher here?
I think the dynamically allocated ones can fail without any ill effects, it'll print a
message but nothing else will happen. In this case the callbacks are synchronous with the
attempt to register them, so it propagates the error back there.
> Do we want to tear anything down if MPAM is unusable?
It will keep trying, whereas it could pack up shop completely.
Looks like that '// mpam_broken' is where I intended to schedule the work, but the code
doesn't exist this early in the series. I'll pull bits of that earlier.
>> +	}
>> +	mutex_unlock(&mpam_list_lock);
>> +
>> +	if (new_device_probed && !err)
>> +		schedule_work(&mpam_enable_work);
>> +
>> +	return err;
>> +}
>> +
>> +static int mpam_cpu_offline(unsigned int cpu)
>> +{
>> +	return 0;
>> +}
>> +
>> +static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
>> +					  int (*offline)(unsigned int offline))
>> +{
>> +	mutex_lock(&mpam_cpuhp_state_lock);
>> +	if (mpam_cpuhp_state) {
>> +		cpuhp_remove_state(mpam_cpuhp_state);
>> +		mpam_cpuhp_state = 0;
>> +	}
>> +
>> +	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
>> +					     online, offline);
>> +	if (mpam_cpuhp_state <= 0) {
>> +		pr_err("Failed to register cpuhp callbacks");
> Should an error code be returned to the caller if this fails?
It can fail asynchronously - so any error handling for this only solves part of the
problem, a failure here means at least one of the callbacks ran and returned an error.
If we schedule the disable call in the callback then that will take care of tearing the
whole thing down.
The cpuhp callbacks don't get registered until the driver has found all the MSC that
firmware described, so there is no race with driver:probing an MSC after the disable call
got scheduled which tears the whole thing down.
>> +		mpam_cpuhp_state = 0;
>> +	}
>> +	mutex_unlock(&mpam_cpuhp_state_lock);
>>  }
>>  
>>  static int mpam_dt_count_msc(void)
>> @@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>>  	}
>>  
>>  	if (!err && fw_num_msc == mpam_num_msc)
>> -		mpam_discovery_complete();
>> +		mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
> 
> Abandon probing the MSC if this fails?
Any error returned here is most likely to be from mpam_discovery_cpu_online(), but that
can also happen asynchronously. Scheduling mpam_disable() is a better approach as it
covers the asynchronous case too.
> (However, the next phase of probing hangs off CPU hotplug, so it just
> won't happen if the callbacks can't be registered -- but it looks like
> MPAM may be left in a half-probed state.  I'm not entirely convinced
> that this matters if the MPAM driver is not unloadable anyway...)
I'm not at all worried about failing to register the cpuhp callbacks. The space needed for
that is pre-allocated, if there isn't enough space, you get a splat from the cpuhp core -
and the callbacks never run. No additional/unnecessary work happens - sure the memory
didn't get free'd, but the WARN() from cpuhp_reserve_state() should be enough to debug this.
> Nit: redundant &
> 
> (You don't have it in the similar call in mpam_enable_once().)
Done,
>>  	if (err && msc)
>>  		mpam_msc_drv_remove(pdev);
>> @@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
>>  	.remove = mpam_msc_drv_remove,
>>  };
>>  
>> +static void mpam_enable_once(void)
>> +{
>> +	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
> 
> Should it be fatal if this fails?
As above. One case doesn't matter - and any handling here is incomplete.
>> +
>> +	pr_info("MPAM enabled\n");
>> +}
>> +
>> +/*
>> + * Enable mpam once all devices have been probed.
>> + * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
>> + * Also scheduled when new devices are probed when new CPUs come online.
>> + */
>> +void mpam_enable(struct work_struct *work)
>> +{
>> +	static atomic_t once;
> 
> Nit: possibly unnecessary atomic_t?  This is slow-path code, and we
> already have to take mpam_list_lock.  Harmless, though.
mpam_enable_once() can't be called under the lock because of the ordering with cpuhp. This
just ended up being cleaner. Too much is stuffed under that lock already!
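For readers unfamiliar with the pattern, a minimal userspace sketch of the "once" guard follows. It uses C11 atomic_fetch_add() as a stand-in for the kernel's atomic_fetch_inc(); maybe_enable(), enable_once() and enable_calls are hypothetical names for illustration only. The guard works without any lock held because only the first caller observes the previous value 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int once;		/* static storage: starts at zero */
static int enable_calls;	/* how many times the one-time path ran */

static void enable_once(void)
{
	assert(enable_calls == 0);	/* must never run twice */
	enable_calls++;
}

/* Returns true if this call was the one that ran enable_once(). */
static bool maybe_enable(bool all_devices_probed)
{
	/*
	 * atomic_fetch_add() returns the value before the increment, so
	 * exactly one caller sees 0. The short-circuit '&&' means the
	 * counter is untouched while probing is still incomplete.
	 */
	if (all_devices_probed && atomic_fetch_add(&once, 1) == 0) {
		enable_once();
		return true;
	}

	return false;
}
```

Because the decision is made on the returned value rather than under mpam_list_lock, the one-time path is free to take other locks without creating an ordering problem.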
>> +	struct mpam_msc *msc;
>> +	bool all_devices_probed = true;
>> +
>> +	/* Have we probed all the hw devices? */
>> +	mutex_lock(&mpam_list_lock);
>> +	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
>> +		mutex_lock(&msc->probe_lock);
>> +		if (!msc->probed)
>> +			all_devices_probed = false;
>> +		mutex_unlock(&msc->probe_lock);
>> +
>> +		if (!all_devices_probed)
>> +			break;
> 
> WARN()?
Expected condition...
> We counted the MSCs in via the mpam_discovery_cpu_online(), so I think
> we shouldn't get in here if some failed to probe?
No we didn't!
We counted them in mpam_msc_drv_probe() before registering the cpuhp callbacks.
The cpuhp callbacks run asynchronously, each one schedules mpam_enable() iff it probed a
new device. mpam_enable() then has to check if all the devices had been probed.
This is done so that mpam_discovery_cpu_online() only has to take the msc->probe_lock for
MSC that it can access - instead of all of them. That was to avoid having something that
serialises CPUs coming online ... but mpam_discovery_cpu_online() is taking the list_lock
due to an incomplete switch to SRCU. I'll fix that.
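The flow described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the driver: struct msc_model, the 32-bit accessibility mask and the enable_work_scheduled counter (standing in for schedule_work()) are all hypothetical, and locking is left out entirely. Unlike the posted patch, the sketch only flags a new device when this call actually probed one, which matches the "iff it probed a new device" description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct msc_model {
	uint32_t accessibility;	/* bit n set: CPU n can reach this MSC */
	bool probed;
};

static int enable_work_scheduled;	/* stands in for schedule_work() */

/* Model of the discovery callback: probe only what this CPU can reach. */
static void discovery_cpu_online(struct msc_model *mscs, size_t n,
				 unsigned int cpu)
{
	bool new_device_probed = false;
	size_t i;

	assert(cpu < 32);
	for (i = 0; i < n; i++) {
		if (!(mscs[i].accessibility & (UINT32_C(1) << cpu)))
			continue;

		if (!mscs[i].probed) {
			mscs[i].probed = true;	/* hardware probe goes here */
			new_device_probed = true;
		}
	}

	/* Only ask for the "are we done?" check if something changed. */
	if (new_device_probed)
		enable_work_scheduled++;
}

/* Model of the check the scheduled work performs over all MSCs. */
static bool all_devices_probed(const struct msc_model *mscs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!mscs[i].probed)
			return false;

	return true;
}
```

With two MSCs, one reachable from CPUs 0 and 1 and one only from CPU 2, the check keeps failing until CPU 2 comes online, so an incomplete result is an expected condition rather than something to WARN() about.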
>> +	}
>> +	mutex_unlock(&mpam_list_lock);
>> +
>> +	if (all_devices_probed && !atomic_fetch_inc(&once))
>> +		mpam_enable_once();
>> +}
Thanks,
James
 
- * Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
  2025-08-27 16:08   ` Rob Herring
  2025-09-05 16:40   ` Dave Martin
@ 2025-09-09 14:23   ` Dave Martin
  2 siblings, 0 replies; 200+ messages in thread
From: Dave Martin @ 2025-09-09 14:23 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
	tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
	dfustini, amitsinght, David Hildenbrand, Rex Nie, Koba Ko,
	Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Lecopzer Chen
Hi James,
While I'm here:
On Fri, Aug 22, 2025 at 03:29:55PM +0000, James Morse wrote:
> Because an MSC can only by accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
> 
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed.
> 
> Later once MPAM is enabled, this cpuhp callback will be replaced by
> one that avoids the global list.
> 
> Enabling a static key will also take the cpuhp lock, so can't be done
> from the cpuhp callback. Whenever a new MSC has been probed schedule
> work to test if all the MSCs have now been probed.
> 
> CC: Lecopzer Chen <lecopzerc@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 144 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |   8 +-
>  2 files changed, 147 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 5baf2a8786fb..9d6516f98acf 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
[...]
> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  	return err;
>  }
>  
> -static void mpam_discovery_complete(void)
> +static int mpam_msc_hw_probe(struct mpam_msc *msc)
> +{
> +	u64 idr;
> +	int err;
Redundant variable which gets removed again in the next patch?
 
> +
> +	lockdep_assert_held(&msc->probe_lock);
> +
> +	mutex_lock(&msc->part_sel_lock);
> +	idr = mpam_read_partsel_reg(msc, AIDR);
> +	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
> +		pr_err_once("%s does not match MPAM architecture v1.x\n",
> +			    dev_name(&msc->pdev->dev));
> +		err = -EIO;
> +	} else {
> +		msc->probed = true;
> +		err = 0;
> +	}
> +	mutex_unlock(&msc->part_sel_lock);
> +
> +	return err;
> +}
[...]
Cheers
---Dave
 
- * [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (13 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-28 13:12   ` Ben Horgan
  2025-09-08 16:29   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
                   ` (52 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
While doing this, RIS entries that firmware didn't describe are create
under MPAM_CLASS_UNKNOWN.
While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 158 ++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |   6 ++
 include/linux/arm_mpam.h        |  14 +++
 3 files changed, 171 insertions(+), 7 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9d6516f98acf..012e09e80300 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
 #include <linux/acpi.h>
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -44,6 +45,15 @@ static u32 mpam_num_msc;
 static int mpam_cpuhp_state;
 static DEFINE_MUTEX(mpam_cpuhp_state_lock);
 
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
 /*
  * mpam is enabled once all devices have been probed from CPU online callbacks,
  * scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
 
 #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
 
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+	u64 idr_high = 0, idr_low;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	idr_low = mpam_read_partsel_reg(msc, IDR);
+	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+	return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+	int err = 0;
+
+	lockdep_assert_irqs_enabled();
+
+	spin_lock(&partid_max_lock);
+	if (!partid_max_init) {
+		mpam_partid_max = partid_max;
+		mpam_pmg_max = pmg_max;
+		partid_max_init = true;
+	} else if (!partid_max_published) {
+		mpam_partid_max = min(mpam_partid_max, partid_max);
+		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+	} else {
+		/* New requestors can't lower the values */
+		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+			err = -EBUSY;
+	}
+	spin_unlock(&partid_max_lock);
+
+	return err;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
 #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
 
 static struct mpam_vmsc *
@@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
 	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
 	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
 	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+	list_add_rcu(&ris->msc_list, &msc->ris);
 
 	return 0;
 }
@@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+						   u8 ris_idx)
+{
+	int err;
+	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (!test_bit(ris_idx, msc->ris_idxs)) {
+		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+					     0, 0, GFP_ATOMIC);
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	list_for_each_entry(ris, &msc->ris, msc_list) {
+		if (ris->ris_idx == ris_idx) {
+			found = ris;
+			break;
+		}
+	}
+
+	return found;
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
-	int err;
+	u16 partid_max;
+	u8 ris_idx, pmg_max;
+	struct mpam_msc_ris *ris;
 
 	lockdep_assert_held(&msc->probe_lock);
 
@@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
 		pr_err_once("%s does not match MPAM architecture v1.x\n",
 			    dev_name(&msc->pdev->dev));
-		err = -EIO;
-	} else {
-		msc->probed = true;
-		err = 0;
+		mutex_unlock(&msc->part_sel_lock);
+		return -EIO;
 	}
+
+	idr = mpam_msc_read_idr(msc);
 	mutex_unlock(&msc->part_sel_lock);
+	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+	/* Use these values so partid/pmg always starts with a valid value */
+	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		idr = mpam_msc_read_idr(msc);
+		mutex_unlock(&msc->part_sel_lock);
+
+		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+		msc->partid_max = min(msc->partid_max, partid_max);
+		msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+		ris = mpam_get_or_create_ris(msc, ris_idx);
+		if (IS_ERR(ris))
+			return PTR_ERR(ris);
+	}
 
-	return err;
+	spin_lock(&partid_max_lock);
+	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+	spin_unlock(&partid_max_lock);
+
+	msc->probed = true;
+
+	return 0;
 }
 
 static int mpam_cpu_online(unsigned int cpu)
@@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
 
 static void mpam_enable_once(void)
 {
+	/*
+	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+	 * longer change.
+	 */
+	spin_lock(&partid_max_lock);
+	partid_max_published = true;
+	spin_unlock(&partid_max_lock);
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
-	pr_info("MPAM enabled\n");
+	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
+	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
 /*
@@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
 
 	return platform_driver_register(&mpam_msc_driver);
 }
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a98cca08a2ef..a623f405ddd8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -50,6 +50,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	u16			partid_max;
+	u8			pmg_max;
 	unsigned long		ris_idxs[128 / BITS_PER_LONG];
 	u32			ris_max;
 
@@ -148,6 +150,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
 
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 406a77be68cb..8af93794c7a2 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 		    enum mpam_class_types type, u8 class_id, int component_id);
 
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max:		The maximum PARTID value the requestor can generate.
+ * @pmg_max:		The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestors features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.20.1
- * Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
@ 2025-08-28 13:12   ` Ben Horgan
  2025-09-09 16:56     ` James Morse
  2025-09-08 16:29   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-28 13:12 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> 
> While doing this, RIS entries that firmware didn't describe are create
> under MPAM_CLASS_UNKNOWN.
> 
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 158 ++++++++++++++++++++++++++++++--
>  drivers/resctrl/mpam_internal.h |   6 ++
>  include/linux/arm_mpam.h        |  14 +++
>  3 files changed, 171 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 9d6516f98acf..012e09e80300 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -6,6 +6,7 @@
>  #include <linux/acpi.h>
>  #include <linux/atomic.h>
>  #include <linux/arm_mpam.h>
> +#include <linux/bitfield.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -44,6 +45,15 @@ static u32 mpam_num_msc;
>  static int mpam_cpuhp_state;
>  static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>  
> +/*
> + * The smallest common values for any CPU or MSC in the system.
> + * Generating traffic outside this range will result in screaming interrupts.
> + */
> +u16 mpam_partid_max;
> +u8 mpam_pmg_max;
> +static bool partid_max_init, partid_max_published;
> +static DEFINE_SPINLOCK(partid_max_lock);
> +
>  /*
>   * mpam is enabled once all devices have been probed from CPU online callbacks,
>   * scheduled via this work_struct. If access to an MSC depends on a CPU that
> @@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>  
>  #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
>  
> +static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	writel_relaxed(val, msc->mapped_hwpage + reg);
> +}
> +
> +static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	lockdep_assert_held_once(&msc->part_sel_lock);
> +	__mpam_write_reg(msc, reg, val);
> +}
> +#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
> +
> +static u64 mpam_msc_read_idr(struct mpam_msc *msc)
> +{
> +	u64 idr_high = 0, idr_low;
> +
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	idr_low = mpam_read_partsel_reg(msc, IDR);
> +	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
> +		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
> +
> +	return (idr_high << 32) | idr_low;
> +}
> +
> +static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
> +{
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	mpam_write_partsel_reg(msc, PART_SEL, partsel);
> +}
> +
> +static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
> +{
> +	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
> +		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
> +
> +	__mpam_part_sel_raw(partsel, msc);
> +}
> +
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
> +{
> +	int err = 0;
> +
> +	lockdep_assert_irqs_enabled();
> +
> +	spin_lock(&partid_max_lock);
> +	if (!partid_max_init) {
> +		mpam_partid_max = partid_max;
> +		mpam_pmg_max = pmg_max;
> +		partid_max_init = true;
> +	} else if (!partid_max_published) {
> +		mpam_partid_max = min(mpam_partid_max, partid_max);
> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
Do we really need to reduce these maximums here? If, say, we add an SMMU
requestor which supports fewer partids than the cpus, don't we want to be
able to carry on using those partids from the cpus? In this case the
SMMU requestor can, without risk of error interrupts, just use all the
partids it supports.
> +	} else {
> +		/* New requestors can't lower the values */
> +		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
> +			err = -EBUSY;
> +	}
> +	spin_unlock(&partid_max_lock);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(mpam_register_requestor);
> +
>  #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
>  
>  static struct mpam_vmsc *
> @@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>  	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>  	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>  	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +	list_add_rcu(&ris->msc_list, &msc->ris);
>  
>  	return 0;
>  }
> @@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  	return err;
>  }
>  
> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> +						   u8 ris_idx)
> +{
> +	int err;
> +	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (!test_bit(ris_idx, msc->ris_idxs)) {
> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> +					     0, 0, GFP_ATOMIC);
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +
> +	list_for_each_entry(ris, &msc->ris, msc_list) {
> +		if (ris->ris_idx == ris_idx) {
> +			found = ris;
> +			break;
> +		}
> +	}
> +
> +	return found;
> +}
> +
>  static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  {
>  	u64 idr;
> -	int err;
> +	u16 partid_max;
> +	u8 ris_idx, pmg_max;
> +	struct mpam_msc_ris *ris;
>  
>  	lockdep_assert_held(&msc->probe_lock);
>  
> @@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
>  		pr_err_once("%s does not match MPAM architecture v1.x\n",
>  			    dev_name(&msc->pdev->dev));
> -		err = -EIO;
> -	} else {
> -		msc->probed = true;
> -		err = 0;
> +		mutex_unlock(&msc->part_sel_lock);
> +		return -EIO;
>  	}
> +
> +	idr = mpam_msc_read_idr(msc);
>  	mutex_unlock(&msc->part_sel_lock);
> +	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
> +
> +	/* Use these values so partid/pmg always starts with a valid value */
> +	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +
> +	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		idr = mpam_msc_read_idr(msc);
> +		mutex_unlock(&msc->part_sel_lock);
> +
> +		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +		msc->partid_max = min(msc->partid_max, partid_max);
> +		msc->pmg_max = min(msc->pmg_max, pmg_max);
> +
> +		ris = mpam_get_or_create_ris(msc, ris_idx);
> +		if (IS_ERR(ris))
> +			return PTR_ERR(ris);
> +	}
>  
> -	return err;
> +	spin_lock(&partid_max_lock);
> +	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
> +	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> +	spin_unlock(&partid_max_lock);
> +
> +	msc->probed = true;
> +
> +	return 0;
>  }
>  
>  static int mpam_cpu_online(unsigned int cpu)
> @@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
>  
>  static void mpam_enable_once(void)
>  {
> +	/*
> +	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> +	 * longer change.
> +	 */
> +	spin_lock(&partid_max_lock);
> +	partid_max_published = true;
> +	spin_unlock(&partid_max_lock);
> +
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>  
> -	pr_info("MPAM enabled\n");
> +	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
> +	       mpam_partid_max + 1, mpam_pmg_max + 1);
>  }
>  
>  /*
> @@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
>  
>  	return platform_driver_register(&mpam_msc_driver);
>  }
> +/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
>  subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a98cca08a2ef..a623f405ddd8 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -50,6 +50,8 @@ struct mpam_msc {
>  	 */
>  	struct mutex		probe_lock;
>  	bool			probed;
> +	u16			partid_max;
> +	u8			pmg_max;
>  	unsigned long		ris_idxs[128 / BITS_PER_LONG];
>  	u32			ris_max;
>  
> @@ -148,6 +150,10 @@ struct mpam_msc_ris {
>  extern struct srcu_struct mpam_srcu;
>  extern struct list_head mpam_classes;
>  
> +/* System wide partid/pmg values */
> +extern u16 mpam_partid_max;
> +extern u8 mpam_pmg_max;
> +
>  /* Scheduled work callback to enable mpam once all MSC have been probed */
>  void mpam_enable(struct work_struct *work);
>  
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 406a77be68cb..8af93794c7a2 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>  int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  		    enum mpam_class_types type, u8 class_id, int component_id);
>  
> +/**
> + * mpam_register_requestor() - Register a requestor with the MPAM driver
> + * @partid_max:		The maximum PARTID value the requestor can generate.
> + * @pmg_max:		The maximum PMG value the requestor can generate.
> + *
> + * Registers a requestor with the MPAM driver to ensure the chosen system-wide
> + * minimum PARTID and PMG values will allow the requestors features to be used.
> + *
> + * Returns an error if the registration is too late, and a larger PARTID/PMG
> + * value has been advertised to user-space. In this case the requestor should
> + * not use its MPAM features. Returns 0 on success.
> + */
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max);
> +
>  #endif /* __LINUX_ARM_MPAM_H */
Thanks,
Ben
- * Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-08-28 13:12   ` Ben Horgan
@ 2025-09-09 16:56     ` James Morse
  2025-09-10  9:01       ` Ben Horgan
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-09 16:56 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 28/08/2025 14:12, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> CPUs can generate traffic with a range of PARTID and PMG values,
>> but each MSC may have its own maximum size for these fields.
>> Before MPAM can be used, the driver needs to probe each RIS on
>> each MSC, to find the system-wide smallest value that can be used.
>>
>> While doing this, RIS entries that firmware didn't describe are create
>> under MPAM_CLASS_UNKNOWN.
>>
>> While we're here, implement the mpam_register_requestor() call
>> for the arch code to register the CPU limits. Future callers of this
>> will tell us about the SMMU and ITS.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 9d6516f98acf..012e09e80300 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>> +{
>> +	int err = 0;
>> +
>> +	lockdep_assert_irqs_enabled();
>> +
>> +	spin_lock(&partid_max_lock);
>> +	if (!partid_max_init) {
>> +		mpam_partid_max = partid_max;
>> +		mpam_pmg_max = pmg_max;
>> +		partid_max_init = true;
>> +	} else if (!partid_max_published) {
>> +		mpam_partid_max = min(mpam_partid_max, partid_max);
>> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
> Do we really need to reduce these maximums here? If, say, we add an SMMU
> requestor which supports fewer partids than the cpus, don't we want to be
> able to carry on using those partids from the cpus? In this case the
> SMMU requestor can, without risk of error interrupts, just use all the
> partids it supports.
How would it do that?
We're probably going to expose that SMMU, or the devices behind it, via resctrl. You can
create 10 control groups in resctrl - but can't assign the SMMU/devices to the last two
because it doesn't actually support that many...
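To illustrate the point, here is a minimal userspace sketch of why the limit has to be the minimum across all requestors. It is not the driver code: struct id_limits, model_register() and model_num_groups() are hypothetical names used only for this example, and locking and PMG handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the system-wide PARTID limit. */
struct id_limits {
	uint16_t partid_max;
	bool init;
};

/* Each requestor (CPU, SMMU, ...) can only ever lower the limit. */
static void model_register(struct id_limits *l, uint16_t partid_max)
{
	if (!l->init) {
		l->partid_max = partid_max;
		l->init = true;
	} else if (partid_max < l->partid_max) {
		l->partid_max = partid_max;
	}
}

/* PARTIDs run from 0 to partid_max, so max + 1 groups are usable. */
static unsigned int model_num_groups(const struct id_limits *l)
{
	assert(l->init);
	return (unsigned int)l->partid_max + 1;
}
```

With CPUs registering a PARTID maximum of 9 and an SMMU registering 7, the model reports 8 groups, so every group that can be created can also have the SMMU assigned to it.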
Thanks,
James
- * Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-09-09 16:56     ` James Morse
@ 2025-09-10  9:01       ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-10  9:01 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 9/9/25 17:56, James Morse wrote:
> Hi Ben,
> 
> On 28/08/2025 14:12, Ben Horgan wrote:
>> On 8/22/25 16:29, James Morse wrote:
>>> CPUs can generate traffic with a range of PARTID and PMG values,
>>> but each MSC may have its own maximum size for these fields.
>>> Before MPAM can be used, the driver needs to probe each RIS on
>>> each MSC, to find the system-wide smallest value that can be used.
>>>
>>> While doing this, RIS entries that firmware didn't describe are create
>>> under MPAM_CLASS_UNKNOWN.
>>>
>>> While we're here, implement the mpam_register_requestor() call
>>> for the arch code to register the CPU limits. Future callers of this
>>> will tell us about the SMMU and ITS.
> 
>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>>> index 9d6516f98acf..012e09e80300 100644
>>> --- a/drivers/resctrl/mpam_devices.c
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> @@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
> 
>>> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>>> +{
>>> +	int err = 0;
>>> +
>>> +	lockdep_assert_irqs_enabled();
>>> +
>>> +	spin_lock(&partid_max_lock);
>>> +	if (!partid_max_init) {
>>> +		mpam_partid_max = partid_max;
>>> +		mpam_pmg_max = pmg_max;
>>> +		partid_max_init = true;
>>> +	} else if (!partid_max_published) {
>>> +		mpam_partid_max = min(mpam_partid_max, partid_max);
>>> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
> 
>> Do we really need to reduce these maximums here? If, say, we add an SMMU
>> requestor which supports fewer partids than the cpus, don't we want to be
>> able to carry on using those partids from the cpus? In this case the
>> SMMU requestor can, without risk of error interrupts, just use all the
>> partids it supports.
> 
> How would it do that?
> 
> We're probably going to expose that SMMU, or the devices behind it, via resctrl. You can
> create 10 control groups in resctrl - but can't assign the SMMU/devices to the last two
> because it doesn't actually support that many...
Ok. If that's how it's going to be exposed to the user then it makes sense.
> 
> 
> Thanks,
> 
> James
Thanks,
Ben
 
 
- * Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
  2025-08-28 13:12   ` Ben Horgan
@ 2025-09-08 16:29   ` Dave Martin
  2025-09-09 16:57     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-08 16:29 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:56PM +0000, James Morse wrote:
> Subject: Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
Summary line does not describe the patch (i.e., the requestor
registration interface is something different... and the probing of RIS
properties has nothing architecturally to do with determining the max
PARTID and PMG values.)
Even regarding the MSC probing, there is considerably more in this
patch than just the determination of the ID limits.
> CPUs can generate traffic with a range of PARTID and PMG values,
> but each MSC may have its own maximum size for these fields.
> Before MPAM can be used, the driver needs to probe each RIS on
> each MSC, to find the system-wide smallest value that can be used.
> 
> While doing this, RIS entries that firmware didn't describe are create
Nit: created
> under MPAM_CLASS_UNKNOWN.
What are the effects / implications of this?
> While we're here, implement the mpam_register_requestor() call
> for the arch code to register the CPU limits. Future callers of this
> will tell us about the SMMU and ITS.
This function seems to be dead in this series; registration of CPUs is
"future" too. Does it make sense to boot this into the next series?
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 158 ++++++++++++++++++++++++++++++--
>  drivers/resctrl/mpam_internal.h |   6 ++
>  include/linux/arm_mpam.h        |  14 +++
>  3 files changed, 171 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 9d6516f98acf..012e09e80300 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -6,6 +6,7 @@
>  #include <linux/acpi.h>
>  #include <linux/atomic.h>
>  #include <linux/arm_mpam.h>
> +#include <linux/bitfield.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -44,6 +45,15 @@ static u32 mpam_num_msc;
>  static int mpam_cpuhp_state;
>  static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>  
> +/*
> + * The smallest common values for any CPU or MSC in the system.
> + * Generating traffic outside this range will result in screaming interrupts.
> + */
> +u16 mpam_partid_max;
> +u8 mpam_pmg_max;
> +static bool partid_max_init, partid_max_published;
> +static DEFINE_SPINLOCK(partid_max_lock);
> +
Can partid_max and pmg_max simply be statically initialised with the
max value of the respective type instead? -1 or ~0 would be adequate
initialisers.
This would allow us to get rid of partid_max_init and the associated
logic.
This assumes that we don't enable MPAM (and start using nonzero values
in MPAM[10]_EL1) unless at least one usable MSC was found, so that the
initialiser is not left behind.
I think that's true by construction -- if the firmware reports no MSCs,
we probe nothing.  Otherwise, we require every MSC reported by the
firmware to probe before enabling MPAM.
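As a rough illustration of the static-initialiser idea, here is a userspace sketch under the assumption that at least one real limit is always folded in before the values are used. model_partid_max, model_pmg_max and model_lower_limits() are hypothetical names for this example, not symbols from the patch, and the spinlock is omitted.

```c
#include <stdint.h>

/*
 * Hypothetical model: start at the largest value each type can hold,
 * so the first real limit always wins and no "init" flag is needed.
 */
static uint16_t model_partid_max = UINT16_MAX;
static uint8_t model_pmg_max = UINT8_MAX;

/* Fold one CPU's or MSC's limits into the system-wide minimum. */
static void model_lower_limits(uint16_t partid_max, uint8_t pmg_max)
{
	if (partid_max < model_partid_max)
		model_partid_max = partid_max;
	if (pmg_max < model_pmg_max)
		model_pmg_max = pmg_max;
}
```

The first call behaves like the !partid_max_init branch and later calls behave like the min() branch, so the two cases collapse into one.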
>  /*
>   * mpam is enabled once all devices have been probed from CPU online callbacks,
>   * scheduled via this work_struct. If access to an MSC depends on a CPU that
> @@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>  
>  #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
>  
> +static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	writel_relaxed(val, msc->mapped_hwpage + reg);
> +}
> +
> +static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	lockdep_assert_held_once(&msc->part_sel_lock);
> +	__mpam_write_reg(msc, reg, val);
> +}
> +#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
> +
> +static u64 mpam_msc_read_idr(struct mpam_msc *msc)
> +{
> +	u64 idr_high = 0, idr_low;
> +
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	idr_low = mpam_read_partsel_reg(msc, IDR);
> +	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
> +		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
> +
> +	return (idr_high << 32) | idr_low;
> +}
> +
> +static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
> +{
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	mpam_write_partsel_reg(msc, PART_SEL, partsel);
> +}
> +
> +static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
> +{
> +	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
> +		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
> +
> +	__mpam_part_sel_raw(partsel, msc);
> +}
> +
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
> +{
> +	int err = 0;
> +
> +	lockdep_assert_irqs_enabled();
This is just because we don't use spin_lock_irqsave(), right?
(Just checking my understanding.)
> +
> +	spin_lock(&partid_max_lock);
> +	if (!partid_max_init) {
> +		mpam_partid_max = partid_max;
> +		mpam_pmg_max = pmg_max;
> +		partid_max_init = true;
> +	} else if (!partid_max_published) {
> +		mpam_partid_max = min(mpam_partid_max, partid_max);
> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
> +	} else {
> +		/* New requestors can't lower the values */
> +		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
> +			err = -EBUSY;
What will actually happen here?  Will the arch code reject the
offending late secondary?
It feels a bit like we're reinventing part of cpufeatures here --
though I suppose the split between early and late CPUs is not the same
as for the kernel proper: the "MPAM early CPUs" are those that have
appeared by the time the last MSC probes, which can be a different
set than the kernel's early CPUs.  So I guess there is no choice but
to have some special logic here.
> +	}
> +	spin_unlock(&partid_max_lock);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL(mpam_register_requestor);
As noted above, dead function (in this series)?
Also, would it be better to split out the common core shared by this and
the last stanza of mpam_msc_hw_probe()?
(This common core would make sense in this series even if
mpam_register_requestor() is deferred.)
> +
>  #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
>  
>  static struct mpam_vmsc *
> @@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>  	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>  	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>  	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +	list_add_rcu(&ris->msc_list, &msc->ris);
>  
>  	return 0;
>  }
> @@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  	return err;
>  }
>  
> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> +						   u8 ris_idx)
> +{
> +	int err;
> +	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (!test_bit(ris_idx, msc->ris_idxs)) {
I was a bit puzzled by this.
A comment somewhere might help: if ris_idx wasn't already seen while
parsing firmware tables, then there is nothing to tell us what it
controls in the hardware: mark it as MPAM_CLASS_UNKNOWN.
(If I understood correctly, that is.)
> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
> +					     0, 0, GFP_ATOMIC);
It's mildly unfortunate that these allocations are from the GFP_ATOMIC pool.
I think we were in regular thread context until we started taking MPAM
locks?
But it's a finite amount of memory and not under userspace control, so
not a huge deal.
This could possibly be improved later; it doesn't feel critical right
now.
> +		if (err)
> +			return ERR_PTR(err);
> +	}
> +
> +	list_for_each_entry(ris, &msc->ris, msc_list) {
> +		if (ris->ris_idx == ris_idx) {
> +			found = ris;
return ris; (?)
(You could then also delete all the curlies.)
> +			break;
> +		}
> +	}
> +
> +	return found;
> +}
> +
>  static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  {
>  	u64 idr;
> -	int err;
Churn?  (See patch 14.)
> +	u16 partid_max;
> +	u8 ris_idx, pmg_max;
> +	struct mpam_msc_ris *ris;
>  
>  	lockdep_assert_held(&msc->probe_lock);
>  
> @@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
>  		pr_err_once("%s does not match MPAM architecture v1.x\n",
>  			    dev_name(&msc->pdev->dev));
> -		err = -EIO;
> -	} else {
> -		msc->probed = true;
> -		err = 0;
> +		mutex_unlock(&msc->part_sel_lock);
> +		return -EIO;
>  	}
> +
> +	idr = mpam_msc_read_idr(msc);
>  	mutex_unlock(&msc->part_sel_lock);
> +	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
> +
> +	/* Use these values so partid/pmg always starts with a valid value */
> +	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +
> +	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		idr = mpam_msc_read_idr(msc);
> +		mutex_unlock(&msc->part_sel_lock);
> +
> +		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
> +		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
> +		msc->partid_max = min(msc->partid_max, partid_max);
> +		msc->pmg_max = min(msc->pmg_max, pmg_max);
> +
> +		ris = mpam_get_or_create_ris(msc, ris_idx);
> +		if (IS_ERR(ris))
> +			return PTR_ERR(ris);
> +	}
>  
> -	return err;
> +	spin_lock(&partid_max_lock);
Can we get in here with partid_max_init = false?  Do we need to check /
set it?  Or, maybe get rid of it; see mpam_partid_max etc.
> +	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
> +	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> +	spin_unlock(&partid_max_lock);
> +
> +	msc->probed = true;
> +
> +	return 0;
>  }
>  
>  static int mpam_cpu_online(unsigned int cpu)
> @@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
>  
>  static void mpam_enable_once(void)
>  {
> +	/*
> +	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> +	 * longer change.
> +	 */
> +	spin_lock(&partid_max_lock);
> +	partid_max_published = true;
> +	spin_unlock(&partid_max_lock);
> +
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>  
> -	pr_info("MPAM enabled\n");
> +	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
> +	       mpam_partid_max + 1, mpam_pmg_max + 1);
Nit: Since this is the main advertisement in dmesg that MPAM has been
successfully probed, maybe use the formal architectural terms here,
e.g.:
	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
(Largely cosmetic, though.)
>  }
>  
>  /*
> @@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
>  
>  	return platform_driver_register(&mpam_msc_driver);
>  }
> +/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
>  subsys_initcall(mpam_msc_driver_init);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a98cca08a2ef..a623f405ddd8 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -50,6 +50,8 @@ struct mpam_msc {
>  	 */
>  	struct mutex		probe_lock;
>  	bool			probed;
> +	u16			partid_max;
> +	u8			pmg_max;
Are these actually used for anything?
mpam_msc_hw_probe() already merges these into mpam_partid_max etc., so
we just leave the hardware-reported values here -- which we can't ever
use, for fear of triggering error interrupts.
(Keeping them would make it easier to migrate to merging these max
values through the mpam_enable_merge_features() logic introduced in
patch 18.  That's a possible simplification, but it is inessential to do
that right now.)
>  	unsigned long		ris_idxs[128 / BITS_PER_LONG];
>  	u32			ris_max;
[...]
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 406a77be68cb..8af93794c7a2 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>  int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>  		    enum mpam_class_types type, u8 class_id, int component_id);
>  
> +/**
> + * mpam_register_requestor() - Register a requestor with the MPAM driver
> + * @partid_max:		The maximum PARTID value the requestor can generate.
> + * @pmg_max:		The maximum PMG value the requestor can generate.
> + *
> + * Registers a requestor with the MPAM driver to ensure the chosen system-wide
> + * minimum PARTID and PMG values will allow the requestors features to be used.
> + *
> + * Returns an error if the registration is too late, and a larger PARTID/PMG
> + * value has been advertised to user-space. In this case the requestor should
> + * not use its MPAM features. Returns 0 on success.
> + */
> +int mpam_register_requestor(u16 partid_max, u8 pmg_max);
> +
Remove if the declared function is punted from this series.
Cheers
---Dave
- * Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-09-08 16:29   ` Dave Martin
@ 2025-09-09 16:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:57 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 08/09/2025 17:29, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:56PM +0000, James Morse wrote:
>> Subject: Re: [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
> 
> Summary line does not describe the patch (i.e., the requestor
> registration interface is something different... and the probing of RIS
> properties has nothing architecturally to do with determining the max
> PARTID and PMG values.)
The values are allowed to vary by 'instance' of the ID register, which someone is going
to read as per-RIS (it should really have said per security state). Doing this makes it
robust against that, and we're going to have to walk the list of RIS anyway.
The CPU requestor probing isn't part of the MSC; I'll rephrase it as 'probe hardware'.
> Even regarding the MSC probing, there is considerably more in this
> patch than just the determination of the ID limits.
The only real extra bit is creating the UNKNOWN MSC as they've now been discovered.
I don't think that's worth a patch on its own; there are too many as it is.
>> CPUs can generate traffic with a range of PARTID and PMG values,
>> but each MSC may have its own maximum size for these fields.
>> Before MPAM can be used, the driver needs to probe each RIS on
>> each MSC, to find the system-wide smallest value that can be used.
>>
>> While doing this, RIS entries that firmware didn't describe are create
> 
> Nit: created
Fixed,
>> under MPAM_CLASS_UNKNOWN.
> 
> What are the effects / implications of this?
They're accounted for, reset, exposed to debugfs. But unused.
>> While we're here, implement the mpam_register_requestor() call
>> for the arch code to register the CPU limits. Future callers of this
>> will tell us about the SMMU and ITS.
> 
> This function seems to be dead in this series; registration of CPUs is
> "future" too. Does it make sense to boot this into the next series?
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 9d6516f98acf..012e09e80300 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -44,6 +45,15 @@ static u32 mpam_num_msc;
>>  static int mpam_cpuhp_state;
>>  static DEFINE_MUTEX(mpam_cpuhp_state_lock);
>>  
>> +/*
>> + * The smallest common values for any CPU or MSC in the system.
>> + * Generating traffic outside this range will result in screaming interrupts.
>> + */
>> +u16 mpam_partid_max;
>> +u8 mpam_pmg_max;
>> +static bool partid_max_init, partid_max_published;
>> +static DEFINE_SPINLOCK(partid_max_lock);
> Can partid_max and pmg_max simply be statically initialised with the
> max value of the respective type instead? -1 or ~0 would be adequate
> initialisers.
How do we know ~0 isn't the final value?
> This would allow us to get rid of partid_max_init and the associated
> logic.
This is what causes the driver to report 0 PARTID and 0 PMG if no requestors turn up.
I think initialising as ~0 would result in more handling of ~0 as a special case, and
wouldn't be self-contained. As it is, 0 doesn't need handling as it's safe to use when
there are no requestors.
> This assumes that we don't enable MPAM (and start using nonzero values
> in MPAM[10]_EL1) unless at least one usable MSC was found, so that the
> initialiser is not left behind.
> 
> I think that's true by construction -- if the firmware reports no MSCs,
> we probe nothing.  Otherwise, we require every MSC reported by the
> firmware to probe before enabling MPAM.
But we can't know how many requestors there are - only which ones showed up.
It's not just about the MSC.
>> @@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
>> +int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>> +{
>> +	int err = 0;
>> +
>> +	lockdep_assert_irqs_enabled();
> 
> This is just because we don't use spin_lock_irqsave(), right?
raw_spinlock_t is (now) needed from irq-masked contexts like IPI - but wasn't at the time
I wrote this. It can go now as mainline checks this.
> (Just checking my understanding.)
>> +
>> +	spin_lock(&partid_max_lock);
>> +	if (!partid_max_init) {
>> +		mpam_partid_max = partid_max;
>> +		mpam_pmg_max = pmg_max;
>> +		partid_max_init = true;
>> +	} else if (!partid_max_published) {
>> +		mpam_partid_max = min(mpam_partid_max, partid_max);
>> +		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
>> +	} else {
>> +		/* New requestors can't lower the values */
>> +		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
>> +			err = -EBUSY;
> 
> What will actually happen here?  Will the arch code reject the
> offending late secondary?
It's not meant for the CPUs, as these get registered way before anything else happens.
This is for things like the SMMU or ITS, where if they probe too late, they need to be
told they can't use their MPAM features. See:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot%2bextras/v6.17-rc2&id=41c768c1705d3fa0814bb1cb92c83280c872cb3e
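To make the three cases concrete, here is a minimal userspace sketch of the
registration logic quoted above. It is a model under stated assumptions, not
the driver code: register_requestor_model(), publish_model() and the *_model
variables are hypothetical names, locking is omitted, and publish_model()
stands in for the point where mpam_enable_once() sets partid_max_published.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the driver globals; no locking. */
static uint16_t partid_max_model;
static uint8_t pmg_max_model;
static bool max_init_model, max_published_model;

#define MIN_MODEL(a, b) ((a) < (b) ? (a) : (b))

/* Mirrors the three branches of the quoted mpam_register_requestor(). */
static int register_requestor_model(uint16_t partid_max, uint8_t pmg_max)
{
	if (!max_init_model) {
		/* First requestor: adopt its limits as the starting point. */
		partid_max_model = partid_max;
		pmg_max_model = pmg_max;
		max_init_model = true;
	} else if (!max_published_model) {
		/* Before publication: narrow to the common minimum. */
		partid_max_model = MIN_MODEL(partid_max_model, partid_max);
		pmg_max_model = MIN_MODEL(pmg_max_model, pmg_max);
	} else if (partid_max < partid_max_model || pmg_max < pmg_max_model) {
		/* After publication: the limits can no longer be lowered. */
		return -EBUSY;
	}

	return 0;
}

/* Stand-in for the moment the limits are advertised and become fixed. */
static void publish_model(void)
{
	max_published_model = true;
}
```

In this model, registering (63, 3) and then (31, 7) before publication leaves
the common limits at (31, 3). After publish_model(), a requestor offering
(15, 3) gets -EBUSY, while one offering (255, 3) is accepted without changing
the limits.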
> It feels a bit like we're reinventing part of cpufeatures here --
yes, but for system-ip, hence the new wheel.
> though I suppose the split between early and late CPUs is not the same
> as for the kernel proper: the "MPAM early CPUs" are those that have
> appeared by the time the last MSC probes, which can be a different
> set than the kernel's early CPUs.  So I guess there is no choice but
> to have some special logic here.
Late-onlining CPUs are cpufeature's problem to deal with. Yes, it has similar logic for
MPAMIDR_EL1.
>> +	}
>> +	spin_unlock(&partid_max_lock);
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL(mpam_register_requestor);
> 
> As noted above, dead function (in this series)?
Yes, it's a chicken-and-egg problem. The caller was in the arm64 patches at the beginning,
but got moved later, and then out of this series to reduce the number of trees this series
touches. It's too big to post the whole thing in one go, and however you break it up it's
always possible to throw the whole lot away as being unusable until it's all present. I'm
damned whatever I do here. I plan to post the arm64 patches as an RFC alongside the v2.
> Also, would it be better to split out the common core shared by this and
> the last stanza of mpam_msc_hw_probe()?
> 
> (This common core would make sense in this series even if
> mpam_register_requestor() is deferred.)
It's not common because of the extra flags that requestors need to pay attention to, which
can't happen by construction for the hardware probing. The hardware probing version
can't fail. I'd like to keep the requestor logic separate - it's maybe two lines in common.
>> @@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
>>  	return err;
>>  }
>>  
>> +static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>> +						   u8 ris_idx)
>> +{
>> +	int err;
>> +	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	if (!test_bit(ris_idx, msc->ris_idxs)) {
> 
> I was a bit puzzled by this.
> 
> A comment somewhere might help: if ris_idx wasn't already seen while
> parsing firmware tables, then there is nothing to tell us what it
> controls in the hardware: mark it as MPAM_CLASS_UNKNOWN.
> 
> (If I understood correctly, that is.)
Spot on.
Taken your comment with some word order changes.
>> +		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
>> +					     0, 0, GFP_ATOMIC);
> 
> It's mildly unfortunate that these allocations are from the GFP_ATOMIC pool.
> I think we were in regular thread context until we started taking MPAM
> locks?
> 
> But it's a finite amount of memory and not under userspace control, so
> not a huge deal.
> 
> This could possibly be improved later; it doesn't feel critical right
> now.
I've fixed that up - it's from when msc->lock was a spin lock that was taken for any
hardware access. Now that there is a separate probe_lock which is a mutex, all of these
can always sleep.
>> +		if (err)
>> +			return ERR_PTR(err);
>> +	}
>> +
>> +	list_for_each_entry(ris, &msc->ris, msc_list) {
>> +		if (ris->ris_idx == ris_idx) {
>> +			found = ris;
> 
> return ris; (?)
> 
> (You could then also delete all the curlies.)
I'll keep the outer ones as it's easier to parse lines than expressions as you skim this.
>> +			break;
>> +		}
>> +	}
>> +
>> +	return found;
>> +}
>> +
>>  static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>  {
>>  	u64 idr;
>> -	int err;
> 
> Churn?  (See patch 14.)
I'll try and come up with a version that doesn't do that,
>> +	u16 partid_max;
>> +	u8 ris_idx, pmg_max;
>> +	struct mpam_msc_ris *ris;
>>  
>>  	lockdep_assert_held(&msc->probe_lock);
>>  
>> @@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>>  	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
>>  		pr_err_once("%s does not match MPAM architecture v1.x\n",
>>  			    dev_name(&msc->pdev->dev));
>> -		err = -EIO;
>> -	} else {
>> -		msc->probed = true;
>> -		err = 0;
>> +		mutex_unlock(&msc->part_sel_lock);
>> +		return -EIO;
>>  	}
>> +
>> +	idr = mpam_msc_read_idr(msc);
>>  	mutex_unlock(&msc->part_sel_lock);
>> +	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>> +
>> +	/* Use these values so partid/pmg always starts with a valid value */
>> +	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>> +	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>> +
>> +	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
>> +		mutex_lock(&msc->part_sel_lock);
>> +		__mpam_part_sel(ris_idx, 0, msc);
>> +		idr = mpam_msc_read_idr(msc);
>> +		mutex_unlock(&msc->part_sel_lock);
>> +
>> +		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
>> +		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>> +		msc->partid_max = min(msc->partid_max, partid_max);
>> +		msc->pmg_max = min(msc->pmg_max, pmg_max);
>> +
>> +		ris = mpam_get_or_create_ris(msc, ris_idx);
>> +		if (IS_ERR(ris))
>> +			return PTR_ERR(ris);
>> +	}
>>  
>> -	return err;
>> +	spin_lock(&partid_max_lock);
> 
> Can we get in here with partid_max_init = false?
Today, yes - because the arch code has been ripped out to reduce the number of trees this
touches.
> Do we need to check /
> set it?  Or, maybe get rid of it; see mpam_partid_max etc.
Nope - it causes the 0 PARTID value to get consumed as a minimum (no special casing!),
meaning you get 0 PARTID 0 PMG reported after probing because there are no requestors.
Once a requestor shows up, the right thing happens.
This relies on the arch_initcall() running to register the CPUs before the driver probes.
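A small userspace sketch of that behaviour follows. It is an illustration
under assumptions rather than the driver code: struct limits_model and
merge_msc_model() are hypothetical names, the per-MSC step mirrors the min()
loop over RIS in the quoted mpam_msc_hw_probe(), and the final step mirrors
the merge done under partid_max_lock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for a (PARTID max, PMG max) pair. */
struct limits_model {
	uint16_t partid_max;
	uint8_t pmg_max;
};

static uint16_t min_u16(uint16_t a, uint16_t b) { return a < b ? a : b; }
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/*
 * Narrow one MSC's limits across all of its RIS, then merge the result
 * into the system-wide limits passed in as 'sys'.
 */
static struct limits_model merge_msc_model(struct limits_model sys,
					   const struct limits_model *ris,
					   size_t n_ris)
{
	struct limits_model msc;
	size_t i;

	if (n_ris == 0)
		return sys;

	/* Start from the first RIS so the MSC value is always valid. */
	msc = ris[0];
	for (i = 1; i < n_ris; i++) {
		msc.partid_max = min_u16(msc.partid_max, ris[i].partid_max);
		msc.pmg_max = min_u8(msc.pmg_max, ris[i].pmg_max);
	}

	/* The system-wide value can only shrink, never grow. */
	sys.partid_max = min_u16(sys.partid_max, msc.partid_max);
	sys.pmg_max = min_u8(sys.pmg_max, msc.pmg_max);

	return sys;
}
```

If 'sys' starts at {0, 0} because no requestor registered, the merge keeps it
at {0, 0} whatever the MSC reports. If a requestor had already registered
{255, 7}, an MSC with RIS limits {63, 3} and {31, 1} narrows it to {31, 1}.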
>> +	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
>> +	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
>> +	spin_unlock(&partid_max_lock);
>> +
>> +	msc->probed = true;
>> +
>> +	return 0;
>>  }
>> @@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
>>  
>>  static void mpam_enable_once(void)
>>  {
>> +	/*
>> +	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
>> +	 * longer change.
>> +	 */
>> +	spin_lock(&partid_max_lock);
>> +	partid_max_published = true;
>> +	spin_unlock(&partid_max_lock);
>> +
>>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>>  
>> -	pr_info("MPAM enabled\n");
>> +	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
>> +	       mpam_partid_max + 1, mpam_pmg_max + 1);
> 
> Nit: Since this is the main advertisement in dmesg that MPAM has been
> successfully probed, maybe use the formal architectural terms here,
> e.g.:
> 
> 	printk(KERN_INFO "MPAM enabled with %u PARTIDs and %u PMGs\n",
> 
> (Largely cosmetic, though.)
Done,
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index a98cca08a2ef..a623f405ddd8 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -50,6 +50,8 @@ struct mpam_msc {
>>  	 */
>>  	struct mutex		probe_lock;
>>  	bool			probed;
>> +	u16			partid_max;
>> +	u8			pmg_max;
> Are these actually used for anything?
There are a few things kept around to expose to debugfs to answer the question "why
doesn't this hardware do what I wanted". In this case - finding which MSC is causing
the PARTID/PMG values to be lower than expected.
> mpam_msc_hw_probe() already merges these into mpam_partid_max etc., so
> we just leave the hardware-reported values here -- which we can't ever
> use, for fear of triggering error interrupts.
> 
> (Keeping them would make it easier to migrate to merging these max
> values through the mpam_enable_merge_features() logic introduced in
> patch 18.  That's a possible simplification, but it is inessential to do
> that right now.)
Thanks,
James
 
 
- * [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (14 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-28 17:07   ` Fenghua Yu
  2025-09-09 15:39   ` Dave Martin
  2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
                   ` (51 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MSC MON_SEL register needs to be accessed from hardirq context by the
PMU drivers, making an irqsave spinlock the obvious lock to protect these
registers. On systems with SCMI mailboxes it must be able to sleep, meaning
a mutex must be used.
Clearly these two can't exist at the same time.
Add helpers for the MON_SEL locking. The outer lock must be taken in a
pre-emptible context before the inner lock can be taken. On systems with
SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
will fail to be 'taken' if the caller is unable to sleep. This will allow
the PMU driver to fail without having to check the interface type of
each MSC.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a623f405ddd8..c6f087f9fa7d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -68,10 +68,19 @@ struct mpam_msc {
 
 	/*
 	 * mon_sel_lock protects access to the MSC hardware registers that are
-	 * affeted by MPAMCFG_MON_SEL.
+	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+	 * Both the 'inner' and 'outer' must be taken.
+	 * For real MMIO MSC, the outer lock is unnecessary - but keeps the
+	 * code common with:
+	 * Firmware backed MSC need to sleep when accessing the MSC, which
+	 * means some code-paths will always fail. For these MSC the outer
+	 * lock is providing the protection, and the inner lock fails to
+	 * be taken if the task is unable to sleep.
+	 *
 	 * If needed, take msc->probe_lock first.
 	 */
 	struct mutex		outer_mon_sel_lock;
+	bool			outer_lock_held;
 	raw_spinlock_t		inner_mon_sel_lock;
 	unsigned long		inner_mon_sel_flags;
 
@@ -81,6 +90,52 @@ struct mpam_msc {
 	struct mpam_garbage	garbage;
 };
 
+static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
+{
+	/*
+	 * The outer lock may be taken by a CPU that then issues an IPI to run
+	 * a helper that takes the inner lock. lockdep can't help us here.
+	 */
+	WARN_ON_ONCE(!msc->outer_lock_held);
+
+	if (msc->iface == MPAM_IFACE_MMIO) {
+		raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+		return true;
+	}
+
+	/* Accesses must fail if we are not pre-emptible */
+	return !!preemptible();
+}
+
+static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(!msc->outer_lock_held);
+
+	if (msc->iface == MPAM_IFACE_MMIO)
+		raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
+{
+	mutex_lock(&msc->outer_mon_sel_lock);
+	msc->outer_lock_held = true;
+}
+
+static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
+{
+	msc->outer_lock_held = false;
+	mutex_unlock(&msc->outer_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(!msc->outer_lock_held);
+	if (msc->iface == MPAM_IFACE_MMIO)
+		lockdep_assert_held_once(&msc->inner_mon_sel_lock);
+	else
+		lockdep_assert_preemption_enabled();
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
-- 
2.20.1
- * Re: [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-08-28 17:07   ` Fenghua Yu
  2025-09-09 16:57     ` James Morse
  2025-09-09 15:39   ` Dave Martin
  1 sibling, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-08-28 17:07 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi, James,
On 8/22/25 08:29, James Morse wrote:
> The MSC MON_SEL register needs to be accessed from hardirq context by the
> PMU drivers, making an irqsave spinlock the obvious lock to protect these
> registers. On systems with SCMI mailboxes it must be able to sleep, meaning
> a mutex must be used.
>
> Clearly these two can't exist at the same time.
>
> Add helpers for the MON_SEL locking. The outer lock must be taken in a
> pre-emptible context before the inner lock can be taken. On systems with
> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
> will fail to be 'taken' if the caller is unable to sleep. This will allow
> the PMU driver to fail without having to check the interface type of
> each MSC.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
>   1 file changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a623f405ddd8..c6f087f9fa7d 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -68,10 +68,19 @@ struct mpam_msc {
>   
>   	/*
>   	 * mon_sel_lock protects access to the MSC hardware registers that are
> -	 * affeted by MPAMCFG_MON_SEL.
> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> +	 * Both the 'inner' and 'outer' must be taken.
> +	 * For real MMIO MSC, the outer lock is unnecessary - but keeps the
> +	 * code common with:
> +	 * Firmware backed MSC need to sleep when accessing the MSC, which
> +	 * means some code-paths will always fail. For these MSC the outer
> +	 * lock is providing the protection, and the inner lock fails to
> +	 * be taken if the task is unable to sleep.
> +	 *
>   	 * If needed, take msc->probe_lock first.
>   	 */
>   	struct mutex		outer_mon_sel_lock;
> +	bool			outer_lock_held;
Is it better to define outer_lock_held as atomic_t?
>   	raw_spinlock_t		inner_mon_sel_lock;
>   	unsigned long		inner_mon_sel_flags;
>   
> @@ -81,6 +90,52 @@ struct mpam_msc {
>   	struct mpam_garbage	garbage;
>   };
>   
> +static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
> +{
> +	/*
> +	 * The outer lock may be taken by a CPU that then issues an IPI to run
> +	 * a helper that takes the inner lock. lockdep can't help us here.
> +	 */
> +	WARN_ON_ONCE(!msc->outer_lock_held);
At this point, msc->outer_lock_held might not read as true yet, because there
is no memory barrier for it on this CPU. If it were an atomic_t set to true
on another CPU with smp_store_release(), it would be guaranteed to be visible
as true on this CPU. Without an atomic store, we may see a false warning
here, which makes debugging difficult.
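As a rough illustration of the suggestion, here is a userspace C11 analogue
of a flag written with release ordering and read with acquire ordering. It is
only a sketch: struct held_flag_model and its helpers are hypothetical names,
C11 atomics stand in for the kernel primitives, and pairing the store with an
acquire load (smp_load_acquire() in kernel terms) is an assumption about how
the reader side would look.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the debug flag in struct mpam_msc. */
struct held_flag_model {
	atomic_bool held;
};

static void flag_init_model(struct held_flag_model *f)
{
	atomic_init(&f->held, false);
}

/* Writer side: release store, analogous to smp_store_release(). */
static void flag_set_model(struct held_flag_model *f, bool value)
{
	atomic_store_explicit(&f->held, value, memory_order_release);
}

/* Reader side: acquire load, pairing with the release store above. */
static bool flag_get_model(struct held_flag_model *f)
{
	return atomic_load_explicit(&f->held, memory_order_acquire);
}
```

The pairing orders the writer's earlier stores before the flag becomes
observable to a reader that sees the new value. It does not by itself decide
whether the plain bool in the patch needs this treatment.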
> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
> +		return true;
> +	}
> +
> +	/* Accesses must fail if we are not pre-emptible */
> +	return !!preemptible();
> +}
> +
> +static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
> +{
> +	WARN_ON_ONCE(!msc->outer_lock_held);
> +
> +	if (msc->iface == MPAM_IFACE_MMIO)
> +		raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
> +}
> +
> +static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
> +{
> +	mutex_lock(&msc->outer_mon_sel_lock);
> +	msc->outer_lock_held = true;
> +}
> +
> +static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
> +{
> +	msc->outer_lock_held = false;
> +	mutex_unlock(&msc->outer_mon_sel_lock);
> +}
> +
> +static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> +{
> +	WARN_ON_ONCE(!msc->outer_lock_held);
> +	if (msc->iface == MPAM_IFACE_MMIO)
> +		lockdep_assert_held_once(&msc->inner_mon_sel_lock);
> +	else
> +		lockdep_assert_preemption_enabled();
> +}
> +
>   struct mpam_class {
>   	/* mpam_components in this class */
>   	struct list_head	components;
Thanks.
-Fenghua
- * Re: [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-08-28 17:07   ` Fenghua Yu
@ 2025-09-09 16:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:57 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Fenghua,
On 28/08/2025 18:07, Fenghua Yu wrote:
> On 8/22/25 08:29, James Morse wrote:
>> The MSC MON_SEL register needs to be accessed from hardirq context by the
>> PMU drivers, making an irqsave spinlock the obvious lock to protect these
>> registers. On systems with SCMI mailboxes it must be able to sleep, meaning
>> a mutex must be used.
>>
>> Clearly these two can't exist at the same time.
>>
>> Add helpers for the MON_SEL locking. The outer lock must be taken in a
>> pre-emptible context before the inner lock can be taken. On systems with
>> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
>> will fail to be 'taken' if the caller is unable to sleep. This will allow
>> the PMU driver to fail without having to check the interface type of
>> each MSC.
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index a623f405ddd8..c6f087f9fa7d 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -68,10 +68,19 @@ struct mpam_msc {
>>         /*
>>        * mon_sel_lock protects access to the MSC hardware registers that are
>> -     * affeted by MPAMCFG_MON_SEL.
>> +     * affected by MPAMCFG_MON_SEL, and the mbwu_state.
>> +     * Both the 'inner' and 'outer' must be taken.
>> +     * For real MMIO MSC, the outer lock is unnecessary - but keeps the
>> +     * code common with:
>> +     * Firmware backed MSC need to sleep when accessing the MSC, which
>> +     * means some code-paths will always fail. For these MSC the outer
>> +     * lock is providing the protection, and the inner lock fails to
>> +     * be taken if the task is unable to sleep.
>> +     *
>>        * If needed, take msc->probe_lock first.
>>        */
>>       struct mutex        outer_mon_sel_lock;
>> +    bool            outer_lock_held;
> Is it better to define outer_lock_held at atomic_t?
Writes are protected by the outer lock, it's just something to generate a warning for debug.
I can make it a READ_ONCE() if you're worried about torn values in the failure case.
(as it's just for debug, I'm not worried about false-negatives)
>>       raw_spinlock_t        inner_mon_sel_lock;
>>       unsigned long        inner_mon_sel_flags;
>>   @@ -81,6 +90,52 @@ struct mpam_msc {
>>       struct mpam_garbage    garbage;
>>   };
>>   +static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
>> +{
>> +    /*
>> +     * The outer lock may be taken by a CPU that then issues an IPI to run
>> +     * a helper that takes the inner lock. lockdep can't help us here.
>> +     */
>> +    WARN_ON_ONCE(!msc->outer_lock_held);
> 
> At this point, msc->outer_lock_held might not be true yet due to no memory barrier on it
> on this CPU.
The IPI machinery has this covered:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/irqchip/irq-gic.c#n838
> If it's atomic_t and it's set as true on another CPU by smp_store_release(),
> it's guaranteed to be visible as true on this CPU. Without atomic setting, we may see a
> false warning here and cause debug difficult.
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
- * Re: [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
  2025-08-28 17:07   ` Fenghua Yu
@ 2025-09-09 15:39   ` Dave Martin
  2025-09-10 19:19     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Dave Martin @ 2025-09-09 15:39 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi,
On Fri, Aug 22, 2025 at 03:29:57PM +0000, James Morse wrote:
> The MSC MON_SEL register needs to be accessed from hardirq context by the
> PMU drivers, making an irqsave spinlock the obvious lock to protect these
What PMU drivers?  MPAM itself doesn't define its monitors as PMUs, and
(as of this series) there is no integration with perf.
> registers. On systems with SCMI mailboxes it must be able to sleep, meaning
> a mutex must be used.
> 
> Clearly these two can't exist at the same time.
The locks obviously do exist at the same time.  Do you mean that an
individual MSC must be either MMIO or SCMI/PCC?
(I don't think anything prevents both kinds of MSC from existing in the
same system?)
Above, you seem to imply that each kind of MSC interface requires a
different kind of lock, but below, you imply that the locks must be
used together, with holding the outer lock being a precondition for
taking the inner lock. 
Because these functions are introduced with no user, the code doesn't
offer much in the way of clues.  In particular, there is no indication
of what the outer lock is supposed to protect.
> Add helpers for the MON_SEL locking. The outer lock must be taken in a
> pre-emptible context before the inner lock can be taken. On systems with
> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
> will fail to be 'taken' if the caller is unable to sleep. This will allow
> the PMU driver to fail without having to check the interface type of
Why is it acceptable to fail (i.e., don't the counts need to be read on
non-MMIO MSCs?)
> each MSC.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
>  1 file changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a623f405ddd8..c6f087f9fa7d 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -68,10 +68,19 @@ struct mpam_msc {
>  
>  	/*
>  	 * mon_sel_lock protects access to the MSC hardware registers that are
> -	 * affeted by MPAMCFG_MON_SEL.
> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
> +	 * Both the 'inner' and 'outer' must be taken.
> +	 * For real MMIO MSC, the outer lock is unnecessary - but keeps the
> +	 * code common with:
> +	 * Firmware backed MSC need to sleep when accessing the MSC, which
> +	 * means some code-paths will always fail. For these MSC the outer
> +	 * lock is providing the protection, and the inner lock fails to
> +	 * be taken if the task is unable to sleep.
> +	 *
>  	 * If needed, take msc->probe_lock first.
>  	 */
>  	struct mutex		outer_mon_sel_lock;
> +	bool			outer_lock_held;
Why not use mutex_is_locked()?
>  	raw_spinlock_t		inner_mon_sel_lock;
Why raw?  The commit message makes no mention of it.
(We really do need to sit on a specific CPU while holding this lock, so
"raw" makes sense.  But we're always doing this in a cross-call,
presumably with the hotplug lock held -- so I think we can't be
migrated anyway?)
>  	unsigned long		inner_mon_sel_flags;
>  
> @@ -81,6 +90,52 @@ struct mpam_msc {
>  	struct mpam_garbage	garbage;
>  };
>  
> +static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
> +{
> +	/*
> +	 * The outer lock may be taken by a CPU that then issues an IPI to run
> +	 * a helper that takes the inner lock. lockdep can't help us here.
> +	 */
> +	WARN_ON_ONCE(!msc->outer_lock_held);
> +
> +	if (msc->iface == MPAM_IFACE_MMIO) {
> +		raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
> +		return true;
> +	}
> +
> +	/* Accesses must fail if we are not pre-emptible */
> +	return !!preemptible();
What accesses?
In the MPAM_IFACE_MMIO case, this returns true even though non-
preemptible (because of getting the lock).
So, what is the semantics of the return value?
A comment would probably help.
> +}
> +
> +static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
> +{
> +	WARN_ON_ONCE(!msc->outer_lock_held);
> +
> +	if (msc->iface == MPAM_IFACE_MMIO)
> +		raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
> +}
> +
> +static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
> +{
> +	mutex_lock(&msc->outer_mon_sel_lock);
> +	msc->outer_lock_held = true;
> +}
> +
> +static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
> +{
> +	msc->outer_lock_held = false;
> +	mutex_unlock(&msc->outer_mon_sel_lock);
> +}
> +
> +static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> +{
> +	WARN_ON_ONCE(!msc->outer_lock_held);
> +	if (msc->iface == MPAM_IFACE_MMIO)
> +		lockdep_assert_held_once(&msc->inner_mon_sel_lock);
> +	else
> +		lockdep_assert_preemption_enabled();
> +}
> +
Except that monitors may need to be accessed in interrupt context,
I don't see an obvious difference between controls and monitors that
motivates this locking model.
Is the outer lock ever needfully held for extended periods of time,
making a (raw) spinlock unsuitable?
Cheers
---Dave
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-09-09 15:39   ` Dave Martin
@ 2025-09-10 19:19     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:19 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Dave,
On 09/09/2025 16:39, Dave Martin wrote:
> On Fri, Aug 22, 2025 at 03:29:57PM +0000, James Morse wrote:
>> The MSC MON_SEL register needs to be accessed from hardirq context by the
>> PMU drivers, making an irqsave spinlock the obvious lock to protect these
> 
> What PMU drivers?  MPAM itself doesn't define its monitors as PMUs, and
> (as of this series) there is no integration with perf.
I can redraw this as the IPI that is needed on platforms with cache:MSC and PSCI:CPU_SUSPEND.
The PMU driver got dragged further out in time as ABMC may be a viable alternative
for platforms with insufficient monitors. (but there are also platforms which don't
look enough like a Xeon for this to work)
>> registers. On systems with SCMI mailboxes it must be able to sleep, meaning
>> a mutex must be used.
>>
>> Clearly these two can't exist at the same time.
> 
> The locks obviously do exist at the same time.  Do you mean that an
> individual MSC must be either MMIO or SCMI/PCC?
Yes, I've reworded that as 'for one MSC at the same time'.
> (I don't think anything prevents both kinds of MSC from existing in the
> same system?)
> 
> Above, you seem to imply that each kind of MSC interface requires a
> different kind of lock, but below, you imply that the locks must be
> used together, with holding the outer lock being a precondition for
> taking the inner lock. 
> 
> Because these functions are introduced with no user, the code doesn't
> offer much in the way of clues.  In particular, there is no indication
> of what the outer lock is supposed to protect.
It's a structure to make you do the right things in the right context.
You have to try to take both locks - all the inner lock does on a system that
needs to sleep is check the context, so the outer lock does all the 'protecting'.
On 'normal' systems, the inner lock takes an irqsave spinlock which does
all the work, and makes it safe for the overflow interrupt.
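As a rough illustration of that structure, here is a minimal userspace model
(a sketch, not driver code). The model_* names, the can_sleep parameter standing
in for preemptible(), and the bools standing in for the mutex and the raw
spinlock are all assumptions made for the example. Unlike the patch, which only
warns, the model refuses the inner lock when the outer one is not held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two interface types discussed in the thread. */
enum model_iface { MODEL_IFACE_MMIO, MODEL_IFACE_FIRMWARE };

struct model_msc {
	enum model_iface iface;
	bool outer_held;	/* stands in for outer_mon_sel_lock (a mutex) */
	bool inner_held;	/* stands in for inner_mon_sel_lock (a raw spinlock) */
};

static inline void model_outer_lock(struct model_msc *msc)
{
	msc->outer_held = true;
}

static inline void model_outer_unlock(struct model_msc *msc)
{
	msc->outer_held = false;
}

/*
 * Returns false when accesses to mon_sel must fail and report an error.
 * can_sleep models the result of preemptible() in the calling context.
 */
static inline bool model_inner_lock(struct model_msc *msc, bool can_sleep)
{
	/* Model choice: refuse here, where the patch would WARN_ON_ONCE(). */
	if (!msc->outer_held)
		return false;

	if (msc->iface == MODEL_IFACE_MMIO) {
		/* MMIO: the inner lock is real and works in any context. */
		msc->inner_held = true;
		return true;
	}

	/* Firmware backed: nothing is taken, only the context is checked. */
	return can_sleep;
}

static inline void model_inner_unlock(struct model_msc *msc)
{
	if (msc->iface == MODEL_IFACE_MMIO)
		msc->inner_held = false;
}
```

In this model a firmware-backed MSC only 'takes' the inner lock when the caller
could sleep, so a caller in IPI or interrupt context gets false and has to
report an error instead of touching mon_sel.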
>> Add helpers for the MON_SEL locking. The outer lock must be taken in a
>> pre-emptible context before the inner lock can be taken. On systems with
>> SCMI mailboxes where the MON_SEL accesses must sleep - the inner lock
>> will fail to be 'taken' if the caller is unable to sleep. This will allow
>> the PMU driver to fail without having to check the interface type of
> 
> Why is it acceptable to fail (i.e., don't the counts need to be read on
> non-MMIO MSCs?)
They can't from contexts that need to sleep. If you've got this firmware thing
you also need to have a platform that doesn't need IPI to reach the mailbox (why
would it), overflow interrupts, or a PMU driver.
Instead of having two drivers, or type checks all over the place - this structure
lets such a platform get through as much of the driver as possible, before failing
at the point that would deadlock. (need to wait for an interrupt in interrupt context).
I think this is the most maintainable approach as it has the most in common. I don't like
the two drivers alternative.
>> each MSC.
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index a623f405ddd8..c6f087f9fa7d 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -68,10 +68,19 @@ struct mpam_msc {
>>  
>>  	/*
>>  	 * mon_sel_lock protects access to the MSC hardware registers that are
>> -	 * affeted by MPAMCFG_MON_SEL.
>> +	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
>> +	 * Both the 'inner' and 'outer' must be taken.
>> +	 * For real MMIO MSC, the outer lock is unnecessary - but keeps the
>> +	 * code common with:
>> +	 * Firmware backed MSC need to sleep when accessing the MSC, which
>> +	 * means some code-paths will always fail. For these MSC the outer
>> +	 * lock is providing the protection, and the inner lock fails to
>> +	 * be taken if the task is unable to sleep.
>> +	 *
>>  	 * If needed, take msc->probe_lock first.
>>  	 */
>>  	struct mutex		outer_mon_sel_lock;
>> +	bool			outer_lock_held;
> 
> Why not use mutex_is_locked()?
That works. I've had a bad experience with the lockdep version of that checking who
owns the mutex, and getting confused when there is an IPI involved.
>>  	raw_spinlock_t		inner_mon_sel_lock;
> 
> Why raw?  The commit message makes no mention of it.
> 
> (We really do need to sit on a specific CPU while holding this lock, so
> "raw" makes sense.  But we're always doing this in a cross-call,
> presumably with the hotplug lock held -- so I think we can't be
> migrated anyway?)
Nothing to do with hotplug. (my recollection as to why this got changed - ) is because an
IPI results in the kind of context where you can't sleep - and regular spinlocks can end
up sleeping. This is the trick RT pulls. Without raw here - the atomic sleep check starts
complaining about taking a spinlock behind an IPI.
>>  	unsigned long		inner_mon_sel_flags;
>>  
>> @@ -81,6 +90,52 @@ struct mpam_msc {
>>  	struct mpam_garbage	garbage;
>>  };
>>  
>> +static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
>> +{
>> +	/*
>> +	 * The outer lock may be taken by a CPU that then issues an IPI to run
>> +	 * a helper that takes the inner lock. lockdep can't help us here.
>> +	 */
>> +	WARN_ON_ONCE(!msc->outer_lock_held);
>> +
>> +	if (msc->iface == MPAM_IFACE_MMIO) {
>> +		raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
>> +		return true;
>> +	}
>> +
>> +	/* Accesses must fail if we are not pre-emptible */
>> +	return !!preemptible();
> 
> What accesses?
To the mon_sel register.
> In the MPAM_IFACE_MMIO case, this returns true even though non-
> preemptible (because of getting the lock).
> 
> So, what is the semantics of the return value?
> 
> A comment would probably help.
/* Returning false here means accesses to mon_sel must fail and report an error. */
>> +}
>> +
>> +static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
>> +{
>> +	WARN_ON_ONCE(!msc->outer_lock_held);
>> +
>> +	if (msc->iface == MPAM_IFACE_MMIO)
>> +		raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
>> +}
>> +
>> +static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
>> +{
>> +	mutex_lock(&msc->outer_mon_sel_lock);
>> +	msc->outer_lock_held = true;
>> +}
>> +
> 
>> +static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
>> +{
>> +	msc->outer_lock_held = false;
>> +	mutex_unlock(&msc->outer_mon_sel_lock);
>> +}
>> +
>> +static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
>> +{
>> +	WARN_ON_ONCE(!msc->outer_lock_held);
>> +	if (msc->iface == MPAM_IFACE_MMIO)
>> +		lockdep_assert_held_once(&msc->inner_mon_sel_lock);
>> +	else
>> +		lockdep_assert_preemption_enabled();
>> +}
>> +
> 
> Except that monitors may need to be accessed in interrupt context,
> I don't see an obvious difference between controls and monitors that
> motivates this locking model.
Controls don't have an overflow interrupt, and would never be accessed by perf in nasty
contexts.
> Is the outer lock ever needfully held for extended periods of time,
> making a (raw) spinlock unsuitable?
It's held before sending the IPI - but only because the firmware platforms should never
need to send that IPI.
I can drop the outer lock for now as the firmware platforms haven't properly materialised,
(promised ~three years ago - also promised in December this year). But some kind of
abstraction is needed here to keep the code common, and these mon_sel accesses need to be
something that can fail.
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
 
- * [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (15 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-28 13:44   ` Ben Horgan
  2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
                   ` (50 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
Expand the probing support with the control and monitor types
we can use with resctrl.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Made mpam_ris_hw_probe_hw_nrdy() more in C.
 * Added static assert on features bitmap size.
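
The NRDY probe mentioned above can be pictured with a small userspace model.
This is a sketch under stated assumptions, not the driver code: the model_*
names are hypothetical, the NRDY bit position is assumed to be bit 31, and
hardware-managed NRDY is modelled simply as a bit that ignores software writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NRDY	(UINT32_C(1) << 31)	/* assumed bit position */

/* A fake monitor register: only bits in sw_writable change on a write. */
struct model_mon_reg {
	uint32_t val;
	uint32_t sw_writable;
};

static inline void model_write(struct model_mon_reg *r, uint32_t v)
{
	r->val = (r->val & ~r->sw_writable) | (v & r->sw_writable);
}

static inline uint32_t model_read(const struct model_mon_reg *r)
{
	return r->val;
}

/*
 * Write 1 then 0 to NRDY and see which values stick. If both stick, the
 * bit behaves like plain storage and is probably not managed by hardware.
 */
static inline bool model_probe_hw_nrdy(struct model_mon_reg *r)
{
	bool can_set, can_clear;

	model_write(r, MODEL_NRDY);
	can_set = model_read(r) & MODEL_NRDY;

	model_write(r, 0);
	can_clear = !(model_read(r) & MODEL_NRDY);

	return !can_set || !can_clear;
}
```

A register whose NRDY bit accepts both values reports false (software owns the
bit); one where either write fails to stick reports true.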
---
 drivers/resctrl/mpam_devices.c  | 156 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  54 +++++++++++
 2 files changed, 209 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 012e09e80300..290a04f8654f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
 
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
-	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
 	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
 
 	return readl_relaxed(msc->mapped_hwpage + reg);
@@ -131,6 +131,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
 }
 #define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
 
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	mpam_mon_sel_lock_held(msc);
+	return __mpam_read_reg(msc, reg);
+}
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	mpam_mon_sel_lock_held(msc);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_monsel_reg(msc, reg, val)   _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
 static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 {
 	u64 idr_high = 0, idr_low;
@@ -643,6 +657,139 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
 	return found;
 }
 
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try and see what values stick in this bit. If we can write either value,
+ * its probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris * ris, u32 mon_reg)
+{
+	u32 now;
+	u64 mon_sel;
+	bool can_set, can_clear;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+		return false;
+
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	_mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+	_mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_set = now & MSMON___NRDY;
+
+	_mpam_write_monsel_reg(msc, mon_reg, 0);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_clear = !(now & MSMON___NRDY);
+	mpam_mon_sel_inner_unlock(msc);
+
+	return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg)			\
+        _mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+	int err;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *props = &ris->props;
+
+	lockdep_assert_held(&msc->probe_lock);
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	/* Cache Portion partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+		props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+		if (props->cpbm_wd)
+			mpam_set_feature(mpam_feat_cpor_part, props);
+	}
+
+	/* Memory bandwidth partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+		u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+		/* portion bitmap resolution */
+		props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+		if (props->mbw_pbm_bits &&
+		    FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_part, props);
+
+		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_max, props);
+	}
+
+	/* Performance Monitoring */
+	if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+		u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+		/*
+		 * If the firmware max-nrdy-us property is missing, the
+		 * CSU counters can't be used. Should we wait forever?
+		 */
+		err = device_property_read_u32(&msc->pdev->dev,
+					       "arm,not-ready-us",
+					       &msc->nrdy_usec);
+
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+			u32 csumonidr;
+
+			csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+			props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+			if (props->num_csu_mon) {
+				bool hw_managed;
+
+				mpam_set_feature(mpam_feat_msmon_csu, props);
+
+				/* Is NRDY hardware managed? */
+				mpam_mon_sel_outer_lock(msc);
+				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+				mpam_mon_sel_outer_unlock(msc);
+				if (hw_managed)
+					mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+			}
+
+			/*
+			 * Accept the missing firmware property if NRDY appears
+			 * un-implemented.
+			 */
+			if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
+		}
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+			bool hw_managed;
+			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
+			if (props->num_mbwu_mon)
+				mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
+				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
+			/* Is NRDY hardware managed? */
+			mpam_mon_sel_outer_lock(msc);
+			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+			mpam_mon_sel_outer_unlock(msc);
+			if (hw_managed)
+				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+			/*
+			 * Don't warn about any missing firmware property for
+			 * MBWU NRDY - it doesn't make any sense!
+			 */
+		}
+	}
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
@@ -663,6 +810,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 
 	idr = mpam_msc_read_idr(msc);
 	mutex_unlock(&msc->part_sel_lock);
+
 	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
 
 	/* Use these values so partid/pmg always starts with a valid value */
@@ -683,6 +831,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		ris = mpam_get_or_create_ris(msc, ris_idx);
 		if (IS_ERR(ris))
 			return PTR_ERR(ris);
+		ris->idr = idr;
+
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		mpam_ris_hw_probe(ris);
+		mutex_unlock(&msc->part_sel_lock);
 	}
 
 	spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c6f087f9fa7d..9f6cd4a68cce 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
 		lockdep_assert_preemption_enabled();
 }
 
+/*
+ * When we compact the supported features, we don't care what they are.
+ * Storing them as a bitmap makes life easy.
+ */
+typedef u16 mpam_features_t;
+
+/* Bits for mpam_features_t */
+enum mpam_device_features {
+	mpam_feat_ccap_part = 0,
+	mpam_feat_cpor_part,
+	mpam_feat_mbw_part,
+	mpam_feat_mbw_min,
+	mpam_feat_mbw_max,
+	mpam_feat_mbw_prop,
+	mpam_feat_msmon,
+	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_capture,
+	mpam_feat_msmon_csu_hw_nrdy,
+	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_capture,
+	mpam_feat_msmon_mbwu_rwbw,
+	mpam_feat_msmon_mbwu_hw_nrdy,
+	mpam_feat_msmon_capt,
+	MPAM_FEATURE_LAST,
+};
+static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
+#define MPAM_ALL_FEATURES      ((1 << MPAM_FEATURE_LAST) - 1)
+
+struct mpam_props {
+	mpam_features_t		features;
+
+	u16			cpbm_wd;
+	u16			mbw_pbm_bits;
+	u16			bwa_wd;
+	u16			num_csu_mon;
+	u16			num_mbwu_mon;
+};
+
+static inline bool mpam_has_feature(enum mpam_device_features feat,
+				    struct mpam_props *props)
+{
+	return (1 << feat) & props->features;
+}
+
+static inline void mpam_set_feature(enum mpam_device_features feat,
+				    struct mpam_props *props)
+{
+	props->features |= (1 << feat);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -175,6 +225,8 @@ struct mpam_vmsc {
 	/* mpam_msc_ris in this vmsc */
 	struct list_head	ris;
 
+	struct mpam_props	props;
+
 	/* All RIS in this vMSC are members of this MSC */
 	struct mpam_msc		*msc;
 
@@ -186,6 +238,8 @@ struct mpam_vmsc {
 
 struct mpam_msc_ris {
 	u8			ris_idx;
+	u64			idr;
+	struct mpam_props	props;
 
 	cpumask_t		affinity;
 
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
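
The feature bitmap introduced by the patch above can be tried out in userspace
with a cut-down model. This is only a sketch: the model_* names and the
shortened feature list are assumptions for the example, and C11 _Static_assert
stands in for the kernel's static_assert().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t model_features_t;

/* Bit numbers for model_features_t (a shortened, hypothetical list). */
enum model_feature {
	model_feat_cpor_part = 0,
	model_feat_mbw_part,
	model_feat_mbw_max,
	model_feat_msmon_csu,
	MODEL_FEATURE_LAST,
};

/* Fails the build if the enum outgrows the bitmap type. */
_Static_assert(sizeof(model_features_t) * CHAR_BIT >= MODEL_FEATURE_LAST,
	       "feature bits must fit in model_features_t");

struct model_props {
	model_features_t features;
};

static inline bool model_has_feature(enum model_feature feat,
				     const struct model_props *props)
{
	return (1u << feat) & props->features;
}

static inline void model_set_feature(enum model_feature feat,
				     struct model_props *props)
{
	props->features |= (model_features_t)(1u << feat);
}
```

Adding a seventeenth feature to the enum would trip the static assertion at
build time rather than silently dropping the bit.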
- * Re: [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
  2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-08-28 13:44   ` Ben Horgan
  2025-09-09 16:57     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-28 13:44 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> Expand the probing support with the control and monitor types
> we can use with resctrl.
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Made mpam_ris_hw_probe_hw_nrdy() more in C.
>  * Added static assert on features bitmap size.
> ---
>  drivers/resctrl/mpam_devices.c  | 156 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  54 +++++++++++
>  2 files changed, 209 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 012e09e80300..290a04f8654f 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
>  
>  static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>  {
> -	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
Update in the patch that introduced this line.
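For reference, a small userspace sketch of why the comparison needs the access
width. The helper names and the example mapping size used below are assumptions
for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Old check: only offsets strictly beyond the mapping are rejected. */
static inline bool old_check_ok(uint16_t reg, size_t mapped_sz)
{
	return !(reg > mapped_sz);
}

/* New check: the whole 4-byte access must end inside the mapping. */
static inline bool new_check_ok(uint16_t reg, size_t mapped_sz)
{
	return !(reg + sizeof(uint32_t) > mapped_sz);
}
```

With a 0x1000 byte mapping, the old check accepts a 32-bit read at offset 0xffe
or even 0x1000, both of which run past the end of the mapping; the new check
rejects them and still accepts the last valid register at 0xffc.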
>  	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
>  
>  	return readl_relaxed(msc->mapped_hwpage + reg);
> @@ -131,6 +131,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
>  }
>  #define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
>  
> +static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
> +{
> +	mpam_mon_sel_lock_held(msc);
> +	return __mpam_read_reg(msc, reg);
> +}
> +#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
> +
> +static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
> +{
> +	mpam_mon_sel_lock_held(msc);
> +	__mpam_write_reg(msc, reg, val);
> +}
> +#define mpam_write_monsel_reg(msc, reg, val)   _mpam_write_monsel_reg(msc, MSMON_##reg, val)
> +
>  static u64 mpam_msc_read_idr(struct mpam_msc *msc)
>  {
>  	u64 idr_high = 0, idr_low;
> @@ -643,6 +657,139 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>  	return found;
>  }
>  
> +/*
> + * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
> + * of NRDY, software can use this bit for any purpose" - so hardware might not
> + * implement this - but it isn't RES0.
> + *
> + * Try and see what values stick in this bit. If we can write either value,
> + * its probably not implemented by hardware.
> + */
> +static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris * ris, u32 mon_reg)
> +{
> +	u32 now;
> +	u64 mon_sel;
> +	bool can_set, can_clear;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +
> +	if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
> +		return false;
> +
> +	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
> +		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +	_mpam_write_monsel_reg(msc, mon_reg, mon_sel);
> +
> +	_mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
> +	now = _mpam_read_monsel_reg(msc, mon_reg);
> +	can_set = now & MSMON___NRDY;
> +
> +	_mpam_write_monsel_reg(msc, mon_reg, 0);
> +	now = _mpam_read_monsel_reg(msc, mon_reg);
> +	can_clear = !(now & MSMON___NRDY);
> +	mpam_mon_sel_inner_unlock(msc);
> +
> +	return (!can_set || !can_clear);
> +}
> +
> +#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg)			\
> +        _mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
> +
> +static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> +{
> +	int err;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +	struct mpam_props *props = &ris->props;
> +
> +	lockdep_assert_held(&msc->probe_lock);
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	/* Cache Portion partitioning */
> +	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
> +		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
> +
> +		props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
> +		if (props->cpbm_wd)
> +			mpam_set_feature(mpam_feat_cpor_part, props);
> +	}
> +
> +	/* Memory bandwidth partitioning */
> +	if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
> +		u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
> +
> +		/* portion bitmap resolution */
> +		props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
> +		if (props->mbw_pbm_bits &&
> +		    FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
> +			mpam_set_feature(mpam_feat_mbw_part, props);
> +
> +		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
> +		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
> +			mpam_set_feature(mpam_feat_mbw_max, props);
> +	}
> +
> +	/* Performance Monitoring */
> +	if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
> +		u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
> +
> +		/*
> +		 * If the firmware max-nrdy-us property is missing, the
> +		 * CSU counters can't be used. Should we wait forever?
> +		 */
> +		err = device_property_read_u32(&msc->pdev->dev,
> +					       "arm,not-ready-us",
> +					       &msc->nrdy_usec);
> +
> +		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
> +			u32 csumonidr;
> +
> +			csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
> +			props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
> +			if (props->num_csu_mon) {
> +				bool hw_managed;
> +
> +				mpam_set_feature(mpam_feat_msmon_csu, props);
> +
> +				/* Is NRDY hardware managed? */
> +				mpam_mon_sel_outer_lock(msc);
> +				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
> +				mpam_mon_sel_outer_unlock(msc);
> +				if (hw_managed)
> +					mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
> +			}
> +
> +			/*
> +			 * Accept the missing firmware property if NRDY appears
> +			 * un-implemented.
> +			 */
> +			if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
> +				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
> +		}
> +		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
> +			bool hw_managed;
> +			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
> +
> +			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
> +			if (props->num_mbwu_mon)
> +				mpam_set_feature(mpam_feat_msmon_mbwu, props);
> +
> +			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
> +				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
> +
> +			/* Is NRDY hardware managed? */
> +			mpam_mon_sel_outer_lock(msc);
> +			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
> +			mpam_mon_sel_outer_unlock(msc);
> +			if (hw_managed)
> +				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
> +
> +			/*
> +			 * Don't warn about any missing firmware property for
> +			 * MBWU NRDY - it doesn't make any sense!
> +			 */
> +		}
> +	}
> +}
> +
>  static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  {
>  	u64 idr;
> @@ -663,6 +810,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  
>  	idr = mpam_msc_read_idr(msc);
>  	mutex_unlock(&msc->part_sel_lock);
> +
>  	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>  
>  	/* Use these values so partid/pmg always starts with a valid value */
> @@ -683,6 +831,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  		ris = mpam_get_or_create_ris(msc, ris_idx);
>  		if (IS_ERR(ris))
>  			return PTR_ERR(ris);
> +		ris->idr = idr;
> +
> +		mutex_lock(&msc->part_sel_lock);
> +		__mpam_part_sel(ris_idx, 0, msc);
> +		mpam_ris_hw_probe(ris);
> +		mutex_unlock(&msc->part_sel_lock);
>  	}
>  
>  	spin_lock(&partid_max_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index c6f087f9fa7d..9f6cd4a68cce 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
>  		lockdep_assert_preemption_enabled();
>  }
>  
> +/*
> + * When we compact the supported features, we don't care what they are.
> + * Storing them as a bitmap makes life easy.
> + */
> +typedef u16 mpam_features_t;
> +
> +/* Bits for mpam_features_t */
> +enum mpam_device_features {
> +	mpam_feat_ccap_part = 0,
> +	mpam_feat_cpor_part,
> +	mpam_feat_mbw_part,
> +	mpam_feat_mbw_min,
> +	mpam_feat_mbw_max,
> +	mpam_feat_mbw_prop,
> +	mpam_feat_msmon,
> +	mpam_feat_msmon_csu,
> +	mpam_feat_msmon_csu_capture,
> +	mpam_feat_msmon_csu_hw_nrdy,
> +	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_capture,
> +	mpam_feat_msmon_mbwu_rwbw,
> +	mpam_feat_msmon_mbwu_hw_nrdy,
> +	mpam_feat_msmon_capt,
> +	MPAM_FEATURE_LAST,
This isn't all the features or just the features supported by resctrl.
Just add them all in this patch?
> +};
> +static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
> +#define MPAM_ALL_FEATURES      ((1 << MPAM_FEATURE_LAST) - 1)
Unused?
> +
> +struct mpam_props {
> +	mpam_features_t		features;
> +
> +	u16			cpbm_wd;
> +	u16			mbw_pbm_bits;
> +	u16			bwa_wd;
> +	u16			num_csu_mon;
> +	u16			num_mbwu_mon;
> +};
> +
> +static inline bool mpam_has_feature(enum mpam_device_features feat,
> +				    struct mpam_props *props)
> +{
> +	return (1 << feat) & props->features;
> +}
> +
> +static inline void mpam_set_feature(enum mpam_device_features feat,
> +				    struct mpam_props *props)
> +{
> +	props->features |= (1 << feat);
> +}
> +
>  struct mpam_class {
>  	/* mpam_components in this class */
>  	struct list_head	components;
> @@ -175,6 +225,8 @@ struct mpam_vmsc {
>  	/* mpam_msc_ris in this vmsc */
>  	struct list_head	ris;
>  
> +	struct mpam_props	props;
> +
>  	/* All RIS in this vMSC are members of this MSC */
>  	struct mpam_msc		*msc;
>  
> @@ -186,6 +238,8 @@ struct mpam_vmsc {
>  
>  struct mpam_msc_ris {
>  	u8			ris_idx;
> +	u64			idr;
> +	struct mpam_props	props;
>  
>  	cpumask_t		affinity;
>  
Thanks,
Ben
- * Re: [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
  2025-08-28 13:44   ` Ben Horgan
@ 2025-09-09 16:57     ` James Morse
  2025-09-10  9:11       ` Ben Horgan
  0 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-09-09 16:57 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 28/08/2025 14:44, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> Expand the probing support with the control and monitor types
>> we can use with resctrl.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 012e09e80300..290a04f8654f 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
>>  
>>  static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>>  {
>> -	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
>> +	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> Update in the patch that introduced this line.
Yeah - this got ripped out.
Now that the size is in the ACPI table, I should never need to debug it being wrong!
(and the resulting translation fault will be enough to say something is wrong)
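For anyone following along, the difference between the two checks quoted above can be sketched in isolation. This is only an illustration, not driver code: reg_access_in_bounds() and its parameters are hypothetical names, and it assumes a 32-bit access at byte offset 'reg' into a mapping of 'mapped_sz' bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when a 32-bit access starting at byte
 * offset 'reg' lies entirely inside a mapping of 'mapped_sz' bytes.
 * Comparing 'reg' alone against the size would accept an offset whose
 * last bytes fall outside the mapping.
 */
static bool reg_access_in_bounds(uint16_t reg, size_t mapped_sz)
{
	return (size_t)reg + sizeof(uint32_t) <= mapped_sz;
}
```

With a 0x1000 byte mapping, offset 0xffc is the last one accepted. Offset 0xffe is rejected here, although a plain 'reg > size' test would not have flagged it.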
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index c6f087f9fa7d..9f6cd4a68cce 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
>> +/* Bits for mpam_features_t */
>> +enum mpam_device_features {
>> +	mpam_feat_ccap_part = 0,
>> +	mpam_feat_cpor_part,
>> +	mpam_feat_mbw_part,
>> +	mpam_feat_mbw_min,
>> +	mpam_feat_mbw_max,
>> +	mpam_feat_mbw_prop,
>> +	mpam_feat_msmon,
>> +	mpam_feat_msmon_csu,
>> +	mpam_feat_msmon_csu_capture,
>> +	mpam_feat_msmon_csu_hw_nrdy,
>> +	mpam_feat_msmon_mbwu,
>> +	mpam_feat_msmon_mbwu_capture,
>> +	mpam_feat_msmon_mbwu_rwbw,
>> +	mpam_feat_msmon_mbwu_hw_nrdy,
>> +	mpam_feat_msmon_capt,
>> +	MPAM_FEATURE_LAST,
> This isn't all the features or just the features supported by resctrl.
> Just add them all in this patch?
I'm having trouble parsing this ...
I needed somewhere to split the features up, as there are rather a lot. Those that resctrl
supports seemed like the logical spot.
>> +};
>> +static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
>> +#define MPAM_ALL_FEATURES      ((1 << MPAM_FEATURE_LAST) - 1)
> Unused?
Fixed, thanks!
Thanks,
James
- * Re: [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
  2025-09-09 16:57     ` James Morse
@ 2025-09-10  9:11       ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-10  9:11 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 9/9/25 17:57, James Morse wrote:
> Hi Ben,
> 
> On 28/08/2025 14:44, Ben Horgan wrote:
>> On 8/22/25 16:29, James Morse wrote:
>>> Expand the probing support with the control and monitor types
>>> we can use with resctrl.
> 
>>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>>> index 012e09e80300..290a04f8654f 100644
>>> --- a/drivers/resctrl/mpam_devices.c
>>> +++ b/drivers/resctrl/mpam_devices.c
>>> @@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
>>>  
>>>  static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>>>  {
>>> -	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
>>> +	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> 
>> Update in the patch that introduced this line.
> 
> Yeah - this got ripped out.
> Now that the size is in the ACPI table, I should never need to debug it being wrong!
> (and the resulting translation fault will be enough to say something is wrong)
> 
> 
>>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>>> index c6f087f9fa7d..9f6cd4a68cce 100644
>>> --- a/drivers/resctrl/mpam_internal.h
>>> +++ b/drivers/resctrl/mpam_internal.h
>>> @@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
> 
>>> +/* Bits for mpam_features_t */
>>> +enum mpam_device_features {
>>> +	mpam_feat_ccap_part = 0,
>>> +	mpam_feat_cpor_part,
>>> +	mpam_feat_mbw_part,
>>> +	mpam_feat_mbw_min,
>>> +	mpam_feat_mbw_max,
>>> +	mpam_feat_mbw_prop,
>>> +	mpam_feat_msmon,
>>> +	mpam_feat_msmon_csu,
>>> +	mpam_feat_msmon_csu_capture,
>>> +	mpam_feat_msmon_csu_hw_nrdy,
>>> +	mpam_feat_msmon_mbwu,
>>> +	mpam_feat_msmon_mbwu_capture,
>>> +	mpam_feat_msmon_mbwu_rwbw,
>>> +	mpam_feat_msmon_mbwu_hw_nrdy,
>>> +	mpam_feat_msmon_capt,
>>> +	MPAM_FEATURE_LAST,
> 
>> This isn't all the features or just the features supported by resctrl.
>> Just add them all in this patch?
> 
> I'm having trouble parsing this ...
> 
> I needed somewhere to split the features up, as there are rather a lot. Those that resctrl
> supports seemed like the logical spot.
I don't think resctrl supports ccap, mpam_feat_ccap_part.
Possibly the point is that this is split into more detailed features
later but that's not clear from this patch or commit message.
> 
> 
>>> +};
>>> +static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
>>> +#define MPAM_ALL_FEATURES      ((1 << MPAM_FEATURE_LAST) - 1)
> 
>> Unused?
> 
> Fixed, thanks!
> 
> 
> Thanks,
> 
> James
Thanks,
Ben
 
 
 
- * [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (16 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-08-22 15:29 ` James Morse
  2025-08-29 13:54   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
                   ` (49 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:29 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.
Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some MSC
that make up a component or class.
If bitmap properties are mismatched within a component we
cannot support the mismatched feature.
Care has to be taken as vMSC may hold mismatched RIS.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 215 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   8 ++
 2 files changed, 223 insertions(+)
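As a rough illustration of the merge order described in the commit message (not the code in this patch), the feature bits can be thought of as being OR'd across the RIS of each vMSC and then AND'd across the vMSC of a class. Everything below is a simplified sketch under that assumption: the sketch_* names and FEAT_* bit positions are made up, and the per-field handling (cpbm_wd, bwa_wd, the monitor counts) is left out entirely.

```c
#include <stdint.h>

#define FEAT_CPOR	(1u << 0)	/* illustrative bit positions only */
#define FEAT_CSU	(1u << 1)
#define FEAT_MBWU	(1u << 2)

struct sketch_props {
	uint16_t features;
};

/*
 * Aliasing children (RIS in one vMSC) contribute the union of their
 * features. Non-aliasing children (vMSC in a class) leave only the
 * features that all of them support.
 */
static void sketch_merge(struct sketch_props *parent,
			 const struct sketch_props *child, int alias)
{
	if (alias)
		parent->features |= child->features;
	else
		parent->features &= child->features;
}

/* Build the class features from n_vmsc vMSC, each holding n_ris RIS. */
static uint16_t sketch_class_features(const uint16_t *ris_feats,
				      int n_vmsc, int n_ris)
{
	struct sketch_props class_props = { 0 };
	int v, r;

	for (v = 0; v < n_vmsc; v++) {
		struct sketch_props vmsc = { 0 };

		/* The vMSC pass has to come first: OR the RIS together */
		for (r = 0; r < n_ris; r++) {
			struct sketch_props ris = { ris_feats[v * n_ris + r] };

			sketch_merge(&vmsc, &ris, 1);
		}

		if (v == 0)
			class_props = vmsc;	/* seed from the first vMSC */
		else
			sketch_merge(&class_props, &vmsc, 0);
	}

	return class_props.features;
}
```

For example, a vMSC whose RIS offer CPOR and CSU, merged with a vMSC that offers CPOR, CSU and MBWU, leaves the class with CPOR and CSU only. The patch itself seeds the class from the first vMSC and then merges every vMSC, including the first one again, which gives the same result for the feature bits.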
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 290a04f8654f..bb62de6d3847 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1186,8 +1186,223 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+/* Any of these features mean the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_mbw_min, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_max, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_prop, props))
+		return true;
+	return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
+	helper(parent) &&						\
+	((helper(child) && (parent)->field != (child)->field) ||	\
+	 (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias)		     \
+	mpam_has_feature((feat), (parent)) &&				     \
+	((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+	 (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias)			\
+	(alias) && !mpam_has_feature((feat), (parent)) &&		\
+	mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+			     struct mpam_props *child, bool alias)
+{
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+		parent->cpbm_wd = child->cpbm_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+				   cpbm_wd, alias)) {
+		pr_debug("%s cleared cpor_part\n", __func__);
+		mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
+		parent->cpbm_wd = 0;
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+		parent->mbw_pbm_bits = child->mbw_pbm_bits;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+				   mbw_pbm_bits, alias)) {
+		pr_debug("%s cleared mbw_part\n", __func__);
+		mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
+		parent->mbw_pbm_bits = 0;
+	}
+
+	/* bwa_wd is a count of bits, fewer bits means less precision */
+	if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {
+		parent->bwa_wd = child->bwa_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+				     bwa_wd, alias)) {
+		pr_debug("%s took the min bwa_wd\n", __func__);
+		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+	}
+
+	/* For num properties, take the minimum */
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+		parent->num_csu_mon = child->num_csu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+				   num_csu_mon, alias)) {
+		pr_debug("%s took the min num_csu_mon\n", __func__);
+		parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+		parent->num_mbwu_mon = child->num_mbwu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+				   num_mbwu_mon, alias)) {
+		pr_debug("%s took the min num_mbwu_mon\n", __func__);
+		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
+	}
+
+	if (alias) {
+		/* Merge features for aliased resources */
+		parent->features |= child->features;
+	} else {
+		/* Clear missing features for non aliasing */
+		parent->features &= child->features;
+	}
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13 and 17 portion
+ * bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+	struct mpam_props *cprops = &class->props;
+	struct mpam_props *vprops = &vmsc->props;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+	pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
+		 dev_name(&vmsc->msc->pdev->dev),
+		 (long)cprops->features, (long)vprops->features);
+
+	/* Take the safe value for any common features */
+	__props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_props *vprops = &vmsc->props;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+	pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
+		 dev_name(&vmsc->msc->pdev->dev),
+		 (long)vprops->features, (long)rprops->features);
+
+	/*
+	 * Merge mismatched features - Copy any features that aren't common,
+	 * but take the safe value for any common features.
+	 */
+	__props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_component *comp;
+
+	comp = list_first_entry_or_null(&class->components,
+					struct mpam_component, class_list);
+	if (WARN_ON(!comp))
+		return;
+
+	vmsc = list_first_entry_or_null(&comp->vmsc,
+					struct mpam_vmsc, comp_list);
+	if (WARN_ON(!vmsc))
+		return;
+
+	class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			__vmsc_props_mismatch(vmsc, ris);
+			class->nrdy_usec = max(class->nrdy_usec,
+					       vmsc->msc->nrdy_usec);
+		}
+	}
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+		__class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together, this must be done first.
+ * Next the class features are the bitwise-and of all the vmsc features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, all_classes_list, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_vmsc_features(comp);
+
+		mpam_enable_init_class_features(class);
+
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_class_features(comp);
+	}
+}
+
 static void mpam_enable_once(void)
 {
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+	mutex_unlock(&mpam_list_lock);
+
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
 	 * longer change.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f6cd4a68cce..a2b0ff411138 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -186,12 +186,20 @@ static inline void mpam_set_feature(enum mpam_device_features feat,
 	props->features |= (1 << feat);
 }
 
+static inline void mpam_clear_feature(enum mpam_device_features feat,
+				      mpam_features_t *supported)
+{
+	*supported &= ~(1 << feat);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
 
 	cpumask_t		affinity;
 
+	struct mpam_props	props;
+	u32			nrdy_usec;
 	u8			level;
 	enum mpam_class_types	type;
 
-- 
2.20.1
- * Re: [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-08-29 13:54   ` Ben Horgan
  2025-09-09 16:57     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 13:54 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:29, James Morse wrote:
> To make a decision about whether to expose an mpam class as
> a resctrl resource we need to know its overall supported
> features and properties.
> 
> Once we've probed all the resources, we can walk the tree
> and produce overall values by merging the bitmaps. This
> eliminates features that are only supported by some MSC
> that make up a component or class.
> 
> If bitmap properties are mismatched within a component we
> cannot support the mismatched feature.
> 
> Care has to be taken as vMSC may hold mismatched RIS.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 215 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |   8 ++
>  2 files changed, 223 insertions(+)
Intricate but, as far as I can tell, all correct.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
- * Re: [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-08-29 13:54   ` Ben Horgan
@ 2025-09-09 16:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:57 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 14:54, Ben Horgan wrote:
> On 8/22/25 16:29, James Morse wrote:
>> To make a decision about whether to expose an mpam class as
>> a resctrl resource we need to know its overall supported
>> features and properties.
>>
>> Once we've probed all the resources, we can walk the tree
>> and produce overall values by merging the bitmaps. This
>> eliminates features that are only supported by some MSC
>> that make up a component or class.
>>
>> If bitmap properties are mismatched within a component we
>> cannot support the mismatched feature.
>>
>> Care has to be taken as vMSC may hold mismatched RIS.
> Intricate but, as far as I can tell, all correct.
Yeah - it's cpufeature.c all over again.
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks!
James
 
 
- * [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (17 preceding siblings ...)
  2025-08-22 15:29 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-27 16:19   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
                   ` (48 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.
Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.
MSC are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.
If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.
To reset, write the maximum values for all discovered controls.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Last bitmap write will always be non-zero.
 * Dropped READ_ONCE() - the value can no longer change.
---
 drivers/resctrl/mpam_devices.c  | 121 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   8 +++
 2 files changed, 129 insertions(+)
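To make "write the maximum values" concrete for the bitmap controls, here is a stand-alone sketch of the word-by-word arithmetic. It is an illustration only: sketch_reset_bitmap() is a hypothetical name, it fills a plain array where the driver would write MMIO registers, and the shift stands in for the kernel's GENMASK(msb, 0).

```c
#include <stdint.h>

/*
 * Fill 'words' with the values that would be written to reset a
 * bitmap control that is 'wd' bits wide. Returns the number of 32-bit
 * words used. The caller must provide at least (wd + 31) / 32 words.
 */
static unsigned int sketch_reset_bitmap(uint32_t *words, uint16_t wd)
{
	unsigned int num_words, msb, i;

	if (wd == 0)
		return 0;

	/* All ones in every word except the last, which may be partial */
	num_words = (wd + 31u) / 32u;
	for (i = 0; i < num_words - 1; i++)
		words[i] = UINT32_MAX;

	/*
	 * msb is the highest valid bit of the last word. When wd is a
	 * multiple of 32 this is 31, so a full word is written.
	 */
	msb = (wd - 1u) % 32u;
	words[i] = UINT32_MAX >> (31u - msb);

	return num_words;
}
```

A 13 bit portion bitmap becomes the single word 0x1fff, and a 40 bit one becomes 0xffffffff followed by 0xff. Using (wd - 1) % 32 rather than wd % 32 is what keeps the last write non-zero when the width is a multiple of 32.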
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bb62de6d3847..c1f01dd748ad 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/bitfield.h>
+#include <linux/bitmap.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+	u32 num_words, msb;
+	u32 bm = ~0;
+	int i;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	if (wd == 0)
+		return;
+
+	/*
+	 * Write all ~0 to all but the last 32bit-word, which may
+	 * have fewer bits...
+	 */
+	num_words = DIV_ROUND_UP(wd, 32);
+	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+		__mpam_write_reg(msc, reg, bm);
+
+	/*
+	 * ....and then the last (maybe) partial 32bit word. When wd is a
+	 * multiple of 32, msb should be 31 to write a full 32bit word.
+	 */
+	msb = (wd - 1) % 32;
+	bm = GENMASK(msb, 0);
+	__mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *rprops = &ris->props;
+
+	mpam_assert_srcu_read_lock_held();
+
+	mutex_lock(&msc->part_sel_lock);
+	__mpam_part_sel(ris->ris_idx, partid, msc);
+
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+		mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+
+	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
+		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+	mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+	u16 partid, partid_max;
+
+	mpam_assert_srcu_read_lock_held();
+
+	if (ris->in_reset_state)
+		return;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid < partid_max; partid++)
+		mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+	int idx;
+	struct mpam_msc_ris *ris;
+
+	mpam_assert_srcu_read_lock_held();
+
+	mpam_mon_sel_outer_lock(msc);
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+		mpam_reset_ris(ris);
+
+		/*
+		 * Set in_reset_state when coming online. The reset state
+		 * for non-zero partid may be lost while the CPUs are offline.
+		 */
+		ris->in_reset_state = online;
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+	mpam_mon_sel_outer_unlock(msc);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
+	int idx;
+	struct mpam_msc *msc;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_fetch_inc(&msc->online_refs) == 0)
+			mpam_reset_msc(msc, true);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	return 0;
 }
 
@@ -886,6 +994,19 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 
 static int mpam_cpu_offline(unsigned int cpu)
 {
+	int idx;
+	struct mpam_msc *msc;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_dec_and_test(&msc->online_refs))
+			mpam_reset_msc(msc, false);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a2b0ff411138..466d670a01eb 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
 #include <linux/llist.h>
@@ -43,6 +44,7 @@ struct mpam_msc {
 	struct pcc_mbox_chan	*pcc_chan;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	atomic_t		online_refs;
 
 	/*
 	 * probe_lock is only take during discovery. After discovery these
@@ -248,6 +250,7 @@ struct mpam_msc_ris {
 	u8			ris_idx;
 	u64			idr;
 	struct mpam_props	props;
+	bool			in_reset_state;
 
 	cpumask_t		affinity;
 
@@ -267,6 +270,11 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+static inline void mpam_assert_srcu_read_lock_held(void)
+{
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+}
+
 /* System wide partid/pmg values */
 extern u16 mpam_partid_max;
 extern u8 mpam_pmg_max;
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
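
An illustrative aside on the bitmap reset in the patch above, not part of the posted series: the arithmetic in mpam_reset_msc_bitmap() can be sketched in plain userspace C. The names low_bits_mask() and bitmap_reset_words() are hypothetical stand-ins for GENMASK() and the register writes. The sketch only computes the values that would be written, assuming a bitmap of wd bits split into 32-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(msb, 0); valid for msb 0..31. */
static uint32_t low_bits_mask(unsigned int msb)
{
	return UINT32_MAX >> (31u - msb);
}

/*
 * Fill words[] with the values that would be written for a bitmap that is
 * wd bits wide and return the number of 32-bit words used (0 when wd is 0).
 */
static size_t bitmap_reset_words(uint16_t wd, uint32_t *words, size_t max_words)
{
	size_t num_words, i;

	if (wd == 0)
		return 0;

	num_words = (wd + 31u) / 32u;		/* DIV_ROUND_UP(wd, 32) */
	assert(num_words <= max_words);

	/* All but the last word are fully set. */
	for (i = 0; i < num_words - 1; i++)
		words[i] = UINT32_MAX;

	/* The last word may be partial; msb is 31 when wd is a multiple of 32. */
	words[num_words - 1] = low_bits_mask((wd - 1u) % 32u);

	return num_words;
}
```

With wd = 40 this yields two words, 0xffffffff and 0xff. With wd = 32 it yields a single full word, which is the case the "(wd - 1) % 32" form is there to handle.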
- * Re: [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-08-27 16:19   ` Ben Horgan
  2025-09-09 16:57     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-27 16:19 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> When a CPU comes online, it may bring a newly accessible MSC with
> it. Only the default partid has its value reset by hardware, and
> even then the MSC might not have been reset since its config was
> previously dirtied, e.g. by kexec.
> 
> Any in-use partid must have its configuration restored, or reset.
> In-use partids may be held in caches and evicted later.
> 
> MSC are also reset when CPUs are taken offline to cover cases where
> firmware doesn't reset the MSC over reboot using UEFI, or kexec
> where there is no firmware involvement.
> 
> If the configuration for a RIS has not been touched since it was
> brought online, it does not need resetting again.
> 
> To reset, write the maximum values for all discovered controls.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Last bitmap write will always be non-zero.
>  * Dropped READ_ONCE() - the value can no longer change.
> ---
>  drivers/resctrl/mpam_devices.c  | 121 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |   8 +++
>  2 files changed, 129 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index bb62de6d3847..c1f01dd748ad 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -7,6 +7,7 @@
>  #include <linux/atomic.h>
>  #include <linux/arm_mpam.h>
>  #include <linux/bitfield.h>
> +#include <linux/bitmap.h>
>  #include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
> @@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	return 0;
>  }
>  
> +static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> +{
> +	u32 num_words, msb;
> +	u32 bm = ~0;
> +	int i;
> +
> +	lockdep_assert_held(&msc->part_sel_lock);
> +
> +	if (wd == 0)
> +		return;
> +
> +	/*
> +	 * Write all ~0 to all but the last 32bit-word, which may
> +	 * have fewer bits...
> +	 */
> +	num_words = DIV_ROUND_UP(wd, 32);
> +	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
> +		__mpam_write_reg(msc, reg, bm);
> +
> +	/*
> +	 * ....and then the last (maybe) partial 32bit word. When wd is a
> +	 * multiple of 32, msb should be 31 to write a full 32bit word.
> +	 */
> +	msb = (wd - 1) % 32;
> +	bm = GENMASK(msb, 0);
> +	__mpam_write_reg(msc, reg, bm);
> +}
> +
> +static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +{
> +	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +	struct mpam_props *rprops = &ris->props;
> +
> +	mpam_assert_srcu_read_lock_held();
> +
> +	mutex_lock(&msc->part_sel_lock);
> +	__mpam_part_sel(ris->ris_idx, partid, msc);
> +
> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> +		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> +		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> +		mpam_write_partsel_reg(msc, MBW_MIN, 0);
> +
> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> +		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
MPAMCFG_MBW_MAX_MAX can be used directly instead of bwa_fract.
> +
> +	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
> +		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
Shouldn't this reset to 0? STRIDEM1 is a cost.
> +	mutex_unlock(&msc->part_sel_lock);
> +}
> +
> +static void mpam_reset_ris(struct mpam_msc_ris *ris)
> +{
> +	u16 partid, partid_max;
> +
> +	mpam_assert_srcu_read_lock_held();
> +
> +	if (ris->in_reset_state)
> +		return;
> +
> +	spin_lock(&partid_max_lock);
> +	partid_max = mpam_partid_max;
> +	spin_unlock(&partid_max_lock);
> +	for (partid = 0; partid < partid_max; partid++)
> +		mpam_reset_ris_partid(ris, partid);
> +}
> +
> +static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> +{
> +	int idx;
> +	struct mpam_msc_ris *ris;
> +
> +	mpam_assert_srcu_read_lock_held();
> +
> +	mpam_mon_sel_outer_lock(msc);
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
> +		mpam_reset_ris(ris);
> +
> +		/*
> +		 * Set in_reset_state when coming online. The reset state
> +		 * for non-zero partid may be lost while the CPUs are offline.
> +		 */
> +		ris->in_reset_state = online;
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +	mpam_mon_sel_outer_unlock(msc);
> +}
> +
>  static int mpam_cpu_online(unsigned int cpu)
>  {
> +	int idx;
> +	struct mpam_msc *msc;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		if (atomic_fetch_inc(&msc->online_refs) == 0)
> +			mpam_reset_msc(msc, true);
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
>  	return 0;
>  }
>  
> @@ -886,6 +994,19 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
>  
>  static int mpam_cpu_offline(unsigned int cpu)
>  {
> +	int idx;
> +	struct mpam_msc *msc;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> +		if (!cpumask_test_cpu(cpu, &msc->accessibility))
> +			continue;
> +
> +		if (atomic_dec_and_test(&msc->online_refs))
> +			mpam_reset_msc(msc, false);
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index a2b0ff411138..466d670a01eb 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -5,6 +5,7 @@
>  #define MPAM_INTERNAL_H
>  
>  #include <linux/arm_mpam.h>
> +#include <linux/atomic.h>
>  #include <linux/cpumask.h>
>  #include <linux/io.h>
>  #include <linux/llist.h>
> @@ -43,6 +44,7 @@ struct mpam_msc {
>  	struct pcc_mbox_chan	*pcc_chan;
>  	u32			nrdy_usec;
>  	cpumask_t		accessibility;
> +	atomic_t		online_refs;
>  
>  	/*
>  	 * probe_lock is only take during discovery. After discovery these
> @@ -248,6 +250,7 @@ struct mpam_msc_ris {
>  	u8			ris_idx;
>  	u64			idr;
>  	struct mpam_props	props;
> +	bool			in_reset_state;
>  
>  	cpumask_t		affinity;
>  
> @@ -267,6 +270,11 @@ struct mpam_msc_ris {
>  extern struct srcu_struct mpam_srcu;
>  extern struct list_head mpam_classes;
>  
> +static inline void mpam_assert_srcu_read_lock_held(void)
> +{
> +	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
> +}
> +
>  /* System wide partid/pmg values */
>  extern u16 mpam_partid_max;
>  extern u8 mpam_pmg_max;
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-08-27 16:19   ` Ben Horgan
@ 2025-09-09 16:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:57 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 27/08/2025 17:19, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> When a CPU comes online, it may bring a newly accessible MSC with
>> it. Only the default partid has its value reset by hardware, and
>> even then the MSC might not have been reset since its config was
>> previously dirtied, e.g. by kexec.
>>
>> Any in-use partid must have its configuration restored, or reset.
>> In-use partids may be held in caches and evicted later.
>>
>> MSC are also reset when CPUs are taken offline to cover cases where
>> firmware doesn't reset the MSC over reboot using UEFI, or kexec
>> where there is no firmware involvement.
>>
>> If the configuration for a RIS has not been touched since it was
>> brought online, it does not need resetting again.
>>
>> To reset, write the maximum values for all discovered controls.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index bb62de6d3847..c1f01dd748ad 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>> +static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
>> +{
>> +	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
>> +	struct mpam_msc *msc = ris->vmsc->msc;
>> +	struct mpam_props *rprops = &ris->props;
>> +
>> +	mpam_assert_srcu_read_lock_held();
>> +
>> +	mutex_lock(&msc->part_sel_lock);
>> +	__mpam_part_sel(ris->ris_idx, partid, msc);
>> +
>> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
>> +		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
>> +		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
>> +		mpam_write_partsel_reg(msc, MBW_MIN, 0);
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
>> +		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
> MPAMCFG_MBW_MAX_MAX can be used directly instead of bwa_fract.
Without the second user, yes.
>> +
>> +	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
>> +		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
> Shouldn't this reset to 0? STRIDEM1 is a cost.
Heh, this is just a copy and paste of the last value, because it clears the 'enable' bit,
and the spec says "there is no setting of the STRIDEM1 control field that disables the
effects of proportional-stride".
Yes - zero would be better.
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
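
An illustrative aside on the online_refs counting used by the cpuhp callbacks in patch 19, not part of the thread: the "first CPU in, last CPU out" pattern can be sketched with C11 atomics. struct msc_sketch and its helpers are hypothetical. The kernel code uses atomic_fetch_inc() and atomic_dec_and_test() on an atomic_t, and its reset walks every RIS rather than bumping a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct msc_sketch {
	atomic_int online_refs;	/* online CPUs that can reach this MSC */
	int resets;		/* how many times the reset path ran */
	bool in_reset_state;
};

/* Stand-in for the real reset; only records that it happened. */
static void msc_reset(struct msc_sketch *msc, bool online)
{
	msc->resets++;
	msc->in_reset_state = online;
}

static void msc_cpu_online(struct msc_sketch *msc)
{
	/* The old value is returned: 0 means this is the first CPU. */
	if (atomic_fetch_add(&msc->online_refs, 1) == 0)
		msc_reset(msc, true);
}

static void msc_cpu_offline(struct msc_sketch *msc)
{
	int old = atomic_fetch_sub(&msc->online_refs, 1);

	assert(old > 0);	/* catches unbalanced offline calls */

	/* An old value of 1 means the count just dropped to zero. */
	if (old == 1)
		msc_reset(msc, false);
}
```

Only the transitions 0 to 1 and 1 to 0 trigger a reset, so CPUs that come and go while another CPU still reaches the MSC do no extra work.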
 
 
- * [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (18 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-28 16:13   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
                   ` (47 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.
Add a helper that schedules the provided function if necessary.
Prevent the cpuhp callbacks from changing the MSC state by taking the
cpuhp lock.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c1f01dd748ad..759244966736 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -906,20 +906,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 	mutex_unlock(&msc->part_sel_lock);
 }
 
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
+	struct mpam_msc_ris *ris = arg;
 
 	mpam_assert_srcu_read_lock_held();
 
 	if (ris->in_reset_state)
-		return;
+		return 0;
 
 	spin_lock(&partid_max_lock);
 	partid_max = mpam_partid_max;
 	spin_unlock(&partid_max_lock);
 	for (partid = 0; partid < partid_max; partid++)
 		mpam_reset_ris_partid(ris, partid);
+
+	return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated, it will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, &msc->accessibility))
+		return cpu;
+
+	return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+	lockdep_assert_irqs_enabled();
+	lockdep_assert_cpus_held();
+	mpam_assert_srcu_read_lock_held();
+
+	return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
 }
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -932,7 +963,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	mpam_mon_sel_outer_lock(msc);
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
-		mpam_reset_ris(ris);
+		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
 		/*
 		 * Set in_reset_state when coming online. The reset state
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
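
An illustrative aside on the CPU selection in mpam_get_msc_preferred_cpu() above, not part of the posted patch: the choice can be sketched in userspace with a 64-bit integer standing in for cpumask_t. All names here are hypothetical. One deliberate difference: the sketch returns -1 when no CPU qualifies, whereas the kernel's cpumask_first_and() returns a value of at least nr_cpu_ids in that case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: bit n set means CPU n is in the mask. */
typedef uint64_t cpumask_sketch_t;

static int mask_test_cpu(int cpu, cpumask_sketch_t mask)
{
	return (int)((mask >> cpu) & 1u);
}

/* Lowest CPU present in both masks, or -1 if the intersection is empty. */
static int mask_first_and(cpumask_sketch_t a, cpumask_sketch_t b)
{
	cpumask_sketch_t both = a & b;
	int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask_test_cpu(cpu, both))
			return cpu;

	return -1;
}

/* Prefer the current CPU; otherwise pick any online CPU that can reach the MSC. */
static int preferred_cpu(int this_cpu, cpumask_sketch_t accessibility,
			 cpumask_sketch_t online)
{
	assert(this_cpu >= 0 && this_cpu < 64);

	if (mask_test_cpu(this_cpu, accessibility))
		return this_cpu;

	return mask_first_and(accessibility, online);
}
```

Preferring the current CPU avoids a cross-CPU call in the common case. If the task migrates after the check, the work is still run on a CPU from the accessibility mask, at the cost of an extra hop.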
- * Re: [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-08-28 16:13   ` Ben Horgan
  2025-09-09 16:57     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-28 16:13 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Resetting RIS entries from the cpuhp callback is easy as the
> callback occurs on the correct CPU. This won't be true for any other
> caller that wants to reset or configure an MSC.
> 
> Add a helper that schedules the provided function if necessary.
> Prevent the cpuhp callbacks from changing the MSC state by taking the
> cpuhp lock.
At first, I thought this was referring to something done in the patch.
Consider changing to something like:
Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.
Regardless, this looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index c1f01dd748ad..759244966736 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -906,20 +906,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
>  	mutex_unlock(&msc->part_sel_lock);
>  }
>  
> -static void mpam_reset_ris(struct mpam_msc_ris *ris)
> +/*
> + * Called via smp_call_on_cpu() to prevent migration, while still being
> + * pre-emptible.
> + */
> +static int mpam_reset_ris(void *arg)
>  {
>  	u16 partid, partid_max;
> +	struct mpam_msc_ris *ris = arg;
>  
>  	mpam_assert_srcu_read_lock_held();
>  
>  	if (ris->in_reset_state)
> -		return;
> +		return 0;
>  
>  	spin_lock(&partid_max_lock);
>  	partid_max = mpam_partid_max;
>  	spin_unlock(&partid_max_lock);
>  	for (partid = 0; partid < partid_max; partid++)
>  		mpam_reset_ris_partid(ris, partid);
> +
> +	return 0;
> +}
> +
> +/*
> + * Get the preferred CPU for this MSC. If it is accessible from this CPU,
> + * this CPU is preferred. This can be preempted/migrated, it will only result
> + * in more work.
> + */
> +static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
> +{
> +	int cpu = raw_smp_processor_id();
> +
> +	if (cpumask_test_cpu(cpu, &msc->accessibility))
> +		return cpu;
> +
> +	return cpumask_first_and(&msc->accessibility, cpu_online_mask);
> +}
> +
> +static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
> +{
> +	lockdep_assert_irqs_enabled();
> +	lockdep_assert_cpus_held();
> +	mpam_assert_srcu_read_lock_held();
> +
> +	return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
>  }
>  
>  static void mpam_reset_msc(struct mpam_msc *msc, bool online)
> @@ -932,7 +963,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  	mpam_mon_sel_outer_lock(msc);
>  	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
> -		mpam_reset_ris(ris);
> +		mpam_touch_msc(msc, &mpam_reset_ris, ris);
>  
>  		/*
>  		 * Set in_reset_state when coming online. The reset state
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-08-28 16:13   ` Ben Horgan
@ 2025-09-09 16:57     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:57 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 28/08/2025 17:13, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> Resetting RIS entries from the cpuhp callback is easy as the
>> callback occurs on the correct CPU. This won't be true for any other
>> caller that wants to reset or configure an MSC.
>>
>> Add a helper that schedules the provided function if necessary.
>> Prevent the cpuhp callbacks from changing the MSC state by taking the
>> cpuhp lock.
> At first, I thought this was referring to something done in the patch.
> Consider changing to something like:
> 
> Callers should take the cpuhp lock to prevent the cpuhp callbacks from
> changing the MSC state.
Yes - that is better.
> Regardless, this looks good to me.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks!
James
^ permalink raw reply	[flat|nested] 200+ messages in thread 
 
 
- * [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (19 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 14:30   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
                   ` (46 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.
Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 62 +++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  1 +
 2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 759244966736..3516cbe8623e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -915,8 +915,6 @@ static int mpam_reset_ris(void *arg)
 	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
 
-	mpam_assert_srcu_read_lock_held();
-
 	if (ris->in_reset_state)
 		return 0;
 
@@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
 	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+	int idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	might_sleep();
+	lockdep_assert_cpus_held();
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->in_reset_state)
+				mpam_touch_msc(msc, mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+	int idx;
+	struct mpam_component *comp;
+
+	lockdep_assert_cpus_held();
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(comp, &class->components, class_list)
+		mpam_reset_component_locked(comp);
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+	cpus_read_lock();
+	mpam_reset_class_locked(class);
+	cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAMs errors indicate a software bug, restore any modified
+ * controls to their reset values.
+ */
+void mpam_disable(void)
+{
+	int idx;
+	struct mpam_class *class;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_class(class);
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
 /*
  * Enable mpam once all devices have been probed.
  * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 466d670a01eb..b30fee2b7674 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -281,6 +281,7 @@ extern u8 mpam_pmg_max;
 
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
+void mpam_disable(void);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
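
An illustrative aside on the class reset walk in the patch above, not part of the posted series: the class, component, vMSC, RIS nesting and the in_reset_state short-circuit can be sketched with plain arrays. Every type and function name below is hypothetical. The sketch leaves out the locking (cpus_read_lock(), SRCU) and the cross-CPU call that the real code relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ris_sketch   { bool in_reset_state; int reset_calls; };
struct vmsc_sketch  { struct ris_sketch *ris;   size_t nr_ris; };
struct comp_sketch  { struct vmsc_sketch *vmsc; size_t nr_vmsc; };
struct class_sketch { struct comp_sketch *comp; size_t nr_comp; };

/* Stand-in for the per-RIS reset; only counts invocations. */
static void reset_ris(struct ris_sketch *ris)
{
	ris->reset_calls++;
}

/* Reset every RIS reachable from the class; return how many needed work. */
static size_t reset_class(struct class_sketch *cls)
{
	size_t c, v, r, touched = 0;

	assert(cls != NULL);

	for (c = 0; c < cls->nr_comp; c++) {
		struct comp_sketch *comp = &cls->comp[c];

		for (v = 0; v < comp->nr_vmsc; v++) {
			struct vmsc_sketch *vmsc = &comp->vmsc[v];

			for (r = 0; r < vmsc->nr_ris; r++) {
				struct ris_sketch *ris = &vmsc->ris[r];

				/* Skip RIS that are already in their reset state. */
				if (!ris->in_reset_state) {
					reset_ris(ris);
					touched++;
				}
				ris->in_reset_state = true;
			}
		}
	}

	return touched;
}
```

A second call on the same class does nothing, because every RIS was marked as being in its reset state by the first pass.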
- * Re: [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-08-29 14:30   ` Ben Horgan
  2025-09-09 16:58     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 14:30 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> cpuhp callbacks aren't the only time the MSC configuration may need to
> be reset. Resctrl has an API call to reset a class.
> If an MPAM error interrupt arrives it indicates the driver has
> misprogrammed an MSC. The safest thing to do is reset all the MSCs
> and disable MPAM.
> 
> Add a helper to reset RIS via their class. Call this from mpam_disable(),
> which can be scheduled from the error interrupt handler.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 62 +++++++++++++++++++++++++++++++--
>  drivers/resctrl/mpam_internal.h |  1 +
>  2 files changed, 61 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 759244966736..3516cbe8623e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -915,8 +915,6 @@ static int mpam_reset_ris(void *arg)
>  	u16 partid, partid_max;
>  	struct mpam_msc_ris *ris = arg;
>  
> -	mpam_assert_srcu_read_lock_held();
> -
>  	if (ris->in_reset_state)
>  		return 0;
>  
> @@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
>  	       mpam_partid_max + 1, mpam_pmg_max + 1);
>  }
>  
> +static void mpam_reset_component_locked(struct mpam_component *comp)
> +{
> +	int idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	might_sleep();
> +	lockdep_assert_cpus_held();
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			if (!ris->in_reset_state)
> +				mpam_touch_msc(msc, mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
> +static void mpam_reset_class_locked(struct mpam_class *class)
> +{
> +	int idx;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_cpus_held();
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(comp, &class->components, class_list)
> +		mpam_reset_component_locked(comp);
> +	srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
> +static void mpam_reset_class(struct mpam_class *class)
> +{
> +	cpus_read_lock();
> +	mpam_reset_class_locked(class);
> +	cpus_read_unlock();
> +}
> +
> +/*
> + * Called in response to an error IRQ.
> + * All of MPAMs errors indicate a software bug, restore any modified
> + * controls to their reset values.
> + */
> +void mpam_disable(void)
> +{
> +	int idx;
> +	struct mpam_class *class;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> +				 srcu_read_lock_held(&mpam_srcu))
Why do you use list_for_each_entry_srcu() here when in other places you
use list_for_each_entry_rcu()?
> +		mpam_reset_class(class);
> +	srcu_read_unlock(&mpam_srcu, idx);
> +}
> +
>  /*
>   * Enable mpam once all devices have been probed.
>   * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 466d670a01eb..b30fee2b7674 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -281,6 +281,7 @@ extern u8 mpam_pmg_max;
>  
>  /* Scheduled work callback to enable mpam once all MSC have been probed */
>  void mpam_enable(struct work_struct *work);
> +void mpam_disable(void);
>  
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-08-29 14:30   ` Ben Horgan
@ 2025-09-09 16:58     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:58 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 15:30, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> cpuhp callbacks aren't the only time the MSC configuration may need to
>> be reset. Resctrl has an API call to reset a class.
>> If an MPAM error interrupt arrives it indicates the driver has
>> misprogrammed an MSC. The safest thing to do is reset all the MSCs
>> and disable MPAM.
>>
>> Add a helper to reset RIS via their class. Call this from mpam_disable(),
>> which can be scheduled from the error interrupt handler.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 759244966736..3516cbe8623e 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
>> +/*
>> + * Called in response to an error IRQ.
>> + * All of MPAMs errors indicate a software bug, restore any modified
>> + * controls to their reset values.
>> + */
>> +void mpam_disable(void)
>> +{
>> +	int idx;
>> +	struct mpam_class *class;
>> +
>> +	idx = srcu_read_lock(&mpam_srcu);
>> +	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>> +				 srcu_read_lock_held(&mpam_srcu))
> 
> Why do you use list_for_each_entry_srcu() here when in other places you
> use list_for_each_entry_rcu()?
It's a mistake. I was part way through making this use RCU when someone 'invented' the
firmware interface, meaning readl() needs to be able to sleep...
Those were added in a later patch than I thought, and I missed fixing them up.
I think the srcu version provides extra checking, and is the correct one to use.
I'll fix those. Thanks for spotting it!
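To illustrate what "extra checking" could mean here, below is a minimal
userspace sketch, not kernel code. It assumes the useful property of the
srcu iterator is its condition argument (srcu_read_lock_held() in the quoted
hunk), which can be validated before the walk starts. Every toy_* name is
hypothetical, and the violation counter stands in for a lockdep warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SRCU read-side critical section. */
static int toy_srcu_depth;
/* Counts list walks started without the lock; lockdep would warn instead. */
static int toy_violations;

static void toy_srcu_read_lock(void)
{
	toy_srcu_depth++;
}

static void toy_srcu_read_unlock(void)
{
	assert(toy_srcu_depth > 0);	/* unlock must pair with a lock */
	toy_srcu_depth--;
}

static bool toy_srcu_read_lock_held(void)
{
	return toy_srcu_depth > 0;
}

struct toy_class {
	int id;
	struct toy_class *next;
};

/* Record a violation when the caller's condition does not hold. */
static void toy_list_check(bool cond)
{
	if (!cond)
		toy_violations++;
}

/* Walk a singly linked list, evaluating 'cond' once before the first step. */
#define toy_for_each_checked(pos, head, cond) \
	for (toy_list_check(cond), (pos) = (head); (pos) != NULL; \
	     (pos) = (pos)->next)

/* Sum the ids on the list; the walk itself is checked, not the sum. */
static int toy_sum_ids(struct toy_class *head)
{
	struct toy_class *pos;
	int sum = 0;

	toy_for_each_checked(pos, head, toy_srcu_read_lock_held())
		sum += pos->id;

	return sum;
}
```

Calling toy_sum_ids() between toy_srcu_read_lock() and
toy_srcu_read_unlock() leaves toy_violations untouched. Calling it outside
that pair still returns the sum, but bumps the counter, which is the kind of
mistake a checked iterator is meant to surface.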
Thanks,
James
>> +		mpam_reset_class(class);
>> +	srcu_read_unlock(&mpam_srcu, idx);
>> +}
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
 
- * [PATCH 22/33] arm_mpam: Register and enable IRQs
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (20 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-09 16:58   ` James Morse
  2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
                   ` (45 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.
Only the irq handler accesses the ESR register, so no locking is needed.
The work to disable MPAM after an error needs to happen in process
context, so use a threaded interrupt.
There is no support for percpu threaded interrupts, so for now schedule
the work to be done from the irq handler.
Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.
Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Use guard macro when walking srcu list.
 * Use INTEN macro for enabling interrupts.
 * Move partid_max_published up earlier in mpam_enable_once().
---
 drivers/resctrl/mpam_devices.c  | 311 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   9 +-
 2 files changed, 312 insertions(+), 8 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3516cbe8623e..210d64fad0b1 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
 #include <linux/list.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
@@ -62,6 +65,12 @@ static DEFINE_SPINLOCK(partid_max_lock);
  */
 static DECLARE_WORK(mpam_enable_work, &mpam_enable);
 
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -159,6 +168,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 	return (idr_high << 32) | idr_low;
 }
 
+static void mpam_msc_zero_esr(struct mpam_msc *msc)
+{
+	__mpam_write_reg(msc, MPAMF_ESR, 0);
+	if (msc->has_extd_esr)
+		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+	u64 esr_high = 0, esr_low;
+
+	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+	if (msc->has_extd_esr)
+		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+	return (esr_high << 32) | esr_low;
+}
+
 static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
 {
 	lockdep_assert_held(&msc->part_sel_lock);
@@ -405,12 +432,12 @@ static void mpam_msc_destroy(struct mpam_msc *msc)
 
 	lockdep_assert_held(&mpam_list_lock);
 
-	list_del_rcu(&msc->glbl_list);
-	platform_set_drvdata(pdev, NULL);
-
 	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
 		mpam_ris_destroy(ris);
 
+	list_del_rcu(&msc->glbl_list);
+	platform_set_drvdata(pdev, NULL);
+
 	add_to_garbage(msc);
 	msc->garbage.pdev = pdev;
 }
@@ -828,6 +855,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
 		msc->partid_max = min(msc->partid_max, partid_max);
 		msc->pmg_max = min(msc->pmg_max, pmg_max);
+		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
 
 		ris = mpam_get_or_create_ris(msc, ris_idx);
 		if (IS_ERR(ris))
@@ -840,6 +868,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&msc->part_sel_lock);
 	}
 
+	/* Clear any stale errors */
+	mpam_msc_zero_esr(msc);
+
 	spin_lock(&partid_max_lock);
 	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
 	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -973,6 +1004,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	mpam_mon_sel_outer_unlock(msc);
 }
 
+static void _enable_percpu_irq(void *_irq)
+{
+	int *irq = _irq;
+
+	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
 	int idx;
@@ -983,6 +1021,9 @@ static int mpam_cpu_online(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			_enable_percpu_irq(&msc->reenable_error_ppi);
+
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
 			mpam_reset_msc(msc, true);
 	}
@@ -1031,6 +1072,9 @@ static int mpam_cpu_offline(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			disable_percpu_irq(msc->reenable_error_ppi);
+
 		if (atomic_dec_and_test(&msc->online_refs))
 			mpam_reset_msc(msc, false);
 	}
@@ -1057,6 +1101,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
 	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
+static int __setup_ppi(struct mpam_msc *msc)
+{
+	int cpu;
+
+	msc->error_dev_id = alloc_percpu_gfp(struct mpam_msc *, GFP_KERNEL);
+	if (!msc->error_dev_id)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &msc->accessibility) {
+		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
+
+		if (empty) {
+			pr_err_once("%s shares PPI with %s!\n",
+				    dev_name(&msc->pdev->dev),
+				    dev_name(&empty->pdev->dev));
+			return -EBUSY;
+		}
+		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+	}
+
+	return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+	int irq;
+
+	irq = platform_get_irq_byname_optional(msc->pdev, "error");
+	if (irq <= 0)
+		return 0;
+
+	/* Allocate and initialise the percpu device pointer for PPI */
+	if (irq_is_percpu(irq))
+		return __setup_ppi(msc);
+
+	/* sanity check: shared interrupts can be routed anywhere? */
+	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+		pr_err_once("msc:%u is a private resource with a shared error interrupt\n",
+			    msc->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int mpam_dt_count_msc(void)
 {
 	int count = 0;
@@ -1265,6 +1354,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 			break;
 		}
 
+		err = mpam_msc_setup_error_irq(msc);
+		if (err)
+			break;
+
 		if (device_property_read_u32(&pdev->dev, "pcc-channel",
 					     &msc->pcc_subspace_id))
 			msc->iface = MPAM_IFACE_MMIO;
@@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
 	}
 }
 
+static char *mpam_errcode_names[16] = {
+	[0] = "No error",
+	[1] = "PARTID_SEL_Range",
+	[2] = "Req_PARTID_Range",
+	[3] = "MSMONCFG_ID_RANGE",
+	[4] = "Req_PMG_Range",
+	[5] = "Monitor_Range",
+	[6] = "intPARTID_Range",
+	[7] = "Unexpected_INTERNAL",
+	[8] = "Undefined_RIS_PART_SEL",
+	[9] = "RIS_No_Control",
+	[10] = "Undefined_RIS_MON_SEL",
+	[11] = "RIS_No_Monitor",
+	[12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+	return 0;
+}
+
+static int mpam_disable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, 0);
+
+	return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+	u64 reg;
+	u16 partid;
+	u8 errcode, pmg, ris;
+
+	if (WARN_ON_ONCE(!msc) ||
+	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &msc->accessibility)))
+		return IRQ_NONE;
+
+	reg = mpam_msc_read_esr(msc);
+
+	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+	if (!errcode)
+		return IRQ_NONE;
+
+	/* Clear level triggered irq */
+	mpam_msc_zero_esr(msc);
+
+	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+	pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
+	       msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
+
+	if (irq_is_percpu(irq)) {
+		mpam_disable_msc_ecr(msc);
+		schedule_work(&mpam_broken_work);
+		return IRQ_HANDLED;
+	}
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id);
+
+static int mpam_register_irqs(void)
+{
+	int err, irq;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+		/* We anticipate sharing the interrupt with other MSCs */
+		if (irq_is_percpu(irq)) {
+			err = request_percpu_irq(irq, &mpam_ppi_handler,
+						 "mpam:msc:error",
+						 msc->error_dev_id);
+			if (err)
+				return err;
+
+			msc->reenable_error_ppi = irq;
+			smp_call_function_many(&msc->accessibility,
+					       &_enable_percpu_irq, &irq,
+					       true);
+		} else {
+			err = devm_request_threaded_irq(&msc->pdev->dev, irq,
+							&mpam_spi_handler,
+							&mpam_disable_thread,
+							IRQF_SHARED,
+							"mpam:msc:error", msc);
+			if (err)
+				return err;
+		}
+
+		msc->error_irq_requested = true;
+		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+		msc->error_irq_hw_enabled = true;
+	}
+
+	return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+	int irq, idx;
+	struct mpam_msc *msc;
+
+	cpus_read_lock();
+	/* take the lock as free_irq() can sleep */
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		if (msc->error_irq_hw_enabled) {
+			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+			msc->error_irq_hw_enabled = false;
+		}
+
+		if (msc->error_irq_requested) {
+			if (irq_is_percpu(irq)) {
+				msc->reenable_error_ppi = 0;
+				free_percpu_irq(irq, msc->error_dev_id);
+			} else {
+				devm_free_irq(&msc->pdev->dev, irq, msc);
+			}
+			msc->error_irq_requested = false;
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+	cpus_read_unlock();
+}
+
 static void mpam_enable_once(void)
 {
-	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
-	mutex_unlock(&mpam_list_lock);
+	int err;
 
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
@@ -1561,6 +1814,27 @@ static void mpam_enable_once(void)
 	partid_max_published = true;
 	spin_unlock(&partid_max_lock);
 
+	/*
+	 * If all the MSC have been probed, enabling the IRQs happens next.
+	 * That involves cross-calling to a CPU that can reach the MSC, and
+	 * the locks must be taken in this order:
+	 */
+	cpus_read_lock();
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+
+	err = mpam_register_irqs();
+	if (err)
+		pr_warn("Failed to register irqs: %d\n", err);
+
+	mutex_unlock(&mpam_list_lock);
+	cpus_read_unlock();
+
+	if (err) {
+		schedule_work(&mpam_broken_work);
+		return;
+	}
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
 	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
  * All of MPAMs errors indicate a software bug, restore any modified
  * controls to their reset values.
  */
-void mpam_disable(void)
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
 {
 	int idx;
 	struct mpam_class *class;
+	struct mpam_msc *msc, *tmp;
+
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+
+	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
 				 srcu_read_lock_held(&mpam_srcu))
 		mpam_reset_class(class);
 	srcu_read_unlock(&mpam_srcu, idx);
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
+		mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+	mpam_free_garbage();
+
+	return IRQ_HANDLED;
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+	mpam_disable_thread(0, NULL);
 }
 
 /*
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b30fee2b7674..c9418c9cf9f2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -44,6 +44,11 @@ struct mpam_msc {
 	struct pcc_mbox_chan	*pcc_chan;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	bool			has_extd_esr;
+
+	int				reenable_error_ppi;
+	struct mpam_msc * __percpu	*error_dev_id;
+
 	atomic_t		online_refs;
 
 	/*
@@ -52,6 +57,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	bool			error_irq_requested;
+	bool			error_irq_hw_enabled;
 	u16			partid_max;
 	u8			pmg_max;
 	unsigned long		ris_idxs[128 / BITS_PER_LONG];
@@ -281,7 +288,7 @@ extern u8 mpam_pmg_max;
 
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
-void mpam_disable(void);
+void mpam_disable(struct work_struct *work);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 22/33] arm_mpam: Register and enable IRQs
  2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-09-09 16:58   ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-09 16:58 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James, (:p)
On 22/08/2025 16:30, James Morse wrote:
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the ESR register, so no locking is needed.
> The work to disable MPAM after an error needs to happen at process
> context, use a threaded interrupt.
> 
> There is no support for percpu threaded interrupts, for now schedule
> the work to be done from the irq handler.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3516cbe8623e..210d64fad0b1 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
> +static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
> +{
> +	u64 reg;
> +	u16 partid;
> +	u8 errcode, pmg, ris;
> +
> +	if (WARN_ON_ONCE(!msc) ||
> +	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &msc->accessibility)))
> +		return IRQ_NONE;
> +
> +	reg = mpam_msc_read_esr(msc);
> +
> +	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
> +	if (!errcode)
> +		return IRQ_NONE;
> +
> +	/* Clear level triggered irq */
> +	mpam_msc_zero_esr(msc);
> +
> +	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
> +	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
> +	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
> +
> +	pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
> +	       msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
> +
> +	if (irq_is_percpu(irq)) {
> +		mpam_disable_msc_ecr(msc);
> +		schedule_work(&mpam_broken_work);
> +		return IRQ_HANDLED;
> +	}
> +
> +	return IRQ_WAKE_THREAD;
> +}
> +static void mpam_unregister_irqs(void)
> +{
> +	int irq, idx;
> +	struct mpam_msc *msc;
> +
> +	cpus_read_lock();
> +	/* take the lock as free_irq() can sleep */
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		if (msc->error_irq_hw_enabled) {
> +			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
> +			msc->error_irq_hw_enabled = false;
> +		}
> +
> +		if (msc->error_irq_requested) {
> +			if (irq_is_percpu(irq)) {
> +				msc->reenable_error_ppi = 0;
> +				free_percpu_irq(irq, msc->error_dev_id);
> +			} else {
> +				devm_free_irq(&msc->pdev->dev, irq, msc);
> +			}
> +			msc->error_irq_requested = false;
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +	cpus_read_unlock();
> +}
> @@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
>   * All of MPAMs errors indicate a software bug, restore any modified
>   * controls to their reset values.
>   */
> -void mpam_disable(void)
> +static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
>  {
>  	int idx;
>  	struct mpam_class *class;
> +	struct mpam_msc *msc, *tmp;
> +
> +	mutex_lock(&mpam_cpuhp_state_lock);
> +	if (mpam_cpuhp_state) {
> +		cpuhp_remove_state(mpam_cpuhp_state);
> +		mpam_cpuhp_state = 0;
> +	}
> +	mutex_unlock(&mpam_cpuhp_state_lock);
> +	mpam_unregister_irqs();
When an out-of-range PARTID gets used, all the MSC go off at once, which means the
interrupts can be delivered to multiple CPUs at the same time. This unregister call is
outside any lock, and the msc->error_irq_* flags aren't atomic, leading to hilarity as
this races with itself.
It also turns out you can't devm_free_irq() from a threaded irq as it blocks forever in
synchronize_irq().
Naturally I didn't hit either of these issues when scheduling the thread from debugfs.
I've made the flags atomic, and thrown the threaded-irq away. Instead the work always
gets scheduled.
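One way an atomic flag can close this kind of race is to read and clear it
in a single step, so only one caller ever sees it set. The userspace sketch
below uses C11 atomics to show the idea. It is an assumption about the shape
of the fix, not the actual follow-up patch, and all toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_msc {
	atomic_bool error_irq_requested;
	int free_irq_calls;	/* how many times the IRQ was released */
};

static void toy_msc_init(struct toy_msc *msc)
{
	atomic_init(&msc->error_irq_requested, false);
	msc->free_irq_calls = 0;
}

/* Models a successful request of the error interrupt. */
static void toy_request_irq(struct toy_msc *msc)
{
	atomic_store(&msc->error_irq_requested, true);
}

/*
 * Returns true only for the caller that actually released the IRQ.
 * atomic_exchange() reads the old value and clears the flag in one
 * step, so concurrent callers cannot both observe 'true'.
 */
static bool toy_unregister_irq(struct toy_msc *msc)
{
	assert(msc != NULL);

	if (!atomic_exchange(&msc->error_irq_requested, false))
		return false;

	msc->free_irq_calls++;	/* only the winning caller gets here */
	return true;
}
```

With a plain bool, two CPUs could both read the flag as true before either
clears it, and both would go on to free the same interrupt.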
Thanks,
James
>  	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>  				 srcu_read_lock_held(&mpam_srcu))
>  		mpam_reset_class(class);
>  	srcu_read_unlock(&mpam_srcu, idx);
> +
> +	mutex_lock(&mpam_list_lock);
> +	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
> +		mpam_msc_destroy(msc);
> +	mutex_unlock(&mpam_list_lock);
> +	mpam_free_garbage();
> +
> +	return IRQ_HANDLED;
> +}
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
- * [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (21 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
                   ` (44 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.
After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.
Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 8 ++++++++
 drivers/resctrl/mpam_internal.h | 8 ++++++++
 2 files changed, 16 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 210d64fad0b1..b424af666b1e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -33,6 +33,8 @@
 
 #include "mpam_internal.h"
 
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* TODO: move to arch code */
+
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
@@ -1039,6 +1041,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 	struct mpam_msc *msc;
 	bool new_device_probed = false;
 
+	if (mpam_is_enabled())
+		return 0;
+
 	mutex_lock(&mpam_list_lock);
 	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
@@ -1835,6 +1840,7 @@ static void mpam_enable_once(void)
 		return;
 	}
 
+	static_branch_enable(&mpam_enabled);
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
 	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1902,6 +1908,8 @@ static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	static_branch_disable(&mpam_enabled);
+
 	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c9418c9cf9f2..3476ee97f8ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -8,6 +8,7 @@
 #include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/jump_label.h>
 #include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
@@ -15,6 +16,13 @@
 #include <linux/sizes.h>
 #include <linux/srcu.h>
 
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+	return static_branch_likely(&mpam_enabled);
+}
+
 /*
  * Structures protected by SRCU may not be freed for a surprising amount of
  * time (especially if perf is running). To ensure the MPAM error interrupt can
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (22 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-28 16:13   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
                   ` (43 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
When CPUs come online the original configuration should be restored.
Once the maximum partid is known, allocate a configuration array for
each component, and reprogram each RIS configuration from this.
The MPAM spec describes how multiple controls can interact. To prevent
this happening by accident, always reset controls that don't have a
valid configuration. This allows the same helper to be used for
configuration and reset.
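The "same helper for configuration and reset" idea can be sketched in
userspace as below. This is a simplified model under stated assumptions:
the reset values, the two-control layout and every toy_* name are
hypothetical, and the hardware feature checks done by the driver are left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_feat {
	TOY_FEAT_CPOR    = 1u << 0,
	TOY_FEAT_MBW_MAX = 1u << 1,
};

#define TOY_CPBM_RESET		0xffffffffu	/* assumed: every portion allowed */
#define TOY_MBW_MAX_RESET	0xffffu		/* assumed: no bandwidth limit */

/* Which controls carry a valid configuration, and their values. */
struct toy_config {
	unsigned int features;
	uint32_t cpbm;
	uint16_t mbw_max;
};

/* Models the control registers selected for one partid. */
struct toy_ctrl {
	uint32_t cpbm;
	uint16_t mbw_max;
};

/* Write the configured value, or the reset value if none is valid. */
static void toy_reprogram_partid(struct toy_ctrl *hw,
				 const struct toy_config *cfg)
{
	hw->cpbm = (cfg->features & TOY_FEAT_CPOR) ?
		   cfg->cpbm : TOY_CPBM_RESET;
	hw->mbw_max = (cfg->features & TOY_FEAT_MBW_MAX) ?
		      cfg->mbw_max : TOY_MBW_MAX_RESET;
}

static void toy_reprogram(struct toy_ctrl *hw, uint16_t partid_max,
			  const struct toy_config *cfg)
{
	uint32_t partid;

	assert(hw != NULL && cfg != NULL);

	/* partid_max is the largest valid partid, so the bound is inclusive. */
	for (partid = 0; partid <= partid_max; partid++)
		toy_reprogram_partid(&hw[partid], cfg);
}

/* Reset is just a reprogram with a configuration that has no features. */
static void toy_reset(struct toy_ctrl *hw, uint16_t partid_max)
{
	const struct toy_config empty_cfg = { 0 };

	toy_reprogram(hw, partid_max, &empty_cfg);
}
```

Because a control without a valid configuration is always written with its
reset value, a stale setting cannot linger and interact with the controls
that are configured.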
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Added a comment about the ordering around max_partid.
 * Allocate configurations after interrupts are registered to reduce churn.
 * Added mpam_assert_partid_sizes_fixed();
---
 drivers/resctrl/mpam_devices.c  | 253 +++++++++++++++++++++++++++++---
 drivers/resctrl/mpam_internal.h |  26 +++-
 2 files changed, 251 insertions(+), 28 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b424af666b1e..8f6df2406c22 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -112,6 +112,16 @@ LIST_HEAD(mpam_classes);
 /* List of all objects that can be free()d after synchronise_srcu() */
 static LLIST_HEAD(mpam_garbage);
 
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+	WARN_ON_ONCE(!partid_max_published);
+}
+
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
 	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
@@ -374,12 +384,16 @@ static void mpam_class_destroy(struct mpam_class *class)
 	add_to_garbage(class);
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp);
+
 static void mpam_comp_destroy(struct mpam_component *comp)
 {
 	struct mpam_class *class = comp->class;
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	__destroy_component_cfg(comp);
+
 	list_del_rcu(&comp->class_list);
 	add_to_garbage(comp);
 
@@ -911,51 +925,90 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 	__mpam_write_reg(msc, reg, bm);
 }
 
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+				      struct mpam_config *cfg)
 {
 	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
 
-	mpam_assert_srcu_read_lock_held();
-
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
-	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
+		if (mpam_has_feature(mpam_feat_cpor_part, cfg))
+			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+		else
+			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+					      rprops->cpbm_wd);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops)) {
+		if (mpam_has_feature(mpam_feat_mbw_part, cfg))
+			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+		else
+			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+					      rprops->mbw_pbm_bits);
+	}
 
 	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
 		mpam_write_partsel_reg(msc, MBW_MIN, 0);
 
-	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
-		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
+		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
+			mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
+		else
+			mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+	}
 
 	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
 		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
 	mutex_unlock(&msc->part_sel_lock);
 }
 
+struct reprogram_ris {
+	struct mpam_msc_ris *ris;
+	struct mpam_config *cfg;
+};
+
+/* Call with MSC lock held */
+static int mpam_reprogram_ris(void *_arg)
+{
+	u16 partid, partid_max;
+	struct reprogram_ris *arg = _arg;
+	struct mpam_msc_ris *ris = arg->ris;
+	struct mpam_config *cfg = arg->cfg;
+
+	if (ris->in_reset_state)
+		return 0;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid <= partid_max; partid++)
+		mpam_reprogram_ris_partid(ris, partid, cfg);
+
+	return 0;
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible.
  */
 static int mpam_reset_ris(void *arg)
 {
-	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
+	struct reprogram_ris reprogram_arg;
+	struct mpam_config empty_cfg = { 0 };
 
 	if (ris->in_reset_state)
 		return 0;
 
-	spin_lock(&partid_max_lock);
-	partid_max = mpam_partid_max;
-	spin_unlock(&partid_max_lock);
-	for (partid = 0; partid < partid_max; partid++)
-		mpam_reset_ris_partid(ris, partid);
+	reprogram_arg.ris = ris;
+	reprogram_arg.cfg = &empty_cfg;
+
+	mpam_reprogram_ris(&reprogram_arg);
 
 	return 0;
 }
@@ -986,13 +1039,11 @@ static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 {
-	int idx;
 	struct mpam_msc_ris *ris;
 
 	mpam_assert_srcu_read_lock_held();
 
 	mpam_mon_sel_outer_lock(msc);
-	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
 		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
@@ -1002,10 +1053,42 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 */
 		ris->in_reset_state = online;
 	}
-	srcu_read_unlock(&mpam_srcu, idx);
 	mpam_mon_sel_outer_unlock(msc);
 }
 
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+	u16 partid;
+	bool reset;
+	struct mpam_config *cfg;
+	struct mpam_msc_ris *ris;
+
+	/*
+	 * No lock for mpam_partid_max as partid_max_published has been
+	 * set by mpam_enabled(), so the values can no longer change.
+	 */
+	mpam_assert_partid_sizes_fixed();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
+		if (!mpam_is_enabled() && !ris->in_reset_state) {
+			mpam_touch_msc(msc, &mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+			continue;
+		}
+
+		reset = true;
+		for (partid = 0; partid <= mpam_partid_max; partid++) {
+			cfg = &ris->vmsc->comp->cfg[partid];
+			if (cfg->features)
+				reset = false;
+
+			mpam_reprogram_ris_partid(ris, partid, cfg);
+		}
+		ris->in_reset_state = reset;
+	}
+}
+
 static void _enable_percpu_irq(void *_irq)
 {
 	int *irq = _irq;
@@ -1027,7 +1110,7 @@ static int mpam_cpu_online(unsigned int cpu)
 			_enable_percpu_irq(&msc->reenable_error_ppi);
 
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
-			mpam_reset_msc(msc, true);
+			mpam_reprogram_msc(msc);
 	}
 	srcu_read_unlock(&mpam_srcu, idx);
 
@@ -1807,6 +1890,45 @@ static void mpam_unregister_irqs(void)
 	cpus_read_unlock();
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+	add_to_garbage(comp->cfg);
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+	mpam_assert_partid_sizes_fixed();
+
+	if (comp->cfg)
+		return 0;
+
+	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+	if (!comp->cfg)
+		return -ENOMEM;
+	init_garbage(comp->cfg);
+
+	return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+	int err = 0;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list) {
+			err = __allocate_component_cfg(comp);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void mpam_enable_once(void)
 {
 	int err;
@@ -1826,12 +1948,21 @@ static void mpam_enable_once(void)
 	 */
 	cpus_read_lock();
 	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
+	do {
+		mpam_enable_merge_features(&mpam_classes);
 
-	err = mpam_register_irqs();
-	if (err)
-		pr_warn("Failed to register irqs: %d\n", err);
+		err = mpam_register_irqs();
+		if (err) {
+			pr_warn("Failed to register irqs: %d\n", err);
+			break;
+		}
 
+		err = mpam_allocate_config();
+		if (err) {
+			pr_err("Failed to allocate configuration arrays.\n");
+			break;
+		}
+	} while (0);
 	mutex_unlock(&mpam_list_lock);
 	cpus_read_unlock();
 
@@ -1856,6 +1987,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
 
 	might_sleep();
 	lockdep_assert_cpus_held();
+	mpam_assert_partid_sizes_fixed();
+
+	memset(comp->cfg, 0, (mpam_partid_max + 1) * sizeof(*comp->cfg));
 
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
@@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
 		mpam_enable_once();
 }
 
+struct mpam_write_config_arg {
+	struct mpam_msc_ris *ris;
+	struct mpam_component *comp;
+	u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+	struct mpam_write_config_arg *c = arg;
+
+	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+	return 0;
+}
+
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+	if (mpam_has_feature(feature, newcfg) &&			\
+	    (newcfg)->member != (cfg)->member) {			\
+		(cfg)->member = (newcfg)->member;			\
+		cfg->features |= (1 << feature);			\
+									\
+		(changes) |= (1 << feature);				\
+	}								\
+} while (0)
+
+static mpam_features_t mpam_update_config(struct mpam_config *cfg,
+					  const struct mpam_config *newcfg)
+{
+	mpam_features_t changes = 0;
+
+	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
+	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
+	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
+
+	return changes;
+}
+
+/* TODO: split into write_config/sync_config */
+/* TODO: add config_dirty bitmap to drive sync_config */
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg)
+{
+	struct mpam_write_config_arg arg;
+	struct mpam_msc_ris *ris;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc *msc;
+	int idx;
+
+	lockdep_assert_cpus_held();
+
+	/* Don't pass in the current config! */
+	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+	if (!mpam_update_config(&comp->cfg[partid], cfg))
+		return 0;
+
+	arg.comp = comp;
+	arg.partid = partid;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			arg.ris = ris;
+			mpam_touch_msc(msc, __write_config, &arg);
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
+	return 0;
+}
+
 /*
  * MSC that are hidden under caches are not created as platform devices
  * as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 3476ee97f8ac..70cba9f22746 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -191,11 +191,7 @@ struct mpam_props {
 	u16			num_mbwu_mon;
 };
 
-static inline bool mpam_has_feature(enum mpam_device_features feat,
-				    struct mpam_props *props)
-{
-	return (1 << feat) & props->features;
-}
+#define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
 
 static inline void mpam_set_feature(enum mpam_device_features feat,
 				    struct mpam_props *props)
@@ -226,6 +222,17 @@ struct mpam_class {
 	struct mpam_garbage	garbage;
 };
 
+struct mpam_config {
+	/* Which configuration values are valid. 0 is used for reset */
+	mpam_features_t		features;
+
+	u32	cpbm;
+	u32	mbw_pbm;
+	u16	mbw_max;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_component {
 	u32			comp_id;
 
@@ -234,6 +241,12 @@ struct mpam_component {
 
 	cpumask_t		affinity;
 
+	/*
+	 * Array of configuration values, indexed by partid.
+	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+	 */
+	struct mpam_config	*cfg;
+
 	/* member of mpam_class:components */
 	struct list_head	class_list;
 
@@ -298,6 +311,9 @@ extern u8 mpam_pmg_max;
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
 
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.20.1
- * Re: [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-08-28 16:13   ` Ben Horgan
  2025-09-10 19:29     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-28 16:13 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> When CPUs come online the original configuration should be restored.
> Once the maximum partid is known, allocate a configuration array for
> each component, and reprogram each RIS configuration from this.
> 
> The MPAM spec describes how multiple controls can interact. To prevent
> this happening by accident, always reset controls that don't have a
> valid configuration. This allows the same helper to be used for
> configuration and reset.
What in particular are you worried about here? It does seem a bit
wasteful that to update a single control in a ris all the controls in
that ris are updated. This is needed for reset and restore, but do we
really want it if we are just changing one control, e.g. the cache portion
bitmap?
> 
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Added a comment about the ordering around max_partid.
>  * Allocate configurations after interrupts are registered to reduce churn.
>  * Added mpam_assert_partid_sizes_fixed();
> ---
>  drivers/resctrl/mpam_devices.c  | 253 +++++++++++++++++++++++++++++---
>  drivers/resctrl/mpam_internal.h |  26 +++-
>  2 files changed, 251 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index b424af666b1e..8f6df2406c22 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -112,6 +112,16 @@ LIST_HEAD(mpam_classes);
>  /* List of all objects that can be free()d after synchronise_srcu() */
>  static LLIST_HEAD(mpam_garbage);
>  
> +/*
> + * Once mpam is enabled, new requestors cannot further reduce the available
> + * partid. Assert that the size is fixed, and new requestors will be turned
> + * away.
> + */
> +static void mpam_assert_partid_sizes_fixed(void)
> +{
> +	WARN_ON_ONCE(!partid_max_published);
> +}
> +
>  static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
>  {
>  	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
> @@ -374,12 +384,16 @@ static void mpam_class_destroy(struct mpam_class *class)
>  	add_to_garbage(class);
>  }
>  
> +static void __destroy_component_cfg(struct mpam_component *comp);
> +
>  static void mpam_comp_destroy(struct mpam_component *comp)
>  {
>  	struct mpam_class *class = comp->class;
>  
>  	lockdep_assert_held(&mpam_list_lock);
>  
> +	__destroy_component_cfg(comp);
> +
>  	list_del_rcu(&comp->class_list);
>  	add_to_garbage(comp);
>  
> @@ -911,51 +925,90 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>  	__mpam_write_reg(msc, reg, bm);
>  }
>  
> -static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
> +/* Called via IPI. Call while holding an SRCU reference */
> +static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> +				      struct mpam_config *cfg)
>  {
>  	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
>  	struct mpam_msc *msc = ris->vmsc->msc;
>  	struct mpam_props *rprops = &ris->props;
>  
> -	mpam_assert_srcu_read_lock_held();
> -
>  	mutex_lock(&msc->part_sel_lock);
>  	__mpam_part_sel(ris->ris_idx, partid, msc);
>  
> -	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
> +	if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
> +		if (mpam_has_feature(mpam_feat_cpor_part, cfg))
> +			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> +		else
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
> +					      rprops->cpbm_wd);
> +	}
>  
> -	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
> -		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
> +	if (mpam_has_feature(mpam_feat_mbw_part, rprops)) {
> +		if (mpam_has_feature(mpam_feat_mbw_part, cfg))
> +			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
> +		else
> +			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
> +					      rprops->mbw_pbm_bits);
> +	}
>  
>  	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
>  		mpam_write_partsel_reg(msc, MBW_MIN, 0);
>  
> -	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
> -		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
> +	if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
> +		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
> +			mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
> +		else
> +			mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
> +	}
>  
>  	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
>  		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
>  	mutex_unlock(&msc->part_sel_lock);
>  }
>  
> +struct reprogram_ris {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_config *cfg;
> +};
> +
> +/* Call with MSC lock held */
> +static int mpam_reprogram_ris(void *_arg)
> +{
> +	u16 partid, partid_max;
> +	struct reprogram_ris *arg = _arg;
> +	struct mpam_msc_ris *ris = arg->ris;
> +	struct mpam_config *cfg = arg->cfg;
> +
> +	if (ris->in_reset_state)
> +		return 0;
> +
> +	spin_lock(&partid_max_lock);
> +	partid_max = mpam_partid_max;
> +	spin_unlock(&partid_max_lock);
> +	for (partid = 0; partid <= partid_max; partid++)
> +		mpam_reprogram_ris_partid(ris, partid, cfg);
> +
> +	return 0;
> +}
> +
>  /*
>   * Called via smp_call_on_cpu() to prevent migration, while still being
>   * pre-emptible.
>   */
>  static int mpam_reset_ris(void *arg)
>  {
> -	u16 partid, partid_max;
>  	struct mpam_msc_ris *ris = arg;
> +	struct reprogram_ris reprogram_arg;
> +	struct mpam_config empty_cfg = { 0 };
>  
>  	if (ris->in_reset_state)
>  		return 0;
>  
> -	spin_lock(&partid_max_lock);
> -	partid_max = mpam_partid_max;
> -	spin_unlock(&partid_max_lock);
> -	for (partid = 0; partid < partid_max; partid++)
> -		mpam_reset_ris_partid(ris, partid);
> +	reprogram_arg.ris = ris;
> +	reprogram_arg.cfg = &empty_cfg;
> +
> +	mpam_reprogram_ris(&reprogram_arg);
>  
>  	return 0;
>  }
> @@ -986,13 +1039,11 @@ static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
>  
>  static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  {
> -	int idx;
>  	struct mpam_msc_ris *ris;
>  
>  	mpam_assert_srcu_read_lock_held();
>  
>  	mpam_mon_sel_outer_lock(msc);
> -	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
>  		mpam_touch_msc(msc, &mpam_reset_ris, ris);
>  
> @@ -1002,10 +1053,42 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  		 */
>  		ris->in_reset_state = online;
>  	}
> -	srcu_read_unlock(&mpam_srcu, idx);
>  	mpam_mon_sel_outer_unlock(msc);
>  }
>  
> +static void mpam_reprogram_msc(struct mpam_msc *msc)
> +{
> +	u16 partid;
> +	bool reset;
> +	struct mpam_config *cfg;
> +	struct mpam_msc_ris *ris;
> +
> +	/*
> +	 * No lock for mpam_partid_max as partid_max_published has been
> +	 * set by mpam_enabled(), so the values can no longer change.
> +	 */
> +	mpam_assert_partid_sizes_fixed();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
> +		if (!mpam_is_enabled() && !ris->in_reset_state) {
> +			mpam_touch_msc(msc, &mpam_reset_ris, ris);
> +			ris->in_reset_state = true;
> +			continue;
> +		}
> +
> +		reset = true;
> +		for (partid = 0; partid <= mpam_partid_max; partid++) {
> +			cfg = &ris->vmsc->comp->cfg[partid];
> +			if (cfg->features)
> +				reset = false;
> +
> +			mpam_reprogram_ris_partid(ris, partid, cfg);
> +		}
> +		ris->in_reset_state = reset;
> +	}
> +}
> +
>  static void _enable_percpu_irq(void *_irq)
>  {
>  	int *irq = _irq;
> @@ -1027,7 +1110,7 @@ static int mpam_cpu_online(unsigned int cpu)
>  			_enable_percpu_irq(&msc->reenable_error_ppi);
>  
>  		if (atomic_fetch_inc(&msc->online_refs) == 0)
> -			mpam_reset_msc(msc, true);
> +			mpam_reprogram_msc(msc);
>  	}
>  	srcu_read_unlock(&mpam_srcu, idx);
>  
> @@ -1807,6 +1890,45 @@ static void mpam_unregister_irqs(void)
>  	cpus_read_unlock();
>  }
>  
> +static void __destroy_component_cfg(struct mpam_component *comp)
> +{
> +	add_to_garbage(comp->cfg);
> +}
> +
> +static int __allocate_component_cfg(struct mpam_component *comp)
> +{
> +	mpam_assert_partid_sizes_fixed();
> +
> +	if (comp->cfg)
> +		return 0;
> +
> +	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
> +	if (!comp->cfg)
> +		return -ENOMEM;
> +	init_garbage(comp->cfg);
> +
> +	return 0;
> +}
> +
> +static int mpam_allocate_config(void)
> +{
> +	int err = 0;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		list_for_each_entry(comp, &class->components, class_list) {
> +			err = __allocate_component_cfg(comp);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static void mpam_enable_once(void)
>  {
>  	int err;
> @@ -1826,12 +1948,21 @@ static void mpam_enable_once(void)
>  	 */
>  	cpus_read_lock();
>  	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> +	do {
> +		mpam_enable_merge_features(&mpam_classes);
>  
> -	err = mpam_register_irqs();
> -	if (err)
> -		pr_warn("Failed to register irqs: %d\n", err);
> +		err = mpam_register_irqs();
> +		if (err) {
> +			pr_warn("Failed to register irqs: %d\n", err);
> +			break;
> +		}
>  
> +		err = mpam_allocate_config();
> +		if (err) {
> +			pr_err("Failed to allocate configuration arrays.\n");
> +			break;
> +		}
> +	} while (0);
>  	mutex_unlock(&mpam_list_lock);
>  	cpus_read_unlock();
>  
> @@ -1856,6 +1987,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
>  
>  	might_sleep();
>  	lockdep_assert_cpus_held();
> +	mpam_assert_partid_sizes_fixed();
> +
> +	memset(comp->cfg, 0, (mpam_partid_max + 1) * sizeof(*comp->cfg));
>  
>  	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> @@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
>  		mpam_enable_once();
>  }
>  
> +struct mpam_write_config_arg {
> +	struct mpam_msc_ris *ris;
> +	struct mpam_component *comp;
> +	u16 partid;
> +};
> +
> +static int __write_config(void *arg)
> +{
> +	struct mpam_write_config_arg *c = arg;
> +
> +	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
> +
> +	return 0;
> +}
> +
> +#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
> +	if (mpam_has_feature(feature, newcfg) &&			\
> +	    (newcfg)->member != (cfg)->member) {			\
> +		(cfg)->member = (newcfg)->member;			\
> +		cfg->features |= (1 << feature);			\
> +									\
> +		(changes) |= (1 << feature);				\
> +	}								\
> +} while (0)
> +
> +static mpam_features_t mpam_update_config(struct mpam_config *cfg,
> +					  const struct mpam_config *newcfg)
> +{
> +	mpam_features_t changes = 0;
> +
> +	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
> +	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
> +	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
> +
> +	return changes;
> +}
> +
> +/* TODO: split into write_config/sync_config */
> +/* TODO: add config_dirty bitmap to drive sync_config */
Any changes to come for these TODO comments?
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> +		      struct mpam_config *cfg)
> +{
> +	struct mpam_write_config_arg arg;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc *msc;
> +	int idx;
> +
> +	lockdep_assert_cpus_held();
> +
> +	/* Don't pass in the current config! */
> +	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
> +
> +	if (!mpam_update_config(&comp->cfg[partid], cfg))
> +		return 0;
> +
> +	arg.comp = comp;
> +	arg.partid = partid;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			arg.ris = ris;
> +			mpam_touch_msc(msc, __write_config, &arg);
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
> +	return 0;
> +}
> +
>  /*
>   * MSC that are hidden under caches are not created as platform devices
>   * as there is no cache driver. Caches are also special-cased in
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 3476ee97f8ac..70cba9f22746 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -191,11 +191,7 @@ struct mpam_props {
>  	u16			num_mbwu_mon;
>  };
>  
> -static inline bool mpam_has_feature(enum mpam_device_features feat,
> -				    struct mpam_props *props)
> -{
> -	return (1 << feat) & props->features;
> -}
> +#define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
>  
>  static inline void mpam_set_feature(enum mpam_device_features feat,
>  				    struct mpam_props *props)
> @@ -226,6 +222,17 @@ struct mpam_class {
>  	struct mpam_garbage	garbage;
>  };
>  
> +struct mpam_config {
> +	/* Which configuration values are valid. 0 is used for reset */
> +	mpam_features_t		features;
> +
> +	u32	cpbm;
> +	u32	mbw_pbm;
> +	u16	mbw_max;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
>  struct mpam_component {
>  	u32			comp_id;
>  
> @@ -234,6 +241,12 @@ struct mpam_component {
>  
>  	cpumask_t		affinity;
>  
> +	/*
> +	 * Array of configuration values, indexed by partid.
> +	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
> +	 */
> +	struct mpam_config	*cfg;
> +
>  	/* member of mpam_class:components */
>  	struct list_head	class_list;
>  
> @@ -298,6 +311,9 @@ extern u8 mpam_pmg_max;
>  void mpam_enable(struct work_struct *work);
>  void mpam_disable(struct work_struct *work);
>  
> +int mpam_apply_config(struct mpam_component *comp, u16 partid,
> +		      struct mpam_config *cfg);
> +
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
>  
Thanks,
Ben
- * Re: [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-08-28 16:13   ` Ben Horgan
@ 2025-09-10 19:29     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:29 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 28/08/2025 17:13, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> When CPUs come online the original configuration should be restored.
>> Once the maximum partid is known, allocate a configuration array for
>> each component, and reprogram each RIS configuration from this.
>>
>> The MPAM spec describes how multiple controls can interact. To prevent
>> this happening by accident, always reset controls that don't have a
>> valid configuration. This allows the same helper to be used for
>> configuration and reset.
> What in particular are you worried about here?
'Other' controls being left in an unknown state, which would make the one you did set
useless.
In a sane world, the thing writing the controls would write all the supported registers.
In practice, resctrl only knows about bitmaps. The glue code could provide all the other
values, but I figured it was better for the driver to do it. I'm sure they'll add other
control types, and it would be a nuisance to update multiple callers if there is ever more
than one.
Another angle here is that mismatched components/devices mean a control type could be
hidden if it's not available everywhere, so the caller may not be aware of all the
controls it was supposed to provide.
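To illustrate the "write it if valid, otherwise reset it" rule, here is a
minimal user-space sketch. It is not the driver code: the sketch_* names, the
two-feature enum, the in-memory "registers" and the 0xffff reset value are all
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; the driver's enum mpam_device_features differs. */
enum sketch_feat {
	SKETCH_FEAT_CPOR,
	SKETCH_FEAT_MBW_MAX,
};

/* Requested configuration: 'features' says which values are valid. */
struct sketch_cfg {
	uint32_t features;
	uint32_t cpbm;
	uint16_t mbw_max;
};

/* Toy model of one RIS: what it implements plus its register contents. */
struct sketch_ris {
	uint32_t features;
	uint8_t cpbm_wd;	/* portion bitmap width, limited to 1..32 here */
	uint32_t cpbm_reg;
	uint16_t mbw_max_reg;
};

#define SKETCH_MBW_MAX_RESET	0xffffu	/* assumed "no limit" value */

#define sketch_has(feat, x)	(((uint32_t)1 << (feat)) & (x)->features)

/* All implemented portions enabled: the least restrictive bitmap. */
static uint32_t sketch_full_bitmap(uint8_t wd)
{
	assert(wd >= 1 && wd <= 32);
	return wd == 32 ? UINT32_MAX : (((uint32_t)1 << wd) - 1);
}

/* Every implemented control is written: from cfg if valid, else reset. */
static void sketch_program(struct sketch_ris *ris, const struct sketch_cfg *cfg)
{
	if (sketch_has(SKETCH_FEAT_CPOR, ris))
		ris->cpbm_reg = sketch_has(SKETCH_FEAT_CPOR, cfg) ?
				cfg->cpbm : sketch_full_bitmap(ris->cpbm_wd);

	if (sketch_has(SKETCH_FEAT_MBW_MAX, ris))
		ris->mbw_max_reg = sketch_has(SKETCH_FEAT_MBW_MAX, cfg) ?
				   cfg->mbw_max : SKETCH_MBW_MAX_RESET;
}
```

A caller that only knows about the portion bitmap still leaves the bandwidth
limit in a known state, because the control it did not supply is reset.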
> It does seem a bit
> wasteful that to update a single control in a ris all the controls in
> that ris are updated.
I don't think anyone would ever build something that supports all these. One is most
likely, pushing to three for platforms that support CPOR and CMIN/MAX. By the time you've
taken the IPI to access a cache MSC, the cost of an additional register access is negligible.
> This is needed for reset and restore, but do we
> really want it if we are just changing one control, e.g. the cache portion
> bitmap?
The original config has been blown away by this point, but we do have the bitmap of what
changed. I guess this is an emergent effect of __write_config() originating from the reset
helper, and the 'empty config' being used to reset devices.
I'd like to keep it as a single function that actually touches these registers.
I'll change this to generate a 'maximal' config instead of an empty one for reset, which
pulls the policy on those values out and drops the '0 for reset'.
Huh ... that is what the ALL_FEATURES macro you pointed out was for ...
I suspect it was the bitmaps that are larger than a u32 that made this hard.
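As a rough user-space sketch of what a 'maximal' config for reset could look
like (an illustration only, not code from this series): the maxcfg_* names,
the three-feature mask and the 0xffff bandwidth value are assumptions, and the
sketch deliberately handles only bitmaps of up to 32 bits, which is the easy
case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for the driver's enum. */
enum maxcfg_feat {
	MAXCFG_FEAT_CPOR,
	MAXCFG_FEAT_MBW_PART,
	MAXCFG_FEAT_MBW_MAX,
};

/* Hypothetical mask of every control the config structure can describe. */
#define MAXCFG_ALL_FEATURES	((1u << MAXCFG_FEAT_CPOR) |	\
				 (1u << MAXCFG_FEAT_MBW_PART) |	\
				 (1u << MAXCFG_FEAT_MBW_MAX))

#define MAXCFG_MBW_MAX_RESET	0xffffu	/* assumed "no limit" value */

struct maxcfg_props {
	uint32_t features;
	uint8_t cpbm_wd;	/* widths above 32 are not handled here */
	uint8_t mbw_pbm_bits;
};

struct maxcfg_cfg {
	uint32_t features;
	uint32_t cpbm;
	uint32_t mbw_pbm;
	uint16_t mbw_max;
};

/* 'wd' low bits set; zero when the bitmap is not implemented. */
static uint32_t maxcfg_ones(uint8_t wd)
{
	assert(wd <= 32);
	if (wd == 0)
		return 0;
	return wd == 32 ? UINT32_MAX : (((uint32_t)1 << wd) - 1);
}

/* Build the least restrictive config the hardware properties allow. */
static void maxcfg_fill(const struct maxcfg_props *props, struct maxcfg_cfg *cfg)
{
	cfg->features = props->features & MAXCFG_ALL_FEATURES;
	cfg->cpbm = maxcfg_ones(props->cpbm_wd);
	cfg->mbw_pbm = maxcfg_ones(props->mbw_pbm_bits);
	cfg->mbw_max = MAXCFG_MBW_MAX_RESET;
}
```

With a config like this, reset becomes an ordinary "apply this config"
operation, and the choice of reset values lives in one place.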
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index b424af666b1e..8f6df2406c22 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
>>  		mpam_enable_once();
>>  }
>>  
>> +struct mpam_write_config_arg {
>> +	struct mpam_msc_ris *ris;
>> +	struct mpam_component *comp;
>> +	u16 partid;
>> +};
>> +
>> +static int __write_config(void *arg)
>> +{
>> +	struct mpam_write_config_arg *c = arg;
>> +
>> +	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
>> +
>> +	return 0;
>> +}
>> +
>> +#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
>> +	if (mpam_has_feature(feature, newcfg) &&			\
>> +	    (newcfg)->member != (cfg)->member) {			\
>> +		(cfg)->member = (newcfg)->member;			\
>> +		cfg->features |= (1 << feature);			\
>> +									\
>> +		(changes) |= (1 << feature);				\
>> +	}								\
>> +} while (0)
>> +
>> +static mpam_features_t mpam_update_config(struct mpam_config *cfg,
>> +					  const struct mpam_config *newcfg)
>> +{
>> +	mpam_features_t changes = 0;
>> +
>> +	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
>> +	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
>> +	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
>> +
>> +	return changes;
>> +}
>> +
>> +/* TODO: split into write_config/sync_config */
>> +/* TODO: add config_dirty bitmap to drive sync_config */
> Any changes to come for these TODO comments?
No time. The dirty bitmap was to help with the problem you highlighted above. Separating
into write/sync was to make it easier to support the firmware-backed thing, which can be
a problem for another day.
I'll drop these.
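For reference, a minimal user-space sketch of the dirty-bitmap idea, where
only the controls that changed get written. Nothing like this exists in the
series; the dirty_* names and the toy register model are hypothetical. Unlike
the macro in the patch, a value also counts as changed when it was not valid
before.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for the driver's enum. */
enum dirty_feat {
	DIRTY_FEAT_CPOR,
	DIRTY_FEAT_MBW_MAX,
};

struct dirty_cfg {
	uint32_t features;	/* which values below are valid */
	uint32_t cpbm;
	uint16_t mbw_max;
};

/* Toy register file; 'writes' counts register accesses. */
struct dirty_regs {
	uint32_t cpbm;
	uint16_t mbw_max;
	unsigned int writes;
};

/* Merge 'want' into 'cur' and return a bitmap of the controls that changed. */
static uint32_t dirty_update(struct dirty_cfg *cur, const struct dirty_cfg *want)
{
	uint32_t changes = 0;
	uint32_t bit;

	bit = 1u << DIRTY_FEAT_CPOR;
	if ((want->features & bit) &&
	    (!(cur->features & bit) || want->cpbm != cur->cpbm)) {
		cur->cpbm = want->cpbm;
		cur->features |= bit;
		changes |= bit;
	}

	bit = 1u << DIRTY_FEAT_MBW_MAX;
	if ((want->features & bit) &&
	    (!(cur->features & bit) || want->mbw_max != cur->mbw_max)) {
		cur->mbw_max = want->mbw_max;
		cur->features |= bit;
		changes |= bit;
	}

	return changes;
}

/* Write only the controls flagged in 'dirty'. */
static void dirty_sync(struct dirty_regs *regs, const struct dirty_cfg *cur,
		       uint32_t dirty)
{
	if (dirty & (1u << DIRTY_FEAT_CPOR)) {
		regs->cpbm = cur->cpbm;
		regs->writes++;
	}
	if (dirty & (1u << DIRTY_FEAT_MBW_MAX)) {
		regs->mbw_max = cur->mbw_max;
		regs->writes++;
	}
}
```

The trade-off is the one discussed above: fewer register writes per update,
at the cost of relying on the untouched controls already being in a known
state.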
Thanks,
James
- * [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (23 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-28 10:11   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
                   ` (42 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew,
	Zeng Heng, Dave Martin
MPAM supports more features than are going to be exposed to resctrl.
For partids other than 0, the reset values of these controls aren't
known.
Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.
PARTID narrowing allows MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce the partid_max to make it usable. This allows us
to map a PARTID to itself.
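As a rough illustration of the narrowing rule described above (a user-space
sketch with hypothetical narrow_* names, not code from this patch): clamping
the usable PARTID range to the internal range means every usable PARTID has
an internal PARTID with the same value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest PARTID we can use once a narrowing MSC is taken into account. */
static uint16_t narrow_usable_partid_max(uint16_t partid_max,
					 bool has_narrowing,
					 uint16_t intpartid_max)
{
	if (has_narrowing && intpartid_max < partid_max)
		return intpartid_max;

	return partid_max;
}

/* With the clamp applied, the PARTID to internal PARTID map is the identity. */
static uint16_t narrow_intpartid(uint16_t partid, uint16_t usable_partid_max)
{
	assert(partid <= usable_partid_max);
	return partid;
}
```

For example, a requestor range of 0..63 combined with an MSC that only stores
16 internal configurations leaves PARTIDs 0..15 usable.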
CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 175 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  16 ++-
 2 files changed, 189 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8f6df2406c22..aedd743d6827 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -213,6 +213,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
 	__mpam_part_sel_raw(partsel, msc);
 }
 
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+		      MPAMCFG_PART_SEL_INTERNAL;
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
 int mpam_register_requestor(u16 partid_max, u8 pmg_max)
 {
 	int err = 0;
@@ -743,10 +752,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 	int err;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *props = &ris->props;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	lockdep_assert_held(&msc->probe_lock);
 	lockdep_assert_held(&msc->part_sel_lock);
 
+	/* Cache Capacity Partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+		if (props->cmax_wd &&
+		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+
+		if (props->cassoc_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cassoc, props);
+	}
+
 	/* Cache Portion partitioning */
 	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
 		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -769,6 +803,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
 		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
 			mpam_set_feature(mpam_feat_mbw_max, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_min, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_prop, props);
+	}
+
+	/* Priority partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+		u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+		props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+		if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_intpri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+		}
+
+		props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+		if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_dspri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+		}
 	}
 
 	/* Performance Monitoring */
@@ -832,6 +891,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			 */
 		}
 	}
+
+	/*
+	 * RIS with PARTID narrowing don't have enough storage for one
+	 * configuration per PARTID. If these are in a class we could use,
+	 * reduce the supported partid_max to match the number of intpartid.
+	 * If the class is unknown, just ignore it.
+	 */
+	if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+	    class->type != MPAM_CLASS_UNKNOWN) {
+		u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+		u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+		mpam_set_feature(mpam_feat_partid_nrw, props);
+		msc->partid_max = min(msc->partid_max, partid_max);
+	}
 }
 
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -929,13 +1003,29 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 				      struct mpam_config *cfg)
 {
+	u32 pri_val = 0;
+	u16 cmax = MPAMCFG_CMAX_CMAX;
 	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
+	u16 dspri = GENMASK(rprops->dspri_wd, 0);
+	u16 intpri = GENMASK(rprops->intpri_wd, 0);
 
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
+	if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+		/* Update the intpartid mapping */
+		mpam_write_partsel_reg(msc, INTPARTID,
+				       MPAMCFG_INTPARTID_INTERNAL | partid);
+
+		/*
+		 * Then switch to the 'internal' partid to update the
+		 * configuration.
+		 */
+		__mpam_intpart_sel(ris->ris_idx, partid, msc);
+	}
+
 	if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
 		if (mpam_has_feature(mpam_feat_cpor_part, cfg))
 			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
@@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 
 	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
 		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+		mpam_write_partsel_reg(msc, CMAX, cmax);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+		mpam_write_partsel_reg(msc, CMIN, 0);
+
+	if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+	    mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+		/* aces high? */
+		if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+			intpri = 0;
+		if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+			dspri = 0;
+
+		if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+		if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+		mpam_write_partsel_reg(msc, PRI, pri_val);
+	}
+
 	mutex_unlock(&msc->part_sel_lock);
 }
 
@@ -1529,6 +1642,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
 	return false;
 }
 
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+		return true;
+	return false;
+}
+
 #define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
 	helper(parent) &&						\
 	((helper(child) && (parent)->field != (child)->field) ||	\
@@ -1583,6 +1706,23 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
 	}
 
+	if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+		parent->cmax_wd = child->cmax_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+				     cmax_wd, alias)) {
+		pr_debug("%s took the min cmax_wd\n", __func__);
+		parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+		parent->cassoc_wd = child->cassoc_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+				   cassoc_wd, alias)) {
+		pr_debug("%s cleared cassoc_wd\n", __func__);
+		mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
+		parent->cassoc_wd = 0;
+	}
+
 	/* For num properties, take the minimum */
 	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
 		parent->num_csu_mon = child->num_csu_mon;
@@ -1600,6 +1740,41 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
 	}
 
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+		parent->intpri_wd = child->intpri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+				   intpri_wd, alias)) {
+		pr_debug("%s took the min intpri_wd\n", __func__);
+		parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+		parent->dspri_wd = child->dspri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+				   dspri_wd, alias)) {
+		pr_debug("%s took the min dspri_wd\n", __func__);
+		parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+	}
+
+	/* TODO: alias support for these two */
+	/* {int,ds}pri may not have differing 0-low behaviour */
+	if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+		pr_debug("%s cleared intpri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
+		mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
+	}
+	if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+		pr_debug("%s cleared dspri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
+		mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
+	}
+
 	if (alias) {
 		/* Merge features for aliased resources */
 		parent->features |= child->features;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 70cba9f22746..23445aedbabd 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -157,16 +157,23 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
  * When we compact the supported features, we don't care what they are.
  * Storing them as a bitmap makes life easy.
  */
-typedef u16 mpam_features_t;
+typedef u32 mpam_features_t;
 
 /* Bits for mpam_features_t */
 enum mpam_device_features {
-	mpam_feat_ccap_part = 0,
+	mpam_feat_cmax_softlim,
+	mpam_feat_cmax_cmax,
+	mpam_feat_cmax_cmin,
+	mpam_feat_cmax_cassoc,
 	mpam_feat_cpor_part,
 	mpam_feat_mbw_part,
 	mpam_feat_mbw_min,
 	mpam_feat_mbw_max,
 	mpam_feat_mbw_prop,
+	mpam_feat_intpri_part,
+	mpam_feat_intpri_part_0_low,
+	mpam_feat_dspri_part,
+	mpam_feat_dspri_part_0_low,
 	mpam_feat_msmon,
 	mpam_feat_msmon_csu,
 	mpam_feat_msmon_csu_capture,
@@ -176,6 +183,7 @@ enum mpam_device_features {
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
 	mpam_feat_msmon_capt,
+	mpam_feat_partid_nrw,
 	MPAM_FEATURE_LAST,
 };
 static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
@@ -187,6 +195,10 @@ struct mpam_props {
 	u16			cpbm_wd;
 	u16			mbw_pbm_bits;
 	u16			bwa_wd;
+	u16			cmax_wd;
+	u16			cassoc_wd;
+	u16			intpri_wd;
+	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
 };
-- 
2.20.1
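For readers following the PARTID-narrowing hunks, the sketch below models the two steps in plain user-space C: clamp the usable partid_max to the narrower internal PARTID space, then build a selector value with the 'internal' bit set. The helper names are hypothetical, and the bit positions (PARTID_SEL in bits 15:0, INTERNAL at bit 16, RIS in bits 27:24) are assumptions for illustration rather than values taken from this patch; check them against mpam_internal.h and the MPAM specification.

```c
#include <stdint.h>

/* Assumed layout of MPAMCFG_PART_SEL; illustrative only. */
#define SKETCH_PART_SEL_PARTID_MASK	0xffffu		/* bits 15:0 */
#define SKETCH_PART_SEL_INTERNAL	(1u << 16)	/* select an intpartid */
#define SKETCH_PART_SEL_RIS_SHIFT	24		/* bits 27:24 */
#define SKETCH_PART_SEL_RIS_MASK	0xfu

/* Hypothetical helper: an MSC can use no more PARTIDs than it has intpartids. */
static uint16_t sketch_clamp_partid_max(uint16_t partid_max, uint16_t intpartid_max)
{
	return partid_max < intpartid_max ? partid_max : intpartid_max;
}

/* Hypothetical helper: compose a selector that targets an internal PARTID. */
static uint32_t sketch_intpart_sel(uint8_t ris_idx, uint16_t intpartid)
{
	return ((uint32_t)(ris_idx & SKETCH_PART_SEL_RIS_MASK) << SKETCH_PART_SEL_RIS_SHIFT) |
	       (intpartid & SKETCH_PART_SEL_PARTID_MASK) |
	       SKETCH_PART_SEL_INTERNAL;
}
```

Because the clamp keeps every PARTID within the intpartid range, the same number can be used for both, which is what lets the patch map a PARTID to itself.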
- * Re: [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
  2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-08-28 10:11   ` Ben Horgan
  2025-09-10 19:30     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-28 10:11 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Zeng Heng
Hi James,
On 8/22/25 16:30, James Morse wrote:
> MPAM supports more features than are going to be exposed to resctrl.
> For partid other than 0, the reset values of these controls isn't
> known.
> 
> Discover the rest of the features so they can be reset to avoid any
> side effects when resctrl is in use.
> 
> PARTID narrowing allows MSC/RIS to support less configuration space than
> is usable. If this feature is found on a class of device we are likely
> to use, then reduce the partid_max to make it usable. This allows us
> to map a PARTID to itself.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> CC: Zeng Heng <zengheng4@huawei.com>
> CC: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 175 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  16 ++-
>  2 files changed, 189 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 8f6df2406c22..aedd743d6827 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -213,6 +213,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
>  	__mpam_part_sel_raw(partsel, msc);
>  }
>  
> +static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
> +{
> +	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
> +		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
> +		      MPAMCFG_PART_SEL_INTERNAL;
> +
> +	__mpam_part_sel_raw(partsel, msc);
> +}
> +
>  int mpam_register_requestor(u16 partid_max, u8 pmg_max)
>  {
>  	int err = 0;
> @@ -743,10 +752,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  	int err;
>  	struct mpam_msc *msc = ris->vmsc->msc;
>  	struct mpam_props *props = &ris->props;
> +	struct mpam_class *class = ris->vmsc->comp->class;
>  
>  	lockdep_assert_held(&msc->probe_lock);
>  	lockdep_assert_held(&msc->part_sel_lock);
>  
> +	/* Cache Capacity Partitioning */
> +	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
> +		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
> +
> +		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
> +		if (props->cmax_wd &&
> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_softlim, props);
> +
> +		if (props->cmax_wd &&
> +		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_cmax, props);
> +
> +		if (props->cmax_wd &&
> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_cmin, props);
> +
> +		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
> +
> +		if (props->cassoc_wd &&
> +		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
> +			mpam_set_feature(mpam_feat_cmax_cassoc, props);
> +	}
> +
>  	/* Cache Portion partitioning */
>  	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
>  		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
> @@ -769,6 +803,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
>  		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
>  			mpam_set_feature(mpam_feat_mbw_max, props);
> +
> +		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
> +			mpam_set_feature(mpam_feat_mbw_min, props);
> +
> +		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
> +			mpam_set_feature(mpam_feat_mbw_prop, props);
> +	}
> +
> +	/* Priority partitioning */
> +	if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
> +		u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
> +
> +		props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
> +		if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
> +			mpam_set_feature(mpam_feat_intpri_part, props);
> +			if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
> +				mpam_set_feature(mpam_feat_intpri_part_0_low, props);
> +		}
> +
> +		props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
> +		if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
> +			mpam_set_feature(mpam_feat_dspri_part, props);
> +			if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
> +				mpam_set_feature(mpam_feat_dspri_part_0_low, props);
> +		}
>  	}
>  
>  	/* Performance Monitoring */
> @@ -832,6 +891,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  			 */
>  		}
>  	}
> +
> +	/*
> +	 * RIS with PARTID narrowing don't have enough storage for one
> +	 * configuration per PARTID. If these are in a class we could use,
> +	 * reduce the supported partid_max to match the number of intpartid.
> +	 * If the class is unknown, just ignore it.
> +	 */
> +	if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
> +	    class->type != MPAM_CLASS_UNKNOWN) {
> +		u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
> +		u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
> +
> +		mpam_set_feature(mpam_feat_partid_nrw, props);
> +		msc->partid_max = min(msc->partid_max, partid_max);
> +	}
>  }
>  
>  static int mpam_msc_hw_probe(struct mpam_msc *msc)
> @@ -929,13 +1003,29 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>  static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>  				      struct mpam_config *cfg)
>  {
> +	u32 pri_val = 0;
> +	u16 cmax = MPAMCFG_CMAX_CMAX;
>  	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
>  	struct mpam_msc *msc = ris->vmsc->msc;
>  	struct mpam_props *rprops = &ris->props;
> +	u16 dspri = GENMASK(rprops->dspri_wd, 0);
> +	u16 intpri = GENMASK(rprops->intpri_wd, 0);
>  
>  	mutex_lock(&msc->part_sel_lock);
>  	__mpam_part_sel(ris->ris_idx, partid, msc);
>  
> +	if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
> +		/* Update the intpartid mapping */
> +		mpam_write_partsel_reg(msc, INTPARTID,
> +				       MPAMCFG_INTPARTID_INTERNAL | partid);
> +
> +		/*
> +		 * Then switch to the 'internal' partid to update the
> +		 * configuration.
> +		 */
> +		__mpam_intpart_sel(ris->ris_idx, partid, msc);
> +	}
> +
>  	if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
>  		if (mpam_has_feature(mpam_feat_cpor_part, cfg))
>  			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
> @@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>  
>  	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
>  		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
> +
> +	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
> +		mpam_write_partsel_reg(msc, CMAX, cmax);
> +
> +	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
> +		mpam_write_partsel_reg(msc, CMIN, 0);
Missing reset for cmax_cassoc. I wonder if it makes sense to have
separate enums for partitioning features, which require reset, and the rest.
> +
> +	if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
> +	    mpam_has_feature(mpam_feat_dspri_part, rprops)) {
> +		/* aces high? */
> +		if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
> +			intpri = 0;
> +		if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
> +			dspri = 0;
> +
> +		if (mpam_has_feature(mpam_feat_intpri_part, rprops))
> +			pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
> +		if (mpam_has_feature(mpam_feat_dspri_part, rprops))
> +			pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
> +
> +		mpam_write_partsel_reg(msc, PRI, pri_val);
> +	}
> +
>  	mutex_unlock(&msc->part_sel_lock);
>  }
>  
> @@ -1529,6 +1642,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
>  	return false;
>  }
>  
> +/* Any of these features mean the CMAX_WD field is valid. */
> +static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
> +{
> +	if (mpam_has_feature(mpam_feat_cmax_cmax, props))
> +		return true;
> +	if (mpam_has_feature(mpam_feat_cmax_cmin, props))
> +		return true;
> +	return false;
> +}
> +
>  #define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
>  	helper(parent) &&						\
>  	((helper(child) && (parent)->field != (child)->field) ||	\
> @@ -1583,6 +1706,23 @@ static void __props_mismatch(struct mpam_props *parent,
>  		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
>  	}
>  
> +	if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
> +		parent->cmax_wd = child->cmax_wd;
> +	} else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
> +				     cmax_wd, alias)) {
> +		pr_debug("%s took the min cmax_wd\n", __func__);
> +		parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
> +	}
> +
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
> +		parent->cassoc_wd = child->cassoc_wd;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
> +				   cassoc_wd, alias)) {
> +		pr_debug("%s cleared cassoc_wd\n", __func__);
> +		mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
> +		parent->cassoc_wd = 0;
> +	}
> +
>  	/* For num properties, take the minimum */
>  	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
>  		parent->num_csu_mon = child->num_csu_mon;
> @@ -1600,6 +1740,41 @@ static void __props_mismatch(struct mpam_props *parent,
>  		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
>  	}
>  
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
> +		parent->intpri_wd = child->intpri_wd;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
> +				   intpri_wd, alias)) {
> +		pr_debug("%s took the min intpri_wd\n", __func__);
> +		parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
> +	}
> +
> +	if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
> +		parent->dspri_wd = child->dspri_wd;
> +	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
> +				   dspri_wd, alias)) {
> +		pr_debug("%s took the min dspri_wd\n", __func__);
> +		parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
> +	}
> +
> +	/* TODO: alias support for these two */
> +	/* {int,ds}pri may not have differing 0-low behaviour */
> +	if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
> +	    (!mpam_has_feature(mpam_feat_intpri_part, child) ||
> +	     mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
> +	     mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
> +		pr_debug("%s cleared intpri_part\n", __func__);
> +		mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
> +		mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
> +	}
> +	if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
> +	    (!mpam_has_feature(mpam_feat_dspri_part, child) ||
> +	     mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
> +	     mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
> +		pr_debug("%s cleared dspri_part\n", __func__);
> +		mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
> +		mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
> +	}
> +
>  	if (alias) {
>  		/* Merge features for aliased resources */
>  		parent->features |= child->features;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 70cba9f22746..23445aedbabd 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -157,16 +157,23 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
>   * When we compact the supported features, we don't care what they are.
>   * Storing them as a bitmap makes life easy.
>   */
> -typedef u16 mpam_features_t;
> +typedef u32 mpam_features_t;
>  
>  /* Bits for mpam_features_t */
>  enum mpam_device_features {
> -	mpam_feat_ccap_part = 0,
> +	mpam_feat_cmax_softlim,
> +	mpam_feat_cmax_cmax,
> +	mpam_feat_cmax_cmin,
> +	mpam_feat_cmax_cassoc,
>  	mpam_feat_cpor_part,
>  	mpam_feat_mbw_part,
>  	mpam_feat_mbw_min,
>  	mpam_feat_mbw_max,
>  	mpam_feat_mbw_prop,
> +	mpam_feat_intpri_part,
> +	mpam_feat_intpri_part_0_low,
> +	mpam_feat_dspri_part,
> +	mpam_feat_dspri_part_0_low,
>  	mpam_feat_msmon,
>  	mpam_feat_msmon_csu,
>  	mpam_feat_msmon_csu_capture,
> @@ -176,6 +183,7 @@ enum mpam_device_features {
>  	mpam_feat_msmon_mbwu_rwbw,
>  	mpam_feat_msmon_mbwu_hw_nrdy,
>  	mpam_feat_msmon_capt,
> +	mpam_feat_partid_nrw,
>  	MPAM_FEATURE_LAST,
>  };
>  static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
> @@ -187,6 +195,10 @@ struct mpam_props {
>  	u16			cpbm_wd;
>  	u16			mbw_pbm_bits;
>  	u16			bwa_wd;
> +	u16			cmax_wd;
> +	u16			cassoc_wd;
> +	u16			intpri_wd;
> +	u16			dspri_wd;
>  	u16			num_csu_mon;
>  	u16			num_mbwu_mon;
>  };
Thanks,
Ben
- * Re: [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
  2025-08-28 10:11   ` Ben Horgan
@ 2025-09-10 19:30     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:30 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich, Zeng Heng
Hi Ben,
On 28/08/2025 11:11, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> MPAM supports more features than are going to be exposed to resctrl.
>> For partid other than 0, the reset values of these controls isn't
>> known.
>>
>> Discover the rest of the features so they can be reset to avoid any
>> side effects when resctrl is in use.
>>
>> PARTID narrowing allows MSC/RIS to support less configuration space than
>> is usable. If this feature is found on a class of device we are likely
>> to use, then reduce the partid_max to make it usable. This allows us
>> to map a PARTID to itself.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 8f6df2406c22..aedd743d6827 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>>  
>>  	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
>>  		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
>> +
>> +	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
>> +		mpam_write_partsel_reg(msc, CMAX, cmax);
>> +
>> +	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
>> +		mpam_write_partsel_reg(msc, CMIN, 0);
> Missing reset for cmax_cassoc. I wonder if it makes sense to have
> separate enums for partitioning features, which require reset, and the rest.
Fixed. They all need resetting because the architecture doesn't guarantee the state of
controls out of reset for PARTID other than zero (or when the MSC are reset).
I think those two lists would just be those that are reset to zero, as opposed to some
other value. Given the register names have to be listed here, I don't think it's any worse
to have the hand-picked reset value.
Thanks,
James
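As an illustration of a hand-picked reset value, the priority reset in the patch appears intended to select the highest priority the field can express: all-ones when zero is the lowest priority, zero otherwise. The sketch below restates that in user-space C. The helper name is hypothetical, and it assumes the width counts implemented bits, so the all-ones value is (1 << wd) - 1; the patch builds its mask with GENMASK(wd, 0), which covers bits wd down to 0 inclusive and so may differ by one bit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the highest priority encodable in a field of
 * 'wd' bits. If zero is the lowest priority, the highest is all-ones;
 * otherwise zero is already the highest.
 */
static uint16_t sketch_highest_pri(unsigned int wd, bool zero_is_low)
{
	if (!zero_is_low || wd == 0)
		return 0;
	if (wd >= 16)
		return UINT16_MAX;
	return (uint16_t)((1u << wd) - 1);
}
```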
- * [PATCH 26/33] arm_mpam: Add helpers to allocate monitors
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (24 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 15:47   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
                   ` (41 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM's MSC support a number of monitors, each of which supports
bandwidth counters, or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  |  2 ++
 drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index aedd743d6827..e7e00c632512 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -348,6 +348,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
 	class->level = level_idx;
 	class->type = type;
 	INIT_LIST_HEAD_RCU(&class->classes_list);
+	ida_init(&class->ida_csu_mon);
+	ida_init(&class->ida_mbwu_mon);
 
 	list_add_rcu(&class->classes_list, &mpam_classes);
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 23445aedbabd..4981de120869 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -231,6 +231,9 @@ struct mpam_class {
 	/* member of mpam_classes */
 	struct list_head	classes_list;
 
+	struct ida		ida_csu_mon;
+	struct ida		ida_mbwu_mon;
+
 	struct mpam_garbage	garbage;
 };
 
@@ -306,6 +309,38 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_range(&class->ida_csu_mon, 0, cprops->num_csu_mon - 1,
+			       GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+	ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_range(&class->ida_mbwu_mon, 0,
+			       cprops->num_mbwu_mon - 1, GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+	ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
 /* List of all classes - protected by srcu*/
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
-- 
2.20.1
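The allocation behaviour these helpers rely on can be modelled in user-space C, as a rough sketch: hand out the lowest free monitor index below the class's monitor count, and fail once every monitor is taken. The struct and function names are hypothetical, and the 64-bit bitmap is a toy stand-in for the kernel's struct ida, not how the IDA is implemented.

```c
#include <errno.h>
#include <stdint.h>

/* Toy stand-in for a struct ida: one bit per monitor, up to 64 monitors. */
struct sketch_mon_pool {
	uint64_t	in_use;
	unsigned int	num_mon;	/* models num_csu_mon / num_mbwu_mon */
};

/* Return the lowest free monitor index, or a negative errno. */
static int sketch_alloc_mon(struct sketch_mon_pool *pool)
{
	unsigned int i;

	if (pool->num_mon == 0)
		return -EOPNOTSUPP;	/* models the missing-feature check */

	for (i = 0; i < pool->num_mon && i < 64; i++) {
		if (!(pool->in_use & (UINT64_C(1) << i))) {
			pool->in_use |= UINT64_C(1) << i;
			return (int)i;
		}
	}

	return -ENOSPC;	/* every monitor is already allocated */
}

/* Release a monitor index so a later allocation can reuse it. */
static void sketch_free_mon(struct sketch_mon_pool *pool, int mon)
{
	if (mon >= 0 && mon < 64)
		pool->in_use &= ~(UINT64_C(1) << mon);
}
```

A caller would treat any negative return as "no monitor available" and free the index once the counter is no longer needed, mirroring the alloc/free pairing of the helpers in the patch.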
- * Re: [PATCH 26/33] arm_mpam: Add helpers to allocate monitors
  2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-08-29 15:47   ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 15:47 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> MPAM's MSC support a number of monitors, each of which supports
> bandwidth counters, or cache-storage-utilisation counters. To use
> a counter, a monitor needs to be configured. Add helpers to allocate
> and free CSU or MBWU monitors.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  |  2 ++
>  drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
This looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
- * [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (25 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 15:55   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
                   ` (40 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSC may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.

Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
not ready, then wait the full timeout value before trying again.

CC: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
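For readers skimming the series, the retry policy described in the commit
message can be pictured with a small userspace sketch. It is not the driver
code: struct fake_mon, fake_read_once() and read_with_one_retry() are
hypothetical names, and the sleep is simulated by recording the requested
delay rather than calling schedule_timeout_uninterruptible().

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a monitor that is 'not ready' for a while. */
struct fake_mon {
	int		nrdy_reads;	/* reads that will still report not-ready */
	uint64_t	value;		/* counter value once ready */
	unsigned int	slept_usec;	/* records the simulated sleep */
};

/* One read attempt: -EBUSY while not ready, else add the value to *val. */
static int fake_read_once(struct fake_mon *mon, uint64_t *val)
{
	if (mon->nrdy_reads > 0) {
		mon->nrdy_reads--;
		return -EBUSY;
	}

	*val += mon->value;
	return 0;
}

/*
 * Try once. On -EBUSY wait the full 'not ready' time (if one was
 * provided), discard any partial sum, and try exactly once more.
 */
static int read_with_one_retry(struct fake_mon *mon, unsigned int nrdy_usec,
			       uint64_t *val)
{
	int err;

	*val = 0;
	err = fake_read_once(mon, val);
	if (err != -EBUSY)
		return err;

	if (nrdy_usec)
		mon->slept_usec += nrdy_usec;	/* simulated sleep */

	*val = 0;
	return fake_read_once(mon, val);
}
```

A caller that still sees -EBUSY after the second attempt gets that error
back; the sketch, like the description above, does not loop indefinitely.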
 drivers/resctrl/mpam_devices.c  | 222 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  18 +++
 2 files changed, 240 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e7e00c632512..9ce771aaf671 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+struct mon_read {
+	struct mpam_msc_ris		*ris;
+	struct mon_cfg			*ctx;
+	enum mpam_device_features	type;
+	u64				*val;
+	int				err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				   u32 *flt_val)
+{
+	struct mon_cfg *ctx = m->ctx;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+		break;
+	default:
+		return;
+	}
+
+	/*
+	 * For CSU counters it's implementation-defined what happens when not
+	 * filtering by partid.
+	 */
+	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
+
+	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+	if (m->ctx->match_pmg) {
+		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
+	}
+
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				    u32 *flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+				     u32 flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	/*
+	 * Write the ctl_val with the enable bit cleared, reset the counter,
+	 * then enable counter.
+	 */
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, CSU, 0);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	case mpam_feat_msmon_mbwu:
+		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Call with MSC lock held */
+static void __ris_msmon_read(void *arg)
+{
+	u64 now;
+	bool nrdy = false;
+	struct mon_read *m = arg;
+	struct mon_cfg *ctx = m->ctx;
+	struct mpam_msc_ris *ris = m->ris;
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+	if (!mpam_mon_sel_inner_lock(msc)) {
+		m->err = -EIO;
+		return;
+	}
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+	/*
+	 * Read the existing configuration to avoid re-writing the same values.
+	 * This saves waiting for 'nrdy' on subsequent reads.
+	 */
+	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+	clean_msmon_ctl_val(&cur_ctl);
+	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		now = mpam_read_monsel_reg(msc, CSU);
+		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	case mpam_feat_msmon_mbwu:
+		now = mpam_read_monsel_reg(msc, MBWU);
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	default:
+		m->err = -EINVAL;
+		break;
+	}
+	mpam_mon_sel_inner_unlock(msc);
+
+	if (nrdy) {
+		m->err = -EBUSY;
+		return;
+	}
+
+	now = FIELD_GET(MSMON___VALUE, now);
+	*m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+	int err = 0, idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		mpam_mon_sel_outer_lock(msc);
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			arg->ris = ris;
+
+			err = smp_call_function_any(&msc->accessibility,
+						    __ris_msmon_read, arg,
+						    true);
+			if (!err && arg->err)
+				err = arg->err;
+			if (err)
+				break;
+		}
+		mpam_mon_sel_outer_unlock(msc);
+		if (err)
+			break;
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
+	return err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features type, u64 *val)
+{
+	int err;
+	struct mon_read arg;
+	u64 wait_jiffies = 0;
+	struct mpam_props *cprops = &comp->class->props;
+
+	might_sleep();
+
+	if (!mpam_is_enabled())
+		return -EIO;
+
+	if (!mpam_has_feature(type, cprops))
+		return -EOPNOTSUPP;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.ctx = ctx;
+	arg.type = type;
+	arg.val = val;
+	*val = 0;
+
+	err = _msmon_read(comp, &arg);
+	if (err == -EBUSY && comp->class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+	while (wait_jiffies)
+		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+	if (err == -EBUSY) {
+		memset(&arg, 0, sizeof(arg));
+		arg.ctx = ctx;
+		arg.type = type;
+		arg.val = val;
+		*val = 0;
+
+		err = _msmon_read(comp, &arg);
+	}
+
+	return err;
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4981de120869..76e406a2b0d1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -309,6 +309,21 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	u16                     mon;
+	u8                      pmg;
+	bool                    match_pmg;
+	u32                     partid;
+	enum mon_filter_options opts;
+};
+
 static inline int mpam_alloc_csu_mon(struct mpam_class *class)
 {
 	struct mpam_props *cprops = &class->props;
@@ -361,6 +376,9 @@ void mpam_disable(struct work_struct *work);
 int mpam_apply_config(struct mpam_component *comp, u16 partid,
 		      struct mpam_config *cfg);
 
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features, u64 *val);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.20.1
- * Re: [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-08-29 15:55   ` Ben Horgan
  2025-09-10 19:30     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 15:55 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Reading a monitor involves configuring what you want to monitor, and
> reading the value. Components made up of multiple MSC may need values
> from each MSC. MSCs may take time to configure, returning 'not ready'.
> The maximum 'not ready' time should have been provided by firmware.
> 
> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
> not ready, then wait the full timeout value before trying again.
> 
> CC: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 222 ++++++++++++++++++++++++++++++++
>  drivers/resctrl/mpam_internal.h |  18 +++
>  2 files changed, 240 insertions(+)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index e7e00c632512..9ce771aaf671 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  	return 0;
>  }
>  
> +struct mon_read {
> +	struct mpam_msc_ris		*ris;
> +	struct mon_cfg			*ctx;
> +	enum mpam_device_features	type;
> +	u64				*val;
> +	int				err;
> +};
> +
> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				   u32 *flt_val)
> +{
> +	struct mon_cfg *ctx = m->ctx;
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
> +		break;
> +	default:
> +		return;
> +	}
> +
> +	/*
> +	 * For CSU counters it's implementation-defined what happens when not
> +	 * filtering by partid.
> +	 */
> +	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
> +
> +	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
> +	if (m->ctx->match_pmg) {
> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
> +		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
> +	}
As we are using MSMON_CFG_MBWU_FLT_{PMG,PARTID} for both CSU and MBWU
how about changing to MSMON_CFG_x_FLT_{PMG,PARTID}?
> +
> +	if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
> +		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
This needs to be conditional on the type of the monitor being
configured. There is an XCL bit here for CSU monitors.
> +}
> +
> +static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
> +				    u32 *flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +/* Remove values set by the hardware to prevent apparent mismatches. */
> +static void clean_msmon_ctl_val(u32 *cur_ctl)
> +{
> +	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +}
> +
> +static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> +				     u32 flt_val)
> +{
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +
> +	/*
> +	 * Write the ctl_val with the enable bit cleared, reset the counter,
> +	 * then enable counter.
> +	 */
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, CSU, 0);
> +		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +		break;
> +	default:
> +		return;
> +	}
> +}
> +
> +/* Call with MSC lock held */
> +static void __ris_msmon_read(void *arg)
> +{
> +	u64 now;
> +	bool nrdy = false;
> +	struct mon_read *m = arg;
> +	struct mon_cfg *ctx = m->ctx;
> +	struct mpam_msc_ris *ris = m->ris;
> +	struct mpam_props *rprops = &ris->props;
> +	struct mpam_msc *msc = m->ris->vmsc->msc;
> +	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> +
> +	if (!mpam_mon_sel_inner_lock(msc)) {
> +		m->err = -EIO;
> +		return;
> +	}
> +	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
> +		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +	/*
> +	 * Read the existing configuration to avoid re-writing the same values.
> +	 * This saves waiting for 'nrdy' on subsequent reads.
> +	 */
> +	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
> +	clean_msmon_ctl_val(&cur_ctl);
> +	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
> +	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
> +		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
> +
> +	switch (m->type) {
> +	case mpam_feat_msmon_csu:
> +		now = mpam_read_monsel_reg(msc, CSU);
> +		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	case mpam_feat_msmon_mbwu:
> +		now = mpam_read_monsel_reg(msc, MBWU);
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +			nrdy = now & MSMON___NRDY;
> +		break;
> +	default:
> +		m->err = -EINVAL;
> +		break;
> +	}
> +	mpam_mon_sel_inner_unlock(msc);
> +
> +	if (nrdy) {
> +		m->err = -EBUSY;
> +		return;
> +	}
> +
> +	now = FIELD_GET(MSMON___VALUE, now);
> +	*m->val += now;
> +}
> +
> +static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
> +{
> +	int err, idx;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		mpam_mon_sel_outer_lock(msc);
> +		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
> +			arg->ris = ris;
> +
> +			err = smp_call_function_any(&msc->accessibility,
> +						    __ris_msmon_read, arg,
> +						    true);
> +			if (!err && arg->err)
> +				err = arg->err;
> +			if (err)
> +				break;
> +		}
> +		mpam_mon_sel_outer_unlock(msc);
> +		if (err)
> +			break;
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +
> +	return err;
> +}
> +
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features type, u64 *val)
> +{
> +	int err;
> +	struct mon_read arg;
> +	u64 wait_jiffies = 0;
> +	struct mpam_props *cprops = &comp->class->props;
> +
> +	might_sleep();
> +
> +	if (!mpam_is_enabled())
> +		return -EIO;
> +
> +	if (!mpam_has_feature(type, cprops))
> +		return -EOPNOTSUPP;
> +
> +	memset(&arg, 0, sizeof(arg));
> +	arg.ctx = ctx;
> +	arg.type = type;
> +	arg.val = val;
> +	*val = 0;
> +
> +	err = _msmon_read(comp, &arg);
> +	if (err == -EBUSY && comp->class->nrdy_usec)
> +		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
> +
> +	while (wait_jiffies)
> +		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
> +
> +	if (err == -EBUSY) {
> +		memset(&arg, 0, sizeof(arg));
> +		arg.ctx = ctx;
> +		arg.type = type;
> +		arg.val = val;
> +		*val = 0;
> +
> +		err = _msmon_read(comp, &arg);
> +	}
> +
> +	return err;
> +}
> +
>  static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
>  {
>  	u32 num_words, msb;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 4981de120869..76e406a2b0d1 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -309,6 +309,21 @@ struct mpam_msc_ris {
>  	struct mpam_garbage	garbage;
>  };
>  
> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> +enum mon_filter_options {
> +	COUNT_BOTH	= 0,
> +	COUNT_WRITE	= 1,
> +	COUNT_READ	= 2,
> +};
> +
> +struct mon_cfg {
> +	u16                     mon;
> +	u8                      pmg;
> +	bool                    match_pmg;
> +	u32                     partid;
> +	enum mon_filter_options opts;
> +};
> +
>  static inline int mpam_alloc_csu_mon(struct mpam_class *class)
>  {
>  	struct mpam_props *cprops = &class->props;
> @@ -361,6 +376,9 @@ void mpam_disable(struct work_struct *work);
>  int mpam_apply_config(struct mpam_component *comp, u16 partid,
>  		      struct mpam_config *cfg);
>  
> +int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> +		    enum mpam_device_features, u64 *val);
> +
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
>  
Thanks,
Ben
- * Re: [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-08-29 15:55   ` Ben Horgan
@ 2025-09-10 19:30     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:30 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 16:55, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> Reading a monitor involves configuring what you want to monitor, and
>> reading the value. Components made up of multiple MSC may need values
>> from each MSC. MSCs may take time to configure, returning 'not ready'.
>> The maximum 'not ready' time should have been provided by firmware.
>>
>> Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
>> not ready, then wait the full timeout value before trying again.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index e7e00c632512..9ce771aaf671 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>> +static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>> +				   u32 *flt_val)
>> +{
>> +	struct mon_cfg *ctx = m->ctx;
>> +
>> +	switch (m->type) {
>> +	case mpam_feat_msmon_csu:
>> +		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
>> +		break;
>> +	case mpam_feat_msmon_mbwu:
>> +		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
>> +		break;
>> +	default:
>> +		return;
>> +	}
>> +
>> +	/*
>> +	 * For CSU counters it's implementation-defined what happens when not
>> +	 * filtering by partid.
>> +	 */
>> +	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>> +
>> +	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
>> +	if (m->ctx->match_pmg) {
>> +		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>> +		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
>> +	}
> As we are using MSMON_CFG_MBWU_FLT_{PMG,PARTID} for both CSU and MBWU
> how about changing to MSMON_CFG_x_FLT_{PMG,PARTID}?
Sure,
>> +
>> +	if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
>> +		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
> This needs to be conditional on the type of the monitor being
> configured. There is an XCL bit here for CSU monitors.
Fixed ... that wasn't there last time I looked!
I may as well wire that up too...
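
A minimal sketch of the type-conditional filter discussed above, assuming
illustrative field positions: the macros, enum mon_type and gen_flt_val()
are hypothetical and are not the MSMON_CFG_* definitions from the series.
The point is only that the RWBW field is applied for MBWU monitors and left
alone for CSU monitors, where the same bits may mean something else.

```c
#include <stdbool.h>
#include <stdint.h>

enum mon_type { MON_CSU, MON_MBWU };

/* Assumed layout, for illustration only; check the MPAM specification. */
#define FLT_PARTID_SHIFT	0
#define FLT_PARTID_MASK		0xffffu
#define FLT_PMG_SHIFT		16
#define FLT_PMG_MASK		0xffu
#define MBWU_FLT_RWBW_SHIFT	30
#define MBWU_FLT_RWBW_MASK	0x3u

static uint32_t gen_flt_val(enum mon_type type, uint16_t partid,
			    bool match_pmg, uint8_t pmg,
			    bool has_rwbw, unsigned int rwbw)
{
	/* PARTID and PMG are shared between the CSU and MBWU filters. */
	uint32_t flt = ((uint32_t)partid & FLT_PARTID_MASK) << FLT_PARTID_SHIFT;

	if (match_pmg)
		flt |= ((uint32_t)pmg & FLT_PMG_MASK) << FLT_PMG_SHIFT;

	/* Only MBWU monitors get the read/write bandwidth selector. */
	if (type == MON_MBWU && has_rwbw)
		flt |= ((uint32_t)rwbw & MBWU_FLT_RWBW_MASK) << MBWU_FLT_RWBW_SHIFT;

	return flt;
}
```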
Thanks,
James
- * [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (26 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 16:09   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
                   ` (39 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Bandwidth counters need to run continuously to correctly reflect the
bandwidth.

The value read may be lower than the previous value read in the case
of overflow and when the hardware is reset due to CPU hotplug.

Add struct mbwu_state to track the bandwidth counter to allow overflow
and power management to be handled.

Signed-off-by: James Morse <james.morse@arm.com>
---
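As a rough illustration of the bookkeeping the commit message describes,
here is a generic userspace sketch of extending a wrapping counter with a
software correction. It is not the patch's code and deliberately uses the
textbook form (add one full counter period per detected wrap) rather than
the arithmetic in the hunk below. struct mbwu_track, mbwu_extend() and
mbwu_save_for_reset() are hypothetical names; the 31-bit width is taken
from the GENMASK_ULL(30, 0) in this patch, and the sketch assumes at most
one wrap between reads.

```c
#include <stdint.h>

/* One full period of a 31-bit counter (values 0 .. 2^31 - 1). */
#define COUNTER_PERIOD	(1ULL << 31)

struct mbwu_track {
	uint64_t prev_val;	/* last raw value read from the counter */
	uint64_t correction;	/* bandwidth not visible in the raw value */
};

/* Return the extended count for a new raw reading. */
static uint64_t mbwu_extend(struct mbwu_track *st, uint64_t now)
{
	/* A reading below the previous one means the counter wrapped. */
	if (now < st->prev_val)
		st->correction += COUNTER_PERIOD;

	st->prev_val = now;
	return now + st->correction;
}

/*
 * Before the hardware loses state (e.g. the last CPU going offline),
 * fold the final raw value into the correction and expect the counter
 * to restart from zero.
 */
static void mbwu_save_for_reset(struct mbwu_track *st, uint64_t last_val)
{
	st->correction += last_val;
	st->prev_val = 0;
}
```

With this shape the value returned to the caller never decreases across a
wrap or a save/restore cycle, which is the property the overflow and power
management handling is after.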
 drivers/resctrl/mpam_devices.c  | 163 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  54 ++++++++---
 2 files changed, 200 insertions(+), 17 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9ce771aaf671..11be34b54643 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1004,6 +1004,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
 
 	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+	*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
 	if (m->ctx->match_pmg) {
 		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
 		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
@@ -1041,6 +1042,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 				     u32 flt_val)
 {
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 
 	/*
@@ -1059,20 +1061,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
 		mpam_write_monsel_reg(msc, MBWU, 0);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+
+		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+		if (mbwu_state)
+			mbwu_state->prev_val = 0;
+
 		break;
 	default:
 		return;
 	}
 }
 
+static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+{
+	/* TODO: scaling, and long counters */
+	return GENMASK_ULL(30, 0);
+}
+
 /* Call with MSC lock held */
 static void __ris_msmon_read(void *arg)
 {
-	u64 now;
 	bool nrdy = false;
 	struct mon_read *m = arg;
+	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
 	struct mpam_msc_ris *ris = m->ris;
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1100,11 +1114,30 @@ static void __ris_msmon_read(void *arg)
 		now = mpam_read_monsel_reg(msc, CSU);
 		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
 		now = mpam_read_monsel_reg(msc, MBWU);
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
+
+		if (nrdy)
+			break;
+
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (!mbwu_state)
+			break;
+
+		/* Add any pre-overflow value to the mbwu_state->val */
+		if (mbwu_state->prev_val > now)
+			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+
+		mbwu_state->prev_val = now;
+		mbwu_state->correction += overflow_val;
+
+		/* Include bandwidth consumed before the last hardware reset */
+		now += mbwu_state->correction;
 		break;
 	default:
 		m->err = -EINVAL;
@@ -1117,7 +1150,6 @@ static void __ris_msmon_read(void *arg)
 		return;
 	}
 
-	now = FIELD_GET(MSMON___VALUE, now);
 	*m->val += now;
 }
 
@@ -1329,6 +1361,72 @@ static int mpam_reprogram_ris(void *_arg)
 	return 0;
 }
 
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+	int i;
+	struct mon_read mwbu_arg;
+	struct mpam_msc_ris *ris = _ris;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	mpam_mon_sel_outer_lock(msc);
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		if (ris->mbwu_state[i].enabled) {
+			mwbu_arg.ris = ris;
+			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+			mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+			__ris_msmon_read(&mwbu_arg);
+		}
+	}
+
+	mpam_mon_sel_outer_unlock(msc);
+
+	return 0;
+}
+
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+	int i;
+	u64 val;
+	struct mon_cfg *cfg;
+	u32 cur_flt, cur_ctl, mon_sel;
+	struct mpam_msc_ris *ris = arg;
+	struct msmon_mbwu_state *mbwu_state;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		mbwu_state = &ris->mbwu_state[i];
+		cfg = &mbwu_state->cfg;
+
+		if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+			return -EIO;
+
+		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+		val = mpam_read_monsel_reg(msc, MBWU);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+
+		cfg->mon = i;
+		cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
+		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+		cfg->partid = FIELD_GET(MSMON_CFG_MBWU_FLT_PARTID, cur_flt);
+		mbwu_state->correction += val;
+		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+		mpam_mon_sel_inner_unlock(msc);
+	}
+
+	return 0;
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible.
@@ -1389,6 +1487,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 * for non-zero partid may be lost while the CPUs are offline.
 		 */
 		ris->in_reset_state = online;
+
+		if (mpam_is_enabled() && !online)
+			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
 	}
 	mpam_mon_sel_outer_unlock(msc);
 }
@@ -1423,6 +1524,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
 			mpam_reprogram_ris_partid(ris, partid, cfg);
 		}
 		ris->in_reset_state = reset;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
 	}
 }
 
@@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
 
 static void __destroy_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
 	add_to_garbage(comp->cfg);
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		mpam_mon_sel_outer_lock(msc);
+		if (mpam_mon_sel_inner_lock(msc)) {
+			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+				add_to_garbage(ris->mbwu_state);
+			mpam_mon_sel_inner_unlock(msc);
+		}
+		mpam_mon_sel_outer_unlock(msc);
+	}
 }
 
 static int __allocate_component_cfg(struct mpam_component *comp)
 {
+	int err = 0;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct msmon_mbwu_state *mbwu_state;
+
+	lockdep_assert_held(&mpam_list_lock);
 	mpam_assert_partid_sizes_fixed();
 
 	if (comp->cfg)
@@ -2306,6 +2434,37 @@ static int __allocate_component_cfg(struct mpam_component *comp)
 		return -ENOMEM;
 	init_garbage(comp->cfg);
 
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (!vmsc->props.num_mbwu_mon)
+			continue;
+
+		msc = vmsc->msc;
+		mpam_mon_sel_outer_lock(msc);
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->props.num_mbwu_mon)
+				continue;
+
+			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+					     sizeof(*ris->mbwu_state),
+					     GFP_KERNEL);
+			if (!mbwu_state) {
+				__destroy_component_cfg(comp);
+				err = -ENOMEM;
+				break;
+			}
+
+			if (mpam_mon_sel_inner_lock(msc)) {
+				init_garbage(mbwu_state);
+				ris->mbwu_state = mbwu_state;
+				mpam_mon_sel_inner_unlock(msc);
+			}
+		}
+		mpam_mon_sel_outer_unlock(msc);
+
+		if (err)
+			break;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 76e406a2b0d1..9a50a5432f4a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -271,6 +271,42 @@ struct mpam_component {
 	struct mpam_garbage	garbage;
 };
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	/* mon is wider than u16 to hold an out of range 'USE_RMID_IDX' */
+	u32                     mon;
+	u8                      pmg;
+	bool                    match_pmg;
+	u32                     partid;
+	enum mon_filter_options opts;
+};
+
+/*
+ * Changes to enabled and cfg are protected by the msc->lock.
+ * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ */
+struct msmon_mbwu_state {
+	bool		enabled;
+	struct mon_cfg	cfg;
+
+	/* The value last read from the hardware. Used to detect overflow. */
+	u64		prev_val;
+
+	/*
+	 * The value to add to the new reading to account for power management,
+	 * and shifts to trigger the overflow interrupt.
+	 */
+	u64		correction;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_vmsc {
 	/* member of mpam_component:vmsc_list */
 	struct list_head	comp_list;
@@ -306,22 +342,10 @@ struct mpam_msc_ris {
 	/* parent: */
 	struct mpam_vmsc	*vmsc;
 
-	struct mpam_garbage	garbage;
-};
+	/* msmon mbwu configuration is preserved over reset */
+	struct msmon_mbwu_state	*mbwu_state;
 
-/* The values for MSMON_CFG_MBWU_FLT.RWBW */
-enum mon_filter_options {
-	COUNT_BOTH	= 0,
-	COUNT_WRITE	= 1,
-	COUNT_READ	= 2,
-};
-
-struct mon_cfg {
-	u16                     mon;
-	u8                      pmg;
-	bool                    match_pmg;
-	u32                     partid;
-	enum mon_filter_options opts;
+	struct mpam_garbage	garbage;
 };
 
 static inline int mpam_alloc_csu_mon(struct mpam_class *class)
-- 
2.20.1
- * Re: [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-29 16:09   ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 16:09 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
> 
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
> 
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 163 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  54 ++++++++---
>  2 files changed, 200 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 9ce771aaf671..11be34b54643 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1004,6 +1004,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>  	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
>  
>  	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
> +	*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
>  	if (m->ctx->match_pmg) {
>  		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
>  		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
> @@ -1041,6 +1042,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
>  static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  				     u32 flt_val)
>  {
> +	struct msmon_mbwu_state *mbwu_state;
>  	struct mpam_msc *msc = m->ris->vmsc->msc;
>  
>  	/*
> @@ -1059,20 +1061,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>  		mpam_write_monsel_reg(msc, MBWU, 0);
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
> +
> +		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> +		if (mbwu_state)
> +			mbwu_state->prev_val = 0;
> +
>  		break;
>  	default:
>  		return;
>  	}
>  }
>  
> +static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
> +{
> +	/* TODO: scaling, and long counters */
> +	return GENMASK_ULL(30, 0);
> +}
> +
>  /* Call with MSC lock held */
>  static void __ris_msmon_read(void *arg)
>  {
> -	u64 now;
>  	bool nrdy = false;
>  	struct mon_read *m = arg;
> +	u64 now, overflow_val = 0;
>  	struct mon_cfg *ctx = m->ctx;
>  	struct mpam_msc_ris *ris = m->ris;
> +	struct msmon_mbwu_state *mbwu_state;
>  	struct mpam_props *rprops = &ris->props;
>  	struct mpam_msc *msc = m->ris->vmsc->msc;
>  	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
> @@ -1100,11 +1114,30 @@ static void __ris_msmon_read(void *arg)
>  		now = mpam_read_monsel_reg(msc, CSU);
>  		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
>  			nrdy = now & MSMON___NRDY;
> +		now = FIELD_GET(MSMON___VALUE, now);
>  		break;
>  	case mpam_feat_msmon_mbwu:
>  		now = mpam_read_monsel_reg(msc, MBWU);
>  		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
>  			nrdy = now & MSMON___NRDY;
> +		now = FIELD_GET(MSMON___VALUE, now);
> +
> +		if (nrdy)
> +			break;
> +
> +		mbwu_state = &ris->mbwu_state[ctx->mon];
> +		if (!mbwu_state)
> +			break;
> +
> +		/* Add any pre-overflow value to the mbwu_state->val */
> +		if (mbwu_state->prev_val > now)
> +			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
> +
> +		mbwu_state->prev_val = now;
> +		mbwu_state->correction += overflow_val;
> +
> +		/* Include bandwidth consumed before the last hardware reset */
> +		now += mbwu_state->correction;
>  		break;
>  	default:
>  		m->err = -EINVAL;
> @@ -1117,7 +1150,6 @@ static void __ris_msmon_read(void *arg)
>  		return;
>  	}
>  
> -	now = FIELD_GET(MSMON___VALUE, now);
>  	*m->val += now;
>  }
>  
> @@ -1329,6 +1361,72 @@ static int mpam_reprogram_ris(void *_arg)
>  	return 0;
>  }
>  
> +/* Call with MSC lock and outer mon_sel lock held */
> +static int mpam_restore_mbwu_state(void *_ris)
> +{
> +	int i;
> +	struct mon_read mwbu_arg;
> +	struct mpam_msc_ris *ris = _ris;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +
> +	mpam_mon_sel_outer_lock(msc);
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		if (ris->mbwu_state[i].enabled) {
> +			mwbu_arg.ris = ris;
> +			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
> +			mwbu_arg.type = mpam_feat_msmon_mbwu;
> +
> +			__ris_msmon_read(&mwbu_arg);
> +		}
> +	}
> +
> +	mpam_mon_sel_outer_unlock(msc);
> +
> +	return 0;
> +}
> +
> +/* Call with MSC lock and outer mon_sel lock held */
> +static int mpam_save_mbwu_state(void *arg)
> +{
> +	int i;
> +	u64 val;
> +	struct mon_cfg *cfg;
> +	u32 cur_flt, cur_ctl, mon_sel;
> +	struct mpam_msc_ris *ris = arg;
> +	struct msmon_mbwu_state *mbwu_state;
> +	struct mpam_msc *msc = ris->vmsc->msc;
> +
> +	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
> +		mbwu_state = &ris->mbwu_state[i];
> +		cfg = &mbwu_state->cfg;
> +
> +		if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
> +			return -EIO;
> +
> +		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
> +			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
> +		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
> +
> +		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
> +		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
> +		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
> +
> +		val = mpam_read_monsel_reg(msc, MBWU);
> +		mpam_write_monsel_reg(msc, MBWU, 0);
> +
> +		cfg->mon = i;
> +		cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
> +		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
> +		cfg->partid = FIELD_GET(MSMON_CFG_MBWU_FLT_PARTID, cur_flt);
> +		mbwu_state->correction += val;
> +		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
> +		mpam_mon_sel_inner_unlock(msc);
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Called via smp_call_on_cpu() to prevent migration, while still being
>   * pre-emptible.
> @@ -1389,6 +1487,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  		 * for non-zero partid may be lost while the CPUs are offline.
>  		 */
>  		ris->in_reset_state = online;
> +
> +		if (mpam_is_enabled() && !online)
> +			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
>  	}
>  	mpam_mon_sel_outer_unlock(msc);
>  }
> @@ -1423,6 +1524,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
>  			mpam_reprogram_ris_partid(ris, partid, cfg);
>  		}
>  		ris->in_reset_state = reset;
> +
> +		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
> +			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
>  	}
>  }
>  
> @@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
>  
>  static void __destroy_component_cfg(struct mpam_component *comp)
>  {
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
>  	add_to_garbage(comp->cfg);
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		mpam_mon_sel_outer_lock(msc);
> +		if (mpam_mon_sel_inner_lock(msc)) {
> +			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
> +				add_to_garbage(ris->mbwu_state);
> +			mpam_mon_sel_inner_unlock(msc);
> +		}
> +		mpam_mon_sel_outer_lock(msc);
> +	}
>  }
>  
>  static int __allocate_component_cfg(struct mpam_component *comp)
>  {
> +	int err = 0;
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct msmon_mbwu_state *mbwu_state;
> +
> +	lockdep_assert_held(&mpam_list_lock);
>  	mpam_assert_partid_sizes_fixed();
>  
>  	if (comp->cfg)
> @@ -2306,6 +2434,37 @@ static int __allocate_component_cfg(struct mpam_component *comp)
>  		return -ENOMEM;
>  	init_garbage(comp->cfg);
>  
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (!vmsc->props.num_mbwu_mon)
> +			continue;
> +
> +		msc = vmsc->msc;
> +		mpam_mon_sel_outer_lock(msc);
> +		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
> +			if (!ris->props.num_mbwu_mon)
> +				continue;
> +
> +			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
> +					     sizeof(*ris->mbwu_state),
> +					     GFP_KERNEL);
> +			if (!mbwu_state) {
> +				__destroy_component_cfg(comp);
> +				err = -ENOMEM;
> +				break;
> +			}
> +
> +			if (mpam_mon_sel_inner_lock(msc)) {
> +				init_garbage(mbwu_state);
> +				ris->mbwu_state = mbwu_state;
> +				mpam_mon_sel_inner_unlock(msc);
> +			}
> +		}
> +		mpam_mon_sel_outer_unlock(msc);
> +
> +		if (err)
> +			break;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 76e406a2b0d1..9a50a5432f4a 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -271,6 +271,42 @@ struct mpam_component {
>  	struct mpam_garbage	garbage;
>  };
>  
> +/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> +enum mon_filter_options {
> +	COUNT_BOTH	= 0,
> +	COUNT_WRITE	= 1,
> +	COUNT_READ	= 2,
> +};
> +
> +struct mon_cfg {
> +	/* mon is wider than u16 to hold an out of range 'USE_RMID_IDX' */
> +	u32                     mon;
> +	u8                      pmg;
> +	bool                    match_pmg;
> +	u32                     partid;
> +	enum mon_filter_options opts;
> +};
> +
> +/*
> + * Changes to enabled and cfg are protected by the msc->lock.
> + * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
> + */
> +struct msmon_mbwu_state {
> +	bool		enabled;
> +	struct mon_cfg	cfg;
> +
> +	/* The value last read from the hardware. Used to detect overflow. */
> +	u64		prev_val;
> +
> +	/*
> +	 * The value to add to the new reading to account for power management,
> +	 * and shifts to trigger the overflow interrupt.
> +	 */
> +	u64		correction;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
These structures have ended up between struct mpam_component and struct
mpam_vmsc. Please move them somewhere more natural.
>  struct mpam_vmsc {
>  	/* member of mpam_component:vmsc_list */
>  	struct list_head	comp_list;
> @@ -306,22 +342,10 @@ struct mpam_msc_ris {
>  	/* parent: */
>  	struct mpam_vmsc	*vmsc;
>  
> -	struct mpam_garbage	garbage;
> -};
> +	/* msmon mbwu configuration is preserved over reset */
> +	struct msmon_mbwu_state	*mbwu_state;
>  
> -/* The values for MSMON_CFG_MBWU_FLT.RWBW */
> -enum mon_filter_options {
> -	COUNT_BOTH	= 0,
> -	COUNT_WRITE	= 1,
> -	COUNT_READ	= 2,
> -};
> -
> -struct mon_cfg {
> -	u16                     mon;
> -	u8                      pmg;
> -	bool                    match_pmg;
> -	u32                     partid;
> -	enum mon_filter_options opts;
Choose where this enum and structure go in the previous patch.
> +	struct mpam_garbage	garbage;
>  };
>  
>  static inline int mpam_alloc_csu_mon(struct mpam_class *class)
Thanks,
Ben
 
- * [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (27 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-28 16:14   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
                   ` (38 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
MPAM v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters. As of now, a 44 bit counter
represented by the HAS_LONG field (bit 30) and a 63 bit counter represented
by the LWD field (bit 29) can be optionally integrated. Probe for these
counters and set the corresponding feature bits if any of them are present.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 23 ++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  8 ++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 11be34b54643..2ab7f127baaa 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
 		}
 		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
-			bool hw_managed;
+			bool has_long, hw_managed;
 			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
 
 			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
@@ -880,6 +880,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
 				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
 
+			/*
+			 * Treat long counter and its extension, lwd as mutually
+			 * exclusive feature bits. Though these are dependent
+			 * fields at the implementation level, there would never
+			 * be a need for mpam_feat_msmon_mbwu_44counter (long
+			 * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
+			 * bits to be set together.
+			 *
+			 * mpam_feat_msmon_mbwu isn't treated as an exclusive
+			 * bit as this feature bit would be used as the "front
+			 * facing feature bit" for any checks related to mbwu
+			 * monitors.
+			 */
+			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumonidr);
+			if (props->num_mbwu_mon && has_long) {
+				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumonidr))
+					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+				else
+					mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+			}
+
 			/* Is NRDY hardware managed? */
 			mpam_mon_sel_outer_lock(msc);
 			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9a50a5432f4a..9f627b5f72a1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -178,7 +178,15 @@ enum mpam_device_features {
 	mpam_feat_msmon_csu,
 	mpam_feat_msmon_csu_capture,
 	mpam_feat_msmon_csu_hw_nrdy,
+
+	/*
+	 * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
+	 * counter would be used. The exact counter used is decided based on the
+	 * status of mpam_feat_msmon_mbwu_l/mpam_feat_msmon_mbwu_lwd as well.
+	 */
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_44counter,
+	mpam_feat_msmon_mbwu_63counter,
 	mpam_feat_msmon_mbwu_capture,
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
-- 
2.20.1
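
For readers following the probe logic in this patch, the decode can be
sketched outside the driver. This is a minimal standalone illustration
under stated assumptions, not the driver's code: the bit positions for
HAS_LONG (bit 30) and LWD (bit 29) are taken from the commit message,
while the macro names IDR_HAS_LONG and IDR_LWD and the helpers
mbwu_counter_width() and mbwu_counter_max() are hypothetical names chosen
for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the commit message; the names are illustrative. */
#define IDR_HAS_LONG	(UINT32_C(1) << 30)
#define IDR_LWD		(UINT32_C(1) << 29)

/*
 * Return the usable MBWU counter width in bits for an MBWUMON_IDR value:
 * 63 if HAS_LONG and LWD are both set, 44 if only HAS_LONG is set,
 * otherwise the regular 31 bit counter. LWD is ignored unless HAS_LONG
 * is also set, so at most one of the long widths is ever reported.
 */
static unsigned int mbwu_counter_width(uint32_t idr)
{
	if (!(idr & IDR_HAS_LONG))
		return 31;

	return (idr & IDR_LWD) ? 63 : 44;
}

/* Largest value a counter of the given width can hold before it wraps. */
static uint64_t mbwu_counter_max(unsigned int width)
{
	assert(width > 0 && width < 64);

	return (UINT64_C(1) << width) - 1;
}
```

An IDR value with LWD set but HAS_LONG clear falls back to 31 bits here,
which mirrors the has_long gate in the hunk above. The values returned by
mbwu_counter_max() for 31, 44 and 63 bits are the all-ones masks for
those widths.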
- * Re: [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
  2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-08-28 16:14   ` Ben Horgan
  2025-09-10 19:30     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-28 16:14 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> mpam v0.1 and versions above v1.0 support optional long counter for
> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register have fields
> indicating support for long counters. As of now, a 44 bit counter
> represented by HAS_LONG field (bit 30) and a 63 bit counter represented
> by LWD (bit 29) can be optionally integrated. Probe for these counters
> and set corresponding feature bits if any of these counters are present.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c  | 23 ++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  8 ++++++++
>  2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 11be34b54643..2ab7f127baaa 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
>  		}
>  		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
> -			bool hw_managed;
> +			bool has_long, hw_managed;
>  			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
nit: the variable name would be more readable with an underscore,
mbwumon_idr.
>  
>  			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
> @@ -880,6 +880,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>  			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
>  				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
>  
> +			/*
> +			 * Treat long counter and its extension, lwd as mutually
> +			 * exclusive feature bits. Though these are dependent
> +			 * fields at the implementation level, there would never
> +			 * be a need for mpam_feat_msmon_mbwu_44counter (long
> +			 * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
> +			 * bits to be set together.
> +			 *
> +			 * mpam_feat_msmon_mbwu isn't treated as an exclusive
> +			 * bit as this feature bit would be used as the "front
> +			 * facing feature bit" for any checks related to mbwu
> +			 * monitors.
> +			 */
> +			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumonidr);
> +			if (props->num_mbwu_mon && has_long) {
> +				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumonidr))
> +					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
> +				else
> +					mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
> +			}
> +
>  			/* Is NRDY hardware managed? */
>  			mpam_mon_sel_outer_lock(msc);
>  			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 9a50a5432f4a..9f627b5f72a1 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -178,7 +178,15 @@ enum mpam_device_features {
>  	mpam_feat_msmon_csu,
>  	mpam_feat_msmon_csu_capture,
>  	mpam_feat_msmon_csu_hw_nrdy,
> +
> +	/*
> +	 * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
> +	 * counter would be used. The exact counter used is decided based on the
> +	 * status of mpam_feat_msmon_mbwu_l/mpam_feat_msmon_mbwu_lwd as well.
mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter
> +	 */
>  	mpam_feat_msmon_mbwu,
> +	mpam_feat_msmon_mbwu_44counter,
> +	mpam_feat_msmon_mbwu_63counter,
>  	mpam_feat_msmon_mbwu_capture,
>  	mpam_feat_msmon_mbwu_rwbw,
>  	mpam_feat_msmon_mbwu_hw_nrdy,
Other than the two nits, the change looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
- * Re: [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
  2025-08-28 16:14   ` Ben Horgan
@ 2025-09-10 19:30     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:30 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 28/08/2025 17:14, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> From: Rohit Mathew <rohit.mathew@arm.com>
>>
>> mpam v0.1 and versions above v1.0 support optional long counter for
>> memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register have fields
>> indicating support for long counters. As of now, a 44 bit counter
>> represented by HAS_LONG field (bit 30) and a 63 bit counter represented
>> by LWD (bit 29) can be optionally integrated. Probe for these counters
>> and set corresponding feature bits if any of these counters are present.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 11be34b54643..2ab7f127baaa 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
>>  				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
>>  		}
>>  		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
>> -			bool hw_managed;
>> +			bool has_long, hw_managed;
>>  			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
> nit: the variable name would be more readable with an underscore,
> mwumon_idr.
Sure,
>>  
>>  			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 9a50a5432f4a..9f627b5f72a1 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -178,7 +178,15 @@ enum mpam_device_features {
>>  	mpam_feat_msmon_csu,
>>  	mpam_feat_msmon_csu_capture,
>>  	mpam_feat_msmon_csu_hw_nrdy,
>> +
>> +	/*
>> +	 * Having mpam_feat_msmon_mbwu set doesn't mean the regular 31 bit MBWU
>> +	 * counter would be used. The exact counter used is decided based on the
>> +	 * status of mpam_feat_msmon_mbwu_l/mpam_feat_msmon_mbwu_lwd as well.
> mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter
That's my fault - I hate the names for these things!
>> +	 */
>>  	mpam_feat_msmon_mbwu,
>> +	mpam_feat_msmon_mbwu_44counter,
>> +	mpam_feat_msmon_mbwu_63counter,
>>  	mpam_feat_msmon_mbwu_capture,
>>  	mpam_feat_msmon_mbwu_rwbw,
>>  	mpam_feat_msmon_mbwu_hw_nrdy,
> 
> Other than the two nits, the change looks good to me.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks!
James
 
 
- * [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (28 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 16:39   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
                   ` (37 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
the RIS, use the long/LWD counter instead of the regular 31 bit mbwu
counter.
Only 32 bit accesses to the MSC are required to be supported by the
spec, but these registers are 64 bits wide. The lower half may overflow
into the upper half between the two 32 bit reads, giving a torn value.
To detect this, use a helper that reads the top half before and after
the lower half, and retries if the two reads of the top half differ.
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit]
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Commit message wrangling.
 * Refer to 31 bit counters as opposed to 32 bit (registers).
---
 drivers/resctrl/mpam_devices.c | 89 ++++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 9 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2ab7f127baaa..8fbcf6eb946a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1002,6 +1002,48 @@ struct mon_read {
 	int				err;
 };
 
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+	int retry = 3;
+	u32 mbwu_l_low;
+	u64 mbwu_l_high1, mbwu_l_high2;
+
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+	do {
+		mbwu_l_high1 = mbwu_l_high2;
+		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+		retry--;
+	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+	if (mbwu_l_high1 == mbwu_l_high2)
+		return (mbwu_l_high1 << 32) | mbwu_l_low;
+	return MSMON___NRDY_L;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
+	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
 static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 				   u32 *flt_val)
 {
@@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 static void clean_msmon_ctl_val(u32 *cur_ctl)
 {
 	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+	*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
 }
 
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -1080,7 +1123,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 	case mpam_feat_msmon_mbwu:
 		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(m->ris))
+			mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
+		else
+			mpam_write_monsel_reg(msc, MBWU, 0);
+
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 
 		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
@@ -1095,8 +1142,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 
 static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
 {
-	/* TODO: scaling, and long counters */
-	return GENMASK_ULL(30, 0);
+	/* TODO: implement scaling counters */
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
+		return GENMASK_ULL(62, 0);
+	else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
+		return GENMASK_ULL(43, 0);
+	else
+		return GENMASK_ULL(30, 0);
 }
 
 /* Call with MSC lock held */
@@ -1138,10 +1190,24 @@ static void __ris_msmon_read(void *arg)
 		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
-		now = mpam_read_monsel_reg(msc, MBWU);
-		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
-			nrdy = now & MSMON___NRDY;
-		now = FIELD_GET(MSMON___VALUE, now);
+		/*
+		 * If long or lwd counters are supported, use them, else revert
+		 * to the 31 bit counter.
+		 */
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			now = mpam_msc_read_mbwu_l(msc);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY_L;
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
+				now = FIELD_GET(MSMON___LWD_VALUE, now);
+			else
+				now = FIELD_GET(MSMON___L_VALUE, now);
+		} else {
+			now = mpam_read_monsel_reg(msc, MBWU);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY;
+			now = FIELD_GET(MSMON___VALUE, now);
+		}
 
 		if (nrdy)
 			break;
@@ -1433,8 +1499,13 @@ static int mpam_save_mbwu_state(void *arg)
 		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
 
-		val = mpam_read_monsel_reg(msc, MBWU);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			val = mpam_msc_read_mbwu_l(msc);
+			mpam_msc_zero_mbwu_l(msc);
+		} else {
+			val = mpam_read_monsel_reg(msc, MBWU);
+			mpam_write_monsel_reg(msc, MBWU, 0);
+		}
 
 		cfg->mon = i;
 		cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
-- 
2.20.1
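
The high/low/high read described in the commit message can be tried out
in isolation. The sketch below is an illustration under assumptions, not
the driver's helper: the accessor type read32_fn, the function
read_split_counter() and the fake_counter test device are all
hypothetical names invented for this example. Unlike the patch, which
returns a not-ready value when the retries run out, this sketch reports
failure through a boolean return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessor: returns one 32 bit half of a 64 bit counter. */
typedef uint32_t (*read32_fn)(void *ctx, bool high_half);

/*
 * Read a running 64 bit counter using only 32 bit accesses. The high
 * half is sampled before and after the low half. If both samples match,
 * the low half did not carry into the high half in between, so the
 * combined value is consistent. Otherwise retry a bounded number of times.
 */
static bool read_split_counter(read32_fn rd, void *ctx, int retries,
			       uint64_t *out)
{
	uint32_t low, high1, high2;

	assert(retries > 0);

	high2 = rd(ctx, true);
	do {
		high1 = high2;
		low = rd(ctx, false);
		high2 = rd(ctx, true);
		retries--;
	} while (high1 != high2 && retries > 0);

	if (high1 != high2)
		return false;

	*out = ((uint64_t)high1 << 32) | low;
	return true;
}

/* Test device: a counter that advances by 'step' on every access. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t fake_read(void *ctx, bool high_half)
{
	struct fake_counter *c = ctx;
	uint32_t half = high_half ? (uint32_t)(c->value >> 32)
				  : (uint32_t)c->value;

	c->value += c->step;	/* the counter keeps running between reads */
	return half;
}
```

Starting the fake counter just below a carry, for example at 0xFFFFFFFF
with a step of 1, makes the first pass observe two different high halves
and forces one retry. A step of 1ULL << 32 changes the high half on every
access, so the helper gives up after the retry budget is spent.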
- * Re: [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
  2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-08-29 16:39   ` Ben Horgan
  2025-09-10 19:30     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 16:39 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> From: Rohit Mathew <rohit.mathew@arm.com>
> 
> If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
> the RIS, use long/LWD counter instead of the regular 31 bit mbwu
> counter.
> 
> Only 32bit accesses to the MSC are required to be supported by the
> spec, but these registers are 64bits. The lower half may overflow
> into the higher half between two 32bit reads. To avoid this, use
> a helper that reads the top half multiple times to check for overflow.
> 
> Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
> [morse: merged multiple patches from Rohit]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Commit message wrangling.
>  * Refer to 31 bit counters as opposed to 32 bit (registers).
> ---
>  drivers/resctrl/mpam_devices.c | 89 ++++++++++++++++++++++++++++++----
>  1 file changed, 80 insertions(+), 9 deletions(-)
> 
Looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 2ab7f127baaa..8fbcf6eb946a 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -1002,6 +1002,48 @@ struct mon_read {
>  	int				err;
>  };
>  
> +static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
> +{
> +	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
> +		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
> +}
> +
> +static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
> +{
> +	int retry = 3;
> +	u32 mbwu_l_low;
> +	u64 mbwu_l_high1, mbwu_l_high2;
> +
> +	mpam_mon_sel_lock_held(msc);
> +
> +	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +	do {
> +		mbwu_l_high1 = mbwu_l_high2;
> +		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
> +		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
> +
> +		retry--;
> +	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
> +
> +	if (mbwu_l_high1 == mbwu_l_high2)
> +		return (mbwu_l_high1 << 32) | mbwu_l_low;
> +	return MSMON___NRDY_L;
> +}
> +
> +static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
> +{
> +	mpam_mon_sel_lock_held(msc);
> +
> +	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
> +	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
> +
> +	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
> +	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
> +}
> +
>  static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>  				   u32 *flt_val)
>  {
> @@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>  static void clean_msmon_ctl_val(u32 *cur_ctl)
>  {
>  	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
> +	*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
I observe that this bit is res0, in the CSU case, and so the clearing is ok.
>  }
>  
>  static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> @@ -1080,7 +1123,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  	case mpam_feat_msmon_mbwu:
>  		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
> -		mpam_write_monsel_reg(msc, MBWU, 0);
> +		if (mpam_ris_has_mbwu_long_counter(m->ris))
> +			mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
> +		else
> +			mpam_write_monsel_reg(msc, MBWU, 0);
> +
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>  
>  		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
> @@ -1095,8 +1142,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>  
>  static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
>  {
> -	/* TODO: scaling, and long counters */
> -	return GENMASK_ULL(30, 0);
> +	/* TODO: implement scaling counters */
> +	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
> +		return GENMASK_ULL(62, 0);
> +	else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
> +		return GENMASK_ULL(43, 0);
> +	else
> +		return GENMASK_ULL(30, 0);
>  }
>  
>  /* Call with MSC lock held */
> @@ -1138,10 +1190,24 @@ static void __ris_msmon_read(void *arg)
>  		now = FIELD_GET(MSMON___VALUE, now);
>  		break;
>  	case mpam_feat_msmon_mbwu:
> -		now = mpam_read_monsel_reg(msc, MBWU);
> -		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> -			nrdy = now & MSMON___NRDY;
> -		now = FIELD_GET(MSMON___VALUE, now);
> +		/*
> +		 * If long or lwd counters are supported, use them, else revert
> +		 * to the 31 bit counter.
> +		 */
> +		if (mpam_ris_has_mbwu_long_counter(ris)) {
> +			now = mpam_msc_read_mbwu_l(msc);
> +			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +				nrdy = now & MSMON___NRDY_L;
> +			if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
> +				now = FIELD_GET(MSMON___LWD_VALUE, now);
> +			else
> +				now = FIELD_GET(MSMON___L_VALUE, now);
> +		} else {
> +			now = mpam_read_monsel_reg(msc, MBWU);
> +			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
> +				nrdy = now & MSMON___NRDY;
> +			now = FIELD_GET(MSMON___VALUE, now);
> +		}
>  
>  		if (nrdy)
>  			break;
> @@ -1433,8 +1499,13 @@ static int mpam_save_mbwu_state(void *arg)
>  		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
>  		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
>  
> -		val = mpam_read_monsel_reg(msc, MBWU);
> -		mpam_write_monsel_reg(msc, MBWU, 0);
> +		if (mpam_ris_has_mbwu_long_counter(ris)) {
> +			val = mpam_msc_read_mbwu_l(msc);
> +			mpam_msc_zero_mbwu_l(msc);
> +		} else {
> +			val = mpam_read_monsel_reg(msc, MBWU);
> +			mpam_write_monsel_reg(msc, MBWU, 0);
> +		}
>  
>  		cfg->mon = i;
>  		cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
-- 
Thanks,
Ben
- * Re: [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
  2025-08-29 16:39   ` Ben Horgan
@ 2025-09-10 19:30     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:30 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 17:39, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> From: Rohit Mathew <rohit.mathew@arm.com>
>>
>> If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
>> the RIS, use long/LWD counter instead of the regular 31 bit mbwu
>> counter.
>>
>> Only 32bit accesses to the MSC are required to be supported by the
>> spec, but these registers are 64bits. The lower half may overflow
>> into the higher half between two 32bit reads. To avoid this, use
>> a helper that reads the top half multiple times to check for overflow.
> Looks good to me.
> 
> Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks!
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 2ab7f127baaa..8fbcf6eb946a 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
>>  static void clean_msmon_ctl_val(u32 *cur_ctl)
>>  {
>>  	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
>> +	*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
> I observe that this bit is res0, in the CSU case, and so the clearing is ok.
As they've started allocating bits that collide, it probably shouldn't rely on that.
The bug would be that when that bit gets set for CSU in the future, it's always masked out
and the monitor gets reprogrammed every time (possibly incurring the timeout each time).
Changed as:
|	if (FIELD_GET(MSMON_CFG_x_CTL_TYPE, *cur_ctl) == MSMON_CFG_MBWU_CTL_TYPE_MBWU)
|		*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
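As a standalone illustration of why the mask is made conditional on the monitor type, here is a small sketch. The `EX_` constants are placeholder values chosen for the example and are not taken from the driver headers; only the shape of the check mirrors the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder field layout for the example; not the driver's definitions. */
#define EX_CTL_TYPE_MASK	0xffu
#define EX_CTL_TYPE_MBWU	0x42u
#define EX_CTL_TYPE_CSU		0x43u
#define EX_CTL_OFLOW_STATUS	(1u << 26)
#define EX_CTL_OFLOW_STATUS_L	(1u << 15)

/*
 * Strip status bits before comparing a control value read back from
 * hardware with the value software wants. The long-counter overflow bit
 * is only cleared for MBWU monitors, so that if the same bit position is
 * later given a meaning for CSU it still takes part in the comparison.
 */
static uint32_t ex_clean_ctl(uint32_t ctl)
{
	ctl &= ~EX_CTL_OFLOW_STATUS;
	if ((ctl & EX_CTL_TYPE_MASK) == EX_CTL_TYPE_MBWU)
		ctl &= ~EX_CTL_OFLOW_STATUS_L;

	return ctl;
}
```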
Thanks,
James
 
 
- * [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (29 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
                   ` (36 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
resctrl expects to reset the bandwidth counters when the filesystem
is mounted.
To allow this, add a helper that clears the saved mbwu state. Instead
of cross calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zeroed on the
next read. This is easily done by forcing a configuration update.
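The deferred reset can be pictured with a simplified model. This is a sketch under stated assumptions: `struct ex_monitor` and the `ex_` functions are invented for illustration, the hardware counter is a plain field, and the locking the driver relies on is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for one monitor's saved state. */
struct ex_monitor {
	uint64_t hw_counter;		/* models the hardware counter */
	uint64_t correction;		/* software accumulated correction */
	bool reset_on_next_read;	/* set by reset, consumed by read */
};

/* Cheap: only touches memory, so no cross call to another CPU is needed. */
static void ex_request_reset(struct ex_monitor *m)
{
	assert(m);
	m->correction = 0;
	m->reset_on_next_read = true;
}

/*
 * The reader already runs somewhere that can reach the hardware, so it
 * performs the zeroing that the reset request deferred.
 */
static uint64_t ex_read(struct ex_monitor *m)
{
	assert(m);
	if (m->reset_on_next_read) {
		m->reset_on_next_read = false;
		m->hw_counter = 0;	/* models the forced reconfiguration */
	}

	return m->hw_counter + m->correction;
}
```

The design choice is that the reset path never needs to reach the hardware; the cost is paid once, by whichever read comes next.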
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 49 +++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  5 +++-
 2 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8fbcf6eb946a..65c30ebfe001 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1155,9 +1155,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
 static void __ris_msmon_read(void *arg)
 {
 	bool nrdy = false;
+	bool config_mismatch;
 	struct mon_read *m = arg;
 	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
+	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
 	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
@@ -1172,6 +1174,14 @@ static void __ris_msmon_read(void *arg)
 		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
 	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
 
+	if (m->type == mpam_feat_msmon_mbwu) {
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (mbwu_state) {
+			reset_on_next_read = mbwu_state->reset_on_next_read;
+			mbwu_state->reset_on_next_read = false;
+		}
+	}
+
 	/*
 	 * Read the existing configuration to avoid re-writing the same values.
 	 * This saves waiting for 'nrdy' on subsequent reads.
@@ -1179,7 +1189,10 @@ static void __ris_msmon_read(void *arg)
 	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
-	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+	config_mismatch = cur_flt != flt_val ||
+			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+	if (config_mismatch || reset_on_next_read)
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
 
 	switch (m->type) {
@@ -1212,7 +1225,6 @@ static void __ris_msmon_read(void *arg)
 		if (nrdy)
 			break;
 
-		mbwu_state = &ris->mbwu_state[ctx->mon];
 		if (!mbwu_state)
 			break;
 
@@ -1314,6 +1326,39 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	return err;
 }
 
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+	int idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	if (!mpam_is_enabled())
+		return;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+			continue;
+
+		msc = vmsc->msc;
+		mpam_mon_sel_outer_lock(msc);
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+				continue;
+
+			if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+				continue;
+
+			ris->mbwu_state[ctx->mon].correction = 0;
+			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+			mpam_mon_sel_inner_unlock(msc);
+		}
+		mpam_mon_sel_outer_unlock(msc);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f627b5f72a1..bbf0306abc82 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -297,10 +297,12 @@ struct mon_cfg {
 
 /*
  * Changes to enabled and cfg are protected by the msc->lock.
- * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ * Changes to reset_on_next_read, prev_val and correction are protected by the
+ * msc's mon_sel_lock.
  */
 struct msmon_mbwu_state {
 	bool		enabled;
+	bool		reset_on_next_read;
 	struct mon_cfg	cfg;
 
 	/* The value last read from the hardware. Used to detect overflow. */
@@ -410,6 +412,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
 
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.20.1
- * [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (30 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 16:56   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
                   ` (35 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich,
	Jonathan Cameron
The bitmap reset code has been a source of bugs. Add a unit test.
This currently has to be built in, as the rest of the driver is
builtin.
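For reference, the word values the test expects for each width can be reproduced with a small userspace sketch. This is not the driver's mpam_reset_msc_bitmap(); `ex_reset_bitmap()` is a hypothetical helper that writes into a plain array instead of MSC registers, and it only touches the words the bitmap covers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Set the low 'wd' bits of a bitmap stored as 32-bit words, lowest word
 * first. Full words become all-ones; a final partial word gets only its
 * low (wd % 32) bits set. A width of zero writes nothing.
 */
static void ex_reset_bitmap(uint32_t *words, size_t nwords, unsigned int wd)
{
	unsigned int full = wd / 32;
	unsigned int rem = wd % 32;
	unsigned int i;

	assert(words && (wd + 31) / 32 <= nwords);

	for (i = 0; i < full; i++)
		words[i] = UINT32_MAX;

	if (rem)
		words[full] = (UINT32_C(1) << rem) - 1;
}
```

The widths 0, 1, 16, 32 and 33 used by the test cover the empty case, a partial first word, an exactly full word, and a carry into the second word, which is where off-by-one mistakes in this kind of code tend to show up.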
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/Kconfig             | 13 ++++++
 drivers/resctrl/mpam_devices.c      |  4 ++
 drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)
 create mode 100644 drivers/resctrl/test_mpam_devices.c
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index dff7b87280ab..f5e0609975e4 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
 	bool "MPAM driver for System IP, e,g. caches and memory controllers"
 	depends on ARM64_MPAM && EXPERT
 
+menu "ARM64 MPAM driver options"
+
 config ARM64_MPAM_DRIVER_DEBUG
 	bool "Enable debug messages from the MPAM driver."
 	depends on ARM64_MPAM_DRIVER
 	help
 	  Say yes here to enable debug messages from the MPAM driver.
+
+config MPAM_KUNIT_TEST
+	bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
+	depends on KUNIT=y
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to run tests in the MPAM driver.
+
+	  If unsure, say N.
+
+endmenu
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 65c30ebfe001..4cf5aae88c53 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2903,3 +2903,7 @@ static int __init mpam_msc_driver_init(void)
 }
 /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..8e9d6c88171c
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2024 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+	char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+	struct mpam_msc fake_msc;
+	u32 *test_result;
+
+	if (!buf)
+		return;
+
+	fake_msc.mapped_hwpage = buf;
+	fake_msc.mapped_hwpage_sz = SZ_16K;
+	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+	mutex_init(&fake_msc.part_sel_lock);
+	mutex_lock(&fake_msc.part_sel_lock);
+
+	test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+	KUNIT_EXPECT_EQ(test, test_result[0], 1);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 1);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	{}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+	.name = "mpam_devices_test_suite",
+	.test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
-- 
2.20.1
- * Re: [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
  2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-08-29 16:56   ` Ben Horgan
  2025-09-10 19:30     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 16:56 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> The bitmap reset code has been a source of bugs. Add a unit test.
> 
> This currently has to be built in, as the rest of the driver is
> builtin.
> 
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/Kconfig             | 13 ++++++
>  drivers/resctrl/mpam_devices.c      |  4 ++
>  drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
>  3 files changed, 85 insertions(+)
>  create mode 100644 drivers/resctrl/test_mpam_devices.c
> 
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> index dff7b87280ab..f5e0609975e4 100644
> --- a/drivers/resctrl/Kconfig
> +++ b/drivers/resctrl/Kconfig
> @@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
>  	bool "MPAM driver for System IP, e,g. caches and memory controllers"
>  	depends on ARM64_MPAM && EXPERT
>  
> +menu "ARM64 MPAM driver options"
> +
>  config ARM64_MPAM_DRIVER_DEBUG
>  	bool "Enable debug messages from the MPAM driver."
>  	depends on ARM64_MPAM_DRIVER
>  	help
>  	  Say yes here to enable debug messages from the MPAM driver.
> +
> +config MPAM_KUNIT_TEST
> +	bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
> +	depends on KUNIT=y
It depends on ARM64_MPAM_DRIVER as well.
> +	default KUNIT_ALL_TESTS
> +	help
> +	  Enable this option to run tests in the MPAM driver.
> +
> +	  If unsure, say N.
> +
> +endmenu
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 65c30ebfe001..4cf5aae88c53 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -2903,3 +2903,7 @@ static int __init mpam_msc_driver_init(void)
>  }
>  /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
>  subsys_initcall(mpam_msc_driver_init);
> +
> +#ifdef CONFIG_MPAM_KUNIT_TEST
> +#include "test_mpam_devices.c"
> +#endif
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> new file mode 100644
> index 000000000000..8e9d6c88171c
> --- /dev/null
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -0,0 +1,68 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2024 Arm Ltd.
> +/* This file is intended to be included into mpam_devices.c */
> +
> +#include <kunit/test.h>
> +
> +static void test_mpam_reset_msc_bitmap(struct kunit *test)
> +{
> +	char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
> +	struct mpam_msc fake_msc;
> +	u32 *test_result;
> +
> +	if (!buf)
> +		return;
> +
> +	fake_msc.mapped_hwpage = buf;
> +	fake_msc.mapped_hwpage_sz = SZ_16K;
> +	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
> +
> +	mutex_init(&fake_msc.part_sel_lock);
> +	mutex_lock(&fake_msc.part_sel_lock);
> +
> +	test_result = (u32 *)(buf + MPAMCFG_CPBM);
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 1);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 0);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
> +	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
> +	KUNIT_EXPECT_EQ(test, test_result[1], 1);
> +	test_result[0] = 0;
> +	test_result[1] = 0;
> +
> +	mutex_unlock(&fake_msc.part_sel_lock);
> +}
> +
> +static struct kunit_case mpam_devices_test_cases[] = {
> +	KUNIT_CASE(test_mpam_reset_msc_bitmap),
> +	{}
> +};
> +
> +static struct kunit_suite mpam_devices_test_suite = {
> +	.name = "mpam_devices_test_suite",
> +	.test_cases = mpam_devices_test_cases,
> +};
> +
> +kunit_test_suites(&mpam_devices_test_suite);
Thanks,
Ben
- * Re: [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
  2025-08-29 16:56   ` Ben Horgan
@ 2025-09-10 19:30     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:30 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 17:56, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> The bitmap reset code has been a source of bugs. Add a unit test.
>>
>> This currently has to be built in, as the rest of the driver is
>> builtin.
>> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
>> index dff7b87280ab..f5e0609975e4 100644
>> --- a/drivers/resctrl/Kconfig
>> +++ b/drivers/resctrl/Kconfig
>> @@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
>>  	bool "MPAM driver for System IP, e,g. caches and memory controllers"
>>  	depends on ARM64_MPAM && EXPERT
>>  
>> +menu "ARM64 MPAM driver options"
>> +
>>  config ARM64_MPAM_DRIVER_DEBUG
>>  	bool "Enable debug messages from the MPAM driver."
>>  	depends on ARM64_MPAM_DRIVER
>>  	help
>>  	  Say yes here to enable debug messages from the MPAM driver.
>> +
>> +config MPAM_KUNIT_TEST
>> +	bool "KUnit tests for MPAM driver " if !KUNIT_ALL_TESTS
>> +	depends on KUNIT=y
> It depends on ARM64_MPAM_DRIVER as well.
Yeah, the kbuild robot had some fun with all this. Turns out ARM64_MPAM is undefined on
non-ARM64, which means its dependencies disappear.
All this is now under an 'if ARM64_MPAM_DRIVER' and the driver symbol depends on ARM64 &&
ARM64_MPAM...
Thanks,
James
 
 
- * [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (31 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 17:11   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (34 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
When features are mismatched between MSCs, the way they are combined
into the class determines whether resctrl can support this SoC.
Add some tests to illustrate the sort of combination that is expected
to work, and the features that must be removed.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_internal.h     |   8 +-
 drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
 2 files changed, 329 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index bbf0306abc82..6e973be095f8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -18,6 +18,12 @@
 
 DECLARE_STATIC_KEY_FALSE(mpam_enabled);
 
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
 static inline bool mpam_is_enabled(void)
 {
 	return static_branch_likely(&mpam_enabled);
@@ -209,7 +215,7 @@ struct mpam_props {
 	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
-};
+} PACKED_FOR_KUNIT;
 
 #define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
 
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 8e9d6c88171c..ef39696e7ff8 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,326 @@
 
 #include <kunit/test.h>
 
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+	struct mpam_props parent = { 0 };
+	struct mpam_props child;
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, false);
+
+	memset(&child, 0, sizeof(child));
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, true);
+
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+	/* o/` How deep is your stack? o/` */
+	struct list_head fake_classes_list;
+	struct mpam_class fake_class = { 0 };
+	struct mpam_component fake_comp1 = { 0 };
+	struct mpam_component fake_comp2 = { 0 };
+	struct mpam_vmsc fake_vmsc1 = { 0 };
+	struct mpam_vmsc fake_vmsc2 = { 0 };
+	struct mpam_msc fake_msc1 = { 0 };
+	struct mpam_msc fake_msc2 = { 0 };
+	struct mpam_msc_ris fake_ris1 = { 0 };
+	struct mpam_msc_ris fake_ris2 = { 0 };
+	struct platform_device fake_pdev = { 0 };
+
+#define RESET_FAKE_HIEARCHY()	do {				\
+	INIT_LIST_HEAD(&fake_classes_list);			\
+								\
+	memset(&fake_class, 0, sizeof(fake_class));		\
+	fake_class.level = 3;					\
+	fake_class.type = MPAM_CLASS_CACHE;			\
+	INIT_LIST_HEAD_RCU(&fake_class.components);		\
+	INIT_LIST_HEAD(&fake_class.classes_list);		\
+								\
+	memset(&fake_comp1, 0, sizeof(fake_comp1));		\
+	memset(&fake_comp2, 0, sizeof(fake_comp2));		\
+	fake_comp1.comp_id = 1;					\
+	fake_comp2.comp_id = 2;					\
+	INIT_LIST_HEAD(&fake_comp1.vmsc);			\
+	INIT_LIST_HEAD(&fake_comp1.class_list);			\
+	INIT_LIST_HEAD(&fake_comp2.vmsc);			\
+	INIT_LIST_HEAD(&fake_comp2.class_list);			\
+								\
+	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));		\
+	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));		\
+	INIT_LIST_HEAD(&fake_vmsc1.ris);			\
+	INIT_LIST_HEAD(&fake_vmsc1.comp_list);			\
+	fake_vmsc1.msc = &fake_msc1;				\
+	INIT_LIST_HEAD(&fake_vmsc2.ris);			\
+	INIT_LIST_HEAD(&fake_vmsc2.comp_list);			\
+	fake_vmsc2.msc = &fake_msc2;				\
+								\
+	memset(&fake_ris1, 0, sizeof(fake_ris1));		\
+	memset(&fake_ris2, 0, sizeof(fake_ris2));		\
+	fake_ris1.ris_idx = 1;					\
+	INIT_LIST_HEAD(&fake_ris1.msc_list);			\
+	fake_ris2.ris_idx = 2;					\
+	INIT_LIST_HEAD(&fake_ris2.msc_list);			\
+								\
+	fake_msc1.pdev = &fake_pdev;				\
+	fake_msc2.pdev = &fake_pdev;				\
+								\
+	list_add(&fake_class.classes_list, &fake_classes_list);	\
+} while (0)
+
+	RESET_FAKE_HIEARCHY();
+
+	mutex_lock(&mpam_list_lock);
+
+	/* One Class+Comp, two RIS in one vMSC with common features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with incompatible overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 5;
+	fake_ris2.props.cpbm_wd = 3;
+	fake_ris1.props.mbw_pbm_bits = 5;
+	fake_ris2.props.mbw_pbm_bits = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with overlapping features that need tweaking */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+	fake_ris1.props.bwa_wd = 5;
+	fake_ris2.props.bwa_wd = 3;
+	fake_ris1.props.cmax_wd = 5;
+	fake_ris2.props.cmax_wd = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * RIS with different control properties need to be sanitised so the
+	 * class has the common set of properties.
+	 */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class Two Comp with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class Two Comp with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple components can't control the same resource, mismatched features can
+	 * not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	mutex_unlock(&mpam_list_lock);
+
+#undef RESET_FAKE_HIEARCHY
+}
+
 static void test_mpam_reset_msc_bitmap(struct kunit *test)
 {
 	char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -57,6 +377,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
 
 static struct kunit_case mpam_devices_test_cases[] = {
 	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	KUNIT_CASE(test_mpam_enable_merge_features),
+	KUNIT_CASE(test__props_mismatch),
 	{}
 };
 
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-08-29 17:11   ` Ben Horgan
  2025-09-10 19:31     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 17:11 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
The tests seem reasonable. Just some comments on the comments.
On 8/22/25 16:30, James Morse wrote:
> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
> 
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_internal.h     |   8 +-
>  drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
>  2 files changed, 329 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index bbf0306abc82..6e973be095f8 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -18,6 +18,12 @@
>  
>  DECLARE_STATIC_KEY_FALSE(mpam_enabled);
>  
> +#ifdef CONFIG_MPAM_KUNIT_TEST
> +#define PACKED_FOR_KUNIT __packed
> +#else
> +#define PACKED_FOR_KUNIT
> +#endif
> +
>  static inline bool mpam_is_enabled(void)
>  {
>  	return static_branch_likely(&mpam_enabled);
> @@ -209,7 +215,7 @@ struct mpam_props {
>  	u16			dspri_wd;
>  	u16			num_csu_mon;
>  	u16			num_mbwu_mon;
> -};
> +} PACKED_FOR_KUNIT;
>  
>  #define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
>  
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> index 8e9d6c88171c..ef39696e7ff8 100644
> --- a/drivers/resctrl/test_mpam_devices.c
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -4,6 +4,326 @@
>  
>  #include <kunit/test.h>
>  
> +/*
> + * This test catches fields that aren't being sanitised - but can't tell you
> + * which one...
> + */
> +static void test__props_mismatch(struct kunit *test)
> +{
> +	struct mpam_props parent = { 0 };
> +	struct mpam_props child;
> +
> +	memset(&child, 0xff, sizeof(child));
> +	__props_mismatch(&parent, &child, false);
> +
> +	memset(&child, 0, sizeof(child));
> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +
> +	memset(&child, 0xff, sizeof(child));
> +	__props_mismatch(&parent, &child, true);
> +
> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +}
> +
> +static void test_mpam_enable_merge_features(struct kunit *test)
> +{
> +	/* o/` How deep is your stack? o/` */
> +	struct list_head fake_classes_list;
> +	struct mpam_class fake_class = { 0 };
> +	struct mpam_component fake_comp1 = { 0 };
> +	struct mpam_component fake_comp2 = { 0 };
> +	struct mpam_vmsc fake_vmsc1 = { 0 };
> +	struct mpam_vmsc fake_vmsc2 = { 0 };
> +	struct mpam_msc fake_msc1 = { 0 };
> +	struct mpam_msc fake_msc2 = { 0 };
> +	struct mpam_msc_ris fake_ris1 = { 0 };
> +	struct mpam_msc_ris fake_ris2 = { 0 };
> +	struct platform_device fake_pdev = { 0 };
> +
> +#define RESET_FAKE_HIEARCHY()	do {				\
> +	INIT_LIST_HEAD(&fake_classes_list);			\
> +								\
> +	memset(&fake_class, 0, sizeof(fake_class));		\
> +	fake_class.level = 3;					\
> +	fake_class.type = MPAM_CLASS_CACHE;			\
> +	INIT_LIST_HEAD_RCU(&fake_class.components);		\
> +	INIT_LIST_HEAD(&fake_class.classes_list);		\
> +								\
> +	memset(&fake_comp1, 0, sizeof(fake_comp1));		\
> +	memset(&fake_comp2, 0, sizeof(fake_comp2));		\
> +	fake_comp1.comp_id = 1;					\
> +	fake_comp2.comp_id = 2;					\
> +	INIT_LIST_HEAD(&fake_comp1.vmsc);			\
> +	INIT_LIST_HEAD(&fake_comp1.class_list);			\
> +	INIT_LIST_HEAD(&fake_comp2.vmsc);			\
> +	INIT_LIST_HEAD(&fake_comp2.class_list);			\
> +								\
> +	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));		\
> +	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));		\
> +	INIT_LIST_HEAD(&fake_vmsc1.ris);			\
> +	INIT_LIST_HEAD(&fake_vmsc1.comp_list);			\
> +	fake_vmsc1.msc = &fake_msc1;				\
> +	INIT_LIST_HEAD(&fake_vmsc2.ris);			\
> +	INIT_LIST_HEAD(&fake_vmsc2.comp_list);			\
> +	fake_vmsc2.msc = &fake_msc2;				\
> +								\
> +	memset(&fake_ris1, 0, sizeof(fake_ris1));		\
> +	memset(&fake_ris2, 0, sizeof(fake_ris2));		\
> +	fake_ris1.ris_idx = 1;					\
> +	INIT_LIST_HEAD(&fake_ris1.msc_list);			\
> +	fake_ris2.ris_idx = 2;					\
> +	INIT_LIST_HEAD(&fake_ris2.msc_list);			\
> +								\
> +	fake_msc1.pdev = &fake_pdev;				\
> +	fake_msc2.pdev = &fake_pdev;				\
> +								\
> +	list_add(&fake_class.classes_list, &fake_classes_list);	\
> +} while (0)
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	mutex_lock(&mpam_list_lock);
> +
> +	/* One Class+Comp, two RIS in one vMSC with common features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = NULL;
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = NULL;
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cmax_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with non-overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cmax_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple RIS in different MSC can't the same resource, mismatched
s/can't the same/can't control the same/
> +	 * features can not be supported.
> +	 */
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with incompatible overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 5;
> +	fake_ris2.props.cpbm_wd = 3;
> +	fake_ris1.props.mbw_pbm_bits = 5;
> +	fake_ris2.props.mbw_pbm_bits = 3;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple RIS in different MSC can't the same resource, mismatched
> +	 * features can not be supported.
> +	 */
Missing the word "control" again.
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with overlapping features that need tweaking */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
> +	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
> +	fake_ris1.props.bwa_wd = 5;
> +	fake_ris2.props.bwa_wd = 3;
> +	fake_ris1.props.cmax_wd = 5;
> +	fake_ris2.props.cmax_wd = 3;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple RIS in different MSC can't the same resource, mismatched
> +	 * features can not be supported.
> +	 */
Comment is for a different case.
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class Two Comp with overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = &fake_class;
> +	list_add(&fake_comp2.class_list, &fake_class.components);
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp2;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class Two Comp with non-overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = &fake_class;
> +	list_add(&fake_comp2.class_list, &fake_class.components);
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp2;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cmax_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple components can't control the same resource, mismatched features can
> +	 * not be supported.
> +	 */
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> +	mutex_unlock(&mpam_list_lock);
> +
> +#undef RESET_FAKE_HIEARCHY
> +}
> +
>  static void test_mpam_reset_msc_bitmap(struct kunit *test)
>  {
>  	char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
> @@ -57,6 +377,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
>  
>  static struct kunit_case mpam_devices_test_cases[] = {
>  	KUNIT_CASE(test_mpam_reset_msc_bitmap),
> +	KUNIT_CASE(test_mpam_enable_merge_features),
> +	KUNIT_CASE(test__props_mismatch),
>  	{}
>  };
>  
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-08-29 17:11   ` Ben Horgan
@ 2025-09-10 19:31     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:31 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 18:11, Ben Horgan wrote:
> The tests seem reasonable. Just some comments on the comments.
> 
> On 8/22/25 16:30, James Morse wrote:
>> When features are mismatched between MSC the way features are combined
>> to the class determines whether resctrl can support this SoC.
>>
>> Add some tests to illustrate the sort of thing that is expected to
>> work, and those that must be removed.
>> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
>> index 8e9d6c88171c..ef39696e7ff8 100644
>> --- a/drivers/resctrl/test_mpam_devices.c
>> +++ b/drivers/resctrl/test_mpam_devices.c
>> @@ -4,6 +4,326 @@
>> +static void test_mpam_enable_merge_features(struct kunit *test)
>> +{
[...]
>> +	/* One Class+Comp, two MSC with non-overlapping features */
>> +	fake_comp1.class = &fake_class;
>> +	list_add(&fake_comp1.class_list, &fake_class.components);
>> +	fake_comp2.class = NULL;
>> +	fake_vmsc1.comp = &fake_comp1;
>> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
>> +	fake_vmsc2.comp = &fake_comp1;
>> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
>> +	fake_ris1.vmsc = &fake_vmsc1;
>> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
>> +	fake_ris2.vmsc = &fake_vmsc2;
>> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
>> +
>> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
>> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
>> +	fake_ris1.props.cpbm_wd = 4;
>> +	fake_ris2.props.cmax_wd = 4;
>> +
>> +	mpam_enable_merge_features(&fake_classes_list);
>> +
>> +	/*
>> +	 * Multiple RIS in different MSC can't the same resource, mismatched
> s/can't the same/can't control the same/
Thanks,
>> +	 * features can not be supported.
>> +	 */
>> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
>> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
>> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
>> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
>> +
>> +	RESET_FAKE_HIEARCHY();
>> +
>> +	/* One Class+Comp, two MSC with incompatible overlapping features */
>> +	fake_comp1.class = &fake_class;
>> +	list_add(&fake_comp1.class_list, &fake_class.components);
>> +	fake_comp2.class = NULL;
>> +	fake_vmsc1.comp = &fake_comp1;
>> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
>> +	fake_vmsc2.comp = &fake_comp1;
>> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
>> +	fake_ris1.vmsc = &fake_vmsc1;
>> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
>> +	fake_ris2.vmsc = &fake_vmsc2;
>> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
>> +
>> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
>> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
>> +	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
>> +	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
>> +	fake_ris1.props.cpbm_wd = 5;
>> +	fake_ris2.props.cpbm_wd = 3;
>> +	fake_ris1.props.mbw_pbm_bits = 5;
>> +	fake_ris2.props.mbw_pbm_bits = 3;
>> +
>> +	mpam_enable_merge_features(&fake_classes_list);
>> +
>> +	/*
>> +	 * Multiple RIS in different MSC can't the same resource, mismatched
>> +	 * features can not be supported.
>> +	 */
> Missing the word "control" again.
Copy and paste!
>> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
>> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
>> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
>> +	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
>> +
>> +	RESET_FAKE_HIEARCHY();
>> +
>> +	/* One Class+Comp, two MSC with overlapping features that need tweaking */
>> +	fake_comp1.class = &fake_class;
>> +	list_add(&fake_comp1.class_list, &fake_class.components);
>> +	fake_comp2.class = NULL;
>> +	fake_vmsc1.comp = &fake_comp1;
>> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
>> +	fake_vmsc2.comp = &fake_comp1;
>> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
>> +	fake_ris1.vmsc = &fake_vmsc1;
>> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
>> +	fake_ris2.vmsc = &fake_vmsc2;
>> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
>> +
>> +	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
>> +	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
>> +	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
>> +	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
>> +	fake_ris1.props.bwa_wd = 5;
>> +	fake_ris2.props.bwa_wd = 3;
>> +	fake_ris1.props.cmax_wd = 5;
>> +	fake_ris2.props.cmax_wd = 3;
>> +
>> +	mpam_enable_merge_features(&fake_classes_list);
>> +
>> +	/*
>> +	 * Multiple RIS in different MSC can't the same resource, mismatched
>> +	 * features can not be supported.
>> +	 */
> Comment is for a different case.
Fixed as:
	/*
	 * RIS with different control properties need to be sanitised so the
	 * class has the common set of properties.
	 */
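To illustrate what the tests expect when two MSC feed one component, here is a
minimal stand-alone sketch. It is not the driver's __props_mismatch(): the
sketch_* names and the cut-down struct are hypothetical, and only two controls
are modelled. It assumes, as the expectations above suggest, that a
portion-bitmap width mismatch removes the feature, while a fractional width
mismatch keeps the feature at the narrower width.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the feature bits in struct mpam_props. */
enum sketch_feat {
	SKETCH_FEAT_CPOR_PART,	/* bitmap control: widths must match exactly */
	SKETCH_FEAT_MBW_MIN,	/* fractional control: a narrower width is usable */
};

struct sketch_props {
	uint32_t features;
	uint16_t cpbm_wd;
	uint16_t bwa_wd;
};

static bool sketch_has(const struct sketch_props *p, enum sketch_feat f)
{
	return p->features & (1u << f);
}

static void sketch_clear(struct sketch_props *p, enum sketch_feat f)
{
	p->features &= ~(1u << f);
}

/* Fold 'child' into 'parent', keeping only what both can honour. */
static void sketch_merge(struct sketch_props *parent,
			 const struct sketch_props *child)
{
	/* Bitmap widths can't be reconciled: drop the feature on any mismatch. */
	if (sketch_has(parent, SKETCH_FEAT_CPOR_PART) &&
	    (!sketch_has(child, SKETCH_FEAT_CPOR_PART) ||
	     parent->cpbm_wd != child->cpbm_wd)) {
		sketch_clear(parent, SKETCH_FEAT_CPOR_PART);
		parent->cpbm_wd = 0;
	}

	/* Fractional widths can be tweaked: keep the feature, use the minimum. */
	if (sketch_has(parent, SKETCH_FEAT_MBW_MIN)) {
		if (!sketch_has(child, SKETCH_FEAT_MBW_MIN)) {
			sketch_clear(parent, SKETCH_FEAT_MBW_MIN);
			parent->bwa_wd = 0;
		} else if (child->bwa_wd < parent->bwa_wd) {
			parent->bwa_wd = child->bwa_wd;
		}
	}
}
```

Merging a child with widths of 3 into a parent with widths of 5 leaves the
parent without the bitmap feature (cpbm_wd of 0) and with the fractional
feature at bwa_wd of 3, which mirrors the "incompatible" and "need tweaking"
cases in the test. The two-RIS-in-one-vMSC case, where mismatched features are
all kept, is deliberately not modelled here.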
>> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
>> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
>> +	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
>> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
 
- * [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (32 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node James Morse
                   ` (33 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Hello,
This is just enough MPAM driver for the ACPI and DT pre-requisites.
It doesn't contain any of the resctrl code, meaning you can't actually drive it
from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
This will change once the user interface is connected up.
This is the initial group of patches that allows the resctrl code to be built
on top. Including that will increase the number of trees that may need to
coordinate, so breaking it up makes sense.
The locking looks very strange - but is influenced by the 'mpam-fb' firmware
interface specification that is still alpha. That thing needs to wait for an
interrupt after every system register write, which significantly impacts the
driver. Some features just won't work, e.g. reading the monitor registers via
perf.
The aim is to not have to make invasive changes to the locking to support the
firmware interface, hence it looks strange from day-1.
I've not found a platform that can test all the behaviours around the monitors,
so this is where I'd expect the most bugs.
The MPAM spec that describes all the system and MMIO registers can be found here:
https://developer.arm.com/documentation/ddi0598/db/?lang=en
(Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
 This document has the best overview)
The expectation is this will go via the arm64 tree.
This series is based on v6.17-rc2, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/driver/rv1
The rest of the driver can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/snapshot/v6.17-rc2
What is MPAM? Set your time-machine to 2020:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.morse@arm.com/
This series was previously posted here:
[RFC] lore.kernel.org/r/20250711183648.30766-2-james.morse@arm.com
Bugs welcome,
Thanks,
James Morse (29):
  cacheinfo: Expose the code to generate a cache-id from a device_node
  drivers: base: cacheinfo: Add helper to find the cache size from
    cpu+level
  ACPI / PPTT: Add a helper to fill a cpumask from a processor container
  ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear
    levels
  ACPI / PPTT: Find cache level by cache-id
  ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  arm64: kconfig: Add Kconfig entry for MPAM
  ACPI / MPAM: Parse the MPAM table
  arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  arm_mpam: Add the class and component structures for ris firmware
    described
  arm_mpam: Add MPAM MSC register layout definitions
  arm_mpam: Add cpuhp callbacks to probe MSC hardware
  arm_mpam: Probe MSCs to find the supported partid/pmg values
  arm_mpam: Add helpers for managing the locking around the mon_sel
    registers
  arm_mpam: Probe the hardware features resctrl supports
  arm_mpam: Merge supported features during mpam_enable() into
    mpam_class
  arm_mpam: Reset MSC controls from cpu hp callbacks
  arm_mpam: Add a helper to touch an MSC from any CPU
  arm_mpam: Extend reset logic to allow devices to be reset any time
  arm_mpam: Register and enable IRQs
  arm_mpam: Use a static key to indicate when mpam is enabled
  arm_mpam: Allow configuration to be applied and restored during cpu
    online
  arm_mpam: Probe and reset the rest of the features
  arm_mpam: Add helpers to allocate monitors
  arm_mpam: Add mpam_msmon_read() to read monitor value
  arm_mpam: Track bandwidth counter state for overflow and power
    management
  arm_mpam: Add helper to reset saved mbwu state
  arm_mpam: Add kunit test for bitmap reset
  arm_mpam: Add kunit tests for props_mismatch()
Rob Herring (1):
  dt-bindings: arm: Add MPAM MSC binding
Rohit Mathew (2):
  arm_mpam: Probe for long/lwd mbwu counters
  arm_mpam: Use long MBWU counters if supported
Shanker Donthineni (1):
  arm_mpam: Add support for memory controller MSC on DT platforms
 .../devicetree/bindings/arm/arm,mpam-msc.yaml |  200 ++
 arch/arm64/Kconfig                            |   19 +
 drivers/Kconfig                               |    2 +
 drivers/Makefile                              |    1 +
 drivers/acpi/arm64/Kconfig                    |    3 +
 drivers/acpi/arm64/Makefile                   |    1 +
 drivers/acpi/arm64/mpam.c                     |  331 ++
 drivers/acpi/pptt.c                           |  230 +-
 drivers/acpi/tables.c                         |    2 +-
 drivers/base/cacheinfo.c                      |   19 +-
 drivers/resctrl/Kconfig                       |   24 +
 drivers/resctrl/Makefile                      |    4 +
 drivers/resctrl/mpam_devices.c                | 2909 +++++++++++++++++
 drivers/resctrl/mpam_internal.h               |  692 ++++
 drivers/resctrl/test_mpam_devices.c           |  390 +++
 include/linux/acpi.h                          |   26 +
 include/linux/arm_mpam.h                      |   56 +
 include/linux/cacheinfo.h                     |   16 +
 18 files changed, 4911 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h
 create mode 100644 drivers/resctrl/test_mpam_devices.c
 create mode 100644 include/linux/arm_mpam.h
-- 
2.20.1
- * [PATCH 01/33] cacheinfo: Expose the code to generate a cache-id from a device_node
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM driver identifies caches by id for use with resctrl. It
needs to know the cache-id when probing, but the value isn't set
in cacheinfo until device_initcall().
Expose the code that generates the cache-id. The parts of the MPAM
driver that run early can use this to set up the resctrl structures
before cacheinfo is ready in device_initcall().
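
As a rough user-space illustration of the calculation being exposed (not the
kernel code itself): the id is the smallest hardware id among the CPUs that
share the cache, and an all-ones value means no id could be derived. The
cpu_desc structure, the integer cache_node handle and the function name are
assumptions made for this sketch; the kernel version walks device_node
pointers with for_each_of_cpu_node().

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "no usable id", mirroring the ~0UL in the patch. */
#define CACHE_ID_INVALID (~0UL)

/* Hypothetical stand-in for a DT cpu node: its hwid and the cache it uses. */
struct cpu_desc {
	uint64_t hwid;
	int cache_node;
};

/*
 * Return the smallest hwid of any CPU attached to @cache_node, or
 * CACHE_ID_INVALID if no CPU matches or any hwid needs more than 32 bits.
 */
unsigned long calculate_cache_id(const struct cpu_desc *cpus, size_t ncpus,
				 int cache_node)
{
	unsigned long min_id = CACHE_ID_INVALID;
	size_t i;

	for (i = 0; i < ncpus; i++) {
		uint64_t id = cpus[i].hwid;

		/* Like the patch, give up if the upper 32 bits are in use. */
		if (id >> 32)
			return CACHE_ID_INVALID;

		if (cpus[i].cache_node == cache_node && id < min_id)
			min_id = (unsigned long)id;
	}

	return min_id;
}
```

A caller would compare the result against the sentinel before using it, in
the same way cache_of_set_id() only sets CACHE_ID when the value is not ~0UL.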
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Renamed cache_of_get_id() to cache_of_calculate_id().
---
 drivers/base/cacheinfo.c  | 19 +++++++++++++------
 include/linux/cacheinfo.h |  1 +
 2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 613410705a47..f6289d142ba9 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -207,11 +207,10 @@ static bool match_cache_node(struct device_node *cpu,
 #define arch_compact_of_hwid(_x)	(_x)
 #endif
 
-static void cache_of_set_id(struct cacheinfo *this_leaf,
-			    struct device_node *cache_node)
+unsigned long cache_of_calculate_id(struct device_node *cache_node)
 {
 	struct device_node *cpu;
-	u32 min_id = ~0;
+	unsigned long min_id = ~0UL;
 
 	for_each_of_cpu_node(cpu) {
 		u64 id = of_get_cpu_hwid(cpu, 0);
@@ -219,15 +218,23 @@ static void cache_of_set_id(struct cacheinfo *this_leaf,
 		id = arch_compact_of_hwid(id);
 		if (FIELD_GET(GENMASK_ULL(63, 32), id)) {
 			of_node_put(cpu);
-			return;
+			return ~0UL;
 		}
 
 		if (match_cache_node(cpu, cache_node))
 			min_id = min(min_id, id);
 	}
 
-	if (min_id != ~0) {
-		this_leaf->id = min_id;
+	return min_id;
+}
+
+static void cache_of_set_id(struct cacheinfo *this_leaf,
+			    struct device_node *cache_node)
+{
+	unsigned long id = cache_of_calculate_id(cache_node);
+
+	if (id != ~0UL) {
+		this_leaf->id = id;
 		this_leaf->attributes |= CACHE_ID;
 	}
 }
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index c8f4f0a0b874..2dcbb69139e9 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -112,6 +112,7 @@ int acpi_get_cache_info(unsigned int cpu,
 #endif
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo *this_leaf);
+unsigned long cache_of_calculate_id(struct device_node *np);
 
 /*
  * Get the cacheinfo structure for the cache associated with @cpu at
-- 
2.20.1
- * [PATCH 02/33] drivers: base: cacheinfo: Add helper to find the cache size from cpu+level
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM needs to know the size of a cache associated with a particular CPU.
The DT/ACPI agnostic way of doing this is to ask cacheinfo.
Add a helper to do this.
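
The contract of the helper (a size on success, 0 when the cpu/level pair has
no cacheinfo entry) can be modelled outside the kernel roughly as below. The
cache_leaf table and the cache_size_for() name are assumptions for this
sketch; the real helper uses get_cpu_cacheinfo_level() and expects the cpuhp
lock to be held.

```c
#include <stddef.h>

/* Hypothetical flat table standing in for the per-cpu cacheinfo leaves. */
struct cache_leaf {
	int cpu;
	int level;
	unsigned int size;	/* bytes */
};

/* Return the size of the cache @cpu sees at @level, or 0 if not found. */
unsigned int cache_size_for(const struct cache_leaf *leaves, size_t nleaves,
			    int cpu, int level)
{
	size_t i;

	for (i = 0; i < nleaves; i++) {
		if (leaves[i].cpu == cpu && leaves[i].level == level)
			return leaves[i].size;
	}

	return 0;
}
```

Because 0 doubles as the error value, a caller in this model has to treat a
zero size as "unknown" rather than as an empty cache.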
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v1:
 * Converted to kdoc.
 * Simplified helper to use get_cpu_cacheinfo_level().
---
 include/linux/cacheinfo.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 2dcbb69139e9..e12d6f2c6a57 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -148,6 +148,21 @@ static inline int get_cpu_cacheinfo_id(int cpu, int level)
 	return ci ? ci->id : -1;
 }
 
+/**
+ * get_cpu_cacheinfo_size() - Get the size of the cache.
+ * @cpu:      The cpu that is associated with the cache.
+ * @level:    The level of the cache as seen by @cpu.
+ *
+ * Callers must hold the cpuhp lock.
+ * Returns the cache-size on success, or 0 for an error.
+ */
+static inline unsigned int get_cpu_cacheinfo_size(int cpu, int level)
+{
+	struct cacheinfo *ci = get_cpu_cacheinfo_level(cpu, level);
+
+	return ci ? ci->size : 0;
+}
+
 #if defined(CONFIG_ARM64) || defined(CONFIG_ARM)
 #define use_arch_cache_info()	(true)
 #else
-- 
2.20.1
- * [PATCH 03/33] ACPI / PPTT: Add a helper to fill a cpumask from a processor container
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The PPTT describes CPUs and caches, as well as processor containers.
The ACPI table for MPAM describes the set of CPUs that can access an MSC
with the UID of a processor container.
Add a helper to find the processor container by its id, then walk
the possible CPUs to fill a cpumask with the CPUs that have this
processor container as a parent.
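
The walk described above can be sketched in plain C as follows. This is a
simplified model, not the PPTT code: topo_node, the cpu_leaf array and the
64-bit mask standing in for a cpumask are all assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical topology node: only the parent link matters here. */
struct topo_node {
	const struct topo_node *parent;
};

/*
 * For every CPU, walk up the parent links from its leaf node. If
 * @container is met on the way, set that CPU's bit in the result.
 * Only the first 64 CPUs are representable in this toy mask.
 */
uint64_t child_cpus(const struct topo_node *const *cpu_leaf, size_t ncpus,
		    const struct topo_node *container)
{
	uint64_t mask = 0;
	size_t cpu;

	for (cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
		const struct topo_node *node;

		for (node = cpu_leaf[cpu]; node; node = node->parent) {
			if (node == container) {
				mask |= UINT64_C(1) << cpu;
				break;
			}
		}
	}

	return mask;
}
```

Walking up from each CPU, rather than down from the container, matches the
shape of the PPTT, where nodes only carry a reference to their parent.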
CC: Dave Martin <dave.martin@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Dropped has_leaf_flag dodging of acpi_pptt_leaf_node()
 * Added missing : in kernel-doc
 * Made helper return void as this never actually returns an error.
---
 drivers/acpi/pptt.c  | 86 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  3 ++
 2 files changed, 89 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 54676e3d82dd..4791ca2bdfac 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -298,6 +298,92 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
 	return NULL;
 }
 
+/**
+ * acpi_pptt_get_child_cpus() - Find all the CPUs below a PPTT processor node
+ * @table_hdr:		A reference to the PPTT table.
+ * @parent_node:	A pointer to the processor node in the @table_hdr.
+ * @cpus:		A cpumask to fill with the CPUs below @parent_node.
+ *
+ * Walks up the PPTT from every possible CPU to find if the provided
+ * @parent_node is a parent of this CPU.
+ */
+static void acpi_pptt_get_child_cpus(struct acpi_table_header *table_hdr,
+				     struct acpi_pptt_processor *parent_node,
+				     cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	u32 acpi_id;
+	int cpu;
+
+	cpumask_clear(cpus);
+
+	for_each_possible_cpu(cpu) {
+		acpi_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table_hdr, acpi_id);
+
+		while (cpu_node) {
+			if (cpu_node == parent_node) {
+				cpumask_set_cpu(cpu, cpus);
+				break;
+			}
+			cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
+		}
+	}
+}
+
+/**
+ * acpi_pptt_get_cpus_from_container() - Populate a cpumask with all CPUs in a
+ *                                       processor container
+ * @acpi_cpu_id:	The UID of the processor container.
+ * @cpus:		The resulting CPU mask.
+ *
+ * Find the specified Processor Container, and fill @cpus with all the cpus
+ * below it.
+ *
+ * Not all 'Processor' entries in the PPTT are either a CPU or a Processor
+ * Container, they may exist purely to describe a Private resource. CPUs
+ * have to be leaves, so a Processor Container is a non-leaf that has the
+ * 'ACPI Processor ID valid' flag set.
+ *
+ * Return: none. @cpus is left empty if the PPTT or the container is not found.
+ */
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus)
+{
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table_hdr;
+	struct acpi_subtable_header *entry;
+	unsigned long table_end;
+	acpi_status status;
+	bool leaf_flag;
+	u32 proc_sz;
+
+	cpumask_clear(cpus);
+
+	status = acpi_get_table(ACPI_SIG_PPTT, 0, &table_hdr);
+	if (ACPI_FAILURE(status))
+		return;
+
+	table_end = (unsigned long)table_hdr + table_hdr->length;
+	entry = ACPI_ADD_PTR(struct acpi_subtable_header, table_hdr,
+			     sizeof(struct acpi_table_pptt));
+	proc_sz = sizeof(struct acpi_pptt_processor);
+	while ((unsigned long)entry + proc_sz <= table_end) {
+		cpu_node = (struct acpi_pptt_processor *)entry;
+		if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
+		    cpu_node->flags & ACPI_PPTT_ACPI_PROCESSOR_ID_VALID) {
+			leaf_flag = acpi_pptt_leaf_node(table_hdr, cpu_node);
+			if (!leaf_flag) {
+				if (cpu_node->acpi_processor_id == acpi_cpu_id)
+					acpi_pptt_get_child_cpus(table_hdr, cpu_node, cpus);
+			}
+		}
+		entry = ACPI_ADD_PTR(struct acpi_subtable_header, entry,
+				     entry->length);
+	}
+
+	acpi_put_table(table_hdr);
+}
+
 static u8 acpi_cache_type(enum cache_type type)
 {
 	switch (type) {
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1c5bb1e887cd..f97a9ff678cc 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1541,6 +1541,7 @@ int find_acpi_cpu_topology(unsigned int cpu, int level);
 int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
+void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1562,6 +1563,8 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 {
 	return -EINVAL;
 }
+static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
+						     cpumask_t *cpus) { }
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1
- * [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
acpi_count_levels() passes the number of levels back via a pointer argument.
It also passes this to acpi_find_cache_level() as the starting_level, and
preserves this value as it walks up the cpu_node tree counting the levels.
This means the caller must initialise 'levels' due to acpi_count_levels()
internals. The only caller acpi_get_cache_info() happens to have already
initialised levels to zero, which acpi_count_levels() depends on to get the
correct result.
Two results are passed back from acpi_count_levels(). Unlike split_levels,
levels is not optional.
Split these two results up. The mandatory 'levels' is always returned,
which hides the internal details from the caller, and avoids having
duplicated initialisation in all callers. split_levels remains an
optional argument passed back.
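
To make the difference concrete, here is a small stand-alone model of the two
calling conventions. It is a sketch only: topo_node and its
private_cache_levels field are assumptions that replace the real
acpi_find_cache_level() walk, which does rather more work per node.

```c
#include <stddef.h>

/* Hypothetical node: a parent link and the cache levels private to it. */
struct topo_node {
	const struct topo_node *parent;
	unsigned int private_cache_levels;
};

/*
 * Old shape: the result accumulates into *levels, so the answer is only
 * right if the caller happened to zero it first.
 */
void count_levels_old(const struct topo_node *node, unsigned int *levels)
{
	do {
		*levels += node->private_cache_levels;
		node = node->parent;
	} while (node);
}

/*
 * New shape: the accumulator is a local that always starts at zero and
 * is handed back as the return value.
 */
int count_levels(const struct topo_node *node)
{
	int levels = 0;

	do {
		levels += node->private_cache_levels;
		node = node->parent;
	} while (node);

	return levels;
}
```

With the old shape, a caller passing in a stale value gets that value added
to the count; with the new shape there is nothing for the caller to get wrong.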
Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Made acpi_count_levels() return the levels value.
---
 drivers/acpi/pptt.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 4791ca2bdfac..8f9b9508acba 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * levels and split cache levels (data/instruction).
  * @table_hdr: Pointer to the head of the PPTT table
  * @cpu_node: processor node we wish to count caches for
- * @levels: Number of levels if success.
  * @split_levels:	Number of split cache levels (data/instruction) if
- *			success. Can by NULL.
+ *			success. Can be NULL.
  *
+ * Returns number of levels.
  * Given a processor node containing a processing unit, walk into it and count
  * how many levels exist solely for it, and then walk up each level until we hit
  * the root node (ignore the package level because it may be possible to have
@@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
  * split cache levels (data/instruction) that exist at each level on the way
  * up.
  */
-static void acpi_count_levels(struct acpi_table_header *table_hdr,
-			      struct acpi_pptt_processor *cpu_node,
-			      unsigned int *levels, unsigned int *split_levels)
+static int acpi_count_levels(struct acpi_table_header *table_hdr,
+			     struct acpi_pptt_processor *cpu_node,
+			     unsigned int *split_levels)
 {
+	int starting_level = 0;
+
 	do {
-		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
+		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
 		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
 	} while (cpu_node);
+
+	return starting_level;
 }
 
 /**
@@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
 	if (!cpu_node)
 		return -ENOENT;
 
-	acpi_count_levels(table, cpu_node, levels, split_levels);
+	*levels = acpi_count_levels(table, cpu_node, split_levels);
 
 	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
 		 *levels, split_levels ? *split_levels : -1);
-- 
2.20.1
- * Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
From: Lorenzo Pieralisi @ 2025-09-10 13:44 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
On Fri, Aug 22, 2025 at 03:30:19PM +0000, James Morse wrote:
> acpi_count_levels() passes the number of levels back via a pointer argument.
> It also passes this to acpi_find_cache_level() as the starting_level, and
> preserves this value as it walks up the cpu_node tree counting the levels.
> 
> This means the caller must initialise 'levels' due to acpi_count_levels()
> internals. The only caller acpi_get_cache_info() happens to have already
> initialised levels to zero, which acpi_count_levels() depends on to get the
> correct result.
> 
> Two results are passed back from acpi_count_levels(), unlike split_levels,
> levels is not optional.
> 
> Split these two results up. The mandatory 'levels' is always returned,
> which hides the internal details from the caller, and avoids having
> duplicated initialisation in all callers. split_levels remains an
> optional argument passed back.
> 
> Suggested-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since RFC:
>  * Made acpi_count_levels() return the levels value.
> ---
>  drivers/acpi/pptt.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 4791ca2bdfac..8f9b9508acba 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   * levels and split cache levels (data/instruction).
>   * @table_hdr: Pointer to the head of the PPTT table
>   * @cpu_node: processor node we wish to count caches for
> - * @levels: Number of levels if success.
>   * @split_levels:	Number of split cache levels (data/instruction) if
> - *			success. Can by NULL.
> + *			success. Can be NULL.
Nit: tempting but this change does not belong here.
>   *
> + * Returns number of levels.
>   * Given a processor node containing a processing unit, walk into it and count
>   * how many levels exist solely for it, and then walk up each level until we hit
>   * the root node (ignore the package level because it may be possible to have
> @@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>   * split cache levels (data/instruction) that exist at each level on the way
>   * up.
>   */
> -static void acpi_count_levels(struct acpi_table_header *table_hdr,
> -			      struct acpi_pptt_processor *cpu_node,
> -			      unsigned int *levels, unsigned int *split_levels)
> +static int acpi_count_levels(struct acpi_table_header *table_hdr,
> +			     struct acpi_pptt_processor *cpu_node,
> +			     unsigned int *split_levels)
>  {
> +	int starting_level = 0;
> +
>  	do {
> -		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
> +		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
>  		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>  	} while (cpu_node);
> +
> +	return starting_level;
>  }
>  
>  /**
> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>  	if (!cpu_node)
>  		return -ENOENT;
>  
> -	acpi_count_levels(table, cpu_node, levels, split_levels);
> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
Looks fine to me - though initializing
*levels = 0
upper in the function now becomes superfluous (?) (well, it initializes
*levels to 0 if an error path is hit but on that case the caller should
not expect *levels to be initialized to anything IIUC).
Apart from these (very) minor things:
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
>  	pr_debug("Cache Setup: last_level=%d split_levels=%d\n",
>  		 *levels, split_levels ? *split_levels : -1);
> -- 
> 2.20.1
> 
- * Re: [PATCH 04/33] ACPI / PPTT: Stop acpi_count_levels() expecting callers to clear levels
From: James Morse @ 2025-09-10 19:19 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Lorenzo,
On 10/09/2025 14:44, Lorenzo Pieralisi wrote:
> On Fri, Aug 22, 2025 at 03:30:19PM +0000, James Morse wrote:
>> acpi_count_levels() passes the number of levels back via a pointer argument.
>> It also passes this to acpi_find_cache_level() as the starting_level, and
>> preserves this value as it walks up the cpu_node tree counting the levels.
>>
>> This means the caller must initialise 'levels' due to acpi_count_levels()
>> internals. The only caller acpi_get_cache_info() happens to have already
>> initialised levels to zero, which acpi_count_levels() depends on to get the
>> correct result.
>>
>> Two results are passed back from acpi_count_levels(), unlike split_levels,
>> levels is not optional.
>>
>> Split these two results up. The mandatory 'levels' is always returned,
>> which hides the internal details from the caller, and avoids having
>> duplicated initialisation in all callers. split_levels remains an
>> optional argument passed back.
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 4791ca2bdfac..8f9b9508acba 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -181,10 +181,10 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>>   * levels and split cache levels (data/instruction).
>>   * @table_hdr: Pointer to the head of the PPTT table
>>   * @cpu_node: processor node we wish to count caches for
>> - * @levels: Number of levels if success.
>>   * @split_levels:	Number of split cache levels (data/instruction) if
>> - *			success. Can by NULL.
>> + *			success. Can be NULL.
> 
> Nit: tempting but this change does not belong here.
Clearly a much loved typo!
>> @@ -192,14 +192,18 @@ acpi_find_cache_level(struct acpi_table_header *table_hdr,
>>   * split cache levels (data/instruction) that exist at each level on the way
>>   * up.
>>   */
>> -static void acpi_count_levels(struct acpi_table_header *table_hdr,
>> -			      struct acpi_pptt_processor *cpu_node,
>> -			      unsigned int *levels, unsigned int *split_levels)
>> +static int acpi_count_levels(struct acpi_table_header *table_hdr,
>> +			     struct acpi_pptt_processor *cpu_node,
>> +			     unsigned int *split_levels)
>>  {
>> +	int starting_level = 0;
>> +
>>  	do {
>> -		acpi_find_cache_level(table_hdr, cpu_node, levels, split_levels, 0, 0);
>> +		acpi_find_cache_level(table_hdr, cpu_node, &starting_level, split_levels, 0, 0);
>>  		cpu_node = fetch_pptt_node(table_hdr, cpu_node->parent);
>>  	} while (cpu_node);
>> +
>> +	return starting_level;
>>  }
>>  
>>  /**
>> @@ -731,7 +735,7 @@ int acpi_get_cache_info(unsigned int cpu, unsigned int *levels,
>>  	if (!cpu_node)
>>  		return -ENOENT;
>>  
>> -	acpi_count_levels(table, cpu_node, levels, split_levels);
>> +	*levels = acpi_count_levels(table, cpu_node, split_levels);
> Looks fine to me - though initializing
> 
> *levels = 0
> 
> upper in the function now becomes superfluous (?) (well, it initializes
> *levels to 0 if an error path is hit but on that case the caller should
> not expect *levels to be initialized to anything IIUC).
Maybe, but it's the least surprising thing to do - hence the existing early clobber.
> Apart from these (very) minor things:
> 
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Thanks!
James
- * [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MPAM table identifies caches by id. The MPAM driver also wants to know
the cache level to determine if the platform is of the shape that can be
managed via resctrl. Cacheinfo has this information, but only for CPUs that
are online.
Waiting for all CPUs to come online is a problem for platforms where
CPUs are brought online late by user-space.
Add a helper that walks every possible cache, until it finds the one
identified by cache-id, then return the level.
Add a cleanup-based freeing mechanism for acpi_get_table().
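
The search for the cache level can be pictured with the following stand-alone
sketch, assuming a pre-digested per-CPU table instead of the PPTT. The
cpu_caches layout, MAX_LEVELS and cache_level_from_id() are hypothetical
names for this example; the real helper looks each level up via
acpi_find_cache_node().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVELS 4

/* Hypothetical per-level record: is there a unified cache, and its id. */
struct cache_at_level {
	bool unified;
	uint32_t id;
};

/* Hypothetical per-CPU view; level[0] describes L1, level[1] L2, ... */
struct cpu_caches {
	int num_levels;
	struct cache_at_level level[MAX_LEVELS];
};

/*
 * Visit every possible CPU and every level it sees, starting at 1 for L1.
 * Return the level of the first unified cache whose id matches, or
 * -ENOENT if no CPU sees such a cache.
 */
int cache_level_from_id(const struct cpu_caches *cpus, size_t ncpus,
			uint32_t cache_id)
{
	size_t cpu;
	int level;

	for (cpu = 0; cpu < ncpus; cpu++) {
		for (level = 1;
		     level <= cpus[cpu].num_levels && level <= MAX_LEVELS;
		     level++) {
			const struct cache_at_level *c = &cpus[cpu].level[level - 1];

			if (c->unified && c->id == cache_id)
				return level;
		}
	}

	return -ENOENT;
}
```

Because the table covers possible CPUs rather than online ones, the lookup in
this model does not depend on which CPUs have been brought up.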
CC: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h | 17 ++++++++++++
 2 files changed, 81 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 8f9b9508acba..660457644a5b 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -907,3 +907,67 @@ int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 	return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE,
 					  ACPI_PPTT_ACPI_IDENTICAL);
 }
+
+/**
+ * find_acpi_cache_level_from_id() - Get the level of the specified cache
+ * @cache_id: The id field of the unified cache
+ *
+ * Determine the level relative to any CPU for the unified cache identified by
+ * cache_id. This allows the property to be found even if the CPUs are offline.
+ *
+ * The returned level can be used to group unified caches that are peers.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * If one CPU's L2 is shared with another CPU as L3, this function will
+ * return an unpredictable value.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns a value which represents the level of the specified cache.
+ */
+int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+	if (IS_ERR(table))
+		return PTR_ERR(table);
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	/*
+	 * If we found the cache first, we'd still need to walk from each CPU
+	 * to find the level...
+	 */
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			return -ENOENT;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				return level;
+		}
+	}
+
+	return -ENOENT;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index f97a9ff678cc..30c10b1dcdb2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -8,6 +8,7 @@
 #ifndef _LINUX_ACPI_H
 #define _LINUX_ACPI_H
 
+#include <linux/cleanup.h>
 #include <linux/errno.h>
 #include <linux/ioport.h>	/* for struct resource */
 #include <linux/resource_ext.h>
@@ -221,6 +222,17 @@ void acpi_reserve_initial_tables (void);
 void acpi_table_init_complete (void);
 int acpi_table_init (void);
 
+static inline struct acpi_table_header *acpi_get_table_ret(char *signature, u32 instance)
+{
+	struct acpi_table_header *table;
+	int status = acpi_get_table(signature, instance, &table);
+
+	if (ACPI_FAILURE(status))
+		return ERR_PTR(-ENOENT);
+	return table;
+}
+DEFINE_FREE(acpi_table, struct acpi_table_header *, if (!IS_ERR(_T)) acpi_put_table(_T))
+
 int acpi_table_parse(char *id, acpi_tbl_table_handler handler);
 int __init_or_acpilib acpi_table_parse_entries(char *id,
 		unsigned long table_size, int entry_id,
@@ -1542,6 +1554,7 @@ int find_acpi_cpu_topology_cluster(unsigned int cpu);
 int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
+int find_acpi_cache_level_from_id(u32 cache_id);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1565,6 +1578,10 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu)
 }
 static inline void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id,
 						     cpumask_t *cpus) { }
+static inline int find_acpi_cache_level_from_id(u32 cache_id)
+{
+	return -EINVAL;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
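
A note for readers unfamiliar with the cleanup-based freeing that PATCH 05/33 adds: the kernel's DEFINE_FREE() and __free() helpers are built on the compiler's `cleanup` variable attribute. Below is a minimal user-space sketch of that pattern, assuming GCC or Clang. It is not the kernel code: `table_get_ret()`, `table_put()`, `err_ptr()`, `is_err()`, `auto_put_table` and `lookup_revision()` are hypothetical stand-ins for acpi_get_table_ret(), acpi_put_table(), ERR_PTR(), IS_ERR(), __free(acpi_table) and a caller such as find_acpi_cache_level_from_id().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ERR_PTR()/IS_ERR(): errors occupy the top 4095 addresses. */
#define MAX_ERRNO 4095
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct table { int revision; };

static struct table demo_table = { .revision = 3 };
static int table_refs;	/* models the reference held on the table */

/* Return the table with a reference held, or an error pointer. */
static struct table *table_get_ret(int present)
{
	if (!present)
		return err_ptr(-2);	/* -ENOENT */
	table_refs++;
	return &demo_table;
}

static void table_put(struct table *t)
{
	assert(t == &demo_table);
	table_refs--;
}

/* The cleanup handler receives a pointer to the annotated variable. */
static void table_cleanup(struct table **t)
{
	if (!is_err(*t))
		table_put(*t);
}
#define auto_put_table __attribute__((cleanup(table_cleanup)))

int lookup_revision(int present)
{
	struct table *t auto_put_table = table_get_ret(present);

	if (is_err(t))
		return -2;
	if (t->revision < 3)
		return -2;
	return t->revision;	/* table_cleanup() runs on every return path */
}
```

The point of the pattern is that each early return drops the reference without an explicit put, and the handler has to skip error pointers, which is what the `if (!IS_ERR(_T))` guard in the patch's DEFINE_FREE() does.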
- * [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (38 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 05/33] ACPI / PPTT: Find cache level by cache-id James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-10 16:06   ` Lorenzo Pieralisi
  2025-08-22 15:30 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
                   ` (27 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
MPAM identifies CPUs by the cache_id in the PPTT cache structure.
The driver needs to know which CPUs are associated with the cache.
The CPUs may not all be online, so cacheinfo does not have the
information.
Add a helper to pull this information out of the PPTT.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
---
Changes since RFC:
 * acpi_count_levels() now returns a value.
 * Converted the table-get stuff to use Jonathan's cleanup helper.
 * Dropped Sudeep's Review tag due to the cleanup change.
---
 drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/acpi.h |  6 +++++
 2 files changed, 68 insertions(+)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index 660457644a5b..cb93a9a7f9b6 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
 
 	return -ENOENT;
 }
+
+/**
+ * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
+ *					   specified cache
+ * @cache_id: The id field of the unified cache
+ * @cpus: Where to build the cpumask
+ *
+ * Determine which CPUs are below this cache in the PPTT. This allows the property
+ * to be found even if the CPUs are offline.
+ *
+ * The PPTT table must be rev 3 or later.
+ *
+ * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
+ * Otherwise returns 0 and sets the cpus in the provided cpumask.
+ */
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
+{
+	u32 acpi_cpu_id;
+	int level, cpu, num_levels;
+	struct acpi_pptt_cache *cache;
+	struct acpi_pptt_cache_v1 *cache_v1;
+	struct acpi_pptt_processor *cpu_node;
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
+
+	cpumask_clear(cpus);
+
+	if (IS_ERR(table))
+		return -ENOENT;
+
+	if (table->revision < 3)
+		return -ENOENT;
+
+	/*
+	 * If we found the cache first, we'd still need to walk from each cpu.
+	 */
+	for_each_possible_cpu(cpu) {
+		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
+		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
+		if (!cpu_node)
+			return 0;
+		num_levels = acpi_count_levels(table, cpu_node, NULL);
+
+		/* Start at 1 for L1 */
+		for (level = 1; level <= num_levels; level++) {
+			cache = acpi_find_cache_node(table, acpi_cpu_id,
+						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
+						     level, &cpu_node);
+			if (!cache)
+				continue;
+
+			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
+						cache,
+						sizeof(struct acpi_pptt_cache));
+
+			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
+			    cache_v1->cache_id == cache_id)
+				cpumask_set_cpu(cpu, cpus);
+		}
+	}
+
+	return 0;
+}
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 30c10b1dcdb2..4ad08f5f1d83 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -1555,6 +1555,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
 int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
 void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
 int find_acpi_cache_level_from_id(u32 cache_id);
+int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
 #else
 static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
 {
@@ -1582,6 +1583,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
 {
 	return -EINVAL;
 }
+static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
+						      cpumask_t *cpus)
+{
+	return -EINVAL;
+}
 #endif
 
 void acpi_arch_init(void);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-08-22 15:30 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-09-10 16:06   ` Lorenzo Pieralisi
  2025-09-10 19:18     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Lorenzo Pieralisi @ 2025-09-10 16:06 UTC (permalink / raw)
  To: James Morse
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
On Fri, Aug 22, 2025 at 03:30:21PM +0000, James Morse wrote:
> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
> 
> The driver needs to know which CPUs are associated with the cache,
> the CPUs may not all be online, so cacheinfo does not have the
> information.
> 
> Add a helper to pull this information out of the PPTT.
> 
> CC: Rohit Mathew <Rohit.Mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
> ---
> Changes since RFC:
>  * acpi_count_levels() now returns a value.
>  * Converted the table-get stuff to use Jonathan's cleanup helper.
>  * Dropped Sudeep's Review tag due to the cleanup change.
> ---
>  drivers/acpi/pptt.c  | 62 ++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/acpi.h |  6 +++++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 660457644a5b..cb93a9a7f9b6 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>  
>  	return -ENOENT;
>  }
> +
> +/**
> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
> + *					   specified cache
> + * @cache_id: The id field of the unified cache
> + * @cpus: Where to build the cpumask
> + *
> + * Determine which CPUs are below this cache in the PPTT. This allows the property
> + * to be found even if the CPUs are offline.
> + *
> + * The PPTT table must be rev 3 or later,
> + *
> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
> + */
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
> +{
> +	u32 acpi_cpu_id;
> +	int level, cpu, num_levels;
> +	struct acpi_pptt_cache *cache;
> +	struct acpi_pptt_cache_v1 *cache_v1;
> +	struct acpi_pptt_processor *cpu_node;
> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
> +
> +	cpumask_clear(cpus);
> +
> +	if (IS_ERR(table))
> +		return -ENOENT;
> +
> +	if (table->revision < 3)
> +		return -ENOENT;
> +
> +	/*
> +	 * If we found the cache first, we'd still need to walk from each cpu.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
> +		if (!cpu_node)
> +			return 0;
If for a possible cpu you don't get an acpi_pptt_processor node we return 0,
is that correct ? Should not the loop continue ? Forgive me if that's a
dumb question.
> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
> +
> +		/* Start at 1 for L1 */
> +		for (level = 1; level <= num_levels; level++) {
> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
> +						     level, &cpu_node);
> +			if (!cache)
> +				continue;
> +
> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
> +						cache,
> +						sizeof(struct acpi_pptt_cache));
> +
> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
> +			    cache_v1->cache_id == cache_id)
> +				cpumask_set_cpu(cpu, cpus);
> +		}
> +	}
> +
> +	return 0;
> +}
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 30c10b1dcdb2..4ad08f5f1d83 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -1555,6 +1555,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
>  int find_acpi_cpu_topology_hetero_id(unsigned int cpu);
>  void acpi_pptt_get_cpus_from_container(u32 acpi_cpu_id, cpumask_t *cpus);
>  int find_acpi_cache_level_from_id(u32 cache_id);
> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus);
>  #else
>  static inline int acpi_pptt_cpu_is_thread(unsigned int cpu)
>  {
> @@ -1582,6 +1583,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
>  {
>  	return -EINVAL;
>  }
> +static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
> +						      cpumask_t *cpus)
> +{
> +	return -EINVAL;
Nit: You might want the return value here to be coherent with what the function
documentation states (ie return -ENOENT;)
Other than that:
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> +}
>  #endif
>  
>  void acpi_arch_init(void);
> -- 
> 2.20.1
> 
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id
  2025-09-10 16:06   ` Lorenzo Pieralisi
@ 2025-09-10 19:18     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:18 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: linux-kernel, linux-arm-kernel, linux-acpi, devicetree,
	shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Lorenzo,
On 10/09/2025 17:06, Lorenzo Pieralisi wrote:
> On Fri, Aug 22, 2025 at 03:30:21PM +0000, James Morse wrote:
>> MPAM identifies CPUs by the cache_id in the PPTT cache structure.
>>
>> The driver needs to know which CPUs are associated with the cache,
>> the CPUs may not all be online, so cacheinfo does not have the
>> information.
>>
>> Add a helper to pull this information out of the PPTT.
>> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
>> index 660457644a5b..cb93a9a7f9b6 100644
>> --- a/drivers/acpi/pptt.c
>> +++ b/drivers/acpi/pptt.c
>> @@ -971,3 +971,65 @@ int find_acpi_cache_level_from_id(u32 cache_id)
>>  
>>  	return -ENOENT;
>>  }
>> +
>> +/**
>> + * acpi_pptt_get_cpumask_from_cache_id() - Get the cpus associated with the
>> + *					   specified cache
>> + * @cache_id: The id field of the unified cache
>> + * @cpus: Where to build the cpumask
>> + *
>> + * Determine which CPUs are below this cache in the PPTT. This allows the property
>> + * to be found even if the CPUs are offline.
>> + *
>> + * The PPTT table must be rev 3 or later,
>> + *
>> + * Return: -ENOENT if the PPTT doesn't exist, or the cache cannot be found.
>> + * Otherwise returns 0 and sets the cpus in the provided cpumask.
>> + */
>> +int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id, cpumask_t *cpus)
>> +{
>> +	u32 acpi_cpu_id;
>> +	int level, cpu, num_levels;
>> +	struct acpi_pptt_cache *cache;
>> +	struct acpi_pptt_cache_v1 *cache_v1;
>> +	struct acpi_pptt_processor *cpu_node;
>> +	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_PPTT, 0);
>> +
>> +	cpumask_clear(cpus);
>> +
>> +	if (IS_ERR(table))
>> +		return -ENOENT;
>> +
>> +	if (table->revision < 3)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * If we found the cache first, we'd still need to walk from each cpu.
>> +	 */
>> +	for_each_possible_cpu(cpu) {
>> +		acpi_cpu_id = get_acpi_id_for_cpu(cpu);
>> +		cpu_node = acpi_find_processor_node(table, acpi_cpu_id);
>> +		if (!cpu_node)
>> +			return 0;
> 
> If for a possible cpu you don't get an acpi_pptt_processor node we return 0,
> is that correct ? Should not the loop continue ? Forgive me if that's a
> dumb question.
That looks like me throwing my hands up in the air and bailing out!
Yes, the loop continue-ing would be better as only possible CPUs that are missing a
PPTT description (...and cache hierarchy...) would be missing from the bitmap.
It's probably worth a WARN_ON_ONCE() too.
Thanks for spotting that!
>> +		num_levels = acpi_count_levels(table, cpu_node, NULL);
>> +
>> +		/* Start at 1 for L1 */
>> +		for (level = 1; level <= num_levels; level++) {
>> +			cache = acpi_find_cache_node(table, acpi_cpu_id,
>> +						     ACPI_PPTT_CACHE_TYPE_UNIFIED,
>> +						     level, &cpu_node);
>> +			if (!cache)
>> +				continue;
>> +
>> +			cache_v1 = ACPI_ADD_PTR(struct acpi_pptt_cache_v1,
>> +						cache,
>> +						sizeof(struct acpi_pptt_cache));
>> +
>> +			if (cache->flags & ACPI_PPTT_CACHE_ID_VALID &&
>> +			    cache_v1->cache_id == cache_id)
>> +				cpumask_set_cpu(cpu, cpus);
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
>> index 30c10b1dcdb2..4ad08f5f1d83 100644
>> --- a/include/linux/acpi.h
>> +++ b/include/linux/acpi.h
>> @@ -1555,6 +1555,7 @@ int find_acpi_cpu_topology_package(unsigned int cpu);
>> @@ -1582,6 +1583,11 @@ static inline int find_acpi_cache_level_from_id(u32 cache_id)
>>  {
>>  	return -EINVAL;
>>  }
>> +static inline int acpi_pptt_get_cpumask_from_cache_id(u32 cache_id,
>> +						      cpumask_t *cpus)
>> +{
>> +	return -EINVAL;
> 
> Nit: You might want the return value here to be coherent with what the function
> documentation states (ie return -ENOENT;)
Makes sense,
> Other than that:
> 
> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Thanks!
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
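
To illustrate the point settled in the exchange above (skip a possible CPU that has no processor node instead of returning early), here is a minimal user-space sketch. It does not use the PPTT helpers: `struct cpu_desc`, `demo_topology` and `cpus_sharing_cache()` are hypothetical, the topology is a flat array rather than a firmware table, and a cache id of 0 is assumed to mean "no cache at this level", whereas the real code tests ACPI_PPTT_CACHE_ID_VALID.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   4
#define MAX_LEVEL 3

/* Hypothetical flattened topology: the cache id each CPU sees at each level. */
struct cpu_desc {
	int has_node;			/* 0 models a CPU missing from the table */
	uint32_t cache_id[MAX_LEVEL];	/* index 0 is L1; 0 means "none" */
};

/* CPU 1 has no description; CPUs 0 and 2 share cache 20; all share cache 30. */
const struct cpu_desc demo_topology[NR_CPUS] = {
	{ 1, { 10, 20, 30 } },
	{ 0, {  0,  0,  0 } },
	{ 1, { 11, 20, 30 } },
	{ 1, { 12, 21, 30 } },
};

/* Build a bitmask of the CPUs that sit below the cache identified by cache_id. */
uint32_t cpus_sharing_cache(const struct cpu_desc *cpus, int nr, uint32_t cache_id)
{
	uint32_t mask = 0;

	assert(nr <= 32);
	for (int cpu = 0; cpu < nr; cpu++) {
		if (!cpus[cpu].has_node)
			continue;	/* skip only this CPU and keep walking */

		for (int level = 0; level < MAX_LEVEL; level++) {
			if (cpus[cpu].cache_id[level] == cache_id)
				mask |= UINT32_C(1) << cpu;
		}
	}

	return mask;
}
```

With an early return in place of the `continue`, the walk would stop at CPU 1 and CPUs 2 and 3 would never be added to the mask, even though both have a valid description.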
 
 
- * [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (39 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 06/33] ACPI / PPTT: Add a helper to fill a cpumask from a cache_id James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
                   ` (26 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 Kconfig is where the option makes the most sense.
This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register the CPUs'
properties with the MPAM driver.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
 arch/arm64/Kconfig | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e9bbfacc35a6..658e47fc0c5a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2060,6 +2060,23 @@ config ARM64_TLB_RANGE
 	  ARMv8.4-TLBI provides TLBI invalidation instruction that apply to a
 	  range of input addresses.
 
+config ARM64_MPAM
+	bool "Enable support for MPAM"
+	help
+	  Memory Partitioning and Monitoring is an optional extension
+	  that allows the CPUs to mark load and store transactions with
+	  labels for partition-id and performance-monitoring-group.
+	  System components, such as the caches, can use the partition-id
+	  to apply a performance policy. MPAM monitors can use the
+	  partition-id and performance-monitoring-group to measure the
+	  cache occupancy or data throughput.
+
+	  Use of this extension requires CPU support, support in the
+	  memory system components (MSC), and a description from firmware
+	  of where the MSC are in the address space.
+
+	  MPAM is exposed to user-space via the resctrl pseudo filesystem.
+
 endmenu # "ARMv8.4 architectural features"
 
 menu "ARMv8.5 architectural features"
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (40 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 07/33] arm64: kconfig: Add Kconfig entry for MPAM James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-09  6:54   ` Shaopeng Tan (Fujitsu)
  2025-08-22 15:30 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
                   ` (25 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.
CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Used DEFINE_RES_IRQ_NAMED() and friends macros.
 * Additional error handling.
 * Check for zero sized MSC.
 * Allow table revisions greater than 1. (no spec for revision 0!)
 * Use cleanup helpers to retrieve ACPI tables, which allows some functions
   to be folded together.
---
 arch/arm64/Kconfig          |   1 +
 drivers/acpi/arm64/Kconfig  |   3 +
 drivers/acpi/arm64/Makefile |   1 +
 drivers/acpi/arm64/mpam.c   | 331 ++++++++++++++++++++++++++++++++++++
 drivers/acpi/tables.c       |   2 +-
 include/linux/arm_mpam.h    |  46 +++++
 6 files changed, 383 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/arm64/mpam.c
 create mode 100644 include/linux/arm_mpam.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 658e47fc0c5a..e51ccf1da102 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ACPI_MPAM if ACPI
 	help
 	  Memory Partitioning and Monitoring is an optional extension
 	  that allows the CPUs to mark load and store transactions with
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index b3ed6212244c..f2fd79f22e7d 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -21,3 +21,6 @@ config ACPI_AGDI
 
 config ACPI_APMT
 	bool
+
+config ACPI_MPAM
+	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 05ecde9eaabe..9390b57cb564 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
 obj-$(CONFIG_ACPI_FFH)		+= ffh.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
 obj-$(CONFIG_ACPI_IORT) 	+= iort.o
+obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
 obj-$(CONFIG_ARM_AMBA)		+= amba.o
 obj-y				+= dma.o init.o
diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
new file mode 100644
index 000000000000..e55fc2729ac5
--- /dev/null
+++ b/drivers/acpi/arm64/mpam.c
@@ -0,0 +1,331 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
+
+#define pr_fmt(fmt) "ACPI MPAM: " fmt
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/bits.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/platform_device.h>
+
+#include <acpi/processor.h>
+
+/*
+ * Flags for acpi_table_mpam_msc.*_interrupt_flags.
+ * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
+ */
+#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
+#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
+#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
+#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
+
+static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
+		     int *irq, u32 processor_container_uid)
+{
+	int sense;
+
+	if (!intid)
+		return false;
+
+	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
+	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
+		return false;
+
+	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
+
+	/*
+	 * If the GSI is in the GIC's PPI range, try and create a partitioned
+	 * percpu interrupt.
+	 */
+	if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
+		pr_err_once("Partitioned interrupts not supported\n");
+		return false;
+	}
+
+	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
+	if (*irq <= 0) {
+		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
+			    intid);
+		return false;
+	}
+
+	return true;
+}
+
+static void acpi_mpam_parse_irqs(struct platform_device *pdev,
+				 struct acpi_mpam_msc_node *tbl_msc,
+				 struct resource *res, int *res_idx)
+{
+	u32 flags, aff;
+	int irq;
+
+	flags = tbl_msc->overflow_interrupt_flags;
+	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+		aff = tbl_msc->overflow_interrupt_affinity;
+	else
+		aff = ~0;
+	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
+
+	flags = tbl_msc->error_interrupt_flags;
+	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
+	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
+		aff = tbl_msc->error_interrupt_affinity;
+	else
+		aff = ~0;
+	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
+		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
+}
+
+static int acpi_mpam_parse_resource(struct mpam_msc *msc,
+				    struct acpi_mpam_resource_node *res)
+{
+	int level, nid;
+	u32 cache_id;
+
+	switch (res->locator_type) {
+	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
+		cache_id = res->locator.cache_locator.cache_reference;
+		level = find_acpi_cache_level_from_id(cache_id);
+		if (level <= 0) {
+			pr_err_once("Bad level (%d) for cache with id %u\n", level, cache_id);
+			return -EINVAL;
+		}
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
+				       level, cache_id);
+	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
+		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
+		if (nid == NUMA_NO_NODE)
+			nid = 0;
+		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
+				       255, nid);
+	default:
+		/* These get discovered later and treated as unknown */
+		return 0;
+	}
+}
+
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc)
+{
+	int i, err;
+	struct acpi_mpam_resource_node *resources;
+
+	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
+	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
+		err = acpi_mpam_parse_resource(msc, &resources[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
+				     struct platform_device *pdev,
+				     u32 *acpi_id)
+{
+	bool acpi_id_valid = false;
+	struct acpi_device *buddy;
+	char hid[16], uid[16];
+	int err;
+
+	memset(&hid, 0, sizeof(hid));
+	memcpy(hid, &tbl_msc->hardware_id_linked_device,
+	       sizeof(tbl_msc->hardware_id_linked_device));
+
+	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
+		*acpi_id = tbl_msc->instance_id_linked_device;
+		acpi_id_valid = true;
+	}
+
+	err = snprintf(uid, sizeof(uid), "%u",
+		       tbl_msc->instance_id_linked_device);
+	if (err >= sizeof(uid))
+		return acpi_id_valid;
+
+	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
+	if (buddy)
+		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
+
+	return acpi_id_valid;
+}
+
+static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
+				 enum mpam_msc_iface *iface)
+{
+	switch (tbl_msc->interface_type) {
+	case 0:
+		*iface = MPAM_IFACE_MMIO;
+		return 0;
+	case 0xa:
+		*iface = MPAM_IFACE_PCC;
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int __init acpi_mpam_parse(void)
+{
+	struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct property_entry props[4]; /* needs a sentinel */
+	struct acpi_mpam_msc_node *tbl_msc;
+	int next_res, next_prop, err = 0;
+	struct acpi_device *companion;
+	struct platform_device *pdev;
+	enum mpam_msc_iface iface;
+	struct resource res[3];
+	char uid[16];
+	u32 acpi_id;
+
+	if (acpi_disabled || !system_supports_mpam())
+		return 0;
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+		table_offset += tbl_msc->length;
+
+		/*
+		 * If any of the reserved fields are set, make no attempt to
+		 * parse the msc structure. This will prevent the driver from
+		 * probing all the MSC, meaning it can't discover the system
+		 * wide supported partid and pmg ranges. This avoids whatever
+		 * this MSC is truncating the partids and creating a screaming
+		 * error interrupt.
+		 */
+		if (tbl_msc->reserved || tbl_msc->reserved1 || tbl_msc->reserved2)
+			continue;
+
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		if (decode_interface_type(tbl_msc, &iface))
+			continue;
+
+		next_res = 0;
+		next_prop = 0;
+		memset(res, 0, sizeof(res));
+		memset(props, 0, sizeof(props));
+
+		pdev = platform_device_alloc("mpam_msc", tbl_msc->identifier);
+		if (!pdev) {
+			err = -ENOMEM;
+			break;
+		}
+
+		if (tbl_msc->length < sizeof(*tbl_msc)) {
+			err = -EINVAL;
+			break;
+		}
+
+		/* Some power management is described in the namespace: */
+		err = snprintf(uid, sizeof(uid), "%u", tbl_msc->identifier);
+		if (err > 0 && err < sizeof(uid)) {
+			companion = acpi_dev_get_first_match_dev("ARMHAA5C", uid, -1);
+			if (companion)
+				ACPI_COMPANION_SET(&pdev->dev, companion);
+		}
+
+		if (iface == MPAM_IFACE_MMIO) {
+			res[next_res++] = DEFINE_RES_MEM_NAMED(tbl_msc->base_address,
+							       tbl_msc->mmio_size,
+							       "MPAM:MSC");
+		} else if (iface == MPAM_IFACE_PCC) {
+			props[next_prop++] = PROPERTY_ENTRY_U32("pcc-channel",
+								tbl_msc->base_address);
+			next_prop++;
+		}
+
+		acpi_mpam_parse_irqs(pdev, tbl_msc, res, &next_res);
+		err = platform_device_add_resources(pdev, res, next_res);
+		if (err)
+			break;
+
+		props[next_prop++] = PROPERTY_ENTRY_U32("arm,not-ready-us",
+							tbl_msc->max_nrdy_usec);
+
+		/*
+		 * The MSC's CPU affinity is described via its linked power
+		 * management device, but only if it points at a Processor or
+		 * Processor Container.
+		 */
+		if (parse_msc_pm_link(tbl_msc, pdev, &acpi_id)) {
+			props[next_prop++] = PROPERTY_ENTRY_U32("cpu_affinity",
+								acpi_id);
+		}
+
+		err = device_create_managed_software_node(&pdev->dev, props,
+							  NULL);
+		if (err)
+			break;
+
+		/* Come back later if you want the RIS too */
+		err = platform_device_add_data(pdev, tbl_msc, tbl_msc->length);
+		if (err)
+			break;
+
+		err = platform_device_add(pdev);
+		if (err)
+			break;
+	}
+
+	if (err)
+		platform_device_put(pdev);
+
+	return err;
+}
+
+int acpi_mpam_count_msc(void)
+{
+        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
+	char *table_end, *table_offset = (char *)(table + 1);
+	struct acpi_mpam_msc_node *tbl_msc;
+	int count = 0;
+
+	if (IS_ERR(table))
+		return 0;
+
+	if (table->revision < 1)
+		return 0;
+
+	tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+	table_end = (char *)table + table->length;
+
+	while (table_offset < table_end) {
+		if (!tbl_msc->mmio_size)
+			continue;
+
+		if (tbl_msc->length < sizeof(*tbl_msc))
+			return -EINVAL;
+
+		count++;
+
+		table_offset += tbl_msc->length;
+		tbl_msc = (struct acpi_mpam_msc_node *)table_offset;
+	}
+
+	return count;
+}
+
+/*
+ * Call after ACPI devices have been created, which happens behind acpi_scan_init()
+ * called from subsys_initcall(). PCC requires the mailbox driver, which is
+ * initialised from postcore_initcall().
+ */
+subsys_initcall_sync(acpi_mpam_parse);
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index fa9bb8c8ce95..835e3795ede3 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -408,7 +408,7 @@ static const char table_sigs[][ACPI_NAMESEG_SIZE] __nonstring_array __initconst
 	ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
 	ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
 	ACPI_SIG_NHLT, ACPI_SIG_AEST, ACPI_SIG_CEDT, ACPI_SIG_AGDI,
-	ACPI_SIG_NBFT };
+	ACPI_SIG_NBFT, ACPI_SIG_MPAM };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
new file mode 100644
index 000000000000..0edefa6ba019
--- /dev/null
+++ b/include/linux/arm_mpam.h
@@ -0,0 +1,46 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __LINUX_ARM_MPAM_H
+#define __LINUX_ARM_MPAM_H
+
+#include <linux/acpi.h>
+#include <linux/types.h>
+
+struct mpam_msc;
+
+enum mpam_msc_iface {
+	MPAM_IFACE_MMIO,	/* a real MPAM MSC */
+	MPAM_IFACE_PCC,		/* a fake MPAM MSC */
+};
+
+enum mpam_class_types {
+	MPAM_CLASS_CACHE,       /* Well known caches, e.g. L2 */
+	MPAM_CLASS_MEMORY,      /* Main memory */
+	MPAM_CLASS_UNKNOWN,     /* Everything else, e.g. SMMU */
+};
+
+#ifdef CONFIG_ACPI_MPAM
+/* Parse the ACPI description of resources entries for this MSC. */
+int acpi_mpam_parse_resources(struct mpam_msc *msc,
+			      struct acpi_mpam_msc_node *tbl_msc);
+
+int acpi_mpam_count_msc(void);
+#else
+static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
+					    struct acpi_mpam_msc_node *tbl_msc)
+{
+	return -EINVAL;
+}
+
+static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
+#endif
+
+static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id)
+{
+	return -EINVAL;
+}
+
+#endif /* __LINUX_ARM_MPAM_H */
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * RE: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-08-22 15:30 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-09-09  6:54   ` Shaopeng Tan (Fujitsu)
  2025-09-10 19:31     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-09  6:54 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hello James,
> Add code to parse the arm64 specific MPAM table, looking up the cache level
> from the PPTT and feeding the end result into the MPAM driver.
> 
> CC: Carl Worth <carl@os.amperecomputing.com>
> Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
> Signed-off-by: James Morse <james.morse@arm.com>
> 
> ---
> Changes since RFC:
>  * Used DEFINE_RES_IRQ_NAMED() and friends macros.
>  * Additional error handling.
>  * Check for zero sized MSC.
>  * Allow table revisions greater than 1. (no spec for revision 0!)
>  * Use cleanup helpers to retrieve ACPI tables, which allows some functions
>    to be folded together.
> ---
>  arch/arm64/Kconfig          |   1 +
>  drivers/acpi/arm64/Kconfig  |   3 +
>  drivers/acpi/arm64/Makefile |   1 +
>  drivers/acpi/arm64/mpam.c   | 331 ++++++++++++++++++++++++++++++++++++
>  drivers/acpi/tables.c       |   2 +-
>  include/linux/arm_mpam.h    |  46 +++++
>  6 files changed, 383 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/acpi/arm64/mpam.c
>  create mode 100644 include/linux/arm_mpam.h
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 658e47fc0c5a..e51ccf1da102 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
> 
>  config ARM64_MPAM
>  	bool "Enable support for MPAM"
> +	select ACPI_MPAM if ACPI
>  	help
>  	  Memory Partitioning and Monitoring is an optional extension
>  	  that allows the CPUs to mark load and store transactions with
> diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
> index b3ed6212244c..f2fd79f22e7d 100644
> --- a/drivers/acpi/arm64/Kconfig
> +++ b/drivers/acpi/arm64/Kconfig
> @@ -21,3 +21,6 @@ config ACPI_AGDI
> 
>  config ACPI_APMT
>  	bool
> +
> +config ACPI_MPAM
> +	bool
> diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
> index 05ecde9eaabe..9390b57cb564 100644
> --- a/drivers/acpi/arm64/Makefile
> +++ b/drivers/acpi/arm64/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_ACPI_APMT) 	+= apmt.o
>  obj-$(CONFIG_ACPI_FFH)		+= ffh.o
>  obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
>  obj-$(CONFIG_ACPI_IORT) 	+= iort.o
> +obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
>  obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
>  obj-$(CONFIG_ARM_AMBA)		+= amba.o
>  obj-y				+= dma.o init.o
> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
> new file mode 100644
> index 000000000000..e55fc2729ac5
> --- /dev/null
> +++ b/drivers/acpi/arm64/mpam.c
> @@ -0,0 +1,331 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +/* Parse the MPAM ACPI table feeding the discovered nodes into the driver */
> +
> +#define pr_fmt(fmt) "ACPI MPAM: " fmt
> +
> +#include <linux/acpi.h>
> +#include <linux/arm_mpam.h>
> +#include <linux/bits.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/platform_device.h>
> +
> +#include <acpi/processor.h>
> +
> +/*
> + * Flags for acpi_table_mpam_msc.*_interrupt_flags.
> + * See 2.1.1 Interrupt Flags, Table 5, of DEN0065B_MPAM_ACPI_3.0-bet.
> + */
> +#define ACPI_MPAM_MSC_IRQ_MODE_MASK                    BIT(0)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_MASK                    GENMASK(2, 1)
> +#define ACPI_MPAM_MSC_IRQ_TYPE_WIRED                   0
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER BIT(3)
> +#define ACPI_MPAM_MSC_IRQ_AFFINITY_VALID               BIT(4)
> +
> +static bool frob_irq(struct platform_device *pdev, int intid, u32 flags,
> +		     int *irq, u32 processor_container_uid)
> +{
> +	int sense;
> +
> +	if (!intid)
> +		return false;
> +
> +	if (FIELD_GET(ACPI_MPAM_MSC_IRQ_TYPE_MASK, flags) !=
> +	    ACPI_MPAM_MSC_IRQ_TYPE_WIRED)
> +		return false;
> +
> +	sense = FIELD_GET(ACPI_MPAM_MSC_IRQ_MODE_MASK, flags);
> +
> +	/*
> +	 * If the GSI is in the GIC's PPI range, try and create a partitioned
> +	 * percpu interrupt.
> +	 */
> +	if (16 <= intid && intid < 32 && processor_container_uid != ~0) {
> +		pr_err_once("Partitioned interrupts not supported\n");
> +		return false;
> +	}
> +
> +	*irq = acpi_register_gsi(&pdev->dev, intid, sense, ACPI_ACTIVE_HIGH);
> +	if (*irq <= 0) {
> +		pr_err_once("Failed to register interrupt 0x%x with ACPI\n",
> +			    intid);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void acpi_mpam_parse_irqs(struct platform_device *pdev,
> +				 struct acpi_mpam_msc_node *tbl_msc,
> +				 struct resource *res, int *res_idx)
> +{
> +	u32 flags, aff;
> +	int irq;
> +
> +	flags = tbl_msc->overflow_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->overflow_interrupt_affinity;
> +	else
> +		aff = ~0;
> +	if (frob_irq(pdev, tbl_msc->overflow_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "overflow");
> +
> +	flags = tbl_msc->error_interrupt_flags;
> +	if (flags & ACPI_MPAM_MSC_IRQ_AFFINITY_VALID &&
> +	    flags & ACPI_MPAM_MSC_IRQ_AFFINITY_PROCESSOR_CONTAINER)
> +		aff = tbl_msc->error_interrupt_affinity;
> +	else
> +		aff = ~0;
> +	if (frob_irq(pdev, tbl_msc->error_interrupt, flags, &irq, aff))
> +		res[(*res_idx)++] = DEFINE_RES_IRQ_NAMED(irq, "error");
> +}
> +
> +static int acpi_mpam_parse_resource(struct mpam_msc *msc,
> +				    struct acpi_mpam_resource_node *res)
> +{
> +	int level, nid;
> +	u32 cache_id;
> +
> +	switch (res->locator_type) {
> +	case ACPI_MPAM_LOCATION_TYPE_PROCESSOR_CACHE:
> +		cache_id = res->locator.cache_locator.cache_reference;
> +		level = find_acpi_cache_level_from_id(cache_id);
> +		if (level <= 0) {
> +			pr_err_once("Bad level (%u) for cache with id %u\n", level, cache_id);
> +			return -EINVAL;
> +		}
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_CACHE,
> +				       level, cache_id);
> +	case ACPI_MPAM_LOCATION_TYPE_MEMORY:
> +		nid = pxm_to_node(res->locator.memory_locator.proximity_domain);
> +		if (nid == NUMA_NO_NODE)
> +			nid = 0;
> +		return mpam_ris_create(msc, res->ris_index, MPAM_CLASS_MEMORY,
> +				       255, nid);
> +	default:
> +		/* These get discovered later and treated as unknown */
> +		return 0;
> +	}
> +}
> +
> +int acpi_mpam_parse_resources(struct mpam_msc *msc,
> +			      struct acpi_mpam_msc_node *tbl_msc)
> +{
> +	int i, err;
> +	struct acpi_mpam_resource_node *resources;
> +
> +	resources = (struct acpi_mpam_resource_node *)(tbl_msc + 1);
> +	for (i = 0; i < tbl_msc->num_resource_nodes; i++) {
> +		err = acpi_mpam_parse_resource(msc, &resources[i]);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static bool __init parse_msc_pm_link(struct acpi_mpam_msc_node *tbl_msc,
> +				     struct platform_device *pdev,
> +				     u32 *acpi_id)
> +{
> +	bool acpi_id_valid = false;
> +	struct acpi_device *buddy;
> +	char hid[16], uid[16];
> +	int err;
> +
> +	memset(&hid, 0, sizeof(hid));
> +	memcpy(hid, &tbl_msc->hardware_id_linked_device,
> +	       sizeof(tbl_msc->hardware_id_linked_device));
> +
> +	if (!strcmp(hid, ACPI_PROCESSOR_CONTAINER_HID)) {
> +		*acpi_id = tbl_msc->instance_id_linked_device;
> +		acpi_id_valid = true;
> +	}
> +
> +	err = snprintf(uid, sizeof(uid), "%u",
> +		       tbl_msc->instance_id_linked_device);
> +	if (err >= sizeof(uid))
> +		return acpi_id_valid;
> +
> +	buddy = acpi_dev_get_first_match_dev(hid, uid, -1);
> +	if (buddy)
> +		device_link_add(&pdev->dev, &buddy->dev, DL_FLAG_STATELESS);
> +
> +	return acpi_id_valid;
> +}
> +
> +static int decode_interface_type(struct acpi_mpam_msc_node *tbl_msc,
> +				 enum mpam_msc_iface *iface)
> +{
> +	switch (tbl_msc->interface_type) {
> +	case 0:
> +		*iface = MPAM_IFACE_MMIO;
> +		return 0;
> +	case 0xa:
> +		*iface = MPAM_IFACE_PCC;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +static int __init acpi_mpam_parse(void)
> +{
> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
> +	char *table_end, *table_offset = (char *)(table + 1);
> +	struct property_entry props[4]; /* needs a sentinel */
> +	struct acpi_mpam_msc_node *tbl_msc;
> +	int next_res, next_prop, err = 0;
> +	struct acpi_device *companion;
> +	struct platform_device *pdev;
> +	enum mpam_msc_iface iface;
> +	struct resource res[3];
> +	char uid[16];
> +	u32 acpi_id;
> +
> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
> +		return 0;
> +
> +	if (IS_ERR(table))
> +		return 0;
This second check is redundant: IS_ERR(table) is already tested in the
condition just above it.
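For illustration only, a minimal userspace sketch of the early-exit logic with the duplicate test dropped. ERR_PTR()/IS_ERR() are re-implemented here so the snippet builds outside the kernel, and mpam_parse_should_skip() and struct fake_table are hypothetical names, not part of the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(intptr_t)(err))
#define IS_ERR(ptr)	((uintptr_t)(ptr) >= (uintptr_t)-MAX_ERRNO)

/* Hypothetical stand-in for struct acpi_table_header. */
struct fake_table {
	uint8_t revision;
};

/*
 * One condition covers every reason to skip parsing, so the table can be
 * dereferenced afterwards without a second IS_ERR() test.
 */
static bool mpam_parse_should_skip(bool acpi_off, bool have_mpam,
				   const struct fake_table *table)
{
	if (acpi_off || !have_mpam || IS_ERR(table))
		return true;

	return table->revision < 1;
}
```

The revision test is only reached once the pointer is known to be valid, which is the property the removed check was trying to guarantee.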
Best regards,
Shaopeng TAN
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 08/33] ACPI / MPAM: Parse the MPAM table
  2025-09-09  6:54   ` Shaopeng Tan (Fujitsu)
@ 2025-09-10 19:31     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:31 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Shaopeng,
On 09/09/2025 07:54, Shaopeng Tan (Fujitsu) wrote:
>> Add code to parse the arm64 specific MPAM table, looking up the cache level
>> from the PPTT and feeding the end result into the MPAM driver.
>> diff --git a/drivers/acpi/arm64/mpam.c b/drivers/acpi/arm64/mpam.c
>> new file mode 100644
>> index 000000000000..e55fc2729ac5
>> --- /dev/null
>> +++ b/drivers/acpi/arm64/mpam.c
>> @@ -0,0 +1,331 @@
>> +static int __init acpi_mpam_parse(void)
>> +{
>> +        struct acpi_table_header *table __free(acpi_table) = acpi_get_table_ret(ACPI_SIG_MPAM, 0);
>> +	char *table_end, *table_offset = (char *)(table + 1);
>> +	struct property_entry props[4]; /* needs a sentinel */
>> +	struct acpi_mpam_msc_node *tbl_msc;
>> +	int next_res, next_prop, err = 0;
>> +	struct acpi_device *companion;
>> +	struct platform_device *pdev;
>> +	enum mpam_msc_iface iface;
>> +	struct resource res[3];
>> +	char uid[16];
>> +	u32 acpi_id;
>> +
>> +	if (acpi_disabled || !system_supports_mpam() || IS_ERR(table))
>> +		return 0;
>> +
>> +	if (IS_ERR(table))
>> +		return 0;
> This is redundant, it's the same as the previous line.
Fixed, thanks.
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
 
- * [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (41 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 08/33] ACPI / MPAM: Parse the MPAM table James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
                   ` (24 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rob Herring <robh@kernel.org>
The binding is designed around the assumption that an MSC will be a
sub-block of something else such as a memory controller, cache controller,
or IOMMU. However, it's certainly possible a design does not have that
association or has a mixture of both, so the binding illustrates how we can
support that with RIS child nodes.
A key part of MPAM is that we need to know about all of the MSCs in the
system before it can be enabled. This drives the need for the generic
'arm,mpam-msc' compatible. However, we can't assume an MSC is accessible
until a h/w specific driver, where one is needed, has enabled the h/w.
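To illustrate the role of the generic compatible, here is a userspace sketch of matching against a node's compatible list. node_is_compatible() is a hypothetical helper written for this example, not the kernel's OF API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if any entry of a node's compatible list equals 'want'. */
static bool node_is_compatible(const char *const *compat, size_t n,
			       const char *want)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(compat[i], want) == 0)
			return true;
	}

	return false;
}
```

With a list such as "arm,mpam-memory-controller-msc", "arm,mpam-msc", a lookup for "arm,mpam-msc" succeeds, so every MSC can be enumerated without knowing the more specific string.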
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Syntax(?) corrections supplied by Rob.
 * Culled some context in the example.
---
 .../devicetree/bindings/arm/arm,mpam-msc.yaml | 200 ++++++++++++++++++
 1 file changed, 200 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
diff --git a/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
new file mode 100644
index 000000000000..d984817b3385
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm,mpam-msc.yaml
@@ -0,0 +1,200 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/arm,mpam-msc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm Memory System Resource Partitioning and Monitoring (MPAM)
+
+description: |
+  The Arm MPAM specification can be found here:
+
+  https://developer.arm.com/documentation/ddi0598/latest
+
+maintainers:
+  - Rob Herring <robh@kernel.org>
+
+properties:
+  compatible:
+    items:
+      - const: arm,mpam-msc                   # Further details are discoverable
+      - const: arm,mpam-memory-controller-msc
+
+  reg:
+    maxItems: 1
+    description: A memory region containing registers as defined in the MPAM
+      specification.
+
+  interrupts:
+    minItems: 1
+    items:
+      - description: error (optional)
+      - description: overflow (optional, only for monitoring)
+
+  interrupt-names:
+    oneOf:
+      - items:
+          - enum: [ error, overflow ]
+      - items:
+          - const: error
+          - const: overflow
+
+  arm,not-ready-us:
+    description: The maximum time in microseconds for monitoring data to be
+      accurate after a settings change. For more information, see the
+      Not-Ready (NRDY) bit description in the MPAM specification.
+
+  numa-node-id: true # see NUMA binding
+
+  '#address-cells':
+    const: 1
+
+  '#size-cells':
+    const: 0
+
+patternProperties:
+  '^ris@[0-9a-f]$':
+    type: object
+    additionalProperties: false
+    description:
+      RIS nodes for each RIS in an MSC. These nodes are required for each RIS
+      implementing known MPAM controls
+
+    properties:
+      compatible:
+        enum:
+            # Bulk storage for cache
+          - arm,mpam-cache
+            # Memory bandwidth
+          - arm,mpam-memory
+
+      reg:
+        minimum: 0
+        maximum: 0xf
+
+      cpus:
+        description:
+          Phandle(s) to the CPU node(s) this RIS belongs to. By default, the parent
+          device's affinity is used.
+
+      arm,mpam-device:
+        $ref: /schemas/types.yaml#/definitions/phandle
+        description:
+          By default, the MPAM enabled device associated with a RIS is the MSC's
+          parent node. It is possible for each RIS to be associated with different
+          devices in which case 'arm,mpam-device' should be used.
+
+    required:
+      - compatible
+      - reg
+
+required:
+  - compatible
+  - reg
+
+dependencies:
+  interrupts: [ interrupt-names ]
+
+additionalProperties: false
+
+examples:
+  - |
+    L3: cache-controller@30000000 {
+        compatible = "arm,dsu-l3-cache", "cache";
+        cache-level = <3>;
+        cache-unified;
+
+        ranges = <0x0 0x30000000 0x800000>;
+        #address-cells = <1>;
+        #size-cells = <1>;
+
+        msc@10000 {
+            compatible = "arm,mpam-msc";
+
+            /* CPU affinity implied by parent cache node's  */
+            reg = <0x10000 0x2000>;
+            interrupts = <1>, <2>;
+            interrupt-names = "error", "overflow";
+            arm,not-ready-us = <1>;
+        };
+    };
+
+    mem: memory-controller@20000 {
+        compatible = "foo,a-memory-controller";
+        reg = <0x20000 0x1000>;
+
+        #address-cells = <1>;
+        #size-cells = <1>;
+        ranges;
+
+        msc@21000 {
+            compatible = "arm,mpam-memory-controller-msc", "arm,mpam-msc";
+            reg = <0x21000 0x1000>;
+            interrupts = <3>;
+            interrupt-names = "error";
+            arm,not-ready-us = <1>;
+            numa-node-id = <1>;
+        };
+    };
+
+    iommu@40000 {
+        reg = <0x40000 0x1000>;
+
+        ranges;
+        #address-cells = <1>;
+        #size-cells = <1>;
+
+        msc@41000 {
+            compatible = "arm,mpam-msc";
+            reg = <0 0x1000>;
+            interrupts = <5>, <6>;
+            interrupt-names = "error", "overflow";
+            arm,not-ready-us = <1>;
+
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            ris@2 {
+                compatible = "arm,mpam-cache";
+                reg = <0>;
+                // TODO: How to map to device(s)?
+            };
+        };
+    };
+
+    msc@80000 {
+        compatible = "foo,a-standalone-msc";
+        reg = <0x80000 0x1000>;
+
+        clocks = <&clks 123>;
+
+        ranges;
+        #address-cells = <1>;
+        #size-cells = <1>;
+
+        msc@10000 {
+            compatible = "arm,mpam-msc";
+
+            reg = <0x10000 0x2000>;
+            interrupts = <7>;
+            interrupt-names = "overflow";
+            arm,not-ready-us = <1>;
+
+            #address-cells = <1>;
+            #size-cells = <0>;
+
+            ris@0 {
+                compatible = "arm,mpam-cache";
+                reg = <0>;
+                arm,mpam-device = <&L2_0>;
+            };
+
+            ris@1 {
+                compatible = "arm,mpam-memory";
+                reg = <1>;
+                arm,mpam-device = <&mem>;
+            };
+        };
+    };
+
+...
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (42 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 09/33] dt-bindings: arm: Add MPAM MSC binding James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-09  7:03   ` Shaopeng Tan (Fujitsu)
  2025-08-22 15:30 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
                   ` (23 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and those CPUs may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.
Start with driver probe/remove and mapping the MSC.
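As a rough illustration of why every MSC has to be probed before MPAM can be used: a userspace sketch, assuming the system-wide limits are the smallest values that any MSC supports. struct msc_limits and mpam_common_limits() are hypothetical names for this example, not the driver's:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-MSC limits, as they might be read back from hardware. */
struct msc_limits {
	uint16_t partid_max;
	uint8_t pmg_max;
};

/*
 * Fold the limits of all MSCs into the values the whole system can use.
 * The result is only meaningful once every MSC has been included.
 */
static struct msc_limits mpam_common_limits(const struct msc_limits *msc,
					    size_t n)
{
	struct msc_limits common = {
		.partid_max = UINT16_MAX,
		.pmg_max = UINT8_MAX,
	};

	for (size_t i = 0; i < n; i++) {
		if (msc[i].partid_max < common.partid_max)
			common.partid_max = msc[i].partid_max;
		if (msc[i].pmg_max < common.pmg_max)
			common.pmg_max = msc[i].pmg_max;
	}

	return common;
}
```

A result computed from only some of the MSCs can be too large, which is why touching the hardware before all of them are known gains nothing.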
CC: Carl Worth <carl@os.amperecomputing.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Check for status=broken DT devices.
 * Moved all the files around.
 * Made Kconfig symbols depend on EXPERT
---
 arch/arm64/Kconfig              |   1 +
 drivers/Kconfig                 |   2 +
 drivers/Makefile                |   1 +
 drivers/resctrl/Kconfig         |  11 ++
 drivers/resctrl/Makefile        |   4 +
 drivers/resctrl/mpam_devices.c  | 336 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  62 ++++++
 7 files changed, 417 insertions(+)
 create mode 100644 drivers/resctrl/Kconfig
 create mode 100644 drivers/resctrl/Makefile
 create mode 100644 drivers/resctrl/mpam_devices.c
 create mode 100644 drivers/resctrl/mpam_internal.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e51ccf1da102..ea3c54e04275 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2062,6 +2062,7 @@ config ARM64_TLB_RANGE
 
 config ARM64_MPAM
 	bool "Enable support for MPAM"
+	select ARM64_MPAM_DRIVER
 	select ACPI_MPAM if ACPI
 	help
 	  Memory Partitioning and Monitoring is an optional extension
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 4915a63866b0..3054b50a2f4c 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -251,4 +251,6 @@ source "drivers/hte/Kconfig"
 
 source "drivers/cdx/Kconfig"
 
+source "drivers/resctrl/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index b5749cf67044..f41cf4eddeba 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -194,5 +194,6 @@ obj-$(CONFIG_HTE)		+= hte/
 obj-$(CONFIG_DRM_ACCEL)		+= accel/
 obj-$(CONFIG_CDX_BUS)		+= cdx/
 obj-$(CONFIG_DPLL)		+= dpll/
+obj-y				+= resctrl/
 
 obj-$(CONFIG_S390)		+= s390/
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
new file mode 100644
index 000000000000..dff7b87280ab
--- /dev/null
+++ b/drivers/resctrl/Kconfig
@@ -0,0 +1,11 @@
+# Confusingly, this is everything but the CPU bits of MPAM. CPU here means
+# CPU resources, not containers or cgroups etc.
+config ARM64_MPAM_DRIVER
+	bool "MPAM driver for System IP, e.g. caches and memory controllers"
+	depends on ARM64_MPAM && EXPERT
+
+config ARM64_MPAM_DRIVER_DEBUG
+	bool "Enable debug messages from the MPAM driver."
+	depends on ARM64_MPAM_DRIVER
+	help
+	  Say yes here to enable debug messages from the MPAM driver.
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
new file mode 100644
index 000000000000..92b48fa20108
--- /dev/null
+++ b/drivers/resctrl/Makefile
@@ -0,0 +1,4 @@
+obj-$(CONFIG_ARM64_MPAM_DRIVER)			+= mpam.o
+mpam-y						+= mpam_devices.o
+
+cflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG)	+= -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
new file mode 100644
index 000000000000..a0d9a699a6e7
--- /dev/null
+++ b/drivers/resctrl/mpam_devices.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/acpi.h>
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/gfp.h>
+#include <linux/list.h>
+#include <linux/lockdep.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/srcu.h>
+#include <linux/types.h>
+
+#include <acpi/pcc.h>
+
+#include "mpam_internal.h"
+
+/*
+ * mpam_list_lock protects the SRCU lists when writing. Once the
+ * mpam_enabled key is enabled these lists are read-only,
+ * unless the error interrupt disables the driver.
+ */
+static DEFINE_MUTEX(mpam_list_lock);
+static LIST_HEAD(mpam_all_msc);
+
+static struct srcu_struct mpam_srcu;
+
+/* MPAM isn't available until all the MSC have been probed. */
+static u32 mpam_num_msc;
+
+static void mpam_discovery_complete(void)
+{
+	pr_err("Discovered all MSC\n");
+}
+
+static int mpam_dt_count_msc(void)
+{
+	int count = 0;
+	struct device_node *np;
+
+	for_each_compatible_node(np, NULL, "arm,mpam-msc") {
+		if (of_device_is_available(np))
+			count++;
+	}
+
+	return count;
+}
+
+static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
+				  u32 ris_idx)
+{
+	int err = 0;
+	u32 level = 0;
+	unsigned long cache_id;
+	struct device_node *cache;
+
+	do {
+		if (of_device_is_compatible(np, "arm,mpam-cache")) {
+			cache = of_parse_phandle(np, "arm,mpam-device", 0);
+			if (!cache) {
+				pr_err("Failed to read phandle\n");
+				break;
+			}
+		} else if (of_device_is_compatible(np->parent, "cache")) {
+			cache = of_node_get(np->parent);
+		} else {
+			/* For now, only caches are supported */
+			cache = NULL;
+			break;
+		}
+
+		err = of_property_read_u32(cache, "cache-level", &level);
+		if (err) {
+			pr_err("Failed to read cache-level\n");
+			break;
+		}
+
+		cache_id = cache_of_calculate_id(cache);
+		if (cache_id == ~0UL) {
+			err = -ENOENT;
+			break;
+		}
+
+		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
+				      cache_id);
+	} while (0);
+	of_node_put(cache);
+
+	return err;
+}
+
+static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
+{
+	int err, num_ris = 0;
+	const u32 *ris_idx_p;
+	struct device_node *iter, *np;
+
+	np = msc->pdev->dev.of_node;
+	for_each_child_of_node(np, iter) {
+		ris_idx_p = of_get_property(iter, "reg", NULL);
+		if (ris_idx_p) {
+			num_ris++;
+			err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
+			if (err) {
+				of_node_put(iter);
+				return err;
+			}
+		}
+	}
+
+	if (!num_ris)
+		mpam_dt_parse_resource(msc, np, 0);
+
+	return err;
+}
+
+/*
+ * An MSC can control traffic from a set of CPUs, but may only be accessible
+ * from a (hopefully wider) set of CPUs. The common reason for this is power
+ * management. If all the CPUs in a cluster are in PSCI:CPU_SUSPEND, the
+ * corresponding cache may also be powered off. By making accesses from
+ * one of those CPUs, we ensure this isn't the case.
+ */
+static int update_msc_accessibility(struct mpam_msc *msc)
+{
+	struct device_node *parent;
+	u32 affinity_id;
+	int err;
+
+	if (!acpi_disabled) {
+		err = device_property_read_u32(&msc->pdev->dev, "cpu_affinity",
+					       &affinity_id);
+		if (err)
+			cpumask_copy(&msc->accessibility, cpu_possible_mask);
+		else
+			acpi_pptt_get_cpus_from_container(affinity_id,
+							  &msc->accessibility);
+
+		return 0;
+	}
+
+	/* This depends on the path to of_node */
+	parent = of_get_parent(msc->pdev->dev.of_node);
+	if (parent == of_root) {
+		cpumask_copy(&msc->accessibility, cpu_possible_mask);
+		err = 0;
+	} else {
+		err = -EINVAL;
+		pr_err("Cannot determine accessibility of MSC: %s\n",
+		       dev_name(&msc->pdev->dev));
+	}
+	of_node_put(parent);
+
+	return err;
+}
+
+static int fw_num_msc;
+
+static void mpam_pcc_rx_callback(struct mbox_client *cl, void *msg)
+{
+	/* TODO: wake up tasks blocked on this MSC's PCC channel */
+}
+
+static void mpam_msc_drv_remove(struct platform_device *pdev)
+{
+	struct mpam_msc *msc = platform_get_drvdata(pdev);
+
+	if (!msc)
+		return;
+
+	mutex_lock(&mpam_list_lock);
+	mpam_num_msc--;
+	platform_set_drvdata(pdev, NULL);
+	list_del_rcu(&msc->glbl_list);
+	synchronize_srcu(&mpam_srcu);
+	devm_kfree(&pdev->dev, msc);
+	mutex_unlock(&mpam_list_lock);
+}
+
+static int mpam_msc_drv_probe(struct platform_device *pdev)
+{
+	int err;
+	struct mpam_msc *msc;
+	struct resource *msc_res;
+	void *plat_data = pdev->dev.platform_data;
+
+	mutex_lock(&mpam_list_lock);
+	do {
+		msc = devm_kzalloc(&pdev->dev, sizeof(*msc), GFP_KERNEL);
+		if (!msc) {
+			err = -ENOMEM;
+			break;
+		}
+
+		mutex_init(&msc->probe_lock);
+		mutex_init(&msc->part_sel_lock);
+		mutex_init(&msc->outer_mon_sel_lock);
+		raw_spin_lock_init(&msc->inner_mon_sel_lock);
+		msc->id = mpam_num_msc++;
+		msc->pdev = pdev;
+		INIT_LIST_HEAD_RCU(&msc->glbl_list);
+		INIT_LIST_HEAD_RCU(&msc->ris);
+
+		err = update_msc_accessibility(msc);
+		if (err)
+			break;
+		if (cpumask_empty(&msc->accessibility)) {
+			pr_err_once("msc:%u is not accessible from any CPU!\n",
+				    msc->id);
+			err = -EINVAL;
+			break;
+		}
+
+		if (device_property_read_u32(&pdev->dev, "pcc-channel",
+					     &msc->pcc_subspace_id))
+			msc->iface = MPAM_IFACE_MMIO;
+		else
+			msc->iface = MPAM_IFACE_PCC;
+
+		if (msc->iface == MPAM_IFACE_MMIO) {
+			void __iomem *io;
+
+			io = devm_platform_get_and_ioremap_resource(pdev, 0,
+								    &msc_res);
+			if (IS_ERR(io)) {
+				pr_err("Failed to map MSC base address\n");
+				err = PTR_ERR(io);
+				break;
+			}
+			msc->mapped_hwpage_sz = msc_res->end - msc_res->start;
+			msc->mapped_hwpage = io;
+		} else if (msc->iface == MPAM_IFACE_PCC) {
+			msc->pcc_cl.dev = &pdev->dev;
+			msc->pcc_cl.rx_callback = mpam_pcc_rx_callback;
+			msc->pcc_cl.tx_block = false;
+			msc->pcc_cl.tx_tout = 1000; /* 1s */
+			msc->pcc_cl.knows_txdone = false;
+
+			msc->pcc_chan = pcc_mbox_request_channel(&msc->pcc_cl,
+								 msc->pcc_subspace_id);
+			if (IS_ERR(msc->pcc_chan)) {
+				pr_err("Failed to request MSC PCC channel\n");
+				err = PTR_ERR(msc->pcc_chan);
+				break;
+			}
+		}
+
+		list_add_rcu(&msc->glbl_list, &mpam_all_msc);
+		platform_set_drvdata(pdev, msc);
+	} while (0);
+	mutex_unlock(&mpam_list_lock);
+
+	if (!err) {
+		/* Create RIS entries described by firmware */
+		if (!acpi_disabled)
+			err = acpi_mpam_parse_resources(msc, plat_data);
+		else
+			err = mpam_dt_parse_resources(msc, plat_data);
+	}
+
+	if (!err && fw_num_msc == mpam_num_msc)
+		mpam_discovery_complete();
+
+	if (err && msc)
+		mpam_msc_drv_remove(pdev);
+
+	return err;
+}
+
+static const struct of_device_id mpam_of_match[] = {
+	{ .compatible = "arm,mpam-msc", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, mpam_of_match);
+
+static struct platform_driver mpam_msc_driver = {
+	.driver = {
+		.name = "mpam_msc",
+		.of_match_table = of_match_ptr(mpam_of_match),
+	},
+	.probe = mpam_msc_drv_probe,
+	.remove = mpam_msc_drv_remove,
+};
+
+/*
+ * MSC that are hidden under caches are not created as platform devices
+ * as there is no cache driver. Caches are also special-cased in
+ * update_msc_accessibility().
+ */
+static void mpam_dt_create_foundling_msc(void)
+{
+	int err;
+	struct device_node *cache;
+
+	for_each_compatible_node(cache, NULL, "cache") {
+		err = of_platform_populate(cache, mpam_of_match, NULL, NULL);
+		if (err)
+			pr_err("Failed to create MSC devices under caches\n");
+	}
+}
+
+static int __init mpam_msc_driver_init(void)
+{
+	if (!system_supports_mpam())
+		return -EOPNOTSUPP;
+
+	init_srcu_struct(&mpam_srcu);
+
+	if (!acpi_disabled)
+		fw_num_msc = acpi_mpam_count_msc();
+	else
+		fw_num_msc = mpam_dt_count_msc();
+
+	if (fw_num_msc <= 0) {
+		pr_err("No MSC devices found in firmware\n");
+		return -EINVAL;
+	}
+
+	if (acpi_disabled)
+		mpam_dt_create_foundling_msc();
+
+	return platform_driver_register(&mpam_msc_driver);
+}
+subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
new file mode 100644
index 000000000000..07e0f240eaca
--- /dev/null
+++ b/drivers/resctrl/mpam_internal.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (C) 2024 Arm Ltd.
+
+#ifndef MPAM_INTERNAL_H
+#define MPAM_INTERNAL_H
+
+#include <linux/arm_mpam.h>
+#include <linux/cpumask.h>
+#include <linux/io.h>
+#include <linux/mailbox_client.h>
+#include <linux/mutex.h>
+#include <linux/resctrl.h>
+#include <linux/sizes.h>
+
+struct mpam_msc {
+	/* member of mpam_all_msc */
+	struct list_head        glbl_list;
+
+	int			id;
+	struct platform_device *pdev;
+
+	/* Not modified after mpam_is_enabled() becomes true */
+	enum mpam_msc_iface	iface;
+	u32			pcc_subspace_id;
+	struct mbox_client	pcc_cl;
+	struct pcc_mbox_chan	*pcc_chan;
+	u32			nrdy_usec;
+	cpumask_t		accessibility;
+
+	/*
+	 * probe_lock is only taken during discovery. After discovery these
+	 * properties become read-only and the lists are protected by SRCU.
+	 */
+	struct mutex		probe_lock;
+	unsigned long		ris_idxs[128 / BITS_PER_LONG];
+	u32			ris_max;
+
+	/* mpam_msc_ris of this component */
+	struct list_head	ris;
+
+	/*
+	 * part_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
+	 * by RIS).
+	 * If needed, take msc->lock first.
+	 */
+	struct mutex		part_sel_lock;
+
+	/*
+	 * mon_sel_lock protects access to the MSC hardware registers that are
+	 * affected by MPAMCFG_MON_SEL.
+	 * If needed, take msc->lock first.
+	 */
+	struct mutex		outer_mon_sel_lock;
+	raw_spinlock_t		inner_mon_sel_lock;
+	unsigned long		inner_mon_sel_flags;
+
+	void __iomem		*mapped_hwpage;
+	size_t			mapped_hwpage_sz;
+};
+
+#endif /* MPAM_INTERNAL_H */
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * RE: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-08-22 15:30 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-09-09  7:03   ` Shaopeng Tan (Fujitsu)
  2025-09-10 19:31     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-09  7:03 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hello James,
> Probing MPAM is convoluted. MSCs that are integrated with a CPU may only be
> accessible from those CPUs, and they may not be online.
> Touching the hardware early is pointless as MPAM can't be used until the
> system-wide common values for num_partid and num_pmg have been
> discovered.
> 
> Start with driver probe/remove and mapping the MSC.
> 
> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
> +{
> +	int err, num_ris = 0;
> +	const u32 *ris_idx_p;
> +	struct device_node *iter, *np;
> +
> +	np = msc->pdev->dev.of_node;
> +	for_each_child_of_node(np, iter) {
> +		ris_idx_p = of_get_property(iter, "reg", NULL);
> +		if (ris_idx_p) {
> +			num_ris++;
> +			err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
> +			if (err) {
> +				of_node_put(iter);
> +				return err;
> +			}
> +		}
> +	}
> +
> +	if (!num_ris)
> +		mpam_dt_parse_resource(msc, np, 0);
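The return value is discarded here, so err can be returned uninitialised
when no child node has a "reg" property. This should be:
	err = mpam_dt_parse_resource(msc, np, 0);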
Best regards,
Shaopeng TAN
> +	return err;
> +}
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate
  2025-09-09  7:03   ` Shaopeng Tan (Fujitsu)
@ 2025-09-10 19:31     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:31 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Shaopeng,
On 09/09/2025 08:03, Shaopeng Tan (Fujitsu) wrote:
>> Probing MPAM is convoluted. MSCs that are integrated with a CPU may only be
>> accessible from those CPUs, and they may not be online.
>> Touching the hardware early is pointless as MPAM can't be used until the
>> system-wide common values for num_partid and num_pmg have been
>> discovered.
>>
>> Start with driver probe/remove and mapping the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> new file mode 100644 index 000000000000..a0d9a699a6e7
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -0,0 +1,336 @@
>> +static int mpam_dt_parse_resources(struct mpam_msc *msc, void *ignored)
>> +{
>> +	int err, num_ris = 0;
>> +	const u32 *ris_idx_p;
>> +	struct device_node *iter, *np;
>> +
>> +	np = msc->pdev->dev.of_node;
>> +	for_each_child_of_node(np, iter) {
>> +		ris_idx_p = of_get_property(iter, "reg", NULL);
>> +		if (ris_idx_p) {
>> +			num_ris++;
>> +			err = mpam_dt_parse_resource(msc, iter, *ris_idx_p);
>> +			if (err) {
>> +				of_node_put(iter);
>> +				return err;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (!num_ris)
>> +		mpam_dt_parse_resource(msc, np, 0);
> err = mpam_dt_parse_resource(msc, np, 0);
Oops! Thanks for catching that,
James
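
For anyone following the thread, the corrected control flow can be sketched outside the kernel as below. This is a minimal userspace model under stated assumptions, not the driver code: struct child, parse_fn, parse_ok(), parse_fail() and parse_resources() are hypothetical stand-ins for the device-tree children and for mpam_dt_parse_resource().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a child node: does it carry a "reg" property? */
struct child {
	int has_reg;
	unsigned int reg;
};

/* Hypothetical stand-in for mpam_dt_parse_resource(): 0 or a negative errno. */
typedef int (*parse_fn)(unsigned int ris_idx);

static int parse_ok(unsigned int ris_idx)
{
	(void)ris_idx;
	return 0;
}

static int parse_fail(unsigned int ris_idx)
{
	(void)ris_idx;
	return -EINVAL;
}

static int parse_resources(const struct child *children, size_t n,
			   parse_fn parse)
{
	int err = 0;
	size_t i, num_ris = 0;

	for (i = 0; i < n; i++) {
		if (!children[i].has_reg)
			continue;
		num_ris++;
		err = parse(children[i].reg);
		if (err)
			return err;
	}

	/* No child described a RIS: treat the node itself as RIS 0. */
	if (!num_ris)
		err = parse(0);	/* keep the result instead of dropping it */

	return err;
}
```

With the assignment in place, every path to the final return has written err, so a failure in the fallback case is reported rather than lost.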
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
 
- * [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (43 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 10/33] arm_mpam: Add probe/remove for mpam msc driver and kbuild boiler plate James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-09  7:11   ` Shaopeng Tan (Fujitsu)
  2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
                   ` (22 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Shanker Donthineni <sdonthineni@nvidia.com>
The device-tree binding has two examples of MSCs associated with
memory controllers. Add support for discovering the component_id
from the device-tree and creating the 'memory' RIS.
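
The classification step can be sketched in isolation as below. This is a simplified userspace model, not the driver code: struct node, node_is() and classify_msc_node() are hypothetical, each node is assumed to carry a single compatible string (real device-tree nodes may list several), and only the compatible strings themselves are taken from this series.

```c
#include <stddef.h>
#include <string.h>

enum class_type { CLASS_UNKNOWN, CLASS_CACHE, CLASS_MEMORY };

/* Hypothetical stand-in for struct device_node. */
struct node {
	const char *compatible;
	const struct node *parent;
};

/* True if the node exists and its compatible string matches exactly. */
static int node_is(const struct node *n, const char *compat)
{
	return n && n->compatible && strcmp(n->compatible, compat) == 0;
}

/* Decide which class a RIS belongs to; np must not be NULL. */
static enum class_type classify_msc_node(const struct node *np)
{
	if (node_is(np, "arm,mpam-cache") || node_is(np->parent, "cache"))
		return CLASS_CACHE;
	if (node_is(np, "arm,mpam-memory") ||
	    node_is(np, "arm,mpam-memory-controller-msc"))
		return CLASS_MEMORY;
	return CLASS_UNKNOWN;
}
```

Anything that matches neither family stays unknown and is skipped, mirroring the "only caches and memory controllers are supported" fallback.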
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: split out of a bigger patch, added affinity piece ]
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c | 67 ++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 20 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index a0d9a699a6e7..71a1fb1a9c75 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -62,41 +62,63 @@ static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
 				  u32 ris_idx)
 {
 	int err = 0;
-	u32 level = 0;
-	unsigned long cache_id;
-	struct device_node *cache;
+	u32 class_id = 0, component_id = 0;
+	struct device_node *cache = NULL, *memory = NULL;
+	enum mpam_class_types type = MPAM_CLASS_UNKNOWN;
 
 	do {
+		/* What kind of MSC is this? */
 		if (of_device_is_compatible(np, "arm,mpam-cache")) {
 			cache = of_parse_phandle(np, "arm,mpam-device", 0);
 			if (!cache) {
 				pr_err("Failed to read phandle\n");
 				break;
 			}
+			type = MPAM_CLASS_CACHE;
 		} else if (of_device_is_compatible(np->parent, "cache")) {
 			cache = of_node_get(np->parent);
+			type = MPAM_CLASS_CACHE;
+		} else if (of_device_is_compatible(np, "arm,mpam-memory")) {
+			memory = of_parse_phandle(np, "arm,mpam-device", 0);
+			if (!memory) {
+				pr_err("Failed to read phandle\n");
+				break;
+			}
+			type = MPAM_CLASS_MEMORY;
+		} else if (of_device_is_compatible(np, "arm,mpam-memory-controller-msc")) {
+			memory = of_node_get(np->parent);
+			type = MPAM_CLASS_MEMORY;
 		} else {
-			/* For now, only caches are supported */
-			cache = NULL;
+			/*
+			 * For now, only caches and memory controllers are
+			 * supported.
+			 */
 			break;
 		}
 
-		err = of_property_read_u32(cache, "cache-level", &level);
-		if (err) {
-			pr_err("Failed to read cache-level\n");
-			break;
-		}
-
-		cache_id = cache_of_calculate_id(cache);
-		if (cache_id == ~0UL) {
-			err = -ENOENT;
-			break;
+		/* Determine the class and component ids, based on type. */
+		if (type == MPAM_CLASS_CACHE) {
+			err = of_property_read_u32(cache, "cache-level", &class_id);
+			if (err) {
+				pr_err("Failed to read cache-level\n");
+				break;
+			}
+			component_id = cache_of_calculate_id(cache);
+			if (component_id == ~0UL) {
+				err = -ENOENT;
+				break;
+			}
+		} else if (type == MPAM_CLASS_MEMORY) {
+			err = of_node_to_nid(np);
+			component_id = (err == NUMA_NO_NODE) ? 0 : err;
+			class_id = 255;
 		}
 
-		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
-				      cache_id);
+		err = mpam_ris_create(msc, ris_idx, type, class_id,
+				      component_id);
 	} while (0);
 	of_node_put(cache);
+	of_node_put(memory);
 
 	return err;
 }
@@ -157,9 +179,14 @@ static int update_msc_accessibility(struct mpam_msc *msc)
 		cpumask_copy(&msc->accessibility, cpu_possible_mask);
 		err = 0;
 	} else {
-		err = -EINVAL;
-		pr_err("Cannot determine accessibility of MSC: %s\n",
-		       dev_name(&msc->pdev->dev));
+		if (of_device_is_compatible(parent, "memory")) {
+			cpumask_copy(&msc->accessibility, cpu_possible_mask);
+			err = 0;
+		} else {
+			err = -EINVAL;
+			pr_err("Cannot determine accessibility of MSC: %s\n",
+			       dev_name(&msc->pdev->dev));
+		}
 	}
 	of_node_put(parent);
 
-- 
2.20.1
- * RE: [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms
  2025-08-22 15:30 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
@ 2025-09-09  7:11   ` Shaopeng Tan (Fujitsu)
  2025-09-10 19:31     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-09  7:11 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hello James,
> From: Shanker Donthineni <sdonthineni@nvidia.com>
> 
> The device-tree binding has two examples for MSC associated with memory
> controllers. Add the support to discover the component_id from the device-tree
> and create 'memory' RIS.
> 
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> [ morse: split out of a bigger patch, added affinity piece ]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>  drivers/resctrl/mpam_devices.c | 67 ++++++++++++++++++++++++----------
>  1 file changed, 47 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index a0d9a699a6e7..71a1fb1a9c75 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -62,41 +62,63 @@ static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
>  				  u32 ris_idx)
>  {
>  	int err = 0;
> -	u32 level = 0;
> -	unsigned long cache_id;
> -	struct device_node *cache;
> +	u32 class_id = 0, component_id = 0;
> +	struct device_node *cache = NULL, *memory = NULL;
> +	enum mpam_class_types type = MPAM_CLASS_UNKNOWN;
> 
>  	do {
> +		/* What kind of MSC is this? */
>  		if (of_device_is_compatible(np, "arm,mpam-cache")) {
>  			cache = of_parse_phandle(np, "arm,mpam-device", 0);
>  			if (!cache) {
>  				pr_err("Failed to read phandle\n");
>  				break;
>  			}
> +			type = MPAM_CLASS_CACHE;
>  		} else if (of_device_is_compatible(np->parent, "cache")) {
>  			cache = of_node_get(np->parent);
> +			type = MPAM_CLASS_CACHE;
> +		} else if (of_device_is_compatible(np, "arm,mpam-memory")) {
> +			memory = of_parse_phandle(np, "arm,mpam-device", 0);
> +			if (!memory) {
> +				pr_err("Failed to read phandle\n");
> +				break;
> +			}
> +			type = MPAM_CLASS_MEMORY;
> +		} else if (of_device_is_compatible(np, "arm,mpam-memory-controller-msc")) {
> +			memory = of_node_get(np->parent);
> +			type = MPAM_CLASS_MEMORY;
>  		} else {
> -			/* For now, only caches are supported */
> -			cache = NULL;
> +			/*
> +			 * For now, only caches and memory controllers are
> +			 * supported.
> +			 */
>  			break;
>  		}
There is no need for "{}" here.
Best regards,
Shaopeng TAN
> -		err = of_property_read_u32(cache, "cache-level", &level);
> -		if (err) {
> -			pr_err("Failed to read cache-level\n");
> -			break;
> -		}
> -
> -		cache_id = cache_of_calculate_id(cache);
> -		if (cache_id == ~0UL) {
> -			err = -ENOENT;
> -			break;
> +		/* Determine the class and component ids, based on type. */
> +		if (type == MPAM_CLASS_CACHE) {
> +			err = of_property_read_u32(cache, "cache-level", &class_id);
> +			if (err) {
> +				pr_err("Failed to read cache-level\n");
> +				break;
> +			}
> +			component_id = cache_of_calculate_id(cache);
> +			if (component_id == ~0UL) {
> +				err = -ENOENT;
> +				break;
> +			}
> +		} else if (type == MPAM_CLASS_MEMORY) {
> +			err = of_node_to_nid(np);
> +			component_id = (err == NUMA_NO_NODE) ? 0 : err;
> +			class_id = 255;
>  		}
> 
> -		err = mpam_ris_create(msc, ris_idx, MPAM_CLASS_CACHE, level,
> -				      cache_id);
> +		err = mpam_ris_create(msc, ris_idx, type, class_id,
> +				      component_id);
>  	} while (0);
>  	of_node_put(cache);
> +	of_node_put(memory);
> 
>  	return err;
>  }
> @@ -157,9 +179,14 @@ static int update_msc_accessibility(struct mpam_msc *msc)
>  		cpumask_copy(&msc->accessibility, cpu_possible_mask);
>  		err = 0;
>  	} else {
> -		err = -EINVAL;
> -		pr_err("Cannot determine accessibility of MSC: %s\n",
> -		       dev_name(&msc->pdev->dev));
> +		if (of_device_is_compatible(parent, "memory")) {
> +			cpumask_copy(&msc->accessibility, cpu_possible_mask);
> +			err = 0;
> +		} else {
> +			err = -EINVAL;
> +			pr_err("Cannot determine accessibility of MSC: %s\n",
> +			       dev_name(&msc->pdev->dev));
> +		}
>  	}
>  	of_node_put(parent);
> 
> --
> 2.20.1
- * Re: [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms
  2025-09-09  7:11   ` Shaopeng Tan (Fujitsu)
@ 2025-09-10 19:31     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:31 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich
Hi Shaopeng,
On 09/09/2025 08:11, Shaopeng Tan (Fujitsu) wrote:
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> The device-tree binding has two examples for MSC associated with memory
>> controllers. Add the support to discover the component_id from the device-tree
>> and create 'memory' RIS.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index a0d9a699a6e7..71a1fb1a9c75 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -62,41 +62,63 @@ static int mpam_dt_parse_resource(struct mpam_msc *msc, struct device_node *np,
>>  				  u32 ris_idx)
>>  {
>>  	int err = 0;
>> -	u32 level = 0;
>> -	unsigned long cache_id;
>> -	struct device_node *cache;
>> +	u32 class_id = 0, component_id = 0;
>> +	struct device_node *cache = NULL, *memory = NULL;
>> +	enum mpam_class_types type = MPAM_CLASS_UNKNOWN;
>>
>>  	do {
>> +		/* What kind of MSC is this? */
>>  		if (of_device_is_compatible(np, "arm,mpam-cache")) {
>>  			cache = of_parse_phandle(np, "arm,mpam-device", 0);
>>  			if (!cache) {
>>  				pr_err("Failed to read phandle\n");
>>  				break;
>>  			}
>> +			type = MPAM_CLASS_CACHE;
>>  		} else if (of_device_is_compatible(np->parent, "cache")) {
>>  			cache = of_node_get(np->parent);
>> +			type = MPAM_CLASS_CACHE;
>> +		} else if (of_device_is_compatible(np, "arm,mpam-memory")) {
>> +			memory = of_parse_phandle(np, "arm,mpam-device", 0);
>> +			if (!memory) {
>> +				pr_err("Failed to read phandle\n");
>> +				break;
>> +			}
>> +			type = MPAM_CLASS_MEMORY;
>> +		} else if (of_device_is_compatible(np, "arm,mpam-memory-controller-msc")) {
>> +			memory = of_node_get(np->parent);
>> +			type = MPAM_CLASS_MEMORY;
>>  		} else {
>> -			/* For now, only caches are supported */
>> -			cache = NULL;
>> +			/*
>> +			 * For now, only caches and memory controllers are
>> +			 * supported.
>> +			 */
>>  			break;
>>  		}
> There is no need for "{}" here.
Sure, but it's more than one line, and all the previous parts of this else-if tree have
them. Keeping them here makes it much easier to read.
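A small illustration of the style point, using a made-up `classify()` helper (hypothetical, not code from the patch): the final arm holds only a comment and one statement, yet keeping its braces matches the other arms of the chain.

```c
#include <string.h>

enum kind { KIND_UNKNOWN, KIND_CACHE, KIND_MEMORY };

/* Hypothetical helper: map a compatible string to a kind. */
static enum kind classify(const char *compat)
{
	enum kind k;

	if (!strcmp(compat, "cache")) {
		k = KIND_CACHE;
	} else if (!strcmp(compat, "memory")) {
		k = KIND_MEMORY;
	} else {
		/*
		 * Only a comment and one statement, but the braces keep
		 * this arm visually consistent with the ones above.
		 */
		k = KIND_UNKNOWN;
	}

	return k;
}
```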
Thanks,
James
 
 
- * [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (44 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 11/33] arm_mpam: Add support for memory controller MSC on DT platforms James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-29 12:41   ` Ben Horgan
  2025-09-09  7:30   ` Shaopeng Tan (Fujitsu)
  2025-08-22 15:30 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
                   ` (21 subsequent siblings)
  67 siblings, 2 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Ben Horgan
An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.
To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping, a class is a set of components, which
are visible to user-space as there are likely to be multiple instances
of the L2 cache. (e.g. one per cluster or package)
struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
This is to allow hardware implementations where two controls are presented
as different RIS. Re-combining these RIS allows their feature bits to
be or-ed. This structure is not visible outside mpam_devices.c
struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
visible as each L2 cache may be composed of individual slices which need
to be configured the same as the hardware is not able to distribute the
configuration.
Add support for creating and destroying these structures.
A gfp is passed as the structures may need creating when a new RIS entry
is discovered when probing the MSC.
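The nesting described above can be sketched in user-space C as a find-or-create walk from class to component to vMSC, with the RIS attached at the bottom. This is only an illustration under simplifying assumptions: all names are hypothetical, a class is keyed by level alone (the driver also keys on type), and locking, SRCU, affinity and teardown are left out.

```c
#include <stdlib.h>

/* Simplified stand-ins for the driver's structures (singly linked lists). */
struct ris { int ris_idx; struct ris *next; };
struct vmsc { int msc_id; struct ris *ris; struct vmsc *next; };
struct component { int comp_id; struct vmsc *vmsc; struct component *next; };
struct res_class { int level; struct component *components; struct res_class *next; };

static struct res_class *classes;

/* Find the class for this level, creating it on first use. */
static struct res_class *class_get(int level)
{
	struct res_class *cls;

	for (cls = classes; cls; cls = cls->next)
		if (cls->level == level)
			return cls;

	cls = calloc(1, sizeof(*cls));
	if (!cls)
		return NULL;
	cls->level = level;
	cls->next = classes;
	classes = cls;
	return cls;
}

/* Find the component (e.g. one L2 instance) within a class, or create it. */
static struct component *component_get(struct res_class *cls, int comp_id)
{
	struct component *comp;

	for (comp = cls->components; comp; comp = comp->next)
		if (comp->comp_id == comp_id)
			return comp;

	comp = calloc(1, sizeof(*comp));
	if (!comp)
		return NULL;
	comp->comp_id = comp_id;
	comp->next = cls->components;
	cls->components = comp;
	return comp;
}

/* Find the vMSC that groups this MSC's RIS within a component, or create it. */
static struct vmsc *vmsc_get(struct component *comp, int msc_id)
{
	struct vmsc *v;

	for (v = comp->vmsc; v; v = v->next)
		if (v->msc_id == msc_id)
			return v;

	v = calloc(1, sizeof(*v));
	if (!v)
		return NULL;
	v->msc_id = msc_id;
	v->next = comp->vmsc;
	comp->vmsc = v;
	return v;
}

/* Attach a RIS under class -> component -> vMSC. Returns 0, or -1 on failure. */
static int ris_create(int msc_id, int ris_idx, int level, int comp_id)
{
	struct res_class *cls = class_get(level);
	struct component *comp = cls ? component_get(cls, comp_id) : NULL;
	struct vmsc *v = comp ? vmsc_get(comp, msc_id) : NULL;
	struct ris *r;

	if (!v)
		return -1;

	r = calloc(1, sizeof(*r));
	if (!r)
		return -1;
	r->ris_idx = ris_idx;
	r->next = v->ris;
	v->ris = r;
	return 0;
}
```

Two RIS from the same MSC that target the same cache share one vMSC, a second MSC (slice) of that cache adds another vMSC to the same component, and a second L2 becomes a new component in the same class.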
CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * removed a pr_err() debug message that crept in.
---
 drivers/resctrl/mpam_devices.c  | 488 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  91 ++++++
 include/linux/arm_mpam.h        |   8 +-
 3 files changed, 574 insertions(+), 13 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 71a1fb1a9c75..5baf2a8786fb 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -20,7 +20,6 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
-#include <linux/srcu.h>
 #include <linux/types.h>
 
 #include <acpi/pcc.h>
@@ -35,11 +34,483 @@
 static DEFINE_MUTEX(mpam_list_lock);
 static LIST_HEAD(mpam_all_msc);
 
-static struct srcu_struct mpam_srcu;
+struct srcu_struct mpam_srcu;
 
 /* MPAM isn't available until all the MSC have been probed. */
 static u32 mpam_num_msc;
 
+/*
+ * An MSC is a physical container for controls and monitors, each identified by
+ * their RIS index. These share a base-address, interrupts and some MMIO
+ * registers. A vMSC is a virtual container for RIS in an MSC that control or
+ * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
+ * not all RIS in an MSC share a vMSC.
+ * Components are a group of vMSC that control or monitor the same thing but
+ * are from different MSC, so have different base-address, interrupts etc.
+ * Classes are the set of components of the same type.
+ *
+ * The features of a vMSC are the union of those of the RIS it contains.
+ * The features of a Class and Component are the common subset of the vMSC
+ * they contain.
+ *
+ * e.g. The system cache may have bandwidth controls on multiple interfaces,
+ * for regulating traffic from devices independently of traffic from CPUs.
+ * If these are two RIS in one MSC, they will be treated as controlling
+ * different things, and will not share a vMSC/component/class.
+ *
+ * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
+ * for bandwidth. These two RIS are members of the same vMSC.
+ *
+ * e.g. The set of RIS that make up the L2 are grouped as a component. These
+ * are sometimes termed slices. They should be configured the same, as if there
+ * were only one.
+ *
+ * e.g. The SoC probably has more than one L2, each attached to a distinct set
+ * of CPUs. All the L2 components are grouped as a class.
+ *
+ * When creating an MSC, struct mpam_msc is added to the mpam_all_msc list,
+ * then linked via struct mpam_ris to a vmsc, component and class.
+ * The same MSC may exist under different class->component->vmsc paths, but the
+ * RIS index will be unique.
+ */
+LIST_HEAD(mpam_classes);
+
+/* List of all objects that can be free()d after synchronize_srcu() */
+static LLIST_HEAD(mpam_garbage);
+
+#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
+
+static struct mpam_vmsc *
+mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	vmsc = kzalloc(sizeof(*vmsc), gfp);
+	if (!vmsc)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(vmsc);
+
+	INIT_LIST_HEAD_RCU(&vmsc->ris);
+	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
+	vmsc->comp = comp;
+	vmsc->msc = msc;
+
+	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
+
+	return vmsc;
+}
+
+static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
+				       struct mpam_msc *msc, bool alloc,
+				       gfp_t gfp)
+{
+	struct mpam_vmsc *vmsc;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (vmsc->msc->id == msc->id)
+			return vmsc;
+	}
+
+	if (!alloc)
+		return ERR_PTR(-ENOENT);
+
+	return mpam_vmsc_alloc(comp, msc, gfp);
+}
+
+static struct mpam_component *
+mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	comp = kzalloc(sizeof(*comp), gfp);
+	if (!comp)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(comp);
+
+	comp->comp_id = id;
+	INIT_LIST_HEAD_RCU(&comp->vmsc);
+	/* affinity is updated when ris are added */
+	INIT_LIST_HEAD_RCU(&comp->class_list);
+	comp->class = class;
+
+	list_add_rcu(&comp->class_list, &class->components);
+
+	return comp;
+}
+
+static struct mpam_component *
+mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
+{
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(comp, &class->components, class_list) {
+		if (comp->comp_id == id)
+			return comp;
+	}
+
+	if (!alloc)
+		return ERR_PTR(-ENOENT);
+
+	return mpam_component_alloc(class, id, gfp);
+}
+
+static struct mpam_class *
+mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
+{
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	class = kzalloc(sizeof(*class), gfp);
+	if (!class)
+		return ERR_PTR(-ENOMEM);
+	init_garbage(class);
+
+	INIT_LIST_HEAD_RCU(&class->components);
+	/* affinity is updated when ris are added */
+	class->level = level_idx;
+	class->type = type;
+	INIT_LIST_HEAD_RCU(&class->classes_list);
+
+	list_add_rcu(&class->classes_list, &mpam_classes);
+
+	return class;
+}
+
+static struct mpam_class *
+mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
+{
+	bool found = false;
+	struct mpam_class *class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		if (class->type == type && class->level == level_idx) {
+			found = true;
+			break;
+		}
+	}
+
+	if (found)
+		return class;
+
+	if (!alloc)
+		return ERR_PTR(-ENOENT);
+
+	return mpam_class_alloc(level_idx, type, gfp);
+}
+
+#define add_to_garbage(x)				\
+do {							\
+	__typeof__(x) _x = x;				\
+	(_x)->garbage.to_free = (_x);			\
+	llist_add(&(_x)->garbage.llist, &mpam_garbage);	\
+} while (0)
+
+static void mpam_class_destroy(struct mpam_class *class)
+{
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&class->classes_list);
+	add_to_garbage(class);
+}
+
+static void mpam_comp_destroy(struct mpam_component *comp)
+{
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&comp->class_list);
+	add_to_garbage(comp);
+
+	if (list_empty(&class->components))
+		mpam_class_destroy(class);
+}
+
+static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
+{
+	struct mpam_component *comp = vmsc->comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&vmsc->comp_list);
+	add_to_garbage(vmsc);
+
+	if (list_empty(&comp->vmsc))
+		mpam_comp_destroy(comp);
+}
+
+static void mpam_ris_destroy(struct mpam_msc_ris *ris)
+{
+	struct mpam_vmsc *vmsc = ris->vmsc;
+	struct mpam_msc *msc = vmsc->msc;
+	struct platform_device *pdev = msc->pdev;
+	struct mpam_component *comp = vmsc->comp;
+	struct mpam_class *class = comp->class;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
+	clear_bit(ris->ris_idx, msc->ris_idxs);
+	list_del_rcu(&ris->vmsc_list);
+	list_del_rcu(&ris->msc_list);
+	add_to_garbage(ris);
+	ris->garbage.pdev = pdev;
+
+	if (list_empty(&vmsc->ris))
+		mpam_vmsc_destroy(vmsc);
+}
+
+/*
+ * There are two ways of reaching a struct mpam_msc_ris. Via the
+ * class->component->vmsc->ris, or via the msc.
+ * When destroying the msc, the other side needs unlinking and cleaning up too.
+ */
+static void mpam_msc_destroy(struct mpam_msc *msc)
+{
+	struct platform_device *pdev = msc->pdev;
+	struct mpam_msc_ris *ris, *tmp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_del_rcu(&msc->glbl_list);
+	platform_set_drvdata(pdev, NULL);
+
+	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
+		mpam_ris_destroy(ris);
+
+	add_to_garbage(msc);
+	msc->garbage.pdev = pdev;
+}
+
+static void mpam_free_garbage(void)
+{
+	struct mpam_garbage *iter, *tmp;
+	struct llist_node *to_free = llist_del_all(&mpam_garbage);
+
+	if (!to_free)
+		return;
+
+	synchronize_srcu(&mpam_srcu);
+
+	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
+		if (iter->pdev)
+			devm_kfree(&iter->pdev->dev, iter->to_free);
+		else
+			kfree(iter->to_free);
+	}
+}
+
+/* Called recursively to walk the list of caches from a particular CPU */
+static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
+					     unsigned long cache_id,
+					     u32 cache_level,
+					     cpumask_t *affinity)
+{
+	int err;
+	u32 iter_level;
+	unsigned long iter_cache_id;
+	struct device_node *iter_node __free(device_node) = of_find_next_cache_node(cache_node);
+
+	if (!iter_node)
+		return;
+
+	err = of_property_read_u32(iter_node, "cache-level", &iter_level);
+	if (err)
+		return;
+
+	/*
+	 * get_cpu_cacheinfo_id() isn't ready until sometime
+	 * during device_initcall(). Use cache_of_calculate_id().
+	 */
+	iter_cache_id = cache_of_calculate_id(iter_node);
+	if (iter_cache_id == ~0UL)
+		return;
+
+	if (iter_level == cache_level && iter_cache_id == cache_id)
+		cpumask_set_cpu(cpu, affinity);
+
+	__mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
+					 affinity);
+}
+
+/*
+ * The cacheinfo structures are only populated when CPUs are online.
+ * This helper walks the device tree to include offline CPUs too.
+ */
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity)
+{
+	int cpu;
+
+	if (!acpi_disabled)
+		return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
+
+	for_each_possible_cpu(cpu) {
+		struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
+		if (!cpu_node) {
+			pr_err("Failed to find cpu%d device node\n", cpu);
+			return -ENOENT;
+		}
+
+		__mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
+						 cache_level, affinity);
+			continue;
+	}
+
+	return 0;
+}
+
+/*
+ * cpumask_of_node() only knows about online CPUs. This can't tell us whether
+ * a class is represented on all possible CPUs.
+ */
+static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (node_id == cpu_to_node(cpu))
+			cpumask_set_cpu(cpu, affinity);
+	}
+}
+
+static int get_cpumask_from_cache(struct device_node *cache,
+				  cpumask_t *affinity)
+{
+	int err;
+	u32 cache_level;
+	unsigned long cache_id;
+
+	err = of_property_read_u32(cache, "cache-level", &cache_level);
+	if (err) {
+		pr_err("Failed to read cache-level from cache node\n");
+		return -ENOENT;
+	}
+
+	cache_id = cache_of_calculate_id(cache);
+	if (cache_id == ~0UL) {
+		pr_err("Failed to calculate cache-id from cache node\n");
+		return -ENOENT;
+	}
+
+	return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
+}
+
+static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
+				 enum mpam_class_types type,
+				 struct mpam_class *class,
+				 struct mpam_component *comp)
+{
+	int err;
+
+	switch (type) {
+	case MPAM_CLASS_CACHE:
+		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
+						     affinity);
+		if (err)
+			return err;
+
+		if (cpumask_empty(affinity))
+			pr_warn_once("%s no CPUs associated with cache node\n",
+				     dev_name(&msc->pdev->dev));
+
+		break;
+	case MPAM_CLASS_MEMORY:
+		get_cpumask_from_node_id(comp->comp_id, affinity);
+		/* affinity may be empty for CPU-less memory nodes */
+		break;
+	case MPAM_CLASS_UNKNOWN:
+		return 0;
+	}
+
+	cpumask_and(affinity, affinity, &msc->accessibility);
+
+	return 0;
+}
+
+static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
+				  enum mpam_class_types type, u8 class_id,
+				  int component_id, gfp_t gfp)
+{
+	int err;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (test_and_set_bit(ris_idx, msc->ris_idxs))
+		return -EBUSY;
+
+	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
+	if (!ris)
+		return -ENOMEM;
+	init_garbage(ris);
+
+	class = mpam_class_get(class_id, type, true, gfp);
+	if (IS_ERR(class))
+		return PTR_ERR(class);
+
+	comp = mpam_component_get(class, component_id, true, gfp);
+	if (IS_ERR(comp)) {
+		if (list_empty(&class->components))
+			mpam_class_destroy(class);
+		return PTR_ERR(comp);
+	}
+
+	vmsc = mpam_vmsc_get(comp, msc, true, gfp);
+	if (IS_ERR(vmsc)) {
+		if (list_empty(&comp->vmsc))
+			mpam_comp_destroy(comp);
+		return PTR_ERR(vmsc);
+	}
+
+	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
+	if (err) {
+		if (list_empty(&vmsc->ris))
+			mpam_vmsc_destroy(vmsc);
+		return err;
+	}
+
+	ris->ris_idx = ris_idx;
+	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
+	ris->vmsc = vmsc;
+
+	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
+	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
+	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+
+	return 0;
+}
+
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id)
+{
+	int err;
+
+	mutex_lock(&mpam_list_lock);
+	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
+				     component_id, GFP_KERNEL);
+	mutex_unlock(&mpam_list_lock);
+	if (err)
+		mpam_free_garbage();
+
+	return err;
+}
+
 static void mpam_discovery_complete(void)
 {
 	pr_err("Discovered all MSC\n");
@@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
 		cpumask_copy(&msc->accessibility, cpu_possible_mask);
 		err = 0;
 	} else {
-		if (of_device_is_compatible(parent, "memory")) {
+		if (of_device_is_compatible(parent, "cache")) {
+			err = get_cpumask_from_cache(parent,
+						     &msc->accessibility);
+		} else if (of_device_is_compatible(parent, "memory")) {
 			cpumask_copy(&msc->accessibility, cpu_possible_mask);
 			err = 0;
 		} else {
@@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
 
 	mutex_lock(&mpam_list_lock);
 	mpam_num_msc--;
-	platform_set_drvdata(pdev, NULL);
-	list_del_rcu(&msc->glbl_list);
-	synchronize_srcu(&mpam_srcu);
-	devm_kfree(&pdev->dev, msc);
+	mpam_msc_destroy(msc);
 	mutex_unlock(&mpam_list_lock);
+
+	mpam_free_garbage();
 }
 
 static int mpam_msc_drv_probe(struct platform_device *pdev)
@@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 			err = -ENOMEM;
 			break;
 		}
+		init_garbage(msc);
 
 		mutex_init(&msc->probe_lock);
 		mutex_init(&msc->part_sel_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 07e0f240eaca..d49bb884b433 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -7,10 +7,27 @@
 #include <linux/arm_mpam.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
 #include <linux/resctrl.h>
 #include <linux/sizes.h>
+#include <linux/srcu.h>
+
+/*
+ * Structures protected by SRCU may not be freed for a surprising amount of
+ * time (especially if perf is running). To ensure the MPAM error interrupt can
+ * tear down all the structures, build a list of objects that can be garbage
+ * collected once synchronize_srcu() has returned.
+ * If pdev is non-NULL, use devm_kfree().
+ */
+struct mpam_garbage {
+	/* member of mpam_garbage */
+	struct llist_node	llist;
+
+	void			*to_free;
+	struct platform_device	*pdev;
+};
 
 struct mpam_msc {
 	/* member of mpam_all_msc */
@@ -57,6 +74,80 @@ struct mpam_msc {
 
 	void __iomem		*mapped_hwpage;
 	size_t			mapped_hwpage_sz;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_class {
+	/* mpam_components in this class */
+	struct list_head	components;
+
+	cpumask_t		affinity;
+
+	u8			level;
+	enum mpam_class_types	type;
+
+	/* member of mpam_classes */
+	struct list_head	classes_list;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_component {
+	u32			comp_id;
+
+	/* mpam_vmsc in this component */
+	struct list_head	vmsc;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_class:components */
+	struct list_head	class_list;
+
+	/* parent: */
+	struct mpam_class	*class;
+
+	struct mpam_garbage	garbage;
 };
 
+struct mpam_vmsc {
+	/* member of mpam_component:vmsc_list */
+	struct list_head	comp_list;
+
+	/* mpam_msc_ris in this vmsc */
+	struct list_head	ris;
+
+	/* All RIS in this vMSC are members of this MSC */
+	struct mpam_msc		*msc;
+
+	/* parent: */
+	struct mpam_component	*comp;
+
+	struct mpam_garbage	garbage;
+};
+
+struct mpam_msc_ris {
+	u8			ris_idx;
+
+	cpumask_t		affinity;
+
+	/* member of mpam_vmsc:ris */
+	struct list_head	vmsc_list;
+
+	/* member of mpam_msc:ris */
+	struct list_head	msc_list;
+
+	/* parent: */
+	struct mpam_vmsc	*vmsc;
+
+	struct mpam_garbage	garbage;
+};
+
+/* List of all classes - protected by srcu */
+extern struct srcu_struct mpam_srcu;
+extern struct list_head mpam_classes;
+
+int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
+				   cpumask_t *affinity);
+
 #endif /* MPAM_INTERNAL_H */
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 0edefa6ba019..406a77be68cb 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
 static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 #endif
 
-static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
-				  enum mpam_class_types type, u8 class_id,
-				  int component_id)
-{
-	return -EINVAL;
-}
+int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
+		    enum mpam_class_types type, u8 class_id, int component_id);
 
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.20.1
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-29 12:41   ` Ben Horgan
  2025-09-10 19:32     ` James Morse
  2025-09-09  7:30   ` Shaopeng Tan (Fujitsu)
  1 sibling, 1 reply; 200+ messages in thread
From: Ben Horgan @ 2025-08-29 12:41 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which
> are visible to user-space as there are likely to be multiple instances
> of the L2 cache. (e.g. one per cluster or package)
> 
> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented
> as different RIS. Re-combining these RIS allows their feature bits to
> be or-ed. This structure is not visible outside mpam_devices.c
> 
> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
> visible as each L2 cache may be composed of individual slices which need
> to be configured the same as the hardware is not able to distribute the
> configuration.
> 
> Add support for creating and destroying these structures.
> 
> A gfp is passed as the structures may need creating when a new RIS entry
> is discovered when probing the MSC.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * removed a pr_err() debug message that crept in.
> ---
>  drivers/resctrl/mpam_devices.c  | 488 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  91 ++++++
>  include/linux/arm_mpam.h        |   8 +-
>  3 files changed, 574 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -20,7 +20,6 @@
>  #include <linux/printk.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> -#include <linux/srcu.h>
>  #include <linux/types.h>
>  
>  #include <acpi/pcc.h>
> @@ -35,11 +34,483 @@
>  static DEFINE_MUTEX(mpam_list_lock);
>  static LIST_HEAD(mpam_all_msc);
>  
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
>  
>  /* MPAM isn't available until all the MSC have been probed. */
>  static u32 mpam_num_msc;
>  
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), gfp);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(vmsc);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
> +
> +static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
> +				       struct mpam_msc *msc, bool alloc,
> +				       gfp_t gfp)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (vmsc->msc->id == msc->id)
> +			return vmsc;
> +	}
> +
> +	if (!alloc)
> +		return ERR_PTR(-ENOENT);
> +
> +	return mpam_vmsc_alloc(comp, msc, gfp);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	comp = kzalloc(sizeof(*comp), gfp);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(comp);
> +
> +	comp->comp_id = id;
> +	INIT_LIST_HEAD_RCU(&comp->vmsc);
> +	/* affinity is updated when ris are added */
> +	INIT_LIST_HEAD_RCU(&comp->class_list);
> +	comp->class = class;
> +
> +	list_add_rcu(&comp->class_list, &class->components);
> +
> +	return comp;
> +}
> +
> +static struct mpam_component *
> +mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		if (comp->comp_id == id)
> +			return comp;
> +	}
> +
> +	if (!alloc)
> +		return ERR_PTR(-ENOENT);
> +
> +	return mpam_component_alloc(class, id, gfp);
> +}
> +
> +static struct mpam_class *
> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
> +{
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	class = kzalloc(sizeof(*class), gfp);
> +	if (!class)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(class);
> +
> +	INIT_LIST_HEAD_RCU(&class->components);
> +	/* affinity is updated when ris are added */
> +	class->level = level_idx;
> +	class->type = type;
> +	INIT_LIST_HEAD_RCU(&class->classes_list);
> +
> +	list_add_rcu(&class->classes_list, &mpam_classes);
> +
> +	return class;
> +}
> +
> +static struct mpam_class *
> +mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
> +{
> +	bool found = false;
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		if (class->type == type && class->level == level_idx) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (found)
> +		return class;
> +
> +	if (!alloc)
> +		return ERR_PTR(-ENOENT);
> +
> +	return mpam_class_alloc(level_idx, type, gfp);
> +}
> +
> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = x;				\
> +	(_x)->garbage.to_free = (_x);			\
> +	llist_add(&(_x)->garbage.llist, &mpam_garbage);	\
> +} while (0)
> +
> +static void mpam_class_destroy(struct mpam_class *class)
> +{
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&class->classes_list);
> +	add_to_garbage(class);
> +}
> +
> +static void mpam_comp_destroy(struct mpam_component *comp)
> +{
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&comp->class_list);
> +	add_to_garbage(comp);
> +
> +	if (list_empty(&class->components))
> +		mpam_class_destroy(class);
> +}
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_comp_destroy(comp);
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
> +	add_to_garbage(ris);
> +	ris->garbage.pdev = pdev;
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
> +
> +/*
> + * There are two ways of reaching a struct mpam_msc_ris. Via the
> + * class->component->vmsc->ris, or via the msc.
> + * When destroying the msc, the other side needs unlinking and cleaning up too.
> + */
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_msc_ris *ris, *tmp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&msc->glbl_list);
> +	platform_set_drvdata(pdev, NULL);
> +
> +	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> +		mpam_ris_destroy(ris);
> +
> +	add_to_garbage(msc);
> +	msc->garbage.pdev = pdev;
> +}
> +
> +static void mpam_free_garbage(void)
> +{
> +	struct mpam_garbage *iter, *tmp;
> +	struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +
> +	if (!to_free)
> +		return;
> +
> +	synchronize_srcu(&mpam_srcu);
> +
> +	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> +		if (iter->pdev)
> +			devm_kfree(&iter->pdev->dev, iter->to_free);
> +		else
> +			kfree(iter->to_free);
> +	}
> +}
> +
> +/* Called recursively to walk the list of caches from a particular CPU */
> +static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
> +					     unsigned long cache_id,
> +					     u32 cache_level,
> +					     cpumask_t *affinity)
> +{
> +	int err;
> +	u32 iter_level;
> +	unsigned long iter_cache_id;
> +	struct device_node *iter_node __free(device_node) = of_find_next_cache_node(cache_node);
> +
> +	if (!iter_node)
> +		return;
> +
> +	err = of_property_read_u32(iter_node, "cache-level", &iter_level);
> +	if (err)
> +		return;
> +
> +	/*
> +	 * get_cpu_cacheinfo_id() isn't ready until sometime
> +	 * during device_initcall(). Use cache_of_calculate_id().
> +	 */
> +	iter_cache_id = cache_of_calculate_id(iter_node);
> +	if (cache_id == ~0UL)
> +		return;
> +
> +	if (iter_level == cache_level && iter_cache_id == cache_id)
> +		cpumask_set_cpu(cpu, affinity);
> +
> +	__mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
> +					 affinity);
> +}
> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + * This helper walks the device tree to include offline CPUs too.
> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	if (!acpi_disabled)
> +		return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +
> +	for_each_possible_cpu(cpu) {
> +		struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
> +		if (!cpu_node) {
> +			pr_err("Failed to find cpu%d device node\n", cpu);
> +			return -ENOENT;
> +		}
> +
> +		__mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
> +						 cache_level, affinity);
> +			continue;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
> + * a class is represented on all possible CPUs.
> + */
> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (node_id == cpu_to_node(cpu))
> +			cpumask_set_cpu(cpu, affinity);
> +	}
> +}
> +
> +static int get_cpumask_from_cache(struct device_node *cache,
> +				  cpumask_t *affinity)
> +{
> +	int err;
> +	u32 cache_level;
> +	unsigned long cache_id;
> +
> +	err = of_property_read_u32(cache, "cache-level", &cache_level);
> +	if (err) {
> +		pr_err("Failed to read cache-level from cache node\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_id = cache_of_calculate_id(cache);
> +	if (cache_id == ~0UL) {
> +		pr_err("Failed to calculate cache-id from cache node\n");
> +		return -ENOENT;
> +	}
> +
> +	return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
> +}
> +
> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
> +				 enum mpam_class_types type,
> +				 struct mpam_class *class,
> +				 struct mpam_component *comp)
> +{
> +	int err;
> +
> +	switch (type) {
> +	case MPAM_CLASS_CACHE:
> +		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
> +						     affinity);
> +		if (err)
> +			return err;
> +
> +		if (cpumask_empty(affinity))
> +			pr_warn_once("%s no CPUs associated with cache node",
> +				     dev_name(&msc->pdev->dev));
> +
> +		break;
> +	case MPAM_CLASS_MEMORY:
> +		get_cpumask_from_node_id(comp->comp_id, affinity);
> +		/* affinity may be empty for CPU-less memory nodes */
> +		break;
> +	case MPAM_CLASS_UNKNOWN:
> +		return 0;
> +	}
> +
> +	cpumask_and(affinity, affinity, &msc->accessibility);
> +
> +	return 0;
> +}
> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id, gfp_t gfp)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (test_and_set_bit(ris_idx, msc->ris_idxs))
> +		return -EBUSY;
> +
> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(ris);
> +
> +	class = mpam_class_get(class_id, type, true, gfp);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);
> +
> +	comp = mpam_component_get(class, component_id, true, gfp);
> +	if (IS_ERR(comp)) {
> +		if (list_empty(&class->components))
> +			mpam_class_destroy(class);
> +		return PTR_ERR(comp);
> +	}
> +
> +	vmsc = mpam_vmsc_get(comp, msc, true, gfp);
> +	if (IS_ERR(vmsc)) {
> +		if (list_empty(&comp->vmsc))
> +			mpam_comp_destroy(comp);
> +		return PTR_ERR(vmsc);
> +	}
> +
> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> +	if (err) {
> +		if (list_empty(&vmsc->ris))
> +			mpam_vmsc_destroy(vmsc);
> +		return err;
> +	}
> +
> +	ris->ris_idx = ris_idx;
> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> +	ris->vmsc = vmsc;
> +
> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +
> +	return 0;
> +}
> +
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id)
> +{
> +	int err;
> +
> +	mutex_lock(&mpam_list_lock);
> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> +				     component_id, GFP_KERNEL);
> +	mutex_unlock(&mpam_list_lock);
> +	if (err)
> +		mpam_free_garbage();
> +
> +	return err;
> +}
> +
>  static void mpam_discovery_complete(void)
>  {
>  	pr_err("Discovered all MSC\n");
> @@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
>  		cpumask_copy(&msc->accessibility, cpu_possible_mask);
>  		err = 0;
>  	} else {
> -		if (of_device_is_compatible(parent, "memory")) {
> +		if (of_device_is_compatible(parent, "cache")) {
> +			err = get_cpumask_from_cache(parent,
> +						     &msc->accessibility);
> +		} else if (of_device_is_compatible(parent, "memory")) {
The determination of the accessibility for the h/w msc doesn't fit with
the subject of this patch. Could this hunk and the supporting functions
be split into a precursor patch?
>  			cpumask_copy(&msc->accessibility, cpu_possible_mask);
>  			err = 0;
>  		} else {
> @@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>  
>  	mutex_lock(&mpam_list_lock);
>  	mpam_num_msc--;
> -	platform_set_drvdata(pdev, NULL);
> -	list_del_rcu(&msc->glbl_list);
> -	synchronize_srcu(&mpam_srcu);
> -	devm_kfree(&pdev->dev, msc);
> +	mpam_msc_destroy(msc);
>  	mutex_unlock(&mpam_list_lock);
> +
> +	mpam_free_garbage();
>  }
>  
>  static int mpam_msc_drv_probe(struct platform_device *pdev)
> @@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  			err = -ENOMEM;
>  			break;
>  		}
> +		init_garbage(msc);
>  
>  		mutex_init(&msc->probe_lock);
>  		mutex_init(&msc->part_sel_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 07e0f240eaca..d49bb884b433 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -7,10 +7,27 @@
>  #include <linux/arm_mpam.h>
>  #include <linux/cpumask.h>
>  #include <linux/io.h>
> +#include <linux/llist.h>
>  #include <linux/mailbox_client.h>
>  #include <linux/mutex.h>
>  #include <linux/resctrl.h>
>  #include <linux/sizes.h>
> +#include <linux/srcu.h>
> +
> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be gargbage
nit: s/gargbage/garbage/
> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> +	/* member of mpam_garbage */
> +	struct llist_node	llist;
> +
> +	void			*to_free;
> +	struct platform_device	*pdev;
> +};
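For anyone reading along, the deferred-free scheme described in the comment above can be sketched in userspace roughly as follows. This is only an illustration under assumptions, not the driver's code: `struct garbage`, `struct widget`, `garbage_add()`, `garbage_free_all()` and `wait_for_readers()` are hypothetical names, a plain singly linked list stands in for the kernel's llist, and `wait_for_readers()` is a stub marking where synchronize_srcu() would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct mpam_garbage. */
struct garbage {
	struct garbage *next;	/* link in the pending list */
	void *to_free;		/* start of the enclosing allocation */
};

/* Example object embedding its own garbage node, as the patch's structs do. */
struct widget {
	int id;
	struct garbage garbage;
};

static struct garbage *pending;	/* objects unlinked but not yet freed */
static int grace_periods;	/* counts calls to the stub below */

/* Stub: the driver would call synchronize_srcu() at this point. */
static void wait_for_readers(void)
{
	grace_periods++;
}

/* Queue an already-unlinked object; nothing is freed yet. */
static void garbage_add(struct garbage *g, void *obj)
{
	assert(g && obj);
	g->to_free = obj;
	g->next = pending;
	pending = g;
}

/* Detach the whole list, wait once, then free everything; returns the count. */
static int garbage_free_all(void)
{
	struct garbage *iter = pending, *tmp;
	int freed = 0;

	if (!iter)
		return 0;
	pending = NULL;

	wait_for_readers();

	while (iter) {
		tmp = iter->next;	/* read before the node's memory goes away */
		free(iter->to_free);
		freed++;
		iter = tmp;
	}
	return freed;
}
```

The point of the sketch is that any number of queued objects cost a single wait, and that an empty list skips the wait entirely.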
>  
>  struct mpam_msc {
>  	/* member of mpam_all_msc */
> @@ -57,6 +74,80 @@ struct mpam_msc {
>  
>  	void __iomem		*mapped_hwpage;
>  	size_t			mapped_hwpage_sz;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_class {
> +	/* mpam_components in this class */
> +	struct list_head	components;
> +
> +	cpumask_t		affinity;
> +
> +	u8			level;
> +	enum mpam_class_types	type;
> +
> +	/* member of mpam_classes */
> +	struct list_head	classes_list;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_component {
> +	u32			comp_id;
> +
> +	/* mpam_vmsc in this component */
> +	struct list_head	vmsc;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_class:components */
> +	struct list_head	class_list;
> +
> +	/* parent: */
> +	struct mpam_class	*class;
> +
> +	struct mpam_garbage	garbage;
>  };
>  
> +struct mpam_vmsc {
> +	/* member of mpam_component:vmsc_list */
> +	struct list_head	comp_list;
> +
> +	/* mpam_msc_ris in this vmsc */
> +	struct list_head	ris;
> +
> +	/* All RIS in this vMSC are members of this MSC */
> +	struct mpam_msc		*msc;
> +
> +	/* parent: */
> +	struct mpam_component	*comp;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_msc_ris {
> +	u8			ris_idx;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_vmsc:ris */
> +	struct list_head	vmsc_list;
> +
> +	/* member of mpam_msc:ris */
> +	struct list_head	msc_list;
> +
> +	/* parent: */
> +	struct mpam_vmsc	*vmsc;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +/* List of all classes - protected by srcu*/
> +extern struct srcu_struct mpam_srcu;
> +extern struct list_head mpam_classes;
> +
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity);
> +
>  #endif /* MPAM_INTERNAL_H */
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 0edefa6ba019..406a77be68cb 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
>  static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>  #endif
>  
> -static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> -				  enum mpam_class_types type, u8 class_id,
> -				  int component_id)
> -{
> -	return -EINVAL;
> -}
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id);
>  
>  #endif /* __LINUX_ARM_MPAM_H */
Thanks,
Ben
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-29 12:41   ` Ben Horgan
@ 2025-09-10 19:32     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:32 UTC (permalink / raw)
  To: Ben Horgan, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Ben,
On 29/08/2025 13:41, Ben Horgan wrote:
> On 8/22/25 16:30, James Morse wrote:
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
>> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
>> This is to allow hardware implementations where two controls are presented
>> as different RIS. Re-combining these RIS allows their feature bits to
>> be or-ed. This structure is not visible outside mpam_devices.c
>>
>> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
>> visible as each L2 cache may be composed of individual slices which need
>> to be configured the same as the hardware is not able to distribute the
>> configuration.
>>
>> Add support for creating and destroying these structures.
>>
>> A gfp is passed as the structures may need creating when a new RIS entry
>> is discovered when probing the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 71a1fb1a9c75..5baf2a8786fb 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
>>  		cpumask_copy(&msc->accessibility, cpu_possible_mask);
>>  		err = 0;
>>  	} else {
>> -		if (of_device_is_compatible(parent, "memory")) {
>> +		if (of_device_is_compatible(parent, "cache")) {
>> +			err = get_cpumask_from_cache(parent,
>> +						     &msc->accessibility);
>> +		} else if (of_device_is_compatible(parent, "memory")) {
> The determination of the accessibility for the h/w msc doesn't fit with
> the subject of this patch. Could this hunk and the supporting functions
> be split into a precursor patch?
I've moved this bit into the previous patches.
Thanks,
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
 
- * RE: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
  2025-08-29 12:41   ` Ben Horgan
@ 2025-09-09  7:30   ` Shaopeng Tan (Fujitsu)
  2025-09-10 19:32     ` James Morse
  1 sibling, 1 reply; 200+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2025-09-09  7:30 UTC (permalink / raw)
  To: 'James Morse', linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan
Hello James,
> An MSC is a container of resources, each identified by their RIS index.
> Some RIS are described by firmware to provide their position in the system.
> Others are discovered when the driver probes the hardware.
> 
> To configure a resource it needs to be found by its class, e.g. 'L2'.
> There are two kinds of grouping, a class is a set of components, which are
> visible to user-space as there are likely to be multiple instances of the L2 cache.
> (e.g. one per cluster or package)
> 
> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
> This is to allow hardware implementations where two controls are presented as
> different RIS. Re-combining these RIS allows their feature bits to be or-ed. This
> structure is not visible outside mpam_devices.c
> 
> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
> visible as each L2 cache may be composed of individual slices which need to
> be configured the same as the hardware is not able to distribute the
> configuration.
> 
> Add support for creating and destroying these structures.
> 
> A gfp is passed as the structures may need creating when a new RIS entry is
> discovered when probing the MSC.
> 
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * removed a pr_err() debug message that crept in.
> ---
>  drivers/resctrl/mpam_devices.c  | 488 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |  91 ++++++
>  include/linux/arm_mpam.h        |   8 +-
>  3 files changed, 574 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 71a1fb1a9c75..5baf2a8786fb 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -20,7 +20,6 @@
>  #include <linux/printk.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
> -#include <linux/srcu.h>
>  #include <linux/types.h>
> 
>  #include <acpi/pcc.h>
> @@ -35,11 +34,483 @@
>  static DEFINE_MUTEX(mpam_list_lock);
>  static LIST_HEAD(mpam_all_msc);
> 
> -static struct srcu_struct mpam_srcu;
> +struct srcu_struct mpam_srcu;
> 
>  /* MPAM isn't available until all the MSC have been probed. */
>  static u32 mpam_num_msc;
> 
> +/*
> + * An MSC is a physical container for controls and monitors, each identified by
> + * their RIS index. These share a base-address, interrupts and some MMIO
> + * registers. A vMSC is a virtual container for RIS in an MSC that control or
> + * monitor the same thing. Members of a vMSC are all RIS in the same MSC, but
> + * not all RIS in an MSC share a vMSC.
> + * Components are a group of vMSC that control or monitor the same thing but
> + * are from different MSC, so have different base-address, interrupts etc.
> + * Classes are the set components of the same type.
> + *
> + * The features of a vMSC is the union of the RIS it contains.
> + * The features of a Class and Component are the common subset of the vMSC
> + * they contain.
> + *
> + * e.g. The system cache may have bandwidth controls on multiple interfaces,
> + * for regulating traffic from devices independently of traffic from CPUs.
> + * If these are two RIS in one MSC, they will be treated as controlling
> + * different things, and will not share a vMSC/component/class.
> + *
> + * e.g. The L2 may have one MSC and two RIS, one for cache-controls another
> + * for bandwidth. These two RIS are members of the same vMSC.
> + *
> + * e.g. The set of RIS that make up the L2 are grouped as a component. These
> + * are sometimes termed slices. They should be configured the same, as if there
> + * were only one.
> + *
> + * e.g. The SoC probably has more than one L2, each attached to a distinct set
> + * of CPUs. All the L2 components are grouped as a class.
> + *
> + * When creating an MSC, struct mpam_msc is added to the all mpam_all_msc list,
> + * then linked via struct mpam_ris to a vmsc, component and class.
> + * The same MSC may exist under different class->component->vmsc paths, but the
> + * RIS index will be unique.
> + */
> +LIST_HEAD(mpam_classes);
> +
> +/* List of all objects that can be free()d after synchronise_srcu() */
> +static LLIST_HEAD(mpam_garbage);
> +
> +#define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
> +
> +static struct mpam_vmsc *
> +mpam_vmsc_alloc(struct mpam_component *comp, struct mpam_msc *msc, gfp_t gfp)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	vmsc = kzalloc(sizeof(*vmsc), gfp);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(vmsc);
> +
> +	INIT_LIST_HEAD_RCU(&vmsc->ris);
> +	INIT_LIST_HEAD_RCU(&vmsc->comp_list);
> +	vmsc->comp = comp;
> +	vmsc->msc = msc;
> +
> +	list_add_rcu(&vmsc->comp_list, &comp->vmsc);
> +
> +	return vmsc;
> +}
> +
> +static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component *comp,
> +				       struct mpam_msc *msc, bool alloc,
> +				       gfp_t gfp)
> +{
> +	struct mpam_vmsc *vmsc;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		if (vmsc->msc->id == msc->id)
> +			return vmsc;
> +	}
> +
> +	if (!alloc)
> +		return ERR_PTR(-ENOENT);
As far as I can see, !alloc is always false here, since every caller in this patch passes alloc == true.
If the check is needed, why not do it at the beginning of the function?
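As a point of reference for the ordering question, here is a minimal userspace sketch of a lookup-then-allocate helper. It is an illustration under assumptions rather than the driver's code: `struct item`, `item_find()`, `item_alloc()` and `item_get()` are hypothetical names, and NULL stands in for the ERR_PTR() returns. With the `alloc` test placed after the list walk, a caller passing false can still find an existing entry; with the test at the top, such a call would fail unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry on a parent's list. */
struct item {
	int id;
	struct item *next;
};

static struct item *items;	/* head of the list */

/* Walk the list; NULL when no entry carries this id. */
static struct item *item_find(int id)
{
	struct item *it;

	for (it = items; it; it = it->next) {
		if (it->id == id)
			return it;
	}
	return NULL;
}

/* Allocate and link a new entry; NULL on allocation failure. */
static struct item *item_alloc(int id)
{
	struct item *it;

	assert(!item_find(id));	/* callers are expected to look up first */

	it = calloc(1, sizeof(*it));
	if (!it)
		return NULL;
	it->id = id;
	it->next = items;
	items = it;
	return it;
}

/*
 * Look up first, and only then consult 'alloc'. A caller passing
 * alloc == false gets NULL only when the id is absent.
 */
static struct item *item_get(int id, bool alloc)
{
	struct item *it = item_find(id);

	if (it)
		return it;

	if (!alloc)
		return NULL;

	return item_alloc(id);
}
```

Whether any caller ever needs the pure-lookup mode is a separate question; if none does, dropping the parameter would be the simpler change.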
> +	return mpam_vmsc_alloc(comp, msc, gfp);
> +}
> +
> +static struct mpam_component *
> +mpam_component_alloc(struct mpam_class *class, int id, gfp_t gfp)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	comp = kzalloc(sizeof(*comp), gfp);
> +	if (!comp)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(comp);
> +
> +	comp->comp_id = id;
> +	INIT_LIST_HEAD_RCU(&comp->vmsc);
> +	/* affinity is updated when ris are added */
> +	INIT_LIST_HEAD_RCU(&comp->class_list);
> +	comp->class = class;
> +
> +	list_add_rcu(&comp->class_list, &class->components);
> +
> +	return comp;
> +}
> +
> +static struct mpam_component *
> +mpam_component_get(struct mpam_class *class, int id, bool alloc, gfp_t gfp)
> +{
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(comp, &class->components, class_list) {
> +		if (comp->comp_id == id)
> +			return comp;
> +	}
> +
> +	if (!alloc)
> +		return ERR_PTR(-ENOENT);
Same here.
> +	return mpam_component_alloc(class, id, gfp);
> +}
> +
> +static struct mpam_class *
> +mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
> +{
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	class = kzalloc(sizeof(*class), gfp);
> +	if (!class)
> +		return ERR_PTR(-ENOMEM);
> +	init_garbage(class);
> +
> +	INIT_LIST_HEAD_RCU(&class->components);
> +	/* affinity is updated when ris are added */
> +	class->level = level_idx;
> +	class->type = type;
> +	INIT_LIST_HEAD_RCU(&class->classes_list);
> +
> +	list_add_rcu(&class->classes_list, &mpam_classes);
> +
> +	return class;
> +}
> +
> +static struct mpam_class *
> +mpam_class_get(u8 level_idx, enum mpam_class_types type, bool alloc, gfp_t gfp)
> +{
> +	bool found = false;
> +	struct mpam_class *class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_for_each_entry(class, &mpam_classes, classes_list) {
> +		if (class->type == type && class->level == level_idx) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (found)
> +		return class;
> +
> +	if (!alloc)
> +		return ERR_PTR(-ENOENT);
Same here.
Best regards,
Shaopeng TAN
> +	return mpam_class_alloc(level_idx, type, gfp);
> +}
> +
> +#define add_to_garbage(x)				\
> +do {							\
> +	__typeof__(x) _x = x;				\
> +	(_x)->garbage.to_free = (_x);			\
> +	llist_add(&(_x)->garbage.llist, &mpam_garbage);	\
> +} while (0)
> +
> +static void mpam_class_destroy(struct mpam_class *class)
> +{
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&class->classes_list);
> +	add_to_garbage(class);
> +}
> +
> +static void mpam_comp_destroy(struct mpam_component *comp)
> +{
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&comp->class_list);
> +	add_to_garbage(comp);
> +
> +	if (list_empty(&class->components))
> +		mpam_class_destroy(class);
> +}
> +
> +static void mpam_vmsc_destroy(struct mpam_vmsc *vmsc)
> +{
> +	struct mpam_component *comp = vmsc->comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&vmsc->comp_list);
> +	add_to_garbage(vmsc);
> +
> +	if (list_empty(&comp->vmsc))
> +		mpam_comp_destroy(comp);
> +}
> +
> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
> +{
> +	struct mpam_vmsc *vmsc = ris->vmsc;
> +	struct mpam_msc *msc = vmsc->msc;
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_component *comp = vmsc->comp;
> +	struct mpam_class *class = comp->class;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
> +	clear_bit(ris->ris_idx, msc->ris_idxs);
> +	list_del_rcu(&ris->vmsc_list);
> +	list_del_rcu(&ris->msc_list);
> +	add_to_garbage(ris);
> +	ris->garbage.pdev = pdev;
> +
> +	if (list_empty(&vmsc->ris))
> +		mpam_vmsc_destroy(vmsc);
> +}
> +
> +/*
> + * There are two ways of reaching a struct mpam_msc_ris. Via the
> + * class->component->vmsc->ris, or via the msc.
> + * When destroying the msc, the other side needs unlinking and cleaning up too.
> + */
> +static void mpam_msc_destroy(struct mpam_msc *msc)
> +{
> +	struct platform_device *pdev = msc->pdev;
> +	struct mpam_msc_ris *ris, *tmp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	list_del_rcu(&msc->glbl_list);
> +	platform_set_drvdata(pdev, NULL);
> +
> +	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
> +		mpam_ris_destroy(ris);
> +
> +	add_to_garbage(msc);
> +	msc->garbage.pdev = pdev;
> +}
> +
> +static void mpam_free_garbage(void)
> +{
> +	struct mpam_garbage *iter, *tmp;
> +	struct llist_node *to_free = llist_del_all(&mpam_garbage);
> +
> +	if (!to_free)
> +		return;
> +
> +	synchronize_srcu(&mpam_srcu);
> +
> +	llist_for_each_entry_safe(iter, tmp, to_free, llist) {
> +		if (iter->pdev)
> +			devm_kfree(&iter->pdev->dev, iter->to_free);
> +		else
> +			kfree(iter->to_free);
> +	}
> +}
> +
> +/* Called recursively to walk the list of caches from a particular CPU */
> +static void __mpam_get_cpumask_from_cache_id(int cpu, struct device_node *cache_node,
> +					     unsigned long cache_id,
> +					     u32 cache_level,
> +					     cpumask_t *affinity)
> +{
> +	int err;
> +	u32 iter_level;
> +	unsigned long iter_cache_id;
> +	struct device_node *iter_node __free(device_node) =
> +				of_find_next_cache_node(cache_node);
> +
> +	if (!iter_node)
> +		return;
> +
> +	err = of_property_read_u32(iter_node, "cache-level", &iter_level);
> +	if (err)
> +		return;
> +
> +	/*
> +	 * get_cpu_cacheinfo_id() isn't ready until sometime
> +	 * during device_initcall(). Use cache_of_calculate_id().
> +	 */
> +	iter_cache_id = cache_of_calculate_id(iter_node);
> +	if (iter_cache_id == ~0UL)
> +		return;
> +
> +	if (iter_level == cache_level && iter_cache_id == cache_id)
> +		cpumask_set_cpu(cpu, affinity);
> +
> +	__mpam_get_cpumask_from_cache_id(cpu, iter_node, cache_id, cache_level,
> +					 affinity);
> +}
> +
> +/*
> + * The cacheinfo structures are only populated when CPUs are online.
> + * This helper walks the device tree to include offline CPUs too.
> + */
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	if (!acpi_disabled)
> +		return acpi_pptt_get_cpumask_from_cache_id(cache_id, affinity);
> +
> +	for_each_possible_cpu(cpu) {
> +		struct device_node *cpu_node __free(device_node) = of_get_cpu_node(cpu, NULL);
> +		if (!cpu_node) {
> +			pr_err("Failed to find cpu%d device node\n", cpu);
> +			return -ENOENT;
> +		}
> +
> +		__mpam_get_cpumask_from_cache_id(cpu, cpu_node, cache_id,
> +						 cache_level, affinity);
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * cpumask_of_node() only knows about online CPUs. This can't tell us whether
> + * a class is represented on all possible CPUs.
> + */
> +static void get_cpumask_from_node_id(u32 node_id, cpumask_t *affinity)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		if (node_id == cpu_to_node(cpu))
> +			cpumask_set_cpu(cpu, affinity);
> +	}
> +}
> +
> +static int get_cpumask_from_cache(struct device_node *cache,
> +				  cpumask_t *affinity)
> +{
> +	int err;
> +	u32 cache_level;
> +	unsigned long cache_id;
> +
> +	err = of_property_read_u32(cache, "cache-level", &cache_level);
> +	if (err) {
> +		pr_err("Failed to read cache-level from cache node\n");
> +		return -ENOENT;
> +	}
> +
> +	cache_id = cache_of_calculate_id(cache);
> +	if (cache_id == ~0UL) {
> +		pr_err("Failed to calculate cache-id from cache node\n");
> +		return -ENOENT;
> +	}
> +
> +	return mpam_get_cpumask_from_cache_id(cache_id, cache_level, affinity);
> +}
> +
> +static int mpam_ris_get_affinity(struct mpam_msc *msc, cpumask_t *affinity,
> +				 enum mpam_class_types type,
> +				 struct mpam_class *class,
> +				 struct mpam_component *comp)
> +{
> +	int err;
> +
> +	switch (type) {
> +	case MPAM_CLASS_CACHE:
> +		err = mpam_get_cpumask_from_cache_id(comp->comp_id, class->level,
> +						     affinity);
> +		if (err)
> +			return err;
> +
> +		if (cpumask_empty(affinity))
> +			pr_warn_once("%s no CPUs associated with cache node",
> +				     dev_name(&msc->pdev->dev));
> +
> +		break;
> +	case MPAM_CLASS_MEMORY:
> +		get_cpumask_from_node_id(comp->comp_id, affinity);
> +		/* affinity may be empty for CPU-less memory nodes */
> +		break;
> +	case MPAM_CLASS_UNKNOWN:
> +		return 0;
> +	}
> +
> +	cpumask_and(affinity, affinity, &msc->accessibility);
> +
> +	return 0;
> +}
> +
> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
> +				  enum mpam_class_types type, u8 class_id,
> +				  int component_id, gfp_t gfp)
> +{
> +	int err;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +	struct mpam_class *class;
> +	struct mpam_component *comp;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
> +	if (test_and_set_bit(ris_idx, msc->ris_idxs))
> +		return -EBUSY;
> +
> +	ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), gfp);
> +	if (!ris)
> +		return -ENOMEM;
> +	init_garbage(ris);
> +
> +	class = mpam_class_get(class_id, type, true, gfp);
> +	if (IS_ERR(class))
> +		return PTR_ERR(class);
> +
> +	comp = mpam_component_get(class, component_id, true, gfp);
> +	if (IS_ERR(comp)) {
> +		if (list_empty(&class->components))
> +			mpam_class_destroy(class);
> +		return PTR_ERR(comp);
> +	}
> +
> +	vmsc = mpam_vmsc_get(comp, msc, true, gfp);
> +	if (IS_ERR(vmsc)) {
> +		if (list_empty(&comp->vmsc))
> +			mpam_comp_destroy(comp);
> +		return PTR_ERR(vmsc);
> +	}
> +
> +	err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
> +	if (err) {
> +		if (list_empty(&vmsc->ris))
> +			mpam_vmsc_destroy(vmsc);
> +		return err;
> +	}
> +
> +	ris->ris_idx = ris_idx;
> +	INIT_LIST_HEAD_RCU(&ris->vmsc_list);
> +	ris->vmsc = vmsc;
> +
> +	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
> +	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
> +	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
> +
> +	return 0;
> +}
> +
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id)
> +{
> +	int err;
> +
> +	mutex_lock(&mpam_list_lock);
> +	err = mpam_ris_create_locked(msc, ris_idx, type, class_id,
> +				     component_id, GFP_KERNEL);
> +	mutex_unlock(&mpam_list_lock);
> +	if (err)
> +		mpam_free_garbage();
> +
> +	return err;
> +}
> +
>  static void mpam_discovery_complete(void)
>  {
>  	pr_err("Discovered all MSC\n");
> @@ -179,7 +650,10 @@ static int update_msc_accessibility(struct mpam_msc *msc)
>  		cpumask_copy(&msc->accessibility, cpu_possible_mask);
>  		err = 0;
>  	} else {
> -		if (of_device_is_compatible(parent, "memory")) {
> +		if (of_device_is_compatible(parent, "cache")) {
> +			err = get_cpumask_from_cache(parent,
> +						     &msc->accessibility);
> +		} else if (of_device_is_compatible(parent, "memory")) {
>  			cpumask_copy(&msc->accessibility, cpu_possible_mask);
>  			err = 0;
>  		} else {
> @@ -209,11 +683,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
> 
>  	mutex_lock(&mpam_list_lock);
>  	mpam_num_msc--;
> -	platform_set_drvdata(pdev, NULL);
> -	list_del_rcu(&msc->glbl_list);
> -	synchronize_srcu(&mpam_srcu);
> -	devm_kfree(&pdev->dev, msc);
> +	mpam_msc_destroy(msc);
>  	mutex_unlock(&mpam_list_lock);
> +
> +	mpam_free_garbage();
>  }
> 
>  static int mpam_msc_drv_probe(struct platform_device *pdev)
> @@ -230,6 +703,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  			err = -ENOMEM;
>  			break;
>  		}
> +		init_garbage(msc);
> 
>  		mutex_init(&msc->probe_lock);
>  		mutex_init(&msc->part_sel_lock);
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 07e0f240eaca..d49bb884b433 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -7,10 +7,27 @@
>  #include <linux/arm_mpam.h>
>  #include <linux/cpumask.h>
>  #include <linux/io.h>
> +#include <linux/llist.h>
>  #include <linux/mailbox_client.h>
>  #include <linux/mutex.h>
>  #include <linux/resctrl.h>
>  #include <linux/sizes.h>
> +#include <linux/srcu.h>
> +
> +/*
> + * Structures protected by SRCU may not be freed for a surprising amount of
> + * time (especially if perf is running). To ensure the MPAM error interrupt can
> + * tear down all the structures, build a list of objects that can be garbage
> + * collected once synchronize_srcu() has returned.
> + * If pdev is non-NULL, use devm_kfree().
> + */
> +struct mpam_garbage {
> +	/* member of mpam_garbage */
> +	struct llist_node	llist;
> +
> +	void			*to_free;
> +	struct platform_device	*pdev;
> +};
> 
>  struct mpam_msc {
>  	/* member of mpam_all_msc */
> @@ -57,6 +74,80 @@ struct mpam_msc {
> 
>  	void __iomem		*mapped_hwpage;
>  	size_t			mapped_hwpage_sz;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_class {
> +	/* mpam_components in this class */
> +	struct list_head	components;
> +
> +	cpumask_t		affinity;
> +
> +	u8			level;
> +	enum mpam_class_types	type;
> +
> +	/* member of mpam_classes */
> +	struct list_head	classes_list;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_component {
> +	u32			comp_id;
> +
> +	/* mpam_vmsc in this component */
> +	struct list_head	vmsc;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_class:components */
> +	struct list_head	class_list;
> +
> +	/* parent: */
> +	struct mpam_class	*class;
> +
> +	struct mpam_garbage	garbage;
>  };
> 
> +struct mpam_vmsc {
> +	/* member of mpam_component:vmsc_list */
> +	struct list_head	comp_list;
> +
> +	/* mpam_msc_ris in this vmsc */
> +	struct list_head	ris;
> +
> +	/* All RIS in this vMSC are members of this MSC */
> +	struct mpam_msc		*msc;
> +
> +	/* parent: */
> +	struct mpam_component	*comp;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +struct mpam_msc_ris {
> +	u8			ris_idx;
> +
> +	cpumask_t		affinity;
> +
> +	/* member of mpam_vmsc:ris */
> +	struct list_head	vmsc_list;
> +
> +	/* member of mpam_msc:ris */
> +	struct list_head	msc_list;
> +
> +	/* parent: */
> +	struct mpam_vmsc	*vmsc;
> +
> +	struct mpam_garbage	garbage;
> +};
> +
> +/* List of all classes - protected by srcu */
> +extern struct srcu_struct mpam_srcu;
> +extern struct list_head mpam_classes;
> +
> +int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
> +				   cpumask_t *affinity);
> +
>  #endif /* MPAM_INTERNAL_H */
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 0edefa6ba019..406a77be68cb 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -36,11 +36,7 @@ static inline int acpi_mpam_parse_resources(struct mpam_msc *msc,
>  static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
>  #endif
> 
> -static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> -				  enum mpam_class_types type, u8 class_id,
> -				  int component_id)
> -{
> -	return -EINVAL;
> -}
> +int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> +		    enum mpam_class_types type, u8 class_id, int component_id);
> 
>  #endif /* __LINUX_ARM_MPAM_H */
> --
> 2.20.1
- * Re: [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described
  2025-09-09  7:30   ` Shaopeng Tan (Fujitsu)
@ 2025-09-10 19:32     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:32 UTC (permalink / raw)
  To: Shaopeng Tan (Fujitsu), linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-acpi@vger.kernel.org,
	devicetree@vger.kernel.org
  Cc: shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
	carl@os.amperecomputing.com, lcherian@marvell.com,
	bobo.shaobowang@huawei.com, baolin.wang@linux.alibaba.com,
	Jamie Iles, Xin Hao, peternewman@google.com,
	dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni,
	fenghuay@nvidia.com, baisheng.gao@unisoc.com, Jonathan Cameron,
	Rob Herring, Rohit Mathew, Rafael Wysocki, Len Brown,
	Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, Krzysztof Kozlowski,
	Conor Dooley, Catalin Marinas, Will Deacon, Greg Kroah-Hartman,
	Danilo Krummrich, Ben Horgan
Hi Shaopeng,
On 09/09/2025 08:30, Shaopeng Tan (Fujitsu) wrote:
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which are
>> visible to user-space as there are likely to be multiple instances of the L2 cache.
>> (e.g. one per cluster or package)
>>
>> struct mpam_components are a set of struct mpam_vmsc. A vMSC groups the
>> RIS in an MSC that control the same logical piece of hardware. (e.g. L2).
>> This is to allow hardware implementations where two controls are presented as
>> different RIS. Re-combining these RIS allows their feature bits to be or-ed. This
>> structure is not visible outside mpam_devices.c
>>
>> struct mpam_vmsc are then a set of struct mpam_msc_ris, which are not
>> visible as each L2 cache may be composed of individual slices which need to
>> be configured the same as the hardware is not able to distribute the
>> configuration.
>>
>> Add support for creating and destroying these structures.
>>
>> A gfp is passed as the structures may need creating when a new RIS entry is
>> discovered when probing the MSC.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 71a1fb1a9c75..5baf2a8786fb 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -35,11 +34,483 @@
>> +static struct mpam_vmsc *mpam_vmsc_get(struct mpam_component
>> *comp,
>> +				       struct mpam_msc *msc, bool alloc,
>> +				       gfp_t gfp)
>> +{
>> +	struct mpam_vmsc *vmsc;
>> +
>> +	lockdep_assert_held(&mpam_list_lock);
>> +
>> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
>> +		if (vmsc->msc->id == msc->id)
>> +			return vmsc;
>> +	}
>> +
>> +	if (!alloc)
>> +		return ERR_PTR(-ENOENT);
> 
> It seems like it is always false here.
> If necessary, why not do this at the beginning of the function?
Because the vMSC may already exist, in which case the 'get' function should return it
regardless of whether the caller wants to allocate it if it's missing.
I'd anticipated callers like resctrl would want to grab components by things like
cache-id, without allocating them by accident. But that code ended up just searching the
lists instead. I'll rip this out.
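For anyone following along, the ordering being discussed can be sketched in
userspace C. This is only an illustration, not the driver code: struct comp,
comp_get() and the err out-parameter are hypothetical stand-ins (userspace has
no ERR_PTR()), and a plain singly linked list replaces the RCU list.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mpam_component; not the driver's type. */
struct comp {
	int id;
	struct comp *next;
};

/*
 * Search first, so an existing entry is returned whatever 'alloc' says.
 * Only a failed search consults 'alloc'. Errors are reported through *err
 * because userspace has no ERR_PTR().
 */
static struct comp *comp_get(struct comp **head, int id, bool alloc, int *err)
{
	struct comp *c;

	*err = 0;
	for (c = *head; c; c = c->next) {
		if (c->id == id)
			return c;
	}

	if (!alloc) {
		*err = -ENOENT;
		return NULL;
	}

	c = calloc(1, sizeof(*c));
	if (!c) {
		*err = -ENOMEM;
		return NULL;
	}
	c->id = id;
	c->next = *head;
	*head = c;
	return c;
}
```

Moving the alloc test to the top of the function would make a lookup-only
caller fail even when the entry is already on the list, which is why the test
sits after the search.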
Thanks,
James
 
 
- * [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (45 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 12/33] arm_mpam: Add the class and component structures for ris firmware described James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
                   ` (20 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.
Add the definitions for these registers as offsets within the page(s).
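
As a rough illustration of how such field definitions are consumed, the
userspace sketch below decodes a made-up MPAMF_IDR value. GENMASK64(), BIT64()
and field_get() are hypothetical stand-ins written for this example; in the
kernel the equivalent job is done by GENMASK()/BIT() and FIELD_GET() from
<linux/bitfield.h>. The field positions are copied from this patch.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's GENMASK()/BIT(), 64 bits wide. */
#define GENMASK64(h, l)	(((~UINT64_C(0)) >> (63 - (h))) & ((~UINT64_C(0)) << (l)))
#define BIT64(n)	(UINT64_C(1) << (n))

/* Field positions copied from the MPAMF_IDR definitions in this patch. */
#define MPAMF_IDR_PARTID_MAX	GENMASK64(15, 0)
#define MPAMF_IDR_PMG_MAX	GENMASK64(23, 16)
#define MPAMF_IDR_HAS_RIS	BIT64(32)
#define MPAMF_IDR_RIS_MAX	GENMASK64(59, 56)

/*
 * Extract a field: mask the register, then divide by the mask's lowest
 * set bit, which is the same as shifting the field down to bit 0.
 */
static uint64_t field_get(uint64_t mask, uint64_t reg)
{
	return (reg & mask) / (mask & (~mask + 1));
}
```

With a register value read from offset MPAMF_IDR, field_get(MPAMF_IDR_PARTID_MAX, idr)
would give the largest PARTID and field_get(MPAMF_IDR_HAS_RIS, idr) would say
whether MPAMF_IDR_RIS_MAX is meaningful.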
Link: https://developer.arm.com/documentation/ihi0099/latest/
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Renamed MSMON_CFG_MBWU_CTL_TYPE_CSU as MSMON_CFG_CSU_CTL_TYPE_CSU
 * Whitespace churn.
 * Cite a more recent document.
 * Removed some stale feature, fixed some names etc.
---
 drivers/resctrl/mpam_internal.h | 266 ++++++++++++++++++++++++++++++++
 1 file changed, 266 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d49bb884b433..6e0982a1a9ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -150,4 +150,270 @@ extern struct list_head mpam_classes;
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
+/*
+ * MPAM MSCs have the following register layout. See:
+ * Arm Memory System Resource Partitioning and Monitoring (MPAM) System
+ * Component Specification.
+ * https://developer.arm.com/documentation/ihi0099/latest/
+ */
+#define MPAM_ARCHITECTURE_V1    0x10
+
+/* Memory mapped control pages: */
+/* ID Register offsets in the memory mapped page */
+#define MPAMF_IDR		0x0000  /* features id register */
+#define MPAMF_MSMON_IDR		0x0080  /* performance monitoring features */
+#define MPAMF_IMPL_IDR		0x0028  /* imp-def partitioning */
+#define MPAMF_CPOR_IDR		0x0030  /* cache-portion partitioning */
+#define MPAMF_CCAP_IDR		0x0038  /* cache-capacity partitioning */
+#define MPAMF_MBW_IDR		0x0040  /* mem-bw partitioning */
+#define MPAMF_PRI_IDR		0x0048  /* priority partitioning */
+#define MPAMF_CSUMON_IDR	0x0088  /* cache-usage monitor */
+#define MPAMF_MBWUMON_IDR	0x0090  /* mem-bw usage monitor */
+#define MPAMF_PARTID_NRW_IDR	0x0050  /* partid-narrowing */
+#define MPAMF_IIDR		0x0018  /* implementer id register */
+#define MPAMF_AIDR		0x0020  /* architectural id register */
+
+/* Configuration and Status Register offsets in the memory mapped page */
+#define MPAMCFG_PART_SEL	0x0100  /* partid to configure: */
+#define MPAMCFG_CPBM		0x1000  /* cache-portion config */
+#define MPAMCFG_CMAX		0x0108  /* cache-capacity config */
+#define MPAMCFG_CMIN		0x0110  /* cache-capacity config */
+#define MPAMCFG_MBW_MIN		0x0200  /* min mem-bw config */
+#define MPAMCFG_MBW_MAX		0x0208  /* max mem-bw config */
+#define MPAMCFG_MBW_WINWD	0x0220  /* mem-bw accounting window config */
+#define MPAMCFG_MBW_PBM		0x2000  /* mem-bw portion bitmap config */
+#define MPAMCFG_PRI		0x0400  /* priority partitioning config */
+#define MPAMCFG_MBW_PROP	0x0500  /* mem-bw stride config */
+#define MPAMCFG_INTPARTID	0x0600  /* partid-narrowing config */
+
+#define MSMON_CFG_MON_SEL	0x0800  /* monitor selector */
+#define MSMON_CFG_CSU_FLT	0x0810  /* cache-usage monitor filter */
+#define MSMON_CFG_CSU_CTL	0x0818  /* cache-usage monitor config */
+#define MSMON_CFG_MBWU_FLT	0x0820  /* mem-bw monitor filter */
+#define MSMON_CFG_MBWU_CTL	0x0828  /* mem-bw monitor config */
+#define MSMON_CSU		0x0840  /* current cache-usage */
+#define MSMON_CSU_CAPTURE	0x0848  /* last cache-usage value captured */
+#define MSMON_MBWU		0x0860  /* current mem-bw usage value */
+#define MSMON_MBWU_CAPTURE	0x0868  /* last mem-bw value captured */
+#define MSMON_MBWU_L		0x0880  /* current long mem-bw usage value */
+#define MSMON_MBWU_CAPTURE_L	0x0890  /* last long mem-bw value captured */
+#define MSMON_CAPT_EVNT		0x0808  /* signal a capture event */
+#define MPAMF_ESR		0x00F8  /* error status register */
+#define MPAMF_ECR		0x00F0  /* error control register */
+
+/* MPAMF_IDR - MPAM features ID register */
+#define MPAMF_IDR_PARTID_MAX		GENMASK(15, 0)
+#define MPAMF_IDR_PMG_MAX		GENMASK(23, 16)
+#define MPAMF_IDR_HAS_CCAP_PART		BIT(24)
+#define MPAMF_IDR_HAS_CPOR_PART		BIT(25)
+#define MPAMF_IDR_HAS_MBW_PART		BIT(26)
+#define MPAMF_IDR_HAS_PRI_PART		BIT(27)
+#define MPAMF_IDR_EXT			BIT(28)
+#define MPAMF_IDR_HAS_IMPL_IDR		BIT(29)
+#define MPAMF_IDR_HAS_MSMON		BIT(30)
+#define MPAMF_IDR_HAS_PARTID_NRW	BIT(31)
+#define MPAMF_IDR_HAS_RIS		BIT(32)
+#define MPAMF_IDR_HAS_EXTD_ESR		BIT(38)
+#define MPAMF_IDR_HAS_ESR		BIT(39)
+#define MPAMF_IDR_RIS_MAX		GENMASK(59, 56)
+
+/* MPAMF_MSMON_IDR - MPAM performance monitoring ID register */
+#define MPAMF_MSMON_IDR_MSMON_CSU		BIT(16)
+#define MPAMF_MSMON_IDR_MSMON_MBWU		BIT(17)
+#define MPAMF_MSMON_IDR_HAS_LOCAL_CAPT_EVNT	BIT(31)
+
+/* MPAMF_CPOR_IDR - MPAM features cache portion partitioning ID register */
+#define MPAMF_CPOR_IDR_CPBM_WD			GENMASK(15, 0)
+
+/* MPAMF_CCAP_IDR - MPAM features cache capacity partitioning ID register */
+#define MPAMF_CCAP_IDR_CMAX_WD			GENMASK(5, 0)
+#define MPAMF_CCAP_IDR_CASSOC_WD		GENMASK(12, 8)
+#define MPAMF_CCAP_IDR_HAS_CASSOC		BIT(28)
+#define MPAMF_CCAP_IDR_HAS_CMIN			BIT(29)
+#define MPAMF_CCAP_IDR_NO_CMAX			BIT(30)
+#define MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM		BIT(31)
+
+/* MPAMF_MBW_IDR - MPAM features memory bandwidth partitioning ID register */
+#define MPAMF_MBW_IDR_BWA_WD		GENMASK(5, 0)
+#define MPAMF_MBW_IDR_HAS_MIN		BIT(10)
+#define MPAMF_MBW_IDR_HAS_MAX		BIT(11)
+#define MPAMF_MBW_IDR_HAS_PBM		BIT(12)
+#define MPAMF_MBW_IDR_HAS_PROP		BIT(13)
+#define MPAMF_MBW_IDR_WINDWR		BIT(14)
+#define MPAMF_MBW_IDR_BWPBM_WD		GENMASK(28, 16)
+
+/* MPAMF_PRI_IDR - MPAM features priority partitioning ID register */
+#define MPAMF_PRI_IDR_HAS_INTPRI	BIT(0)
+#define MPAMF_PRI_IDR_INTPRI_0_IS_LOW	BIT(1)
+#define MPAMF_PRI_IDR_INTPRI_WD		GENMASK(9, 4)
+#define MPAMF_PRI_IDR_HAS_DSPRI		BIT(16)
+#define MPAMF_PRI_IDR_DSPRI_0_IS_LOW	BIT(17)
+#define MPAMF_PRI_IDR_DSPRI_WD		GENMASK(25, 20)
+
+/* MPAMF_CSUMON_IDR - MPAM cache storage usage monitor ID register */
+#define MPAMF_CSUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_CAPT	BIT(24)
+#define MPAMF_CSUMON_IDR_HAS_CEVNT_OFLW	BIT(25)
+#define MPAMF_CSUMON_IDR_HAS_OFSR	BIT(26)
+#define MPAMF_CSUMON_IDR_HAS_OFLOW_LNKG	BIT(27)
+#define MPAMF_CSUMON_IDR_HAS_XCL	BIT(29)
+#define MPAMF_CSUMON_IDR_CSU_RO		BIT(30)
+#define MPAMF_CSUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_MBWUMON_IDR - MPAM memory bandwidth usage monitor ID register */
+#define MPAMF_MBWUMON_IDR_NUM_MON	GENMASK(15, 0)
+#define MPAMF_MBWUMON_IDR_HAS_RWBW	BIT(28)
+#define MPAMF_MBWUMON_IDR_LWD		BIT(29)
+#define MPAMF_MBWUMON_IDR_HAS_LONG	BIT(30)
+#define MPAMF_MBWUMON_IDR_HAS_CAPTURE	BIT(31)
+
+/* MPAMF_PARTID_NRW_IDR - MPAM PARTID narrowing ID register */
+#define MPAMF_PARTID_NRW_IDR_INTPARTID_MAX      GENMASK(15, 0)
+
+/* MPAMF_IIDR - MPAM implementation ID register */
+#define MPAMF_IIDR_PRODUCTID	GENMASK(31, 20)
+#define MPAMF_IIDR_PRODUCTID_SHIFT	20
+#define MPAMF_IIDR_VARIANT	GENMASK(19, 16)
+#define MPAMF_IIDR_VARIANT_SHIFT	16
+#define MPAMF_IIDR_REVISON	GENMASK(15, 12)
+#define MPAMF_IIDR_REVISON_SHIFT	12
+#define MPAMF_IIDR_IMPLEMENTER	GENMASK(11, 0)
+#define MPAMF_IIDR_IMPLEMENTER_SHIFT	0
+
+/* MPAMF_AIDR - MPAM architecture ID register */
+#define MPAMF_AIDR_ARCH_MAJOR_REV	GENMASK(7, 4)
+#define MPAMF_AIDR_ARCH_MINOR_REV	GENMASK(3, 0)
+
+/* MPAMCFG_PART_SEL - MPAM partition configuration selection register */
+#define MPAMCFG_PART_SEL_PARTID_SEL	GENMASK(15, 0)
+#define MPAMCFG_PART_SEL_INTERNAL	BIT(16)
+#define MPAMCFG_PART_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMCFG_CMAX - MPAM cache capacity configuration register */
+#define MPAMCFG_CMAX_SOFTLIM		BIT(31)
+#define MPAMCFG_CMAX_CMAX		GENMASK(15, 0)
+
+/* MPAMCFG_CMIN - MPAM cache capacity configuration register */
+#define MPAMCFG_CMIN_CMIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MIN - MPAM memory minimum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MIN_MIN		GENMASK(15, 0)
+
+/*
+ * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
+ *                   register
+ */
+#define MPAMCFG_MBW_MAX_MAX		GENMASK(15, 0)
+#define MPAMCFG_MBW_MAX_HARDLIM		BIT(31)
+
+/*
+ * MPAMCFG_MBW_WINWD - MPAM memory bandwidth partitioning window width
+ *                     register
+ */
+#define MPAMCFG_MBW_WINWD_US_FRAC	GENMASK(7, 0)
+#define MPAMCFG_MBW_WINWD_US_INT	GENMASK(23, 8)
+
+/* MPAMCFG_PRI - MPAM priority partitioning configuration register */
+#define MPAMCFG_PRI_INTPRI		GENMASK(15, 0)
+#define MPAMCFG_PRI_DSPRI		GENMASK(31, 16)
+
+/*
+ * MPAMCFG_MBW_PROP - Memory bandwidth proportional stride partitioning
+ *                    configuration register
+ */
+#define MPAMCFG_MBW_PROP_STRIDEM1	GENMASK(15, 0)
+#define MPAMCFG_MBW_PROP_EN		BIT(31)
+
+/*
+ * MPAMCFG_INTPARTID - MPAM internal partition narrowing configuration register
+ */
+#define MPAMCFG_INTPARTID_INTPARTID	GENMASK(15, 0)
+#define MPAMCFG_INTPARTID_INTERNAL	BIT(16)
+
+/* MSMON_CFG_MON_SEL - Memory system performance monitor selection register */
+#define MSMON_CFG_MON_SEL_MON_SEL	GENMASK(15, 0)
+#define MSMON_CFG_MON_SEL_RIS		GENMASK(27, 24)
+
+/* MPAMF_ESR - MPAM Error Status Register */
+#define MPAMF_ESR_PARTID_MON	GENMASK(15, 0)
+#define MPAMF_ESR_PMG		GENMASK(23, 16)
+#define MPAMF_ESR_ERRCODE	GENMASK(27, 24)
+#define MPAMF_ESR_OVRWR		BIT(31)
+#define MPAMF_ESR_RIS		GENMASK(35, 32)
+
+/* MPAMF_ECR - MPAM Error Control Register */
+#define MPAMF_ECR_INTEN		BIT(0)
+
+/* Error conditions in accessing memory mapped registers */
+#define MPAM_ERRCODE_NONE			0
+#define MPAM_ERRCODE_PARTID_SEL_RANGE		1
+#define MPAM_ERRCODE_REQ_PARTID_RANGE		2
+#define MPAM_ERRCODE_MSMONCFG_ID_RANGE		3
+#define MPAM_ERRCODE_REQ_PMG_RANGE		4
+#define MPAM_ERRCODE_MONITOR_RANGE		5
+#define MPAM_ERRCODE_INTPARTID_RANGE		6
+#define MPAM_ERRCODE_UNEXPECTED_INTERNAL	7
+
+/*
+ * MSMON_CFG_CSU_FLT - Memory system performance monitor configure cache storage
+ *                    usage monitor filter register
+ */
+#define MSMON_CFG_CSU_FLT_PARTID	GENMASK(15, 0)
+#define MSMON_CFG_CSU_FLT_PMG		GENMASK(23, 16)
+
+/*
+ * MSMON_CFG_CSU_CTL - Memory system performance monitor configure cache storage
+ *                    usage monitor control register
+ * MSMON_CFG_MBWU_CTL - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor control register
+ */
+#define MSMON_CFG_x_CTL_TYPE			GENMASK(7, 0)
+#define MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L	BIT(15)
+#define MSMON_CFG_x_CTL_MATCH_PARTID		BIT(16)
+#define MSMON_CFG_x_CTL_MATCH_PMG		BIT(17)
+#define MSMON_CFG_x_CTL_SCLEN			BIT(19)
+#define MSMON_CFG_x_CTL_SUBTYPE			GENMASK(22, 20)
+#define MSMON_CFG_x_CTL_OFLOW_FRZ		BIT(24)
+#define MSMON_CFG_x_CTL_OFLOW_INTR		BIT(25)
+#define MSMON_CFG_x_CTL_OFLOW_STATUS		BIT(26)
+#define MSMON_CFG_x_CTL_CAPT_RESET		BIT(27)
+#define MSMON_CFG_x_CTL_CAPT_EVNT		GENMASK(30, 28)
+#define MSMON_CFG_x_CTL_EN			BIT(31)
+
+#define MSMON_CFG_MBWU_CTL_TYPE_MBWU			0x42
+#define MSMON_CFG_CSU_CTL_TYPE_CSU			0
+
+/*
+ * MSMON_CFG_MBWU_FLT - Memory system performance monitor configure memory
+ *                     bandwidth usage monitor filter register
+ */
+#define MSMON_CFG_MBWU_FLT_PARTID		GENMASK(15, 0)
+#define MSMON_CFG_MBWU_FLT_PMG			GENMASK(23, 16)
+#define MSMON_CFG_MBWU_FLT_RWBW			GENMASK(31, 30)
+
+/*
+ * MSMON_CSU - Memory system performance monitor cache storage usage monitor
+ *            register
+ * MSMON_CSU_CAPTURE -  Memory system performance monitor cache storage usage
+ *                     capture register
+ * MSMON_MBWU  - Memory system performance monitor memory bandwidth usage
+ *               monitor register
+ * MSMON_MBWU_CAPTURE - Memory system performance monitor memory bandwidth usage
+ *                     capture register
+ */
+#define MSMON___VALUE		GENMASK(30, 0)
+#define MSMON___NRDY		BIT(31)
+#define MSMON___NRDY_L		BIT(63)
+#define MSMON___L_VALUE		GENMASK(43, 0)
+#define MSMON___LWD_VALUE	GENMASK(62, 0)
+
+/*
+ * MSMON_CAPT_EVNT - Memory system performance monitoring capture event
+ *                  generation register
+ */
+#define MSMON_CAPT_EVNT_NOW	BIT(0)
+
 #endif /* MPAM_INTERNAL_H */
-- 
2.20.1
- * [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (46 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 13/33] arm_mpam: Add MPAM MSC register layout definitions James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
                   ` (19 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Lecopzer Chen
Because an MSC can only be accessed from the CPUs in its cpu-affinity
set we need to be running on one of those CPUs to probe the MSC
hardware.
Do this work in the cpuhp callback. Probing the hardware only happens
before MPAM is enabled, so walk all the MSCs and probe those we can
reach that haven't already been probed.
Later once MPAM is enabled, this cpuhp callback will be replaced by
one that avoids the global list.
Enabling a static key will also take the cpuhp lock, so it can't be done
from the cpuhp callback. Whenever a new MSC has been probed, schedule
work to test whether all the MSCs have now been probed.
CC: Lecopzer Chen <lecopzerc@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
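The discovery flow described in the commit message can be modelled in
userspace as a rough sketch. Everything here is hypothetical: the
model_* names, the counters standing in for schedule_work() and
mpam_enable_once(), and the bitmask standing in for the accessibility
cpumask are assumptions made for illustration, not driver code. Unlike
the patch, the model only requests the deferred check when an MSC was
newly probed, and it omits locking and error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_msc {
	unsigned long accessibility;	/* bit n set: CPU n can reach this MSC */
	bool probed;
};

/* Stand-in for schedule_work(): counts how often the check was requested. */
static int enable_work_scheduled;

/* Stand-in for mpam_enable_once(): counts how often MPAM was "enabled". */
static int enable_count;

/* Models the discovery-time online callback for one CPU. */
static void model_discovery_cpu_online(struct model_msc *mscs, size_t n,
				       unsigned int cpu)
{
	bool new_device_probed = false;

	assert(cpu < 8 * sizeof(unsigned long));

	for (size_t i = 0; i < n; i++) {
		if (!(mscs[i].accessibility & (1UL << cpu)))
			continue;	/* not reachable from this CPU */
		if (mscs[i].probed)
			continue;	/* already probed from another CPU */
		mscs[i].probed = true;
		new_device_probed = true;
	}

	/* Defer the "are we done?" check rather than enabling from here. */
	if (new_device_probed)
		enable_work_scheduled++;
}

/* Models the deferred work: enable only when every MSC is probed, once. */
static void model_enable_work(const struct model_msc *mscs, size_t n)
{
	static bool once;

	for (size_t i = 0; i < n; i++)
		if (!mscs[i].probed)
			return;

	if (!once) {
		once = true;
		enable_count++;
	}
}
```

Calling model_discovery_cpu_online() for each CPU as it comes online,
followed by model_enable_work(), enables the model exactly once, and
only after a CPU from every MSC's accessibility mask has been seen.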
 drivers/resctrl/mpam_devices.c  | 144 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   8 +-
 2 files changed, 147 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 5baf2a8786fb..9d6516f98acf 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -4,6 +4,7 @@
 #define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
 
 #include <linux/acpi.h>
+#include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
@@ -21,6 +22,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
+#include <linux/workqueue.h>
 
 #include <acpi/pcc.h>
 
@@ -39,6 +41,16 @@ struct srcu_struct mpam_srcu;
 /* MPAM isn't available until all the MSC have been probed. */
 static u32 mpam_num_msc;
 
+static int mpam_cpuhp_state;
+static DEFINE_MUTEX(mpam_cpuhp_state_lock);
+
+/*
+ * mpam is enabled once all devices have been probed from CPU online callbacks,
+ * scheduled via this work_struct. If access to an MSC depends on a CPU that
+ * was not brought online at boot, this can happen surprisingly late.
+ */
+static DECLARE_WORK(mpam_enable_work, &mpam_enable);
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -78,6 +90,22 @@ LIST_HEAD(mpam_classes);
 /* List of all objects that can be free()d after synchronise_srcu() */
 static LLIST_HEAD(mpam_garbage);
 
+static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
+{
+	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	return readl_relaxed(msc->mapped_hwpage + reg);
+}
+
+static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	return __mpam_read_reg(msc, reg);
+}
+
+#define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
+
 #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
 
 static struct mpam_vmsc *
@@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
-static void mpam_discovery_complete(void)
+static int mpam_msc_hw_probe(struct mpam_msc *msc)
+{
+	u64 idr;
+	int err;
+
+	lockdep_assert_held(&msc->probe_lock);
+
+	mutex_lock(&msc->part_sel_lock);
+	idr = mpam_read_partsel_reg(msc, AIDR);
+	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
+		pr_err_once("%s does not match MPAM architecture v1.x\n",
+			    dev_name(&msc->pdev->dev));
+		err = -EIO;
+	} else {
+		msc->probed = true;
+		err = 0;
+	}
+	mutex_unlock(&msc->part_sel_lock);
+
+	return err;
+}
+
+static int mpam_cpu_online(unsigned int cpu)
 {
-	pr_err("Discovered all MSC\n");
+	return 0;
+}
+
+/* Before mpam is enabled, try to probe new MSC */
+static int mpam_discovery_cpu_online(unsigned int cpu)
+{
+	int err = 0;
+	struct mpam_msc *msc;
+	bool new_device_probed = false;
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			err = mpam_msc_hw_probe(msc);
+		mutex_unlock(&msc->probe_lock);
+
+		if (!err)
+			new_device_probed = true;
+		else
+			break; // mpam_broken
+	}
+	mutex_unlock(&mpam_list_lock);
+
+	if (new_device_probed && !err)
+		schedule_work(&mpam_enable_work);
+
+	return err;
+}
+
+static int mpam_cpu_offline(unsigned int cpu)
+{
+	return 0;
+}
+
+static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
+					  int (*offline)(unsigned int offline))
+{
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+
+	mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
+					     online, offline);
+	if (mpam_cpuhp_state <= 0) {
+		pr_err("Failed to register cpuhp callbacks\n");
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
 static int mpam_dt_count_msc(void)
@@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 	}
 
 	if (!err && fw_num_msc == mpam_num_msc)
-		mpam_discovery_complete();
+		mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
 
 	if (err && msc)
 		mpam_msc_drv_remove(pdev);
@@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+static void mpam_enable_once(void)
+{
+	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
+
+	pr_info("MPAM enabled\n");
+}
+
+/*
+ * Enable mpam once all devices have been probed.
+ * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
+ * Also scheduled when new devices are probed when new CPUs come online.
+ */
+void mpam_enable(struct work_struct *work)
+{
+	static atomic_t once;
+	struct mpam_msc *msc;
+	bool all_devices_probed = true;
+
+	/* Have we probed all the hw devices? */
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
+		mutex_lock(&msc->probe_lock);
+		if (!msc->probed)
+			all_devices_probed = false;
+		mutex_unlock(&msc->probe_lock);
+
+		if (!all_devices_probed)
+			break;
+	}
+	mutex_unlock(&mpam_list_lock);
+
+	if (all_devices_probed && !atomic_fetch_inc(&once))
+		mpam_enable_once();
+}
+
 /*
  * MSC that are hidden under caches are not created as platform devices
  * as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 6e0982a1a9ac..a98cca08a2ef 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -49,6 +49,7 @@ struct mpam_msc {
 	 * properties become read-only and the lists are protected by SRCU.
 	 */
 	struct mutex		probe_lock;
+	bool			probed;
 	unsigned long		ris_idxs[128 / BITS_PER_LONG];
 	u32			ris_max;
 
@@ -59,14 +60,14 @@ struct mpam_msc {
 	 * part_sel_lock protects access to the MSC hardware registers that are
 	 * affected by MPAMCFG_PART_SEL. (including the ID registers that vary
 	 * by RIS).
-	 * If needed, take msc->lock first.
+	 * If needed, take msc->probe_lock first.
 	 */
 	struct mutex		part_sel_lock;
 
 	/*
 	 * mon_sel_lock protects access to the MSC hardware registers that are
 	 * affeted by MPAMCFG_MON_SEL.
-	 * If needed, take msc->lock first.
+	 * If needed, take msc->probe_lock first.
 	 */
 	struct mutex		outer_mon_sel_lock;
 	raw_spinlock_t		inner_mon_sel_lock;
@@ -147,6 +148,9 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* Scheduled work callback to enable mpam once all MSC have been probed */
+void mpam_enable(struct work_struct *work);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.20.1
- * [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (47 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
                   ` (18 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.
While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
Signed-off-by: James Morse <james.morse@arm.com>
---
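The way the system-wide limits narrow can be shown with a small
userspace model. This is only a sketch under assumptions: the model_*
names and the -1 error value are hypothetical, locking is omitted, and
the assert that a requestor has registered before any MSC is probed
reflects the initcall ordering noted in the patch rather than a check
the driver performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state; mirrors the roles of the patch's variables. */
struct model_limits {
	uint16_t partid_max;
	uint8_t pmg_max;
	bool init;		/* a first requestor has registered */
	bool published;		/* limits are final and may not shrink */
};

static uint16_t min_u16(uint16_t a, uint16_t b) { return a < b ? a : b; }
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }

/* Returns 0 on success, -1 if a late requestor needs smaller limits. */
static int model_register_requestor(struct model_limits *l,
				    uint16_t partid_max, uint8_t pmg_max)
{
	if (!l->init) {
		/* First requestor seeds the limits. */
		l->partid_max = partid_max;
		l->pmg_max = pmg_max;
		l->init = true;
	} else if (!l->published) {
		/* Before publication, every requestor can lower them. */
		l->partid_max = min_u16(l->partid_max, partid_max);
		l->pmg_max = min_u8(l->pmg_max, pmg_max);
	} else if (partid_max < l->partid_max || pmg_max < l->pmg_max) {
		/* Too late: the larger values may already be in use. */
		return -1;
	}
	return 0;
}

/* A probed MSC (minimum over its RIS) narrows the limits as well. */
static void model_msc_probed(struct model_limits *l,
			     uint16_t msc_partid_max, uint8_t msc_pmg_max)
{
	/* Assumption: CPUs registered first, and nothing is published yet. */
	assert(l->init && !l->published);
	l->partid_max = min_u16(l->partid_max, msc_partid_max);
	l->pmg_max = min_u8(l->pmg_max, msc_pmg_max);
}
```

With CPUs registering 255/3 and two MSCs reporting 63/1 and 127/7, the
model settles on 63/1. Once published is set, a requestor that can only
generate PARTIDs up to 31 is refused, while one that supports 63/1 or
more is accepted without changing the limits.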
 drivers/resctrl/mpam_devices.c  | 158 ++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |   6 ++
 include/linux/arm_mpam.h        |  14 +++
 3 files changed, 171 insertions(+), 7 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9d6516f98acf..012e09e80300 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -6,6 +6,7 @@
 #include <linux/acpi.h>
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
+#include <linux/bitfield.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -44,6 +45,15 @@ static u32 mpam_num_msc;
 static int mpam_cpuhp_state;
 static DEFINE_MUTEX(mpam_cpuhp_state_lock);
 
+/*
+ * The smallest common values for any CPU or MSC in the system.
+ * Generating traffic outside this range will result in screaming interrupts.
+ */
+u16 mpam_partid_max;
+u8 mpam_pmg_max;
+static bool partid_max_init, partid_max_published;
+static DEFINE_SPINLOCK(partid_max_lock);
+
 /*
  * mpam is enabled once all devices have been probed from CPU online callbacks,
  * scheduled via this work_struct. If access to an MSC depends on a CPU that
@@ -106,6 +116,74 @@ static inline u32 _mpam_read_partsel_reg(struct mpam_msc *msc, u16 reg)
 
 #define mpam_read_partsel_reg(msc, reg)        _mpam_read_partsel_reg(msc, MPAMF_##reg)
 
+static void __mpam_write_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	writel_relaxed(val, msc->mapped_hwpage + reg);
+}
+
+static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	lockdep_assert_held_once(&msc->part_sel_lock);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
+
+static u64 mpam_msc_read_idr(struct mpam_msc *msc)
+{
+	u64 idr_high = 0, idr_low;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	idr_low = mpam_read_partsel_reg(msc, IDR);
+	if (FIELD_GET(MPAMF_IDR_EXT, idr_low))
+		idr_high = mpam_read_partsel_reg(msc, IDR + 4);
+
+	return (idr_high << 32) | idr_low;
+}
+
+static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
+{
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	mpam_write_partsel_reg(msc, PART_SEL, partsel);
+}
+
+static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, partid);
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
+int mpam_register_requestor(u16 partid_max, u8 pmg_max)
+{
+	int err = 0;
+
+	lockdep_assert_irqs_enabled();
+
+	spin_lock(&partid_max_lock);
+	if (!partid_max_init) {
+		mpam_partid_max = partid_max;
+		mpam_pmg_max = pmg_max;
+		partid_max_init = true;
+	} else if (!partid_max_published) {
+		mpam_partid_max = min(mpam_partid_max, partid_max);
+		mpam_pmg_max = min(mpam_pmg_max, pmg_max);
+	} else {
+		/* New requestors can't lower the values */
+		if (partid_max < mpam_partid_max || pmg_max < mpam_pmg_max)
+			err = -EBUSY;
+	}
+	spin_unlock(&partid_max_lock);
+
+	return err;
+}
+EXPORT_SYMBOL(mpam_register_requestor);
+
 #define init_garbage(x)	init_llist_node(&(x)->garbage.llist)
 
 static struct mpam_vmsc *
@@ -520,6 +598,7 @@ static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
 	cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
 	cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
 	list_add_rcu(&ris->vmsc_list, &vmsc->ris);
+	list_add_rcu(&ris->msc_list, &msc->ris);
 
 	return 0;
 }
@@ -539,10 +618,37 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 	return err;
 }
 
+static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
+						   u8 ris_idx)
+{
+	int err;
+	struct mpam_msc_ris *ris, *found = ERR_PTR(-ENOENT);
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	if (!test_bit(ris_idx, msc->ris_idxs)) {
+		err = mpam_ris_create_locked(msc, ris_idx, MPAM_CLASS_UNKNOWN,
+					     0, 0, GFP_ATOMIC);
+		if (err)
+			return ERR_PTR(err);
+	}
+
+	list_for_each_entry(ris, &msc->ris, msc_list) {
+		if (ris->ris_idx == ris_idx) {
+			found = ris;
+			break;
+		}
+	}
+
+	return found;
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
-	int err;
+	u16 partid_max;
+	u8 ris_idx, pmg_max;
+	struct mpam_msc_ris *ris;
 
 	lockdep_assert_held(&msc->probe_lock);
 
@@ -551,14 +657,42 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	if ((idr & MPAMF_AIDR_ARCH_MAJOR_REV) != MPAM_ARCHITECTURE_V1) {
 		pr_err_once("%s does not match MPAM architecture v1.x\n",
 			    dev_name(&msc->pdev->dev));
-		err = -EIO;
-	} else {
-		msc->probed = true;
-		err = 0;
+		mutex_unlock(&msc->part_sel_lock);
+		return -EIO;
 	}
+
+	idr = mpam_msc_read_idr(msc);
 	mutex_unlock(&msc->part_sel_lock);
+	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
+
+	/* Use these values so partid/pmg always starts with a valid value */
+	msc->partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+	msc->pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+
+	for (ris_idx = 0; ris_idx <= msc->ris_max; ris_idx++) {
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		idr = mpam_msc_read_idr(msc);
+		mutex_unlock(&msc->part_sel_lock);
+
+		partid_max = FIELD_GET(MPAMF_IDR_PARTID_MAX, idr);
+		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
+		msc->partid_max = min(msc->partid_max, partid_max);
+		msc->pmg_max = min(msc->pmg_max, pmg_max);
+
+		ris = mpam_get_or_create_ris(msc, ris_idx);
+		if (IS_ERR(ris))
+			return PTR_ERR(ris);
+	}
 
-	return err;
+	spin_lock(&partid_max_lock);
+	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
+	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
+	spin_unlock(&partid_max_lock);
+
+	msc->probed = true;
+
+	return 0;
 }
 
 static int mpam_cpu_online(unsigned int cpu)
@@ -900,9 +1034,18 @@ static struct platform_driver mpam_msc_driver = {
 
 static void mpam_enable_once(void)
 {
+	/*
+	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
+	 * longer change.
+	 */
+	spin_lock(&partid_max_lock);
+	partid_max_published = true;
+	spin_unlock(&partid_max_lock);
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
-	pr_info("MPAM enabled\n");
+	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
+	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
 /*
@@ -972,4 +1115,5 @@ static int __init mpam_msc_driver_init(void)
 
 	return platform_driver_register(&mpam_msc_driver);
 }
+/* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a98cca08a2ef..a623f405ddd8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -50,6 +50,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	u16			partid_max;
+	u8			pmg_max;
 	unsigned long		ris_idxs[128 / BITS_PER_LONG];
 	u32			ris_max;
 
@@ -148,6 +150,10 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+/* System wide partid/pmg values */
+extern u16 mpam_partid_max;
+extern u8 mpam_pmg_max;
+
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
 
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 406a77be68cb..8af93794c7a2 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -39,4 +39,18 @@ static inline int acpi_mpam_count_msc(void) { return -EINVAL; }
 int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
 		    enum mpam_class_types type, u8 class_id, int component_id);
 
+/**
+ * mpam_register_requestor() - Register a requestor with the MPAM driver
+ * @partid_max:		The maximum PARTID value the requestor can generate.
+ * @pmg_max:		The maximum PMG value the requestor can generate.
+ *
+ * Registers a requestor with the MPAM driver to ensure the chosen system-wide
+ * minimum PARTID and PMG values will allow the requestor's features to be used.
+ *
+ * Returns an error if the registration is too late, and a larger PARTID/PMG
+ * value has been advertised to user-space. In this case the requestor should
+ * not use its MPAM features. Returns 0 on success.
+ */
+int mpam_register_requestor(u16 partid_max, u8 pmg_max);
+
 #endif /* __LINUX_ARM_MPAM_H */
-- 
2.20.1
- * [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (48 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 15/33] arm_mpam: Probe MSCs to find the supported partid/pmg values James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
                   ` (17 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
The MSC MON_SEL register needs to be accessed from hardirq context by the
PMU drivers, making an irqsave spinlock the obvious lock to protect these
registers. On systems with SCMI mailboxes the access must be able to
sleep, meaning a mutex must be used.
Clearly these two can't exist at the same time.
Add helpers for the MON_SEL locking. The outer lock must be taken in a
pre-emptible context before the inner lock can be taken. On systems with
SCMI mailboxes, where the MON_SEL accesses must sleep, the inner lock
will fail to be 'taken' if the caller is unable to sleep. This will allow
the PMU driver to fail without having to check the interface type of
each MSC.
Signed-off-by: James Morse <james.morse@arm.com>
---
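The intended calling pattern can be illustrated with a userspace model
of the two-level lock. This is a sketch under assumptions: the model_*
names are hypothetical, booleans replace the real mutex and raw
spinlock, and the can_sleep argument stands in for preemptible().

```c
#include <assert.h>
#include <stdbool.h>

enum model_iface { MODEL_IFACE_MMIO, MODEL_IFACE_FIRMWARE };

/* Hypothetical model of an MSC's mon_sel locking state. */
struct model_msc {
	enum model_iface iface;
	bool outer_held;
	bool inner_held;	/* only used for MMIO MSCs */
};

static void model_outer_lock(struct model_msc *m)
{
	assert(!m->outer_held);
	m->outer_held = true;
}

static void model_outer_unlock(struct model_msc *m)
{
	assert(m->outer_held && !m->inner_held);
	m->outer_held = false;
}

/*
 * Returns true if the caller may touch the MON_SEL registers.
 * For MMIO the inner lock is always taken. For firmware-backed MSCs
 * nothing is taken; the attempt fails if the caller can't sleep.
 */
static bool model_inner_lock(struct model_msc *m, bool can_sleep)
{
	assert(m->outer_held);	/* the outer lock always comes first */

	if (m->iface == MODEL_IFACE_MMIO) {
		m->inner_held = true;
		return true;
	}

	return can_sleep;
}

static void model_inner_unlock(struct model_msc *m)
{
	assert(m->outer_held);

	if (m->iface == MODEL_IFACE_MMIO)
		m->inner_held = false;
}
```

A caller that cannot sleep still succeeds on an MMIO MSC, but gets
false from model_inner_lock() on a firmware-backed one and has to give
up, without ever inspecting the interface type itself.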
 drivers/resctrl/mpam_internal.h | 57 ++++++++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a623f405ddd8..c6f087f9fa7d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -68,10 +68,19 @@ struct mpam_msc {
 
 	/*
 	 * mon_sel_lock protects access to the MSC hardware registers that are
-	 * affeted by MPAMCFG_MON_SEL.
+	 * affected by MPAMCFG_MON_SEL, and the mbwu_state.
+	 * Both the 'inner' and 'outer' must be taken.
+	 * For real MMIO MSC, the outer lock is unnecessary - but keeps the
+	 * code common with:
+	 * Firmware backed MSC need to sleep when accessing the MSC, which
+	 * means some code-paths will always fail. For these MSC the outer
+	 * lock is providing the protection, and the inner lock fails to
+	 * be taken if the task is unable to sleep.
+	 *
 	 * If needed, take msc->probe_lock first.
 	 */
 	struct mutex		outer_mon_sel_lock;
+	bool			outer_lock_held;
 	raw_spinlock_t		inner_mon_sel_lock;
 	unsigned long		inner_mon_sel_flags;
 
@@ -81,6 +90,52 @@ struct mpam_msc {
 	struct mpam_garbage	garbage;
 };
 
+static inline bool __must_check mpam_mon_sel_inner_lock(struct mpam_msc *msc)
+{
+	/*
+	 * The outer lock may be taken by a CPU that then issues an IPI to run
+	 * a helper that takes the inner lock. lockdep can't help us here.
+	 */
+	WARN_ON_ONCE(!msc->outer_lock_held);
+
+	if (msc->iface == MPAM_IFACE_MMIO) {
+		raw_spin_lock_irqsave(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+		return true;
+	}
+
+	/* Accesses must fail if we are not pre-emptible */
+	return !!preemptible();
+}
+
+static inline void mpam_mon_sel_inner_unlock(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(!msc->outer_lock_held);
+
+	if (msc->iface == MPAM_IFACE_MMIO)
+		raw_spin_unlock_irqrestore(&msc->inner_mon_sel_lock, msc->inner_mon_sel_flags);
+}
+
+static inline void mpam_mon_sel_outer_lock(struct mpam_msc *msc)
+{
+	mutex_lock(&msc->outer_mon_sel_lock);
+	msc->outer_lock_held = true;
+}
+
+static inline void mpam_mon_sel_outer_unlock(struct mpam_msc *msc)
+{
+	msc->outer_lock_held = false;
+	mutex_unlock(&msc->outer_mon_sel_lock);
+}
+
+static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
+{
+	WARN_ON_ONCE(!msc->outer_lock_held);
+	if (msc->iface == MPAM_IFACE_MMIO)
+		lockdep_assert_held_once(&msc->inner_mon_sel_lock);
+	else
+		lockdep_assert_preemption_enabled();
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
-- 
2.20.1
- * [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (49 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 16/33] arm_mpam: Add helpers for managing the locking around the mon_sel registers James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
                   ` (16 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
Expand the probing support with the control and monitor types
we can use with resctrl.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Made mpam_ris_hw_probe_hw_nrdy() more in C.
 * Added static assert on features bitmap size.
---
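The NRDY probe added by this patch writes both values to the bit and
checks which ones stick. The userspace model below sketches that idea
under assumptions: the model_* names are hypothetical, and a writable
mask stands in for whatever the real hardware does with software
writes to the monitor register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NRDY	(UINT32_C(1) << 31)

/* Hypothetical monitor register model. */
struct model_mon_reg {
	uint32_t value;
	uint32_t sw_writable;	/* bits that keep what software writes */
};

/* Only bits in sw_writable take the new value; others are unchanged. */
static void model_write(struct model_mon_reg *r, uint32_t val)
{
	r->value = (r->value & ~r->sw_writable) | (val & r->sw_writable);
}

/*
 * Returns true if NRDY looks hardware managed, i.e. software could not
 * both set and clear it. If both writes stick, the bit is probably
 * just storage that hardware does not implement.
 */
static bool model_probe_hw_nrdy(struct model_mon_reg *r)
{
	bool can_set, can_clear;

	model_write(r, MODEL_NRDY);
	can_set = r->value & MODEL_NRDY;

	model_write(r, 0);
	can_clear = !(r->value & MODEL_NRDY);

	return !can_set || !can_clear;
}
```

A register whose sw_writable mask includes bit 31 is reported as not
hardware managed. One where bit 31 ignores software writes is reported
as hardware managed, whether the bit happens to read as 0 or 1.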
 drivers/resctrl/mpam_devices.c  | 156 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  54 +++++++++++
 2 files changed, 209 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 012e09e80300..290a04f8654f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -102,7 +102,7 @@ static LLIST_HEAD(mpam_garbage);
 
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
-	WARN_ON_ONCE(reg > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
 	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
 
 	return readl_relaxed(msc->mapped_hwpage + reg);
@@ -131,6 +131,20 @@ static inline void _mpam_write_partsel_reg(struct mpam_msc *msc, u16 reg, u32 va
 }
 #define mpam_write_partsel_reg(msc, reg, val)  _mpam_write_partsel_reg(msc, MPAMCFG_##reg, val)
 
+static inline u32 _mpam_read_monsel_reg(struct mpam_msc *msc, u16 reg)
+{
+	mpam_mon_sel_lock_held(msc);
+	return __mpam_read_reg(msc, reg);
+}
+#define mpam_read_monsel_reg(msc, reg) _mpam_read_monsel_reg(msc, MSMON_##reg)
+
+static inline void _mpam_write_monsel_reg(struct mpam_msc *msc, u16 reg, u32 val)
+{
+	mpam_mon_sel_lock_held(msc);
+	__mpam_write_reg(msc, reg, val);
+}
+#define mpam_write_monsel_reg(msc, reg, val)   _mpam_write_monsel_reg(msc, MSMON_##reg, val)
+
 static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 {
 	u64 idr_high = 0, idr_low;
@@ -643,6 +657,139 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
 	return found;
 }
 
+/*
+ * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
+ * of NRDY, software can use this bit for any purpose" - so hardware might not
+ * implement this - but it isn't RES0.
+ *
+ * Try and see what values stick in this bit. If we can write either value,
+ * it's probably not implemented by hardware.
+ */
+static bool _mpam_ris_hw_probe_hw_nrdy(struct mpam_msc_ris * ris, u32 mon_reg)
+{
+	u32 now;
+	u64 mon_sel;
+	bool can_set, can_clear;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+		return false;
+
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, 0) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	_mpam_write_monsel_reg(msc, mon_reg, mon_sel);
+
+	_mpam_write_monsel_reg(msc, mon_reg, MSMON___NRDY);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_set = now & MSMON___NRDY;
+
+	_mpam_write_monsel_reg(msc, mon_reg, 0);
+	now = _mpam_read_monsel_reg(msc, mon_reg);
+	can_clear = !(now & MSMON___NRDY);
+	mpam_mon_sel_inner_unlock(msc);
+
+	return (!can_set || !can_clear);
+}
+
+#define mpam_ris_hw_probe_hw_nrdy(_ris, _mon_reg)			\
+        _mpam_ris_hw_probe_hw_nrdy(_ris, MSMON_##_mon_reg)
+
+static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
+{
+	int err;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *props = &ris->props;
+
+	lockdep_assert_held(&msc->probe_lock);
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	/* Cache Portion partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
+		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
+
+		props->cpbm_wd = FIELD_GET(MPAMF_CPOR_IDR_CPBM_WD, cpor_features);
+		if (props->cpbm_wd)
+			mpam_set_feature(mpam_feat_cpor_part, props);
+	}
+
+	/* Memory bandwidth partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_MBW_PART, ris->idr)) {
+		u32 mbw_features = mpam_read_partsel_reg(msc, MBW_IDR);
+
+		/* portion bitmap resolution */
+		props->mbw_pbm_bits = FIELD_GET(MPAMF_MBW_IDR_BWPBM_WD, mbw_features);
+		if (props->mbw_pbm_bits &&
+		    FIELD_GET(MPAMF_MBW_IDR_HAS_PBM, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_part, props);
+
+		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_max, props);
+	}
+
+	/* Performance Monitoring */
+	if (FIELD_GET(MPAMF_IDR_HAS_MSMON, ris->idr)) {
+		u32 msmon_features = mpam_read_partsel_reg(msc, MSMON_IDR);
+
+		/*
+		 * If the firmware max-nrdy-us property is missing, the
+		 * CSU counters can't be used. Should we wait forever?
+		 */
+		err = device_property_read_u32(&msc->pdev->dev,
+					       "arm,not-ready-us",
+					       &msc->nrdy_usec);
+
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_CSU, msmon_features)) {
+			u32 csumonidr;
+
+			csumonidr = mpam_read_partsel_reg(msc, CSUMON_IDR);
+			props->num_csu_mon = FIELD_GET(MPAMF_CSUMON_IDR_NUM_MON, csumonidr);
+			if (props->num_csu_mon) {
+				bool hw_managed;
+
+				mpam_set_feature(mpam_feat_msmon_csu, props);
+
+				/* Is NRDY hardware managed? */
+				mpam_mon_sel_outer_lock(msc);
+				hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, CSU);
+				mpam_mon_sel_outer_unlock(msc);
+				if (hw_managed)
+					mpam_set_feature(mpam_feat_msmon_csu_hw_nrdy, props);
+			}
+
+			/*
+			 * Accept the missing firmware property if NRDY appears
+			 * un-implemented.
+			 */
+			if (err && mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, props))
+				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
+		}
+		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
+			bool hw_managed;
+			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
+
+			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
+			if (props->num_mbwu_mon)
+				mpam_set_feature(mpam_feat_msmon_mbwu, props);
+
+			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
+				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
+
+			/* Is NRDY hardware managed? */
+			mpam_mon_sel_outer_lock(msc);
+			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
+			mpam_mon_sel_outer_unlock(msc);
+			if (hw_managed)
+				mpam_set_feature(mpam_feat_msmon_mbwu_hw_nrdy, props);
+
+			/*
+			 * Don't warn about any missing firmware property for
+			 * MBWU NRDY - it doesn't make any sense!
+			 */
+		}
+	}
+}
+
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
 {
 	u64 idr;
@@ -663,6 +810,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 
 	idr = mpam_msc_read_idr(msc);
 	mutex_unlock(&msc->part_sel_lock);
+
 	msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
 
 	/* Use these values so partid/pmg always starts with a valid value */
@@ -683,6 +831,12 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		ris = mpam_get_or_create_ris(msc, ris_idx);
 		if (IS_ERR(ris))
 			return PTR_ERR(ris);
+		ris->idr = idr;
+
+		mutex_lock(&msc->part_sel_lock);
+		__mpam_part_sel(ris_idx, 0, msc);
+		mpam_ris_hw_probe(ris);
+		mutex_unlock(&msc->part_sel_lock);
 	}
 
 	spin_lock(&partid_max_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c6f087f9fa7d..9f6cd4a68cce 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -136,6 +136,56 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
 		lockdep_assert_preemption_enabled();
 }
 
+/*
+ * When we compact the supported features, we don't care what they are.
+ * Storing them as a bitmap makes life easy.
+ */
+typedef u16 mpam_features_t;
+
+/* Bits for mpam_features_t */
+enum mpam_device_features {
+	mpam_feat_ccap_part = 0,
+	mpam_feat_cpor_part,
+	mpam_feat_mbw_part,
+	mpam_feat_mbw_min,
+	mpam_feat_mbw_max,
+	mpam_feat_mbw_prop,
+	mpam_feat_msmon,
+	mpam_feat_msmon_csu,
+	mpam_feat_msmon_csu_capture,
+	mpam_feat_msmon_csu_hw_nrdy,
+	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_capture,
+	mpam_feat_msmon_mbwu_rwbw,
+	mpam_feat_msmon_mbwu_hw_nrdy,
+	mpam_feat_msmon_capt,
+	MPAM_FEATURE_LAST,
+};
+static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
+#define MPAM_ALL_FEATURES      ((1 << MPAM_FEATURE_LAST) - 1)
+
+struct mpam_props {
+	mpam_features_t		features;
+
+	u16			cpbm_wd;
+	u16			mbw_pbm_bits;
+	u16			bwa_wd;
+	u16			num_csu_mon;
+	u16			num_mbwu_mon;
+};
+
+static inline bool mpam_has_feature(enum mpam_device_features feat,
+				    struct mpam_props *props)
+{
+	return (1 << feat) & props->features;
+}
+
+static inline void mpam_set_feature(enum mpam_device_features feat,
+				    struct mpam_props *props)
+{
+	props->features |= (1 << feat);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
@@ -175,6 +225,8 @@ struct mpam_vmsc {
 	/* mpam_msc_ris in this vmsc */
 	struct list_head	ris;
 
+	struct mpam_props	props;
+
 	/* All RIS in this vMSC are members of this MSC */
 	struct mpam_msc		*msc;
 
@@ -186,6 +238,8 @@ struct mpam_vmsc {
 
 struct mpam_msc_ris {
 	u8			ris_idx;
+	u64			idr;
+	struct mpam_props	props;
 
 	cpumask_t		affinity;
 
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
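The feature bitmap added to mpam_internal.h in the patch above can be exercised outside the kernel. The following is a minimal userspace sketch, not the driver's code: the demo_* names and the shortened enum are hypothetical, and uint16_t stands in for the kernel's u16. It only illustrates the set/test pattern and the compile-time check that every feature fits in the bitmap type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u16-based mpam_features_t. */
typedef uint16_t demo_features_t;

/* Hypothetical, shortened subset of enum mpam_device_features. */
enum demo_features {
	demo_feat_cpor_part = 0,
	demo_feat_mbw_max,
	demo_feat_msmon_csu,
	DEMO_FEATURE_LAST,
};

/* Every feature needs its own bit in the bitmap type. */
static_assert(sizeof(demo_features_t) * 8 >= DEMO_FEATURE_LAST,
	      "feature bitmap too small");

struct demo_props {
	demo_features_t features;
};

/* Test one feature bit. */
static inline bool demo_has_feature(enum demo_features feat,
				    const struct demo_props *props)
{
	return ((1u << feat) & props->features) != 0;
}

/* Set one feature bit. */
static inline void demo_set_feature(enum demo_features feat,
				    struct demo_props *props)
{
	props->features |= (demo_features_t)(1u << feat);
}

/* Clear one feature bit, leaving the others untouched. */
static inline void demo_clear_feature(enum demo_features feat,
				      struct demo_props *props)
{
	props->features &= (demo_features_t)~(1u << feat);
}
```

Because callers only ever test, set or clear bits, later code can merge whole bitmaps with a plain bitwise AND or OR without knowing what each bit means.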
- * [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (50 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 17/33] arm_mpam: Probe the hardware features resctrl supports James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
                   ` (15 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.
Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some MSC
that make up a component or class.
If bitmap properties are mismatched within a component we
cannot support the mismatched feature.
Care has to be taken as vMSC may hold mismatched RIS.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 215 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   8 ++
 2 files changed, 223 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 290a04f8654f..bb62de6d3847 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1186,8 +1186,223 @@ static struct platform_driver mpam_msc_driver = {
 	.remove = mpam_msc_drv_remove,
 };
 
+/* Any of these features mean the BWA_WD field is valid. */
+static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_mbw_min, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_max, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_mbw_prop, props))
+		return true;
+	return false;
+}
+
+#define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
+	helper(parent) &&						\
+	((helper(child) && (parent)->field != (child)->field) ||	\
+	 (!helper(child) && !(alias)))
+
+#define MISMATCHED_FEAT(parent, child, feat, field, alias)		     \
+	mpam_has_feature((feat), (parent)) &&				     \
+	((mpam_has_feature((feat), (child)) && (parent)->field != (child)->field) || \
+	 (!mpam_has_feature((feat), (child)) && !(alias)))
+
+#define CAN_MERGE_FEAT(parent, child, feat, alias)			\
+	(alias) && !mpam_has_feature((feat), (parent)) &&		\
+	mpam_has_feature((feat), (child))
+
+/*
+ * Combine two props fields.
+ * If this is for controls that alias the same resource, it is safe to just
+ * copy the values over. If two aliasing controls implement the same scheme
+ * a safe value must be picked.
+ * For non-aliasing controls, these control different resources, and the
+ * resulting safe value must be compatible with both. When merging values in
+ * the tree, all the aliasing resources must be handled first.
+ * On mismatch, parent is modified.
+ */
+static void __props_mismatch(struct mpam_props *parent,
+			     struct mpam_props *child, bool alias)
+{
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cpor_part, alias)) {
+		parent->cpbm_wd = child->cpbm_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cpor_part,
+				   cpbm_wd, alias)) {
+		pr_debug("%s cleared cpor_part\n", __func__);
+		mpam_clear_feature(mpam_feat_cpor_part, &parent->features);
+		parent->cpbm_wd = 0;
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_mbw_part, alias)) {
+		parent->mbw_pbm_bits = child->mbw_pbm_bits;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_mbw_part,
+				   mbw_pbm_bits, alias)) {
+		pr_debug("%s cleared mbw_part\n", __func__);
+		mpam_clear_feature(mpam_feat_mbw_part, &parent->features);
+		parent->mbw_pbm_bits = 0;
+	}
+
+	/* bwa_wd is a count of bits, fewer bits means less precision */
+	if (alias && !mpam_has_bwa_wd_feature(parent) && mpam_has_bwa_wd_feature(child)) {
+		parent->bwa_wd = child->bwa_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_bwa_wd_feature,
+				     bwa_wd, alias)) {
+		pr_debug("%s took the min bwa_wd\n", __func__);
+		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
+	}
+
+	/* For num properties, take the minimum */
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
+		parent->num_csu_mon = child->num_csu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_csu,
+				   num_csu_mon, alias)) {
+		pr_debug("%s took the min num_csu_mon\n", __func__);
+		parent->num_csu_mon = min(parent->num_csu_mon, child->num_csu_mon);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_mbwu, alias)) {
+		parent->num_mbwu_mon = child->num_mbwu_mon;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_msmon_mbwu,
+				   num_mbwu_mon, alias)) {
+		pr_debug("%s took the min num_mbwu_mon\n", __func__);
+		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
+	}
+
+	if (alias) {
+		/* Merge features for aliased resources */
+		parent->features |= child->features;
+	} else {
+		/* Clear missing features for non aliasing */
+		parent->features &= child->features;
+	}
+}
+
+/*
+ * If a vmsc doesn't match class feature/configuration, do the right thing(tm).
+ * For 'num' properties we can just take the minimum.
+ * For properties where the mismatched unused bits would make a difference, we
+ * nobble the class feature, as we can't configure all the resources.
+ * e.g. The L3 cache is composed of two resources with 13 and 17 portion
+ * bitmaps respectively.
+ */
+static void
+__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
+{
+	struct mpam_props *cprops = &class->props;
+	struct mpam_props *vprops = &vmsc->props;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify class */
+
+	pr_debug("%s: Merging features for class:0x%lx &= vmsc:0x%lx\n",
+		 dev_name(&vmsc->msc->pdev->dev),
+		 (long)cprops->features, (long)vprops->features);
+
+	/* Take the safe value for any common features */
+	__props_mismatch(cprops, vprops, false);
+}
+
+static void
+__vmsc_props_mismatch(struct mpam_vmsc *vmsc, struct mpam_msc_ris *ris)
+{
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_props *vprops = &vmsc->props;
+
+	lockdep_assert_held(&mpam_list_lock); /* we modify vmsc */
+
+	pr_debug("%s: Merging features for vmsc:0x%lx |= ris:0x%lx\n",
+		 dev_name(&vmsc->msc->pdev->dev),
+		 (long)vprops->features, (long)rprops->features);
+
+	/*
+	 * Merge mismatched features - Copy any features that aren't common,
+	 * but take the safe value for any common features.
+	 */
+	__props_mismatch(vprops, rprops, true);
+}
+
+/*
+ * Copy the first component's first vMSC's properties and features to the
+ * class. __class_props_mismatch() will remove conflicts.
+ * It is not possible to have a class with no components, or a component with
+ * no resources. The vMSC properties have already been built.
+ */
+static void mpam_enable_init_class_features(struct mpam_class *class)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_component *comp;
+
+	comp = list_first_entry_or_null(&class->components,
+					struct mpam_component, class_list);
+	if (WARN_ON(!comp))
+		return;
+
+	vmsc = list_first_entry_or_null(&comp->vmsc,
+					struct mpam_vmsc, comp_list);
+	if (WARN_ON(!vmsc))
+		return;
+
+	class->props = vmsc->props;
+}
+
+static void mpam_enable_merge_vmsc_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			__vmsc_props_mismatch(vmsc, ris);
+			class->nrdy_usec = max(class->nrdy_usec,
+					       vmsc->msc->nrdy_usec);
+		}
+	}
+}
+
+static void mpam_enable_merge_class_features(struct mpam_component *comp)
+{
+	struct mpam_vmsc *vmsc;
+	struct mpam_class *class = comp->class;
+
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list)
+		__class_props_mismatch(class, vmsc);
+}
+
+/*
+ * Merge all the common resource features into class.
+ * vmsc features are bitwise-or'd together, this must be done first.
+ * Next the class features are the bitwise-and of all the vmsc features.
+ * Other features are the min/max as appropriate.
+ *
+ * To avoid walking the whole tree twice, the class->nrdy_usec property is
+ * updated when working with the vmsc as it is a max(), and doesn't need
+ * initialising first.
+ */
+static void mpam_enable_merge_features(struct list_head *all_classes_list)
+{
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, all_classes_list, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_vmsc_features(comp);
+
+		mpam_enable_init_class_features(class);
+
+		list_for_each_entry(comp, &class->components, class_list)
+			mpam_enable_merge_class_features(comp);
+	}
+}
+
 static void mpam_enable_once(void)
 {
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+	mutex_unlock(&mpam_list_lock);
+
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
 	 * longer change.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f6cd4a68cce..a2b0ff411138 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -186,12 +186,20 @@ static inline void mpam_set_feature(enum mpam_device_features feat,
 	props->features |= (1 << feat);
 }
 
+static inline void mpam_clear_feature(enum mpam_device_features feat,
+				      mpam_features_t *supported)
+{
+	*supported &= ~(1 << feat);
+}
+
 struct mpam_class {
 	/* mpam_components in this class */
 	struct list_head	components;
 
 	cpumask_t		affinity;
 
+	struct mpam_props	props;
+	u32			nrdy_usec;
 	u8			level;
 	enum mpam_class_types	type;
 
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
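The merge rules in the patch above are easier to follow for a single feature. The sketch below is a userspace model under stated assumptions: demo_props, DEMO_FEAT_CPOR and demo_merge() are hypothetical names, and only the cpor_part/cpbm_wd pair is modelled. It mirrors the shape of __props_mismatch(): aliasing RIS are merged into a vMSC with a bitwise OR, and vMSC are merged into the class with a bitwise AND, clearing the feature when the widths disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FEAT_CPOR	(1u << 0)	/* hypothetical bit for cpor_part */

struct demo_props {
	uint16_t features;
	uint16_t cpbm_wd;
};

static void demo_merge(struct demo_props *parent,
		       const struct demo_props *child, bool alias)
{
	bool p_has = (parent->features & DEMO_FEAT_CPOR) != 0;
	bool c_has = (child->features & DEMO_FEAT_CPOR) != 0;

	if (alias && !p_has && c_has) {
		/* Aliasing control the parent lacks: copy the width over. */
		parent->cpbm_wd = child->cpbm_wd;
	} else if (p_has &&
		   ((c_has && parent->cpbm_wd != child->cpbm_wd) ||
		    (!c_has && !alias))) {
		/*
		 * Mismatch: clear the feature in the parent. The final
		 * OR/AND below is still applied afterwards.
		 */
		parent->features &= (uint16_t)~DEMO_FEAT_CPOR;
		parent->cpbm_wd = 0;
	}

	if (alias)
		parent->features |= child->features;	/* RIS into vMSC */
	else
		parent->features &= child->features;	/* vMSC into class */
}
```

With the 13 and 17 portion example from the comment in the patch, the non-aliasing merge leaves the class without cpor_part, since no single bitmap width can configure both resources.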
- * [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (51 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 18/33] arm_mpam: Merge supported features during mpam_enable() into mpam_class James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
                   ` (14 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew
When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.
Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.
MSC are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.
If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.
To reset, write the maximum values for all discovered controls.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Last bitmap write will always be non-zero.
 * Dropped READ_ONCE() - the value can no longer change.
---
 drivers/resctrl/mpam_devices.c  | 121 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |   8 +++
 2 files changed, 129 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bb62de6d3847..c1f01dd748ad 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -7,6 +7,7 @@
 #include <linux/atomic.h>
 #include <linux/arm_mpam.h>
 #include <linux/bitfield.h>
+#include <linux/bitmap.h>
 #include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
@@ -849,8 +850,115 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
+{
+	u32 num_words, msb;
+	u32 bm = ~0;
+	int i;
+
+	lockdep_assert_held(&msc->part_sel_lock);
+
+	if (wd == 0)
+		return;
+
+	/*
+	 * Write all ~0 to all but the last 32bit-word, which may
+	 * have fewer bits...
+	 */
+	num_words = DIV_ROUND_UP(wd, 32);
+	for (i = 0; i < num_words - 1; i++, reg += sizeof(bm))
+		__mpam_write_reg(msc, reg, bm);
+
+	/*
+	 * ....and then the last (maybe) partial 32bit word. When wd is a
+	 * multiple of 32, msb should be 31 to write a full 32bit word.
+	 */
+	msb = (wd - 1) % 32;
+	bm = GENMASK(msb, 0);
+	__mpam_write_reg(msc, reg, bm);
+}
+
+static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+{
+	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
+	struct mpam_msc *msc = ris->vmsc->msc;
+	struct mpam_props *rprops = &ris->props;
+
+	mpam_assert_srcu_read_lock_held();
+
+	mutex_lock(&msc->part_sel_lock);
+	__mpam_part_sel(ris->ris_idx, partid, msc);
+
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
+		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
+		mpam_write_partsel_reg(msc, MBW_MIN, 0);
+
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
+		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+
+	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
+		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+	mutex_unlock(&msc->part_sel_lock);
+}
+
+static void mpam_reset_ris(struct mpam_msc_ris *ris)
+{
+	u16 partid, partid_max;
+
+	mpam_assert_srcu_read_lock_held();
+
+	if (ris->in_reset_state)
+		return;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid < partid_max; partid++)
+		mpam_reset_ris_partid(ris, partid);
+}
+
+static void mpam_reset_msc(struct mpam_msc *msc, bool online)
+{
+	int idx;
+	struct mpam_msc_ris *ris;
+
+	mpam_assert_srcu_read_lock_held();
+
+	mpam_mon_sel_outer_lock(msc);
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
+		mpam_reset_ris(ris);
+
+		/*
+		 * Set in_reset_state when coming online. The reset state
+		 * for non-zero partid may be lost while the CPUs are offline.
+		 */
+		ris->in_reset_state = online;
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+	mpam_mon_sel_outer_unlock(msc);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
+	int idx;
+	struct mpam_msc *msc;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_fetch_inc(&msc->online_refs) == 0)
+			mpam_reset_msc(msc, true);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	return 0;
 }
 
@@ -886,6 +994,19 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 
 static int mpam_cpu_offline(unsigned int cpu)
 {
+	int idx;
+	struct mpam_msc *msc;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		if (!cpumask_test_cpu(cpu, &msc->accessibility))
+			continue;
+
+		if (atomic_dec_and_test(&msc->online_refs))
+			mpam_reset_msc(msc, false);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index a2b0ff411138..466d670a01eb 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -5,6 +5,7 @@
 #define MPAM_INTERNAL_H
 
 #include <linux/arm_mpam.h>
+#include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
 #include <linux/llist.h>
@@ -43,6 +44,7 @@ struct mpam_msc {
 	struct pcc_mbox_chan	*pcc_chan;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	atomic_t		online_refs;
 
 	/*
 	 * probe_lock is only take during discovery. After discovery these
@@ -248,6 +250,7 @@ struct mpam_msc_ris {
 	u8			ris_idx;
 	u64			idr;
 	struct mpam_props	props;
+	bool			in_reset_state;
 
 	cpumask_t		affinity;
 
@@ -267,6 +270,11 @@ struct mpam_msc_ris {
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
 
+static inline void mpam_assert_srcu_read_lock_held(void)
+{
+	WARN_ON_ONCE(!srcu_read_lock_held((&mpam_srcu)));
+}
+
 /* System wide partid/pmg values */
 extern u16 mpam_partid_max;
 extern u8 mpam_pmg_max;
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
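The word arithmetic in mpam_reset_msc_bitmap() from the patch above can be checked in isolation. This is a userspace sketch only: demo_reset_bitmap_words() is a hypothetical helper that stores the values into an array instead of writing MMIO registers, and the open-coded mask stands in for the kernel's GENMASK(msb, 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the 32bit words that would be written to reset a bitmap of
 * 'wd' bits to all-ones. Returns the number of words stored in 'out',
 * or 0 if wd is 0 or 'out' is too small.
 */
static size_t demo_reset_bitmap_words(uint16_t wd, uint32_t *out,
				      size_t out_len)
{
	size_t num_words, i;
	uint32_t msb;

	if (wd == 0)
		return 0;

	/* Same rounding as DIV_ROUND_UP(wd, 32) */
	num_words = ((size_t)wd + 31u) / 32u;
	if (num_words > out_len)
		return 0;

	/* All but the last word are fully populated */
	for (i = 0; i < num_words - 1; i++)
		out[i] = UINT32_MAX;

	/* The last word may be partial; msb is 31 when wd % 32 == 0 */
	msb = ((uint32_t)wd - 1u) % 32u;
	if (msb == 31)
		out[num_words - 1] = UINT32_MAX;
	else
		out[num_words - 1] = (UINT32_C(1) << (msb + 1)) - 1;

	return num_words;
}
```

For a 40 bit bitmap this yields one full word followed by 0xff, and for a 32 bit bitmap a single full word, which matches the "last bitmap write will always be non-zero" note in the changelog.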
- * [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (52 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 19/33] arm_mpam: Reset MSC controls from cpu hp callbacks James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
                   ` (13 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.
Add a helper that schedules the provided function if necessary.
Prevent the cpuhp callbacks from changing the MSC state by taking the
cpuhp lock.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c1f01dd748ad..759244966736 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -906,20 +906,51 @@ static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
 	mutex_unlock(&msc->part_sel_lock);
 }
 
-static void mpam_reset_ris(struct mpam_msc_ris *ris)
+/*
+ * Called via smp_call_on_cpu() to prevent migration, while still being
+ * pre-emptible.
+ */
+static int mpam_reset_ris(void *arg)
 {
 	u16 partid, partid_max;
+	struct mpam_msc_ris *ris = arg;
 
 	mpam_assert_srcu_read_lock_held();
 
 	if (ris->in_reset_state)
-		return;
+		return 0;
 
 	spin_lock(&partid_max_lock);
 	partid_max = mpam_partid_max;
 	spin_unlock(&partid_max_lock);
 	for (partid = 0; partid < partid_max; partid++)
 		mpam_reset_ris_partid(ris, partid);
+
+	return 0;
+}
+
+/*
+ * Get the preferred CPU for this MSC. If it is accessible from this CPU,
+ * this CPU is preferred. This can be preempted/migrated, it will only result
+ * in more work.
+ */
+static int mpam_get_msc_preferred_cpu(struct mpam_msc *msc)
+{
+	int cpu = raw_smp_processor_id();
+
+	if (cpumask_test_cpu(cpu, &msc->accessibility))
+		return cpu;
+
+	return cpumask_first_and(&msc->accessibility, cpu_online_mask);
+}
+
+static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
+{
+	lockdep_assert_irqs_enabled();
+	lockdep_assert_cpus_held();
+	mpam_assert_srcu_read_lock_held();
+
+	return smp_call_on_cpu(mpam_get_msc_preferred_cpu(msc), fn, arg, true);
 }
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
@@ -932,7 +963,7 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	mpam_mon_sel_outer_lock(msc);
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
-		mpam_reset_ris(ris);
+		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
 		/*
 		 * Set in_reset_state when coming online. The reset state
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
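The CPU selection in mpam_get_msc_preferred_cpu() from the patch above can be modelled with plain bitmasks. This is a userspace sketch under assumptions: demo_preferred_cpu() is a hypothetical name, a uint64_t stands in for cpumask_t (so at most 64 CPUs), and DEMO_NR_CPUS plays the role of the out-of-range value cpumask_first_and() returns when the intersection is empty.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_CPUS	64	/* hypothetical limit for this model */

/*
 * Prefer the current CPU if it can reach the MSC, otherwise fall back
 * to the first online CPU that can. Returns DEMO_NR_CPUS if no online
 * CPU can reach the MSC.
 */
static int demo_preferred_cpu(int this_cpu, uint64_t accessibility,
			      uint64_t online)
{
	uint64_t both;
	int cpu;

	if (this_cpu >= 0 && this_cpu < DEMO_NR_CPUS &&
	    ((accessibility >> this_cpu) & 1))
		return this_cpu;

	/* First set bit in the intersection of the two masks */
	both = accessibility & online;
	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if ((both >> cpu) & 1)
			return cpu;
	}

	return DEMO_NR_CPUS;
}
```

Picking the current CPU first avoids a cross call in the common case; if the task migrates after the check, the only cost is that the work is scheduled on another CPU.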
- * [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (53 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 20/33] arm_mpam: Add a helper to touch an MSC from any CPU James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
                   ` (12 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.
Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 62 +++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  1 +
 2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 759244966736..3516cbe8623e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -915,8 +915,6 @@ static int mpam_reset_ris(void *arg)
 	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
 
-	mpam_assert_srcu_read_lock_held();
-
 	if (ris->in_reset_state)
 		return 0;
 
@@ -1569,6 +1567,66 @@ static void mpam_enable_once(void)
 	       mpam_partid_max + 1, mpam_pmg_max + 1);
 }
 
+static void mpam_reset_component_locked(struct mpam_component *comp)
+{
+	int idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	might_sleep();
+	lockdep_assert_cpus_held();
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->in_reset_state)
+				mpam_touch_msc(msc, mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class_locked(struct mpam_class *class)
+{
+	int idx;
+	struct mpam_component *comp;
+
+	lockdep_assert_cpus_held();
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(comp, &class->components, class_list)
+		mpam_reset_component_locked(comp);
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
+static void mpam_reset_class(struct mpam_class *class)
+{
+	cpus_read_lock();
+	mpam_reset_class_locked(class);
+	cpus_read_unlock();
+}
+
+/*
+ * Called in response to an error IRQ.
+ * All of MPAM's errors indicate a software bug, restore any modified
+ * controls to their reset values.
+ */
+void mpam_disable(void)
+{
+	int idx;
+	struct mpam_class *class;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+				 srcu_read_lock_held(&mpam_srcu))
+		mpam_reset_class(class);
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
 /*
  * Enable mpam once all devices have been probed.
  * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 466d670a01eb..b30fee2b7674 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -281,6 +281,7 @@ extern u8 mpam_pmg_max;
 
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
+void mpam_disable(void);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 22/33] arm_mpam: Register and enable IRQs
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (54 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 21/33] arm_mpam: Extend reset logic to allow devices to be reset any time James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-01 10:05   ` Ben Horgan
  2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
                   ` (11 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.
Only the irq handler accesses the ESR register, so no locking is needed.
The work to disable MPAM after an error needs to happen in process
context, so use a threaded interrupt.
There is no support for percpu threaded interrupts, so for now schedule
the work to be done from the irq handler.
Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.
Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Use guard macro when walking srcu list.
 * Use INTEN macro for enabling interrupts.
 * Move partid_max_published up earlier in mpam_enable_once().
---
 drivers/resctrl/mpam_devices.c  | 311 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |   9 +-
 2 files changed, 312 insertions(+), 8 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 3516cbe8623e..210d64fad0b1 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -14,6 +14,9 @@
 #include <linux/device.h>
 #include <linux/errno.h>
 #include <linux/gfp.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
 #include <linux/list.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
@@ -62,6 +65,12 @@ static DEFINE_SPINLOCK(partid_max_lock);
  */
 static DECLARE_WORK(mpam_enable_work, &mpam_enable);
 
+/*
+ * All mpam error interrupts indicate a software bug. On receipt, disable the
+ * driver.
+ */
+static DECLARE_WORK(mpam_broken_work, &mpam_disable);
+
 /*
  * An MSC is a physical container for controls and monitors, each identified by
  * their RIS index. These share a base-address, interrupts and some MMIO
@@ -159,6 +168,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
 	return (idr_high << 32) | idr_low;
 }
 
+static void mpam_msc_zero_esr(struct mpam_msc *msc)
+{
+	__mpam_write_reg(msc, MPAMF_ESR, 0);
+	if (msc->has_extd_esr)
+		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
+}
+
+static u64 mpam_msc_read_esr(struct mpam_msc *msc)
+{
+	u64 esr_high = 0, esr_low;
+
+	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
+	if (msc->has_extd_esr)
+		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
+
+	return (esr_high << 32) | esr_low;
+}
+
 static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
 {
 	lockdep_assert_held(&msc->part_sel_lock);
@@ -405,12 +432,12 @@ static void mpam_msc_destroy(struct mpam_msc *msc)
 
 	lockdep_assert_held(&mpam_list_lock);
 
-	list_del_rcu(&msc->glbl_list);
-	platform_set_drvdata(pdev, NULL);
-
 	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
 		mpam_ris_destroy(ris);
 
+	list_del_rcu(&msc->glbl_list);
+	platform_set_drvdata(pdev, NULL);
+
 	add_to_garbage(msc);
 	msc->garbage.pdev = pdev;
 }
@@ -828,6 +855,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
 		msc->partid_max = min(msc->partid_max, partid_max);
 		msc->pmg_max = min(msc->pmg_max, pmg_max);
+		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
 
 		ris = mpam_get_or_create_ris(msc, ris_idx);
 		if (IS_ERR(ris))
@@ -840,6 +868,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 		mutex_unlock(&msc->part_sel_lock);
 	}
 
+	/* Clear any stale errors */
+	mpam_msc_zero_esr(msc);
+
 	spin_lock(&partid_max_lock);
 	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
 	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
@@ -973,6 +1004,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 	mpam_mon_sel_outer_unlock(msc);
 }
 
+static void _enable_percpu_irq(void *_irq)
+{
+	int *irq = _irq;
+
+	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
+}
+
 static int mpam_cpu_online(unsigned int cpu)
 {
 	int idx;
@@ -983,6 +1021,9 @@ static int mpam_cpu_online(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			_enable_percpu_irq(&msc->reenable_error_ppi);
+
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
 			mpam_reset_msc(msc, true);
 	}
@@ -1031,6 +1072,9 @@ static int mpam_cpu_offline(unsigned int cpu)
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
 			continue;
 
+		if (msc->reenable_error_ppi)
+			disable_percpu_irq(msc->reenable_error_ppi);
+
 		if (atomic_dec_and_test(&msc->online_refs))
 			mpam_reset_msc(msc, false);
 	}
@@ -1057,6 +1101,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
 	mutex_unlock(&mpam_cpuhp_state_lock);
 }
 
+static int __setup_ppi(struct mpam_msc *msc)
+{
+	int cpu;
+
+	msc->error_dev_id = alloc_percpu_gfp(struct mpam_msc *, GFP_KERNEL);
+	if (!msc->error_dev_id)
+		return -ENOMEM;
+
+	for_each_cpu(cpu, &msc->accessibility) {
+		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
+
+		if (empty) {
+			pr_err_once("%s shares PPI with %s!\n",
+				    dev_name(&msc->pdev->dev),
+				    dev_name(&empty->pdev->dev));
+			return -EBUSY;
+		}
+		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
+	}
+
+	return 0;
+}
+
+static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
+{
+	int irq;
+
+	irq = platform_get_irq_byname_optional(msc->pdev, "error");
+	if (irq <= 0)
+		return 0;
+
+	/* Allocate and initialise the percpu device pointer for PPI */
+	if (irq_is_percpu(irq))
+		return __setup_ppi(msc);
+
+	/* sanity check: shared interrupts can be routed anywhere? */
+	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
+		pr_err_once("msc:%u is a private resource with a shared error interrupt",
+			    msc->id);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int mpam_dt_count_msc(void)
 {
 	int count = 0;
@@ -1265,6 +1354,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
 			break;
 		}
 
+		err = mpam_msc_setup_error_irq(msc);
+		if (err)
+			break;
+
 		if (device_property_read_u32(&pdev->dev, "pcc-channel",
 					     &msc->pcc_subspace_id))
 			msc->iface = MPAM_IFACE_MMIO;
@@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
 	}
 }
 
+static char *mpam_errcode_names[16] = {
+	[0] = "No error",
+	[1] = "PARTID_SEL_Range",
+	[2] = "Req_PARTID_Range",
+	[3] = "MSMONCFG_ID_RANGE",
+	[4] = "Req_PMG_Range",
+	[5] = "Monitor_Range",
+	[6] = "intPARTID_Range",
+	[7] = "Unexpected_INTERNAL",
+	[8] = "Undefined_RIS_PART_SEL",
+	[9] = "RIS_No_Control",
+	[10] = "Undefined_RIS_MON_SEL",
+	[11] = "RIS_No_Monitor",
+	[12 ... 15] = "Reserved"
+};
+
+static int mpam_enable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
+
+	return 0;
+}
+
+static int mpam_disable_msc_ecr(void *_msc)
+{
+	struct mpam_msc *msc = _msc;
+
+	__mpam_write_reg(msc, MPAMF_ECR, 0);
+
+	return 0;
+}
+
+static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
+{
+	u64 reg;
+	u16 partid;
+	u8 errcode, pmg, ris;
+
+	if (WARN_ON_ONCE(!msc) ||
+	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &msc->accessibility)))
+		return IRQ_NONE;
+
+	reg = mpam_msc_read_esr(msc);
+
+	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
+	if (!errcode)
+		return IRQ_NONE;
+
+	/* Clear level triggered irq */
+	mpam_msc_zero_esr(msc);
+
+	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
+	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
+	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
+
+	pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
+	       msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
+
+	if (irq_is_percpu(irq)) {
+		mpam_disable_msc_ecr(msc);
+		schedule_work(&mpam_broken_work);
+		return IRQ_HANDLED;
+	}
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
+{
+	struct mpam_msc *msc = dev_id;
+
+	return __mpam_irq_handler(irq, msc);
+}
+
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id);
+
+static int mpam_register_irqs(void)
+{
+	int err, irq;
+	struct mpam_msc *msc;
+
+	lockdep_assert_cpus_held();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
+		/* We anticipate sharing the interrupt with other MSCs */
+		if (irq_is_percpu(irq)) {
+			err = request_percpu_irq(irq, &mpam_ppi_handler,
+						 "mpam:msc:error",
+						 msc->error_dev_id);
+			if (err)
+				return err;
+
+			msc->reenable_error_ppi = irq;
+			smp_call_function_many(&msc->accessibility,
+					       &_enable_percpu_irq, &irq,
+					       true);
+		} else {
+			err = devm_request_threaded_irq(&msc->pdev->dev, irq,
+							&mpam_spi_handler,
+							&mpam_disable_thread,
+							IRQF_SHARED,
+							"mpam:msc:error", msc);
+			if (err)
+				return err;
+		}
+
+		msc->error_irq_requested = true;
+		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
+		msc->error_irq_hw_enabled = true;
+	}
+
+	return 0;
+}
+
+static void mpam_unregister_irqs(void)
+{
+	int irq, idx;
+	struct mpam_msc *msc;
+
+	cpus_read_lock();
+	/* take the lock as free_irq() can sleep */
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
+		irq = platform_get_irq_byname_optional(msc->pdev, "error");
+		if (irq <= 0)
+			continue;
+
+		if (msc->error_irq_hw_enabled) {
+			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
+			msc->error_irq_hw_enabled = false;
+		}
+
+		if (msc->error_irq_requested) {
+			if (irq_is_percpu(irq)) {
+				msc->reenable_error_ppi = 0;
+				free_percpu_irq(irq, msc->error_dev_id);
+			} else {
+				devm_free_irq(&msc->pdev->dev, irq, msc);
+			}
+			msc->error_irq_requested = false;
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+	cpus_read_unlock();
+}
+
 static void mpam_enable_once(void)
 {
-	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
-	mutex_unlock(&mpam_list_lock);
+	int err;
 
 	/*
 	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
@@ -1561,6 +1814,27 @@ static void mpam_enable_once(void)
 	partid_max_published = true;
 	spin_unlock(&partid_max_lock);
 
+	/*
+	 * If all the MSC have been probed, enabling the IRQs happens next.
+	 * That involves cross-calling to a CPU that can reach the MSC, and
+	 * the locks must be taken in this order:
+	 */
+	cpus_read_lock();
+	mutex_lock(&mpam_list_lock);
+	mpam_enable_merge_features(&mpam_classes);
+
+	err = mpam_register_irqs();
+	if (err)
+		pr_warn("Failed to register irqs: %d\n", err);
+
+	mutex_unlock(&mpam_list_lock);
+	cpus_read_unlock();
+
+	if (err) {
+		schedule_work(&mpam_broken_work);
+		return;
+	}
+
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
 	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
  * All of MPAMs errors indicate a software bug, restore any modified
  * controls to their reset values.
  */
-void mpam_disable(void)
+static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
 {
 	int idx;
 	struct mpam_class *class;
+	struct mpam_msc *msc, *tmp;
+
+	mutex_lock(&mpam_cpuhp_state_lock);
+	if (mpam_cpuhp_state) {
+		cpuhp_remove_state(mpam_cpuhp_state);
+		mpam_cpuhp_state = 0;
+	}
+	mutex_unlock(&mpam_cpuhp_state_lock);
+
+	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
 				 srcu_read_lock_held(&mpam_srcu))
 		mpam_reset_class(class);
 	srcu_read_unlock(&mpam_srcu, idx);
+
+	mutex_lock(&mpam_list_lock);
+	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
+		mpam_msc_destroy(msc);
+	mutex_unlock(&mpam_list_lock);
+	mpam_free_garbage();
+
+	return IRQ_HANDLED;
+}
+
+void mpam_disable(struct work_struct *ignored)
+{
+	mpam_disable_thread(0, NULL);
 }
 
 /*
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b30fee2b7674..c9418c9cf9f2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -44,6 +44,11 @@ struct mpam_msc {
 	struct pcc_mbox_chan	*pcc_chan;
 	u32			nrdy_usec;
 	cpumask_t		accessibility;
+	bool			has_extd_esr;
+
+	int				reenable_error_ppi;
+	struct mpam_msc * __percpu	*error_dev_id;
+
 	atomic_t		online_refs;
 
 	/*
@@ -52,6 +57,8 @@ struct mpam_msc {
 	 */
 	struct mutex		probe_lock;
 	bool			probed;
+	bool			error_irq_requested;
+	bool			error_irq_hw_enabled;
 	u16			partid_max;
 	u8			pmg_max;
 	unsigned long		ris_idxs[128 / BITS_PER_LONG];
@@ -281,7 +288,7 @@ extern u8 mpam_pmg_max;
 
 /* Scheduled work callback to enable mpam once all MSC have been probed */
 void mpam_enable(struct work_struct *work);
-void mpam_disable(void);
+void mpam_disable(struct work_struct *work);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.20.1
- * Re: [PATCH 22/33] arm_mpam: Register and enable IRQs
  2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-09-01 10:05   ` Ben Horgan
  0 siblings, 0 replies; 200+ messages in thread
From: Ben Horgan @ 2025-09-01 10:05 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi James,
On 8/22/25 16:30, James Morse wrote:
> Register and enable error IRQs. All the MPAM error interrupts indicate a
> software bug, e.g. out of range partid. If the error interrupt is ever
> signalled, attempt to disable MPAM.
> 
> Only the irq handler accesses the ESR register, so no locking is needed.
> The work to disable MPAM after an error needs to happen at process
> context, use a threaded interrupt.
> 
> There is no support for percpu threaded interrupts, for now schedule
> the work to be done from the irq handler.
> 
> Enabling the IRQs in the MSC may involve cross calling to a CPU that
> can access the MSC.
> 
> Once the IRQ is requested, the mpam_disable() path can be called
> asynchronously, which will walk structures sized by max_partid. Ensure
> this size is fixed before the interrupt is requested.
> 
> CC: Rohit Mathew <rohit.mathew@arm.com>
> Tested-by: Rohit Mathew <rohit.mathew@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since RFC:
>  * Use guard macro when walking srcu list.
>  * Use INTEN macro for enabling interrupts.
>  * Move partid_max_published up earlier in mpam_enable_once().
> ---
>  drivers/resctrl/mpam_devices.c  | 311 +++++++++++++++++++++++++++++++-
>  drivers/resctrl/mpam_internal.h |   9 +-
>  2 files changed, 312 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 3516cbe8623e..210d64fad0b1 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -14,6 +14,9 @@
>  #include <linux/device.h>
>  #include <linux/errno.h>
>  #include <linux/gfp.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdesc.h>
>  #include <linux/list.h>
>  #include <linux/lockdep.h>
>  #include <linux/mutex.h>
> @@ -62,6 +65,12 @@ static DEFINE_SPINLOCK(partid_max_lock);
>   */
>  static DECLARE_WORK(mpam_enable_work, &mpam_enable);
>  
> +/*
> + * All mpam error interrupts indicate a software bug. On receipt, disable the
> + * driver.
> + */
> +static DECLARE_WORK(mpam_broken_work, &mpam_disable);
> +
>  /*
>   * An MSC is a physical container for controls and monitors, each identified by
>   * their RIS index. These share a base-address, interrupts and some MMIO
> @@ -159,6 +168,24 @@ static u64 mpam_msc_read_idr(struct mpam_msc *msc)
>  	return (idr_high << 32) | idr_low;
>  }
>  
> +static void mpam_msc_zero_esr(struct mpam_msc *msc)
> +{
> +	__mpam_write_reg(msc, MPAMF_ESR, 0);
> +	if (msc->has_extd_esr)
> +		__mpam_write_reg(msc, MPAMF_ESR + 4, 0);
> +}
> +
> +static u64 mpam_msc_read_esr(struct mpam_msc *msc)
> +{
> +	u64 esr_high = 0, esr_low;
> +
> +	esr_low = __mpam_read_reg(msc, MPAMF_ESR);
> +	if (msc->has_extd_esr)
> +		esr_high = __mpam_read_reg(msc, MPAMF_ESR + 4);
> +
> +	return (esr_high << 32) | esr_low;
> +}
> +
>  static void __mpam_part_sel_raw(u32 partsel, struct mpam_msc *msc)
>  {
>  	lockdep_assert_held(&msc->part_sel_lock);
> @@ -405,12 +432,12 @@ static void mpam_msc_destroy(struct mpam_msc *msc)
>  
>  	lockdep_assert_held(&mpam_list_lock);
>  
> -	list_del_rcu(&msc->glbl_list);
> -	platform_set_drvdata(pdev, NULL);
> -
>  	list_for_each_entry_safe(ris, tmp, &msc->ris, msc_list)
>  		mpam_ris_destroy(ris);
>  
> +	list_del_rcu(&msc->glbl_list);
> +	platform_set_drvdata(pdev, NULL);
> +
This reordering could be folded into the patch that introduces
mpam_msc_destroy().
>  	add_to_garbage(msc);
>  	msc->garbage.pdev = pdev;
>  }
> @@ -828,6 +855,7 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  		pmg_max = FIELD_GET(MPAMF_IDR_PMG_MAX, idr);
>  		msc->partid_max = min(msc->partid_max, partid_max);
>  		msc->pmg_max = min(msc->pmg_max, pmg_max);
> +		msc->has_extd_esr = FIELD_GET(MPAMF_IDR_HAS_EXTD_ESR, idr);
>  
>  		ris = mpam_get_or_create_ris(msc, ris_idx);
>  		if (IS_ERR(ris))
> @@ -840,6 +868,9 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>  		mutex_unlock(&msc->part_sel_lock);
>  	}
>  
> +	/* Clear any stale errors */
> +	mpam_msc_zero_esr(msc);
> +
>  	spin_lock(&partid_max_lock);
>  	mpam_partid_max = min(mpam_partid_max, msc->partid_max);
>  	mpam_pmg_max = min(mpam_pmg_max, msc->pmg_max);
> @@ -973,6 +1004,13 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
>  	mpam_mon_sel_outer_unlock(msc);
>  }
>  
> +static void _enable_percpu_irq(void *_irq)
> +{
> +	int *irq = _irq;
> +
> +	enable_percpu_irq(*irq, IRQ_TYPE_NONE);
> +}
> +
>  static int mpam_cpu_online(unsigned int cpu)
>  {
>  	int idx;
> @@ -983,6 +1021,9 @@ static int mpam_cpu_online(unsigned int cpu)
>  		if (!cpumask_test_cpu(cpu, &msc->accessibility))
>  			continue;
>  
> +		if (msc->reenable_error_ppi)
> +			_enable_percpu_irq(&msc->reenable_error_ppi);
> +
>  		if (atomic_fetch_inc(&msc->online_refs) == 0)
>  			mpam_reset_msc(msc, true);
>  	}
> @@ -1031,6 +1072,9 @@ static int mpam_cpu_offline(unsigned int cpu)
>  		if (!cpumask_test_cpu(cpu, &msc->accessibility))
>  			continue;
>  
> +		if (msc->reenable_error_ppi)
> +			disable_percpu_irq(msc->reenable_error_ppi);
> +
>  		if (atomic_dec_and_test(&msc->online_refs))
>  			mpam_reset_msc(msc, false);
>  	}
> @@ -1057,6 +1101,51 @@ static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
>  	mutex_unlock(&mpam_cpuhp_state_lock);
>  }
>  
> +static int __setup_ppi(struct mpam_msc *msc)
> +{
> +	int cpu;
> +
> +	msc->error_dev_id = alloc_percpu_gfp(struct mpam_msc *, GFP_KERNEL);
Simpler to use alloc_percpu().
> +	if (!msc->error_dev_id)
> +		return -ENOMEM;
> +
> +	for_each_cpu(cpu, &msc->accessibility) {
> +		struct mpam_msc *empty = *per_cpu_ptr(msc->error_dev_id, cpu);
> +
> +		if (empty) {
> +			pr_err_once("%s shares PPI with %s!\n",
> +				    dev_name(&msc->pdev->dev),
> +				    dev_name(&empty->pdev->dev));
> +			return -EBUSY;
> +		}
> +		*per_cpu_ptr(msc->error_dev_id, cpu) = msc;
> +	}
> +
> +	return 0;
> +}
> +
> +static int mpam_msc_setup_error_irq(struct mpam_msc *msc)
> +{
> +	int irq;
> +
> +	irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +	if (irq <= 0)
> +		return 0;
> +
> +	/* Allocate and initialise the percpu device pointer for PPI */
> +	if (irq_is_percpu(irq))
> +		return __setup_ppi(msc);
> +
> +	/* sanity check: shared interrupts can be routed anywhere? */
> +	if (!cpumask_equal(&msc->accessibility, cpu_possible_mask)) {
> +		pr_err_once("msc:%u is a private resource with a shared error interrupt",
> +			    msc->id);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
>  static int mpam_dt_count_msc(void)
>  {
>  	int count = 0;
> @@ -1265,6 +1354,10 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
>  			break;
>  		}
>  
> +		err = mpam_msc_setup_error_irq(msc);
> +		if (err)
> +			break;
> +
>  		if (device_property_read_u32(&pdev->dev, "pcc-channel",
>  					     &msc->pcc_subspace_id))
>  			msc->iface = MPAM_IFACE_MMIO;
> @@ -1547,11 +1640,171 @@ static void mpam_enable_merge_features(struct list_head *all_classes_list)
>  	}
>  }
>  
> +static char *mpam_errcode_names[16] = {
> +	[0] = "No error",
> +	[1] = "PARTID_SEL_Range",
> +	[2] = "Req_PARTID_Range",
> +	[3] = "MSMONCFG_ID_RANGE",
> +	[4] = "Req_PMG_Range",
> +	[5] = "Monitor_Range",
> +	[6] = "intPARTID_Range",
> +	[7] = "Unexpected_INTERNAL",
> +	[8] = "Undefined_RIS_PART_SEL",
> +	[9] = "RIS_No_Control",
> +	[10] = "Undefined_RIS_MON_SEL",
> +	[11] = "RIS_No_Monitor",
> +	[12 ... 15] = "Reserved"
> +};
These names match the spec.
> +
> +static int mpam_enable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, MPAMF_ECR_INTEN);
> +
> +	return 0;
> +}
> +
> +static int mpam_disable_msc_ecr(void *_msc)
> +{
> +	struct mpam_msc *msc = _msc;
> +
> +	__mpam_write_reg(msc, MPAMF_ECR, 0);
> +
> +	return 0;
> +}
> +
> +static irqreturn_t __mpam_irq_handler(int irq, struct mpam_msc *msc)
> +{
> +	u64 reg;
> +	u16 partid;
> +	u8 errcode, pmg, ris;
> +
> +	if (WARN_ON_ONCE(!msc) ||
> +	    WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &msc->accessibility)))
> +		return IRQ_NONE;
> +
> +	reg = mpam_msc_read_esr(msc);
> +
> +	errcode = FIELD_GET(MPAMF_ESR_ERRCODE, reg);
> +	if (!errcode)
> +		return IRQ_NONE;
> +
> +	/* Clear level triggered irq */
> +	mpam_msc_zero_esr(msc);
> +
> +	partid = FIELD_GET(MPAMF_ESR_PARTID_MON, reg);
> +	pmg = FIELD_GET(MPAMF_ESR_PMG, reg);
> +	ris = FIELD_GET(MPAMF_ESR_RIS, reg);
> +
> +	pr_err("error irq from msc:%u '%s', partid:%u, pmg: %u, ris: %u\n",
> +	       msc->id, mpam_errcode_names[errcode], partid, pmg, ris);
> +
> +	if (irq_is_percpu(irq)) {
> +		mpam_disable_msc_ecr(msc);
> +		schedule_work(&mpam_broken_work);
> +		return IRQ_HANDLED;
> +	}
> +
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t mpam_ppi_handler(int irq, void *dev_id)
> +{
> +	struct mpam_msc *msc = *(struct mpam_msc **)dev_id;
> +
> +	return __mpam_irq_handler(irq, msc);
> +}
> +
> +static irqreturn_t mpam_spi_handler(int irq, void *dev_id)
> +{
> +	struct mpam_msc *msc = dev_id;
> +
> +	return __mpam_irq_handler(irq, msc);
> +}
> +
> +static irqreturn_t mpam_disable_thread(int irq, void *dev_id);
> +
> +static int mpam_register_irqs(void)
> +{
> +	int err, irq;
> +	struct mpam_msc *msc;
> +
> +	lockdep_assert_cpus_held();
> +
> +	guard(srcu)(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		/* The MPAM spec says the interrupt can be SPI, PPI or LPI */
> +		/* We anticipate sharing the interrupt with other MSCs */
> +		if (irq_is_percpu(irq)) {
> +			err = request_percpu_irq(irq, &mpam_ppi_handler,
> +						 "mpam:msc:error",
> +						 msc->error_dev_id);
> +			if (err)
> +				return err;
> +
> +			msc->reenable_error_ppi = irq;
> +			smp_call_function_many(&msc->accessibility,
> +					       &_enable_percpu_irq, &irq,
> +					       true);
> +		} else {
> +			err = devm_request_threaded_irq(&msc->pdev->dev, irq,
> +							&mpam_spi_handler,
> +							&mpam_disable_thread,
> +							IRQF_SHARED,
> +							"mpam:msc:error", msc);
> +			if (err)
> +				return err;
> +		}
> +
> +		msc->error_irq_requested = true;
> +		mpam_touch_msc(msc, mpam_enable_msc_ecr, msc);
> +		msc->error_irq_hw_enabled = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static void mpam_unregister_irqs(void)
> +{
> +	int irq, idx;
> +	struct mpam_msc *msc;
> +
> +	cpus_read_lock();
> +	/* take the lock as free_irq() can sleep */
> +	idx = srcu_read_lock(&mpam_srcu);
> +	list_for_each_entry_srcu(msc, &mpam_all_msc, glbl_list, srcu_read_lock_held(&mpam_srcu)) {
> +		irq = platform_get_irq_byname_optional(msc->pdev, "error");
> +		if (irq <= 0)
> +			continue;
> +
> +		if (msc->error_irq_hw_enabled) {
> +			mpam_touch_msc(msc, mpam_disable_msc_ecr, msc);
> +			msc->error_irq_hw_enabled = false;
> +		}
> +
> +		if (msc->error_irq_requested) {
> +			if (irq_is_percpu(irq)) {
> +				msc->reenable_error_ppi = 0;
> +				free_percpu_irq(irq, msc->error_dev_id);
> +			} else {
> +				devm_free_irq(&msc->pdev->dev, irq, msc);
> +			}
> +			msc->error_irq_requested = false;
> +		}
> +	}
> +	srcu_read_unlock(&mpam_srcu, idx);
> +	cpus_read_unlock();
> +}
> +
>  static void mpam_enable_once(void)
>  {
> -	mutex_lock(&mpam_list_lock);
> -	mpam_enable_merge_features(&mpam_classes);
> -	mutex_unlock(&mpam_list_lock);
> +	int err;
>  
>  	/*
>  	 * Once the cpuhp callbacks have been changed, mpam_partid_max can no
> @@ -1561,6 +1814,27 @@ static void mpam_enable_once(void)
>  	partid_max_published = true;
>  	spin_unlock(&partid_max_lock);
>  
> +	/*
> +	 * If all the MSC have been probed, enabling the IRQs happens next.
> +	 * That involves cross-calling to a CPU that can reach the MSC, and
> +	 * the locks must be taken in this order:
> +	 */
> +	cpus_read_lock();
> +	mutex_lock(&mpam_list_lock);
> +	mpam_enable_merge_features(&mpam_classes);
> +
> +	err = mpam_register_irqs();
> +	if (err)
> +		pr_warn("Failed to register irqs: %d\n", err);
> +
> +	mutex_unlock(&mpam_list_lock);
> +	cpus_read_unlock();
> +
> +	if (err) {
> +		schedule_work(&mpam_broken_work);
> +		return;
> +	}
> +
>  	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
>  
>  	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
> @@ -1615,16 +1889,39 @@ static void mpam_reset_class(struct mpam_class *class)
>   * All of MPAMs errors indicate a software bug, restore any modified
>   * controls to their reset values.
>   */
> -void mpam_disable(void)
> +static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
>  {
>  	int idx;
>  	struct mpam_class *class;
> +	struct mpam_msc *msc, *tmp;
> +
> +	mutex_lock(&mpam_cpuhp_state_lock);
> +	if (mpam_cpuhp_state) {
> +		cpuhp_remove_state(mpam_cpuhp_state);
> +		mpam_cpuhp_state = 0;
> +	}
> +	mutex_unlock(&mpam_cpuhp_state_lock);
> +
> +	mpam_unregister_irqs();
>  
>  	idx = srcu_read_lock(&mpam_srcu);
>  	list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>  				 srcu_read_lock_held(&mpam_srcu))
>  		mpam_reset_class(class);
>  	srcu_read_unlock(&mpam_srcu, idx);
> +
> +	mutex_lock(&mpam_list_lock);
> +	list_for_each_entry_safe(msc, tmp, &mpam_all_msc, glbl_list)
> +		mpam_msc_destroy(msc);
> +	mutex_unlock(&mpam_list_lock);
> +	mpam_free_garbage();
> +
> +	return IRQ_HANDLED;
> +}
> +
> +void mpam_disable(struct work_struct *ignored)
> +{
> +	mpam_disable_thread(0, NULL);
>  }
>  
>  /*
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index b30fee2b7674..c9418c9cf9f2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -44,6 +44,11 @@ struct mpam_msc {
>  	struct pcc_mbox_chan	*pcc_chan;
>  	u32			nrdy_usec;
>  	cpumask_t		accessibility;
> +	bool			has_extd_esr;
> +
> +	int				reenable_error_ppi;
> +	struct mpam_msc * __percpu	*error_dev_id;
> +
>  	atomic_t		online_refs;
>  
>  	/*
> @@ -52,6 +57,8 @@ struct mpam_msc {
>  	 */
>  	struct mutex		probe_lock;
>  	bool			probed;
> +	bool			error_irq_requested;
> +	bool			error_irq_hw_enabled;
>  	u16			partid_max;
>  	u8			pmg_max;
>  	unsigned long		ris_idxs[128 / BITS_PER_LONG];
> @@ -281,7 +288,7 @@ extern u8 mpam_pmg_max;
>  
>  /* Scheduled work callback to enable mpam once all MSC have been probed */
>  void mpam_enable(struct work_struct *work);
> -void mpam_disable(void);
> +void mpam_disable(struct work_struct *work);
>  
>  int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
>  				   cpumask_t *affinity);
Thanks,
Ben
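
For reference, the ESR read/clear pattern quoted above can be modelled in
userspace. This is only a sketch: struct fake_msc and the fake_* helpers are
hypothetical stand-ins for the MMIO accessors, and the register contents used
with them are arbitrary values, not real MPAMF_ESR field layouts.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an MSC: two 32-bit words model
 * MPAMF_ESR and MPAMF_ESR + 4.
 */
struct fake_msc {
	uint32_t esr[2];
	bool has_extd_esr;
};

/*
 * Same shape as mpam_msc_read_esr(): the high half is only read
 * when the MSC advertises the extended ESR.
 */
static uint64_t fake_read_esr(const struct fake_msc *msc)
{
	uint64_t esr_high = 0, esr_low = msc->esr[0];

	if (msc->has_extd_esr)
		esr_high = msc->esr[1];

	return (esr_high << 32) | esr_low;
}

/*
 * Same shape as mpam_msc_zero_esr(): the high word is only written
 * when the extended ESR is present.
 */
static void fake_zero_esr(struct fake_msc *msc)
{
	msc->esr[0] = 0;
	if (msc->has_extd_esr)
		msc->esr[1] = 0;
}
```

Widening the high half to 64 bits before the shift matters: shifting a 32-bit
value left by 32 would be undefined in C.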
- * [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (55 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 22/33] arm_mpam: Register and enable IRQs James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
                   ` (10 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Once all the MSC have been probed, the system-wide usable number of
PARTID is known and the configuration arrays can be allocated.
After this point, checking whether all the MSC have been probed is
pointless, and the cpuhp callbacks should restore the configuration
instead of just resetting the MSC.
Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 8 ++++++++
 drivers/resctrl/mpam_internal.h | 8 ++++++++
 2 files changed, 16 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 210d64fad0b1..b424af666b1e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -33,6 +33,8 @@
 
 #include "mpam_internal.h"
 
+DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* TODO: move to arch code */
+
 /*
  * mpam_list_lock protects the SRCU lists when writing. Once the
  * mpam_enabled key is enabled these lists are read-only,
@@ -1039,6 +1041,9 @@ static int mpam_discovery_cpu_online(unsigned int cpu)
 	struct mpam_msc *msc;
 	bool new_device_probed = false;
 
+	if (mpam_is_enabled())
+		return 0;
+
 	mutex_lock(&mpam_list_lock);
 	list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
 		if (!cpumask_test_cpu(cpu, &msc->accessibility))
@@ -1835,6 +1840,7 @@ static void mpam_enable_once(void)
 		return;
 	}
 
+	static_branch_enable(&mpam_enabled);
 	mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
 
 	printk(KERN_INFO "MPAM enabled with %u partid and %u pmg\n",
@@ -1902,6 +1908,8 @@ static irqreturn_t mpam_disable_thread(int irq, void *dev_id)
 	}
 	mutex_unlock(&mpam_cpuhp_state_lock);
 
+	static_branch_disable(&mpam_enabled);
+
 	mpam_unregister_irqs();
 
 	idx = srcu_read_lock(&mpam_srcu);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index c9418c9cf9f2..3476ee97f8ac 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -8,6 +8,7 @@
 #include <linux/atomic.h>
 #include <linux/cpumask.h>
 #include <linux/io.h>
+#include <linux/jump_label.h>
 #include <linux/llist.h>
 #include <linux/mailbox_client.h>
 #include <linux/mutex.h>
@@ -15,6 +16,13 @@
 #include <linux/sizes.h>
 #include <linux/srcu.h>
 
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+
+static inline bool mpam_is_enabled(void)
+{
+	return static_branch_likely(&mpam_enabled);
+}
+
 /*
  * Structures protected by SRCU may not be freed for a surprising amount of
  * time (especially if perf is running). To ensure the MPAM error interrupt can
-- 
2.20.1
- * [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (56 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 23/33] arm_mpam: Use a static key to indicate when mpam is enabled James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
                   ` (9 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Dave Martin
When CPUs come online the original configuration should be restored.
Once the maximum partid is known, allocate a configuration array for
each component, and reprogram each RIS configuration from this.
The MPAM spec describes how multiple controls can interact. To prevent
this happening by accident, always reset controls that don't have a
valid configuration. This allows the same helper to be used for
configuration and reset.
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Added a comment about the ordering around max_partid.
 * Allocate configurations after interrupts are registered to reduce churn.
 * Added mpam_assert_partid_sizes_fixed();
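
The commit message's point that one helper can serve both configuration and
reset can be sketched in userspace C. This is a minimal model, not the driver
code: the fake_* structures, the FEAT_* bits and reset_bitmap() are
hypothetical, and the all-ones reset value is an assumption about what
mpam_reset_msc_bitmap() writes.

```c
#include <stdint.h>

/* Hypothetical feature bits; the driver tests these with mpam_has_feature(). */
enum { FEAT_CPOR_PART = 1u << 0, FEAT_MBW_PART = 1u << 1 };

struct fake_props {		/* what the hardware (RIS) supports */
	uint32_t features;
	uint16_t cpbm_wd;
	uint16_t mbw_pbm_bits;
};

struct fake_config {		/* what software asked for */
	uint32_t features;
	uint32_t cpbm;
	uint32_t mbw_pbm;
};

struct fake_regs {		/* stand-in for the MSC's control registers */
	uint32_t cpbm;
	uint32_t mbw_pbm;
};

/* Assumed reset value: every implemented bit set (width capped at 32). */
static uint32_t reset_bitmap(uint16_t width)
{
	return width >= 32 ? UINT32_MAX : (1u << width) - 1;
}

/*
 * For each control the hardware has, write the configured value if
 * the config carries one, otherwise the reset value. An empty config
 * therefore resets every control.
 */
static void reprogram_partid(struct fake_regs *regs,
			     const struct fake_props *props,
			     const struct fake_config *cfg)
{
	if (props->features & FEAT_CPOR_PART) {
		if (cfg->features & FEAT_CPOR_PART)
			regs->cpbm = cfg->cpbm;
		else
			regs->cpbm = reset_bitmap(props->cpbm_wd);
	}

	if (props->features & FEAT_MBW_PART) {
		if (cfg->features & FEAT_MBW_PART)
			regs->mbw_pbm = cfg->mbw_pbm;
		else
			regs->mbw_pbm = reset_bitmap(props->mbw_pbm_bits);
	}
}
```

Passing a zeroed fake_config gives the reset path, so no separate reset
helper is needed in this model.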
---
 drivers/resctrl/mpam_devices.c  | 253 +++++++++++++++++++++++++++++---
 drivers/resctrl/mpam_internal.h |  26 +++-
 2 files changed, 251 insertions(+), 28 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b424af666b1e..8f6df2406c22 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -112,6 +112,16 @@ LIST_HEAD(mpam_classes);
 /* List of all objects that can be free()d after synchronise_srcu() */
 static LLIST_HEAD(mpam_garbage);
 
+/*
+ * Once mpam is enabled, new requestors cannot further reduce the available
+ * partid. Assert that the size is fixed, and new requestors will be turned
+ * away.
+ */
+static void mpam_assert_partid_sizes_fixed(void)
+{
+	WARN_ON_ONCE(!partid_max_published);
+}
+
 static u32 __mpam_read_reg(struct mpam_msc *msc, u16 reg)
 {
 	WARN_ON_ONCE(reg + sizeof(u32) > msc->mapped_hwpage_sz);
@@ -374,12 +384,16 @@ static void mpam_class_destroy(struct mpam_class *class)
 	add_to_garbage(class);
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp);
+
 static void mpam_comp_destroy(struct mpam_component *comp)
 {
 	struct mpam_class *class = comp->class;
 
 	lockdep_assert_held(&mpam_list_lock);
 
+	__destroy_component_cfg(comp);
+
 	list_del_rcu(&comp->class_list);
 	add_to_garbage(comp);
 
@@ -911,51 +925,90 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 	__mpam_write_reg(msc, reg, bm);
 }
 
-static void mpam_reset_ris_partid(struct mpam_msc_ris *ris, u16 partid)
+/* Called via IPI. Call while holding an SRCU reference */
+static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
+				      struct mpam_config *cfg)
 {
 	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
 
-	mpam_assert_srcu_read_lock_held();
-
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
-	if (mpam_has_feature(mpam_feat_cpor_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM, rprops->cpbm_wd);
+	if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
+		if (mpam_has_feature(mpam_feat_cpor_part, cfg))
+			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
+		else
+			mpam_reset_msc_bitmap(msc, MPAMCFG_CPBM,
+					      rprops->cpbm_wd);
+	}
 
-	if (mpam_has_feature(mpam_feat_mbw_part, rprops))
-		mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM, rprops->mbw_pbm_bits);
+	if (mpam_has_feature(mpam_feat_mbw_part, rprops)) {
+		if (mpam_has_feature(mpam_feat_mbw_part, cfg))
+			mpam_write_partsel_reg(msc, MBW_PBM, cfg->mbw_pbm);
+		else
+			mpam_reset_msc_bitmap(msc, MPAMCFG_MBW_PBM,
+					      rprops->mbw_pbm_bits);
+	}
 
 	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
 		mpam_write_partsel_reg(msc, MBW_MIN, 0);
 
-	if (mpam_has_feature(mpam_feat_mbw_max, rprops))
-		mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+	if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
+		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
+			mpam_write_partsel_reg(msc, MBW_MAX, cfg->mbw_max);
+		else
+			mpam_write_partsel_reg(msc, MBW_MAX, bwa_fract);
+	}
 
 	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
 		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
 	mutex_unlock(&msc->part_sel_lock);
 }
 
+struct reprogram_ris {
+	struct mpam_msc_ris *ris;
+	struct mpam_config *cfg;
+};
+
+/* Call with MSC lock held */
+static int mpam_reprogram_ris(void *_arg)
+{
+	u16 partid, partid_max;
+	struct reprogram_ris *arg = _arg;
+	struct mpam_msc_ris *ris = arg->ris;
+	struct mpam_config *cfg = arg->cfg;
+
+	if (ris->in_reset_state)
+		return 0;
+
+	spin_lock(&partid_max_lock);
+	partid_max = mpam_partid_max;
+	spin_unlock(&partid_max_lock);
+	for (partid = 0; partid <= partid_max; partid++)
+		mpam_reprogram_ris_partid(ris, partid, cfg);
+
+	return 0;
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible.
  */
 static int mpam_reset_ris(void *arg)
 {
-	u16 partid, partid_max;
 	struct mpam_msc_ris *ris = arg;
+	struct reprogram_ris reprogram_arg;
+	struct mpam_config empty_cfg = { 0 };
 
 	if (ris->in_reset_state)
 		return 0;
 
-	spin_lock(&partid_max_lock);
-	partid_max = mpam_partid_max;
-	spin_unlock(&partid_max_lock);
-	for (partid = 0; partid < partid_max; partid++)
-		mpam_reset_ris_partid(ris, partid);
+	reprogram_arg.ris = ris;
+	reprogram_arg.cfg = &empty_cfg;
+
+	mpam_reprogram_ris(&reprogram_arg);
 
 	return 0;
 }
@@ -986,13 +1039,11 @@ static int mpam_touch_msc(struct mpam_msc *msc, int (*fn)(void *a), void *arg)
 
 static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 {
-	int idx;
 	struct mpam_msc_ris *ris;
 
 	mpam_assert_srcu_read_lock_held();
 
 	mpam_mon_sel_outer_lock(msc);
-	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_srcu(ris, &msc->ris, msc_list, srcu_read_lock_held(&mpam_srcu)) {
 		mpam_touch_msc(msc, &mpam_reset_ris, ris);
 
@@ -1002,10 +1053,42 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 */
 		ris->in_reset_state = online;
 	}
-	srcu_read_unlock(&mpam_srcu, idx);
 	mpam_mon_sel_outer_unlock(msc);
 }
 
+static void mpam_reprogram_msc(struct mpam_msc *msc)
+{
+	u16 partid;
+	bool reset;
+	struct mpam_config *cfg;
+	struct mpam_msc_ris *ris;
+
+	/*
+	 * No lock for mpam_partid_max as partid_max_published has been
+	 * set by mpam_enable(), so the values can no longer change.
+	 */
+	mpam_assert_partid_sizes_fixed();
+
+	guard(srcu)(&mpam_srcu);
+	list_for_each_entry_rcu(ris, &msc->ris, msc_list) {
+		if (!mpam_is_enabled() && !ris->in_reset_state) {
+			mpam_touch_msc(msc, &mpam_reset_ris, ris);
+			ris->in_reset_state = true;
+			continue;
+		}
+
+		reset = true;
+		for (partid = 0; partid <= mpam_partid_max; partid++) {
+			cfg = &ris->vmsc->comp->cfg[partid];
+			if (cfg->features)
+				reset = false;
+
+			mpam_reprogram_ris_partid(ris, partid, cfg);
+		}
+		ris->in_reset_state = reset;
+	}
+}
+
 static void _enable_percpu_irq(void *_irq)
 {
 	int *irq = _irq;
@@ -1027,7 +1110,7 @@ static int mpam_cpu_online(unsigned int cpu)
 			_enable_percpu_irq(&msc->reenable_error_ppi);
 
 		if (atomic_fetch_inc(&msc->online_refs) == 0)
-			mpam_reset_msc(msc, true);
+			mpam_reprogram_msc(msc);
 	}
 	srcu_read_unlock(&mpam_srcu, idx);
 
@@ -1807,6 +1890,45 @@ static void mpam_unregister_irqs(void)
 	cpus_read_unlock();
 }
 
+static void __destroy_component_cfg(struct mpam_component *comp)
+{
+	add_to_garbage(comp->cfg);
+}
+
+static int __allocate_component_cfg(struct mpam_component *comp)
+{
+	mpam_assert_partid_sizes_fixed();
+
+	if (comp->cfg)
+		return 0;
+
+	comp->cfg = kcalloc(mpam_partid_max + 1, sizeof(*comp->cfg), GFP_KERNEL);
+	if (!comp->cfg)
+		return -ENOMEM;
+	init_garbage(comp->cfg);
+
+	return 0;
+}
+
+static int mpam_allocate_config(void)
+{
+	int err = 0;
+	struct mpam_class *class;
+	struct mpam_component *comp;
+
+	lockdep_assert_held(&mpam_list_lock);
+
+	list_for_each_entry(class, &mpam_classes, classes_list) {
+		list_for_each_entry(comp, &class->components, class_list) {
+			err = __allocate_component_cfg(comp);
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void mpam_enable_once(void)
 {
 	int err;
@@ -1826,12 +1948,21 @@ static void mpam_enable_once(void)
 	 */
 	cpus_read_lock();
 	mutex_lock(&mpam_list_lock);
-	mpam_enable_merge_features(&mpam_classes);
+	do {
+		mpam_enable_merge_features(&mpam_classes);
 
-	err = mpam_register_irqs();
-	if (err)
-		pr_warn("Failed to register irqs: %d\n", err);
+		err = mpam_register_irqs();
+		if (err) {
+			pr_warn("Failed to register irqs: %d\n", err);
+			break;
+		}
 
+		err = mpam_allocate_config();
+		if (err) {
+			pr_err("Failed to allocate configuration arrays.\n");
+			break;
+		}
+	} while (0);
 	mutex_unlock(&mpam_list_lock);
 	cpus_read_unlock();
 
@@ -1856,6 +1987,9 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
 
 	might_sleep();
 	lockdep_assert_cpus_held();
+	mpam_assert_partid_sizes_fixed();
+
+	memset(comp->cfg, 0, ((mpam_partid_max + 1) * sizeof(*comp->cfg)));
 
 	idx = srcu_read_lock(&mpam_srcu);
 	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
@@ -1960,6 +2094,79 @@ void mpam_enable(struct work_struct *work)
 		mpam_enable_once();
 }
 
+struct mpam_write_config_arg {
+	struct mpam_msc_ris *ris;
+	struct mpam_component *comp;
+	u16 partid;
+};
+
+static int __write_config(void *arg)
+{
+	struct mpam_write_config_arg *c = arg;
+
+	mpam_reprogram_ris_partid(c->ris, c->partid, &c->comp->cfg[c->partid]);
+
+	return 0;
+}
+
+#define maybe_update_config(cfg, feature, newcfg, member, changes) do { \
+	if (mpam_has_feature(feature, newcfg) &&			\
+	    (newcfg)->member != (cfg)->member) {			\
+		(cfg)->member = (newcfg)->member;			\
+		(cfg)->features |= (1 << feature);			\
+									\
+		(changes) |= (1 << feature);				\
+	}								\
+} while (0)
+
+static mpam_features_t mpam_update_config(struct mpam_config *cfg,
+					  const struct mpam_config *newcfg)
+{
+	mpam_features_t changes = 0;
+
+	maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, changes);
+	maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, changes);
+	maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, changes);
+
+	return changes;
+}
+
+/* TODO: split into write_config/sync_config */
+/* TODO: add config_dirty bitmap to drive sync_config */
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg)
+{
+	struct mpam_write_config_arg arg;
+	struct mpam_msc_ris *ris;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc *msc;
+	int idx;
+
+	lockdep_assert_cpus_held();
+
+	/* Don't pass in the current config! */
+	WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+
+	if (!mpam_update_config(&comp->cfg[partid], cfg))
+		return 0;
+
+	arg.comp = comp;
+	arg.partid = partid;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			arg.ris = ris;
+			mpam_touch_msc(msc, __write_config, &arg);
+		}
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
+	return 0;
+}
+
 /*
  * MSC that are hidden under caches are not created as platform devices
  * as there is no cache driver. Caches are also special-cased in
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 3476ee97f8ac..70cba9f22746 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -191,11 +191,7 @@ struct mpam_props {
 	u16			num_mbwu_mon;
 };
 
-static inline bool mpam_has_feature(enum mpam_device_features feat,
-				    struct mpam_props *props)
-{
-	return (1 << feat) & props->features;
-}
+#define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
 
 static inline void mpam_set_feature(enum mpam_device_features feat,
 				    struct mpam_props *props)
@@ -226,6 +222,17 @@ struct mpam_class {
 	struct mpam_garbage	garbage;
 };
 
+struct mpam_config {
+	/* Which configuration values are valid. 0 is used for reset */
+	mpam_features_t		features;
+
+	u32	cpbm;
+	u32	mbw_pbm;
+	u16	mbw_max;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_component {
 	u32			comp_id;
 
@@ -234,6 +241,12 @@ struct mpam_component {
 
 	cpumask_t		affinity;
 
+	/*
+	 * Array of configuration values, indexed by partid.
+	 * Read from cpuhp callbacks, hold the cpuhp lock when writing.
+	 */
+	struct mpam_config	*cfg;
+
 	/* member of mpam_class:components */
 	struct list_head	class_list;
 
@@ -298,6 +311,9 @@ extern u8 mpam_pmg_max;
 void mpam_enable(struct work_struct *work);
 void mpam_disable(struct work_struct *work);
 
+int mpam_apply_config(struct mpam_component *comp, u16 partid,
+		      struct mpam_config *cfg);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.20.1
- * [PATCH 25/33] arm_mpam: Probe and reset the rest of the features
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (57 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 24/33] arm_mpam: Allow configuration to be applied and restored during cpu online James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
                   ` (8 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich, Rohit Mathew,
	Zeng Heng, Dave Martin
MPAM supports more features than are going to be exposed to resctrl.
For partid other than 0, the reset values of these controls aren't
known.
Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.
PARTID narrowing allows MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce the partid_max to make it usable. This allows us
to map a PARTID to itself.
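
As a rough model of the narrowing rule: with an identity PARTID to intPARTID mapping, the usable limit on an MSC is the smaller of its PARTID limit and its internal PARTID limit. `struct msc_limits` and `usable_partid_max()` below are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Hypothetical description of one MSC as discovered at probe time. */
struct msc_limits {
	uint16_t partid_max;     /* largest PARTID the MSC accepts */
	int has_partid_nrw;      /* non-zero if PARTID narrowing is implemented */
	uint16_t intpartid_max;  /* largest internal PARTID, if narrowing */
};

/*
 * Largest PARTID usable on this MSC when every PARTID is mapped to the
 * internal PARTID of the same number.
 */
static uint16_t usable_partid_max(const struct msc_limits *m)
{
	if (m->has_partid_nrw && m->intpartid_max < m->partid_max)
		return m->intpartid_max;

	return m->partid_max;
}
```

Clamping the limit this way trades away some PARTID space so that no remapping table has to be managed at run time.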
CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 175 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  16 ++-
 2 files changed, 189 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8f6df2406c22..aedd743d6827 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -213,6 +213,15 @@ static void __mpam_part_sel(u8 ris_idx, u16 partid, struct mpam_msc *msc)
 	__mpam_part_sel_raw(partsel, msc);
 }
 
+static void __mpam_intpart_sel(u8 ris_idx, u16 intpartid, struct mpam_msc *msc)
+{
+	u32 partsel = FIELD_PREP(MPAMCFG_PART_SEL_RIS, ris_idx) |
+		      FIELD_PREP(MPAMCFG_PART_SEL_PARTID_SEL, intpartid) |
+		      MPAMCFG_PART_SEL_INTERNAL;
+
+	__mpam_part_sel_raw(partsel, msc);
+}
+
 int mpam_register_requestor(u16 partid_max, u8 pmg_max)
 {
 	int err = 0;
@@ -743,10 +752,35 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 	int err;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *props = &ris->props;
+	struct mpam_class *class = ris->vmsc->comp->class;
 
 	lockdep_assert_held(&msc->probe_lock);
 	lockdep_assert_held(&msc->part_sel_lock);
 
+	/* Cache Capacity Partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_CCAP_PART, ris->idr)) {
+		u32 ccap_features = mpam_read_partsel_reg(msc, CCAP_IDR);
+
+		props->cmax_wd = FIELD_GET(MPAMF_CCAP_IDR_CMAX_WD, ccap_features);
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMAX_SOFTLIM, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_softlim, props);
+
+		if (props->cmax_wd &&
+		    !FIELD_GET(MPAMF_CCAP_IDR_NO_CMAX, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmax, props);
+
+		if (props->cmax_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CMIN, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cmin, props);
+
+		props->cassoc_wd = FIELD_GET(MPAMF_CCAP_IDR_CASSOC_WD, ccap_features);
+
+		if (props->cassoc_wd &&
+		    FIELD_GET(MPAMF_CCAP_IDR_HAS_CASSOC, ccap_features))
+			mpam_set_feature(mpam_feat_cmax_cassoc, props);
+	}
+
 	/* Cache Portion partitioning */
 	if (FIELD_GET(MPAMF_IDR_HAS_CPOR_PART, ris->idr)) {
 		u32 cpor_features = mpam_read_partsel_reg(msc, CPOR_IDR);
@@ -769,6 +803,31 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 		props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
 		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
 			mpam_set_feature(mpam_feat_mbw_max, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MIN, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_min, props);
+
+		if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_PROP, mbw_features))
+			mpam_set_feature(mpam_feat_mbw_prop, props);
+	}
+
+	/* Priority partitioning */
+	if (FIELD_GET(MPAMF_IDR_HAS_PRI_PART, ris->idr)) {
+		u32 pri_features = mpam_read_partsel_reg(msc, PRI_IDR);
+
+		props->intpri_wd = FIELD_GET(MPAMF_PRI_IDR_INTPRI_WD, pri_features);
+		if (props->intpri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_INTPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_intpri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_INTPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_intpri_part_0_low, props);
+		}
+
+		props->dspri_wd = FIELD_GET(MPAMF_PRI_IDR_DSPRI_WD, pri_features);
+		if (props->dspri_wd && FIELD_GET(MPAMF_PRI_IDR_HAS_DSPRI, pri_features)) {
+			mpam_set_feature(mpam_feat_dspri_part, props);
+			if (FIELD_GET(MPAMF_PRI_IDR_DSPRI_0_IS_LOW, pri_features))
+				mpam_set_feature(mpam_feat_dspri_part_0_low, props);
+		}
 	}
 
 	/* Performance Monitoring */
@@ -832,6 +891,21 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			 */
 		}
 	}
+
+	/*
+	 * RIS with PARTID narrowing don't have enough storage for one
+	 * configuration per PARTID. If these are in a class we could use,
+	 * reduce the supported partid_max to match the number of intpartid.
+	 * If the class is unknown, just ignore it.
+	 */
+	if (FIELD_GET(MPAMF_IDR_HAS_PARTID_NRW, ris->idr) &&
+	    class->type != MPAM_CLASS_UNKNOWN) {
+		u32 nrwidr = mpam_read_partsel_reg(msc, PARTID_NRW_IDR);
+		u16 partid_max = FIELD_GET(MPAMF_PARTID_NRW_IDR_INTPARTID_MAX, nrwidr);
+
+		mpam_set_feature(mpam_feat_partid_nrw, props);
+		msc->partid_max = min(msc->partid_max, partid_max);
+	}
 }
 
 static int mpam_msc_hw_probe(struct mpam_msc *msc)
@@ -929,13 +1003,29 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 				      struct mpam_config *cfg)
 {
+	u32 pri_val = 0;
+	u16 cmax = MPAMCFG_CMAX_CMAX;
 	u16 bwa_fract = MPAMCFG_MBW_MAX_MAX;
 	struct mpam_msc *msc = ris->vmsc->msc;
 	struct mpam_props *rprops = &ris->props;
+	u16 dspri = GENMASK(rprops->dspri_wd, 0);
+	u16 intpri = GENMASK(rprops->intpri_wd, 0);
 
 	mutex_lock(&msc->part_sel_lock);
 	__mpam_part_sel(ris->ris_idx, partid, msc);
 
+	if (mpam_has_feature(mpam_feat_partid_nrw, rprops)) {
+		/* Update the intpartid mapping */
+		mpam_write_partsel_reg(msc, INTPARTID,
+				       MPAMCFG_INTPARTID_INTERNAL | partid);
+
+		/*
+		 * Then switch to the 'internal' partid to update the
+		 * configuration.
+		 */
+		__mpam_intpart_sel(ris->ris_idx, partid, msc);
+	}
+
 	if (mpam_has_feature(mpam_feat_cpor_part, rprops)) {
 		if (mpam_has_feature(mpam_feat_cpor_part, cfg))
 			mpam_write_partsel_reg(msc, CPBM, cfg->cpbm);
@@ -964,6 +1054,29 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 
 	if (mpam_has_feature(mpam_feat_mbw_prop, rprops))
 		mpam_write_partsel_reg(msc, MBW_PROP, bwa_fract);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmax, rprops))
+		mpam_write_partsel_reg(msc, CMAX, cmax);
+
+	if (mpam_has_feature(mpam_feat_cmax_cmin, rprops))
+		mpam_write_partsel_reg(msc, CMIN, 0);
+
+	if (mpam_has_feature(mpam_feat_intpri_part, rprops) ||
+	    mpam_has_feature(mpam_feat_dspri_part, rprops)) {
+		/* aces high? */
+		if (!mpam_has_feature(mpam_feat_intpri_part_0_low, rprops))
+			intpri = 0;
+		if (!mpam_has_feature(mpam_feat_dspri_part_0_low, rprops))
+			dspri = 0;
+
+		if (mpam_has_feature(mpam_feat_intpri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_INTPRI, intpri);
+		if (mpam_has_feature(mpam_feat_dspri_part, rprops))
+			pri_val |= FIELD_PREP(MPAMCFG_PRI_DSPRI, dspri);
+
+		mpam_write_partsel_reg(msc, PRI, pri_val);
+	}
+
 	mutex_unlock(&msc->part_sel_lock);
 }
 
@@ -1529,6 +1642,16 @@ static bool mpam_has_bwa_wd_feature(struct mpam_props *props)
 	return false;
 }
 
+/* Any of these features mean the CMAX_WD field is valid. */
+static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
+{
+	if (mpam_has_feature(mpam_feat_cmax_cmax, props))
+		return true;
+	if (mpam_has_feature(mpam_feat_cmax_cmin, props))
+		return true;
+	return false;
+}
+
 #define MISMATCHED_HELPER(parent, child, helper, field, alias)		\
 	helper(parent) &&						\
 	((helper(child) && (parent)->field != (child)->field) ||	\
@@ -1583,6 +1706,23 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->bwa_wd = min(parent->bwa_wd, child->bwa_wd);
 	}
 
+	if (alias && !mpam_has_cmax_wd_feature(parent) && mpam_has_cmax_wd_feature(child)) {
+		parent->cmax_wd = child->cmax_wd;
+	} else if (MISMATCHED_HELPER(parent, child, mpam_has_cmax_wd_feature,
+				     cmax_wd, alias)) {
+		pr_debug("%s took the min cmax_wd\n", __func__);
+		parent->cmax_wd = min(parent->cmax_wd, child->cmax_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_cmax_cassoc, alias)) {
+		parent->cassoc_wd = child->cassoc_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_cmax_cassoc,
+				   cassoc_wd, alias)) {
+		pr_debug("%s cleared cassoc_wd\n", __func__);
+		mpam_clear_feature(mpam_feat_cmax_cassoc, &parent->features);
+		parent->cassoc_wd = 0;
+	}
+
 	/* For num properties, take the minimum */
 	if (CAN_MERGE_FEAT(parent, child, mpam_feat_msmon_csu, alias)) {
 		parent->num_csu_mon = child->num_csu_mon;
@@ -1600,6 +1740,41 @@ static void __props_mismatch(struct mpam_props *parent,
 		parent->num_mbwu_mon = min(parent->num_mbwu_mon, child->num_mbwu_mon);
 	}
 
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_intpri_part, alias)) {
+		parent->intpri_wd = child->intpri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_intpri_part,
+				   intpri_wd, alias)) {
+		pr_debug("%s took the min intpri_wd\n", __func__);
+		parent->intpri_wd = min(parent->intpri_wd, child->intpri_wd);
+	}
+
+	if (CAN_MERGE_FEAT(parent, child, mpam_feat_dspri_part, alias)) {
+		parent->dspri_wd = child->dspri_wd;
+	} else if (MISMATCHED_FEAT(parent, child, mpam_feat_dspri_part,
+				   dspri_wd, alias)) {
+		pr_debug("%s took the min dspri_wd\n", __func__);
+		parent->dspri_wd = min(parent->dspri_wd, child->dspri_wd);
+	}
+
+	/* TODO: alias support for these two */
+	/* {int,ds}pri may not have differing 0-low behaviour */
+	if (mpam_has_feature(mpam_feat_intpri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_intpri_part, child) ||
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_intpri_part_0_low, child))) {
+		pr_debug("%s cleared intpri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_intpri_part, &parent->features);
+		mpam_clear_feature(mpam_feat_intpri_part_0_low, &parent->features);
+	}
+	if (mpam_has_feature(mpam_feat_dspri_part, parent) &&
+	    (!mpam_has_feature(mpam_feat_dspri_part, child) ||
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, parent) !=
+	     mpam_has_feature(mpam_feat_dspri_part_0_low, child))) {
+		pr_debug("%s cleared dspri_part\n", __func__);
+		mpam_clear_feature(mpam_feat_dspri_part, &parent->features);
+		mpam_clear_feature(mpam_feat_dspri_part_0_low, &parent->features);
+	}
+
 	if (alias) {
 		/* Merge features for aliased resources */
 		parent->features |= child->features;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 70cba9f22746..23445aedbabd 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -157,16 +157,23 @@ static inline void mpam_mon_sel_lock_held(struct mpam_msc *msc)
  * When we compact the supported features, we don't care what they are.
  * Storing them as a bitmap makes life easy.
  */
-typedef u16 mpam_features_t;
+typedef u32 mpam_features_t;
 
 /* Bits for mpam_features_t */
 enum mpam_device_features {
-	mpam_feat_ccap_part = 0,
+	mpam_feat_cmax_softlim,
+	mpam_feat_cmax_cmax,
+	mpam_feat_cmax_cmin,
+	mpam_feat_cmax_cassoc,
 	mpam_feat_cpor_part,
 	mpam_feat_mbw_part,
 	mpam_feat_mbw_min,
 	mpam_feat_mbw_max,
 	mpam_feat_mbw_prop,
+	mpam_feat_intpri_part,
+	mpam_feat_intpri_part_0_low,
+	mpam_feat_dspri_part,
+	mpam_feat_dspri_part_0_low,
 	mpam_feat_msmon,
 	mpam_feat_msmon_csu,
 	mpam_feat_msmon_csu_capture,
@@ -176,6 +183,7 @@ enum mpam_device_features {
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
 	mpam_feat_msmon_capt,
+	mpam_feat_partid_nrw,
 	MPAM_FEATURE_LAST,
 };
 static_assert(BITS_PER_TYPE(mpam_features_t) >= MPAM_FEATURE_LAST);
@@ -187,6 +195,10 @@ struct mpam_props {
 	u16			cpbm_wd;
 	u16			mbw_pbm_bits;
 	u16			bwa_wd;
+	u16			cmax_wd;
+	u16			cassoc_wd;
+	u16			intpri_wd;
+	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
 };
-- 
2.20.1
- * [PATCH 26/33] arm_mpam: Add helpers to allocate monitors
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (58 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 25/33] arm_mpam: Probe and reset the rest of the features James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
                   ` (7 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
MPAM's MSC support a number of monitors, each of which supports
bandwidth counters, or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
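
For readers unfamiliar with ID allocators, the behaviour the helpers rely on can be sketched in userspace with a bitmap. This is not the kernel's IDA; `struct mon_pool`, `mon_alloc()` and `mon_free()` are hypothetical, and the 64-monitor cap exists only to keep the sketch small.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical pool of monitors; bit n set means monitor n is in use. */
struct mon_pool {
	uint64_t used;
	unsigned int num_mon;   /* monitors the hardware provides, at most 64 */
};

/* Returns the lowest free monitor index, or a negative errno. */
static int mon_alloc(struct mon_pool *p)
{
	if (p->num_mon == 0)
		return -EOPNOTSUPP;

	for (unsigned int i = 0; i < p->num_mon && i < 64; i++) {
		if (!(p->used & (UINT64_C(1) << i))) {
			p->used |= UINT64_C(1) << i;
			return (int)i;
		}
	}

	return -ENOSPC;
}

/* Hands a monitor back to the pool; out-of-range values are ignored. */
static void mon_free(struct mon_pool *p, int mon)
{
	if (mon >= 0 && mon < 64)
		p->used &= ~(UINT64_C(1) << mon);
}
```

A caller would allocate a monitor, program and read it, then free it so another control group can reuse the index.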
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  |  2 ++
 drivers/resctrl/mpam_internal.h | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index aedd743d6827..e7e00c632512 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -348,6 +348,8 @@ mpam_class_alloc(u8 level_idx, enum mpam_class_types type, gfp_t gfp)
 	class->level = level_idx;
 	class->type = type;
 	INIT_LIST_HEAD_RCU(&class->classes_list);
+	ida_init(&class->ida_csu_mon);
+	ida_init(&class->ida_mbwu_mon);
 
 	list_add_rcu(&class->classes_list, &mpam_classes);
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 23445aedbabd..4981de120869 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -231,6 +231,9 @@ struct mpam_class {
 	/* member of mpam_classes */
 	struct list_head	classes_list;
 
+	struct ida		ida_csu_mon;
+	struct ida		ida_mbwu_mon;
+
 	struct mpam_garbage	garbage;
 };
 
@@ -306,6 +309,38 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+static inline int mpam_alloc_csu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_range(&class->ida_csu_mon, 0, cprops->num_csu_mon - 1,
+			       GFP_KERNEL);
+}
+
+static inline void mpam_free_csu_mon(struct mpam_class *class, int csu_mon)
+{
+	ida_free(&class->ida_csu_mon, csu_mon);
+}
+
+static inline int mpam_alloc_mbwu_mon(struct mpam_class *class)
+{
+	struct mpam_props *cprops = &class->props;
+
+	if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+		return -EOPNOTSUPP;
+
+	return ida_alloc_range(&class->ida_mbwu_mon, 0,
+			       cprops->num_mbwu_mon - 1, GFP_KERNEL);
+}
+
+static inline void mpam_free_mbwu_mon(struct mpam_class *class, int mbwu_mon)
+{
+	ida_free(&class->ida_mbwu_mon, mbwu_mon);
+}
+
 /* List of all classes - protected by srcu*/
 extern struct srcu_struct mpam_srcu;
 extern struct list_head mpam_classes;
-- 
2.20.1
- * [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (59 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 26/33] arm_mpam: Add helpers to allocate monitors James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
                   ` (6 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSC may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.
Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
not ready, then wait the full timeout value before trying again.
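
The retry policy can be sketched in isolation. Everything below is hypothetical: the callback types, `read_with_nrdy_retry()` and the fake MSC are not part of the driver, and a single retry after the wait is an assumption made for this illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical callbacks: rd returns 0 or -EBUSY, wait sleeps for usecs. */
typedef int (*read_fn)(void *ctx, uint64_t *val);
typedef void (*wait_fn)(void *ctx, unsigned int usecs);

static int read_with_nrdy_retry(read_fn rd, wait_fn wait, void *ctx,
				unsigned int nrdy_usec, uint64_t *val)
{
	int err;

	*val = 0;
	err = rd(ctx, val);
	if (err != -EBUSY)
		return err;

	/* Not ready: wait the full firmware-provided time, then try once more. */
	wait(ctx, nrdy_usec);
	*val = 0;
	return rd(ctx, val);
}

/* Fake MSC for illustration: reports not-ready for the first nrdy_reads reads. */
struct fake_msc {
	int nrdy_reads;
	uint64_t value;
	unsigned int waited_usec;
};

static int fake_read(void *ctx, uint64_t *val)
{
	struct fake_msc *m = ctx;

	if (m->nrdy_reads > 0) {
		m->nrdy_reads--;
		return -EBUSY;
	}
	*val += m->value;
	return 0;
}

/* Records the requested delay instead of sleeping. */
static void fake_wait(void *ctx, unsigned int usecs)
{
	struct fake_msc *m = ctx;

	m->waited_usec += usecs;
}
```

Waiting the full period rather than polling keeps the caller simple, at the cost of some latency when the monitor becomes ready early.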
CC: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 222 ++++++++++++++++++++++++++++++++
 drivers/resctrl/mpam_internal.h |  18 +++
 2 files changed, 240 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index e7e00c632512..9ce771aaf671 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -973,6 +973,228 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
 	return 0;
 }
 
+struct mon_read {
+	struct mpam_msc_ris		*ris;
+	struct mon_cfg			*ctx;
+	enum mpam_device_features	type;
+	u64				*val;
+	int				err;
+};
+
+static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				   u32 *flt_val)
+{
+	struct mon_cfg *ctx = m->ctx;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = MSMON_CFG_CSU_CTL_TYPE_CSU;
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = MSMON_CFG_MBWU_CTL_TYPE_MBWU;
+		break;
+	default:
+		return;
+	}
+
+	/*
+	 * For CSU counters it's implementation-defined what happens when not
+	 * filtering by partid.
+	 */
+	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
+
+	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+	if (m->ctx->match_pmg) {
+		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
+		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
+	}
+
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_rwbw, &m->ris->props))
+		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
+}
+
+static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
+				    u32 *flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_CSU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_CSU_FLT);
+		break;
+	case mpam_feat_msmon_mbwu:
+		*ctl_val = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		*flt_val = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Remove values set by the hardware to prevent apparent mismatches. */
+static void clean_msmon_ctl_val(u32 *cur_ctl)
+{
+	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+}
+
+static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
+				     u32 flt_val)
+{
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+
+	/*
+	 * Write the ctl_val with the enable bit cleared, reset the counter,
+	 * then enable counter.
+	 */
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		mpam_write_monsel_reg(msc, CFG_CSU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, CSU, 0);
+		mpam_write_monsel_reg(msc, CFG_CSU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	case mpam_feat_msmon_mbwu:
+		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+		break;
+	default:
+		return;
+	}
+}
+
+/* Call with MSC lock held */
+static void __ris_msmon_read(void *arg)
+{
+	u64 now;
+	bool nrdy = false;
+	struct mon_read *m = arg;
+	struct mon_cfg *ctx = m->ctx;
+	struct mpam_msc_ris *ris = m->ris;
+	struct mpam_props *rprops = &ris->props;
+	struct mpam_msc *msc = m->ris->vmsc->msc;
+	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
+
+	if (!mpam_mon_sel_inner_lock(msc)) {
+		m->err = -EIO;
+		return;
+	}
+	mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, ctx->mon) |
+		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+	/*
+	 * Read the existing configuration to avoid re-writing the same values.
+	 * This saves waiting for 'nrdy' on subsequent reads.
+	 */
+	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
+	clean_msmon_ctl_val(&cur_ctl);
+	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
+	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
+
+	switch (m->type) {
+	case mpam_feat_msmon_csu:
+		now = mpam_read_monsel_reg(msc, CSU);
+		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	case mpam_feat_msmon_mbwu:
+		now = mpam_read_monsel_reg(msc, MBWU);
+		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+			nrdy = now & MSMON___NRDY;
+		break;
+	default:
+		m->err = -EINVAL;
+		break;
+	}
+	mpam_mon_sel_inner_unlock(msc);
+
+	if (nrdy) {
+		m->err = -EBUSY;
+		return;
+	}
+
+	now = FIELD_GET(MSMON___VALUE, now);
+	*m->val += now;
+}
+
+static int _msmon_read(struct mpam_component *comp, struct mon_read *arg)
+{
+	int err, idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		mpam_mon_sel_outer_lock(msc);
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			arg->ris = ris;
+
+			err = smp_call_function_any(&msc->accessibility,
+						    __ris_msmon_read, arg,
+						    true);
+			if (!err && arg->err)
+				err = arg->err;
+			if (err)
+				break;
+		}
+		mpam_mon_sel_outer_unlock(msc);
+		if (err)
+			break;
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+
+	return err;
+}
+
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features type, u64 *val)
+{
+	int err;
+	struct mon_read arg;
+	u64 wait_jiffies = 0;
+	struct mpam_props *cprops = &comp->class->props;
+
+	might_sleep();
+
+	if (!mpam_is_enabled())
+		return -EIO;
+
+	if (!mpam_has_feature(type, cprops))
+		return -EOPNOTSUPP;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.ctx = ctx;
+	arg.type = type;
+	arg.val = val;
+	*val = 0;
+
+	err = _msmon_read(comp, &arg);
+	if (err == -EBUSY && comp->class->nrdy_usec)
+		wait_jiffies = usecs_to_jiffies(comp->class->nrdy_usec);
+
+	while (wait_jiffies)
+		wait_jiffies = schedule_timeout_uninterruptible(wait_jiffies);
+
+	if (err == -EBUSY) {
+		memset(&arg, 0, sizeof(arg));
+		arg.ctx = ctx;
+		arg.type = type;
+		arg.val = val;
+		*val = 0;
+
+		err = _msmon_read(comp, &arg);
+	}
+
+	return err;
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4981de120869..76e406a2b0d1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -309,6 +309,21 @@ struct mpam_msc_ris {
 	struct mpam_garbage	garbage;
 };
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	u16                     mon;
+	u8                      pmg;
+	bool                    match_pmg;
+	u32                     partid;
+	enum mon_filter_options opts;
+};
+
 static inline int mpam_alloc_csu_mon(struct mpam_class *class)
 {
 	struct mpam_props *cprops = &class->props;
@@ -361,6 +376,9 @@ void mpam_disable(struct work_struct *work);
 int mpam_apply_config(struct mpam_component *comp, u16 partid,
 		      struct mpam_config *cfg);
 
+int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
+		    enum mpam_device_features, u64 *val);
+
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
 
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (60 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 27/33] arm_mpam: Add mpam_msmon_read() to read monitor value James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-28  0:58   ` Fenghua Yu
  2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
                   ` (5 subsequent siblings)
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
Bandwidth counters need to run continuously to correctly reflect the
bandwidth.
The value read may be lower than the previous value read in the case
of overflow and when the hardware is reset due to CPU hotplug.
Add struct msmon_mbwu_state to track the bandwidth counter so that
overflow and power management can be handled.
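As a rough illustration of the overflow case only: the sketch below is not
the driver's code. The names are made up, and it assumes a 31 bit counter
that wraps modulo 2^31 at most once between two reads. Under those
assumptions a wrapping counter can be extended to 64 bits like this:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the hardware counter is 31 bits wide and wraps modulo 2^31. */
#define SKETCH_COUNTER_BITS	31
#define SKETCH_COUNTER_SPAN	(UINT64_C(1) << SKETCH_COUNTER_BITS)

/* Hypothetical software state, loosely modelled on prev_val/correction. */
struct counter_state_sketch {
	uint64_t prev_val;	/* last raw value read from the counter */
	uint64_t correction;	/* total amount lost to wraps seen so far */
};

/* Return a monotonic total given the latest raw reading 'now'. */
static uint64_t counter_extend(struct counter_state_sketch *s, uint64_t now)
{
	/* A smaller reading than last time means the counter wrapped. */
	if (s->prev_val > now)
		s->correction += SKETCH_COUNTER_SPAN;

	s->prev_val = now;
	return now + s->correction;
}

/* Example: one wrap between the second and third reads. */
void counter_extend_example(void)
{
	struct counter_state_sketch s = { 0, 0 };

	assert(counter_extend(&s, 100) == 100);
	assert(counter_extend(&s, SKETCH_COUNTER_SPAN - 5) == SKETCH_COUNTER_SPAN - 5);
	assert(counter_extend(&s, 10) == SKETCH_COUNTER_SPAN + 10);
}
```

A counter that is zeroed by a hardware reset looks the same as a wrap to
this check, which is why the value is saved separately before the hardware
loses power.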
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 163 +++++++++++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  54 ++++++++---
 2 files changed, 200 insertions(+), 17 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9ce771aaf671..11be34b54643 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1004,6 +1004,7 @@ static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 	*ctl_val |= MSMON_CFG_x_CTL_MATCH_PARTID;
 
 	*flt_val = FIELD_PREP(MSMON_CFG_MBWU_FLT_PARTID, ctx->partid);
+	*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_RWBW, ctx->opts);
 	if (m->ctx->match_pmg) {
 		*ctl_val |= MSMON_CFG_x_CTL_MATCH_PMG;
 		*flt_val |= FIELD_PREP(MSMON_CFG_MBWU_FLT_PMG, ctx->pmg);
@@ -1041,6 +1042,7 @@ static void clean_msmon_ctl_val(u32 *cur_ctl)
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 				     u32 flt_val)
 {
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 
 	/*
@@ -1059,20 +1061,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
 		mpam_write_monsel_reg(msc, MBWU, 0);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
+
+		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
+		if (mbwu_state)
+			mbwu_state->prev_val = 0;
+
 		break;
 	default:
 		return;
 	}
 }
 
+static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
+{
+	/* TODO: scaling, and long counters */
+	return GENMASK_ULL(30, 0);
+}
+
 /* Call with MSC lock held */
 static void __ris_msmon_read(void *arg)
 {
-	u64 now;
 	bool nrdy = false;
 	struct mon_read *m = arg;
+	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
 	struct mpam_msc_ris *ris = m->ris;
+	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
 	struct mpam_msc *msc = m->ris->vmsc->msc;
 	u32 mon_sel, ctl_val, flt_val, cur_ctl, cur_flt;
@@ -1100,11 +1114,30 @@ static void __ris_msmon_read(void *arg)
 		now = mpam_read_monsel_reg(msc, CSU);
 		if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
 		now = mpam_read_monsel_reg(msc, MBWU);
 		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
 			nrdy = now & MSMON___NRDY;
+		now = FIELD_GET(MSMON___VALUE, now);
+
+		if (nrdy)
+			break;
+
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (!mbwu_state)
+			break;
+
+		/* Add any pre-overflow value to the mbwu_state->val */
+		if (mbwu_state->prev_val > now)
+			overflow_val = mpam_msmon_overflow_val(ris) - mbwu_state->prev_val;
+
+		mbwu_state->prev_val = now;
+		mbwu_state->correction += overflow_val;
+
+		/* Include bandwidth consumed before the last hardware reset */
+		now += mbwu_state->correction;
 		break;
 	default:
 		m->err = -EINVAL;
@@ -1117,7 +1150,6 @@ static void __ris_msmon_read(void *arg)
 		return;
 	}
 
-	now = FIELD_GET(MSMON___VALUE, now);
 	*m->val += now;
 }
 
@@ -1329,6 +1361,72 @@ static int mpam_reprogram_ris(void *_arg)
 	return 0;
 }
 
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_restore_mbwu_state(void *_ris)
+{
+	int i;
+	struct mon_read mwbu_arg;
+	struct mpam_msc_ris *ris = _ris;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	mpam_mon_sel_outer_lock(msc);
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		if (ris->mbwu_state[i].enabled) {
+			mwbu_arg.ris = ris;
+			mwbu_arg.ctx = &ris->mbwu_state[i].cfg;
+			mwbu_arg.type = mpam_feat_msmon_mbwu;
+
+			__ris_msmon_read(&mwbu_arg);
+		}
+	}
+
+	mpam_mon_sel_outer_unlock(msc);
+
+	return 0;
+}
+
+/* Call with MSC lock and outer mon_sel lock held */
+static int mpam_save_mbwu_state(void *arg)
+{
+	int i;
+	u64 val;
+	struct mon_cfg *cfg;
+	u32 cur_flt, cur_ctl, mon_sel;
+	struct mpam_msc_ris *ris = arg;
+	struct msmon_mbwu_state *mbwu_state;
+	struct mpam_msc *msc = ris->vmsc->msc;
+
+	for (i = 0; i < ris->props.num_mbwu_mon; i++) {
+		mbwu_state = &ris->mbwu_state[i];
+		cfg = &mbwu_state->cfg;
+
+		if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+			return -EIO;
+
+		mon_sel = FIELD_PREP(MSMON_CFG_MON_SEL_MON_SEL, i) |
+			  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
+		mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
+
+		cur_flt = mpam_read_monsel_reg(msc, CFG_MBWU_FLT);
+		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
+		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
+
+		val = mpam_read_monsel_reg(msc, MBWU);
+		mpam_write_monsel_reg(msc, MBWU, 0);
+
+		cfg->mon = i;
+		cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
+		cfg->match_pmg = FIELD_GET(MSMON_CFG_x_CTL_MATCH_PMG, cur_ctl);
+		cfg->partid = FIELD_GET(MSMON_CFG_MBWU_FLT_PARTID, cur_flt);
+		mbwu_state->correction += val;
+		mbwu_state->enabled = FIELD_GET(MSMON_CFG_x_CTL_EN, cur_ctl);
+		mpam_mon_sel_inner_unlock(msc);
+	}
+
+	return 0;
+}
+
 /*
  * Called via smp_call_on_cpu() to prevent migration, while still being
  * pre-emptible.
@@ -1389,6 +1487,9 @@ static void mpam_reset_msc(struct mpam_msc *msc, bool online)
 		 * for non-zero partid may be lost while the CPUs are offline.
 		 */
 		ris->in_reset_state = online;
+
+		if (mpam_is_enabled() && !online)
+			mpam_touch_msc(msc, &mpam_save_mbwu_state, ris);
 	}
 	mpam_mon_sel_outer_unlock(msc);
 }
@@ -1423,6 +1524,9 @@ static void mpam_reprogram_msc(struct mpam_msc *msc)
 			mpam_reprogram_ris_partid(ris, partid, cfg);
 		}
 		ris->in_reset_state = reset;
+
+		if (mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+			mpam_touch_msc(msc, &mpam_restore_mbwu_state, ris);
 	}
 }
 
@@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
 
 static void __destroy_component_cfg(struct mpam_component *comp)
 {
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	lockdep_assert_held(&mpam_list_lock);
+
 	add_to_garbage(comp->cfg);
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		msc = vmsc->msc;
+
+		mpam_mon_sel_outer_lock(msc);
+		if (mpam_mon_sel_inner_lock(msc)) {
+			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
+				add_to_garbage(ris->mbwu_state);
+			mpam_mon_sel_inner_unlock(msc);
+		}
+		mpam_mon_sel_outer_lock(msc);
+	}
 }
 
 static int __allocate_component_cfg(struct mpam_component *comp)
 {
+	int err = 0;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+	struct msmon_mbwu_state *mbwu_state;
+
+	lockdep_assert_held(&mpam_list_lock);
 	mpam_assert_partid_sizes_fixed();
 
 	if (comp->cfg)
@@ -2306,6 +2434,37 @@ static int __allocate_component_cfg(struct mpam_component *comp)
 		return -ENOMEM;
 	init_garbage(comp->cfg);
 
+	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
+		if (!vmsc->props.num_mbwu_mon)
+			continue;
+
+		msc = vmsc->msc;
+		mpam_mon_sel_outer_lock(msc);
+		list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
+			if (!ris->props.num_mbwu_mon)
+				continue;
+
+			mbwu_state = kcalloc(ris->props.num_mbwu_mon,
+					     sizeof(*ris->mbwu_state),
+					     GFP_KERNEL);
+			if (!mbwu_state) {
+				__destroy_component_cfg(comp);
+				err = -ENOMEM;
+				break;
+			}
+
+			if (mpam_mon_sel_inner_lock(msc)) {
+				init_garbage(mbwu_state);
+				ris->mbwu_state = mbwu_state;
+				mpam_mon_sel_inner_unlock(msc);
+			}
+		}
+		mpam_mon_sel_outer_unlock(msc);
+
+		if (err)
+			break;
+	}
+
 	return 0;
 }
 
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 76e406a2b0d1..9a50a5432f4a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -271,6 +271,42 @@ struct mpam_component {
 	struct mpam_garbage	garbage;
 };
 
+/* The values for MSMON_CFG_MBWU_FLT.RWBW */
+enum mon_filter_options {
+	COUNT_BOTH	= 0,
+	COUNT_WRITE	= 1,
+	COUNT_READ	= 2,
+};
+
+struct mon_cfg {
+	/* mon is wider than u16 to hold an out of range 'USE_RMID_IDX' */
+	u32                     mon;
+	u8                      pmg;
+	bool                    match_pmg;
+	u32                     partid;
+	enum mon_filter_options opts;
+};
+
+/*
+ * Changes to enabled and cfg are protected by the msc->lock.
+ * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ */
+struct msmon_mbwu_state {
+	bool		enabled;
+	struct mon_cfg	cfg;
+
+	/* The value last read from the hardware. Used to detect overflow. */
+	u64		prev_val;
+
+	/*
+	 * The value to add to the new reading to account for power management,
+	 * and shifts to trigger the overflow interrupt.
+	 */
+	u64		correction;
+
+	struct mpam_garbage	garbage;
+};
+
 struct mpam_vmsc {
 	/* member of mpam_component:vmsc_list */
 	struct list_head	comp_list;
@@ -306,22 +342,10 @@ struct mpam_msc_ris {
 	/* parent: */
 	struct mpam_vmsc	*vmsc;
 
-	struct mpam_garbage	garbage;
-};
+	/* msmon mbwu configuration is preserved over reset */
+	struct msmon_mbwu_state	*mbwu_state;
 
-/* The values for MSMON_CFG_MBWU_FLT.RWBW */
-enum mon_filter_options {
-	COUNT_BOTH	= 0,
-	COUNT_WRITE	= 1,
-	COUNT_READ	= 2,
-};
-
-struct mon_cfg {
-	u16                     mon;
-	u8                      pmg;
-	bool                    match_pmg;
-	u32                     partid;
-	enum mon_filter_options opts;
+	struct mpam_garbage	garbage;
 };
 
 static inline int mpam_alloc_csu_mon(struct mpam_class *class)
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * Re: [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-28  0:58   ` Fenghua Yu
  2025-09-10 19:29     ` James Morse
  0 siblings, 1 reply; 200+ messages in thread
From: Fenghua Yu @ 2025-08-28  0:58 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi, James,
On 8/22/25 08:30, James Morse wrote:
> Bandwidth counters need to run continuously to correctly reflect the
> bandwidth.
>
> The value read may be lower than the previous value read in the case
> of overflow and when the hardware is reset due to CPU hotplug.
>
> Add struct mbwu_state to track the bandwidth counter to allow overflow
> and power management to be handled.
>
> Signed-off-by: James Morse <james.morse@arm.com>
[SNIP]
> @@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
>   
>   static void __destroy_component_cfg(struct mpam_component *comp)
>   {
> +	struct mpam_msc *msc;
> +	struct mpam_vmsc *vmsc;
> +	struct mpam_msc_ris *ris;
> +
> +	lockdep_assert_held(&mpam_list_lock);
> +
>   	add_to_garbage(comp->cfg);
> +	list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
> +		msc = vmsc->msc;
> +
> +		mpam_mon_sel_outer_lock(msc);
> +		if (mpam_mon_sel_inner_lock(msc)) {
> +			list_for_each_entry(ris, &vmsc->ris, vmsc_list)
> +				add_to_garbage(ris->mbwu_state);
> +			mpam_mon_sel_inner_unlock(msc);
> +		}
> +		mpam_mon_sel_outer_lock(msc);
s/mpam_mon_sel_outer_lock(msc);/mpam_mon_sel_outer_unlock(msc);/
Or this will hit a deadlock.
[SNIP]
Thanks.
-Fenghua
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * Re: [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management
  2025-08-28  0:58   ` Fenghua Yu
@ 2025-09-10 19:29     ` James Morse
  0 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-09-10 19:29 UTC (permalink / raw)
  To: Fenghua Yu, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi Fenghua,
On 28/08/2025 01:58, Fenghua Yu wrote:
> On 8/22/25 08:30, James Morse wrote:
>> Bandwidth counters need to run continuously to correctly reflect the
>> bandwidth.
>>
>> The value read may be lower than the previous value read in the case
>> of overflow and when the hardware is reset due to CPU hotplug.
>>
>> Add struct mbwu_state to track the bandwidth counter to allow overflow
>> and power management to be handled.
>> @@ -2291,11 +2395,35 @@ static void mpam_unregister_irqs(void)
>>     static void __destroy_component_cfg(struct mpam_component *comp)
>>   {
>> +    struct mpam_msc *msc;
>> +    struct mpam_vmsc *vmsc;
>> +    struct mpam_msc_ris *ris;
>> +
>> +    lockdep_assert_held(&mpam_list_lock);
>> +
>>       add_to_garbage(comp->cfg);
>> +    list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
>> +        msc = vmsc->msc;
>> +
>> +        mpam_mon_sel_outer_lock(msc);
>> +        if (mpam_mon_sel_inner_lock(msc)) {
>> +            list_for_each_entry(ris, &vmsc->ris, vmsc_list)
>> +                add_to_garbage(ris->mbwu_state);
>> +            mpam_mon_sel_inner_unlock(msc);
>> +        }
>> +        mpam_mon_sel_outer_lock(msc);
> 
> s/mpam_mon_sel_outer_lock(msc);/mpam_mon_sel_outer_unlock(msc);/
> 
> Or this will hit a deadlock.
Heh, that's a good typo. Thanks!
James
^ permalink raw reply	[flat|nested] 200+ messages in thread
- * [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (61 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 28/33] arm_mpam: Track bandwidth counter state for overflow and power management James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
                   ` (4 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
MPAM v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters. As of now, a 44 bit counter
represented by the HAS_LONG field (bit 30) and a 63 bit counter represented
by LWD (bit 29) can be optionally integrated. Probe for these counters
and set the corresponding feature bits if any of these counters are present.
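The decode described above can be sketched as follows. This is an
illustration only: the macro and function names are hypothetical, and the
bit positions are taken from the description above (HAS_LONG at bit 30, LWD
at bit 29) rather than from the driver's headers. LWD is treated as
meaningful only when HAS_LONG is set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions, as given in the commit message. */
#define SKETCH_IDR_HAS_LONG	(UINT32_C(1) << 30)
#define SKETCH_IDR_LWD		(UINT32_C(1) << 29)

/* Return the width in bits of the widest usable bandwidth counter. */
static unsigned int sketch_mbwu_counter_bits(uint32_t idr)
{
	if (!(idr & SKETCH_IDR_HAS_LONG))
		return 31;	/* only the regular counter exists */

	return (idr & SKETCH_IDR_LWD) ? 63 : 44;
}

/* Largest value a counter of the given width can hold. */
static uint64_t sketch_mbwu_counter_max(unsigned int bits)
{
	return (UINT64_C(1) << bits) - 1;
}

/* Example: decode the three possible configurations. */
void sketch_mbwu_decode_example(void)
{
	assert(sketch_mbwu_counter_bits(0) == 31);
	assert(sketch_mbwu_counter_bits(SKETCH_IDR_HAS_LONG) == 44);
	assert(sketch_mbwu_counter_bits(SKETCH_IDR_HAS_LONG | SKETCH_IDR_LWD) == 63);
}
```

Returning a single width keeps the two long-counter cases mutually
exclusive, which matches the way the feature bits are set in this patch.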
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 23 ++++++++++++++++++++++-
 drivers/resctrl/mpam_internal.h |  8 ++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 11be34b54643..2ab7f127baaa 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -870,7 +870,7 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 				pr_err_once("Counters are not usable because not-ready timeout was not provided by firmware.");
 		}
 		if (FIELD_GET(MPAMF_MSMON_IDR_MSMON_MBWU, msmon_features)) {
-			bool hw_managed;
+			bool has_long, hw_managed;
 			u32 mbwumonidr = mpam_read_partsel_reg(msc, MBWUMON_IDR);
 
 			props->num_mbwu_mon = FIELD_GET(MPAMF_MBWUMON_IDR_NUM_MON, mbwumonidr);
@@ -880,6 +880,27 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
 			if (FIELD_GET(MPAMF_MBWUMON_IDR_HAS_RWBW, mbwumonidr))
 				mpam_set_feature(mpam_feat_msmon_mbwu_rwbw, props);
 
+			/*
+			 * Treat long counter and its extension, lwd as mutually
+			 * exclusive feature bits. Though these are dependent
+			 * fields at the implementation level, there would never
+			 * be a need for mpam_feat_msmon_mbwu_44counter (long
+			 * counter) and mpam_feat_msmon_mbwu_63counter (lwd)
+			 * bits to be set together.
+			 *
+			 * mpam_feat_msmon_mbwu isn't treated as an exclusive
+			 * bit as this feature bit would be used as the "front
+			 * facing feature bit" for any checks related to mbwu
+			 * monitors.
+			 */
+			has_long = FIELD_GET(MPAMF_MBWUMON_IDR_HAS_LONG, mbwumonidr);
+			if (props->num_mbwu_mon && has_long) {
+				if (FIELD_GET(MPAMF_MBWUMON_IDR_LWD, mbwumonidr))
+					mpam_set_feature(mpam_feat_msmon_mbwu_63counter, props);
+				else
+					mpam_set_feature(mpam_feat_msmon_mbwu_44counter, props);
+			}
+
 			/* Is NRDY hardware managed? */
 			mpam_mon_sel_outer_lock(msc);
 			hw_managed = mpam_ris_hw_probe_hw_nrdy(ris, MBWU);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9a50a5432f4a..9f627b5f72a1 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -178,7 +178,15 @@ enum mpam_device_features {
 	mpam_feat_msmon_csu,
 	mpam_feat_msmon_csu_capture,
 	mpam_feat_msmon_csu_hw_nrdy,
+
+	/*
+	 * mpam_feat_msmon_mbwu alone doesn't mean the 31 bit MBWU counter is
+	 * used. The counter used also depends on
+	 * mpam_feat_msmon_mbwu_44counter/mpam_feat_msmon_mbwu_63counter.
+	 */
 	mpam_feat_msmon_mbwu,
+	mpam_feat_msmon_mbwu_44counter,
+	mpam_feat_msmon_mbwu_63counter,
 	mpam_feat_msmon_mbwu_capture,
 	mpam_feat_msmon_mbwu_rwbw,
 	mpam_feat_msmon_mbwu_hw_nrdy,
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 30/33] arm_mpam: Use long MBWU counters if supported
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (62 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 29/33] arm_mpam: Probe for long/lwd mbwu counters James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
                   ` (3 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
From: Rohit Mathew <rohit.mathew@arm.com>
If the 44 bit (long) or 63 bit (LWD) counters are detected on probing
the RIS, use long/LWD counter instead of the regular 31 bit mbwu
counter.
Only 32bit accesses to the MSC are required to be supported by the
spec, but these registers are 64bits. The lower half may overflow
into the higher half between two 32bit reads. To avoid this, use
a helper that reads the top half multiple times to check for overflow.
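The high/low/high read pattern can be tried outside the kernel with a
simulated register. In the sketch below, fake_counter and fake_read32() are
stand-ins invented for illustration, not driver interfaces. The counter
advances by 'step' after every 32 bit access, so a carry can land between
the two halves of a read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64 bit counter that can only be read 32 bits at a time. */
struct fake_counter {
	uint64_t value;
	uint64_t step;	/* added to 'value' after every 32 bit access */
};

static uint32_t fake_read32(struct fake_counter *c, unsigned int offset)
{
	uint32_t half = offset ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;

	c->value += c->step;
	return half;
}

/*
 * Read high, low, high again. If both high reads agree, no carry happened
 * while the low half was read. Give up after three attempts.
 */
static bool read64_hi_lo_hi(struct fake_counter *c, uint64_t *out)
{
	int retry = 3;
	uint32_t low;
	uint64_t high1, high2;

	high2 = fake_read32(c, 4);
	do {
		high1 = high2;
		low = fake_read32(c, 0);
		high2 = fake_read32(c, 4);
		retry--;
	} while (high1 != high2 && retry > 0);

	if (high1 != high2)
		return false;

	*out = (high1 << 32) | low;
	return true;
}

/* Example: the low half carries into the high half during the read. */
void read64_example(void)
{
	struct fake_counter c = { .value = UINT64_C(0x1ffffffff), .step = 1 };
	uint64_t val = 0;

	assert(read64_hi_lo_hi(&c, &val));
	assert(val == UINT64_C(0x200000002));
}
```

Without the second high read, the first attempt in the example would have
paired a high half of 1 with a low half of 0 and reported a value roughly
4GB too small.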
Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
[morse: merged multiple patches from Rohit]
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since RFC:
 * Commit message wrangling.
 * Refer to 31 bit counters as opposed to 32 bit (registers).
---
 drivers/resctrl/mpam_devices.c | 89 ++++++++++++++++++++++++++++++----
 1 file changed, 80 insertions(+), 9 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2ab7f127baaa..8fbcf6eb946a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1002,6 +1002,48 @@ struct mon_read {
 	int				err;
 };
 
+static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
+{
+	return (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props) ||
+		mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props));
+}
+
+static u64 mpam_msc_read_mbwu_l(struct mpam_msc *msc)
+{
+	int retry = 3;
+	u32 mbwu_l_low;
+	u64 mbwu_l_high1, mbwu_l_high2;
+
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+	do {
+		mbwu_l_high1 = mbwu_l_high2;
+		mbwu_l_low = __mpam_read_reg(msc, MSMON_MBWU_L);
+		mbwu_l_high2 = __mpam_read_reg(msc, MSMON_MBWU_L + 4);
+
+		retry--;
+	} while (mbwu_l_high1 != mbwu_l_high2 && retry > 0);
+
+	if (mbwu_l_high1 == mbwu_l_high2)
+		return (mbwu_l_high1 << 32) | mbwu_l_low;
+	return MSMON___NRDY_L;
+}
+
+static void mpam_msc_zero_mbwu_l(struct mpam_msc *msc)
+{
+	mpam_mon_sel_lock_held(msc);
+
+	WARN_ON_ONCE((MSMON_MBWU_L + sizeof(u64)) > msc->mapped_hwpage_sz);
+	WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(), &msc->accessibility));
+
+	__mpam_write_reg(msc, MSMON_MBWU_L, 0);
+	__mpam_write_reg(msc, MSMON_MBWU_L + 4, 0);
+}
+
 static void gen_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 				   u32 *flt_val)
 {
@@ -1058,6 +1100,7 @@ static void read_msmon_ctl_flt_vals(struct mon_read *m, u32 *ctl_val,
 static void clean_msmon_ctl_val(u32 *cur_ctl)
 {
 	*cur_ctl &= ~MSMON_CFG_x_CTL_OFLOW_STATUS;
+	*cur_ctl &= ~MSMON_CFG_MBWU_CTL_OFLOW_STATUS_L;
 }
 
 static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
@@ -1080,7 +1123,11 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 	case mpam_feat_msmon_mbwu:
 		mpam_write_monsel_reg(msc, CFG_MBWU_FLT, flt_val);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(m->ris))
+			mpam_msc_zero_mbwu_l(m->ris->vmsc->msc);
+		else
+			mpam_write_monsel_reg(msc, MBWU, 0);
+
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
 
 		mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
@@ -1095,8 +1142,13 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
 
 static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
 {
-	/* TODO: scaling, and long counters */
-	return GENMASK_ULL(30, 0);
+	/* TODO: implement scaling counters */
+	if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, &ris->props))
+		return GENMASK_ULL(62, 0);
+	else if (mpam_has_feature(mpam_feat_msmon_mbwu_44counter, &ris->props))
+		return GENMASK_ULL(43, 0);
+	else
+		return GENMASK_ULL(30, 0);
 }
 
 /* Call with MSC lock held */
@@ -1138,10 +1190,24 @@ static void __ris_msmon_read(void *arg)
 		now = FIELD_GET(MSMON___VALUE, now);
 		break;
 	case mpam_feat_msmon_mbwu:
-		now = mpam_read_monsel_reg(msc, MBWU);
-		if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
-			nrdy = now & MSMON___NRDY;
-		now = FIELD_GET(MSMON___VALUE, now);
+		/*
+		 * If long or lwd counters are supported, use them, else revert
+		 * to the 31 bit counter.
+		 */
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			now = mpam_msc_read_mbwu_l(msc);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY_L;
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_63counter, rprops))
+				now = FIELD_GET(MSMON___LWD_VALUE, now);
+			else
+				now = FIELD_GET(MSMON___L_VALUE, now);
+		} else {
+			now = mpam_read_monsel_reg(msc, MBWU);
+			if (mpam_has_feature(mpam_feat_msmon_mbwu_hw_nrdy, rprops))
+				nrdy = now & MSMON___NRDY;
+			now = FIELD_GET(MSMON___VALUE, now);
+		}
 
 		if (nrdy)
 			break;
@@ -1433,8 +1499,13 @@ static int mpam_save_mbwu_state(void *arg)
 		cur_ctl = mpam_read_monsel_reg(msc, CFG_MBWU_CTL);
 		mpam_write_monsel_reg(msc, CFG_MBWU_CTL, 0);
 
-		val = mpam_read_monsel_reg(msc, MBWU);
-		mpam_write_monsel_reg(msc, MBWU, 0);
+		if (mpam_ris_has_mbwu_long_counter(ris)) {
+			val = mpam_msc_read_mbwu_l(msc);
+			mpam_msc_zero_mbwu_l(msc);
+		} else {
+			val = mpam_read_monsel_reg(msc, MBWU);
+			mpam_write_monsel_reg(msc, MBWU, 0);
+		}
 
 		cfg->mon = i;
 		cfg->pmg = FIELD_GET(MSMON_CFG_MBWU_FLT_PMG, cur_flt);
-- 
2.20.1
^ permalink raw reply related	[flat|nested] 200+ messages in thread
- * [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (63 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 30/33] arm_mpam: Use long MBWU counters if supported James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
                   ` (2 subsequent siblings)
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
resctrl expects to reset the bandwidth counters when the filesystem
is mounted.
To allow this, add a helper that clears the saved mbwu state. Instead
of cross-calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zeroed on the
next read. This is easily done by forcing a configuration update.
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/mpam_devices.c  | 49 +++++++++++++++++++++++++++++++--
 drivers/resctrl/mpam_internal.h |  5 +++-
 2 files changed, 51 insertions(+), 3 deletions(-)
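
The deferred reset described in the commit message can be sketched in plain
C. This is a minimal user-space model, not the driver code: the names
fake_counter, fake_mbwu_state, fake_request_reset() and fake_read() are
hypothetical, and zeroing hw_value stands in for the forced configuration
update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one saved monitor state; not the driver's struct. */
struct fake_mbwu_state {
	uint64_t correction;		/* accumulated across overflows */
	bool     reset_on_next_read;	/* set by the reset helper */
};

/* Hypothetical stand-in for a hardware counter plus its saved state. */
struct fake_counter {
	uint64_t hw_value;
	struct fake_mbwu_state state;
};

/* Request a reset without touching the "hardware" from this context. */
void fake_request_reset(struct fake_counter *c)
{
	assert(c != NULL);
	c->state.correction = 0;
	c->state.reset_on_next_read = true;
}

/* Read path: consume the flag, zero the counter if it was set, then report. */
uint64_t fake_read(struct fake_counter *c)
{
	bool reset;

	assert(c != NULL);
	reset = c->state.reset_on_next_read;
	c->state.reset_on_next_read = false;

	if (reset)
		c->hw_value = 0;	/* models the forced configuration update */

	return c->hw_value + c->state.correction;
}
```

The point of the pattern is that the caller asking for the reset never needs
to reach the counter itself; the work is done by whichever context performs
the next read.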
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 8fbcf6eb946a..65c30ebfe001 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1155,9 +1155,11 @@ static u64 mpam_msmon_overflow_val(struct mpam_msc_ris *ris)
 static void __ris_msmon_read(void *arg)
 {
 	bool nrdy = false;
+	bool config_mismatch;
 	struct mon_read *m = arg;
 	u64 now, overflow_val = 0;
 	struct mon_cfg *ctx = m->ctx;
+	bool reset_on_next_read = false;
 	struct mpam_msc_ris *ris = m->ris;
 	struct msmon_mbwu_state *mbwu_state;
 	struct mpam_props *rprops = &ris->props;
@@ -1172,6 +1174,14 @@ static void __ris_msmon_read(void *arg)
 		  FIELD_PREP(MSMON_CFG_MON_SEL_RIS, ris->ris_idx);
 	mpam_write_monsel_reg(msc, CFG_MON_SEL, mon_sel);
 
+	if (m->type == mpam_feat_msmon_mbwu) {
+		mbwu_state = &ris->mbwu_state[ctx->mon];
+		if (mbwu_state) {
+			reset_on_next_read = mbwu_state->reset_on_next_read;
+			mbwu_state->reset_on_next_read = false;
+		}
+	}
+
 	/*
 	 * Read the existing configuration to avoid re-writing the same values.
 	 * This saves waiting for 'nrdy' on subsequent reads.
@@ -1179,7 +1189,10 @@ static void __ris_msmon_read(void *arg)
 	read_msmon_ctl_flt_vals(m, &cur_ctl, &cur_flt);
 	clean_msmon_ctl_val(&cur_ctl);
 	gen_msmon_ctl_flt_vals(m, &ctl_val, &flt_val);
-	if (cur_flt != flt_val || cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN))
+	config_mismatch = cur_flt != flt_val ||
+			  cur_ctl != (ctl_val | MSMON_CFG_x_CTL_EN);
+
+	if (config_mismatch || reset_on_next_read)
 		write_msmon_ctl_flt_vals(m, ctl_val, flt_val);
 
 	switch (m->type) {
@@ -1212,7 +1225,6 @@ static void __ris_msmon_read(void *arg)
 		if (nrdy)
 			break;
 
-		mbwu_state = &ris->mbwu_state[ctx->mon];
 		if (!mbwu_state)
 			break;
 
@@ -1314,6 +1326,39 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 	return err;
 }
 
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx)
+{
+	int idx;
+	struct mpam_msc *msc;
+	struct mpam_vmsc *vmsc;
+	struct mpam_msc_ris *ris;
+
+	if (!mpam_is_enabled())
+		return;
+
+	idx = srcu_read_lock(&mpam_srcu);
+	list_for_each_entry_rcu(vmsc, &comp->vmsc, comp_list) {
+		if (!mpam_has_feature(mpam_feat_msmon_mbwu, &vmsc->props))
+			continue;
+
+		msc = vmsc->msc;
+		mpam_mon_sel_outer_lock(msc);
+		list_for_each_entry_rcu(ris, &vmsc->ris, vmsc_list) {
+			if (!mpam_has_feature(mpam_feat_msmon_mbwu, &ris->props))
+				continue;
+
+			if (WARN_ON_ONCE(!mpam_mon_sel_inner_lock(msc)))
+				continue;
+
+			ris->mbwu_state[ctx->mon].correction = 0;
+			ris->mbwu_state[ctx->mon].reset_on_next_read = true;
+			mpam_mon_sel_inner_unlock(msc);
+		}
+		mpam_mon_sel_outer_unlock(msc);
+	}
+	srcu_read_unlock(&mpam_srcu, idx);
+}
+
 static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
 {
 	u32 num_words, msb;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9f627b5f72a1..bbf0306abc82 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -297,10 +297,12 @@ struct mon_cfg {
 
 /*
  * Changes to enabled and cfg are protected by the msc->lock.
- * Changes to prev_val and correction are protected by the msc's mon_sel_lock.
+ * Changes to reset_on_next_read, prev_val and correction are protected by the
+ * msc's mon_sel_lock.
  */
 struct msmon_mbwu_state {
 	bool		enabled;
+	bool		reset_on_next_read;
 	struct mon_cfg	cfg;
 
 	/* The value last read from the hardware. Used to detect overflow. */
@@ -410,6 +412,7 @@ int mpam_apply_config(struct mpam_component *comp, u16 partid,
 
 int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
 		    enum mpam_device_features, u64 *val);
+void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
 
 int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
 				   cpumask_t *affinity);
-- 
2.20.1
- * [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (64 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 31/33] arm_mpam: Add helper to reset saved mbwu state James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
  2025-08-24 17:24 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Krzysztof Kozlowski
  67 siblings, 0 replies; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich,
	Jonathan Cameron
The bitmap reset code has been a source of bugs. Add a unit test.
The test currently has to be built in, as the rest of the driver is
built in.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/resctrl/Kconfig             | 13 ++++++
 drivers/resctrl/mpam_devices.c      |  4 ++
 drivers/resctrl/test_mpam_devices.c | 68 +++++++++++++++++++++++++++++
 3 files changed, 85 insertions(+)
 create mode 100644 drivers/resctrl/test_mpam_devices.c
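
For readers without the driver to hand, the behaviour the test expects can be
sketched as a standalone function. This is an illustration only:
fake_fill_bitmap() is a hypothetical helper operating on an ordinary array,
not mpam_reset_msc_bitmap(), which writes MSC registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Set the low 'wd' bits of a bitmap stored as consecutive 32-bit words.
 * Hypothetical helper for illustration; 'words' must have room for
 * (wd + 31) / 32 entries. Words beyond that are left untouched.
 */
void fake_fill_bitmap(uint32_t *words, unsigned int wd)
{
	size_t i = 0;

	assert(words != NULL);

	/* Whole words first. */
	for (; wd >= 32; wd -= 32)
		words[i++] = UINT32_MAX;

	/* Then the partial word. Shifting a 32-bit value by 32 is undefined. */
	if (wd)
		words[i] = (UINT32_C(1) << wd) - 1;
}
```

The widths 32 and 33 are the interesting cases: the first must not shift by
the full word size, and the second must spill exactly one bit into the next
word.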
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index dff7b87280ab..f5e0609975e4 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -4,8 +4,21 @@ config ARM64_MPAM_DRIVER
 	bool "MPAM driver for System IP, e,g. caches and memory controllers"
 	depends on ARM64_MPAM && EXPERT
 
+menu "ARM64 MPAM driver options"
+
 config ARM64_MPAM_DRIVER_DEBUG
 	bool "Enable debug messages from the MPAM driver."
 	depends on ARM64_MPAM_DRIVER
 	help
 	  Say yes here to enable debug messages from the MPAM driver.
+
+config MPAM_KUNIT_TEST
+	bool "KUnit tests for MPAM driver" if !KUNIT_ALL_TESTS
+	depends on KUNIT=y
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to run tests in the MPAM driver.
+
+	  If unsure, say N.
+
+endmenu
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 65c30ebfe001..4cf5aae88c53 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2903,3 +2903,7 @@ static int __init mpam_msc_driver_init(void)
 }
 /* Must occur after arm64_mpam_register_cpus() from arch_initcall() */
 subsys_initcall(mpam_msc_driver_init);
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_devices.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
new file mode 100644
index 000000000000..8e9d6c88171c
--- /dev/null
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2024 Arm Ltd.
+/* This file is intended to be included into mpam_devices.c */
+
+#include <kunit/test.h>
+
+static void test_mpam_reset_msc_bitmap(struct kunit *test)
+{
+	char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
+	struct mpam_msc fake_msc;
+	u32 *test_result;
+
+	if (!buf)
+		return;
+
+	fake_msc.mapped_hwpage = buf;
+	fake_msc.mapped_hwpage_sz = SZ_16K;
+	cpumask_copy(&fake_msc.accessibility, cpu_possible_mask);
+
+	mutex_init(&fake_msc.part_sel_lock);
+	mutex_lock(&fake_msc.part_sel_lock);
+
+	test_result = (u32 *)(buf + MPAMCFG_CPBM);
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 0);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 1);
+	KUNIT_EXPECT_EQ(test, test_result[0], 1);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 16);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 32);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 0);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mpam_reset_msc_bitmap(&fake_msc, MPAMCFG_CPBM, 33);
+	KUNIT_EXPECT_EQ(test, test_result[0], 0xffffffff);
+	KUNIT_EXPECT_EQ(test, test_result[1], 1);
+	test_result[0] = 0;
+	test_result[1] = 0;
+
+	mutex_unlock(&fake_msc.part_sel_lock);
+}
+
+static struct kunit_case mpam_devices_test_cases[] = {
+	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	{}
+};
+
+static struct kunit_suite mpam_devices_test_suite = {
+	.name = "mpam_devices_test_suite",
+	.test_cases = mpam_devices_test_cases,
+};
+
+kunit_test_suites(&mpam_devices_test_suite);
-- 
2.20.1
- * [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (65 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 32/33] arm_mpam: Add kunit test for bitmap reset James Morse
@ 2025-08-22 15:30 ` James Morse
  2025-09-02 16:59   ` Fenghua Yu
  2025-08-24 17:24 ` [PATCH 00/33] arm_mpam: Add basic mpam driver Krzysztof Kozlowski
  67 siblings, 1 reply; 200+ messages in thread
From: James Morse @ 2025-08-22 15:30 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-acpi, devicetree
  Cc: James Morse, shameerali.kolothum.thodi, D Scott Phillips OS, carl,
	lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
	Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
	Rex Nie, Dave Martin, Koba Ko, Shanker Donthineni, fenghuay,
	baisheng.gao, Jonathan Cameron, Rob Herring, Rohit Mathew,
	Rafael Wysocki, Len Brown, Lorenzo Pieralisi, Hanjun Guo,
	Sudeep Holla, Krzysztof Kozlowski, Conor Dooley, Catalin Marinas,
	Will Deacon, Greg Kroah-Hartman, Danilo Krummrich
When features are mismatched between MSCs, the way those features are
combined into the class determines whether resctrl can support this SoC.
Add some tests to illustrate the sort of thing that is expected to
work, and the features that must be removed.
Signed-off-by: James Morse <james.morse@arm.com>
---
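
The merging rules the tests exercise can be summarised with a toy model. This
is a sketch under assumptions, not the driver's implementation: toy_props,
the FEAT_* flags and both toy_merge_*() functions are hypothetical, and only
two features are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature flags, not the driver's enum. */
#define FEAT_CPOR	(UINT32_C(1) << 0)
#define FEAT_MBW_MIN	(UINT32_C(1) << 1)

struct toy_props {
	uint32_t features;
	uint16_t cpbm_wd;	/* portion bitmap width: must match exactly */
	uint16_t bwa_wd;	/* fraction width: can be narrowed */
};

/* RIS inside one MSC: the parent gains whatever the child offers. */
void toy_merge_same_msc(struct toy_props *parent, const struct toy_props *child)
{
	assert(parent && child);

	/* Adopt the child's width for any feature the parent lacks. */
	if ((child->features & ~parent->features) & FEAT_CPOR)
		parent->cpbm_wd = child->cpbm_wd;
	if ((child->features & ~parent->features) & FEAT_MBW_MIN)
		parent->bwa_wd = child->bwa_wd;

	parent->features |= child->features;
}

/* Different MSC: keep only what both sides can honour. */
void toy_merge_across_msc(struct toy_props *parent, const struct toy_props *child)
{
	uint32_t common;

	assert(parent && child);
	common = parent->features & child->features;

	/* Bitmaps of differing width can't be programmed consistently. */
	if ((common & FEAT_CPOR) && parent->cpbm_wd != child->cpbm_wd)
		common &= ~FEAT_CPOR;
	if (!(common & FEAT_CPOR))
		parent->cpbm_wd = 0;

	/* A fraction can still be expressed using the narrower width. */
	if (common & FEAT_MBW_MIN) {
		if (child->bwa_wd < parent->bwa_wd)
			parent->bwa_wd = child->bwa_wd;
	} else {
		parent->bwa_wd = 0;
	}

	parent->features = common;
}
```

The split into two functions mirrors the distinction the tests draw: features
found on separate RIS of one MSC accumulate, while features found on separate
MSC survive only if every MSC can provide them.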
 drivers/resctrl/mpam_internal.h     |   8 +-
 drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
 2 files changed, 329 insertions(+), 1 deletion(-)
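
The PACKED_FOR_KUNIT change to mpam_internal.h is presumably there so that
the memcmp() in test__props_mismatch() only compares named fields. The
sketch below shows the effect of packing on a small made-up structure;
padded_props, packed_props and both helpers are hypothetical, and the
attribute spelling is the GCC/Clang one that the kernel's __packed expands
to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Natural layout: the compiler may insert padding after 'flag'. */
struct padded_props {
	uint8_t  flag;
	uint32_t width;
};

/* Packed layout: no padding, so every byte belongs to a named field. */
struct packed_props {
	uint8_t  flag;
	uint32_t width;
} __attribute__((packed));

/* Field-by-field sanitise, as a merge routine might do. */
void sanitise_packed(struct packed_props *p)
{
	assert(p != NULL);
	p->flag = 0;
	p->width = 0;
}

/* Returns 1 if every byte of *p is zero, 0 otherwise. */
int packed_is_all_zero(const struct packed_props *p)
{
	struct packed_props zero;

	assert(p != NULL);
	memset(&zero, 0, sizeof(zero));
	return memcmp(p, &zero, sizeof(*p)) == 0;
}
```

With the packed layout, filling the structure with 0xff and then clearing
each field leaves no stray bytes behind, so a byte-wise comparison against
zero is meaningful. With the natural layout, padding bytes could keep the
0xff pattern and make the same comparison fail for reasons unrelated to the
code under test.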
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index bbf0306abc82..6e973be095f8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -18,6 +18,12 @@
 
 DECLARE_STATIC_KEY_FALSE(mpam_enabled);
 
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#define PACKED_FOR_KUNIT __packed
+#else
+#define PACKED_FOR_KUNIT
+#endif
+
 static inline bool mpam_is_enabled(void)
 {
 	return static_branch_likely(&mpam_enabled);
@@ -209,7 +215,7 @@ struct mpam_props {
 	u16			dspri_wd;
 	u16			num_csu_mon;
 	u16			num_mbwu_mon;
-};
+} PACKED_FOR_KUNIT;
 
 #define mpam_has_feature(_feat, x)	((1 << (_feat)) & (x)->features)
 
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 8e9d6c88171c..ef39696e7ff8 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -4,6 +4,326 @@
 
 #include <kunit/test.h>
 
+/*
+ * This test catches fields that aren't being sanitised - but can't tell you
+ * which one...
+ */
+static void test__props_mismatch(struct kunit *test)
+{
+	struct mpam_props parent = { 0 };
+	struct mpam_props child;
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, false);
+
+	memset(&child, 0, sizeof(child));
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+
+	memset(&child, 0xff, sizeof(child));
+	__props_mismatch(&parent, &child, true);
+
+	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
+}
+
+static void test_mpam_enable_merge_features(struct kunit *test)
+{
+	/* o/` How deep is your stack? o/` */
+	struct list_head fake_classes_list;
+	struct mpam_class fake_class = { 0 };
+	struct mpam_component fake_comp1 = { 0 };
+	struct mpam_component fake_comp2 = { 0 };
+	struct mpam_vmsc fake_vmsc1 = { 0 };
+	struct mpam_vmsc fake_vmsc2 = { 0 };
+	struct mpam_msc fake_msc1 = { 0 };
+	struct mpam_msc fake_msc2 = { 0 };
+	struct mpam_msc_ris fake_ris1 = { 0 };
+	struct mpam_msc_ris fake_ris2 = { 0 };
+	struct platform_device fake_pdev = { 0 };
+
+#define RESET_FAKE_HIEARCHY()	do {				\
+	INIT_LIST_HEAD(&fake_classes_list);			\
+								\
+	memset(&fake_class, 0, sizeof(fake_class));		\
+	fake_class.level = 3;					\
+	fake_class.type = MPAM_CLASS_CACHE;			\
+	INIT_LIST_HEAD_RCU(&fake_class.components);		\
+	INIT_LIST_HEAD(&fake_class.classes_list);		\
+								\
+	memset(&fake_comp1, 0, sizeof(fake_comp1));		\
+	memset(&fake_comp2, 0, sizeof(fake_comp2));		\
+	fake_comp1.comp_id = 1;					\
+	fake_comp2.comp_id = 2;					\
+	INIT_LIST_HEAD(&fake_comp1.vmsc);			\
+	INIT_LIST_HEAD(&fake_comp1.class_list);			\
+	INIT_LIST_HEAD(&fake_comp2.vmsc);			\
+	INIT_LIST_HEAD(&fake_comp2.class_list);			\
+								\
+	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));		\
+	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));		\
+	INIT_LIST_HEAD(&fake_vmsc1.ris);			\
+	INIT_LIST_HEAD(&fake_vmsc1.comp_list);			\
+	fake_vmsc1.msc = &fake_msc1;				\
+	INIT_LIST_HEAD(&fake_vmsc2.ris);			\
+	INIT_LIST_HEAD(&fake_vmsc2.comp_list);			\
+	fake_vmsc2.msc = &fake_msc2;				\
+								\
+	memset(&fake_ris1, 0, sizeof(fake_ris1));		\
+	memset(&fake_ris2, 0, sizeof(fake_ris2));		\
+	fake_ris1.ris_idx = 1;					\
+	INIT_LIST_HEAD(&fake_ris1.msc_list);			\
+	fake_ris2.ris_idx = 2;					\
+	INIT_LIST_HEAD(&fake_ris2.msc_list);			\
+								\
+	fake_msc1.pdev = &fake_pdev;				\
+	fake_msc2.pdev = &fake_pdev;				\
+								\
+	list_add(&fake_class.classes_list, &fake_classes_list);	\
+} while (0)
+
+	RESET_FAKE_HIEARCHY();
+
+	mutex_lock(&mpam_list_lock);
+
+	/* One Class+Comp, two RIS in one vMSC with common features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = NULL;
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc1;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with incompatible overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 5;
+	fake_ris2.props.cpbm_wd = 3;
+	fake_ris1.props.mbw_pbm_bits = 5;
+	fake_ris2.props.mbw_pbm_bits = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource,
+	 * mismatched features can not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class+Comp, two MSC with overlapping features that need tweaking */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = NULL;
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp1;
+	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
+	fake_ris1.props.bwa_wd = 5;
+	fake_ris2.props.bwa_wd = 3;
+	fake_ris1.props.cmax_wd = 5;
+	fake_ris2.props.cmax_wd = 3;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple RIS in different MSC can't control the same resource, but
+	 * mismatched widths can be reduced to the smallest common value.
+	 */
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class Two Comp with overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cpbm_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
+
+	RESET_FAKE_HIEARCHY();
+
+	/* One Class Two Comp with non-overlapping features */
+	fake_comp1.class = &fake_class;
+	list_add(&fake_comp1.class_list, &fake_class.components);
+	fake_comp2.class = &fake_class;
+	list_add(&fake_comp2.class_list, &fake_class.components);
+	fake_vmsc1.comp = &fake_comp1;
+	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
+	fake_vmsc2.comp = &fake_comp2;
+	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
+	fake_ris1.vmsc = &fake_vmsc1;
+	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
+	fake_ris2.vmsc = &fake_vmsc2;
+	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
+
+	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
+	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
+	fake_ris1.props.cpbm_wd = 4;
+	fake_ris2.props.cmax_wd = 4;
+
+	mpam_enable_merge_features(&fake_classes_list);
+
+	/*
+	 * Multiple components can't control the same resource, mismatched features can
+	 * not be supported.
+	 */
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
+	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
+	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
+	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
+
+	mutex_unlock(&mpam_list_lock);
+
+#undef RESET_FAKE_HIEARCHY
+}
+
 static void test_mpam_reset_msc_bitmap(struct kunit *test)
 {
 	char *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -57,6 +377,8 @@ static void test_mpam_reset_msc_bitmap(struct kunit *test)
 
 static struct kunit_case mpam_devices_test_cases[] = {
 	KUNIT_CASE(test_mpam_reset_msc_bitmap),
+	KUNIT_CASE(test_mpam_enable_merge_features),
+	KUNIT_CASE(test__props_mismatch),
 	{}
 };
 
-- 
2.20.1
- * Re: [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch()
  2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-09-02 16:59   ` Fenghua Yu
  0 siblings, 0 replies; 200+ messages in thread
From: Fenghua Yu @ 2025-09-02 16:59 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
Hi, James,
On 8/22/25 08:30, James Morse wrote:
> When features are mismatched between MSC the way features are combined
> to the class determines whether resctrl can support this SoC.
>
> Add some tests to illustrate the sort of thing that is expected to
> work, and those that must be removed.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   drivers/resctrl/mpam_internal.h     |   8 +-
>   drivers/resctrl/test_mpam_devices.c | 322 ++++++++++++++++++++++++++++
>   2 files changed, 329 insertions(+), 1 deletion(-)
[SNIP]
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> index 8e9d6c88171c..ef39696e7ff8 100644
> --- a/drivers/resctrl/test_mpam_devices.c
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -4,6 +4,326 @@
>   
>   #include <kunit/test.h>
>   
> +/*
> + * This test catches fields that aren't being sanitised - but can't tell you
> + * which one...
> + */
> +static void test__props_mismatch(struct kunit *test)
> +{
> +	struct mpam_props parent = { 0 };
> +	struct mpam_props child;
> +
> +	memset(&child, 0xff, sizeof(child));
> +	__props_mismatch(&parent, &child, false);
> +
> +	memset(&child, 0, sizeof(child));
> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +
> +	memset(&child, 0xff, sizeof(child));
> +	__props_mismatch(&parent, &child, true);
> +
> +	KUNIT_EXPECT_EQ(test, memcmp(&parent, &child, sizeof(child)), 0);
> +}
> +
> +static void test_mpam_enable_merge_features(struct kunit *test)
> +{
> +	/* o/` How deep is your stack? o/` */
> +	struct list_head fake_classes_list;
> +	struct mpam_class fake_class = { 0 };
> +	struct mpam_component fake_comp1 = { 0 };
> +	struct mpam_component fake_comp2 = { 0 };
> +	struct mpam_vmsc fake_vmsc1 = { 0 };
> +	struct mpam_vmsc fake_vmsc2 = { 0 };
> +	struct mpam_msc fake_msc1 = { 0 };
> +	struct mpam_msc fake_msc2 = { 0 };
> +	struct mpam_msc_ris fake_ris1 = { 0 };
> +	struct mpam_msc_ris fake_ris2 = { 0 };
> +	struct platform_device fake_pdev = { 0 };
> +
> +#define RESET_FAKE_HIEARCHY()	do {				\
> +	INIT_LIST_HEAD(&fake_classes_list);			\
> +								\
> +	memset(&fake_class, 0, sizeof(fake_class));		\
> +	fake_class.level = 3;					\
> +	fake_class.type = MPAM_CLASS_CACHE;			\
> +	INIT_LIST_HEAD_RCU(&fake_class.components);		\
> +	INIT_LIST_HEAD(&fake_class.classes_list);		\
> +								\
> +	memset(&fake_comp1, 0, sizeof(fake_comp1));		\
> +	memset(&fake_comp2, 0, sizeof(fake_comp2));		\
> +	fake_comp1.comp_id = 1;					\
> +	fake_comp2.comp_id = 2;					\
> +	INIT_LIST_HEAD(&fake_comp1.vmsc);			\
> +	INIT_LIST_HEAD(&fake_comp1.class_list);			\
> +	INIT_LIST_HEAD(&fake_comp2.vmsc);			\
> +	INIT_LIST_HEAD(&fake_comp2.class_list);			\
> +								\
> +	memset(&fake_vmsc1, 0, sizeof(fake_vmsc1));		\
> +	memset(&fake_vmsc2, 0, sizeof(fake_vmsc2));		\
> +	INIT_LIST_HEAD(&fake_vmsc1.ris);			\
> +	INIT_LIST_HEAD(&fake_vmsc1.comp_list);			\
> +	fake_vmsc1.msc = &fake_msc1;				\
> +	INIT_LIST_HEAD(&fake_vmsc2.ris);			\
> +	INIT_LIST_HEAD(&fake_vmsc2.comp_list);			\
> +	fake_vmsc2.msc = &fake_msc2;				\
> +								\
> +	memset(&fake_ris1, 0, sizeof(fake_ris1));		\
> +	memset(&fake_ris2, 0, sizeof(fake_ris2));		\
> +	fake_ris1.ris_idx = 1;					\
> +	INIT_LIST_HEAD(&fake_ris1.msc_list);			\
> +	fake_ris2.ris_idx = 2;					\
> +	INIT_LIST_HEAD(&fake_ris2.msc_list);			\
> +								\
> +	fake_msc1.pdev = &fake_pdev;				\
> +	fake_msc2.pdev = &fake_pdev;				\
> +								\
> +	list_add(&fake_class.classes_list, &fake_classes_list);	\
> +} while (0)
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	mutex_lock(&mpam_list_lock);
> +
> +	/* One Class+Comp, two RIS in one vMSC with common features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = NULL;
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two RIS in one vMSC with non-overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = NULL;
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc1.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cmax_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/* Multiple RIS within one MSC controlling the same resource can be mismatched */
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_vmsc1.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +	KUNIT_EXPECT_EQ(test, fake_vmsc1.props.cmax_wd, 4);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with non-overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cmax_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple RIS in different MSC can't control the same resource, so
> +	 * mismatched features cannot be supported.
> +	 */
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with incompatible overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	mpam_set_feature(mpam_feat_mbw_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_mbw_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 5;
> +	fake_ris2.props.cpbm_wd = 3;
> +	fake_ris1.props.mbw_pbm_bits = 5;
> +	fake_ris2.props.mbw_pbm_bits = 3;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple RIS in different MSC can't control the same resource, so
> +	 * mismatched features cannot be supported.
> +	 */
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_mbw_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.mbw_pbm_bits, 0);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class+Comp, two MSC with overlapping features that need tweaking */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = NULL;
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp1;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp1.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_mbw_min, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_mbw_min, &fake_ris2.props);
> +	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmax, &fake_ris2.props);
> +	fake_ris1.props.bwa_wd = 5;
> +	fake_ris2.props.bwa_wd = 3;
> +	fake_ris1.props.cmax_wd = 5;
> +	fake_ris2.props.cmax_wd = 3;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Features present in both MSC are kept, with each width reduced to
> +	 * the smallest value that both support.
> +	 */
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_class.props));
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cmax_cmax, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.bwa_wd, 3);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 3);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class Two Comp with overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = &fake_class;
> +	list_add(&fake_comp2.class_list, &fake_class.components);
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp2;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cpbm_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 4);
> +
> +	RESET_FAKE_HIEARCHY();
> +
> +	/* One Class Two Comp with non-overlapping features */
> +	fake_comp1.class = &fake_class;
> +	list_add(&fake_comp1.class_list, &fake_class.components);
> +	fake_comp2.class = &fake_class;
> +	list_add(&fake_comp2.class_list, &fake_class.components);
> +	fake_vmsc1.comp = &fake_comp1;
> +	list_add(&fake_vmsc1.comp_list, &fake_comp1.vmsc);
> +	fake_vmsc2.comp = &fake_comp2;
> +	list_add(&fake_vmsc2.comp_list, &fake_comp2.vmsc);
> +	fake_ris1.vmsc = &fake_vmsc1;
> +	list_add(&fake_ris1.vmsc_list, &fake_vmsc1.ris);
> +	fake_ris2.vmsc = &fake_vmsc2;
> +	list_add(&fake_ris2.vmsc_list, &fake_vmsc2.ris);
> +
> +	mpam_set_feature(mpam_feat_cpor_part, &fake_ris1.props);
> +	mpam_set_feature(mpam_feat_cmax_cmin, &fake_ris2.props);
> +	fake_ris1.props.cpbm_wd = 4;
> +	fake_ris2.props.cmax_wd = 4;
> +
> +	mpam_enable_merge_features(&fake_classes_list);
> +
> +	/*
> +	 * Multiple components can't control the same resource, mismatched features can
> +	 * not be supported.
> +	 */
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cpor_part, &fake_class.props));
> +	KUNIT_EXPECT_FALSE(test, mpam_has_feature(mpam_feat_cmax_cmin, &fake_class.props));
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cpbm_wd, 0);
> +	KUNIT_EXPECT_EQ(test, fake_class.props.cmax_wd, 0);
> +
> +	mutex_unlock(&mpam_list_lock);
> +
> +#undef RESET_FAKE_HIEARCHY
> +}
> +
In file included from drivers/resctrl/mpam_devices.c:2908:
drivers/resctrl/test_mpam_devices.c: In function ‘test_mpam_enable_merge_features’:
drivers/resctrl/test_mpam_devices.c:325:1: error: the frame size of 5520 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
   325 | }
       | ^
It's better to split the big function into a few sub-tests. Each
sub-test defines and uses fewer variables, which avoids the large frame size.
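
As a rough illustration of the split, here is a sketch in plain userspace C rather than KUnit. Everything in it is a made-up stand-in: struct fake_props, the FEAT_* bits and merge_props() are hypothetical and do not reproduce the driver's types or its merge logic. The point is only that each sub-test owns just the fakes it needs, so no single stack frame has to hold all of them:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits, standing in for mpam_feat_* values. */
#define FEAT_CPOR (1u << 0)
#define FEAT_CMAX (1u << 1)

/* Hypothetical stand-in for one large fake object used by the test. */
struct fake_props {
	unsigned int features;
	unsigned int cpbm_wd;
	unsigned int cmax_wd;
	char pad[256];	/* models the bulk that inflates the stack frame */
};

/* Toy merge: keep only what both inputs support. Not the driver's logic. */
static void merge_props(struct fake_props *out, const struct fake_props *a,
			const struct fake_props *b)
{
	assert(out && a && b);
	memset(out, 0, sizeof(*out));
	out->features = a->features & b->features;
	if (out->features & FEAT_CPOR)
		out->cpbm_wd = a->cpbm_wd < b->cpbm_wd ? a->cpbm_wd : b->cpbm_wd;
	if (out->features & FEAT_CMAX)
		out->cmax_wd = a->cmax_wd < b->cmax_wd ? a->cmax_wd : b->cmax_wd;
}

/* Each sub-test returns 0 on success and 1 on failure. */
static int subtest_overlapping(void)
{
	struct fake_props a = { .features = FEAT_CPOR, .cpbm_wd = 4 };
	struct fake_props b = { .features = FEAT_CPOR, .cpbm_wd = 4 };
	struct fake_props merged;

	merge_props(&merged, &a, &b);
	return !((merged.features & FEAT_CPOR) && merged.cpbm_wd == 4);
}

static int subtest_non_overlapping(void)
{
	struct fake_props a = { .features = FEAT_CPOR, .cpbm_wd = 4 };
	struct fake_props b = { .features = FEAT_CMAX, .cmax_wd = 4 };
	struct fake_props merged;

	merge_props(&merged, &a, &b);
	return !(merged.features == 0 && merged.cpbm_wd == 0 &&
		 merged.cmax_wd == 0);
}

/* Called through a table, much as a test framework calls its cases. */
static int (*const subtests[])(void) = {
	subtest_overlapping,
	subtest_non_overlapping,
};

/* Returns the number of failed sub-tests. */
int run_all_subtests(void)
{
	int failures = 0;
	size_t i;

	for (i = 0; i < sizeof(subtests) / sizeof(subtests[0]); i++)
		failures += subtests[i]();
	return failures;
}
```

In the actual test file, each sub-test would presumably become its own KUNIT_CASE() entry, and shared setup such as the list initialisation could move into a small helper instead of a macro inside one large function.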
[SNIP]
Thanks.
-Fenghua
 
- * Re: [PATCH 00/33] arm_mpam: Add basic mpam driver
  2025-08-22 15:29 [PATCH 00/33] arm_mpam: Add basic mpam driver James Morse
                   ` (66 preceding siblings ...)
  2025-08-22 15:30 ` [PATCH 33/33] arm_mpam: Add kunit tests for props_mismatch() James Morse
@ 2025-08-24 17:24 ` Krzysztof Kozlowski
  67 siblings, 0 replies; 200+ messages in thread
From: Krzysztof Kozlowski @ 2025-08-24 17:24 UTC (permalink / raw)
  To: James Morse, linux-kernel, linux-arm-kernel, linux-acpi,
	devicetree
  Cc: shameerali.kolothum.thodi, D Scott Phillips OS, carl, lcherian,
	bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao,
	peternewman, dfustini, amitsinght, David Hildenbrand, Rex Nie,
	Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
	Jonathan Cameron, Rob Herring, Rohit Mathew, Rafael Wysocki,
	Len Brown, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla,
	Krzysztof Kozlowski, Conor Dooley, Catalin Marinas, Will Deacon,
	Greg Kroah-Hartman, Danilo Krummrich
On 22/08/2025 17:29, James Morse wrote:
> Hello,
> 
> This is just enough MPAM driver for the ACPI and DT pre-requisites.
> It doesn't contain any of the resctrl code, meaning you can't actually drive it
> from user-space yet. Because of that, it's hidden behind CONFIG_EXPERT.
> This will change once the user interface is connected up.
> 
> This is the initial group of patches that allows the resctrl code to be built
> on top. Including that will increase the number of trees that may need to
> coordinate, so breaking it up makes sense.
> 
There was a v1 of this, so this is a v2. Start using b4 to get it right,
because you just make it difficult for us to review.
Try yourself:
b4 diff <this-patchset>
Works? No.
Also, for some reason you sent it twice, so again: use b4.
Best regards,
Krzysztof