* [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 17:13 ` Fenghua Yu
2026-01-15 2:12 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap Ben Horgan
` (50 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Jiapeng Chong,
Abaci Robot
From: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
./drivers/resctrl/mpam_internal.h: linux/srcu.h is included more than once.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=27328
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: James Morse <james.morse@arm.com>
[BH: Keep alphabetical order]
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/mpam_internal.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e79c3c47259c..17cdc3080d58 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -12,7 +12,6 @@
#include <linux/jump_label.h>
#include <linux/llist.h>
#include <linux/mutex.h>
-#include <linux/srcu.h>
#include <linux/spinlock.h>
#include <linux/srcu.h>
#include <linux/types.h>
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header
2026-01-12 16:58 ` [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header Ben Horgan
@ 2026-01-12 17:13 ` Fenghua Yu
2026-01-15 2:12 ` Gavin Shan
1 sibling, 0 replies; 160+ messages in thread
From: Fenghua Yu @ 2026-01-12 17:13 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Jiapeng Chong, Abaci Robot
On 1/12/26 08:58, Ben Horgan wrote:
> From: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
>
> ./drivers/resctrl/mpam_internal.h: linux/srcu.h is included more than once.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reported-by: Abaci Robot <abaci@linux.alibaba.com>
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=27328
> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
> Acked-by: James Morse <james.morse@arm.com>
> [BH: Keep alphabetical order]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Thanks.
-Fenghua
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header
2026-01-12 16:58 ` [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header Ben Horgan
2026-01-12 17:13 ` Fenghua Yu
@ 2026-01-15 2:12 ` Gavin Shan
1 sibling, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 2:12 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Jiapeng Chong,
Abaci Robot
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
>
> ./drivers/resctrl/mpam_internal.h: linux/srcu.h is included more than once.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reported-by: Abaci Robot <abaci@linux.alibaba.com>
> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=27328
> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
> Acked-by: James Morse <james.morse@arm.com>
> [BH: Keep alphabetical order]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> drivers/resctrl/mpam_internal.h | 1 -
> 1 file changed, 1 deletion(-)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
2026-01-12 16:58 ` [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 2:14 ` Gavin Shan
2026-01-16 11:57 ` Catalin Marinas
2026-01-12 16:58 ` [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register Ben Horgan
` (49 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
In the test__props_mismatch() kunit test we rely on the struct mpam_props
being packed to ensure memcmp doesn't consider packing. Making it packed
reduces the alignment of the features bitmap and so breaks a requirement
for the use of atomics. As we don't rely on the set/clear of these bits
being atomic, just make them non-atomic.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Add comment (Jonathan)
---
drivers/resctrl/mpam_internal.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 17cdc3080d58..e8971842b124 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -200,8 +200,12 @@ struct mpam_props {
} PACKED_FOR_KUNIT;
#define mpam_has_feature(_feat, x) test_bit(_feat, (x)->features)
-#define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
-#define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
+/*
+ * The non-atomic get/set operations are used because if struct mpam_props is
+ * packed, the alignment requirements for atomics aren't met.
+ */
+#define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
+#define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
2026-01-12 16:58 ` [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap Ben Horgan
@ 2026-01-15 2:14 ` Gavin Shan
2026-01-16 11:57 ` Catalin Marinas
1 sibling, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 2:14 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> In the test__props_mismatch() kunit test we rely on the struct mpam_props
> being packed to ensure memcmp doesn't consider packing. Making it packed
> reduces the alignment of the features bitmap and so breaks a requirement
> for the use of atomics. As we don't rely on the set/clear of these bits
> being atomic, just make them non-atomic.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Add comment (Jonathan)
> ---
> drivers/resctrl/mpam_internal.h | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
2026-01-12 16:58 ` [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap Ben Horgan
2026-01-15 2:14 ` Gavin Shan
@ 2026-01-16 11:57 ` Catalin Marinas
2026-01-16 12:02 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Catalin Marinas @ 2026-01-16 11:57 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:29PM +0000, Ben Horgan wrote:
> In the test__props_mismatch() kunit test we rely on the struct mpam_props
> being packed to ensure memcmp doesn't consider packing. Making it packed
> reduces the alignment of the features bitmap and so breaks a requirement
> for the use of atomics. As we don't rely on the set/clear of these bits
> being atomic, just make them non-atomic.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Add comment (Jonathan)
> ---
> drivers/resctrl/mpam_internal.h | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 17cdc3080d58..e8971842b124 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -200,8 +200,12 @@ struct mpam_props {
> } PACKED_FOR_KUNIT;
>
> #define mpam_has_feature(_feat, x) test_bit(_feat, (x)->features)
> -#define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
> -#define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
> +/*
> + * The non-atomic get/set operations are used because if struct mpam_props is
> + * packed, the alignment requirements for atomics aren't met.
> + */
> +#define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
> +#define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
After discussing privately, I can see how test__props_mismatch() can
end up with unaligned atomics on the mmap_props::features array. Happy to
pick it up for 6.19 (probably the first patch as well, though that's
harmless).
Is there a Fixes tag here for future reference?
--
Catalin
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
2026-01-16 11:57 ` Catalin Marinas
@ 2026-01-16 12:02 ` Ben Horgan
2026-01-16 12:12 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-16 12:02 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/16/26 11:57, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:29PM +0000, Ben Horgan wrote:
>> In the test__props_mismatch() kunit test we rely on the struct mpam_props
>> being packed to ensure memcmp doesn't consider packing. Making it packed
>> reduces the alignment of the features bitmap and so breaks a requirement
>> for the use of atomics. As we don't rely on the set/clear of these bits
>> being atomic, just make them non-atomic.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v2:
>> Add comment (Jonathan)
>> ---
>> drivers/resctrl/mpam_internal.h | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index 17cdc3080d58..e8971842b124 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -200,8 +200,12 @@ struct mpam_props {
>> } PACKED_FOR_KUNIT;
>>
>> #define mpam_has_feature(_feat, x) test_bit(_feat, (x)->features)
>> -#define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
>> -#define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
>> +/*
>> + * The non-atomic get/set operations are used because if struct mpam_props is
>> + * packed, the alignment requirements for atomics aren't met.
>> + */
>> +#define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
>> +#define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
>
> After discussing privately, I can see how test__props_mismatch() can
> end up with unaligned atomics on the mmap_props::features array. Happy to
> pick it up for 6.19 (probably the first patch as well, though that's
> harmless).
Yes please.
>
> Is there a Fixes tag here for future reference?
>
Yes, the mpam_set/clear macros were introduced in
Fixes: 8c90dc68a5de ("arm_mpam: Probe the hardware features resctrl supports")
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
2026-01-16 12:02 ` Ben Horgan
@ 2026-01-16 12:12 ` Ben Horgan
2026-01-16 15:51 ` Catalin Marinas
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-16 12:12 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On 1/16/26 12:02, Ben Horgan wrote:
> Hi Catalin,
>
> On 1/16/26 11:57, Catalin Marinas wrote:
>> On Mon, Jan 12, 2026 at 04:58:29PM +0000, Ben Horgan wrote:
>>> In the test__props_mismatch() kunit test we rely on the struct mpam_props
>>> being packed to ensure memcmp doesn't consider packing. Making it packed
>>> reduces the alignment of the features bitmap and so breaks a requirement
>>> for the use of atomics. As we don't rely on the set/clear of these bits
>>> being atomic, just make them non-atomic.
>>>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>> ---
>>> Changes since v2:
>>> Add comment (Jonathan)
>>> ---
>>> drivers/resctrl/mpam_internal.h | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>>> index 17cdc3080d58..e8971842b124 100644
>>> --- a/drivers/resctrl/mpam_internal.h
>>> +++ b/drivers/resctrl/mpam_internal.h
>>> @@ -200,8 +200,12 @@ struct mpam_props {
>>> } PACKED_FOR_KUNIT;
>>>
>>> #define mpam_has_feature(_feat, x) test_bit(_feat, (x)->features)
>>> -#define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
>>> -#define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
>>> +/*
>>> + * The non-atomic get/set operations are used because if struct mpam_props is
>>> + * packed, the alignment requirements for atomics aren't met.
>>> + */
>>> +#define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
>>> +#define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
>>
>> After discussing privately, I can see how test__props_mismatch() can
>> end up with unaligned atomics on the mpam_props::features array. Happy to
>> pick it up for 6.19 (probably the first patch as well, though that's
>> harmless).
>
> Yes please.
>
>>
>> Is there a Fixes tag here for future reference?
>>
>
> Yes, the mpam_set/clear macros were introduced in
The mpam_set/clear macros actually come later, in:
c10ca83a7783 arm_mpam: Merge supported features during mpam_enable() into mpam_class
but I think the Fixes tag below is still the correct one, as that is where we
could first start seeing the problem.
>
> Fixes: 8c90dc68a5de ("arm_mpam: Probe the hardware features resctrl supports")
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
2026-01-16 12:12 ` Ben Horgan
@ 2026-01-16 15:51 ` Catalin Marinas
0 siblings, 0 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-16 15:51 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Fri, Jan 16, 2026 at 12:12:53PM +0000, Ben Horgan wrote:
> On 1/16/26 12:02, Ben Horgan wrote:
> > On 1/16/26 11:57, Catalin Marinas wrote:
> >> On Mon, Jan 12, 2026 at 04:58:29PM +0000, Ben Horgan wrote:
> >>> In the test__props_mismatch() kunit test we rely on the struct mpam_props
> >>> being packed to ensure memcmp doesn't consider packing. Making it packed
> >>> reduces the alignment of the features bitmap and so breaks a requirement
> >>> for the use of atomics. As we don't rely on the set/clear of these bits
> >>> being atomic, just make them non-atomic.
> >>>
> >>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> >>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> >>> ---
> >>> Changes since v2:
> >>> Add comment (Jonathan)
> >>> ---
> >>> drivers/resctrl/mpam_internal.h | 8 ++++++--
> >>> 1 file changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> >>> index 17cdc3080d58..e8971842b124 100644
> >>> --- a/drivers/resctrl/mpam_internal.h
> >>> +++ b/drivers/resctrl/mpam_internal.h
> >>> @@ -200,8 +200,12 @@ struct mpam_props {
> >>> } PACKED_FOR_KUNIT;
> >>>
> >>> #define mpam_has_feature(_feat, x) test_bit(_feat, (x)->features)
> >>> -#define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
> >>> -#define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
> >>> +/*
> >>> + * The non-atomic get/set operations are used because if struct mpam_props is
> >>> + * packed, the alignment requirements for atomics aren't met.
> >>> + */
> >>> +#define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
> >>> +#define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
> >>
> >> After discussing privately, I can see how test__props_mismatch() can
> >> end up with unaligned atomics on the mpam_props::features array. Happy to
> >> pick it up for 6.19 (probably the first patch as well, though that's
> >> harmless).
> >
> > Yes please.
> >
> >>
> >> Is there a Fixes tag here for future reference?
> >>
> >
> > Yes, the mpam_set/clear macros were introduced in
>
> The mpam_set_clear() actually comes after in:
> c10ca83a7783 arm_mpam: Merge supported features during mpam_enable() into mpam_class
> but I think the fixes below is still the correct one as it is where we could
> first start seeing the problem.
> >
> > Fixes: 8c90dc68a5de ("arm_mpam: Probe the hardware features resctrl supports")
Yes, I left the original as that's the one first introducing the atomic
bitops on this structure.
--
Catalin
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
2026-01-12 16:58 ` [PATCH v3 01/47] arm_mpam: Remove duplicate linux/srcu.h header Ben Horgan
2026-01-12 16:58 ` [PATCH v3 02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 2:16 ` Gavin Shan
2026-01-15 17:59 ` Catalin Marinas
2026-01-12 16:58 ` [PATCH v3 04/47] KVM: arm64: Preserve host MPAM configuration when changing traps Ben Horgan
` (48 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
The MPAMSM_EL1 register determines the MPAM configuration for an SMCU. Add
the register definition.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
arch/arm64/tools/sysreg | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 8921b51866d6..afbb55c9b038 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -5052,6 +5052,14 @@ Field 31:16 PARTID_D
Field 15:0 PARTID_I
EndSysreg
+Sysreg MPAMSM_EL1 3 0 10 5 3
+Res0 63:48
+Field 47:40 PMG_D
+Res0 39:32
+Field 31:16 PARTID_D
+Res0 15:0
+EndSysreg
+
Sysreg ISR_EL1 3 0 12 1 0
Res0 63:11
Field 10 IS
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register
2026-01-12 16:58 ` [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register Ben Horgan
@ 2026-01-15 2:16 ` Gavin Shan
2026-01-15 17:59 ` Catalin Marinas
1 sibling, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 2:16 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> The MPAMSM_EL1 register determines the MPAM configuration for an SMCU. Add
> the register definition.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> arch/arm64/tools/sysreg | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register
2026-01-12 16:58 ` [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register Ben Horgan
2026-01-15 2:16 ` Gavin Shan
@ 2026-01-15 17:59 ` Catalin Marinas
1 sibling, 0 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 17:59 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:30PM +0000, Ben Horgan wrote:
> The MPAMSM_EL1 register determines the MPAM configuration for an SMCU. Add
> the register definition.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 04/47] KVM: arm64: Preserve host MPAM configuration when changing traps
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (2 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 03/47] arm64/sysreg: Add MPAMSM_EL1 register Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 2:33 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 05/47] KVM: arm64: Make MPAMSM_EL1 accesses UNDEF Ben Horgan
` (47 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
When kvm enables or disables MPAM traps to EL2 it clears all other bits in
MPAM2_EL2. Notably, it clears the partition ids (PARTIDs) and performance
monitoring groups (PMGs). Avoid changing these bits in anticipation of
adding support for MPAM in the kernel. Otherwise, on a VHE system with the
host running at EL2 where MPAM2_EL2 and MPAM1_EL1 access the same register,
any attempt to use MPAM to monitor or partition resources for kernel space
would be foiled by running a KVM guest. Additionally, MPAM2_EL2.EnMPAMSM is
always set to 0 which causes MPAMSM_EL1 to always trap. Keep EnMPAMSM set
to 1 when not in a guest so that the kernel can use MPAMSM_EL1.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
arch/arm64/kvm/hyp/include/hyp/switch.h | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index c5d5e5b86eaf..63195275a8b8 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -269,7 +269,8 @@ static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
static inline void __activate_traps_mpam(struct kvm_vcpu *vcpu)
{
- u64 r = MPAM2_EL2_TRAPMPAM0EL1 | MPAM2_EL2_TRAPMPAM1EL1;
+ u64 clr = MPAM2_EL2_EnMPAMSM;
+ u64 set = MPAM2_EL2_TRAPMPAM0EL1 | MPAM2_EL2_TRAPMPAM1EL1;
if (!system_supports_mpam())
return;
@@ -279,18 +280,21 @@ static inline void __activate_traps_mpam(struct kvm_vcpu *vcpu)
write_sysreg_s(MPAMHCR_EL2_TRAP_MPAMIDR_EL1, SYS_MPAMHCR_EL2);
} else {
/* From v1.1 TIDR can trap MPAMIDR, set it unconditionally */
- r |= MPAM2_EL2_TIDR;
+ set |= MPAM2_EL2_TIDR;
}
- write_sysreg_s(r, SYS_MPAM2_EL2);
+ sysreg_clear_set_s(SYS_MPAM2_EL2, clr, set);
}
static inline void __deactivate_traps_mpam(void)
{
+ u64 clr = MPAM2_EL2_TRAPMPAM0EL1 | MPAM2_EL2_TRAPMPAM1EL1 | MPAM2_EL2_TIDR;
+ u64 set = MPAM2_EL2_EnMPAMSM;
+
if (!system_supports_mpam())
return;
- write_sysreg_s(0, SYS_MPAM2_EL2);
+ sysreg_clear_set_s(SYS_MPAM2_EL2, clr, set);
if (system_supports_mpam_hcr())
write_sysreg_s(MPAMHCR_HOST_FLAGS, SYS_MPAMHCR_EL2);
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 04/47] KVM: arm64: Preserve host MPAM configuration when changing traps
2026-01-12 16:58 ` [PATCH v3 04/47] KVM: arm64: Preserve host MPAM configuration when changing traps Ben Horgan
@ 2026-01-15 2:33 ` Gavin Shan
0 siblings, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 2:33 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> When kvm enables or disables MPAM traps to EL2 it clears all other bits in
> MPAM2_EL2. Notably, it clears the partition ids (PARTIDs) and performance
> monitoring groups (PMGs). Avoid changing these bits in anticipation of
> adding support for MPAM in the kernel. Otherwise, on a VHE system with the
> host running at EL2 where MPAM2_EL2 and MPAM1_EL1 access the same register,
> any attempt to use MPAM to monitor or partition resources for kernel space
> would be foiled by running a KVM guest. Additionally, MPAM2_EL2.EnMPAMSM is
> always set to 0 which causes MPAMSM_EL1 to always trap. Keep EnMPAMSM set
> to 1 when not in a guest so that the kernel can use MPAMSM_EL1.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> arch/arm64/kvm/hyp/include/hyp/switch.h | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 05/47] KVM: arm64: Make MPAMSM_EL1 accesses UNDEF
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (3 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 04/47] KVM: arm64: Preserve host MPAM configuration when changing traps Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 2:34 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers Ben Horgan
` (46 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
The MPAMSM_EL1 controls the MPAM labeling for an SMCU, Streaming Mode
Compute Unit. As there is no MPAM support in KVM, make sure MPAMSM_EL1
accesses trigger an UNDEF.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Remove paragraph from commit on allowed range of values
---
arch/arm64/kvm/sys_regs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c8fd7c6a12a1..72654ab984ee 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3373,6 +3373,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ SYS_DESC(SYS_MPAM1_EL1), undef_access },
{ SYS_DESC(SYS_MPAM0_EL1), undef_access },
+ { SYS_DESC(SYS_MPAMSM_EL1), undef_access },
+
{ SYS_DESC(SYS_VBAR_EL1), access_rw, reset_val, VBAR_EL1, 0 },
{ SYS_DESC(SYS_DISR_EL1), NULL, reset_val, DISR_EL1, 0 },
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 05/47] KVM: arm64: Make MPAMSM_EL1 accesses UNDEF
2026-01-12 16:58 ` [PATCH v3 05/47] KVM: arm64: Make MPAMSM_EL1 accesses UNDEF Ben Horgan
@ 2026-01-15 2:34 ` Gavin Shan
0 siblings, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 2:34 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> The MPAMSM_EL1 controls the MPAM labeling for an SMCU, Streaming Mode
> Compute Unit. As there is no MPAM support in KVM, make sure MPAMSM_EL1
> accesses trigger an UNDEF.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Remove paragraph from commit on allowed range of values
> ---
> arch/arm64/kvm/sys_regs.c | 2 ++
> 1 file changed, 2 insertions(+)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
* [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (4 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 05/47] KVM: arm64: Make MPAMSM_EL1 accesses UNDEF Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 6:47 ` Gavin Shan
2026-01-15 17:58 ` Catalin Marinas
2026-01-12 16:58 ` [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online Ben Horgan
` (45 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
MPAM allows traffic in the SoC to be labeled by the OS; these labels are
used to apply policy in caches and bandwidth regulators, and to monitor
traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
equivalent calls these CLOSID and RMID, but they don't map precisely.
MPAM has two CPU system registers that are used to hold the PARTID and PMG
values that traffic generated at each exception level will use. These can
be set per-task by the resctrl file system (resctrl is the de facto
interface for controlling this).
Add a helper to switch this.
struct task_struct's separate CLOSID and RMID fields are insufficient to
implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
PMG (sort of like the RMID) separately. On x86, the RMID is an independent
number, so a race that writes a mismatched CLOSID and RMID into hardware is
benign. On arm64, the PMG bits extend the PARTID
(i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0). In
this case, mismatching the values will 'dirty' a PMG value that resctrl
believes is clean, and is not tracking with its 'limbo' code.
To avoid this, the PARTID and PMG are always read and written as a
pair, which requires a new u64 field. struct task_struct has two u32
fields, closid and rmid, for the x86 case, but they can't be used here.
Instead, add a new field, mpam_partid_pmg, to struct thread_info to avoid
adding more architecture-specific code to struct task_struct. Always use
READ_ONCE()/WRITE_ONCE() when accessing this field.
Resctrl allows a per-CPU 'default' value to be set; this overrides the
task's values when scheduling a task in the default control group, which
has PARTID 0. The way 'code data prioritisation' gets emulated means the
register value for the default group needs to be a variable.
The current system register value is kept in a per-cpu variable to avoid
writing to the system register if the value isn't going to change. Writes
to this register may reset the hardware state for regulating bandwidth.
Finally, there is no reason to context switch these registers unless there
is a driver changing the values in struct task_struct. Hide the whole thing
behind a static key. This also allows the driver to disable MPAM in
response to errors reported by hardware. Move the existing static key to
belong to the arch code, as in the future the MPAM driver may become a
loadable module.
All this should depend on whether there is an MPAM driver, so hide it
behind CONFIG_ARM64_MPAM.
CC: Amit Singh Tomar <amitsinght@marvell.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
Remove extra DECLARE_STATIC_KEY_FALSE
Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
Remove unused headers
Expand comment (Jonathan)
Changes since v2:
Tidy up ifdefs
---
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/mpam.h | 67 ++++++++++++++++++++++++++++
arch/arm64/include/asm/thread_info.h | 3 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/mpam.c | 13 ++++++
arch/arm64/kernel/process.c | 7 +++
drivers/resctrl/mpam_devices.c | 2 -
drivers/resctrl/mpam_internal.h | 4 +-
8 files changed, 95 insertions(+), 4 deletions(-)
create mode 100644 arch/arm64/include/asm/mpam.h
create mode 100644 arch/arm64/kernel/mpam.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 93173f0a09c7..cdcc5b76a110 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2049,6 +2049,8 @@ config ARM64_MPAM
MPAM is exposed to user-space via the resctrl pseudo filesystem.
+ This option enables the extra context switch code.
+
endmenu # "ARMv8.4 architectural features"
menu "ARMv8.5 architectural features"
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
new file mode 100644
index 000000000000..14011e5970ce
--- /dev/null
+++ b/arch/arm64/include/asm/mpam.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __ASM__MPAM_H
+#define __ASM__MPAM_H
+
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+
+#include <asm/sysreg.h>
+
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+DECLARE_PER_CPU(u64, arm64_mpam_default);
+DECLARE_PER_CPU(u64, arm64_mpam_current);
+
+/*
+ * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
+ * This is used by the context switch code to use the resctrl CPU property
+ * instead. The value is modified when CDP is enabled/disabled by mounting
+ * the resctrl filesystem.
+ */
+extern u64 arm64_mpam_global_default;
+
+/*
+ * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
+ * which may race with reads in mpam_thread_switch(). Ensure only one of the old
+ * or new values are used. Particular care should be taken with the pmg field as
+ * mpam_thread_switch() may read a partid and pmg that don't match, causing this
+ * value to be stored with cache allocations, despite being considered 'free' by
+ * resctrl.
+ */
+#ifdef CONFIG_ARM64_MPAM
+static inline u64 mpam_get_regval(struct task_struct *tsk)
+{
+ return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
+}
+
+static inline void mpam_thread_switch(struct task_struct *tsk)
+{
+ u64 oldregval;
+ int cpu = smp_processor_id();
+ u64 regval = mpam_get_regval(tsk);
+
+ if (!static_branch_likely(&mpam_enabled))
+ return;
+
+ if (regval == READ_ONCE(arm64_mpam_global_default))
+ regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
+
+ oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+ if (oldregval == regval)
+ return;
+
+ write_sysreg_s(regval, SYS_MPAM1_EL1);
+ isb();
+
+ /* Synchronising the EL0 write is left until the ERET to EL0 */
+ write_sysreg_s(regval, SYS_MPAM0_EL1);
+
+ WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
+}
+#else
+static inline void mpam_thread_switch(struct task_struct *tsk) {}
+#endif /* CONFIG_ARM64_MPAM */
+
+#endif /* __ASM__MPAM_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index a803b887b0b4..fc801a26ff9e 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -41,6 +41,9 @@ struct thread_info {
#ifdef CONFIG_SHADOW_CALL_STACK
void *scs_base;
void *scs_sp;
+#endif
+#ifdef CONFIG_ARM64_MPAM
+ u64 mpam_partid_pmg;
#endif
u32 cpu;
};
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 76f32e424065..15979f366519 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
+obj-$(CONFIG_ARM64_MPAM) += mpam.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-y += vdso-wrap.o
obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
new file mode 100644
index 000000000000..9866d2ca0faa
--- /dev/null
+++ b/arch/arm64/kernel/mpam.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Arm Ltd. */
+
+#include <asm/mpam.h>
+
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+
+DEFINE_STATIC_KEY_FALSE(mpam_enabled);
+DEFINE_PER_CPU(u64, arm64_mpam_default);
+DEFINE_PER_CPU(u64, arm64_mpam_current);
+
+u64 arm64_mpam_global_default;
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 489554931231..47698955fa1e 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -51,6 +51,7 @@
#include <asm/fpsimd.h>
#include <asm/gcs.h>
#include <asm/mmu_context.h>
+#include <asm/mpam.h>
#include <asm/mte.h>
#include <asm/processor.h>
#include <asm/pointer_auth.h>
@@ -738,6 +739,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
if (prev->thread.sctlr_user != next->thread.sctlr_user)
update_sctlr_el1(next->thread.sctlr_user);
+ /*
+ * MPAM thread switch happens after the DSB to ensure prev's accesses
+ * use prev's MPAM settings.
+ */
+ mpam_thread_switch(next);
+
/* the actual thread switch */
last = cpu_switch_to(prev, next);
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b495d5291868..860181266b15 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,8 +29,6 @@
#include "mpam_internal.h"
-DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
-
/*
* mpam_list_lock protects the SRCU lists when writing. Once the
* mpam_enabled key is enabled these lists are read-only,
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e8971842b124..4632985bcca6 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -16,12 +16,12 @@
#include <linux/srcu.h>
#include <linux/types.h>
+#include <asm/mpam.h>
+
#define MPAM_MSC_MAX_NUM_RIS 16
struct platform_device;
-DECLARE_STATIC_KEY_FALSE(mpam_enabled);
-
#ifdef CONFIG_MPAM_KUNIT_TEST
#define PACKED_FOR_KUNIT __packed
#else
--
2.43.0
* Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
2026-01-12 16:58 ` [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers Ben Horgan
@ 2026-01-15 6:47 ` Gavin Shan
2026-01-15 12:09 ` Jonathan Cameron
2026-01-15 17:58 ` Catalin Marinas
1 sibling, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 6:47 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> [...]
With the following nitpick addressed:
Reviewed-by: Gavin Shan <gshan@redhat.com>
> [...]
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> new file mode 100644
> index 000000000000..9866d2ca0faa
> --- /dev/null
> +++ b/arch/arm64/kernel/mpam.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#include <asm/mpam.h>
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +
Nitpick: No need to include those two header files since they are already
included by <asm/mpam.h>.
> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
> +DEFINE_PER_CPU(u64, arm64_mpam_default);
> +DEFINE_PER_CPU(u64, arm64_mpam_current);
> +
> +u64 arm64_mpam_global_default;
> [...]
Thanks,
Gavin
* Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
2026-01-15 6:47 ` Gavin Shan
@ 2026-01-15 12:09 ` Jonathan Cameron
2026-01-19 14:00 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-15 12:09 UTC (permalink / raw)
To: Gavin Shan
Cc: Ben Horgan, amitsinght, baisheng.gao, baolin.wang, carl,
dave.martin, david, dfustini, fenghuay, james.morse, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On Thu, 15 Jan 2026 14:47:28 +0800
Gavin Shan <gshan@redhat.com> wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
> > From: James Morse <james.morse@arm.com>
> >
> > [...]
>
> With the following nitpick addressed:
>
I commented on the nitpick.
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> > [...]
> > diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> > new file mode 100644
> > index 000000000000..9866d2ca0faa
> > --- /dev/null
> > +++ b/arch/arm64/kernel/mpam.c
> > @@ -0,0 +1,13 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (C) 2025 Arm Ltd. */
> > +
> > +#include <asm/mpam.h>
> > +
> > +#include <linux/jump_label.h>
> > +#include <linux/percpu.h>
> > +
>
> Nitpick: Needn't include those two header files since they have been included
> to <asm/mpam.h>
That is a non-obvious include chain that we should not rely on.
Please keep the headers and continue to follow the 'include what you use'
style (with exceptions when a given header is clearly documented as always
including another, like some of the bitmap stuff). It is more obviously
correct and causes less grief if headers get refactored in future.
* Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
2026-01-15 12:09 ` Jonathan Cameron
@ 2026-01-19 14:00 ` Ben Horgan
2026-01-20 1:42 ` Gavin Shan
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 14:00 UTC (permalink / raw)
To: Jonathan Cameron, Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Gavin, Jonathan,
On 1/15/26 12:09, Jonathan Cameron wrote:
> On Thu, 15 Jan 2026 14:47:28 +0800
> Gavin Shan <gshan@redhat.com> wrote:
>
>> Hi Ben,
>>
>> On 1/13/26 12:58 AM, Ben Horgan wrote:
>>> From: James Morse <james.morse@arm.com>
>>>
>>> [...]
>>
>> With the following nitpick addressed:
>>
>
> I commented on the nitpick.
>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>
>>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>>> index 76f32e424065..15979f366519 100644
>>> --- a/arch/arm64/kernel/Makefile
>>> +++ b/arch/arm64/kernel/Makefile
>>> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
>>> obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
>>> obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
>>> obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
>>> +obj-$(CONFIG_ARM64_MPAM) += mpam.o
>>> obj-$(CONFIG_ARM64_MTE) += mte.o
>>> obj-y += vdso-wrap.o
>>> obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o
>>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>>> new file mode 100644
>>> index 000000000000..9866d2ca0faa
>>> --- /dev/null
>>> +++ b/arch/arm64/kernel/mpam.c
>>> @@ -0,0 +1,13 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/* Copyright (C) 2025 Arm Ltd. */
>>> +
>>> +#include <asm/mpam.h>
>>> +
>>> +#include <linux/jump_label.h>
>>> +#include <linux/percpu.h>
>>> +
>>
>> Nitpick: Needn't include those two header files since they have been included
>> to <asm/mpam.h>
>
> That is a non obvious include chain that we should not rely on.
> Please keep the headers and continue to follow include what you use
> style (with exceptions when a given header is clearly documented as always including
> another like some of the bit map stuff.) It is more obviously correct and
> causes less grief if headers get refactored in future.
Keeping the includes here makes sense to me too. Gavin, are you ok with
keeping this as is?
>
>>
>>> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
>>> +DEFINE_PER_CPU(u64, arm64_mpam_default);
>>> +DEFINE_PER_CPU(u64, arm64_mpam_current);
>>> +
>>> +u64 arm64_mpam_global_default;
>
>>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
2026-01-19 14:00 ` Ben Horgan
@ 2026-01-20 1:42 ` Gavin Shan
0 siblings, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-20 1:42 UTC (permalink / raw)
To: Ben Horgan, Jonathan Cameron
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben and Jonathan,
On 1/19/26 10:00 PM, Ben Horgan wrote:
> Hi Gavin, Jonathan,
>
> On 1/15/26 12:09, Jonathan Cameron wrote:
>> On Thu, 15 Jan 2026 14:47:28 +0800
>> Gavin Shan <gshan@redhat.com> wrote:
>>
>>> Hi Ben,
>>>
>>> On 1/13/26 12:58 AM, Ben Horgan wrote:
>>>> From: James Morse <james.morse@arm.com>
>>>>
>>>> MPAM allows traffic in the SoC to be labeled by the OS, these labels are
>>>> used to apply policy in caches and bandwidth regulators, and to monitor
>>>> traffic in the SoC. The label is made up of a PARTID and PMG value. The x86
>>>> equivalent calls these CLOSID and RMID, but they don't map precisely.
>>>>
>>>> MPAM has two CPU system registers that are used to hold the PARTID and PMG
>>>> values that traffic generated at each exception level will use. These can
>>>> be set per-task by the resctrl file system. (resctrl is the de facto
>>>> interface for controlling this stuff).
>>>>
>>>> Add a helper to switch this.
>>>>
>>>> struct task_struct's separate CLOSID and RMID fields are insufficient to
>>>> implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID) and
>>>> PMG (sort of like the RMID) separately. On x86, the rmid is an independent
>>>> number, so a race that writes a mismatched closid and rmid into hardware is
>>>> benign. On arm64, the pmg bits extend the partid.
>>>> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0). In
>>>> this case, mismatching the values will 'dirty' a pmg value that resctrl
>>>> believes is clean, and is not tracking with its 'limbo' code.
>>>>
>>>> To avoid this, the partid and pmg are always read and written as a
>>>> pair. This requires a new u64 field. In struct task_struct there are two
>>>> u32, rmid and closid for the x86 case, but as we can't use them here do
>>>> something else. Add this new field, mpam_partid_pmg, to struct thread_info
>>>> to avoid adding more architecture specific code to struct task_struct.
>>>> Always use READ_ONCE()/WRITE_ONCE() when accessing this field.
>>>>
>>>> Resctrl allows a per-cpu 'default' value to be set; this overrides the
>>>> values when scheduling a task in the default control-group, which has
>>>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>>>> register value for the default group needs to be a variable.
>>>>
>>>> The current system register value is kept in a per-cpu variable to avoid
>>>> writing to the system register if the value isn't going to change. Writes
>>>> to this register may reset the hardware state for regulating bandwidth.
>>>>
>>>> Finally, there is no reason to context switch these registers unless there
>>>> is a driver changing the values in struct task_struct. Hide the whole thing
>>>> behind a static key. This also allows the driver to disable MPAM in
>>>> response to errors reported by hardware. Move the existing static key to
>>>> belong to the arch code, as in the future the MPAM driver may become a
>>>> loadable module.
>>>>
>>>> All this should depend on whether there is an MPAM driver, hide it behind
>>>> CONFIG_ARM64_MPAM.
>>>>
>>>> CC: Amit Singh Tomar <amitsinght@marvell.com>
>>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>>> Signed-off-by: James Morse <james.morse@arm.com>
>>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>>> ---
>>>> Changes since rfc:
>>>> CONFIG_MPAM -> CONFIG_ARM64_MPAM in commit message
>>>> Remove extra DECLARE_STATIC_KEY_FALSE
>>>> Function name in comment, __mpam_sched_in() -> mpam_thread_switch()
>>>> Remove unused headers
>>>> Expand comment (Jonathan)
>>>>
>>>> Changes since v2:
>>>> Tidy up ifdefs
>>>> ---
>>>> arch/arm64/Kconfig | 2 +
>>>> arch/arm64/include/asm/mpam.h | 67 ++++++++++++++++++++++++++++
>>>> arch/arm64/include/asm/thread_info.h | 3 ++
>>>> arch/arm64/kernel/Makefile | 1 +
>>>> arch/arm64/kernel/mpam.c | 13 ++++++
>>>> arch/arm64/kernel/process.c | 7 +++
>>>> drivers/resctrl/mpam_devices.c | 2 -
>>>> drivers/resctrl/mpam_internal.h | 4 +-
>>>> 8 files changed, 95 insertions(+), 4 deletions(-)
>>>> create mode 100644 arch/arm64/include/asm/mpam.h
>>>> create mode 100644 arch/arm64/kernel/mpam.c
>>>>
>>>
>>> With the following nitpick addressed:
>>>
>>
>> I commented on the nitpick.
>>
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>
>>>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>>>> index 76f32e424065..15979f366519 100644
>>>> --- a/arch/arm64/kernel/Makefile
>>>> +++ b/arch/arm64/kernel/Makefile
>>>> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
>>>> obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
>>>> obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
>>>> obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
>>>> +obj-$(CONFIG_ARM64_MPAM) += mpam.o
>>>> obj-$(CONFIG_ARM64_MTE) += mte.o
>>>> obj-y += vdso-wrap.o
>>>> obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o
>>>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>>>> new file mode 100644
>>>> index 000000000000..9866d2ca0faa
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kernel/mpam.c
>>>> @@ -0,0 +1,13 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/* Copyright (C) 2025 Arm Ltd. */
>>>> +
>>>> +#include <asm/mpam.h>
>>>> +
>>>> +#include <linux/jump_label.h>
>>>> +#include <linux/percpu.h>
>>>> +
>>>
>>> Nitpick: Needn't include those two header files since they have been included
>>> to <asm/mpam.h>
>>
>> That is a non obvious include chain that we should not rely on.
>> Please keep the headers and continue to follow include what you use
>> style (with exceptions when a given header is clearly documented as always including
>> another like some of the bit map stuff.) It is more obviously correct and
>> causes less grief if headers get refactored in future.
>
> Keeping the includes here makes sense to me too. Gavin, are you ok with
> keeping this as is?
>
Yeah, I'm fine to keep it as is :-)
>>
>>>
>>>> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
>>>> +DEFINE_PER_CPU(u64, arm64_mpam_default);
>>>> +DEFINE_PER_CPU(u64, arm64_mpam_current);
>>>> +
>>>> +u64 arm64_mpam_global_default;
>>
>>>
>>
>
> Thanks,
>
> Ben
>
>
Thanks,
Gavin
* Re: [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers
2026-01-12 16:58 ` [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers Ben Horgan
2026-01-15 6:47 ` Gavin Shan
@ 2026-01-15 17:58 ` Catalin Marinas
2026-01-19 12:23 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 17:58 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
> menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..14011e5970ce
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in mpam_thread_switch(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field as
> + * mpam_thread_switch() may read a partid and pmg that don't match, causing this
> + * value to be stored with cache allocations, despite being considered 'free' by
> + * resctrl.
> + */
> +#ifdef CONFIG_ARM64_MPAM
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> + u64 oldregval;
> + int cpu = smp_processor_id();
> + u64 regval = mpam_get_regval(tsk);
> +
> + if (!static_branch_likely(&mpam_enabled))
> + return;
> +
> + if (regval == READ_ONCE(arm64_mpam_global_default))
> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + if (oldregval == regval)
> + return;
> +
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + /* Synchronising the EL0 write is left until the ERET to EL0 */
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
Since we have an isb() already, does it make any difference if we write
MPAM0 before the barrier? Similar question for other places where we
write these two registers.
At some point, we should go through __switch_to() and coalesce the isbs
into fewer as we keep accumulating them (e.g. all those switch function
setting some sync variable if needed).
> +
> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
Is it too expensive to read the MPAM sysregs and avoid carrying around
another per-CPU state? You use it for pm restoring but we could just
save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
feels like this function got unnecessarily complicated (it took me a bit
to figure out what it all does).
A related question - is resctrl_arch_set_cdp_enabled() always called in
non-preemptible contexts? We potentially have a race between setting
current->mpam_partid_pmg and arm64_mpam_global_default, so the check in
mpam_thread_switch() can get confused.
And I couldn't figure out where the MPAMx_EL1 registers are written. If
any global/per-cpu/per-task value is changed, does the kernel wait until
the next thread switch to write the sysreg? The only places I can found
touching these sysregs are the thread switch, pm notifiers and KVM.
--
Catalin
2026-01-15 17:58 ` Catalin Marinas
@ 2026-01-19 12:23 ` Ben Horgan
2026-01-23 14:29 ` Catalin Marinas
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 12:23 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/15/26 17:58, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>> menu "ARMv8.5 architectural features"
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> new file mode 100644
>> index 000000000000..14011e5970ce
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -0,0 +1,67 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright (C) 2025 Arm Ltd. */
>> +
>> +#ifndef __ASM__MPAM_H
>> +#define __ASM__MPAM_H
>> +
>> +#include <linux/jump_label.h>
>> +#include <linux/percpu.h>
>> +#include <linux/sched.h>
>> +
>> +#include <asm/sysreg.h>
>> +
>> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
>> +DECLARE_PER_CPU(u64, arm64_mpam_default);
>> +DECLARE_PER_CPU(u64, arm64_mpam_current);
>> +
>> +/*
>> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
>> + * This is used by the context switch code to use the resctrl CPU property
>> + * instead. The value is modified when CDP is enabled/disabled by mounting
>> + * the resctrl filesystem.
>> + */
>> +extern u64 arm64_mpam_global_default;
>> +
>> +/*
>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>> + * which may race with reads in mpam_thread_switch(). Ensure only one of the old
>> + * or new values are used. Particular care should be taken with the pmg field as
>> + * mpam_thread_switch() may read a partid and pmg that don't match, causing this
>> + * value to be stored with cache allocations, despite being considered 'free' by
>> + * resctrl.
>> + */
>> +#ifdef CONFIG_ARM64_MPAM
>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>> +{
>> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> +}
>> +
>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>> +{
>> + u64 oldregval;
>> + int cpu = smp_processor_id();
>> + u64 regval = mpam_get_regval(tsk);
>> +
>> + if (!static_branch_likely(&mpam_enabled))
>> + return;
>> +
>> + if (regval == READ_ONCE(arm64_mpam_global_default))
>> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>> +
>> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> + if (oldregval == regval)
>> + return;
>> +
>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + isb();
>> +
>> + /* Synchronising the EL0 write is left until the ERET to EL0 */
>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>
> Since we have an isb() already, does it make any difference if we write
> MPAM0 before the barrier? Similar question for other places where we
> write these two registers.
The reason for the isb() placement is to document that it's not required
for the MPAM0_EL1. All instructions running at EL1 take their MPAM
configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
about in a different thread.
>
> At some point, we should go through __switch_to() and coalesce the isbs
> into fewer as we keep accumulating them (e.g. all those switch function
> setting some sync variable if needed).
>
>> +
>> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>
> Is it too expensive to read the MPAM sysregs and avoid carrying around
> another per-CPU state? You use it for pm restoring but we could just
> save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
> feels like this function got unnecessarily complicated (it took me a bit
> to figure out what it all does).
It's done this way as it matches what's done in x86. I expect it would
be cheap enough to read the register to check whether a write is required.
>
> A related question - is resctrl_arch_set_cdp_enabled() always called in
> non-preemptible contexts? We potentially have a race between setting
> current->mpam_partid_pmg and arm64_mpam_global_default, so the check in
> mpam_thread_switch() can get confused.
The resctrl filesystem can only ever get mounted once and
resctrl_arch_set_cdp_enabled() is always called in mount. In
rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
This is done while holding the rdtgroup_mutex and the user can't change
the mpam configuration from the default until the mutex is released and
it can claim it.
>
> And I couldn't figure out where the MPAMx_EL1 registers are written. If
> any global/per-cpu/per-task value is changed, does the kernel wait until
> the next thread switch to write the sysreg? The only places I can found
> touching these sysregs are the thread switch, pm notifiers and KVM.
>
If the task configuration is changed then the MPAM sysreg will only be
updated the next time it goes into the kernel. So, just eventually
consistent. For cpu configuration, update_closid_rmid() gets called
which ipis the cpus and ends up calling mpam_thread_switch() from
resctrl_arch_sched_in().
Thanks,
Ben
2026-01-19 12:23 ` Ben Horgan
@ 2026-01-23 14:29 ` Catalin Marinas
2026-01-26 14:30 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Catalin Marinas @ 2026-01-23 14:29 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On Mon, Jan 19, 2026 at 12:23:13PM +0000, Ben Horgan wrote:
> On 1/15/26 17:58, Catalin Marinas wrote:
> > On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
> >> +static inline void mpam_thread_switch(struct task_struct *tsk)
> >> +{
> >> + u64 oldregval;
> >> + int cpu = smp_processor_id();
> >> + u64 regval = mpam_get_regval(tsk);
> >> +
> >> + if (!static_branch_likely(&mpam_enabled))
> >> + return;
> >> +
> >> + if (regval == READ_ONCE(arm64_mpam_global_default))
> >> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> >> +
> >> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> >> + if (oldregval == regval)
> >> + return;
> >> +
> >> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> >> + isb();
> >> +
> >> + /* Synchronising the EL0 write is left until the ERET to EL0 */
> >> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> >
> > Since we have an isb() already, does it make any difference if we write
> > MPAM0 before the barrier? Similar question for other places where we
> > write these two registers.
>
> The reason for the isb() placement is to document that it's not required
> for the MPAM0_EL1. All instructions running at EL1 take their MPAM
> configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
> about in a different thread.
It's fine to keep it this way if LDTR/STTR are not affected by the MPAM0
register.
> >> +
> >> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> >
> > Is it too expensive to read the MPAM sysregs and avoid carrying around
> > another per-CPU state? You use it for pm restoring but we could just
> > save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
> > feels like this function got unnecessarily complicated (it took me a bit
> > to figure out what it all does).
>
> It's done this way as it matches what's done in x86. I expect it would
> be cheap enough to read the register to check whether a write is required.
Since you use it for CPU suspend/resume, I guess it's fine, it saves us
from having to preserve it in the low-level asm sleep code. I don't have
a strong preference either way.
> > A related question - is resctrl_arch_set_cdp_enabled() always called in
> > non-preemptible contexts? We potentially have a race between setting
> > current->mpam_partid_msg and arm64_mpam_global_default, so the check in
> > mpam_thread_switch() can get confused.
>
> The resctrl filesystem can only ever get mounted once and
> resctrl_arch_set_cdp_enabled() is always called in mount. In
> rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
> This is done while holding the rdtgroup_mutex and the user can't change
> the mpam configuration from the default until the mutex is released and
> it can claim it.
What if resctrl_arch_set_cdp_enabled() gets preempted between updating
current task partid and setting arm64_mpam_global_default? The mutex
doesn't help.
> > And I couldn't figure out where the MPAMx_EL1 registers are written. If
> > any global/per-cpu/per-task value is changed, does the kernel wait until
> > the next thread switch to write the sysreg? The only places I can found
> > touching these sysregs are the thread switch, pm notifiers and KVM.
>
> If the task configuration is changed then the MPAM sysreg will only be
> updated the next time it goes into the kernel.
Is it updated when it goes into the kernel or when we have a context
switch? If you have a long-running user thread that is never scheduled
out, it may not notice the update even if it entered the kernel (but no
context switch).
> So, just eventually
> consistent. For cpu configuration, update_closid_rmid() gets called
> which ipis the cpus and ends up calling mpam_thread_switch() from
> resctrl_arch_sched_in().
I see, it should be fine.
--
Catalin
2026-01-23 14:29 ` Catalin Marinas
@ 2026-01-26 14:30 ` Ben Horgan
2026-01-26 14:50 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-26 14:30 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/23/26 14:29, Catalin Marinas wrote:
> Hi Ben,
>
> On Mon, Jan 19, 2026 at 12:23:13PM +0000, Ben Horgan wrote:
>> On 1/15/26 17:58, Catalin Marinas wrote:
>>> On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>>>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>>>> +{
>>>> + u64 oldregval;
>>>> + int cpu = smp_processor_id();
>>>> + u64 regval = mpam_get_regval(tsk);
>>>> +
>>>> + if (!static_branch_likely(&mpam_enabled))
>>>> + return;
>>>> +
>>>> + if (regval == READ_ONCE(arm64_mpam_global_default))
>>>> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>>>> +
>>>> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>>>> + if (oldregval == regval)
>>>> + return;
>>>> +
>>>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>>>> + isb();
>>>> +
>>>> + /* Synchronising the EL0 write is left until the ERET to EL0 */
>>>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>>>
>>> Since we have an isb() already, does it make any difference if we write
>>> MPAM0 before the barrier? Similar question for other places where we
>>> write these two registers.
>>
>> The reason for the isb() placement is to document that it's not required
>> for the MPAM0_EL1. All instructions running at EL1 take their MPAM
>> configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
>> about in a different thread.
>
> It's fine to keep it this way if LDTR/STTR are not affected by the MPAM0
> register.
>
>>>> +
>>>> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>>>
>>> Is it too expensive to read the MPAM sysregs and avoid carrying around
>>> another per-CPU state? You use it for pm restoring but we could just
>>> save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
>>> feels like this function got unnecessarily complicated (it took me a bit
>>> to figure out what it all does).
>>
>> It's done this way as it matches what's done in x86. I expect it would
>> be cheap enough to read the register to check whether a write is required.
>
> Since you use it for CPU suspend/resume, I guess it's fine, it saves us
> from having to preserve it in the low-level asm sleep code. I don't have
> a strong preference either way.
>
>>> A related question - is resctrl_arch_set_cdp_enabled() always called in
>>> non-preemptible contexts? We potentially have a race between setting
>>> current->mpam_partid_pmg and arm64_mpam_global_default, so the check in
>>> mpam_thread_switch() can get confused.
>>
>> The resctrl filesystem can only ever get mounted once and
>> resctrl_arch_set_cdp_enabled() is always called in mount. In
>> rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
>> This is done while holding the rdtgroup_mutex and the user can't change
>> the mpam configuration from the default until the mutex is released and
>> it can claim it.
>
> What if resctrl_arch_set_cdp_enabled() gets preempted between updating
> current task partid and setting arm64_mpam_global_default? The mutex
> doesn't help.
I see, I had misinterpreted your question.
When looking into this I spotted that the per-cpu default,
arm64_mpam_default, is not updated on the cdp enable and so the default
tasks carry on running with the non-cdp default pmg/partid configuration.
On the race itself, if current->mpam_partid_pmg is set but not
arm64_mpam_global_default then if the current task is preempted at that
point there is a discrepancy. When the task gets context switched back
in then the comparison in mpam_thread_switch() will be false when true
would be expected.
In order to update arm64_mpam_default and make sure the mpam variables
and registers for the online cpus are in the right state by the end of
resctrl_arch_set_cdp_enabled() I'll add a per cpu call to
resctrl_arch_sync_cpu_closid_rmid(). I'll also add something in the
resume path so that arm64_mpam_default gets set correctly for the cpus
that were offline.
>
>>> And I couldn't figure out where the MPAMx_EL1 registers are written. If
>>> any global/per-cpu/per-task value is changed, does the kernel wait until
>>> the next thread switch to write the sysreg? The only places I can found
>>> touching these sysregs are the thread switch, pm notifiers and KVM.
>>
>> If the task configuration is changed then the MPAM sysreg will only be
>> updated the next time it goes into the kernel.
>
> Is it updated when it goes into the kernel or when we have a context
> switch? If you have a long-running user thread that is never scheduled
> out, it may not notice the update even if it entered the kernel (but no
> context switch).
Yes, only on context switch.
>
>> So, just eventually
>> consistent. For cpu configuration, update_closid_rmid() gets called
>> which ipis the cpus and ends up calling mpam_thread_switch() from
>> resctrl_arch_sched_in().
>
> I see, it should be fine.
>
Thanks,
Ben
2026-01-26 14:30 ` Ben Horgan
@ 2026-01-26 14:50 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-26 14:50 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On 1/26/26 14:30, Ben Horgan wrote:
> Hi Catalin,
>
> On 1/23/26 14:29, Catalin Marinas wrote:
>> Hi Ben,
>>
>> On Mon, Jan 19, 2026 at 12:23:13PM +0000, Ben Horgan wrote:
>>> On 1/15/26 17:58, Catalin Marinas wrote:
>>>> On Mon, Jan 12, 2026 at 04:58:33PM +0000, Ben Horgan wrote:
>>>>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>>>>> +{
>>>>> + u64 oldregval;
>>>>> + int cpu = smp_processor_id();
>>>>> + u64 regval = mpam_get_regval(tsk);
>>>>> +
>>>>> + if (!static_branch_likely(&mpam_enabled))
>>>>> + return;
>>>>> +
>>>>> + if (regval == READ_ONCE(arm64_mpam_global_default))
>>>>> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>>>>> +
>>>>> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>>>>> + if (oldregval == regval)
>>>>> + return;
>>>>> +
>>>>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>>>>> + isb();
>>>>> +
>>>>> + /* Synchronising the EL0 write is left until the ERET to EL0 */
>>>>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>>>>
>>>> Since we have an isb() already, does it make any difference if we write
>>>> MPAM0 before the barrier? Similar question for other places where we
>>>> write these two registers.
>>>
>>> The reason for the isb() placement is to document that it's not required
>>> for the MPAM0_EL1. All instructions running at EL1 take their MPAM
>>> configuration from MPAM1_EL1. This includes LDTR and STTR as you asked
>>> about in a different thread.
>>
>> It's fine to keep it this way if LDTR/STTR are not affected by the MPAM0
>> register.
>>
>>>>> +
>>>>> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>>>>
>>>> Is it too expensive to read the MPAM sysregs and avoid carrying around
>>>> another per-CPU state? You use it for pm restoring but we could just
>>>> save it in cpu_do_suspend() like other sysregs. Not a big issue, it just
>>>> feels like this function got unnecessarily complicated (it took me a bit
>>>> to figure out what it all does).
>>>
>>> It's done this way as it matches what's done in x86. I expect it would
>>> be cheap enough to read the register to check whether a write is required.
>>
>> Since you use it for CPU suspend/resume, I guess it's fine, it saves us
>> from having to preserve it in the low-level asm sleep code. I don't have
>> a strong preference either way.
>>
>>>> A related question - is resctrl_arch_set_cdp_enabled() always called in
>>>> non-preemptible contexts? We potentially have a race between setting
>>>> current->mpam_partid_pmg and arm64_mpam_global_default, so the check in
>>>> mpam_thread_switch() can get confused.
>>>
>>> The resctrl filesystem can only ever get mounted once and
>>> resctrl_arch_set_cdp_enabled() is always called in mount. In
>>> rdt_get_tree(), rdt_enable_ctx() calls resctrl_arch_set_cdp_enabled().
>>> This is done while holding the rdtgroup_mutex and the user can't change
>>> the mpam configuration from the default until the mutex is released and
>>> it can claim it.
>>
>> What if resctrl_arch_set_cdp_enabled() gets preempted between updating
>> current task partid and setting arm64_mpam_global_default? The mutex
>> doesn't help.
>
> I see, I had misinterpreted your question.
>
> When looking into this I spotted that the per-cpu default,
> arm64_mpam_default, is not updated on the cdp enable and so the default
> tasks carry on running with the non-cdp default pmg/partid configuration.
>
> On the race itself, if current->mpam_partid_pmg is set but not
> arm64_mpam_global_default then if the current task is preempted at that
> point there is a discrepancy. When the task gets context switched back
> in then the comparison in mpam_thread_switch() will be false when true
> would be expected.
>
> In order to update arm64_mpam_default and make sure the mpam variables
> and registers for the online cpus are in the right state by the end of
> resctrl_arch_set_cdp_enabled() I'll add a per cpu call to
> resctrl_arch_sync_cpu_closid_rmid(). I'll also add something in the
> resume path so that arm64_mpam_default gets set correctly for the cpus
> that were offline.
Rather than the resume path I'll just set all the arm64_mpam_default in
resctrl_arch_set_cdp_enabled(). In code:
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -193,6 +193,7 @@ static void resctrl_reset_task_closids(void)
 int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
 {
 	u32 partid_i = RESCTRL_RESERVED_CLOSID, partid_d = RESCTRL_RESERVED_CLOSID;
+	int cpu;
 
 	cdp_enabled = enable;
 
@@ -209,6 +210,10 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
 	resctrl_reset_task_closids();
 
+	for_each_possible_cpu(cpu)
+		mpam_set_cpu_defaults(cpu, partid_d, partid_i, 0, 0);
+	on_each_cpu(resctrl_arch_sync_cpu_closid_rmid, NULL, 1);
+
 	return 0;
 }
>
>>
>>>> And I couldn't figure out where the MPAMx_EL1 registers are written. If
>>>> any global/per-cpu/per-task value is changed, does the kernel wait until
>>>> the next thread switch to write the sysreg? The only places I could find
>>>> touching these sysregs are the thread switch, pm notifiers and KVM.
>>>
>>> If the task configuration is changed then the MPAM sysreg will only be
>>> updated the next time it goes into the kernel.
>>
>> Is it updated when it goes into the kernel or when we have a context
>> switch? If you have a long-running user thread that is never scheduled
>> out, it may not notice the update even if it entered the kernel (but no
>> context switch).
>
> Yes, only on context switch.
>
>>
>>> So, just eventually
>>> consistent. For cpu configuration, update_closid_rmid() gets called
>>> which ipis the cpus and ends up calling mpam_thread_switch() from
>>> resctrl_arch_sched_in().
>>
>> I see, it should be fine.
>>
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
* [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (5 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 06/47] arm64: mpam: Context switch the MPAM registers Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 6:50 ` Gavin Shan
2026-01-15 18:14 ` Catalin Marinas
2026-01-12 16:58 ` [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver Ben Horgan
` (44 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
Now that the MPAM system registers are expected to have values that change,
reprogram them based on the previous value when a CPU is brought online.
Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
kernel-space as this is the PARTID that hardware guarantees to
reset. Because there are a limited number of PARTID, this value is exposed
to user-space, meaning resctrl changes to the resctrl default group would
also affect kernel threads. Instead, use the task's PARTID value for
kernel work on behalf of user-space too. The default of 0 is kept for both
user-space and kernel-space when MPAM is not enabled.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
CONFIG_MPAM -> CONFIG_ARM64_MPAM
Check mpam_enabled
Comment about relying on ERET for synchronisation
Update commit message
---
arch/arm64/kernel/cpufeature.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c840a93b9ef9..0cdfb3728f43 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -86,6 +86,7 @@
#include <asm/kvm_host.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
+#include <asm/mpam.h>
#include <asm/mte.h>
#include <asm/hypervisor.h>
#include <asm/processor.h>
@@ -2483,13 +2484,17 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
static void
cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
{
- /*
- * Access by the kernel (at EL1) should use the reserved PARTID
- * which is configured unrestricted. This avoids priority-inversion
- * where latency sensitive tasks have to wait for a task that has
- * been throttled to release the lock.
- */
- write_sysreg_s(0, SYS_MPAM1_EL1);
+ int cpu = smp_processor_id();
+ u64 regval = 0;
+
+ if (IS_ENABLED(CONFIG_ARM64_MPAM) && static_branch_likely(&mpam_enabled))
+ regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+
+ write_sysreg_s(regval, SYS_MPAM1_EL1);
+ isb();
+
+ /* Synchronising the EL0 write is left until the ERET to EL0 */
+ write_sysreg_s(regval, SYS_MPAM0_EL1);
}
static bool
--
2.43.0
* Re: [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2026-01-12 16:58 ` [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online Ben Horgan
@ 2026-01-15 6:50 ` Gavin Shan
2026-01-15 18:14 ` Catalin Marinas
1 sibling, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-15 6:50 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> Now that the MPAM system registers are expected to have values that change,
> reprogram them based on the previous value when a CPU is brought online.
>
> Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
> kernel-space as this is the PARTID that hardware guarantees to
> reset. Because there are a limited number of PARTID, this value is exposed
> to user-space, meaning resctrl changes to the resctrl default group would
> also affect kernel threads. Instead, use the task's PARTID value for
> kernel work on behalf of user-space too. The default of 0 is kept for both
> user-space and kernel-space when MPAM is not enabled.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> CONFIG_MPAM -> CONFIG_ARM64_MPAM
> Check mpam_enabled
> Comment about relying on ERET for synchronisation
> Update commit message
> ---
> arch/arm64/kernel/cpufeature.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
* Re: [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2026-01-12 16:58 ` [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online Ben Horgan
2026-01-15 6:50 ` Gavin Shan
@ 2026-01-15 18:14 ` Catalin Marinas
2026-01-19 13:38 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 18:14 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:34PM +0000, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> Now that the MPAM system registers are expected to have values that change,
> reprogram them based on the previous value when a CPU is brought online.
>
> Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
> kernel-space as this is the PARTID that hardware guarantees to
> reset. Because there are a limited number of PARTID, this value is exposed
> to user-space, meaning resctrl changes to the resctrl default group would
> also affect kernel threads. Instead, use the task's PARTID value for
> kernel work on behalf of user-space too. The default of 0 is kept for both
> user-space and kernel-space when MPAM is not enabled.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> CONFIG_MPAM -> CONFIG_ARM64_MPAM
> Check mpam_enabled
> Comment about relying on ERET for synchronisation
> Update commit message
> ---
> arch/arm64/kernel/cpufeature.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index c840a93b9ef9..0cdfb3728f43 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -86,6 +86,7 @@
> #include <asm/kvm_host.h>
> #include <asm/mmu.h>
> #include <asm/mmu_context.h>
> +#include <asm/mpam.h>
> #include <asm/mte.h>
> #include <asm/hypervisor.h>
> #include <asm/processor.h>
> @@ -2483,13 +2484,17 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
> static void
> cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
> {
> - /*
> - * Access by the kernel (at EL1) should use the reserved PARTID
> - * which is configured unrestricted. This avoids priority-inversion
> - * where latency sensitive tasks have to wait for a task that has
> - * been throttled to release the lock.
> - */
> - write_sysreg_s(0, SYS_MPAM1_EL1);
Is this comment about priority inversion no longer valid? I see thread
switching sets the same value for both MPAM0 and MPAM1 registers but I
couldn't find an explanation why this is now better when it wasn't
before.
MPAM1 will also be inherited by IRQ handlers AFAICT.
> + int cpu = smp_processor_id();
> + u64 regval = 0;
> +
> + if (IS_ENABLED(CONFIG_ARM64_MPAM) && static_branch_likely(&mpam_enabled))
> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> +
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + /* Synchronising the EL0 write is left until the ERET to EL0 */
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
I mentioned before, is it worth waiting until ERET?
Related to this, do LDTR/STTR use MPAM0 or MPAM1? I couldn't figure out
from the Arm ARM. If they use MPAM0, then we need the ISB early for the
uaccess routines, at least in the thread switching path (an earlier
patch).
--
Catalin
* Re: [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2026-01-15 18:14 ` Catalin Marinas
@ 2026-01-19 13:38 ` Ben Horgan
2026-01-19 14:22 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 13:38 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/15/26 18:14, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:34PM +0000, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Now that the MPAM system registers are expected to have values that change,
>> reprogram them based on the previous value when a CPU is brought online.
>>
>> Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
>> kernel-space as this is the PARTID that hardware guarantees to
>> reset. Because there are a limited number of PARTID, this value is exposed
>> to user-space, meaning resctrl changes to the resctrl default group would
>> also affect kernel threads. Instead, use the task's PARTID value for
>> kernel work on behalf of user-space too. The default of 0 is kept for both
>> user-space and kernel-space when MPAM is not enabled.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since rfc:
>> CONFIG_MPAM -> CONFIG_ARM64_MPAM
>> Check mpam_enabled
>> Comment about relying on ERET for synchronisation
>> Update commit message
>> ---
>> arch/arm64/kernel/cpufeature.c | 19 ++++++++++++-------
>> 1 file changed, 12 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index c840a93b9ef9..0cdfb3728f43 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -86,6 +86,7 @@
>> #include <asm/kvm_host.h>
>> #include <asm/mmu.h>
>> #include <asm/mmu_context.h>
>> +#include <asm/mpam.h>
>> #include <asm/mte.h>
>> #include <asm/hypervisor.h>
>> #include <asm/processor.h>
>> @@ -2483,13 +2484,17 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
>> static void
>> cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
>> {
>> - /*
>> - * Access by the kernel (at EL1) should use the reserved PARTID
>> - * which is configured unrestricted. This avoids priority-inversion
>> - * where latency sensitive tasks have to wait for a task that has
>> - * been throttled to release the lock.
>> - */
>> - write_sysreg_s(0, SYS_MPAM1_EL1);
>
> Is this comment about priority inversion no longer valid?
Yes, will drop it.
> I see thread
> switching sets the same value for both MPAM0 and MPAM1 registers but I
> couldn't find an explanation why this is now better when it wasn't
> before.
I touch on it in the cover letter. It is the way it is done for x86 and
so sensible to make it the default. All partids are usable from
user-space and user-space can't bypass MPAM controls by doing the work
in the kernel.
There is a proposal from Babu at AMD, PLZA, which he presented at LPC
which would give a new interface to have different configuration,
closid, for userspace and kernel space. We should be able to use this
with MPAM too.
>
> MPAM1 will also be inherited by IRQ handlers AFAICT.
Yes, this is a disadvantage of having MPAM1 and MPAM0 change together.
>
>> + int cpu = smp_processor_id();
>> + u64 regval = 0;
>> +
>> + if (IS_ENABLED(CONFIG_ARM64_MPAM) && static_branch_likely(&mpam_enabled))
>> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> +
>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + isb();
>> +
>> + /* Synchronising the EL0 write is left until the ERET to EL0 */
>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>
> I mentioned before, is it worth waiting until ERET?
Just for documentation. I can change it if you prefer.
>
> Related to this, do LDTR/STTR use MPAM0 or MPAM1? I couldn't figure out
> from the Arm ARM. If they use MPAM0, then we need the ISB early for the
> uaccess routines, at least in the thread switching path (an earlier
> patch).
>
They use MPAM1_EL1; MPAM doesn't care which instruction is running, only
the Exception level it runs at.
Thanks,
Ben
* Re: [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2026-01-19 13:38 ` Ben Horgan
@ 2026-01-19 14:22 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 14:22 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/19/26 13:38, Ben Horgan wrote:
> Hi Catalin,
>
> On 1/15/26 18:14, Catalin Marinas wrote:
>> On Mon, Jan 12, 2026 at 04:58:34PM +0000, Ben Horgan wrote:
>>> From: James Morse <james.morse@arm.com>
>>>
>>> Now that the MPAM system registers are expected to have values that change,
>>> reprogram them based on the previous value when a CPU is brought online.
>>>
>>> Previously MPAM's 'default PARTID' of 0 was always used for MPAM in
>>> kernel-space as this is the PARTID that hardware guarantees to
>>> reset. Because there are a limited number of PARTID, this value is exposed
>>> to user-space, meaning resctrl changes to the resctrl default group would
>>> also affect kernel threads. Instead, use the task's PARTID value for
>>> kernel work on behalf of user-space too. The default of 0 is kept for both
>>> user-space and kernel-space when MPAM is not enabled.
>>>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>> ---
>>> Changes since rfc:
>>> CONFIG_MPAM -> CONFIG_ARM64_MPAM
>>> Check mpam_enabled
>>> Comment about relying on ERET for synchronisation
>>> Update commit message
>>> ---
>>> arch/arm64/kernel/cpufeature.c | 19 ++++++++++++-------
>>> 1 file changed, 12 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>>> index c840a93b9ef9..0cdfb3728f43 100644
>>> --- a/arch/arm64/kernel/cpufeature.c
>>> +++ b/arch/arm64/kernel/cpufeature.c
>>> @@ -86,6 +86,7 @@
>>> #include <asm/kvm_host.h>
>>> #include <asm/mmu.h>
>>> #include <asm/mmu_context.h>
>>> +#include <asm/mpam.h>
>>> #include <asm/mte.h>
>>> #include <asm/hypervisor.h>
>>> #include <asm/processor.h>
>>> @@ -2483,13 +2484,17 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
>>> static void
>>> cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
>>> {
>>> - /*
>>> - * Access by the kernel (at EL1) should use the reserved PARTID
>>> - * which is configured unrestricted. This avoids priority-inversion
>>> - * where latency sensitive tasks have to wait for a task that has
>>> - * been throttled to release the lock.
>>> - */
>>> - write_sysreg_s(0, SYS_MPAM1_EL1);
>>
>> Is this comment about priority inversion no longer valid?
>
> Yes, will drop it.
Ah, that's done in the patch already. Also, we can't treat 0 as special
without taking it away from userspace, and then we're down a partid and
there might not be that many.
>
> I see thread
>> switching sets the same value for both MPAM0 and MPAM1 registers but I
>> couldn't find an explanation why this is now better when it wasn't
>> before.
>
> I touch on it in the cover letter. It is the way it is done for x86 and
> so sensible to make it the default. All partids are usable from
> user-space and user-space can't bypass MPAM controls by doing the work
> in the kernel.
>
> There is a proposal from Babu at AMD, PLZA, which he presented at LPC
> which would give a new interface to have different configuration,
> closid, for userspace and kernel space. We should be able to use this
> with MPAM too.
>
>>
>> MPAM1 will also be inherited by IRQ handlers AFAICT.
>
> Yes, this is a disadvantage of having MPAM1 and MPAM0 change together.
>
>>
>>> + int cpu = smp_processor_id();
>>> + u64 regval = 0;
>>> +
>>> + if (IS_ENABLED(CONFIG_ARM64_MPAM) && static_branch_likely(&mpam_enabled))
>>> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>>> +
>>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>>> + isb();
>>> +
>>> + /* Synchronising the EL0 write is left until the ERET to EL0 */
>>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>>
>> I mentioned before, is it worth waiting until ERET?
>
> Just for documentation. I can change it if you prefer.
>
>>
>> Related to this, do LDTR/STTR use MPAM0 or MPAM1? I couldn't figure out
>> from the Arm ARM. If they use MPAM0, then we need the ISB early for the
>> uaccess routines, at least in the thread switching path (an earlier
>> patch).
>>
>
> They use LDTR/STTR. MPAM doesn't care about which instruction is running.
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
* [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (6 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 07/47] arm64: mpam: Re-initialise MPAM regs when CPU comes online Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 18:16 ` Catalin Marinas
2026-01-19 6:37 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs Ben Horgan
` (43 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
Requestors need to populate the MPAM fields for any traffic they send on
the interconnect. For the CPUs these values are taken from the
corresponding MPAMy_ELx register. Each requestor may have a limit on the
largest PARTID or PMG value that can be used. The MPAM driver has to
determine the system-wide minimum supported PARTID and PMG values.
To do this, the driver needs to be told what each requestor's limit is.
CPUs are special, but this infrastructure is also needed for the SMMU and
GIC ITS. Call the helper to tell the MPAM driver what the CPUs can do.
The return value can be ignored by the arch code as it runs well before the
MPAM driver starts probing.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
arch/arm64/kernel/mpam.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
index 9866d2ca0faa..e6feff2324ac 100644
--- a/arch/arm64/kernel/mpam.c
+++ b/arch/arm64/kernel/mpam.c
@@ -3,6 +3,7 @@
#include <asm/mpam.h>
+#include <linux/arm_mpam.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
@@ -11,3 +12,14 @@ DEFINE_PER_CPU(u64, arm64_mpam_default);
DEFINE_PER_CPU(u64, arm64_mpam_current);
u64 arm64_mpam_global_default;
+
+static int __init arm64_mpam_register_cpus(void)
+{
+ u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
+ u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
+ u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
+
+ return mpam_register_requestor(partid_max, pmg_max);
+}
+/* Must occur before mpam_msc_driver_init() from subsys_initcall() */
+arch_initcall(arm64_mpam_register_cpus)
--
2.43.0
* Re: [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver
2026-01-12 16:58 ` [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver Ben Horgan
@ 2026-01-15 18:16 ` Catalin Marinas
2026-01-19 6:37 ` Gavin Shan
1 sibling, 0 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 18:16 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:35PM +0000, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> Requestors need to populate the MPAM fields for any traffic they send on
> the interconnect. For the CPUs these values are taken from the
> corresponding MPAMy_ELx register. Each requestor may have a limit on the
> largest PARTID or PMG value that can be used. The MPAM driver has to
> determine the system-wide minimum supported PARTID and PMG values.
>
> To do this, the driver needs to be told what each requestor's limit is.
>
> CPUs are special, but this infrastructure is also needed for the SMMU and
> GIC ITS. Call the helper to tell the MPAM driver what the CPUs can do.
>
> The return value can be ignored by the arch code as it runs well before the
> MPAM driver starts probing.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
* Re: [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver
2026-01-12 16:58 ` [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver Ben Horgan
2026-01-15 18:16 ` Catalin Marinas
@ 2026-01-19 6:37 ` Gavin Shan
2026-01-19 14:49 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 6:37 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> Requestors need to populate the MPAM fields for any traffic they send on
> the interconnect. For the CPUs these values are taken from the
> corresponding MPAMy_ELx register. Each requestor may have a limit on the
> largest PARTID or PMG value that can be used. The MPAM driver has to
> determine the system-wide minimum supported PARTID and PMG values.
>
> To do this, the driver needs to be told what each requestor's limit is.
>
> CPUs are special, but this infrastructure is also needed for the SMMU and
> GIC ITS. Call the helper to tell the MPAM driver what the CPUs can do.
>
> The return value can be ignored by the arch code as it runs well before the
> MPAM driver starts probing.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> arch/arm64/kernel/mpam.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> index 9866d2ca0faa..e6feff2324ac 100644
> --- a/arch/arm64/kernel/mpam.c
> +++ b/arch/arm64/kernel/mpam.c
> @@ -3,6 +3,7 @@
>
> #include <asm/mpam.h>
>
> +#include <linux/arm_mpam.h>
> #include <linux/jump_label.h>
> #include <linux/percpu.h>
>
> @@ -11,3 +12,14 @@ DEFINE_PER_CPU(u64, arm64_mpam_default);
> DEFINE_PER_CPU(u64, arm64_mpam_current);
>
> u64 arm64_mpam_global_default;
> +
> +static int __init arm64_mpam_register_cpus(void)
> +{
> + u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
> + u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
> + u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
> +
> + return mpam_register_requestor(partid_max, pmg_max);
mpam_register_requestor() isn't exposed until CONFIG_ARM64_MPAM_DRIVER is set.
CONFIG_ARM64_MPAM_DRIVER and CONFIG_ARM64_MPAM can be different until PATCH[39/47]
is applied. So we need PATCH[39/47] to be applied prior to this patch so that
mpam_register_requestor() always exists and is exposed.
> +}
> +/* Must occur before mpam_msc_driver_init() from subsys_initcall() */
> +arch_initcall(arm64_mpam_register_cpus)
Thanks,
Gavin
* Re: [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver
2026-01-19 6:37 ` Gavin Shan
@ 2026-01-19 14:49 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 14:49 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 06:37, Gavin Shan wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Requestors need to populate the MPAM fields for any traffic they send on
>> the interconnect. For the CPUs these values are taken from the
>> corresponding MPAMy_ELx register. Each requestor may have a limit on the
>> largest PARTID or PMG value that can be used. The MPAM driver has to
>> determine the system-wide minimum supported PARTID and PMG values.
>>
>> To do this, the driver needs to be told what each requestor's limit is.
>>
>> CPUs are special, but this infrastructure is also needed for the SMMU and
>> GIC ITS. Call the helper to tell the MPAM driver what the CPUs can do.
>>
>> The return value can be ignored by the arch code as it runs well
>> before the
>> MPAM driver starts probing.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> arch/arm64/kernel/mpam.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>> index 9866d2ca0faa..e6feff2324ac 100644
>> --- a/arch/arm64/kernel/mpam.c
>> +++ b/arch/arm64/kernel/mpam.c
>> @@ -3,6 +3,7 @@
>> #include <asm/mpam.h>
>> +#include <linux/arm_mpam.h>
>> #include <linux/jump_label.h>
>> #include <linux/percpu.h>
>> @@ -11,3 +12,14 @@ DEFINE_PER_CPU(u64, arm64_mpam_default);
>> DEFINE_PER_CPU(u64, arm64_mpam_current);
>> u64 arm64_mpam_global_default;
>> +
>> +static int __init arm64_mpam_register_cpus(void)
>> +{
>> + u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
>> + u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
>> + u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
>> +
>> + return mpam_register_requestor(partid_max, pmg_max);
>
> mpam_register_requestor() isn't exposed until CONFIG_ARM64_MPAM_DRIVER is set.
> CONFIG_ARM64_MPAM_DRIVER and CONFIG_ARM64_MPAM can be different until
> PATCH[39/47] is applied. So we need PATCH[39/47] to be applied prior to
> this patch so that mpam_register_requestor() always exists and is exposed.
I've split out the part of PATCH[39/47] that removes the CONFIG_EXPERT
restriction and put it before this patch. With that CONFIG_ARM64_MPAM
will unconditionally select CONFIG_ARM64_MPAM_DRIVER.
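
For illustration, the split Ben describes might look something like this
minimal Kconfig sketch. The option names are taken from the thread; the
exact Kconfig text, prompt, and dependencies are assumptions:

```kconfig
# Hypothetical sketch only: with the CONFIG_EXPERT restriction dropped,
# the arch option can select the driver unconditionally, so
# mpam_register_requestor() is always built when the arch code calls it.
config ARM64_MPAM
	bool "Enable support for MPAM"
	select ARM64_MPAM_DRIVER
```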
>
>> +}
>> +/* Must occur before mpam_msc_driver_init() from subsys_initcall() */
>> +arch_initcall(arm64_mpam_register_cpus)
>
> Thanks,
> Gavin
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (7 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 08/47] arm64: mpam: Advertise the CPUs MPAM limits to the driver Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 18:20 ` Catalin Marinas
` (2 more replies)
2026-01-12 16:58 ` [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register Ben Horgan
` (42 subsequent siblings)
51 siblings, 3 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
The MPAM system registers will be lost if the CPU is reset during PSCI's
CPU_SUSPEND.
Add a PM notifier to restore them.
mpam_thread_switch(current) can't be used as this won't make any changes if
the in-memory copy says the register already has the correct value. In
reality the system register is UNKNOWN out of reset.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
arch/arm64/kernel/mpam.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
index e6feff2324ac..dbe0a2d05abb 100644
--- a/arch/arm64/kernel/mpam.c
+++ b/arch/arm64/kernel/mpam.c
@@ -4,6 +4,7 @@
#include <asm/mpam.h>
#include <linux/arm_mpam.h>
+#include <linux/cpu_pm.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
@@ -13,12 +14,41 @@ DEFINE_PER_CPU(u64, arm64_mpam_current);
u64 arm64_mpam_global_default;
+static int mpam_pm_notifier(struct notifier_block *self,
+ unsigned long cmd, void *v)
+{
+ u64 regval;
+ int cpu = smp_processor_id();
+
+ switch (cmd) {
+ case CPU_PM_EXIT:
+ /*
+ * Don't use mpam_thread_switch() as the system register
+ * value has changed under our feet.
+ */
+ regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+ write_sysreg_s(regval, SYS_MPAM1_EL1);
+ isb();
+
+ write_sysreg_s(regval, SYS_MPAM0_EL1);
+
+ return NOTIFY_OK;
+ default:
+ return NOTIFY_DONE;
+ }
+}
+
+static struct notifier_block mpam_pm_nb = {
+ .notifier_call = mpam_pm_notifier,
+};
+
static int __init arm64_mpam_register_cpus(void)
{
u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
+ cpu_pm_register_notifier(&mpam_pm_nb);
return mpam_register_requestor(partid_max, pmg_max);
}
/* Must occur before mpam_msc_driver_init() from subsys_initcall() */
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* Re: [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2026-01-12 16:58 ` [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs Ben Horgan
@ 2026-01-15 18:20 ` Catalin Marinas
2026-01-19 6:40 ` Gavin Shan
2026-01-19 6:50 ` Gavin Shan
2 siblings, 0 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 18:20 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:36PM +0000, Ben Horgan wrote:
> +static int mpam_pm_notifier(struct notifier_block *self,
> + unsigned long cmd, void *v)
> +{
> + u64 regval;
> + int cpu = smp_processor_id();
> +
> + switch (cmd) {
> + case CPU_PM_EXIT:
> + /*
> + * Don't use mpam_thread_switch() as the system register
> + * value has changed under our feet.
> + */
> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> + return NOTIFY_OK;
> + default:
> + return NOTIFY_DONE;
> + }
> +}
This looks fine unless we decide to save/restore them in the low-level
suspend/resume functions.
--
Catalin
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2026-01-12 16:58 ` [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs Ben Horgan
2026-01-15 18:20 ` Catalin Marinas
@ 2026-01-19 6:40 ` Gavin Shan
2026-01-19 6:50 ` Gavin Shan
2 siblings, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 6:40 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> The MPAM system registers will be lost if the CPU is reset during PSCI's
> CPU_SUSPEND.
>
> Add a PM notifier to restore them.
>
> mpam_thread_switch(current) can't be used as this won't make any changes if
> the in-memory copy says the register already has the correct value. In
> reality the system register is UNKNOWN out of reset.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> arch/arm64/kernel/mpam.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
Reviewed-by: Gavin Shan <gshan@redhat.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2026-01-12 16:58 ` [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs Ben Horgan
2026-01-15 18:20 ` Catalin Marinas
2026-01-19 6:40 ` Gavin Shan
@ 2026-01-19 6:50 ` Gavin Shan
2026-01-19 15:08 ` Ben Horgan
2 siblings, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 6:50 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> The MPAM system registers will be lost if the CPU is reset during PSCI's
> CPU_SUSPEND.
>
> Add a PM notifier to restore them.
>
> mpam_thread_switch(current) can't be used as this won't make any changes if
> the in-memory copy says the register already has the correct value. In
> reality the system register is UNKNOWN out of reset.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> arch/arm64/kernel/mpam.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
One question below...
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> index e6feff2324ac..dbe0a2d05abb 100644
> --- a/arch/arm64/kernel/mpam.c
> +++ b/arch/arm64/kernel/mpam.c
> @@ -4,6 +4,7 @@
> #include <asm/mpam.h>
>
> #include <linux/arm_mpam.h>
> +#include <linux/cpu_pm.h>
> #include <linux/jump_label.h>
> #include <linux/percpu.h>
>
> @@ -13,12 +14,41 @@ DEFINE_PER_CPU(u64, arm64_mpam_current);
>
> u64 arm64_mpam_global_default;
>
> +static int mpam_pm_notifier(struct notifier_block *self,
> + unsigned long cmd, void *v)
> +{
> + u64 regval;
> + int cpu = smp_processor_id();
> +
> + switch (cmd) {
> + case CPU_PM_EXIT:
> + /*
> + * Don't use mpam_thread_switch() as the system register
> + * value has changed under our feet.
> + */
> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> + return NOTIFY_OK;
> + default:
> + return NOTIFY_DONE;
> + }
> +}
> +
> +static struct notifier_block mpam_pm_nb = {
> + .notifier_call = mpam_pm_notifier,
> +};
> +
> static int __init arm64_mpam_register_cpus(void)
> {
> u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
> u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
> u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
>
> + cpu_pm_register_notifier(&mpam_pm_nb);
Need we ensure the MPAM capability exists in the hardware before the notifier is registered?
Otherwise, mpam_pm_notifier() can access the SYS_MPAM0_EL1 and SYS_MPAM1_EL1 system registers,
which may not be supported by the hardware.
> return mpam_register_requestor(partid_max, pmg_max);
> }
> /* Must occur before mpam_msc_driver_init() from subsys_initcall() */
Thanks,
Gavin
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2026-01-19 6:50 ` Gavin Shan
@ 2026-01-19 15:08 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 15:08 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 06:50, Gavin Shan wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> The MPAM system registers will be lost if the CPU is reset during PSCI's
>> CPU_SUSPEND.
>>
>> Add a PM notifier to restore them.
>>
>> mpam_thread_switch(current) can't be used as this won't make any
>> changes if
>> the in-memory copy says the register already has the correct value. In
>> reality the system register is UNKNOWN out of reset.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> arch/arm64/kernel/mpam.c | 30 ++++++++++++++++++++++++++++++
>> 1 file changed, 30 insertions(+)
>>
>
> One question below...
>
>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>> index e6feff2324ac..dbe0a2d05abb 100644
>> --- a/arch/arm64/kernel/mpam.c
>> +++ b/arch/arm64/kernel/mpam.c
>> @@ -4,6 +4,7 @@
>> #include <asm/mpam.h>
>> #include <linux/arm_mpam.h>
>> +#include <linux/cpu_pm.h>
>> #include <linux/jump_label.h>
>> #include <linux/percpu.h>
>> @@ -13,12 +14,41 @@ DEFINE_PER_CPU(u64, arm64_mpam_current);
>> u64 arm64_mpam_global_default;
>> +static int mpam_pm_notifier(struct notifier_block *self,
>> + unsigned long cmd, void *v)
>> +{
>> + u64 regval;
>> + int cpu = smp_processor_id();
>> +
>> + switch (cmd) {
>> + case CPU_PM_EXIT:
>> + /*
>> + * Don't use mpam_thread_switch() as the system register
>> + * value has changed under our feet.
>> + */
>> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + isb();
>> +
>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>> +
>> + return NOTIFY_OK;
>> + default:
>> + return NOTIFY_DONE;
>> + }
>> +}
>> +
>> +static struct notifier_block mpam_pm_nb = {
>> + .notifier_call = mpam_pm_notifier,
>> +};
>> +
>> static int __init arm64_mpam_register_cpus(void)
>> {
>> u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
>> u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
>> u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
>> + cpu_pm_register_notifier(&mpam_pm_nb);
>
> Need we ensure the MPAM capability exists in the hardware before the
> notifier is registered?
> Otherwise, mpam_pm_notifier() can access the SYS_MPAM0_EL1 and
> SYS_MPAM1_EL1 system registers,
> which may not be supported by the hardware.
Ah yes, I'll add a system_supports_mpam() check here.
>
>> return mpam_register_requestor(partid_max, pmg_max);
>> }
>> /* Must occur before mpam_msc_driver_init() from subsys_initcall() */
>
> Thanks,
> Gavin
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (8 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 09/47] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 19:08 ` Catalin Marinas
2026-01-19 6:51 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values Ben Horgan
` (41 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
The MPAMSM_EL1 register sets the MPAM labels, PMG and PARTID, for loads and stores
generated by a shared SMCU. Disable the traps so the kernel can use it and
set it to the same configuration as the per-EL cpu MPAM configuration.
If an SMCU is not shared with other cpus then it is implementation
defined whether the configuration from MPAMSM_EL1 is used or that from
the appropriate MPAMy_ELx. As we set the same PMG_D and PARTID_D
configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1, the resulting
configuration is the same regardless.
The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is
not currently specified in the Arm Architecture Reference Manual, but the
architect has confirmed that it is intended to be the same as that for the
cpu configuration in the MPAMy_ELx registers.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Mention PMG_D and PARTID_D specifically in the commit message
Add paragraph in commit message on range of MPAMSM_EL1 fields
---
arch/arm64/include/asm/el2_setup.h | 3 ++-
arch/arm64/include/asm/mpam.h | 2 ++
arch/arm64/kernel/cpufeature.c | 2 ++
arch/arm64/kernel/mpam.c | 3 +++
4 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index cacd20df1786..d37984c09799 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -504,7 +504,8 @@
check_override id_aa64pfr0, ID_AA64PFR0_EL1_MPAM_SHIFT, .Linit_mpam_\@, .Lskip_mpam_\@, x1, x2
.Linit_mpam_\@:
- msr_s SYS_MPAM2_EL2, xzr // use the default partition
+ mov x0, #MPAM2_EL2_EnMPAMSM_MASK
+ msr_s SYS_MPAM2_EL2, x0 // use the default partition,
// and disable lower traps
mrs_s x0, SYS_MPAMIDR_EL1
tbz x0, #MPAMIDR_EL1_HAS_HCR_SHIFT, .Lskip_mpam_\@ // skip if no MPAMHCR reg
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
index 14011e5970ce..7b3d3abad162 100644
--- a/arch/arm64/include/asm/mpam.h
+++ b/arch/arm64/include/asm/mpam.h
@@ -53,6 +53,8 @@ static inline void mpam_thread_switch(struct task_struct *tsk)
return;
write_sysreg_s(regval, SYS_MPAM1_EL1);
+ if (system_supports_sme())
+ write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
isb();
/* Synchronising the EL0 write is left until the ERET to EL0 */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 0cdfb3728f43..2ede543b3eeb 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2491,6 +2491,8 @@ cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
write_sysreg_s(regval, SYS_MPAM1_EL1);
+ if (system_supports_sme())
+ write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
isb();
/* Synchronising the EL0 write is left until the ERET to EL0 */
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
index dbe0a2d05abb..6ce4a36469ce 100644
--- a/arch/arm64/kernel/mpam.c
+++ b/arch/arm64/kernel/mpam.c
@@ -28,6 +28,9 @@ static int mpam_pm_notifier(struct notifier_block *self,
*/
regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
write_sysreg_s(regval, SYS_MPAM1_EL1);
+ if (system_supports_sme())
+ write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D),
+ SYS_MPAMSM_EL1);
isb();
write_sysreg_s(regval, SYS_MPAM0_EL1);
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* Re: [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
2026-01-12 16:58 ` [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register Ben Horgan
@ 2026-01-15 19:08 ` Catalin Marinas
2026-01-19 13:40 ` Ben Horgan
2026-01-19 6:51 ` Gavin Shan
1 sibling, 1 reply; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 19:08 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:37PM +0000, Ben Horgan wrote:
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 0cdfb3728f43..2ede543b3eeb 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2491,6 +2491,8 @@ cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
> regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>
> write_sysreg_s(regval, SYS_MPAM1_EL1);
> + if (system_supports_sme())
> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
> isb();
Do we know for sure that system_supports_sme() returns true at this
point (if SME supported)? Digging into the code, system_supports_sme()
uses alternative_has_cap_unlikely() which relies on instruction
patching. setup_system_capabilities(), IIUC, patches the alternatives
after enable_cpu_capabilities(). I think you better use cpus_have_cap().
--
Catalin
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
2026-01-15 19:08 ` Catalin Marinas
@ 2026-01-19 13:40 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 13:40 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/15/26 19:08, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:37PM +0000, Ben Horgan wrote:
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 0cdfb3728f43..2ede543b3eeb 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -2491,6 +2491,8 @@ cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
>> regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>>
>> write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + if (system_supports_sme())
>> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
>> isb();
>
> Do we know for sure that system_supports_sme() returns true at this
> point (if SME supported)? Digging into the code, system_supports_sme()
> uses alternative_has_cap_unlikely() which relies on instruction
> patching. setup_system_capabilities(), IIUC, patches the alternatives
> after enable_cpu_capabilities(). I think you better use cpus_have_cap().
>
I'll switch to cpus_have_cap().
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
2026-01-12 16:58 ` [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register Ben Horgan
2026-01-15 19:08 ` Catalin Marinas
@ 2026-01-19 6:51 ` Gavin Shan
2026-01-19 15:31 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 6:51 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> The MPAMSM_EL1 sets the MPAM labels, PMG and PARTID, for loads and stores
> generated by a shared SMCU. Disable the traps so the kernel can use it and
> set it to the same configuration as the per-EL cpu MPAM configuration.
>
> If an SMCU is not shared with other cpus then it is implementation
> defined whether the configuration from MPAMSM_EL1 is used or that from
> the appropriate MPAMy_ELx. As we set the same, PMG_D and PARTID_D,
> configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1 the resulting
> configuration is the same regardless.
>
> The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is
> not currently specified in the Arm Architecture Reference Manual but the
> architect has confirmed that it is intended to be the same as that for the
> cpu configuration in the MPAMy_ELx registers.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Mention PMG_D and PARTID_D specifically int he commit message
> Add paragraph in commit message on range of MPAMSM_EL1 fields
> ---
> arch/arm64/include/asm/el2_setup.h | 3 ++-
> arch/arm64/include/asm/mpam.h | 2 ++
> arch/arm64/kernel/cpufeature.c | 2 ++
> arch/arm64/kernel/mpam.c | 3 +++
> 4 files changed, 9 insertions(+), 1 deletion(-)
>
One nitpick below...
Reviewed-by: Gavin Shan <gshan@redhat.com>
> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
> index cacd20df1786..d37984c09799 100644
> --- a/arch/arm64/include/asm/el2_setup.h
> +++ b/arch/arm64/include/asm/el2_setup.h
> @@ -504,7 +504,8 @@
> check_override id_aa64pfr0, ID_AA64PFR0_EL1_MPAM_SHIFT, .Linit_mpam_\@, .Lskip_mpam_\@, x1, x2
>
> .Linit_mpam_\@:
> - msr_s SYS_MPAM2_EL2, xzr // use the default partition
> + mov x0, #MPAM2_EL2_EnMPAMSM_MASK
> + msr_s SYS_MPAM2_EL2, x0 // use the default partition,
> // and disable lower traps
> mrs_s x0, SYS_MPAMIDR_EL1
> tbz x0, #MPAMIDR_EL1_HAS_HCR_SHIFT, .Lskip_mpam_\@ // skip if no MPAMHCR reg
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> index 14011e5970ce..7b3d3abad162 100644
> --- a/arch/arm64/include/asm/mpam.h
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -53,6 +53,8 @@ static inline void mpam_thread_switch(struct task_struct *tsk)
> return;
>
> write_sysreg_s(regval, SYS_MPAM1_EL1);
> + if (system_supports_sme())
> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
> isb();
>
> /* Synchronising the EL0 write is left until the ERET to EL0 */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 0cdfb3728f43..2ede543b3eeb 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2491,6 +2491,8 @@ cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
> regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>
> write_sysreg_s(regval, SYS_MPAM1_EL1);
> + if (system_supports_sme())
> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
> isb();
>
> /* Synchronising the EL0 write is left until the ERET to EL0 */
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> index dbe0a2d05abb..6ce4a36469ce 100644
> --- a/arch/arm64/kernel/mpam.c
> +++ b/arch/arm64/kernel/mpam.c
> @@ -28,6 +28,9 @@ static int mpam_pm_notifier(struct notifier_block *self,
> */
> regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> write_sysreg_s(regval, SYS_MPAM1_EL1);
> + if (system_supports_sme())
> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D | MPAMSM_EL1_PMG_D),
> + SYS_MPAMSM_EL1);
{ } is missing here.
> isb();
>
> write_sysreg_s(regval, SYS_MPAM0_EL1);
Thanks,
Gavin
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
2026-01-19 6:51 ` Gavin Shan
@ 2026-01-19 15:31 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 15:31 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 06:51, Gavin Shan wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
>> The MPAMSM_EL1 sets the MPAM labels, PMG and PARTID, for loads and stores
>> generated by a shared SMCU. Disable the traps so the kernel can use it
>> and
>> set it to the same configuration as the per-EL cpu MPAM configuration.
>>
>> If an SMCU is not shared with other cpus then it is implementation
>> defined whether the configuration from MPAMSM_EL1 is used or that from
>> the appropriate MPAMy_ELx. As we set the same, PMG_D and PARTID_D,
>> configuration for MPAM0_EL1, MPAM1_EL1 and MPAMSM_EL1 the resulting
>> configuration is the same regardless.
>>
>> The range of valid configurations for the PARTID and PMG in MPAMSM_EL1 is
>> not currently specified in the Arm Architecture Reference Manual but the
>> architect has confirmed that it is intended to be the same as that for
>> the
>> cpu configuration in the MPAMy_ELx registers.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v2:
>> Mention PMG_D and PARTID_D specifically int he commit message
>> Add paragraph in commit message on range of MPAMSM_EL1 fields
>> ---
>> arch/arm64/include/asm/el2_setup.h | 3 ++-
>> arch/arm64/include/asm/mpam.h | 2 ++
>> arch/arm64/kernel/cpufeature.c | 2 ++
>> arch/arm64/kernel/mpam.c | 3 +++
>> 4 files changed, 9 insertions(+), 1 deletion(-)
>>
>
> One nitpick below...
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
>
>> diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/
>> asm/el2_setup.h
>> index cacd20df1786..d37984c09799 100644
>> --- a/arch/arm64/include/asm/el2_setup.h
>> +++ b/arch/arm64/include/asm/el2_setup.h
>> @@ -504,7 +504,8 @@
>> check_override id_aa64pfr0,
>> ID_AA64PFR0_EL1_MPAM_SHIFT, .Linit_mpam_\@, .Lskip_mpam_\@, x1, x2
>> .Linit_mpam_\@:
>> - msr_s SYS_MPAM2_EL2, xzr // use the default partition
>> + mov x0, #MPAM2_EL2_EnMPAMSM_MASK
>> + msr_s SYS_MPAM2_EL2, x0 // use the default partition,
>> // and disable lower traps
>> mrs_s x0, SYS_MPAMIDR_EL1
>> tbz x0, #MPAMIDR_EL1_HAS_HCR_SHIFT, .Lskip_mpam_\@ // skip
>> if no MPAMHCR reg
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/
>> mpam.h
>> index 14011e5970ce..7b3d3abad162 100644
>> --- a/arch/arm64/include/asm/mpam.h
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -53,6 +53,8 @@ static inline void mpam_thread_switch(struct
>> task_struct *tsk)
>> return;
>> write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + if (system_supports_sme())
>> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D |
>> MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
>> isb();
>> /* Synchronising the EL0 write is left until the ERET to EL0 */
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/
>> cpufeature.c
>> index 0cdfb3728f43..2ede543b3eeb 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -2491,6 +2491,8 @@ cpu_enable_mpam(const struct
>> arm64_cpu_capabilities *entry)
>> regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + if (system_supports_sme())
>> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D |
>> MPAMSM_EL1_PMG_D), SYS_MPAMSM_EL1);
>> isb();
>> /* Synchronising the EL0 write is left until the ERET to EL0 */
>> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
>> index dbe0a2d05abb..6ce4a36469ce 100644
>> --- a/arch/arm64/kernel/mpam.c
>> +++ b/arch/arm64/kernel/mpam.c
>> @@ -28,6 +28,9 @@ static int mpam_pm_notifier(struct notifier_block
>> *self,
>> */
>> regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + if (system_supports_sme())
>> + write_sysreg_s(regval & (MPAMSM_EL1_PARTID_D |
>> MPAMSM_EL1_PMG_D),
>> + SYS_MPAMSM_EL1);
>
> { } is missed here.
Added.
>
>> isb();
>> write_sysreg_s(regval, SYS_MPAM0_EL1);
>
> Thanks,
> Gavin
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (9 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 10/47] arm64: mpam: Initialise and context switch the MPAMSM_EL1 register Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 19:13 ` Catalin Marinas
2026-01-19 7:01 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration Ben Horgan
` (40 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Dave Martin
From: James Morse <james.morse@arm.com>
Care must be taken when modifying the PARTID and PMG of a task in any
per-task structure as writing these values may race with the task being
scheduled in, and reading the modified values.
Add helpers to set the task properties, and the CPU default value. These
use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval() to
avoid causing torn values.
CC: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Keep comment attached to mpam_get_regval()
Add internal helper, __mpam_regval() (Jonathan)
---
arch/arm64/include/asm/mpam.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
index 7b3d3abad162..c9b73f1af7ce 100644
--- a/arch/arm64/include/asm/mpam.h
+++ b/arch/arm64/include/asm/mpam.h
@@ -4,6 +4,7 @@
#ifndef __ASM__MPAM_H
#define __ASM__MPAM_H
+#include <linux/bitfield.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
#include <linux/sched.h>
@@ -22,6 +23,22 @@ DECLARE_PER_CPU(u64, arm64_mpam_current);
*/
extern u64 arm64_mpam_global_default;
+static inline u64 __mpam_regval(u16 partid_d, u16 partid_i, u8 pmg_d, u8 pmg_i)
+{
+ return FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d) |
+ FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i) |
+ FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d) |
+ FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
+}
+
+static inline void mpam_set_cpu_defaults(int cpu, u16 partid_d, u16 partid_i,
+ u8 pmg_d, u8 pmg_i)
+{
+ u64 default_val = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
+
+ WRITE_ONCE(per_cpu(arm64_mpam_default, cpu), default_val);
+}
+
/*
* The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
* which may race with reads in mpam_thread_switch(). Ensure only one of the old
@@ -36,6 +53,17 @@ static inline u64 mpam_get_regval(struct task_struct *tsk)
return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
}
+static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
+ u16 partid_d, u16 partid_i,
+ u8 pmg_d, u8 pmg_i)
+{
+#ifdef CONFIG_ARM64_MPAM
+ u64 regval = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
+
+ WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
+#endif
+}
+
static inline void mpam_thread_switch(struct task_struct *tsk)
{
u64 oldregval;
--
2.43.0
^ permalink raw reply related	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2026-01-12 16:58 ` [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values Ben Horgan
@ 2026-01-15 19:13 ` Catalin Marinas
2026-01-19 6:56 ` Gavin Shan
2026-01-19 15:47 ` Ben Horgan
2026-01-19 7:01 ` Gavin Shan
1 sibling, 2 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 19:13 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:58:38PM +0000, Ben Horgan wrote:
> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
> + u16 partid_d, u16 partid_i,
> + u8 pmg_d, u8 pmg_i)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> + u64 regval = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
> +
> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
> +#endif
> +}
Isn't this function, together with mpam_thread_switch(), in an enclosing
#ifdef already?
> +
> static inline void mpam_thread_switch(struct task_struct *tsk)
> {
> u64 oldregval;
> --
> 2.43.0
>
--
Catalin
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2026-01-15 19:13 ` Catalin Marinas
@ 2026-01-19 6:56 ` Gavin Shan
2026-01-19 15:47 ` Ben Horgan
1 sibling, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 6:56 UTC (permalink / raw)
To: Catalin Marinas, Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On 1/16/26 3:13 AM, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:38PM +0000, Ben Horgan wrote:
>> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
>> + u16 partid_d, u16 partid_i,
>> + u8 pmg_d, u8 pmg_i)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> + u64 regval = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
>> +
>> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
>> +#endif
>> +}
>
> Isn't this function, together with mpam_thread_switch(), in an enclosing
> #ifdef already?
>
I think Catalin is correct that this ifdef inside mpam_set_task_partid_pmg()
can be dropped.
>> +
>> static inline void mpam_thread_switch(struct task_struct *tsk)
>> {
>> u64 oldregval;
>> --
>> 2.43.0
>>
Thanks,
Gavin
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2026-01-15 19:13 ` Catalin Marinas
2026-01-19 6:56 ` Gavin Shan
@ 2026-01-19 15:47 ` Ben Horgan
1 sibling, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 15:47 UTC (permalink / raw)
To: Catalin Marinas
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Catalin,
On 1/15/26 19:13, Catalin Marinas wrote:
> On Mon, Jan 12, 2026 at 04:58:38PM +0000, Ben Horgan wrote:
>> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
>> + u16 partid_d, u16 partid_i,
>> + u8 pmg_d, u8 pmg_i)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> + u64 regval = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
>> +
>> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
>> +#endif
>> +}
>
> Isn't this function, together with mpam_thread_switch(), in an enclosing
> #ifdef already?
Yes, I'll remove the inner #ifdef.
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2026-01-12 16:58 ` [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values Ben Horgan
2026-01-15 19:13 ` Catalin Marinas
@ 2026-01-19 7:01 ` Gavin Shan
2026-01-19 15:49 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 7:01 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> Care must be taken when modifying the PARTID and PMG of a task in any
> per-task structure as writing these values may race with the task being
> scheduled in, and reading the modified values.
>
> Add helpers to set the task properties, and the CPU default value. These
> use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval() to
> avoid causing torn values.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> Keep comment attached to mpam_get_regval()
> Add internal helper, __mpam_regval() (Jonathan)
> ---
> arch/arm64/include/asm/mpam.h | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> index 7b3d3abad162..c9b73f1af7ce 100644
> --- a/arch/arm64/include/asm/mpam.h
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -4,6 +4,7 @@
> #ifndef __ASM__MPAM_H
> #define __ASM__MPAM_H
>
> +#include <linux/bitfield.h>
> #include <linux/jump_label.h>
> #include <linux/percpu.h>
> #include <linux/sched.h>
> @@ -22,6 +23,22 @@ DECLARE_PER_CPU(u64, arm64_mpam_current);
> */
> extern u64 arm64_mpam_global_default;
>
> +static inline u64 __mpam_regval(u16 partid_d, u16 partid_i, u8 pmg_d, u8 pmg_i)
> +{
> + return FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d) |
> + FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i) |
> + FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d) |
> + FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
> +}
Nitpick: Alignment issues in the lines for 2nd/3rd/4th FIELD_PREP().
return FIELD_PREP(...) |
FIELD_PREP(...) |
FIELD_PREP(...) |
FIELD_PREP(...);
> +
> +static inline void mpam_set_cpu_defaults(int cpu, u16 partid_d, u16 partid_i,
> + u8 pmg_d, u8 pmg_i)
> +{
> + u64 default_val = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
> +
> + WRITE_ONCE(per_cpu(arm64_mpam_default, cpu), default_val);
> +}
> +
per_cpu(arm64_mpam_default) won't be reachable until CONFIG_ARM64_MPAM is set.
So I think both mpam_set_cpu_defaults() and __mpam_regval() need to be protected
by '#ifdef CONFIG_ARM64_MPAM ... #endif'.
> /*
> * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> * which may race with reads in mpam_thread_switch(). Ensure only one of the old
> @@ -36,6 +53,17 @@ static inline u64 mpam_get_regval(struct task_struct *tsk)
> return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> }
>
> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
> + u16 partid_d, u16 partid_i,
> + u8 pmg_d, u8 pmg_i)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> + u64 regval = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
> +
> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
> +#endif
> +}
> +
> static inline void mpam_thread_switch(struct task_struct *tsk)
> {
> u64 oldregval;
Thanks,
Gavin
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2026-01-19 7:01 ` Gavin Shan
@ 2026-01-19 15:49 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 15:49 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 07:01, Gavin Shan wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> Care must be taken when modifying the PARTID and PMG of a task in any
>> per-task structure as writing these values may race with the task being
>> scheduled in, and reading the modified values.
>>
>> Add helpers to set the task properties, and the CPU default value. These
>> use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval() to
>> avoid causing torn values.
>>
>> CC: Dave Martin <Dave.Martin@arm.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since rfc:
>> Keep comment attached to mpam_get_regval()
>> Add internal helper, __mpam_regval() (Jonathan)
>> ---
>> arch/arm64/include/asm/mpam.h | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> index 7b3d3abad162..c9b73f1af7ce 100644
>> --- a/arch/arm64/include/asm/mpam.h
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -4,6 +4,7 @@
>> #ifndef __ASM__MPAM_H
>> #define __ASM__MPAM_H
>> +#include <linux/bitfield.h>
>> #include <linux/jump_label.h>
>> #include <linux/percpu.h>
>> #include <linux/sched.h>
>> @@ -22,6 +23,22 @@ DECLARE_PER_CPU(u64, arm64_mpam_current);
>> */
>> extern u64 arm64_mpam_global_default;
>> +static inline u64 __mpam_regval(u16 partid_d, u16 partid_i, u8 pmg_d, u8 pmg_i)
>> +{
>> + return FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d) |
>> + FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i) |
>> + FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d) |
>> + FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
>> +}
>
> Nitpick: Alignment issues in the lines for 2nd/3rd/4th FIELD_PREP().
>
> return FIELD_PREP(...) |
> FIELD_PREP(...) |
> FIELD_PREP(...) |
> FIELD_PREP(...);
Sure, I'll align these.
>
>> +
>> +static inline void mpam_set_cpu_defaults(int cpu, u16 partid_d, u16 partid_i,
>> + u8 pmg_d, u8 pmg_i)
>> +{
>> + u64 default_val = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
>> +
>> + WRITE_ONCE(per_cpu(arm64_mpam_default, cpu), default_val);
>> +}
>> +
>
> per_cpu(arm64_mpam_default) won't be reachable until CONFIG_ARM64_MPAM is set.
> So I think both mpam_set_cpu_defaults() and __mpam_regval() need to be
> protected by '#ifdef CONFIG_ARM64_MPAM ... #endif'.
I'll move the #ifdef CONFIG_ARM64_MPAM up to cover these.
>
>> /*
>> * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>> * which may race with reads in mpam_thread_switch(). Ensure only one of the old
>> @@ -36,6 +53,17 @@ static inline u64 mpam_get_regval(struct task_struct *tsk)
>> return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> }
>> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
>> + u16 partid_d, u16 partid_i,
>> + u8 pmg_d, u8 pmg_i)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> + u64 regval = __mpam_regval(partid_d, partid_i, pmg_d, pmg_i);
>> +
>> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
>> +#endif
>> +}
>> +
>> static inline void mpam_thread_switch(struct task_struct *tsk)
>> {
>> u64 oldregval;
>
> Thanks,
> Gavin
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (10 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 11/47] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 14:19 ` Jonathan Cameron
2026-01-14 12:06 ` Marc Zyngier
2026-01-12 16:58 ` [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls Ben Horgan
` (39 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
While we trap the guest's attempts to read/write the MPAM control
registers, the hardware continues to use them. Guest-EL0 uses KVM's
user-space's configuration, as the value is left in the register, and
guest-EL1 uses either the host kernel's configuration, or in the case of
VHE, the UNKNOWN reset value of MPAM1_EL1.
We want to force the guest-EL1 to use KVM's user-space's MPAM
configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
programmed the same and on VHE copy MPAM0_EL1 into the guest's
MPAM1_EL1. There is no need to restore as this is out of context once TGE
is set.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Drop the unneeded __mpam_guest_load() in nVHE and the MPAM1_EL1 save/restore
Defer EL2 handling until next patch
Changes since v2:
Use mask (Oliver)
---
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index f28c6cf4fe1b..9fb8e6628611 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -183,6 +183,18 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
}
NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
+/*
+ * The _EL0 value was written by the host's context switch and belongs to the
+ * VMM. Copy this into the guest's _EL1 register.
+ */
+static inline void __mpam_guest_load(void)
+{
+ u64 mask = MPAM0_EL1_PARTID_D | MPAM0_EL1_PARTID_I | MPAM0_EL1_PMG_D | MPAM0_EL1_PMG_I;
+
+ if (system_supports_mpam())
+ write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1) & mask, SYS_MPAM1);
+}
+
/**
* __vcpu_load_switch_sysregs - Load guest system registers to the physical CPU
*
@@ -222,6 +234,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
*/
__sysreg32_restore_state(vcpu);
__sysreg_restore_user_state(guest_ctxt);
+ __mpam_guest_load();
if (unlikely(is_hyp_ctxt(vcpu))) {
__sysreg_restore_vel2_state(vcpu);
--
2.43.0
^ permalink raw reply related	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2026-01-12 16:58 ` [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration Ben Horgan
@ 2026-01-13 14:19 ` Jonathan Cameron
2026-01-14 12:06 ` Marc Zyngier
1 sibling, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 14:19 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:39 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> From: James Morse <james.morse@arm.com>
>
> While we trap the guest's attempts to read/write the MPAM control
> registers, the hardware continues to use them. Guest-EL0 uses KVM's
> user-space's configuration, as the value is left in the register, and
> guest-EL1 uses either the host kernel's configuration, or in the case of
> VHE, the UNKNOWN reset value of MPAM1_EL1.
>
> We want to force the guest-EL1 to use KVM's user-space's MPAM
> configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
> programmed the same and on VHE copy MPAM0_EL1 into the guest's
> MPAM1_EL1. There is no need to restore as this is out of context once TGE
> is set.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2026-01-12 16:58 ` [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration Ben Horgan
2026-01-13 14:19 ` Jonathan Cameron
@ 2026-01-14 12:06 ` Marc Zyngier
2026-01-14 14:50 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Marc Zyngier @ 2026-01-14 12:06 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:39 +0000,
Ben Horgan <ben.horgan@arm.com> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> While we trap the guest's attempts to read/write the MPAM control
> registers, the hardware continues to use them. Guest-EL0 uses KVM's
> user-space's configuration, as the value is left in the register, and
> guest-EL1 uses either the host kernel's configuration, or in the case of
> VHE, the UNKNOWN reset value of MPAM1_EL1.
>
> We want to force the guest-EL1 to use KVM's user-space's MPAM
> configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
> programmed the same and on VHE copy MPAM0_EL1 into the guest's
> MPAM1_EL1. There is no need to restore as this is out of context once TGE
> is set.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> Drop the unneeded __mpam_guest_load() in nvhre and the MPAM1_EL1 save restore
> Defer EL2 handling until next patch
>
> Changes since v2:
> Use mask (Oliver)
> ---
> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> index f28c6cf4fe1b..9fb8e6628611 100644
> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> @@ -183,6 +183,18 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
> }
> NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
>
> +/*
> + * The _EL0 value was written by the host's context switch and belongs to the
> + * VMM. Copy this into the guest's _EL1 register.
> + */
> +static inline void __mpam_guest_load(void)
> +{
> + u64 mask = MPAM0_EL1_PARTID_D | MPAM0_EL1_PARTID_I | MPAM0_EL1_PMG_D | MPAM0_EL1_PMG_I;
> +
> + if (system_supports_mpam())
> + write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1) & mask, SYS_MPAM1);
> +}
> +
> /**
> * __vcpu_load_switch_sysregs - Load guest system registers to the physical CPU
> *
> @@ -222,6 +234,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
> */
> __sysreg32_restore_state(vcpu);
> __sysreg_restore_user_state(guest_ctxt);
> + __mpam_guest_load();
What's the rationale for doing this independently of rest of the MPAM
stuff in __activate_traps_mpam()?
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2026-01-14 12:06 ` Marc Zyngier
@ 2026-01-14 14:50 ` Ben Horgan
2026-01-15 9:05 ` Marc Zyngier
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-14 14:50 UTC (permalink / raw)
To: Marc Zyngier
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Marc,
On 1/14/26 12:06, Marc Zyngier wrote:
> On Mon, 12 Jan 2026 16:58:39 +0000,
> Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> From: James Morse <james.morse@arm.com>
>>
>> While we trap the guest's attempts to read/write the MPAM control
>> registers, the hardware continues to use them. Guest-EL0 uses KVM's
>> user-space's configuration, as the value is left in the register, and
>> guest-EL1 uses either the host kernel's configuration, or in the case of
>> VHE, the UNKNOWN reset value of MPAM1_EL1.
>>
>> We want to force the guest-EL1 to use KVM's user-space's MPAM
>> configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
>> programmed the same and on VHE copy MPAM0_EL1 into the guest's
>> MPAM1_EL1. There is no need to restore as this is out of context once TGE
>> is set.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since rfc:
>> Drop the unneeded __mpam_guest_load() in nvhre and the MPAM1_EL1 save restore
>> Defer EL2 handling until next patch
>>
>> Changes since v2:
>> Use mask (Oliver)
>> ---
>> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
>> index f28c6cf4fe1b..9fb8e6628611 100644
>> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
>> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
>> @@ -183,6 +183,18 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
>> }
>> NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
>>
>> +/*
>> + * The _EL0 value was written by the host's context switch and belongs to the
>> + * VMM. Copy this into the guest's _EL1 register.
>> + */
>> +static inline void __mpam_guest_load(void)
>> +{
>> + u64 mask = MPAM0_EL1_PARTID_D | MPAM0_EL1_PARTID_I | MPAM0_EL1_PMG_D | MPAM0_EL1_PMG_I;
>> +
>> + if (system_supports_mpam())
>> + write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1) & mask, SYS_MPAM1);
>> +}
>> +
>> /**
>> * __vcpu_load_switch_sysregs - Load guest system registers to the physical CPU
>> *
>> @@ -222,6 +234,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
>> */
>> __sysreg32_restore_state(vcpu);
>> __sysreg_restore_user_state(guest_ctxt);
>> + __mpam_guest_load();
>
> What's the rationale for doing this independently of rest of the MPAM
> stuff in __activate_traps_mpam()?
__activate_traps_mpam() is relevant even for nVHE, but __mpam_guest_load()
is only needed in VHE; otherwise we can rely on MPAM1_EL1 and MPAM0_EL1
having the same partid/pmg configuration (although this MPAM policy will
likely become configurable sometime down the line). Besides that, it would
just make the naming less exact.
>
> M.
>
Thanks,
Ben
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2026-01-14 14:50 ` Ben Horgan
@ 2026-01-15 9:05 ` Marc Zyngier
2026-01-15 11:14 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Marc Zyngier @ 2026-01-15 9:05 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
On Wed, 14 Jan 2026 14:50:22 +0000,
Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Marc,
>
> On 1/14/26 12:06, Marc Zyngier wrote:
> > On Mon, 12 Jan 2026 16:58:39 +0000,
> > Ben Horgan <ben.horgan@arm.com> wrote:
> >>
> >> From: James Morse <james.morse@arm.com>
> >>
> >> While we trap the guest's attempts to read/write the MPAM control
> >> registers, the hardware continues to use them. Guest-EL0 uses KVM's
> >> user-space's configuration, as the value is left in the register, and
> >> guest-EL1 uses either the host kernel's configuration, or in the case of
> >> VHE, the UNKNOWN reset value of MPAM1_EL1.
> >>
> >> We want to force the guest-EL1 to use KVM's user-space's MPAM
> >> configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
> >> programmed the same and on VHE copy MPAM0_EL1 into the guest's
> >> MPAM1_EL1. There is no need to restore as this is out of context once TGE
> >> is set.
> >>
> >> Signed-off-by: James Morse <james.morse@arm.com>
> >> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> >> ---
> >> Changes since rfc:
> >> Drop the unneeded __mpam_guest_load() in nvhre and the MPAM1_EL1 save restore
> >> Defer EL2 handling until next patch
> >>
> >> Changes since v2:
> >> Use mask (Oliver)
> >> ---
> >> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +++++++++++++
> >> 1 file changed, 13 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> >> index f28c6cf4fe1b..9fb8e6628611 100644
> >> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> >> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> >> @@ -183,6 +183,18 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
> >> }
> >> NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
> >>
> >> +/*
> >> + * The _EL0 value was written by the host's context switch and belongs to the
> >> + * VMM. Copy this into the guest's _EL1 register.
> >> + */
> >> +static inline void __mpam_guest_load(void)
> >> +{
> >> + u64 mask = MPAM0_EL1_PARTID_D | MPAM0_EL1_PARTID_I | MPAM0_EL1_PMG_D | MPAM0_EL1_PMG_I;
> >> +
> >> + if (system_supports_mpam())
> >> + write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1) & mask, SYS_MPAM1);
> >> +}
> >> +
> >> /**
> >> * __vcpu_load_switch_sysregs - Load guest system registers to the physical CPU
> >> *
> >> @@ -222,6 +234,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
> >> */
> >> __sysreg32_restore_state(vcpu);
> >> __sysreg_restore_user_state(guest_ctxt);
> >> + __mpam_guest_load();
> >
> > What's the rationale for doing this independently of rest of the MPAM
> > stuff in __activate_traps_mpam()?
>
> The __activate_traps_mpam() is relevant even for nvhe but
> __mpam_guest_load() is only need in vhe as otherwise we can rely on
> MPAM1_EL1 and MPAM0_EL0 having the same partid/pmg configuration
It is completely unclear to me what enforces this. Please point me to
the code that does that.
> (although this MPAM policy will likely become configurable sometime down
> the line).
Or not. The VM only exists as an extension of userspace, and I don't
see on what grounds it should get its own MPAM configuration.
> Besides that it just makes the naming less exact.
I don't care about the naming. I care about how the configuration flow
is organised. And so far, this seems extremely messy.
Can you please document what gets configured when and in which mode?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2026-01-15 9:05 ` Marc Zyngier
@ 2026-01-15 11:14 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-15 11:14 UTC (permalink / raw)
To: Marc Zyngier
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Marc,
On 1/15/26 09:05, Marc Zyngier wrote:
> On Wed, 14 Jan 2026 14:50:22 +0000,
> Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi Marc,
>>
>> On 1/14/26 12:06, Marc Zyngier wrote:
>>> On Mon, 12 Jan 2026 16:58:39 +0000,
>>> Ben Horgan <ben.horgan@arm.com> wrote:
>>>>
>>>> From: James Morse <james.morse@arm.com>
>>>>
>>>> While we trap the guest's attempts to read/write the MPAM control
>>>> registers, the hardware continues to use them. Guest-EL0 uses KVM's
>>>> user-space's configuration, as the value is left in the register, and
>>>> guest-EL1 uses either the host kernel's configuration, or in the case of
>>>> VHE, the UNKNOWN reset value of MPAM1_EL1.
>>>>
>>>> We want to force the guest-EL1 to use KVM's user-space's MPAM
>>>> configuration. On nVHE rely on MPAM0_EL1 and MPAM1_EL1 always being
>>>> programmed the same and on VHE copy MPAM0_EL1 into the guest's
>>>> MPAM1_EL1. There is no need to restore as this is out of context once TGE
>>>> is set.
>>>>
>>>> Signed-off-by: James Morse <james.morse@arm.com>
>>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>>> ---
>>>> Changes since rfc:
>>>> Drop the unneeded __mpam_guest_load() in nvhre and the MPAM1_EL1 save restore
>>>> Defer EL2 handling until next patch
>>>>
>>>> Changes since v2:
>>>> Use mask (Oliver)
>>>> ---
>>>> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +++++++++++++
>>>> 1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
>>>> index f28c6cf4fe1b..9fb8e6628611 100644
>>>> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
>>>> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
>>>> @@ -183,6 +183,18 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt)
>>>> }
>>>> NOKPROBE_SYMBOL(sysreg_restore_guest_state_vhe);
>>>>
>>>> +/*
>>>> + * The _EL0 value was written by the host's context switch and belongs to the
>>>> + * VMM. Copy this into the guest's _EL1 register.
>>>> + */
>>>> +static inline void __mpam_guest_load(void)
>>>> +{
>>>> + u64 mask = MPAM0_EL1_PARTID_D | MPAM0_EL1_PARTID_I | MPAM0_EL1_PMG_D | MPAM0_EL1_PMG_I;
>>>> +
>>>> + if (system_supports_mpam())
>>>> + write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1) & mask, SYS_MPAM1);
>>>> +}
>>>> +
>>>> /**
>>>> * __vcpu_load_switch_sysregs - Load guest system registers to the physical CPU
>>>> *
>>>> @@ -222,6 +234,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
>>>> */
>>>> __sysreg32_restore_state(vcpu);
>>>> __sysreg_restore_user_state(guest_ctxt);
>>>> + __mpam_guest_load();
>>>
>>> What's the rationale for doing this independently of rest of the MPAM
>>> stuff in __activate_traps_mpam()?
>>
>> The __activate_traps_mpam() is relevant even for nvhe but
>> __mpam_guest_load() is only needed in vhe, as otherwise we can rely on
>> MPAM1_EL1 and MPAM0_EL1 having the same partid/pmg configuration
>
> It is completely unclear to me what enforces this. Please point me to
> the code that does that.
The new MPAM arch code always configures the kernel-space MPAM
configuration in tandem, writing the same value to MPAM0_EL1 and
MPAM1_EL1 (the MPAM1_EL1 accesses hit MPAM2_EL2 in vhe).
For the cpu bring-up part see
PATCH v3 07 arm64: mpam: Re-initialise MPAM regs when CPU comes online
and for the context switching part see
PATCH v3 06 arm64: mpam: Context switch the MPAM registers
>
>> (although this MPAM policy will likely become configurable sometime down
>> the line).
>
> Or not. the VM only exists as an extension of userspace, and I don't
> see on what grounds it should get its own MPAM configuration.
There are no plans to give the VM its own MPAM configuration. The
intent of this patch is that all the code running in the VM, whether
it is at EL0 or EL1, uses the MPAM partid/pmg configuration of
the userspace VMM that is running it. What I was alluding to here is
that if a future host MPAM policy allows the partid/pmg to differ for
the kernel and userspace then we won't be able to rely on the EL1
configuration matching EL0 in the nvhe case and so would have to copy
the host EL0 configuration to EL1 (and add a restore).
>
>> Besides that it just makes the naming less exact.
>
> I don't care about the naming. I care about how the configuration flow
> is organised. And so far, this seems extremely messy.
>
> Can you please document what gets configured when and in which mode?
The goal is that at all times the VM runs with the same MPAM partid/pmg
configuration as the userspace vmm running it.

In VHE:
Host kernel configuration is in MPAM2_EL2.
Host userspace configuration is in MPAM0_EL1.
Guest kernel configuration needs to be in MPAM1_EL1, so we copy it
there from MPAM0_EL1 when switching to the guest. On switching back to
the host we can just leave it there as the host doesn't use it.
Guest userspace configuration is in MPAM0_EL1; just keep that unchanged.

In nVHE:
Host kernel configuration is in MPAM1_EL1.
Host userspace configuration is in MPAM0_EL1.
Once again, guest userspace configuration is in MPAM0_EL1; just keep
that unchanged. As host policy ensures MPAM0_EL1 == MPAM1_EL1 for
pmg/partid we rely on this to skip changing MPAM1_EL1 on guest entry and
to skip the restore on guest exit.
I hope this makes the intent clearer.
Would you prefer to decouple the kvm handling of the MPAM configuration
from the host policy? If so, I expect the MPAM handling can be done
together in __activate_traps_mpam().
Thoughts?
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (11 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 12/47] KVM: arm64: Force guest EL1 to use user-space's partid configuration Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 14:21 ` Jonathan Cameron
2026-01-14 12:09 ` Marc Zyngier
2026-01-12 16:58 ` [PATCH v3 14/47] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation Ben Horgan
` (38 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On nVHE systems, whether or not MPAM is enabled, EL2 continues to use
partid-0 for hypercalls, even when the host may have configured its kernel
threads to use a different partid; 0 may have been assigned to another
task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
same partid as the kernel thread does on the host.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Use mask
Use read_sysreg_el1 to cope with hvhe
---
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index a7c689152f68..ad99d8a73a9e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -635,6 +635,14 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
unsigned long hcall_min = 0;
hcall_t hfn;
+ if (system_supports_mpam()) {
+ u64 mask = MPAM1_EL1_PARTID_D | MPAM1_EL1_PARTID_I |
+ MPAM1_EL1_PMG_D | MPAM1_EL1_PMG_I;
+
+ write_sysreg_s(read_sysreg_el1(SYS_MPAM1) & mask, SYS_MPAM2_EL2);
+ isb();
+ }
+
/*
* If pKVM has been initialised then reject any calls to the
* early "privileged" hypercalls. Note that we cannot reject
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* Re: [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-12 16:58 ` [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls Ben Horgan
@ 2026-01-13 14:21 ` Jonathan Cameron
2026-01-13 14:35 ` Ben Horgan
2026-01-14 12:09 ` Marc Zyngier
1 sibling, 1 reply; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 14:21 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:40 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> On nVHE systems whether or not MPAM is enabled, EL2 continues to use
> partid-0 for hypercalls, even when the host may have configured its kernel
> threads to use a different partid. 0 may have been assigned to another
> task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
> same partid as the kernel thread does on the host.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
> Changes since v2:
> Use mask
> Use read_sysreg_el1 to cope with hvhe
nvhe
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-13 14:21 ` Jonathan Cameron
@ 2026-01-13 14:35 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-13 14:35 UTC (permalink / raw)
To: Jonathan Cameron
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Jonathan,
On 1/13/26 14:21, Jonathan Cameron wrote:
> On Mon, 12 Jan 2026 16:58:40 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
>
>> On nVHE systems whether or not MPAM is enabled, EL2 continues to use
>> partid-0 for hypercalls, even when the host may have configured its kernel
>> threads to use a different partid. 0 may have been assigned to another
>> task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
>> same partid as the kernel thread does on the host.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> ---
>> Changes since v2:
>> Use mask
>> Use read_sysreg_el1 to cope with hvhe
>
> nvhe
In this case, I do mean arm64_sw.hvhe, which is the configuration when
kvm-arm.mode=protected. The aliasing of the registers when E2H is set
meant that read_sysreg_s(SYS_MPAM1_EL1) was actually accessing
SYS_MPAM2_EL2, so the read and write were of the same register.
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-12 16:58 ` [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls Ben Horgan
2026-01-13 14:21 ` Jonathan Cameron
@ 2026-01-14 12:09 ` Marc Zyngier
2026-01-14 14:39 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Marc Zyngier @ 2026-01-14 12:09 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:40 +0000,
Ben Horgan <ben.horgan@arm.com> wrote:
>
> On nVHE systems whether or not MPAM is enabled, EL2 continues to use
> partid-0 for hypercalls, even when the host may have configured its kernel
> threads to use a different partid. 0 may have been assigned to another
> task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
> same partid as the kernel thread does on the host.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Use mask
> Use read_sysreg_el1 to cope with hvhe
> ---
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index a7c689152f68..ad99d8a73a9e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -635,6 +635,14 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> unsigned long hcall_min = 0;
> hcall_t hfn;
>
> + if (system_supports_mpam()) {
> + u64 mask = MPAM1_EL1_PARTID_D | MPAM1_EL1_PARTID_I |
> + MPAM1_EL1_PMG_D | MPAM1_EL1_PMG_I;
> +
> + write_sysreg_s(read_sysreg_el1(SYS_MPAM1) & mask, SYS_MPAM2_EL2);
> + isb();
> + }
Is it really OK to not preserve the rest of MPAM2_EL2? This explicitly
clears MPAM2_EL2.MPAMEN, which feels counter-productive.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-14 12:09 ` Marc Zyngier
@ 2026-01-14 14:39 ` Ben Horgan
2026-01-14 16:50 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-14 14:39 UTC (permalink / raw)
To: Marc Zyngier
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Marc,
On 1/14/26 12:09, Marc Zyngier wrote:
> On Mon, 12 Jan 2026 16:58:40 +0000,
> Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> On nVHE systems whether or not MPAM is enabled, EL2 continues to use
>> partid-0 for hypercalls, even when the host may have configured its kernel
>> threads to use a different partid. 0 may have been assigned to another
>> task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
>> same partid as the kernel thread does on the host.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v2:
>> Use mask
>> Use read_sysreg_el1 to cope with hvhe
>> ---
>> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>> index a7c689152f68..ad99d8a73a9e 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>> @@ -635,6 +635,14 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
>> unsigned long hcall_min = 0;
>> hcall_t hfn;
>>
>> + if (system_supports_mpam()) {
>> + u64 mask = MPAM1_EL1_PARTID_D | MPAM1_EL1_PARTID_I |
>> + MPAM1_EL1_PMG_D | MPAM1_EL1_PMG_I;
>> +
>> + write_sysreg_s(read_sysreg_el1(SYS_MPAM1) & mask, SYS_MPAM2_EL2);
>> + isb();
>> + }
>
> Is it really OK to not preserve the rest of MPAM2_EL2? This explicitly
> clears MPAM2_EL2.MPAMEN, which feels counter-productive.
>
> M.
>
There are 3 things to consider:
1. Traps - these are only relevant when we leave EL2 and are dealt with
in __activate_traps_mpam(). (This also covers EnMPAMSM, which is not a
trap bit.)
2. MPAM2_EL2.MPAMEN - this is read-only as long as we have an EL3, and
if we don't have an EL3 it will be 0 anyway from el2_setup.h and MPAM
won't be considered supported in the kernel.
3. The alternate partid space fields, which are kept as zero and relate
to FEAT_RME.
So, safe. Ok with you, or would you rather I make it more obviously safe?
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-14 14:39 ` Ben Horgan
@ 2026-01-14 16:50 ` Ben Horgan
2026-01-14 17:50 ` Marc Zyngier
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-14 16:50 UTC (permalink / raw)
To: Marc Zyngier
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Marc,
On 1/14/26 14:39, Ben Horgan wrote:
> Hi Marc,
>
> On 1/14/26 12:09, Marc Zyngier wrote:
>> On Mon, 12 Jan 2026 16:58:40 +0000,
>> Ben Horgan <ben.horgan@arm.com> wrote:
>>>
>>> On nVHE systems whether or not MPAM is enabled, EL2 continues to use
>>> partid-0 for hypercalls, even when the host may have configured its kernel
>>> threads to use a different partid. 0 may have been assigned to another
>>> task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
>>> same partid as the kernel thread does on the host.
>>>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>>> ---
>>> Changes since v2:
>>> Use mask
>>> Use read_sysreg_el1 to cope with hvhe
>>> ---
>>> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++
>>> 1 file changed, 8 insertions(+)
>>>
>>> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>>> index a7c689152f68..ad99d8a73a9e 100644
>>> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>>> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
>>> @@ -635,6 +635,14 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
>>> unsigned long hcall_min = 0;
>>> hcall_t hfn;
>>>
>>> + if (system_supports_mpam()) {
>>> + u64 mask = MPAM1_EL1_PARTID_D | MPAM1_EL1_PARTID_I |
>>> + MPAM1_EL1_PMG_D | MPAM1_EL1_PMG_I;
>>> +
>>> + write_sysreg_s(read_sysreg_el1(SYS_MPAM1) & mask, SYS_MPAM2_EL2);
>>> + isb();
>>> + }
>>
>> Is it really OK to not preserve the rest of MPAM2_EL2? This explicitly
>> clears MPAM2_EL2.MPAMEN, which feels counter-productive.
>>
>> M.
>>
>
> There are 3 things to consider:
> 1. traps - these are only relevant when we leave EL2 and are dealt with
> in __activate_traps_mpam(). (This also covers EnMPAMSM which is a
> not-trap bit.)
> 2. MPAM2_EL2.MPAMEN - this is read only as long as we have an EL3 and if
> we don't have EL3 will be 0 anyway from el2_setup.h and MPAM won't be
> considered supported in the kernel.
> 3. The alternate partid space fields which are kept as zero and relate
> to FEAT_RME.
>
> So, safe. Ok with you or would you rather I make it more obviously safe?
As discussed offline, to avoid having to reason about MPAM2_EL2.MPAMEN,
I'll set this bit to 1 in this write, as we are already assuming MPAM is
enabled and we want to keep it enabled.
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls
2026-01-14 16:50 ` Ben Horgan
@ 2026-01-14 17:50 ` Marc Zyngier
0 siblings, 0 replies; 160+ messages in thread
From: Marc Zyngier @ 2026-01-14 17:50 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
oupton, joey.gouly, suzuki.poulose, kvmarm
On Wed, 14 Jan 2026 16:50:50 +0000,
Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Marc,
>
> On 1/14/26 14:39, Ben Horgan wrote:
> > Hi Marc,
> >
> > On 1/14/26 12:09, Marc Zyngier wrote:
> >> On Mon, 12 Jan 2026 16:58:40 +0000,
> >> Ben Horgan <ben.horgan@arm.com> wrote:
> >>>
> >>> On nVHE systems whether or not MPAM is enabled, EL2 continues to use
> >>> partid-0 for hypercalls, even when the host may have configured its kernel
> >>> threads to use a different partid. 0 may have been assigned to another
> >>> task. Copy the EL1 MPAM register to EL2. This ensures hypercalls use the
> >>> same partid as the kernel thread does on the host.
> >>>
> >>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> >>> ---
> >>> Changes since v2:
> >>> Use mask
> >>> Use read_sysreg_el1 to cope with hvhe
> >>> ---
> >>> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 ++++++++
> >>> 1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> >>> index a7c689152f68..ad99d8a73a9e 100644
> >>> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> >>> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> >>> @@ -635,6 +635,14 @@ static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> >>> unsigned long hcall_min = 0;
> >>> hcall_t hfn;
> >>>
> >>> + if (system_supports_mpam()) {
> >>> + u64 mask = MPAM1_EL1_PARTID_D | MPAM1_EL1_PARTID_I |
> >>> + MPAM1_EL1_PMG_D | MPAM1_EL1_PMG_I;
> >>> +
> >>> + write_sysreg_s(read_sysreg_el1(SYS_MPAM1) & mask, SYS_MPAM2_EL2);
> >>> + isb();
> >>> + }
> >>
> >> Is it really OK to not preserve the rest of MPAM2_EL2? This explicitly
> >> clears MPAM2_EL2.MPAMEN, which feels counter-productive.
> >>
> >> M.
> >>
> >
> > There are 3 things to consider:
> > 1. traps - these are only relevant when we leave EL2 and are dealt with
> > in __activate_traps_mpam(). (This also covers EnMPAMSM which is a
> > not-trap bit.)
> > 2. MPAM2_EL2.MPAMEN - this is read only as long as we have an EL3 and if
> > we don't have EL3 will be 0 anyway from el2_setup.h and MPAM won't be
> > considered supported in the kernel.
> > 3. The alternate partid space fields which are kept as zero and relate
> > to FEAT_RME.
> >
> > So, safe. Ok with you or would you rather I make it more obviously safe?
>
> As discussed offline, to avoid having to reason about MPAM2_EL2.MPAMEN
> I'll set this bit to 1 in this write as we are already assuming mpam is
> enabled and we want to keep it enabled.
Sounds good, thanks.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 14/47] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (12 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 13/47] KVM: arm64: Use kernel-space partid configuration for hypercalls Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 16:49 ` Reinette Chatre
2026-01-12 16:58 ` [PATCH v3 15/47] arm_mpam: resctrl: Sort the order of the domain lists Ben Horgan
` (37 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl has its own data structures to describe its resources. We can't use
these directly as we play tricks with the 'MBA' resource, picking the MPAM
controls or monitors that best apply. We may export the same component as
both L3 and MBA.
Add mpam_resctrl_controls[] as the array of class->resctrl mappings we are
exporting, and add the cpuhp hooks that allocate and free the resctrl
domain structures.
While we're here, plumb in a few other obvious things.
CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built even though
it can't yet be linked against resctrl.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Domain list is an rcu list
Add synchronize_rcu() to free the deleted element
Code flow simplification (Jonathan)
Changes since v2:
Iterate over mpam_resctrl_dom directly (Jonathan)
Code flow clarification
Comment tidying
Remove power of 2 check as no longer creates holes in rmid indices
Remove unused type argument
add macro helper for_each_mpam_resctrl_control
---
drivers/resctrl/Makefile | 1 +
drivers/resctrl/mpam_devices.c | 12 ++
drivers/resctrl/mpam_internal.h | 22 +++
drivers/resctrl/mpam_resctrl.c | 321 ++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 3 +
5 files changed, 359 insertions(+)
create mode 100644 drivers/resctrl/mpam_resctrl.c
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 898199dcf80d..40beaf999582 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
mpam-y += mpam_devices.o
+mpam-$(CONFIG_ARM_CPU_RESCTRL) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 860181266b15..b81d5c7f44ca 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1628,6 +1628,9 @@ static int mpam_cpu_online(unsigned int cpu)
mpam_reprogram_msc(msc);
}
+ if (mpam_is_enabled())
+ return mpam_resctrl_online_cpu(cpu);
+
return 0;
}
@@ -1671,6 +1674,9 @@ static int mpam_cpu_offline(unsigned int cpu)
{
struct mpam_msc *msc;
+ if (mpam_is_enabled())
+ mpam_resctrl_offline_cpu(cpu);
+
guard(srcu)(&mpam_srcu);
list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
srcu_read_lock_held(&mpam_srcu)) {
@@ -2517,6 +2523,12 @@ static void mpam_enable_once(void)
mutex_unlock(&mpam_list_lock);
cpus_read_unlock();
+ if (!err) {
+ err = mpam_resctrl_setup();
+ if (err)
+ pr_err("Failed to initialise resctrl: %d\n", err);
+ }
+
if (err) {
mpam_disable_reason = "Failed to enable.";
schedule_work(&mpam_broken_work);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4632985bcca6..e394ee78918a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -12,6 +12,7 @@
#include <linux/jump_label.h>
#include <linux/llist.h>
#include <linux/mutex.h>
+#include <linux/resctrl.h>
#include <linux/spinlock.h>
#include <linux/srcu.h>
#include <linux/types.h>
@@ -337,6 +338,17 @@ struct mpam_msc_ris {
struct mpam_garbage garbage;
};
+struct mpam_resctrl_dom {
+ struct mpam_component *ctrl_comp;
+ struct rdt_ctrl_domain resctrl_ctrl_dom;
+ struct rdt_mon_domain resctrl_mon_dom;
+};
+
+struct mpam_resctrl_res {
+ struct mpam_class *class;
+ struct rdt_resource resctrl_res;
+};
+
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
@@ -391,6 +403,16 @@ void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
+#ifdef CONFIG_RESCTRL_FS
+int mpam_resctrl_setup(void);
+int mpam_resctrl_online_cpu(unsigned int cpu);
+void mpam_resctrl_offline_cpu(unsigned int cpu);
+#else
+static inline int mpam_resctrl_setup(void) { return 0; }
+static inline int mpam_resctrl_online_cpu(unsigned int cpu) { return 0; }
+static inline void mpam_resctrl_offline_cpu(unsigned int cpu) { }
+#endif /* CONFIG_RESCTRL_FS */
+
/*
* MPAM MSCs have the following register layout. See:
* Arm Memory System Resource Partitioning and Monitoring (MPAM) System
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
new file mode 100644
index 000000000000..cb29c05edfa8
--- /dev/null
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+#include <linux/rculist.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/mpam.h>
+
+#include "mpam_internal.h"
+
+/*
+ * The classes we've picked to map to resctrl resources, wrapped
+ * up with their resctrl structure.
+ * The class pointer may be NULL.
+ */
+static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
+
+#define for_each_mpam_resctrl_control(res, rid) \
+ for (rid = 0, res = &mpam_resctrl_controls[rid]; \
+ rid < RDT_NUM_RESOURCES; \
+ rid++, res = &mpam_resctrl_controls[rid])
+
+/* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
+static DEFINE_MUTEX(domain_list_lock);
+
+static bool exposed_alloc_capable;
+static bool exposed_mon_capable;
+
+bool resctrl_arch_alloc_capable(void)
+{
+ return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+ return exposed_mon_capable;
+}
+
+/*
+ * An MSC may raise an error interrupt if it sees an out of range partid/pmg,
+ * and go on to truncate the value. Regardless of what the hardware supports,
+ * only the system-wide safe value can be used.
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
+{
+ return mpam_partid_max + 1;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+ if (l >= RDT_NUM_RESOURCES)
+ return NULL;
+
+ return &mpam_resctrl_controls[l].resctrl_res;
+}
+
+static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
+{
+ /* TODO: initialise the resctrl resources */
+
+ return 0;
+}
+
+static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
+{
+ struct mpam_class *class = comp->class;
+
+ if (class->type == MPAM_CLASS_CACHE)
+ return comp->comp_id;
+
+ /* TODO: repaint domain ids to match the L3 domain ids */
+ /* Otherwise, expose the ID used by the firmware table code. */
+ return comp->comp_id;
+}
+
+static void mpam_resctrl_domain_hdr_init(int cpu, struct mpam_component *comp,
+ struct rdt_domain_hdr *hdr)
+{
+ lockdep_assert_cpus_held();
+
+ INIT_LIST_HEAD(&hdr->list);
+ hdr->id = mpam_resctrl_pick_domain_id(cpu, comp);
+ cpumask_set_cpu(cpu, &hdr->cpu_mask);
+}
+
+/**
+ * mpam_resctrl_offline_domain_hdr() - Update the domain header to remove a CPU.
+ * @cpu: The CPU to remove from the domain.
+ * @hdr: The domain's header.
+ *
+ * Removes @cpu from the header mask. If this was the last CPU in the domain,
+ * the domain header is removed from its parent list and true is returned,
+ * indicating the parent structure can be freed.
+ * If there are other CPUs in the domain, returns false.
+ */
+static bool mpam_resctrl_offline_domain_hdr(unsigned int cpu,
+ struct rdt_domain_hdr *hdr)
+{
+ lockdep_assert_held(&domain_list_lock);
+
+ cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+ if (cpumask_empty(&hdr->cpu_mask)) {
+ list_del_rcu(&hdr->list);
+ synchronize_rcu();
+ return true;
+ }
+
+ return false;
+}
+
+static struct mpam_resctrl_dom *
+mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
+{
+ int err;
+ struct mpam_resctrl_dom *dom;
+ struct rdt_mon_domain *mon_d;
+ struct rdt_ctrl_domain *ctrl_d;
+ struct mpam_class *class = res->class;
+ struct mpam_component *comp_iter, *ctrl_comp;
+ struct rdt_resource *r = &res->resctrl_res;
+
+ lockdep_assert_held(&domain_list_lock);
+
+ ctrl_comp = NULL;
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(comp_iter, &class->components, class_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
+ ctrl_comp = comp_iter;
+ break;
+ }
+ }
+
+ /* class has no component for this CPU */
+ if (WARN_ON_ONCE(!ctrl_comp))
+ return ERR_PTR(-EINVAL);
+
+ dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
+ if (!dom)
+ return ERR_PTR(-ENOMEM);
+
+ if (exposed_alloc_capable) {
+ dom->ctrl_comp = ctrl_comp;
+
+ ctrl_d = &dom->resctrl_ctrl_dom;
+ mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
+ ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
+ /* TODO: this list should be sorted */
+ list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
+ err = resctrl_online_ctrl_domain(r, ctrl_d);
+ if (err) {
+ dom = ERR_PTR(err);
+ goto offline_ctrl_domain;
+ }
+ } else {
+ pr_debug("Skipped control domain online - no controls\n");
+ }
+
+ if (exposed_mon_capable) {
+ mon_d = &dom->resctrl_mon_dom;
+ mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
+ mon_d->hdr.type = RESCTRL_MON_DOMAIN;
+ /* TODO: this list should be sorted */
+ list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
+ err = resctrl_online_mon_domain(r, mon_d);
+ if (err) {
+ dom = ERR_PTR(err);
+ goto offline_mon_hdr;
+ }
+ } else {
+ pr_debug("Skipped monitor domain online - no monitors\n");
+ }
+
+ return dom;
+
+offline_mon_hdr:
+ mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
+offline_ctrl_domain:
+ resctrl_offline_ctrl_domain(r, ctrl_d);
+
+ return dom;
+}
+
+static struct mpam_resctrl_dom *
+mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
+{
+ struct mpam_resctrl_dom *dom;
+ struct rdt_resource *r = &res->resctrl_res;
+
+ lockdep_assert_cpus_held();
+
+ list_for_each_entry_rcu(dom, &r->ctrl_domains, resctrl_ctrl_dom.hdr.list) {
+ if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
+ return dom;
+ }
+
+ return NULL;
+}
+
+int mpam_resctrl_online_cpu(unsigned int cpu)
+{
+ struct mpam_resctrl_res *res;
+ enum resctrl_res_level rid;
+
+ guard(mutex)(&domain_list_lock);
+ for_each_mpam_resctrl_control(res, rid) {
+ struct mpam_resctrl_dom *dom;
+
+ if (!res->class)
+ continue; // dummy resource
+
+ dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
+ if (!dom)
+ dom = mpam_resctrl_alloc_domain(cpu, res);
+ if (IS_ERR(dom))
+ return PTR_ERR(dom);
+ }
+
+ resctrl_online_cpu(cpu);
+
+ return 0;
+}
+
+void mpam_resctrl_offline_cpu(unsigned int cpu)
+{
+ struct mpam_resctrl_res *res;
+ enum resctrl_res_level rid;
+
+ resctrl_offline_cpu(cpu);
+
+ guard(mutex)(&domain_list_lock);
+ for_each_mpam_resctrl_control(res, rid) {
+ struct mpam_resctrl_dom *dom;
+ struct rdt_mon_domain *mon_d;
+ struct rdt_ctrl_domain *ctrl_d;
+ bool ctrl_dom_empty, mon_dom_empty;
+
+ if (!res->class)
+ continue; // dummy resource
+
+ dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
+ if (WARN_ON_ONCE(!dom))
+ continue;
+
+ if (exposed_alloc_capable) {
+ ctrl_d = &dom->resctrl_ctrl_dom;
+ ctrl_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &ctrl_d->hdr);
+ if (ctrl_dom_empty)
+ resctrl_offline_ctrl_domain(&res->resctrl_res, ctrl_d);
+ } else {
+ ctrl_dom_empty = true;
+ }
+
+ if (exposed_mon_capable) {
+ mon_d = &dom->resctrl_mon_dom;
+ mon_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
+ if (mon_dom_empty)
+ resctrl_offline_mon_domain(&res->resctrl_res, mon_d);
+ } else {
+ mon_dom_empty = true;
+ }
+
+ if (ctrl_dom_empty && mon_dom_empty)
+ kfree(dom);
+ }
+}
+
+int mpam_resctrl_setup(void)
+{
+ int err = 0;
+ struct mpam_resctrl_res *res;
+ enum resctrl_res_level rid;
+
+ cpus_read_lock();
+ for_each_mpam_resctrl_control(res, rid) {
+ INIT_LIST_HEAD_RCU(&res->resctrl_res.ctrl_domains);
+ INIT_LIST_HEAD_RCU(&res->resctrl_res.mon_domains);
+ res->resctrl_res.rid = rid;
+ }
+
+ /* TODO: pick MPAM classes to map to resctrl resources */
+
+ /* Initialise the resctrl structures from the classes */
+ for_each_mpam_resctrl_control(res, rid) {
+ if (!res->class)
+ continue; // dummy resource
+
+ err = mpam_resctrl_control_init(res);
+ if (err) {
+ pr_debug("Failed to initialise rid %u\n", rid);
+ break;
+ }
+ }
+ cpus_read_unlock();
+
+ if (err) {
+ pr_debug("Internal error %d - resctrl not supported\n", err);
+ return err;
+ }
+
+ if (!exposed_alloc_capable && !exposed_mon_capable) {
+ pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
+ exposed_alloc_capable, exposed_mon_capable);
+ return -EOPNOTSUPP;
+ }
+
+ /* TODO: call resctrl_init() */
+
+ return 0;
+}
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 7f00c5285a32..2c7d1413a401 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -49,6 +49,9 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
}
#endif
+bool resctrl_arch_alloc_capable(void);
+bool resctrl_arch_mon_capable(void);
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.43.0
^ permalink raw reply related	[flat|nested] 160+ messages in thread

* Re: [PATCH v3 14/47] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2026-01-12 16:58 ` [PATCH v3 14/47] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation Ben Horgan
@ 2026-01-13 16:49 ` Reinette Chatre
2026-01-19 17:20 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Reinette Chatre @ 2026-01-13 16:49 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
(Please note I am unfamiliar with this code so missing some context.)
On 1/12/26 8:58 AM, Ben Horgan wrote:
> +
> +static struct mpam_resctrl_dom *
> +mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
> +{
> + int err;
> + struct mpam_resctrl_dom *dom;
> + struct rdt_mon_domain *mon_d;
> + struct rdt_ctrl_domain *ctrl_d;
> + struct mpam_class *class = res->class;
> + struct mpam_component *comp_iter, *ctrl_comp;
> + struct rdt_resource *r = &res->resctrl_res;
> +
> + lockdep_assert_held(&domain_list_lock);
> +
> + ctrl_comp = NULL;
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(comp_iter, &class->components, class_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
> + ctrl_comp = comp_iter;
> + break;
> + }
> + }
> +
> + /* class has no component for this CPU */
> + if (WARN_ON_ONCE(!ctrl_comp))
> + return ERR_PTR(-EINVAL);
> +
> + dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
> + if (!dom)
> + return ERR_PTR(-ENOMEM);
> +
> + if (exposed_alloc_capable) {
> + dom->ctrl_comp = ctrl_comp;
> +
> + ctrl_d = &dom->resctrl_ctrl_dom;
> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
> + ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
> + /* TODO: this list should be sorted */
> + list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
> + err = resctrl_online_ctrl_domain(r, ctrl_d);
> + if (err) {
> + dom = ERR_PTR(err);
> + goto offline_ctrl_domain;
It should not be necessary to offline the control domain if the attempt to
online it failed, but removing it from the ctrl_domains list is necessary. What
happens to the memory dom points to?
> + }
> + } else {
> + pr_debug("Skipped control domain online - no controls\n");
> + }
> +
> + if (exposed_mon_capable) {
> + mon_d = &dom->resctrl_mon_dom;
> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
> + mon_d->hdr.type = RESCTRL_MON_DOMAIN;
> + /* TODO: this list should be sorted */
> + list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
> + err = resctrl_online_mon_domain(r, mon_d);
> + if (err) {
> + dom = ERR_PTR(err);
> + goto offline_mon_hdr;
> + }
> + } else {
> + pr_debug("Skipped monitor domain online - no monitors\n");
> + }
> +
> + return dom;
> +
> +offline_mon_hdr:
> + mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
> +offline_ctrl_domain:
> + resctrl_offline_ctrl_domain(r, ctrl_d);
> +
> + return dom;
This error path is unexpected to me. From what I can tell, if there is a problem
initializing the monitor domain this flow will undo both the monitor and control
domains, even if initialization of the control domain was successful. In this case:
- The flow jumps to the error path from within the if (exposed_mon_capable) block and
proceeds to do control domain cleanup without considering whether the control domain
was initialized. That is, it does not take exposed_alloc_capable into account.
- Control domain cleanup seems to be partial; for example, should it remove the
domain from the ctrl_domains list?
- On failure there is dom = ERR_PTR(err), but I cannot see where this memory is
freed in either the monitor or control domain error path.
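For illustration, these failure modes can be reproduced with a tiny userspace
model of the flow (every name below is a stand-in, not the real resctrl/MPAM
API):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int ctrl_cleanups;	/* times the offline_ctrl_domain label ran */
static int live_doms;		/* dom allocations not yet freed */

/* Model of the v3 flow above: hypothetical stand-in for the kernel code */
static long alloc_domain_v3(bool alloc_capable, bool mon_fails)
{
	void *dom = malloc(1);

	/* the v3 cleanup labels never look at alloc_capable */
	(void)alloc_capable;

	if (!dom)
		return -ENOMEM;
	live_doms++;

	/* control-domain online elided; assume it succeeds */

	if (mon_fails) {
		/*
		 * offline_mon_hdr: falls through to offline_ctrl_domain,
		 * which runs regardless of alloc_capable, and dom is
		 * never freed.
		 */
		ctrl_cleanups++;
		return -EINVAL;
	}

	free(dom);	/* stands in for handing dom back to the caller */
	live_doms--;
	return 0;
}
```

After a failing call with alloc_capable == false, ctrl_cleanups is still
incremented and live_doms stays non-zero, which is exactly the unconditional
cleanup and the leak described above.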
> +int mpam_resctrl_online_cpu(unsigned int cpu)
> +{
> + struct mpam_resctrl_res *res;
> + enum resctrl_res_level rid;
> +
> + guard(mutex)(&domain_list_lock);
> + for_each_mpam_resctrl_control(res, rid) {
> + struct mpam_resctrl_dom *dom;
> +
> + if (!res->class)
> + continue; // dummy_resource;
> +
> + dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
On success, should cpu be added to the respective headers' cpumask?
> + if (!dom)
> + dom = mpam_resctrl_alloc_domain(cpu, res);
> + if (IS_ERR(dom))
> + return PTR_ERR(dom);
> + }
> +
> + resctrl_online_cpu(cpu);
> +
> + return 0;
> +}
Reinette
* Re: [PATCH v3 14/47] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2026-01-13 16:49 ` Reinette Chatre
@ 2026-01-19 17:20 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 17:20 UTC (permalink / raw)
To: Reinette Chatre
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Reinette,
On 1/13/26 16:49, Reinette Chatre wrote:
> Hi Ben,
>
> (Please note I am unfamiliar with this code so missing some context.)
>
> On 1/12/26 8:58 AM, Ben Horgan wrote:
>> +
>> +static struct mpam_resctrl_dom *
>> +mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
>> +{
>> + int err;
>> + struct mpam_resctrl_dom *dom;
>> + struct rdt_mon_domain *mon_d;
>> + struct rdt_ctrl_domain *ctrl_d;
>> + struct mpam_class *class = res->class;
>> + struct mpam_component *comp_iter, *ctrl_comp;
>> + struct rdt_resource *r = &res->resctrl_res;
>> +
>> + lockdep_assert_held(&domain_list_lock);
>> +
>> + ctrl_comp = NULL;
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(comp_iter, &class->components, class_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
>> + ctrl_comp = comp_iter;
>> + break;
>> + }
>> + }
>> +
>> + /* class has no component for this CPU */
>> + if (WARN_ON_ONCE(!ctrl_comp))
>> + return ERR_PTR(-EINVAL);
>> +
>> + dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
>> + if (!dom)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + if (exposed_alloc_capable) {
>> + dom->ctrl_comp = ctrl_comp;
>> +
>> + ctrl_d = &dom->resctrl_ctrl_dom;
>> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
>> + ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
>> + /* TODO: this list should be sorted */
>> + list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
>> + err = resctrl_online_ctrl_domain(r, ctrl_d);
>> + if (err) {
>> + dom = ERR_PTR(err);
>> + goto offline_ctrl_domain;
>
> It should not be necessary to offline the control domain if attempt to
> online it failed but removing it from the ctrl_domains list is necessary. What
> happens to memory dom points to?
Yeah, that leaks the memory and the offline call is unnecessary.
>
>> + }
>> + } else {
>> + pr_debug("Skipped control domain online - no controls\n");
>> + }
>> +
>> + if (exposed_mon_capable) {
>> + mon_d = &dom->resctrl_mon_dom;
>> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
>> + mon_d->hdr.type = RESCTRL_MON_DOMAIN;
>> + /* TODO: this list should be sorted */
>> + list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
>> + err = resctrl_online_mon_domain(r, mon_d);
>> + if (err) {
>> + dom = ERR_PTR(err);
>> + goto offline_mon_hdr;
>> + }
>> + } else {
>> + pr_debug("Skipped monitor domain online - no monitors\n");
>> + }
>> +
>> + return dom;
>> +
>> +offline_mon_hdr:
>> + mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
>> +offline_ctrl_domain:
>> + resctrl_offline_ctrl_domain(r, ctrl_d);
>> +
>> + return dom;
>
> This error path is unexpected to me. From what I can tell, if there is a problem
> initializing the monitor domain this flow will undo both monitor and control domain,
> even if initialization of control domain was successful. In this case:
> - Flow jumps to error path from within the if (exposed_mon_capable) block and proceeds
> to do control domain cleanup without considering whether control domain was initialized
> or not. That is, does not take exposed_alloc_capable into account
Yes.
> - Control domain cleanup seems to be partial, for example, should it remove domain from ctrl_domains list?
Indeed.
> - On failure there is dom = ERR_PTR(err) but I cannot see where this memory is freed in both
> the monitor and control domain error paths.
Yes, it's missing.
I've reworked the code to move the resctrl_online_*() calls further up
so there is less to do on error, added a kfree(dom), and made the
control domain cleanup after a monitor domain failure conditional on
exposed_alloc_capable.
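As a sketch of that rework (a userspace model with stand-in functions, not the
actual kernel code): undo only what was actually onlined, then free dom:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static bool ctrl_online;

/* Hypothetical stand-ins for resctrl_online_*()/resctrl_offline_ctrl_domain() */
static int online_ctrl(void) { ctrl_online = true; return 0; }
static void offline_ctrl(void) { ctrl_online = false; }
static int online_mon(bool fail) { return fail ? -EINVAL : 0; }

static long alloc_domain_fixed(bool alloc_capable, bool mon_fails)
{
	int err;
	void *dom = malloc(1);

	if (!dom)
		return -ENOMEM;

	if (alloc_capable) {
		err = online_ctrl();
		if (err)
			goto out_free;
	}

	err = online_mon(mon_fails);
	if (err)
		goto out_offline_ctrl;

	free(dom);	/* the caller would keep dom; freed here for the model */
	return 0;

out_offline_ctrl:
	if (alloc_capable)	/* only undo the ctrl domain if it was onlined */
		offline_ctrl();
out_free:
	free(dom);	/* the added kfree(dom): no leak on the error path */
	return err;
}
```

A monitor-domain failure now leaves no control domain onlined and no memory
allocated, whether or not the resource was alloc capable.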
>
>
>> +int mpam_resctrl_online_cpu(unsigned int cpu)
>> +{
>> + struct mpam_resctrl_res *res;
>> + enum resctrl_res_level rid;
>> +
>> + guard(mutex)(&domain_list_lock);
>> + for_each_mpam_resctrl_control(res, rid) {
>> + struct mpam_resctrl_dom *dom;
>> +
>> + if (!res->class)
>> + continue; // dummy_resource;
>> +
>> + dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
>
> On success, should cpu be added to the respective headers' cpumask?
Yes, added.
>
>> + if (!dom)
>> + dom = mpam_resctrl_alloc_domain(cpu, res);
>> + if (IS_ERR(dom))
>> + return PTR_ERR(dom);
>> + }
>> +
>> + resctrl_online_cpu(cpu);
>> +
>> + return 0;
>> +}
>
>
> Reinette
>
Thanks,
Ben
* [PATCH v3 15/47] arm_mpam: resctrl: Sort the order of the domain lists
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (13 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 14/47] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 16/47] arm_mpam: resctrl: Pick the caches we will use as resctrl resources Ben Horgan
` (36 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl documents that the domains appear in numeric order in the schemata
file. This means a little more work is needed when bringing a domain
online.
Add the support for this, using resctrl_find_domain() to find the point to
insert in the list.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index cb29c05edfa8..4dba6f58f79c 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -119,6 +119,21 @@ static bool mpam_resctrl_offline_domain_hdr(unsigned int cpu,
return false;
}
+static void mpam_resctrl_domain_insert(struct list_head *list,
+ struct rdt_domain_hdr *new)
+{
+ struct rdt_domain_hdr *err;
+ struct list_head *pos = NULL;
+
+ lockdep_assert_held(&domain_list_lock);
+
+ err = resctrl_find_domain(list, new->id, &pos);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ list_add_tail_rcu(&new->list, pos);
+}
+
static struct mpam_resctrl_dom *
mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
{
@@ -156,8 +171,7 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
ctrl_d = &dom->resctrl_ctrl_dom;
mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
- /* TODO: this list should be sorted */
- list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
+ mpam_resctrl_domain_insert(&r->ctrl_domains, &ctrl_d->hdr);
err = resctrl_online_ctrl_domain(r, ctrl_d);
if (err) {
dom = ERR_PTR(err);
@@ -171,8 +185,7 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
mon_d = &dom->resctrl_mon_dom;
mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
mon_d->hdr.type = RESCTRL_MON_DOMAIN;
- /* TODO: this list should be sorted */
- list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
+ mpam_resctrl_domain_insert(&r->mon_domains, &mon_d->hdr);
err = resctrl_online_mon_domain(r, mon_d);
if (err) {
dom = ERR_PTR(err);
--
2.43.0
* [PATCH v3 16/47] arm_mpam: resctrl: Pick the caches we will use as resctrl resources
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (14 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 15/47] arm_mpam: resctrl: Sort the order of the domain lists Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 17/47] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls() Ben Horgan
` (35 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
Systems with MPAM support may have a variety of control types at any point
of their system layout. We can only expose certain types of control, and
only if they exist at particular locations.
Start with the well-known caches. These have to be at cache level 2 or 3 and
support MPAM's cache portion bitmap controls, with a number of portions no
greater than resctrl's limit.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Jonathan:
Remove brackets
Compress debug message
Use temp var, r
Changes since v2:
Return -EINVAL in mpam_resctrl_control_init() for unknown rid
---
drivers/resctrl/mpam_resctrl.c | 90 +++++++++++++++++++++++++++++++++-
1 file changed, 88 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 4dba6f58f79c..1566d0c686e6 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -65,9 +65,94 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
return &mpam_resctrl_controls[l].resctrl_res;
}
+static bool cache_has_usable_cpor(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_cpor_part, cprops))
+ return false;
+
+ /* resctrl uses u32 for all bitmap configurations */
+ return class->props.cpbm_wd <= 32;
+}
+
+/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
+static void mpam_resctrl_pick_caches(void)
+{
+ struct mpam_class *class;
+ struct mpam_resctrl_res *res;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (class->type != MPAM_CLASS_CACHE) {
+ pr_debug("class %u is not a cache\n", class->level);
+ continue;
+ }
+
+ if (class->level != 2 && class->level != 3) {
+ pr_debug("class %u is not L2 or L3\n", class->level);
+ continue;
+ }
+
+ if (!cache_has_usable_cpor(class)) {
+ pr_debug("class %u cache misses CPOR\n", class->level);
+ continue;
+ }
+
+ if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
+ pr_debug("class %u has missing CPUs, mask %*pb != %*pb\n", class->level,
+ cpumask_pr_args(&class->affinity),
+ cpumask_pr_args(cpu_possible_mask));
+ continue;
+ }
+
+ if (class->level == 2)
+ res = &mpam_resctrl_controls[RDT_RESOURCE_L2];
+ else
+ res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ res->class = class;
+ exposed_alloc_capable = true;
+ }
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
{
- /* TODO: initialise the resctrl resources */
+ struct mpam_class *class = res->class;
+ struct rdt_resource *r = &res->resctrl_res;
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ r->alloc_capable = true;
+ r->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+ r->cache.arch_has_sparse_bitmasks = true;
+
+ r->cache.cbm_len = class->props.cpbm_wd;
+ /* mpam_devices will reject empty bitmaps */
+ r->cache.min_cbm_bits = 1;
+
+ if (r->rid == RDT_RESOURCE_L2) {
+ r->name = "L2";
+ r->ctrl_scope = RESCTRL_L2_CACHE;
+ } else {
+ r->name = "L3";
+ r->ctrl_scope = RESCTRL_L3_CACHE;
+ }
+
+ /*
+ * Which bits are shared with other ...things...
+ * Unknown devices use partid-0 which uses all the bitmap
+ * fields. Until we have configured the SMMU and GIC not to do this,
+ * 'all the bits' is the correct answer here.
+ */
+ r->cache.shareable_bits = resctrl_get_default_ctrl(r);
+ break;
+ default:
+ return -EINVAL;
+ }
return 0;
}
@@ -302,7 +387,8 @@ int mpam_resctrl_setup(void)
res->resctrl_res.rid = rid;
}
- /* TODO: pick MPAM classes to map to resctrl resources */
+ /* Find some classes to use for controls */
+ mpam_resctrl_pick_caches();
/* Initialise the resctrl structures from the classes */
for_each_mpam_resctrl_control(res, rid) {
--
2.43.0
* [PATCH v3 17/47] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls()
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (15 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 16/47] arm_mpam: resctrl: Pick the caches we will use as resctrl resources Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 14:46 ` Jonathan Cameron
2026-01-12 16:58 ` [PATCH v3 18/47] arm_mpam: resctrl: Add resctrl_arch_get_config() Ben Horgan
` (34 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
We already have a helper for resetting an mpam class and component. Hook
it up to resctrl_arch_reset_all_ctrls() and the domain offline path.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Don't expose unlocked reset
---
drivers/resctrl/mpam_devices.c | 4 ++--
drivers/resctrl/mpam_internal.h | 6 ++++++
drivers/resctrl/mpam_resctrl.c | 15 +++++++++++++++
3 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index b81d5c7f44ca..0dd7f613f7a3 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2544,7 +2544,7 @@ static void mpam_enable_once(void)
mpam_partid_max + 1, mpam_pmg_max + 1);
}
-static void mpam_reset_component_locked(struct mpam_component *comp)
+void mpam_reset_component_locked(struct mpam_component *comp)
{
struct mpam_vmsc *vmsc;
@@ -2568,7 +2568,7 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
}
}
-static void mpam_reset_class_locked(struct mpam_class *class)
+void mpam_reset_class_locked(struct mpam_class *class)
{
struct mpam_component *comp;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e394ee78918a..f89ceaf7623d 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -393,6 +393,12 @@ extern u8 mpam_pmg_max;
void mpam_enable(struct work_struct *work);
void mpam_disable(struct work_struct *work);
+/* Reset all the RIS in a class under cpus_read_lock() */
+void mpam_reset_class_locked(struct mpam_class *class);
+
+/* Reset all the RIS in a component under cpus_read_lock() */
+void mpam_reset_component_locked(struct mpam_component *comp);
+
int mpam_apply_config(struct mpam_component *comp, u16 partid,
struct mpam_config *cfg);
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 1566d0c686e6..683bdd6989d4 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -169,6 +169,19 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+ struct mpam_resctrl_res *res;
+
+ lockdep_assert_cpus_held();
+
+ if (!mpam_is_enabled())
+ return;
+
+ res = container_of(r, struct mpam_resctrl_res, resctrl_res);
+ mpam_reset_class_locked(res->class);
+}
+
static void mpam_resctrl_domain_hdr_init(int cpu, struct mpam_component *comp,
struct rdt_domain_hdr *hdr)
{
@@ -352,6 +365,8 @@ void mpam_resctrl_offline_cpu(unsigned int cpu)
continue;
if (exposed_alloc_capable) {
+ mpam_reset_component_locked(dom->ctrl_comp);
+
ctrl_d = &dom->resctrl_ctrl_dom;
ctrl_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &ctrl_d->hdr);
if (ctrl_dom_empty)
--
2.43.0
* Re: [PATCH v3 17/47] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls()
2026-01-12 16:58 ` [PATCH v3 17/47] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls() Ben Horgan
@ 2026-01-13 14:46 ` Jonathan Cameron
2026-01-13 14:58 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 14:46 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
On Mon, 12 Jan 2026 16:58:44 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> From: James Morse <james.morse@arm.com>
>
> We already have a helper for resetting an mpam class and component. Hook
> it up to resctrl_arch_reset_all_ctrls() and the domain offline path.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
There was a question from Zhengheng on v2 that doesn't seem to be addressed
and I'm not seeing a reply on list. Maybe an email snafu?
https://lore.kernel.org/all/20260109034506.1176234-1-zengheng4@huawei.com/
Please +CC zhengheng on future versions if we need them!
Thanks,
Jonathan
> ---
> Changes since v2:
> Don't expose unlocked reset
> ---
> drivers/resctrl/mpam_devices.c | 4 ++--
> drivers/resctrl/mpam_internal.h | 6 ++++++
> drivers/resctrl/mpam_resctrl.c | 15 +++++++++++++++
> 3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index b81d5c7f44ca..0dd7f613f7a3 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -2544,7 +2544,7 @@ static void mpam_enable_once(void)
> mpam_partid_max + 1, mpam_pmg_max + 1);
> }
>
> -static void mpam_reset_component_locked(struct mpam_component *comp)
> +void mpam_reset_component_locked(struct mpam_component *comp)
> {
> struct mpam_vmsc *vmsc;
>
> @@ -2568,7 +2568,7 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
> }
> }
>
> -static void mpam_reset_class_locked(struct mpam_class *class)
> +void mpam_reset_class_locked(struct mpam_class *class)
> {
> struct mpam_component *comp;
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index e394ee78918a..f89ceaf7623d 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -393,6 +393,12 @@ extern u8 mpam_pmg_max;
> void mpam_enable(struct work_struct *work);
> void mpam_disable(struct work_struct *work);
>
> +/* Reset all the RIS in a class under cpus_read_lock() */
> +void mpam_reset_class_locked(struct mpam_class *class);
> +
> +/* Reset all the RIS in a component under cpus_read_lock() */
> +void mpam_reset_component_locked(struct mpam_component *comp);
> +
> int mpam_apply_config(struct mpam_component *comp, u16 partid,
> struct mpam_config *cfg);
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 1566d0c686e6..683bdd6989d4 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -169,6 +169,19 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> return comp->comp_id;
> }
>
> +void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
> +{
> + struct mpam_resctrl_res *res;
> +
> + lockdep_assert_cpus_held();
> +
> + if (!mpam_is_enabled())
> + return;
> +
> + res = container_of(r, struct mpam_resctrl_res, resctrl_res);
> + mpam_reset_class_locked(res->class);
> +}
> +
> static void mpam_resctrl_domain_hdr_init(int cpu, struct mpam_component *comp,
> struct rdt_domain_hdr *hdr)
> {
> @@ -352,6 +365,8 @@ void mpam_resctrl_offline_cpu(unsigned int cpu)
> continue;
>
> if (exposed_alloc_capable) {
> + mpam_reset_component_locked(dom->ctrl_comp);
> +
> ctrl_d = &dom->resctrl_ctrl_dom;
> ctrl_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &ctrl_d->hdr);
> if (ctrl_dom_empty)
* Re: [PATCH v3 17/47] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls()
2026-01-13 14:46 ` Jonathan Cameron
@ 2026-01-13 14:58 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-13 14:58 UTC (permalink / raw)
To: Jonathan Cameron
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
Hi Jonathan, Zeng,
On 1/13/26 14:46, Jonathan Cameron wrote:
> On Mon, 12 Jan 2026 16:58:44 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
>
>> From: James Morse <james.morse@arm.com>
>>
>> We already have a helper for resetting an mpam class and component. Hook
>> it up to resctrl_arch_reset_all_ctrls() and the domain offline path.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>
> There was a question from Zhengheng on v2 that doesn't seem to be addressed
> and I'm not seeing a reply on list. Maybe an email snafu?
Yes, sorry! For some reason I don't get his emails. :( It happened on
the base mpam driver series too. I'll keep a closer eye on lore and
reply to his v2 query.
>
> https://lore.kernel.org/all/20260109034506.1176234-1-zengheng4@huawei.com/
>
> Please +CC zhengheng on future versions if we need them!
Will do. He should be CC'd on this version too. I don't know if there is
an email issue with that as well.
Thanks,
Ben
* [PATCH v3 18/47] arm_mpam: resctrl: Add resctrl_arch_get_config()
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (16 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 17/47] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls() Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 19/47] arm_mpam: resctrl: Implement helpers to update configuration Ben Horgan
` (33 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
Implement resctrl_arch_get_config() by testing the live configuration for a
CPOR bitmap. For any other configuration type return the default.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 43 ++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 683bdd6989d4..25012e779509 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -169,6 +169,49 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type type)
+{
+ u32 partid;
+ struct mpam_config *cfg;
+ struct mpam_props *cprops;
+ struct mpam_resctrl_res *res;
+ struct mpam_resctrl_dom *dom;
+ enum mpam_device_features configured_by;
+
+ lockdep_assert_cpus_held();
+
+ if (!mpam_is_enabled())
+ return resctrl_get_default_ctrl(r);
+
+ res = container_of(r, struct mpam_resctrl_res, resctrl_res);
+ dom = container_of(d, struct mpam_resctrl_dom, resctrl_ctrl_dom);
+ cprops = &res->class->props;
+
+ partid = resctrl_get_config_index(closid, type);
+ cfg = &dom->ctrl_comp->cfg[partid];
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ configured_by = mpam_feat_cpor_part;
+ break;
+ default:
+ return resctrl_get_default_ctrl(r);
+ }
+
+ if (!r->alloc_capable || partid >= resctrl_arch_get_num_closid(r) ||
+ !mpam_has_feature(configured_by, cfg))
+ return resctrl_get_default_ctrl(r);
+
+ switch (configured_by) {
+ case mpam_feat_cpor_part:
+ return cfg->cpbm;
+ default:
+ return resctrl_get_default_ctrl(r);
+ }
+}
+
void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
{
struct mpam_resctrl_res *res;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* [PATCH v3 19/47] arm_mpam: resctrl: Implement helpers to update configuration
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (17 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 18/47] arm_mpam: resctrl: Add resctrl_arch_get_config() Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 20/47] arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks Ben Horgan
` (32 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl has two helpers for updating the configuration.
resctrl_arch_update_one() updates a single value, and is used by the
software controller to apply feedback to the bandwidth controls; it has to
be called on one of the CPUs in the resctrl domain.
resctrl_arch_update_domains() copies multiple staged configurations; it can
be called from anywhere.
Both helpers must apply any changes to the underlying hardware.
Implement resctrl_arch_update_domains() in terms of
resctrl_arch_update_one(). Neither needs to be called on a specific CPU as
the mpam driver will send IPIs as needed.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
list_for_each_entry -> list_for_each_entry_rcu
return 0
Restrict scope of local variables
Changes since v2:
whitespace fix
---
drivers/resctrl/mpam_resctrl.c | 70 ++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 25012e779509..5b73fe45b8fa 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -212,6 +212,76 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
}
}
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ u32 partid;
+ struct mpam_config cfg;
+ struct mpam_props *cprops;
+ struct mpam_resctrl_res *res;
+ struct mpam_resctrl_dom *dom;
+
+ lockdep_assert_cpus_held();
+ lockdep_assert_irqs_enabled();
+
+ /*
+ * No need to check the CPU as mpam_apply_config() doesn't care, and
+ * resctrl_arch_update_domains() relies on this.
+ */
+ res = container_of(r, struct mpam_resctrl_res, resctrl_res);
+ dom = container_of(d, struct mpam_resctrl_dom, resctrl_ctrl_dom);
+ cprops = &res->class->props;
+
+ partid = resctrl_get_config_index(closid, t);
+ if (!r->alloc_capable || partid >= resctrl_arch_get_num_closid(r)) {
+ pr_debug("Not alloc capable or computed PARTID out of range\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Copy the current config to avoid clearing other resources when the
+ * same component is exposed multiple times through resctrl.
+ */
+ cfg = dom->ctrl_comp->cfg[partid];
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ cfg.cpbm = cfg_val;
+ mpam_set_feature(mpam_feat_cpor_part, &cfg);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+ int err;
+ struct rdt_ctrl_domain *d;
+
+ lockdep_assert_cpus_held();
+ lockdep_assert_irqs_enabled();
+
+ list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list) {
+ for (enum resctrl_conf_type t = 0; t < CDP_NUM_TYPES; t++) {
+ struct resctrl_staged_config *cfg = &d->staged_config[t];
+
+ if (!cfg->have_new_ctrl)
+ continue;
+
+ err = resctrl_arch_update_one(r, d, closid, t,
+ cfg->new_ctrl);
+ if (err)
+ return err;
+ }
+ }
+
+ return 0;
+}
+
void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
{
struct mpam_resctrl_res *res;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* [PATCH v3 20/47] arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (18 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 19/47] arm_mpam: resctrl: Implement helpers to update configuration Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 21/47] arm_mpam: resctrl: Add CDP emulation Ben Horgan
` (31 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
arm64 provides helpers for changing a task's and a cpu's mpam partid/pmg
values.
These are used to back a number of resctrl_arch_ functions. Connect them
up.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
apostrophes in commit message
---
drivers/resctrl/mpam_resctrl.c | 58 ++++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 5 +++
2 files changed, 63 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 5b73fe45b8fa..c596b224c967 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -8,6 +8,7 @@
#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/errno.h>
+#include <linux/limits.h>
#include <linux/list.h>
#include <linux/printk.h>
#include <linux/rculist.h>
@@ -37,6 +38,8 @@ static DEFINE_MUTEX(domain_list_lock);
static bool exposed_alloc_capable;
static bool exposed_mon_capable;
+static bool cdp_enabled;
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -57,6 +60,61 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
return mpam_partid_max + 1;
}
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+ lockdep_assert_preemption_disabled();
+
+ mpam_thread_switch(tsk);
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+ WARN_ON_ONCE(closid > U16_MAX);
+ WARN_ON_ONCE(rmid > U8_MAX);
+
+ if (!cdp_enabled) {
+ mpam_set_cpu_defaults(cpu, closid, closid, rmid, rmid);
+ } else {
+ /*
+ * When CDP is enabled, resctrl halves the closid range and we
+ * use odd/even partid for one closid.
+ */
+ u32 partid_d = resctrl_get_config_index(closid, CDP_DATA);
+ u32 partid_i = resctrl_get_config_index(closid, CDP_CODE);
+
+ mpam_set_cpu_defaults(cpu, partid_d, partid_i, rmid, rmid);
+ }
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+ struct resctrl_cpu_defaults *r = info;
+
+ lockdep_assert_preemption_disabled();
+
+ if (r) {
+ resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+ r->closid, r->rmid);
+ }
+
+ resctrl_arch_sched_in(current);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ WARN_ON_ONCE(closid > U16_MAX);
+ WARN_ON_ONCE(rmid > U8_MAX);
+
+ if (!cdp_enabled) {
+ mpam_set_task_partid_pmg(tsk, closid, closid, rmid, rmid);
+ } else {
+ u32 partid_d = resctrl_get_config_index(closid, CDP_DATA);
+ u32 partid_i = resctrl_get_config_index(closid, CDP_CODE);
+
+ mpam_set_task_partid_pmg(tsk, partid_d, partid_i, rmid, rmid);
+ }
+}
+
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
{
if (l >= RDT_NUM_RESOURCES)
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 2c7d1413a401..5a78299ec464 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -52,6 +52,11 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
bool resctrl_arch_alloc_capable(void);
bool resctrl_arch_mon_capable(void);
+void resctrl_arch_set_cpu_default_closid(int cpu, u32 closid);
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* [PATCH v3 21/47] arm_mpam: resctrl: Add CDP emulation
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (19 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 20/47] arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 22/47] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats Ben Horgan
` (30 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Dave Martin
From: James Morse <james.morse@arm.com>
Intel RDT's CDP feature allows the cache to use a different control value
depending on whether the access was an instruction fetch or a data
access. MPAM's equivalent feature is the other way up: the CPU assigns a
different partid label to traffic depending on whether it was an instruction
fetch or a data access, which causes the cache to use a different control
value based solely on the partid.
MPAM can emulate CDP, with the side effect that the alternative partid is
seen by all MSCs; it can't be enabled per-MSC.
Add the resctrl hooks to turn this on or off. Add the helpers that match a
closid against a task, which need to be aware that the value written to
hardware is not the same as the one resctrl is using.
Update the 'arm64_mpam_global_default' variable the arch code uses during
context switch to know when the per-cpu value should be used instead.
Awkwardly, the MB controls don't implement CDP. To emulate this, the MPAM
equivalent needs programming twice by the resctrl glue, as resctrl expects
the bandwidth controls to be applied independently for both data and
instruction-fetch.
CC: Dave Martin <Dave.Martin@arm.com>
CC: Amit Singh Tomar <amitsinght@marvell.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Fail cdp initialisation if there is only one partid
Correct data/code confusion
Changes since v2:
Don't include unused header
---
arch/arm64/include/asm/mpam.h | 1 +
drivers/resctrl/mpam_resctrl.c | 112 +++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 2 +
3 files changed, 115 insertions(+)
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
index c9b73f1af7ce..3a49f666e5e8 100644
--- a/arch/arm64/include/asm/mpam.h
+++ b/arch/arm64/include/asm/mpam.h
@@ -4,6 +4,7 @@
#ifndef __ASM__MPAM_H
#define __ASM__MPAM_H
+#include <linux/arm_mpam.h>
#include <linux/bitfield.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index c596b224c967..8112bcb85e73 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -38,6 +38,10 @@ static DEFINE_MUTEX(domain_list_lock);
static bool exposed_alloc_capable;
static bool exposed_mon_capable;
+/*
+ * MPAM emulates CDP by setting different PARTID in the I/D fields of MPAM0_EL1.
+ * This applies globally to all traffic the CPU generates.
+ */
static bool cdp_enabled;
bool resctrl_arch_alloc_capable(void)
@@ -50,6 +54,67 @@ bool resctrl_arch_mon_capable(void)
return exposed_mon_capable;
}
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+ switch (rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ return cdp_enabled;
+ case RDT_RESOURCE_MBA:
+ default:
+ /*
+ * x86's MBA control doesn't support CDP, so user-space doesn't
+ * expect it.
+ */
+ return false;
+ }
+}
+
+/**
+ * resctrl_reset_task_closids() - Reset the PARTID/PMG values for all tasks.
+ *
+ * At boot, all existing tasks use partid zero for D and I.
+ * To enable/disable CDP emulation, all these tasks need relabelling.
+ */
+static void resctrl_reset_task_closids(void)
+{
+ struct task_struct *p, *t;
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(p, t) {
+ resctrl_arch_set_closid_rmid(t, RESCTRL_RESERVED_CLOSID,
+ RESCTRL_RESERVED_RMID);
+ }
+ read_unlock(&tasklist_lock);
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
+{
+ u32 partid_i = RESCTRL_RESERVED_CLOSID, partid_d = RESCTRL_RESERVED_CLOSID;
+
+ cdp_enabled = enable;
+
+ if (enable) {
+ if (mpam_partid_max < 1)
+ return -EINVAL;
+
+ partid_d = resctrl_get_config_index(RESCTRL_RESERVED_CLOSID, CDP_DATA);
+ partid_i = resctrl_get_config_index(RESCTRL_RESERVED_CLOSID, CDP_CODE);
+ }
+
+ mpam_set_task_partid_pmg(current, partid_d, partid_i, 0, 0);
+ WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
+
+ resctrl_reset_task_closids();
+
+ return 0;
+}
+
+static bool mpam_resctrl_hide_cdp(enum resctrl_res_level rid)
+{
+ return cdp_enabled && !resctrl_arch_get_cdp_enabled(rid);
+}
+
/*
* MSC may raise an error interrupt if it sees an out or range partid/pmg,
* and go on to truncate the value. Regardless of what the hardware supports,
@@ -115,6 +180,30 @@ void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
}
}
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+ u64 regval = mpam_get_regval(tsk);
+ u32 tsk_closid = FIELD_GET(MPAM0_EL1_PARTID_D, regval);
+
+ if (cdp_enabled)
+ tsk_closid >>= 1;
+
+ return tsk_closid == closid;
+}
+
+/* The task's pmg is not unique, the partid must be considered too */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ u64 regval = mpam_get_regval(tsk);
+ u32 tsk_closid = FIELD_GET(MPAM0_EL1_PARTID_D, regval);
+ u32 tsk_rmid = FIELD_GET(MPAM0_EL1_PMG_D, regval);
+
+ if (cdp_enabled)
+ tsk_closid >>= 1;
+
+ return (tsk_closid == closid) && (tsk_rmid == rmid);
+}
+
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
{
if (l >= RDT_NUM_RESOURCES)
@@ -246,6 +335,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
dom = container_of(d, struct mpam_resctrl_dom, resctrl_ctrl_dom);
cprops = &res->class->props;
+ /*
+ * When CDP is enabled, but the resource doesn't support it,
+ * the control is cloned across both partids.
+ * Pick one at random to read:
+ */
+ if (mpam_resctrl_hide_cdp(r->rid))
+ type = CDP_DATA;
+
partid = resctrl_get_config_index(closid, type);
cfg = &dom->ctrl_comp->cfg[partid];
@@ -273,6 +370,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
+ int err;
u32 partid;
struct mpam_config cfg;
struct mpam_props *cprops;
@@ -312,6 +410,20 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
return -EINVAL;
}
+ /*
+ * When CDP is enabled, but the resource doesn't support it, we need to
+ * apply the same configuration to the other partid.
+ */
+ if (mpam_resctrl_hide_cdp(r->rid)) {
+ partid = resctrl_get_config_index(closid, CDP_CODE);
+ err = mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+ if (err)
+ return err;
+
+ partid = resctrl_get_config_index(closid, CDP_DATA);
+ return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+ }
+
return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
}
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 5a78299ec464..d329b1dc148b 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -56,6 +56,8 @@ void resctrl_arch_set_cpu_default_closid(int cpu, u32 closid);
void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
void resctrl_arch_sched_in(struct task_struct *tsk);
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* [PATCH v3 22/47] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (20 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 21/47] arm_mpam: resctrl: Add CDP emulation Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 23/47] arm_mpam: resctrl: Add kunit test for control format conversions Ben Horgan
` (29 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Dave Martin
From: Dave Martin <Dave.Martin@arm.com>
MPAM uses fixed-point formats for some hardware controls. Resctrl
provides the bandwidth controls as a percentage. Add helpers to convert
between these.
Ensure bwa_wd is at most 16 to make it clear that higher values have no meaning.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Ensure bwa_wd is at most 16 (moved from patch 40: arm_mpam: Generate a
configuration for min controls)
Expand comments
---
drivers/resctrl/mpam_devices.c | 7 +++++
drivers/resctrl/mpam_resctrl.c | 51 ++++++++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0dd7f613f7a3..c2127570cf37 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -713,6 +713,13 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
mpam_set_feature(mpam_feat_mbw_part, props);
props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+
+ /*
+ * The BWA_WD field can represent 0-63, but the control fields it
+ * describes have a maximum of 16 bits.
+ */
+ props->bwa_wd = min(props->bwa_wd, 16);
+
if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
mpam_set_feature(mpam_feat_mbw_max, props);
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 8112bcb85e73..71227d072d46 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/limits.h>
#include <linux/list.h>
+#include <linux/math.h>
#include <linux/printk.h>
#include <linux/rculist.h>
#include <linux/resctrl.h>
@@ -223,6 +224,56 @@ static bool cache_has_usable_cpor(struct mpam_class *class)
return class->props.cpbm_wd <= 32;
}
+/*
+ * Each fixed-point hardware value architecturally represents a range
+ * of values: the full range 0% - 100% is split contiguously into
+ * (1 << cprops->bwa_wd) equal bands.
+ *
+ * Although the bwa_bwd fields have 6 bits the maximum valid value is 16
+ * as it reports the width of fields that are at most 16 bits. When
+ * fewer than 16 bits are valid the least significant bits are
+ * ignored. The implied binary point is kept between bits 15 and 16 and
+ * so the valid bits are leftmost.
+ *
+ * See ARM IHI0099B.a "MPAM system component specification", Section 9.3,
+ * "The fixed-point fractional format" for more information.
+ *
+ * Find the nearest percentage value to the upper bound of the selected band:
+ */
+static u32 mbw_max_to_percent(u16 mbw_max, struct mpam_props *cprops)
+{
+ u32 val = mbw_max;
+
+ val >>= 16 - cprops->bwa_wd;
+ val += 1;
+ val *= MAX_MBA_BW;
+ val = DIV_ROUND_CLOSEST(val, 1 << cprops->bwa_wd);
+
+ return val;
+}
+
+/*
+ * Find the band whose upper bound is closest to the specified percentage.
+ *
+ * A round-to-nearest policy is followed here as a balanced compromise
+ * between unexpected under-commit of the resource (where the total of
+ * a set of resource allocations after conversion is less than the
+ * expected total, due to rounding of the individual converted
+ * percentages) and over-commit (where the total of the converted
+ * allocations is greater than expected).
+ */
+static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
+{
+ u32 val = pc;
+
+ val <<= cprops->bwa_wd;
+ val = DIV_ROUND_CLOSEST(val, MAX_MBA_BW);
+ val = max(val, 1) - 1;
+ val <<= 16 - cprops->bwa_wd;
+
+ return val;
+}
+
/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
static void mpam_resctrl_pick_caches(void)
{
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* [PATCH v3 23/47] arm_mpam: resctrl: Add kunit test for control format conversions
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (21 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 22/47] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 24/47] arm_mpam: resctrl: Add rmid index helpers Ben Horgan
` (28 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Dave Martin
From: Dave Martin <Dave.Martin@arm.com>
resctrl specifies the format of the control schemes, and these don't match
the hardware.
Some of the conversions are a bit hairy - add some kunit tests.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[morse: squashed enough of Dave's fixes in here that it's his patch now!]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Include additional values from the latest spec
---
drivers/resctrl/mpam_resctrl.c | 4 +
drivers/resctrl/test_mpam_resctrl.c | 315 ++++++++++++++++++++++++++++
2 files changed, 319 insertions(+)
create mode 100644 drivers/resctrl/test_mpam_resctrl.c
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 71227d072d46..b6bbe73bc248 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -767,3 +767,7 @@ int mpam_resctrl_setup(void)
return 0;
}
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_resctrl.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_resctrl.c b/drivers/resctrl/test_mpam_resctrl.c
new file mode 100644
index 000000000000..b93d6ad87e43
--- /dev/null
+++ b/drivers/resctrl/test_mpam_resctrl.c
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+/* This file is intended to be included into mpam_resctrl.c */
+
+#include <kunit/test.h>
+#include <linux/array_size.h>
+#include <linux/bits.h>
+#include <linux/math.h>
+#include <linux/sprintf.h>
+
+struct percent_value_case {
+ u8 pc;
+ u8 width;
+ u16 value;
+};
+
+/*
+ * Mysterious inscriptions taken from the union of ARM DDI 0598D.b,
+ * "Arm Architecture Reference Manual Supplement - Memory System
+ * Resource Partitioning and Monitoring (MPAM), for A-profile
+ * architecture", Section 9.8, "About the fixed-point fractional
+ * format" (exact percentage entries only) and ARM IHI0099B.a
+ * "MPAM system component specification", Section 9.3,
+ * "The fixed-point fractional format":
+ */
+static const struct percent_value_case percent_value_cases[] = {
+ /* Architectural cases: */
+ { 1, 8, 1 }, { 1, 12, 0x27 }, { 1, 16, 0x28e },
+ { 25, 8, 0x3f }, { 25, 12, 0x3ff }, { 25, 16, 0x3fff },
+ { 33, 8, 0x53 }, { 33, 12, 0x546 }, { 33, 16, 0x5479 },
+ { 35, 8, 0x58 }, { 35, 12, 0x598 }, { 35, 16, 0x5998 },
+ { 45, 8, 0x72 }, { 45, 12, 0x732 }, { 45, 16, 0x7332 },
+ { 50, 8, 0x7f }, { 50, 12, 0x7ff }, { 50, 16, 0x7fff },
+ { 52, 8, 0x84 }, { 52, 12, 0x850 }, { 52, 16, 0x851d },
+ { 55, 8, 0x8b }, { 55, 12, 0x8cb }, { 55, 16, 0x8ccb },
+ { 58, 8, 0x93 }, { 58, 12, 0x946 }, { 58, 16, 0x9479 },
+ { 75, 8, 0xbf }, { 75, 12, 0xbff }, { 75, 16, 0xbfff },
+ { 80, 8, 0xcb }, { 80, 12, 0xccb }, { 80, 16, 0xcccb },
+ { 88, 8, 0xe0 }, { 88, 12, 0xe13 }, { 88, 16, 0xe146 },
+ { 95, 8, 0xf2 }, { 95, 12, 0xf32 }, { 95, 16, 0xf332 },
+ { 100, 8, 0xff }, { 100, 12, 0xfff }, { 100, 16, 0xffff },
+};
+
+static void test_percent_value_desc(const struct percent_value_case *param,
+ char *desc)
+{
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE,
+ "pc=%d, width=%d, value=0x%.*x\n",
+ param->pc, param->width,
+ DIV_ROUND_UP(param->width, 4), param->value);
+}
+
+KUNIT_ARRAY_PARAM(test_percent_value, percent_value_cases,
+ test_percent_value_desc);
+
+struct percent_value_test_info {
+ u32 pc; /* result of value-to-percent conversion */
+ u32 value; /* result of percent-to-value conversion */
+ u32 max_value; /* maximum raw value allowed by test params */
+ unsigned int shift; /* promotes raw testcase value to 16 bits */
+};
+
+/*
+ * Convert a reference percentage to a fixed-point MAX value and
+ * vice-versa, based on param (not test->param_value!)
+ */
+static void __prepare_percent_value_test(struct kunit *test,
+ struct percent_value_test_info *res,
+ const struct percent_value_case *param)
+{
+ struct mpam_props fake_props = { };
+
+ /* Reject bogus test parameters that would break the tests: */
+ KUNIT_ASSERT_GE(test, param->width, 1);
+ KUNIT_ASSERT_LE(test, param->width, 16);
+ KUNIT_ASSERT_LT(test, param->value, 1 << param->width);
+
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+ fake_props.bwa_wd = param->width;
+
+ res->shift = 16 - param->width;
+ res->max_value = GENMASK_U32(param->width - 1, 0);
+ res->value = percent_to_mbw_max(param->pc, &fake_props);
+ res->pc = mbw_max_to_percent(param->value << res->shift, &fake_props);
+}
+
+static void test_get_mba_granularity(struct kunit *test)
+{
+ int ret;
+ struct mpam_props fake_props = { };
+
+ /* Use MBW_MAX */
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+
+ fake_props.bwa_wd = 0;
+ KUNIT_EXPECT_FALSE(test, mba_class_use_mbw_max(&fake_props));
+
+ fake_props.bwa_wd = 1;
+ KUNIT_EXPECT_TRUE(test, mba_class_use_mbw_max(&fake_props));
+
+ /* Architectural maximum: */
+ fake_props.bwa_wd = 16;
+ KUNIT_EXPECT_TRUE(test, mba_class_use_mbw_max(&fake_props));
+
+ /* No usable control... */
+ fake_props.bwa_wd = 0;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 0);
+
+ fake_props.bwa_wd = 1;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 50); /* DIV_ROUND_UP(100, 1 << 1)% = 50% */
+
+ fake_props.bwa_wd = 2;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 25); /* DIV_ROUND_UP(100, 1 << 2)% = 25% */
+
+ fake_props.bwa_wd = 3;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 13); /* DIV_ROUND_UP(100, 1 << 3)% = 13% */
+
+ fake_props.bwa_wd = 6;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 2); /* DIV_ROUND_UP(100, 1 << 6)% = 2% */
+
+ fake_props.bwa_wd = 7;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 1); /* DIV_ROUND_UP(100, 1 << 7)% = 1% */
+
+ /* Granularity saturates at 1% */
+ fake_props.bwa_wd = 16; /* architectural maximum */
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 1); /* DIV_ROUND_UP(100, 1 << 16)% = 1% */
+}
+
+static void test_mbw_max_to_percent(struct kunit *test)
+{
+ const struct percent_value_case *param = test->param_value;
+ struct percent_value_test_info res;
+
+ /*
+ * Since the reference values in percent_value_cases[] all
+ * correspond to exact percentages, round-to-nearest will
+ * always give the exact percentage back when the MPAM max
+ * value has precision of 0.5% or finer. (Always true for the
+ * reference data, since they all specify 8 bits or more of
+ * precision.
+ *
+ * So, keep it simple and demand an exact match:
+ */
+ __prepare_percent_value_test(test, &res, param);
+ KUNIT_EXPECT_EQ(test, res.pc, param->pc);
+}
+
+static void test_percent_to_mbw_max(struct kunit *test)
+{
+ const struct percent_value_case *param = test->param_value;
+ struct percent_value_test_info res;
+
+ __prepare_percent_value_test(test, &res, param);
+
+ KUNIT_EXPECT_GE(test, res.value, param->value << res.shift);
+ KUNIT_EXPECT_LE(test, res.value, (param->value + 1) << res.shift);
+ KUNIT_EXPECT_LE(test, res.value, res.max_value << res.shift);
+
+ /* No flexibility allowed for 0% and 100%! */
+
+ if (param->pc == 0)
+ KUNIT_EXPECT_EQ(test, res.value, 0);
+
+ if (param->pc == 100)
+ KUNIT_EXPECT_EQ(test, res.value, res.max_value << res.shift);
+}
+
+static const void *test_all_bwa_wd_gen_params(struct kunit *test, const void *prev,
+ char *desc)
+{
+ uintptr_t param = (uintptr_t)prev;
+
+ if (param > 15)
+ return NULL;
+
+ param++;
+
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE, "wd=%u\n", (unsigned int)param);
+
+ return (void *)param;
+}
+
+static unsigned int test_get_bwa_wd(struct kunit *test)
+{
+ uintptr_t param = (uintptr_t)test->param_value;
+
+ KUNIT_ASSERT_GE(test, param, 1);
+ KUNIT_ASSERT_LE(test, param, 16);
+
+ return param;
+}
+
+static void test_mbw_max_to_percent_limits(struct kunit *test)
+{
+ struct mpam_props fake_props = {0};
+ u32 max_value;
+
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+ fake_props.bwa_wd = test_get_bwa_wd(test);
+ max_value = GENMASK(15, 16 - fake_props.bwa_wd);
+
+ KUNIT_EXPECT_EQ(test, mbw_max_to_percent(max_value, &fake_props),
+ MAX_MBA_BW);
+ KUNIT_EXPECT_EQ(test, mbw_max_to_percent(0, &fake_props),
+ get_mba_min(&fake_props));
+
+ /*
+ * Rounding policy dependent 0% sanity-check:
+ * With round-to-nearest, the minimum mbw_max value really
+ * should map to 0% if there are at least 200 steps.
+ * (100 steps may be enough for some other rounding policies.)
+ */
+ if (fake_props.bwa_wd >= 8)
+ KUNIT_EXPECT_EQ(test, mbw_max_to_percent(0, &fake_props), 0);
+
+ if (fake_props.bwa_wd < 8 &&
+ mbw_max_to_percent(0, &fake_props) == 0)
+ kunit_warn(test, "wd=%d: Testsuite/driver Rounding policy mismatch?",
+ fake_props.bwa_wd);
+}
+
+/*
+ * Check that converting a percentage to mbw_max and back again (or, as
+ * appropriate, vice-versa) always restores the original value:
+ */
+static void test_percent_max_roundtrip_stability(struct kunit *test)
+{
+ struct mpam_props fake_props = {0};
+ unsigned int shift;
+ u32 pc, max, pc2, max2;
+
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+ fake_props.bwa_wd = test_get_bwa_wd(test);
+ shift = 16 - fake_props.bwa_wd;
+
+ /*
+ * Converting a valid value from the coarser scale to the finer
+ * scale and back again must yield the original value:
+ */
+ if (fake_props.bwa_wd >= 7) {
+ /* More than 100 steps: only test exact pc values: */
+ for (pc = get_mba_min(&fake_props); pc <= MAX_MBA_BW; pc++) {
+ max = percent_to_mbw_max(pc, &fake_props);
+ pc2 = mbw_max_to_percent(max, &fake_props);
+ KUNIT_EXPECT_EQ(test, pc2, pc);
+ }
+ } else {
+ /* Fewer than 100 steps: only test exact mbw_max values: */
+ for (max = 0; max < 1 << 16; max += 1 << shift) {
+ pc = mbw_max_to_percent(max, &fake_props);
+ max2 = percent_to_mbw_max(pc, &fake_props);
+ KUNIT_EXPECT_EQ(test, max2, max);
+ }
+ }
+}
+
+static void test_percent_to_max_rounding(struct kunit *test)
+{
+ const struct percent_value_case *param = test->param_value;
+ unsigned int num_rounded_up = 0, total = 0;
+ struct percent_value_test_info res;
+
+ for (param = percent_value_cases, total = 0;
+ param < &percent_value_cases[ARRAY_SIZE(percent_value_cases)];
+ param++, total++) {
+ __prepare_percent_value_test(test, &res, param);
+ if (res.value > param->value << res.shift)
+ num_rounded_up++;
+ }
+
+ /*
+ * The MPAM driver applies a round-to-nearest policy, whereas a
+ * round-down policy seems to have been applied in the
+ * reference table from which the test vectors were selected.
+ *
+ * For a large and well-distributed suite of test vectors,
+ * about half should be rounded up and half down compared with
+ * the reference table. The actual test vectors are few in
+ * number and probably not very well distributed however, so
+ * tolerate a round-up rate of between 1/4 and 3/4 before
+ * crying foul:
+ */
+
+ kunit_info(test, "Round-up rate: %u%% (%u/%u)\n",
+ DIV_ROUND_CLOSEST(num_rounded_up * 100, total),
+ num_rounded_up, total);
+
+ KUNIT_EXPECT_GE(test, 4 * num_rounded_up, 1 * total);
+ KUNIT_EXPECT_LE(test, 4 * num_rounded_up, 3 * total);
+}
+
+static struct kunit_case mpam_resctrl_test_cases[] = {
+ KUNIT_CASE(test_get_mba_granularity),
+ KUNIT_CASE_PARAM(test_mbw_max_to_percent, test_percent_value_gen_params),
+ KUNIT_CASE_PARAM(test_percent_to_mbw_max, test_percent_value_gen_params),
+ KUNIT_CASE_PARAM(test_mbw_max_to_percent_limits, test_all_bwa_wd_gen_params),
+ KUNIT_CASE(test_percent_to_max_rounding),
+ KUNIT_CASE_PARAM(test_percent_max_roundtrip_stability,
+ test_all_bwa_wd_gen_params),
+ {}
+};
+
+static struct kunit_suite mpam_resctrl_test_suite = {
+ .name = "mpam_resctrl_test_suite",
+ .test_cases = mpam_resctrl_test_cases,
+};
+
+kunit_test_suites(&mpam_resctrl_test_suite);
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH v3 24/47] arm_mpam: resctrl: Add rmid index helpers
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (22 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 23/47] arm_mpam: resctrl: Add kunit test for control format conversions Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 14:55 ` Jonathan Cameron
2026-01-12 16:58 ` [PATCH v3 25/47] arm_mpam: resctrl: Add kunit test for rmid idx conversions Ben Horgan
` (27 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Because MPAM's pmg aren't identical to RDT's rmid, resctrl handles some
data structures by index. This allows x86 to map indexes to RMID, and MPAM
to map them to partid-and-pmg.
Add the helpers to do this.
Suggested-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Use ~0U instead of ~0 in lhs of left shift
Changes since v2:
Drop changes signed-off-by as reworked patch
Use multiply and add rather than shift to avoid holes
---
drivers/resctrl/mpam_resctrl.c | 16 ++++++++++++++++
include/linux/arm_mpam.h | 3 +++
2 files changed, 19 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index b6bbe73bc248..92b2a2d4b51d 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -126,6 +126,22 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
return mpam_partid_max + 1;
}
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+ return (mpam_pmg_max + 1) * (mpam_partid_max + 1);
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+ return closid * (mpam_pmg_max + 1) + rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+ *closid = idx / (mpam_pmg_max + 1);
+ *rmid = idx % (mpam_pmg_max + 1);
+}
+
void resctrl_arch_sched_in(struct task_struct *tsk)
{
lockdep_assert_preemption_disabled();
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index d329b1dc148b..7d23c90f077d 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -58,6 +58,9 @@ void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
void resctrl_arch_sched_in(struct task_struct *tsk);
bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+u32 resctrl_arch_system_num_rmid_idx(void);
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 24/47] arm_mpam: resctrl: Add rmid index helpers
2026-01-12 16:58 ` [PATCH v3 24/47] arm_mpam: resctrl: Add rmid index helpers Ben Horgan
@ 2026-01-13 14:55 ` Jonathan Cameron
0 siblings, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 14:55 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:51 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> Because MPAM's pmg aren't identical to RDT's rmid, resctrl handles some
> data structures by index. This allows x86 to map indexes to RMID, and MPAM
> to map them to partid-and-pmg.
>
> Add the helpers to do this.
>
> Suggested-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Much easier to read. Thanks!
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 25/47] arm_mpam: resctrl: Add kunit test for rmid idx conversions
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (23 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 24/47] arm_mpam: resctrl: Add rmid index helpers Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 14:59 ` Jonathan Cameron
2026-01-12 16:58 ` [PATCH v3 26/47] arm_mpam: resctrl: Wait for cacheinfo to be ready Ben Horgan
` (26 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
As MPAM's pmg are scoped by partid and RDT's rmid are global, the
resctrl mapping to an index needs to differ.
Add some tests for the MPAM rmid mapping.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/test_mpam_resctrl.c | 49 +++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/drivers/resctrl/test_mpam_resctrl.c b/drivers/resctrl/test_mpam_resctrl.c
index b93d6ad87e43..a20da161d965 100644
--- a/drivers/resctrl/test_mpam_resctrl.c
+++ b/drivers/resctrl/test_mpam_resctrl.c
@@ -296,6 +296,54 @@ static void test_percent_to_max_rounding(struct kunit *test)
KUNIT_EXPECT_LE(test, 4 * num_rounded_up, 3 * total);
}
+struct rmid_idx_case {
+ u32 max_partid;
+ u32 max_pmg;
+};
+
+static const struct rmid_idx_case rmid_idx_cases[] = {
+ {0, 0}, {1, 4}, {3, 1}, {5, 9}, {4, 4}, {100, 11}, {0xFFFF, 0xFF},
+};
+
+static void test_rmid_idx_desc(const struct rmid_idx_case *param, char *desc)
+{
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE, "max_partid=%d, max_pmg=%d\n",
+ param->max_partid, param->max_pmg);
+}
+
+KUNIT_ARRAY_PARAM(test_rmid_idx, rmid_idx_cases, test_rmid_idx_desc);
+
+static void test_rmid_idx_encoding(struct kunit *test)
+{
+ u32 orig_mpam_partid_max = mpam_partid_max;
+ u32 orig_mpam_pmg_max = mpam_pmg_max;
+ const struct rmid_idx_case *param = test->param_value;
+ u32 idx, num_idx, count = 0;
+
+ mpam_partid_max = param->max_partid;
+ mpam_pmg_max = param->max_pmg;
+
+ for (u32 partid = 0; partid <= mpam_partid_max; partid++) {
+ for (u32 pmg = 0; pmg <= mpam_pmg_max; pmg++) {
+ u32 partid_out, pmg_out;
+
+ idx = resctrl_arch_rmid_idx_encode(partid, pmg);
+ /* Confirm there are no holes in the rmid idx range */
+ KUNIT_EXPECT_EQ(test, count, idx);
+ count++;
+ resctrl_arch_rmid_idx_decode(idx, &partid_out, &pmg_out);
+ KUNIT_EXPECT_EQ(test, pmg, pmg_out);
+ KUNIT_EXPECT_EQ(test, partid, partid_out);
+ }
+ }
+ num_idx = resctrl_arch_system_num_rmid_idx();
+ KUNIT_EXPECT_EQ(test, idx + 1, num_idx);
+
+ /* Restore global variables that were messed with */
+ mpam_partid_max = orig_mpam_partid_max;
+ mpam_pmg_max = orig_mpam_pmg_max;
+}
+
static struct kunit_case mpam_resctrl_test_cases[] = {
KUNIT_CASE(test_get_mba_granularity),
KUNIT_CASE_PARAM(test_mbw_max_to_percent, test_percent_value_gen_params),
@@ -304,6 +352,7 @@ static struct kunit_case mpam_resctrl_test_cases[] = {
KUNIT_CASE(test_percent_to_max_rounding),
KUNIT_CASE_PARAM(test_percent_max_roundtrip_stability,
test_all_bwa_wd_gen_params),
+ KUNIT_CASE_PARAM(test_rmid_idx_encoding, test_rmid_idx_gen_params),
{}
};
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 25/47] arm_mpam: resctrl: Add kunit test for rmid idx conversions
2026-01-12 16:58 ` [PATCH v3 25/47] arm_mpam: resctrl: Add kunit test for rmid idx conversions Ben Horgan
@ 2026-01-13 14:59 ` Jonathan Cameron
0 siblings, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 14:59 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:52 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> As MPAM's pmg are scoped by partid and RDT's rmid are global, the
> resctrl mapping to an index needs to differ.
>
> Add some tests for the MPAM rmid mapping.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
I'm not sure this particular test brings a massive amount of value, but
I'm not one to object to more tests!
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> ---
> drivers/resctrl/test_mpam_resctrl.c | 49 +++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> diff --git a/drivers/resctrl/test_mpam_resctrl.c b/drivers/resctrl/test_mpam_resctrl.c
> index b93d6ad87e43..a20da161d965 100644
> --- a/drivers/resctrl/test_mpam_resctrl.c
> +++ b/drivers/resctrl/test_mpam_resctrl.c
> @@ -296,6 +296,54 @@ static void test_percent_to_max_rounding(struct kunit *test)
> KUNIT_EXPECT_LE(test, 4 * num_rounded_up, 3 * total);
> }
>
> +struct rmid_idx_case {
> + u32 max_partid;
> + u32 max_pmg;
> +};
> +
> +static const struct rmid_idx_case rmid_idx_cases[] = {
> + {0, 0}, {1, 4}, {3, 1}, {5, 9}, {4, 4}, {100, 11}, {0xFFFF, 0xFF},
> +};
> +
> +static void test_rmid_idx_desc(const struct rmid_idx_case *param, char *desc)
> +{
> + snprintf(desc, KUNIT_PARAM_DESC_SIZE, "max_partid=%d, max_pmg=%d\n",
> + param->max_partid, param->max_pmg);
> +}
> +
> +KUNIT_ARRAY_PARAM(test_rmid_idx, rmid_idx_cases, test_rmid_idx_desc);
> +
> +static void test_rmid_idx_encoding(struct kunit *test)
> +{
> + u32 orig_mpam_partid_max = mpam_partid_max;
> + u32 orig_mpam_pmg_max = mpam_pmg_max;
> + const struct rmid_idx_case *param = test->param_value;
> + u32 idx, num_idx, count = 0;
> +
> + mpam_partid_max = param->max_partid;
> + mpam_pmg_max = param->max_pmg;
> +
> + for (u32 partid = 0; partid <= mpam_partid_max; partid++) {
> + for (u32 pmg = 0; pmg <= mpam_pmg_max; pmg++) {
> + u32 partid_out, pmg_out;
> +
> + idx = resctrl_arch_rmid_idx_encode(partid, pmg);
> + /* Confirm there are no holes in the rmid idx range */
> + KUNIT_EXPECT_EQ(test, count, idx);
> + count++;
> + resctrl_arch_rmid_idx_decode(idx, &partid_out, &pmg_out);
> + KUNIT_EXPECT_EQ(test, pmg, pmg_out);
> + KUNIT_EXPECT_EQ(test, partid, partid_out);
> + }
> + }
> + num_idx = resctrl_arch_system_num_rmid_idx();
> + KUNIT_EXPECT_EQ(test, idx + 1, num_idx);
> +
> + /* Restore global variables that were messed with */
> + mpam_partid_max = orig_mpam_partid_max;
> + mpam_pmg_max = orig_mpam_pmg_max;
> +}
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 26/47] arm_mpam: resctrl: Wait for cacheinfo to be ready
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (24 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 25/47] arm_mpam: resctrl: Add kunit test for rmid idx conversions Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 15:01 ` Jonathan Cameron
2026-01-12 16:58 ` [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource Ben Horgan
` (25 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
In order to calculate the rmid realloc threshold the size of the cache
needs to be known. Cache domains will also be named after the cache id. So
that this information can be extracted from cacheinfo we need to wait for
it to be ready. The cacheinfo information is populated in device_initcall()
so we wait for that.
Signed-off-by: James Morse <james.morse@arm.com>
[horgan: split out from another patch]
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
This is moved into its own patch to allow all uses of cacheinfo to be
valid when they are introduced.
---
drivers/resctrl/mpam_resctrl.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 92b2a2d4b51d..3ca977527698 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -16,6 +16,7 @@
#include <linux/resctrl.h>
#include <linux/slab.h>
#include <linux/types.h>
+#include <linux/wait.h>
#include <asm/mpam.h>
@@ -45,6 +46,13 @@ static bool exposed_mon_capable;
*/
static bool cdp_enabled;
+/*
+ * We use cacheinfo to discover the size of the caches and their id. cacheinfo
+ * populates this from a device_initcall(). mpam_resctrl_setup() must wait.
+ */
+static bool cacheinfo_ready;
+static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -745,6 +753,8 @@ int mpam_resctrl_setup(void)
struct mpam_resctrl_res *res;
enum resctrl_res_level rid;
+ wait_event(wait_cacheinfo_ready, cacheinfo_ready);
+
cpus_read_lock();
for_each_mpam_resctrl_control(res, rid) {
INIT_LIST_HEAD_RCU(&res->resctrl_res.ctrl_domains);
@@ -784,6 +794,15 @@ int mpam_resctrl_setup(void)
return 0;
}
+static int __init __cacheinfo_ready(void)
+{
+ cacheinfo_ready = true;
+ wake_up(&wait_cacheinfo_ready);
+
+ return 0;
+}
+device_initcall_sync(__cacheinfo_ready);
+
#ifdef CONFIG_MPAM_KUNIT_TEST
#include "test_mpam_resctrl.c"
#endif
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 26/47] arm_mpam: resctrl: Wait for cacheinfo to be ready
2026-01-12 16:58 ` [PATCH v3 26/47] arm_mpam: resctrl: Wait for cacheinfo to be ready Ben Horgan
@ 2026-01-13 15:01 ` Jonathan Cameron
2026-01-13 15:15 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 15:01 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:53 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> In order to calculate the rmid realloc threshold the size of the cache
> needs to be known. Cache domains will also be named after the cache id. So
> that this information can be extracted from cacheinfo we need to wait for
> it to be ready. The cacheinfo information is populated in device_initcall()
> so we wait for that.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> [horgan: split out from another patch]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> This is moved into its own patch to allow all uses of cacheinfo to be
> valid when they are introduced.
I don't mind the patch but I'm not entirely following this comment. Is
the point that previously there was a sneaky user before this was added in
the series?
Anyhow, that's not in the patch itself so
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 26/47] arm_mpam: resctrl: Wait for cacheinfo to be ready
2026-01-13 15:01 ` Jonathan Cameron
@ 2026-01-13 15:15 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-13 15:15 UTC (permalink / raw)
To: Jonathan Cameron
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Jonathan,
On 1/13/26 15:01, Jonathan Cameron wrote:
> On Mon, 12 Jan 2026 16:58:53 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
>
>> In order to calculate the rmid realloc threshold the size of the cache
>> needs to be known. Cache domains will also be named after the cache id. So
>> that this information can be extracted from cacheinfo we need to wait for
>> it to be ready. The cacheinfo information is populated in device_initcall()
>> so we wait for that.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> [horgan: split out from another patch]
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> This is moved into its own patch to allow all uses of cacheinfo to be
>> valid when they are introduced.
>
> I don't mind the patch but I'm not entirely following this comment. Is
> the point that previously there was a sneaky user before this was added in
> the series?
Yes, "arm_mpam: resctrl: Add support for 'MB' resource" accesses
cacheinfo, so I wanted to order this patch before that.
>
> Anyhow, that's not in the patch itself so
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (25 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 26/47] arm_mpam: resctrl: Wait for cacheinfo to be ready Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 15:06 ` Jonathan Cameron
` (2 more replies)
2026-01-12 16:58 ` [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters Ben Horgan
` (24 subsequent siblings)
51 siblings, 3 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Zeng Heng,
Dave Martin
From: James Morse <james.morse@arm.com>
resctrl supports 'MB', as a percentage throttling of traffic somewhere
after the L3. This is the control that mba_sc uses, so ideally the class
chosen should be as close as possible to the counters used for mba_local.
MB's percentage control should be backed with either the fixed-point
fraction MBW_MAX or the bandwidth portion bitmaps. The bandwidth portion
bitmaps are not used as it's tricky to pick which bits to use to avoid
contention, and it may be possible to expose them as something other than
a percentage in the future.
CC: Zeng Heng <zengheng4@huawei.com>
Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Code flow change
Commit message 'or'
---
drivers/resctrl/mpam_resctrl.c | 209 ++++++++++++++++++++++++++++++++-
1 file changed, 208 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 3ca977527698..7402bf4293b6 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -248,6 +248,33 @@ static bool cache_has_usable_cpor(struct mpam_class *class)
return class->props.cpbm_wd <= 32;
}
+static bool mba_class_use_mbw_max(struct mpam_props *cprops)
+{
+ return (mpam_has_feature(mpam_feat_mbw_max, cprops) &&
+ cprops->bwa_wd);
+}
+
+static bool class_has_usable_mba(struct mpam_props *cprops)
+{
+ return mba_class_use_mbw_max(cprops);
+}
+
+/*
+ * Calculate the worst-case percentage change from each implemented step
+ * in the control.
+ */
+static u32 get_mba_granularity(struct mpam_props *cprops)
+{
+ if (!mba_class_use_mbw_max(cprops))
+ return 0;
+
+ /*
+ * bwa_wd is the number of bits implemented in the 0.xxx
+ * fixed point fraction. 1 bit is 50%, 2 is 25% etc.
+ */
+ return DIV_ROUND_UP(MAX_MBA_BW, 1 << cprops->bwa_wd);
+}
+
/*
* Each fixed-point hardware value architecturally represents a range
* of values: the full range 0% - 100% is split contiguously into
@@ -298,6 +325,94 @@ static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
return val;
}
+static u32 get_mba_min(struct mpam_props *cprops)
+{
+ if (!mba_class_use_mbw_max(cprops)) {
+ WARN_ON_ONCE(1);
+ return 0;
+ }
+
+ return mbw_max_to_percent(0, cprops);
+}
+
+/* Find the L3 cache that has affinity with this CPU */
+static int find_l3_equivalent_bitmask(int cpu, cpumask_var_t tmp_cpumask)
+{
+ u32 cache_id = get_cpu_cacheinfo_id(cpu, 3);
+
+ lockdep_assert_cpus_held();
+
+ return mpam_get_cpumask_from_cache_id(cache_id, 3, tmp_cpumask);
+}
+
+/*
+ * topology_matches_l3() - Is the provided class the same shape as L3
+ * @victim: The class we'd like to pretend is L3.
+ *
+ * resctrl expects all the world's a Xeon, and all counters are on the
+ * L3. We play fast and loose with this, mapping counters on other
+ * classes - provided the CPU->domain mapping is the same kind of shape.
+ *
+ * Using cacheinfo directly would make this work even if resctrl can't
+ * use the L3 - but cacheinfo can't tell us anything about offline CPUs.
+ * Using the L3 resctrl domain list also depends on CPUs being online.
+ * Using the mpam_class we picked for L3 so we can use its domain list
+ * assumes that there are MPAM controls on the L3.
+ * Instead, this path eventually uses the mpam_get_cpumask_from_cache_id()
+ * helper which can tell us about offline CPUs ... but getting the cache_id
+ * to start with relies on at least one CPU per L3 cache being online at
+ * boot.
+ *
+ * Walk the victim component list and compare the affinity mask with the
+ * corresponding L3. The topology matches if each victim:component's affinity
+ * mask is the same as the CPU's corresponding L3's. These lists/masks are
+ * computed from firmware tables so don't change at runtime.
+ */
+static bool topology_matches_l3(struct mpam_class *victim)
+{
+ int cpu, err;
+ struct mpam_component *victim_iter;
+ cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
+
+ if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL))
+ return false;
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(victim_iter, &victim->components, class_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (cpumask_empty(&victim_iter->affinity)) {
+ pr_debug("class %u has CPU-less component %u - can't match L3!\n",
+ victim->level, victim_iter->comp_id);
+ return false;
+ }
+
+ cpu = cpumask_any(&victim_iter->affinity);
+ if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
+ return false;
+
+ cpumask_clear(tmp_cpumask);
+ err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
+ if (err) {
+ pr_debug("Failed to find L3's equivalent component to class %u component %u\n",
+ victim->level, victim_iter->comp_id);
+ return false;
+ }
+
+ /* Any differing bits in the affinity mask? */
+ if (!cpumask_equal(tmp_cpumask, &victim_iter->affinity)) {
+ pr_debug("class %u component %u has Mismatched CPU mask with L3 equivalent\n"
+ "L3:%*pbl != victim:%*pbl\n",
+ victim->level, victim_iter->comp_id,
+ cpumask_pr_args(tmp_cpumask),
+ cpumask_pr_args(&victim_iter->affinity));
+
+ return false;
+ }
+ }
+
+ return true;
+}
+
/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
static void mpam_resctrl_pick_caches(void)
{
@@ -340,9 +455,62 @@ static void mpam_resctrl_pick_caches(void)
}
}
+static void mpam_resctrl_pick_mba(void)
+{
+ struct mpam_class *class, *candidate_class = NULL;
+ struct mpam_resctrl_res *res;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ struct mpam_props *cprops = &class->props;
+
+ if (class->level < 3) {
+ pr_debug("class %u is before L3\n", class->level);
+ continue;
+ }
+
+ if (!class_has_usable_mba(cprops)) {
+ pr_debug("class %u has no bandwidth control\n",
+ class->level);
+ continue;
+ }
+
+ if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
+ pr_debug("class %u has missing CPUs\n", class->level);
+ continue;
+ }
+
+ if (!topology_matches_l3(class)) {
+ pr_debug("class %u topology doesn't match L3\n",
+ class->level);
+ continue;
+ }
+
+ /*
+ * mba_sc reads the mbm_local counter, and waggles the MBA
+ * controls. mbm_local is implicitly part of the L3, pick a
+ * resource to be MBA that as close as possible to the L3.
+ */
+ if (!candidate_class || class->level < candidate_class->level)
+ candidate_class = class;
+ }
+
+ if (candidate_class) {
+ pr_debug("selected class %u to back MBA\n",
+ candidate_class->level);
+ res = &mpam_resctrl_controls[RDT_RESOURCE_MBA];
+ res->class = candidate_class;
+ exposed_alloc_capable = true;
+ }
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
{
struct mpam_class *class = res->class;
+ struct mpam_props *cprops = &class->props;
struct rdt_resource *r = &res->resctrl_res;
switch (r->rid) {
@@ -372,6 +540,19 @@ static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
*/
r->cache.shareable_bits = resctrl_get_default_ctrl(r);
break;
+ case RDT_RESOURCE_MBA:
+ r->alloc_capable = true;
+ r->schema_fmt = RESCTRL_SCHEMA_RANGE;
+ r->ctrl_scope = RESCTRL_L3_CACHE;
+
+ r->membw.delay_linear = true;
+ r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+ r->membw.min_bw = get_mba_min(cprops);
+ r->membw.max_bw = MAX_MBA_BW;
+ r->membw.bw_gran = get_mba_granularity(cprops);
+
+ r->name = "MB";
+ break;
default:
return -EINVAL;
}
@@ -386,7 +567,17 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
if (class->type == MPAM_CLASS_CACHE)
return comp->comp_id;
- /* TODO: repaint domain ids to match the L3 domain ids */
+ if (topology_matches_l3(class)) {
+ /* Use the corresponding L3 component ID as the domain ID */
+ int id = get_cpu_cacheinfo_id(cpu, 3);
+
+ /* Implies topology_matches_l3() made a mistake */
+ if (WARN_ON_ONCE(id == -1))
+ return comp->comp_id;
+
+ return id;
+ }
+
/* Otherwise, expose the ID used by the firmware table code. */
return comp->comp_id;
}
@@ -426,6 +617,12 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
case RDT_RESOURCE_L3:
configured_by = mpam_feat_cpor_part;
break;
+ case RDT_RESOURCE_MBA:
+ if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
+ configured_by = mpam_feat_mbw_max;
+ break;
+ }
+ fallthrough;
default:
return resctrl_get_default_ctrl(r);
}
@@ -437,6 +634,8 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
switch (configured_by) {
case mpam_feat_cpor_part:
return cfg->cpbm;
+ case mpam_feat_mbw_max:
+ return mbw_max_to_percent(cfg->mbw_max, cprops);
default:
return resctrl_get_default_ctrl(r);
}
@@ -481,6 +680,13 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
cfg.cpbm = cfg_val;
mpam_set_feature(mpam_feat_cpor_part, &cfg);
break;
+ case RDT_RESOURCE_MBA:
+ if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
+ cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
+ mpam_set_feature(mpam_feat_mbw_max, &cfg);
+ break;
+ }
+ fallthrough;
default:
return -EINVAL;
}
@@ -764,6 +970,7 @@ int mpam_resctrl_setup(void)
/* Find some classes to use for controls */
mpam_resctrl_pick_caches();
+ mpam_resctrl_pick_mba();
/* Initialise the resctrl structures from the classes */
for_each_mpam_resctrl_control(res, rid) {
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource
2026-01-12 16:58 ` [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource Ben Horgan
@ 2026-01-13 15:06 ` Jonathan Cameron
2026-01-13 22:18 ` Reinette Chatre
2026-01-19 11:53 ` Gavin Shan
2 siblings, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 15:06 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
On Mon, 12 Jan 2026 16:58:54 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> From: James Morse <james.morse@arm.com>
>
> resctrl supports 'MB', as a percentage throttling of traffic somewhere
> after the L3. This is the control that mba_sc uses, so ideally the class
> chosen should be as close as possible to the counters used for mbm_local.
>
> MB's percentage control should be backed either with the fixed point
> fraction MBW_MAX or bandwidth portion bitmaps. The bandwidth portion
> bitmaps are not used as it's tricky to pick which bits to use to avoid
> contention, and it may be possible to expose this as something other than a
> percentage in the future.
>
> CC: Zeng Heng <zengheng4@huawei.com>
> Co-developed-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource
2026-01-12 16:58 ` [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource Ben Horgan
2026-01-13 15:06 ` Jonathan Cameron
@ 2026-01-13 22:18 ` Reinette Chatre
2026-01-19 11:53 ` Gavin Shan
2 siblings, 0 replies; 160+ messages in thread
From: Reinette Chatre @ 2026-01-13 22:18 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
Hi Ben,
On 1/12/26 8:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> resctrl supports 'MB', as a percentage throttling of traffic somewhere
> after the L3. This is the control that mba_sc uses, so ideally the class
> chosen should be as close as possible to the counters used for mbm_local.
fyi ... [1] enabled total memory bandwidth to also be used as input to
software controller and additionally enabled user space to set per resource group
which memory bandwidth event is used as input to the software controller.
Reinette
[1] https://lore.kernel.org/all/20241206163148.83828-1-tony.luck@intel.com/
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource
2026-01-12 16:58 ` [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource Ben Horgan
2026-01-13 15:06 ` Jonathan Cameron
2026-01-13 22:18 ` Reinette Chatre
@ 2026-01-19 11:53 ` Gavin Shan
2026-01-19 13:53 ` Ben Horgan
2 siblings, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 11:53 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> resctrl supports 'MB', as a percentage throttling of traffic somewhere
> after the L3. This is the control that mba_sc uses, so ideally the class
> chosen should be as close as possible to the counters used for mbm_local.
>
> MB's percentage control should be backed either with the fixed point
> fraction MBW_MAX or bandwidth portion bitmaps. The bandwidth portion
> bitmaps are not used as it's tricky to pick which bits to use to avoid
> contention, and it may be possible to expose this as something other than a
> percentage in the future.
>
> CC: Zeng Heng <zengheng4@huawei.com>
> Co-developed-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Code flow change
> Commit message 'or'
> ---
> drivers/resctrl/mpam_resctrl.c | 209 ++++++++++++++++++++++++++++++++-
> 1 file changed, 208 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 3ca977527698..7402bf4293b6 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -248,6 +248,33 @@ static bool cache_has_usable_cpor(struct mpam_class *class)
> return class->props.cpbm_wd <= 32;
> }
>
> +static bool mba_class_use_mbw_max(struct mpam_props *cprops)
> +{
> + return (mpam_has_feature(mpam_feat_mbw_max, cprops) &&
> + cprops->bwa_wd);
> +}
> +
> +static bool class_has_usable_mba(struct mpam_props *cprops)
> +{
> + return mba_class_use_mbw_max(cprops);
> +}
> +
> +/*
> + * Calculate the worst-case percentage change from each implemented step
> + * in the control.
> + */
> +static u32 get_mba_granularity(struct mpam_props *cprops)
> +{
> + if (!mba_class_use_mbw_max(cprops))
> + return 0;
> +
> + /*
> + * bwa_wd is the number of bits implemented in the 0.xxx
> + * fixed point fraction. 1 bit is 50%, 2 is 25% etc.
> + */
> + return DIV_ROUND_UP(MAX_MBA_BW, 1 << cprops->bwa_wd);
> +}
> +
> /*
> * Each fixed-point hardware value architecturally represents a range
> * of values: the full range 0% - 100% is split contiguously into
> @@ -298,6 +325,94 @@ static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
> return val;
> }
>
> +static u32 get_mba_min(struct mpam_props *cprops)
> +{
> + if (!mba_class_use_mbw_max(cprops)) {
> + WARN_ON_ONCE(1);
> + return 0;
> + }
> +
> + return mbw_max_to_percent(0, cprops);
> +}
> +
> +/* Find the L3 cache that has affinity with this CPU */
> +static int find_l3_equivalent_bitmask(int cpu, cpumask_var_t tmp_cpumask)
> +{
> + u32 cache_id = get_cpu_cacheinfo_id(cpu, 3);
> +
> + lockdep_assert_cpus_held();
> +
> + return mpam_get_cpumask_from_cache_id(cache_id, 3, tmp_cpumask);
> +}
> +
> +/*
> + * topology_matches_l3() - Is the provided class the same shape as L3
> + * @victim: The class we'd like to pretend is L3.
> + *
> + * resctrl expects all the world's a Xeon, and all counters are on the
> + * L3. We play fast and loose with this, mapping counters on other
> + * classes - provided the CPU->domain mapping is the same kind of shape.
> + *
> + * Using cacheinfo directly would make this work even if resctrl can't
> + * use the L3 - but cacheinfo can't tell us anything about offline CPUs.
> + * Using the L3 resctrl domain list also depends on CPUs being online.
> + * Using the mpam_class we picked for L3 so we can use its domain list
> + * assumes that there are MPAM controls on the L3.
> + * Instead, this path eventually uses the mpam_get_cpumask_from_cache_id()
> + * helper which can tell us about offline CPUs ... but getting the cache_id
> + * to start with relies on at least one CPU per L3 cache being online at
> + * boot.
> + *
> + * Walk the victim component list and compare the affinity mask with the
> + * corresponding L3. The topology matches if each victim:component's affinity
> + * mask is the same as the CPU's corresponding L3's. These lists/masks are
> + * computed from firmware tables so don't change at runtime.
> + */
> +static bool topology_matches_l3(struct mpam_class *victim)
> +{
> + int cpu, err;
> + struct mpam_component *victim_iter;
> + cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
> +
A warning like the one below is reported by checkpatch.pl.
WARNING: Missing a blank line after declarations
#117: FILE: drivers/resctrl/mpam_resctrl.c:375:
+ struct mpam_component *victim_iter;
+ cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
Besides, it'd be better to initialize @tmp_cpumask:
cpumask_var_t __free(free_cpumask_var) tmp_cpumask = CPUMASK_VAR_NULL;
> + if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL))
> + return false;
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(victim_iter, &victim->components, class_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (cpumask_empty(&victim_iter->affinity)) {
> + pr_debug("class %u has CPU-less component %u - can't match L3!\n",
> + victim->level, victim_iter->comp_id);
> + return false;
> + }
> +
> + cpu = cpumask_any(&victim_iter->affinity);
> + if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
> + return false;
> +
> + cpumask_clear(tmp_cpumask);
> + err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
> + if (err) {
> + pr_debug("Failed to find L3's equivalent component to class %u component %u\n",
> + victim->level, victim_iter->comp_id);
> + return false;
> + }
> +
> + /* Any differing bits in the affinity mask? */
> + if (!cpumask_equal(tmp_cpumask, &victim_iter->affinity)) {
> + pr_debug("class %u component %u has mismatched CPU mask with L3 equivalent\n"
> + "L3:%*pbl != victim:%*pbl\n",
> + victim->level, victim_iter->comp_id,
> + cpumask_pr_args(tmp_cpumask),
> + cpumask_pr_args(&victim_iter->affinity));
> +
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
> /* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
> static void mpam_resctrl_pick_caches(void)
> {
> @@ -340,9 +455,62 @@ static void mpam_resctrl_pick_caches(void)
> }
> }
>
> +static void mpam_resctrl_pick_mba(void)
> +{
> + struct mpam_class *class, *candidate_class = NULL;
> + struct mpam_resctrl_res *res;
> +
> + lockdep_assert_cpus_held();
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + struct mpam_props *cprops = &class->props;
> +
> + if (class->level < 3) {
> + pr_debug("class %u is before L3\n", class->level);
> + continue;
> + }
> +
> + if (!class_has_usable_mba(cprops)) {
> + pr_debug("class %u has no bandwidth control\n",
> + class->level);
> + continue;
> + }
> +
> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
> + pr_debug("class %u has missing CPUs\n", class->level);
> + continue;
> + }
> +
> + if (!topology_matches_l3(class)) {
> + pr_debug("class %u topology doesn't match L3\n",
> + class->level);
> + continue;
> + }
> +
> + /*
> + * mba_sc reads the mbm_local counter, and waggles the MBA
> + * controls. mbm_local is implicitly part of the L3, so pick a
> + * resource to be MBA that is as close as possible to the L3.
> + */
> + if (!candidate_class || class->level < candidate_class->level)
> + candidate_class = class;
> + }
> +
> + if (candidate_class) {
> + pr_debug("selected class %u to back MBA\n",
> + candidate_class->level);
> + res = &mpam_resctrl_controls[RDT_RESOURCE_MBA];
> + res->class = candidate_class;
> + exposed_alloc_capable = true;
> + }
> +}
> +
> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
> {
> struct mpam_class *class = res->class;
> + struct mpam_props *cprops = &class->props;
> struct rdt_resource *r = &res->resctrl_res;
>
> switch (r->rid) {
> @@ -372,6 +540,19 @@ static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
> */
> r->cache.shareable_bits = resctrl_get_default_ctrl(r);
> break;
> + case RDT_RESOURCE_MBA:
> + r->alloc_capable = true;
> + r->schema_fmt = RESCTRL_SCHEMA_RANGE;
> + r->ctrl_scope = RESCTRL_L3_CACHE;
> +
> + r->membw.delay_linear = true;
> + r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
> + r->membw.min_bw = get_mba_min(cprops);
> + r->membw.max_bw = MAX_MBA_BW;
> + r->membw.bw_gran = get_mba_granularity(cprops);
> +
> + r->name = "MB";
> + break;
> default:
> return -EINVAL;
> }
> @@ -386,7 +567,17 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> if (class->type == MPAM_CLASS_CACHE)
> return comp->comp_id;
>
> - /* TODO: repaint domain ids to match the L3 domain ids */
> + if (topology_matches_l3(class)) {
> + /* Use the corresponding L3 component ID as the domain ID */
> + int id = get_cpu_cacheinfo_id(cpu, 3);
> +
> + /* Implies topology_matches_l3() made a mistake */
> + if (WARN_ON_ONCE(id == -1))
> + return comp->comp_id;
> +
> + return id;
> + }
> +
> /* Otherwise, expose the ID used by the firmware table code. */
> return comp->comp_id;
> }
> @@ -426,6 +617,12 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> case RDT_RESOURCE_L3:
> configured_by = mpam_feat_cpor_part;
> break;
> + case RDT_RESOURCE_MBA:
> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
> + configured_by = mpam_feat_mbw_max;
> + break;
> + }
> + fallthrough;
> default:
> return resctrl_get_default_ctrl(r);
> }
> @@ -437,6 +634,8 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> switch (configured_by) {
> case mpam_feat_cpor_part:
> return cfg->cpbm;
> + case mpam_feat_mbw_max:
> + return mbw_max_to_percent(cfg->mbw_max, cprops);
> default:
> return resctrl_get_default_ctrl(r);
> }
> @@ -481,6 +680,13 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> cfg.cpbm = cfg_val;
> mpam_set_feature(mpam_feat_cpor_part, &cfg);
> break;
> + case RDT_RESOURCE_MBA:
> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
> + cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
> + mpam_set_feature(mpam_feat_mbw_max, &cfg);
> + break;
> + }
> + fallthrough;
> default:
> return -EINVAL;
> }
> @@ -764,6 +970,7 @@ int mpam_resctrl_setup(void)
>
> /* Find some classes to use for controls */
> mpam_resctrl_pick_caches();
> + mpam_resctrl_pick_mba();
>
> /* Initialise the resctrl structures from the classes */
> for_each_mpam_resctrl_control(res, rid) {
Thanks,
Gavin
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource
2026-01-19 11:53 ` Gavin Shan
@ 2026-01-19 13:53 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 13:53 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
Hi Gavin,
On 1/19/26 11:53, Gavin Shan wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> resctrl supports 'MB', as a percentage throttling of traffic somewhere
>> after the L3. This is the control that mba_sc uses, so ideally the class
>> chosen should be as close as possible to the counters used for mbm_local.
>>
>> MB's percentage control should be backed either with the fixed point
>> fraction MBW_MAX or bandwidth portion bitmaps. The bandwidth portion
>> bitmaps are not used as it's tricky to pick which bits to use to avoid
>> contention, and it may be possible to expose this as something other than a
>> percentage in the future.
>>
>> CC: Zeng Heng <zengheng4@huawei.com>
>> Co-developed-by: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
[...]
>> +static bool topology_matches_l3(struct mpam_class *victim)
>> +{
>> + int cpu, err;
>> + struct mpam_component *victim_iter;
>> + cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
>> +
>
> A warning like the one below is reported by checkpatch.pl.
>
> WARNING: Missing a blank line after declarations
> #117: FILE: drivers/resctrl/mpam_resctrl.c:375:
> + struct mpam_component *victim_iter;
> + cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
I expect this is because __free() declarations don't need to be at the
top of a scope and so checkpatch doesn't treat them as normal
declarations. I don't think anything needs changing here.
>
> Besides, it'd better to initialize @tmp_cpumask:
>
> cpumask_var_t __free(free_cpumask_var) tmp_cpumask = CPUMASK_VAR_NULL;
Yep, I'll update.
>
>> + if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL))
>> + return false;
>> +
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(victim_iter, &victim->components,
>> class_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + if (cpumask_empty(&victim_iter->affinity)) {
>> + pr_debug("class %u has CPU-less component %u - can't
>> match L3!\n",
>> + victim->level, victim_iter->comp_id);
>> + return false;
>> + }
>> +
>> + cpu = cpumask_any(&victim_iter->affinity);
>> + if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
>> + return false;
>> +
>> + cpumask_clear(tmp_cpumask);
>> + err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
>> + if (err) {
>> + pr_debug("Failed to find L3's equivalent component to
>> class %u component %u\n",
>> + victim->level, victim_iter->comp_id);
>> + return false;
>> + }
>> +
>> + /* Any differing bits in the affinity mask? */
>> + if (!cpumask_equal(tmp_cpumask, &victim_iter->affinity)) {
>> + pr_debug("class %u component %u has Mismatched CPU mask
>> with L3 equivalent\n"
>> + "L3:%*pbl != victim:%*pbl\n",
>> + victim->level, victim_iter->comp_id,
>> + cpumask_pr_args(tmp_cpumask),
>> + cpumask_pr_args(&victim_iter->affinity));
>> +
>> + return false;
>> + }
>> + }
>> +
>> + return true;
>> +}
>> +
>> /* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
>> static void mpam_resctrl_pick_caches(void)
>> {
>> @@ -340,9 +455,62 @@ static void mpam_resctrl_pick_caches(void)
>> }
>> }
>> +static void mpam_resctrl_pick_mba(void)
>> +{
>> + struct mpam_class *class, *candidate_class = NULL;
>> + struct mpam_resctrl_res *res;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + struct mpam_props *cprops = &class->props;
>> +
>> + if (class->level < 3) {
>> + pr_debug("class %u is before L3\n", class->level);
>> + continue;
>> + }
>> +
>> + if (!class_has_usable_mba(cprops)) {
>> + pr_debug("class %u has no bandwidth control\n",
>> + class->level);
>> + continue;
>> + }
>> +
>> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
>> + pr_debug("class %u has missing CPUs\n", class->level);
>> + continue;
>> + }
>> +
>> + if (!topology_matches_l3(class)) {
>> + pr_debug("class %u topology doesn't match L3\n",
>> + class->level);
>> + continue;
>> + }
>> +
>> + /*
>> + * mba_sc reads the mbm_local counter, and waggles the MBA
>> + * controls. mbm_local is implicitly part of the L3, so pick a
>> + * resource to be MBA that is as close as possible to the L3.
>> + */
>> + if (!candidate_class || class->level < candidate_class->level)
>> + candidate_class = class;
>> + }
>> +
>> + if (candidate_class) {
>> + pr_debug("selected class %u to back MBA\n",
>> + candidate_class->level);
>> + res = &mpam_resctrl_controls[RDT_RESOURCE_MBA];
>> + res->class = candidate_class;
>> + exposed_alloc_capable = true;
>> + }
>> +}
>> +
>> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
>> {
>> struct mpam_class *class = res->class;
>> + struct mpam_props *cprops = &class->props;
>> struct rdt_resource *r = &res->resctrl_res;
>> switch (r->rid) {
>> @@ -372,6 +540,19 @@ static int mpam_resctrl_control_init(struct
>> mpam_resctrl_res *res)
>> */
>> r->cache.shareable_bits = resctrl_get_default_ctrl(r);
>> break;
>> + case RDT_RESOURCE_MBA:
>> + r->alloc_capable = true;
>> + r->schema_fmt = RESCTRL_SCHEMA_RANGE;
>> + r->ctrl_scope = RESCTRL_L3_CACHE;
>> +
>> + r->membw.delay_linear = true;
>> + r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
>> + r->membw.min_bw = get_mba_min(cprops);
>> + r->membw.max_bw = MAX_MBA_BW;
>> + r->membw.bw_gran = get_mba_granularity(cprops);
>> +
>> + r->name = "MB";
>> + break;
>> default:
>> return -EINVAL;
>> }
>> @@ -386,7 +567,17 @@ static int mpam_resctrl_pick_domain_id(int cpu,
>> struct mpam_component *comp)
>> if (class->type == MPAM_CLASS_CACHE)
>> return comp->comp_id;
>> - /* TODO: repaint domain ids to match the L3 domain ids */
>> + if (topology_matches_l3(class)) {
>> + /* Use the corresponding L3 component ID as the domain ID */
>> + int id = get_cpu_cacheinfo_id(cpu, 3);
>> +
>> + /* Implies topology_matches_l3() made a mistake */
>> + if (WARN_ON_ONCE(id == -1))
>> + return comp->comp_id;
>> +
>> + return id;
>> + }
>> +
>> /* Otherwise, expose the ID used by the firmware table code. */
>> return comp->comp_id;
>> }
>> @@ -426,6 +617,12 @@ u32 resctrl_arch_get_config(struct rdt_resource
>> *r, struct rdt_ctrl_domain *d,
>> case RDT_RESOURCE_L3:
>> configured_by = mpam_feat_cpor_part;
>> break;
>> + case RDT_RESOURCE_MBA:
>> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
>> + configured_by = mpam_feat_mbw_max;
>> + break;
>> + }
>> + fallthrough;
>> default:
>> return resctrl_get_default_ctrl(r);
>> }
>> @@ -437,6 +634,8 @@ u32 resctrl_arch_get_config(struct rdt_resource
>> *r, struct rdt_ctrl_domain *d,
>> switch (configured_by) {
>> case mpam_feat_cpor_part:
>> return cfg->cpbm;
>> + case mpam_feat_mbw_max:
>> + return mbw_max_to_percent(cfg->mbw_max, cprops);
>> default:
>> return resctrl_get_default_ctrl(r);
>> }
>> @@ -481,6 +680,13 @@ int resctrl_arch_update_one(struct rdt_resource
>> *r, struct rdt_ctrl_domain *d,
>> cfg.cpbm = cfg_val;
>> mpam_set_feature(mpam_feat_cpor_part, &cfg);
>> break;
>> + case RDT_RESOURCE_MBA:
>> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
>> + cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
>> + mpam_set_feature(mpam_feat_mbw_max, &cfg);
>> + break;
>> + }
>> + fallthrough;
>> default:
>> return -EINVAL;
>> }
>> @@ -764,6 +970,7 @@ int mpam_resctrl_setup(void)
>> /* Find some classes to use for controls */
>> mpam_resctrl_pick_caches();
>> + mpam_resctrl_pick_mba();
>> /* Initialise the resctrl structures from the classes */
>> for_each_mpam_resctrl_control(res, rid) {
>
> Thanks,
> Gavin
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (26 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 27/47] arm_mpam: resctrl: Add support for 'MB' resource Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 23:14 ` Reinette Chatre
2026-01-30 11:19 ` Ben Horgan
2026-01-12 16:58 ` [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters Ben Horgan
` (23 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl exposes a counter via a file named llc_occupancy. This isn't really
a counter as its value goes up and down; it is a snapshot of the cache
storage usage monitor.
Add some picking code to find a cache as close as possible to the L3 that
supports the CSU monitor.
If there is an L3, but it doesn't have any controls, force the L3 resource
to exist. The existing topology_matches_l3() and
mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
even if the class belongs to a later cache.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Co-developed-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Dave Martin <dave.martin@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Allow csu counters however many partid or pmg there are
else if -> if
reduce scope of local variables
drop has_csu
Changes since v2:
return -> break so works for mbwu in later patch
add for_each_mpam_resctrl_mon
return error from mpam_resctrl_monitor_init(). It may fail when ABMC
allocation is introduced in a later patch.
Squashed in patch from Dave Martin:
https://lore.kernel.org/lkml/20250820131621.54983-1-Dave.Martin@arm.com/
---
drivers/resctrl/mpam_internal.h | 6 ++
drivers/resctrl/mpam_resctrl.c | 173 +++++++++++++++++++++++++++++++-
2 files changed, 174 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index f89ceaf7623d..21cc776e57aa 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -349,6 +349,12 @@ struct mpam_resctrl_res {
struct rdt_resource resctrl_res;
};
+struct mpam_resctrl_mon {
+ struct mpam_class *class;
+
+ /* per-class data that resctrl needs will live here */
+};
+
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 7402bf4293b6..5020a5faed96 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -37,6 +37,21 @@ static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
/* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
static DEFINE_MUTEX(domain_list_lock);
+/*
+ * The classes we've picked to map to resctrl events.
+ * Resctrl believes all the world's a Xeon, and these are all on the L3. This
+ * array lets us find the actual class backing the event counters. e.g.
+ * the only memory bandwidth counters may be on the memory controller, but to
+ * make use of them, we pretend they are on L3.
+ * Class pointer may be NULL.
+ */
+static struct mpam_resctrl_mon mpam_resctrl_counters[QOS_NUM_EVENTS];
+
+#define for_each_mpam_resctrl_mon(mon, eventid) \
+ for (eventid = 0, mon = &mpam_resctrl_counters[eventid]; \
+ eventid < QOS_NUM_EVENTS; \
+ eventid++, mon = &mpam_resctrl_counters[eventid])
+
static bool exposed_alloc_capable;
static bool exposed_mon_capable;
@@ -259,6 +274,28 @@ static bool class_has_usable_mba(struct mpam_props *cprops)
return mba_class_use_mbw_max(cprops);
}
+static bool cache_has_usable_csu(struct mpam_class *class)
+{
+ struct mpam_props *cprops;
+
+ if (!class)
+ return false;
+
+ cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+ return false;
+
+ /*
+ * CSU counters settle on the value, so we can get away with
+ * having only one.
+ */
+ if (!cprops->num_csu_mon)
+ return false;
+
+ return true;
+}
+
/*
* Calculate the worst-case percentage change from each implemented step
* in the control.
@@ -507,6 +544,64 @@ static void mpam_resctrl_pick_mba(void)
}
}
+static void counter_update_class(enum resctrl_event_id evt_id,
+ struct mpam_class *class)
+{
+ struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
+
+ if (existing_class) {
+ if (class->level == 3) {
+ pr_debug("Existing class is L3 - L3 wins\n");
+ return;
+ }
+
+ if (existing_class->level < class->level) {
+ pr_debug("Existing class is closer to L3, %u versus %u - closer is better\n",
+ existing_class->level, class->level);
+ return;
+ }
+ }
+
+ mpam_resctrl_counters[evt_id].class = class;
+ exposed_mon_capable = true;
+}
+
+static void mpam_resctrl_pick_counters(void)
+{
+ struct mpam_class *class;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (class->level < 3) {
+ pr_debug("class %u is before L3", class->level);
+ continue;
+ }
+
+ if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
+ pr_debug("class %u does not cover all CPUs",
+ class->level);
+ continue;
+ }
+
+ if (cache_has_usable_csu(class) && topology_matches_l3(class)) {
+ pr_debug("class %u has usable CSU, and matches L3 topology",
+ class->level);
+
+ /* CSU counters only make sense on a cache. */
+ switch (class->type) {
+ case MPAM_CLASS_CACHE:
+ counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
+ break;
+ default:
+ break;
+ }
+ }
+ }
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
{
struct mpam_class *class = res->class;
@@ -582,6 +677,57 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
+ enum resctrl_event_id type)
+{
+ struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ struct rdt_resource *l3 = &res->resctrl_res;
+
+ lockdep_assert_cpus_held();
+
+ /* There also needs to be an L3 cache present */
+ if (get_cpu_cacheinfo_id(smp_processor_id(), 3) == -1)
+ return 0;
+
+ /*
+ * If there are no MPAM resources on L3, force it into existence.
+ * topology_matches_l3() already ensures this looks like the L3.
+ * The domain-ids will be fixed up by mpam_resctrl_domain_hdr_init().
+ */
+ if (!res->class) {
+ pr_warn_once("Faking L3 MSC to enable counters.\n");
+ res->class = mpam_resctrl_counters[type].class;
+ }
+
+ /* Called multiple times, once per event type */
+ if (exposed_mon_capable) {
+ l3->mon_capable = true;
+
+ /* Setting name is necessary on monitor only platforms */
+ l3->name = "L3";
+ l3->mon_scope = RESCTRL_L3_CACHE;
+
+ resctrl_enable_mon_event(type);
+
+ /*
+ * Unfortunately, num_rmid doesn't mean anything for
+ * mpam, and it's exposed to user-space!
+ *
+ * num-rmid is supposed to mean the minimum number of
+ * monitoring groups that can exist simultaneously, including
+ * the default monitoring group for each control group.
+ *
+ * For mpam, each control group has its own pmg/rmid space, so
+ * it is not appropriate to advertise the whole rmid_idx space
+ * here. But the pmgs corresponding to the parent control
+ * group can be allocated freely:
+ */
+ l3->mon.num_rmid = mpam_pmg_max + 1;
+ }
+
+ return 0;
+}
+
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type)
{
@@ -958,6 +1104,8 @@ int mpam_resctrl_setup(void)
int err = 0;
struct mpam_resctrl_res *res;
enum resctrl_res_level rid;
+ struct mpam_resctrl_mon *mon;
+ enum resctrl_event_id eventid;
wait_event(wait_cacheinfo_ready, cacheinfo_ready);
@@ -980,16 +1128,26 @@ int mpam_resctrl_setup(void)
err = mpam_resctrl_control_init(res);
if (err) {
pr_debug("Failed to initialise rid %u\n", rid);
- break;
+ goto internal_error;
}
}
- cpus_read_unlock();
- if (err) {
- pr_debug("Internal error %d - resctrl not supported\n", err);
- return err;
+ /* Find some classes to use for monitors */
+ mpam_resctrl_pick_counters();
+
+ for_each_mpam_resctrl_mon(mon, eventid) {
+ if (!mon->class)
+ continue; // dummy resource
+
+ err = mpam_resctrl_monitor_init(mon, eventid);
+ if (err) {
+ pr_debug("Failed to initialise event %u\n", eventid);
+ goto internal_error;
+ }
}
+ cpus_read_unlock();
+
if (!exposed_alloc_capable && !exposed_mon_capable) {
pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
exposed_alloc_capable, exposed_mon_capable);
@@ -999,6 +1157,11 @@ int mpam_resctrl_setup(void)
/* TODO: call resctrl_init() */
return 0;
+
+internal_error:
+ cpus_read_unlock();
+ pr_debug("Internal error %d - resctrl not supported\n", err);
+ return err;
}
static int __init __cacheinfo_ready(void)
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-12 16:58 ` [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters Ben Horgan
@ 2026-01-13 23:14 ` Reinette Chatre
2026-01-15 15:43 ` Ben Horgan
2026-01-30 11:19 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Reinette Chatre @ 2026-01-13 23:14 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/12/26 8:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> resctrl exposes a counter via a file named llc_occupancy. This isn't really
> a counter as its value goes up and down, this is a snapshot of the cache
> storage usage monitor.
>
> Add some picking code to find a cache as close as possible to the L3 that
> supports the CSU monitor.
>
> If there is an L3, but it doesn't have any controls, force the L3 resource
> to exist. The existing topology_matches_l3() and
> mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
> even if the class belongs to a later cache.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Co-developed-by: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> Allow csu counters however many partid or pmg there are
> else if -> if
> reduce scope of local variables
> drop has_csu
>
> Changes since v2:
> return -> break so works for mbwu in later patch
> add for_each_mpam_resctrl_mon
> return error from mpam_resctrl_monitor_init(). It may fail when abmc
> allocation is introduced in a later patch.
> Squashed in patch from Dave Martin:
> https://lore.kernel.org/lkml/20250820131621.54983-1-Dave.Martin@arm.com/
> ---
> drivers/resctrl/mpam_internal.h | 6 ++
> drivers/resctrl/mpam_resctrl.c | 173 +++++++++++++++++++++++++++++++-
> 2 files changed, 174 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index f89ceaf7623d..21cc776e57aa 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -349,6 +349,12 @@ struct mpam_resctrl_res {
> struct rdt_resource resctrl_res;
> };
>
> +struct mpam_resctrl_mon {
> + struct mpam_class *class;
> +
> + /* per-class data that resctrl needs will live here */
> +};
> +
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> {
> struct mpam_props *cprops = &class->props;
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 7402bf4293b6..5020a5faed96 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -37,6 +37,21 @@ static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
> /* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
> static DEFINE_MUTEX(domain_list_lock);
>
> +/*
> + * The classes we've picked to map to resctrl events.
> + * Resctrl believes all the worlds a Xeon, and these are all on the L3. This
> + * array lets us find the actual class backing the event counters. e.g.
> + * the only memory bandwidth counters may be on the memory controller, but to
> + * make use of them, we pretend they are on L3.
> + * Class pointer may be NULL.
> + */
> +static struct mpam_resctrl_mon mpam_resctrl_counters[QOS_NUM_EVENTS];
> +
> +#define for_each_mpam_resctrl_mon(mon, eventid) \
> + for (eventid = 0, mon = &mpam_resctrl_counters[eventid]; \
> + eventid < QOS_NUM_EVENTS; \
> + eventid++, mon = &mpam_resctrl_counters[eventid])
> +
Reading the above loop and how it is used to call mpam_resctrl_monitor_init() for every event,
it looks like there is an implicit assumption that MPAM supports all events known to
resctrl.
Please consider the most recent resctrl feature "telemetry monitoring" currently queued
for inclusion: https://lore.kernel.org/lkml/20251217172121.12030-1-tony.luck@intel.com/
(You can find latest resctrl code queued for inclusion on the x86/cache branch of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git)
New telemetry monitoring introduces several new events known to resctrl. Specifically, here
is how enum resctrl_event_id looks at the moment:
/* Event IDs */
enum resctrl_event_id {
/* Must match value of first event below */
QOS_FIRST_EVENT = 0x01,
/*
* These values match those used to program IA32_QM_EVTSEL before
* reading IA32_QM_CTR on RDT systems.
*/
QOS_L3_OCCUP_EVENT_ID = 0x01,
QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
/* Intel Telemetry Events */
PMT_EVENT_ENERGY,
PMT_EVENT_ACTIVITY,
PMT_EVENT_STALLS_LLC_HIT,
PMT_EVENT_C1_RES,
PMT_EVENT_UNHALTED_CORE_CYCLES,
PMT_EVENT_STALLS_LLC_MISS,
PMT_EVENT_AUTO_C6_RES,
PMT_EVENT_UNHALTED_REF_CYCLES,
PMT_EVENT_UOPS_RETIRED,
/* Must be the last */
QOS_NUM_EVENTS,
};
...
> @@ -582,6 +677,57 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> return comp->comp_id;
> }
>
> +static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
> + enum resctrl_event_id type)
> +{
> + struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
> + struct rdt_resource *l3 = &res->resctrl_res;
> +
> + lockdep_assert_cpus_held();
> +
> + /* There also needs to be an L3 cache present */
> + if (get_cpu_cacheinfo_id(smp_processor_id(), 3) == -1)
> + return 0;
> +
> + /*
> + * If there are no MPAM resources on L3, force it into existence.
> + * topology_matches_l3() already ensures this looks like the L3.
> + * The domain-ids will be fixed up by mpam_resctrl_domain_hdr_init().
> + */
> + if (!res->class) {
> + pr_warn_once("Faking L3 MSC to enable counters.\n");
> + res->class = mpam_resctrl_counters[type].class;
> + }
> +
> + /* Called multiple times!, once per event type */
> + if (exposed_mon_capable) {
> + l3->mon_capable = true;
> +
> + /* Setting name is necessary on monitor only platforms */
> + l3->name = "L3";
> + l3->mon_scope = RESCTRL_L3_CACHE;
> +
> + resctrl_enable_mon_event(type);
btw, the telemetry work also changed this function prototype to be:
bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
unsigned int binary_bits, void *arch_priv);
If I understand correctly resctrl_enable_mon_event() will be called for every event in
enum resctrl_event_id which now contains events that may not actually be supported. I think it
may be safer to be specific in which events MPAM wants to enable.
> +
> + /*
> + * Unfortunately, num_rmid doesn't mean anything for
> + * mpam, and its exposed to user-space!
> + *
The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
times now. I have not heard any objections against doing something like this.
https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
> + * num-rmid is supposed to mean the minimum number of
> + * monitoring groups that can exist simultaneously, including
> + * the default monitoring group for each control group.
> + *
> + * For mpam, each control group has its own pmg/rmid space, so
> + * it is not appropriate to advertise the whole rmid_idx space
> + * here. But the pmgs corresponding to the parent control
> + * group can be allocated freely:
> + */
> + l3->mon.num_rmid = mpam_pmg_max + 1;
> + }
> +
> + return 0;
> +}
> +
Reinette
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-13 23:14 ` Reinette Chatre
@ 2026-01-15 15:43 ` Ben Horgan
2026-01-15 18:54 ` Reinette Chatre
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-15 15:43 UTC (permalink / raw)
To: Reinette Chatre
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Reinette,
On 1/13/26 23:14, Reinette Chatre wrote:
> Hi Ben,
>
> On 1/12/26 8:58 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> resctrl exposes a counter via a file named llc_occupancy. This isn't really
>> a counter as its value goes up and down, this is a snapshot of the cache
>> storage usage monitor.
>>
>> Add some picking code to find a cache as close as possible to the L3 that
>> supports the CSU monitor.
>>
>> If there is an L3, but it doesn't have any controls, force the L3 resource
>> to exist. The existing topology_matches_l3() and
>> mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
>> even if the class belongs to a later cache.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Co-developed-by: Dave Martin <dave.martin@arm.com>
>> Signed-off-by: Dave Martin <dave.martin@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since rfc:
>> Allow csu counters however many partid or pmg there are
>> else if -> if
>> reduce scope of local variables
>> drop has_csu
>>
>> Changes since v2:
>> return -> break so works for mbwu in later patch
>> add for_each_mpam_resctrl_mon
>> return error from mpam_resctrl_monitor_init(). It may fail when abmc
>> allocation is introduced in a later patch.
>> Squashed in patch from Dave Martin:
>> https://lore.kernel.org/lkml/20250820131621.54983-1-Dave.Martin@arm.com/
>> ---
>> drivers/resctrl/mpam_internal.h | 6 ++
>> drivers/resctrl/mpam_resctrl.c | 173 +++++++++++++++++++++++++++++++-
>> 2 files changed, 174 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index f89ceaf7623d..21cc776e57aa 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -349,6 +349,12 @@ struct mpam_resctrl_res {
>> struct rdt_resource resctrl_res;
>> };
>>
>> +struct mpam_resctrl_mon {
>> + struct mpam_class *class;
>> +
>> + /* per-class data that resctrl needs will live here */
>> +};
>> +
>> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
>> {
>> struct mpam_props *cprops = &class->props;
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index 7402bf4293b6..5020a5faed96 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -37,6 +37,21 @@ static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
>> /* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
>> static DEFINE_MUTEX(domain_list_lock);
>>
>> +/*
>> + * The classes we've picked to map to resctrl events.
>> + * Resctrl believes all the worlds a Xeon, and these are all on the L3. This
>> + * array lets us find the actual class backing the event counters. e.g.
>> + * the only memory bandwidth counters may be on the memory controller, but to
>> + * make use of them, we pretend they are on L3.
>> + * Class pointer may be NULL.
>> + */
>> +static struct mpam_resctrl_mon mpam_resctrl_counters[QOS_NUM_EVENTS];
>> +
>> +#define for_each_mpam_resctrl_mon(mon, eventid) \
>> + for (eventid = 0, mon = &mpam_resctrl_counters[eventid]; \
>> + eventid < QOS_NUM_EVENTS; \
>> + eventid++, mon = &mpam_resctrl_counters[eventid])
>> +
>
> Reading the above loop and how it is used to call mpam_resctrl_monitor_init() for every event
> it looks like there is an implicit assumption that MPAM supports all events known to
> resctrl.
>
> Please consider the most recent resctrl feature "telemetry monitoring" currently queued
> for inclusion: https://lore.kernel.org/lkml/20251217172121.12030-1-tony.luck@intel.com/
>
> (You can find latest resctrl code queued for inclusion on the x86/cache branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git)
I'll test against this.
>
> New telemetry monitoring introduces several new events known to resctrl. Specifically, here
> is how enum resctrl_event_id looks at the moment:
>
> /* Event IDs */
> enum resctrl_event_id {
> /* Must match value of first event below */
> QOS_FIRST_EVENT = 0x01,
[...]
Thanks for bringing this to my attention. mpam_resctrl_monitor_init()
won't be called for all events known to resctrl as
mpam_resctrl_pick_counters() will only set a class for the 3 that MPAM
knows about. Still, it is probably best to restrict the iterator to the
relevant ones.
> };
>
> ...
>
>> @@ -582,6 +677,57 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
>> return comp->comp_id;
>> }
>>
>> +static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
>> + enum resctrl_event_id type)
>> +{
>> + struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
>> + struct rdt_resource *l3 = &res->resctrl_res;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + /* There also needs to be an L3 cache present */
>> + if (get_cpu_cacheinfo_id(smp_processor_id(), 3) == -1)
>> + return 0;
>> +
>> + /*
>> + * If there are no MPAM resources on L3, force it into existence.
>> + * topology_matches_l3() already ensures this looks like the L3.
>> + * The domain-ids will be fixed up by mpam_resctrl_domain_hdr_init().
>> + */
>> + if (!res->class) {
>> + pr_warn_once("Faking L3 MSC to enable counters.\n");
>> + res->class = mpam_resctrl_counters[type].class;
>> + }
>> +
>> + /* Called multiple times!, once per event type */
>> + if (exposed_mon_capable) {
>> + l3->mon_capable = true;
>> +
>> + /* Setting name is necessary on monitor only platforms */
>> + l3->name = "L3";
>> + l3->mon_scope = RESCTRL_L3_CACHE;
>> +
>> + resctrl_enable_mon_event(type);
>
> btw, the telemetry work also changed this function prototype to be:
> bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
> unsigned int binary_bits, void *arch_priv);
I will update to use the new signature.
>
> If I understand correctly resctrl_enable_mon_event() will be called for every event in
> enum resctrl_event_id which now contains events that may not actually be supported. I think it
> may be safer to be specific in which events MPAM wants to enable.
As noted above, this only happens for the events chosen by
mpam_resctrl_pick_counters().
>
>> +
>> + /*
>> + * Unfortunately, num_rmid doesn't mean anything for
>> + * mpam, and its exposed to user-space!
>> + *
>
> The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
> times now. I have not heard any objections against doing something like this.
> https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
> https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
Hmm, I see now that 'num_rmid' is documented as an upper bound, and so
neither 1 nor mpam_pmg_max + 1 agrees with the documentation.
"
"num_rmids":
The number of RMIDs available. This is the
upper bound for how many "CTRL_MON" + "MON"
groups can be created.
"
So, if I understand correctly you're proposing setting
num_rmids = num_pmg * num_partids on arm platforms and that in the
interim this can then be used to calculate the num_pmg by calculating
num_closid/num_rmid but that a per CTRL_MON num_mon_groups should be
added to make this consistent across architectures?
>
>> + * num-rmid is supposed to mean the minimum number of
>> + * monitoring groups that can exist simultaneously, including
>> + * the default monitoring group for each control group.
>> + *
>> + * For mpam, each control group has its own pmg/rmid space, so
>> + * it is not appropriate to advertise the whole rmid_idx space
>> + * here. But the pmgs corresponding to the parent control
>> + * group can be allocated freely:
>> + */
>> + l3->mon.num_rmid = mpam_pmg_max + 1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>
> Reinette
>
I appreciate that you have shared this resctrl knowledge with me.
Thanks,
Ben
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-15 15:43 ` Ben Horgan
@ 2026-01-15 18:54 ` Reinette Chatre
2026-01-16 10:29 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Reinette Chatre @ 2026-01-15 18:54 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/15/26 7:43 AM, Ben Horgan wrote:
> On 1/13/26 23:14, Reinette Chatre wrote:
>> On 1/12/26 8:58 AM, Ben Horgan wrote:
...
>>> +
>>> + /*
>>> + * Unfortunately, num_rmid doesn't mean anything for
>>> + * mpam, and its exposed to user-space!
>>> + *
>>
>> The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
>> times now. I have not heard any objections against doing something like this.
>> https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
>> https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
>
> Hmm, I see now that 'num_rmid' is documented as an upper bound and so
> neither 1 or mpam_pmg_max + 1 agree with the documentation.
>
> "
> "num_rmids":
> The number of RMIDs available. This is the
> upper bound for how many "CTRL_MON" + "MON"
> groups can be created.
> "
Please note that this documentation has been refactored (without changing its
meaning). The above quoted text is specific to L3 monitoring and with the
addition of telemetry monitoring the relevant text now reads:
The upper bound for how many "CTRL_MON" + "MON" can be created
is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
>
> So, if I understand correctly you're proposing setting
> num_rmids = num_pmg * num_partids on arm platforms and that in the
> interim this can then be used to calculate the num_pmg by calculating
> num_closid/num_rmid but that a per CTRL_MON num_mon_groups should be
> added to make this consistent across architectures?
Yes for num_rmids = num_pmg * num_partids. The motivation for this is that to me
this looks like the value that best matches the num_rmids documentation. I understand
the RMID vs PMG mapping is difficult, so my proposal is certainly not set in stone and I would like to
hear motivation for different interpretations. "calculating num_pmg" is not obvious
though. I interpret "num_pmg" here as number of monitor groups per control group and on
an Arm system this is indeed num_closid/num_rmids (if num_rmids = num_pmg * num_partids)
but on x86 it is just num_rmids. Having user space depend on such computation to determine how
many monitor groups per control group would thus require that user space knows whether the
underlying system is Arm or x86, and would go against the goal of having resctrl as a generic interface.
The way forward may be to deprecate (somehow) num_rmids and transition to something
like "num_mon_groups" but it is currently vague how "num_mon_groups" may look like. That thread
(https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/) fizzled
out after raising a few options how it may look.
Another proposal was to add a "mon_id_includes_control_id" to use as another "guide" to
determine how many monitoring groups can be created but at the time it seemed an intermediary
step for user to determine the number of monitor groups that resctrl can also provide.
https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
Making this consistent across architectures is the goal since resctrl aims to be
a generic interface. Users should not need to do things like infer which system they
are running on by looking at output of resctrl files as mentioned.
fwiw ... there seems to be a usage by Google to compare num_rmids to num_closids to determine
how to interact with resctrl:
https://lore.kernel.org/lkml/CALPaoCgSO7HzK9BjyM8yL50oPyq9kBj64Nkgyo1WEJrWy5uHUg@mail.gmail.com/
Reinette
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-15 18:54 ` Reinette Chatre
@ 2026-01-16 10:29 ` Ben Horgan
2026-01-20 15:28 ` Peter Newman
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-16 10:29 UTC (permalink / raw)
To: Reinette Chatre, peternewman@google.com
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Reinette, Peter,
On 1/15/26 18:54, Reinette Chatre wrote:
> Hi Ben,
>
> On 1/15/26 7:43 AM, Ben Horgan wrote:
>> On 1/13/26 23:14, Reinette Chatre wrote:
>>> On 1/12/26 8:58 AM, Ben Horgan wrote:
> ...
>>>> +
>>>> + /*
>>>> + * Unfortunately, num_rmid doesn't mean anything for
>>>> + * mpam, and its exposed to user-space!
>>>> + *
>>>
>>> The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
>>> times now. I have not heard any objections against doing something like this.
>>> https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
>>> https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
>>
>> Hmm, I see now that 'num_rmid' is documented as an upper bound and so
>> neither 1 or mpam_pmg_max + 1 agree with the documentation.
>>
>> "
>> "num_rmids":
>> The number of RMIDs available. This is the
>> upper bound for how many "CTRL_MON" + "MON"
>> groups can be created.
>> "
>
> Please note that this documentation has been refactored (without changing its
> meaning). The above quoted text is specific to L3 monitoring and with the
> addition of telemetry monitoring the relevant text now reads:
> The upper bound for how many "CTRL_MON" + "MON" can be created
> is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
>
>>
>> So, if I understand correctly you're proposing setting
>> num_rmids = num_pmg * num_partids on arm platforms and that in the
>> interim this can then be used to calculate the num_pmg by calculating
>> num_closid/num_rmid but that a per CTRL_MON num_mon_groups should be
>> added to make this consistent across architectures?
>
> Yes for num_rmids = num_pmg * num_partids.
Ok, I don't really see another option.
The motivation for this is that to me
> this looks like the value that best matches the num_rmids documentation. I understand
> the RMID vs PMG is difficult so my proposal is certainly not set in stone and I would like to
> hear motivation for different interpretations. "calculating num_pmg" is not obvious
> though. I interpret "num_pmg" here as number of monitor groups per control group and on
> an Arm system this is indeed num_closid/num_rmids (if num_rmids = num_pmg * num_partids)
> but on x86 it is just num_rmids. Having user space depend on such computation to determine how
> many monitor groups per control group would thus require that user space knows whether the
> underlying system is Arm or x86 and would go against goal of having resctrl as a generic interface.
>
> The way forward may be to deprecate (somehow) num_rmids and transition to something
> like "num_mon_groups" but it is currently vague how "num_mon_groups" may look like. That thread
> (https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/) fizzled
> out after raising a few options how it may look.
>
> Another proposal was to add a "mon_id_includes_control_id" to use as another "guide" to
> determine how many monitoring groups can be created but at the time it seemed an intermediary
> step for user to determine the number of monitor groups that resctrl can also provide.
> https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
Just thinking about it now, but the "mon_id_includes_control_id" option
seems the best to me, as it is a single-bit option that, along with
"num_rmids", lets you know which monitor groups you can create and whether
it's sensible to move monitor groups between CTRL MON groups.
The "num_mon_groups" per CTRL MON group would also need to be
interpreted together with "num_rmid" to know if it is a global or per
CTRL MON upper bound. This option also uses multiple files to give the
same bit of information.
>
> Making this consistent across architectures is the goal since resctrl aims to be
> a generic interface. Users should not need to do things like infer which system they
> are running on by looking at output of resctrl files as mentioned.
>
> fwiw ... there seems to be a usage by Google to compare num_rmids to num_closids to determine
> how to interact with resctrl:
> https://lore.kernel.org/lkml/CALPaoCgSO7HzK9BjyM8yL50oPyq9kBj64Nkgyo1WEJrWy5uHUg@mail.gmail.com/
Unfortunately, it looks like we're about to break this heuristic :( At
least, until a way to get this information generically in resctrl is
decided upon.
>
> Reinette
Thanks,
Ben
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-16 10:29 ` Ben Horgan
@ 2026-01-20 15:28 ` Peter Newman
2026-01-21 17:58 ` Reinette Chatre
0 siblings, 1 reply; 160+ messages in thread
From: Peter Newman @ 2026-01-20 15:28 UTC (permalink / raw)
To: Ben Horgan
Cc: Reinette Chatre, amitsinght, baisheng.gao, baolin.wang, carl,
dave.martin, david, dfustini, fenghuay, gshan, james.morse,
jonathan.cameron, kobak, lcherian, linux-arm-kernel, linux-kernel,
punit.agrawal, quic_jiles, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On Fri, Jan 16, 2026 at 11:29 AM Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Reinette, Peter,
>
> On 1/15/26 18:54, Reinette Chatre wrote:
> > Hi Ben,
> >
> > On 1/15/26 7:43 AM, Ben Horgan wrote:
> >> On 1/13/26 23:14, Reinette Chatre wrote:
> >>> On 1/12/26 8:58 AM, Ben Horgan wrote:
> > ...
> >>>> +
> >>>> + /*
> >>>> + * Unfortunately, num_rmid doesn't mean anything for
> >>>> + * mpam, and its exposed to user-space!
> >>>> + *
> >>>
> >>> The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
> >>> times now. I have not heard any objections against doing something like this.
> >>> https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
> >>> https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
> >>
> >> Hmm, I see now that 'num_rmid' is documented as an upper bound and so
> >> neither 1 or mpam_pmg_max + 1 agree with the documentation.
> >>
> >> "
> >> "num_rmids":
> >> The number of RMIDs available. This is the
> >> upper bound for how many "CTRL_MON" + "MON"
> >> groups can be created.
> >> "
> >
> > Please note that this documentation has been refactored (without changing its
> > meaning). The above quoted text is specific to L3 monitoring and with the
> > addition of telemetry monitoring the relevant text now reads:
> > The upper bound for how many "CTRL_MON" + "MON" can be created
> > is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
> >
> >>
> >> So, if I understand correctly you're proposing setting
> >> num_rmids = num_pmg * num_partids on arm platforms and that in the
> >> interim this can then be used to calculate the num_pmg by calculating
> >> num_closid/num_rmid but that a per CTRL_MON num_mon_groups should be
> >> added to make this consistent across architectures?
> >
> > Yes for num_rmids = num_pmg * num_partids.
>
> Ok, I don't really see another option.
>
> The motivation for this is that to me
> > this looks like the value that best matches the num_rmids documentation. I understand
> > the RMID vs PMG is difficult so my proposal is certainly not set in stone and I would like to
> > hear motivation for different interpretations. "calculating num_pmg" is not obvious
> > though. I interpret "num_pmg" here as number of monitor groups per control group and on
> > an Arm system this is indeed num_closid/num_rmids (if num_rmids = num_pmg * num_partids)
> > but on x86 it is just num_rmids. Having user space depend on such computation to determine how
> > many monitor groups per control group would thus require that user space knows whether the
> > underlying system is Arm or x86 and would go against goal of having resctrl as a generic interface.
> >
> > The way forward may be to deprecate (somehow) num_rmids and transition to something
> > like "num_mon_groups" but it is currently vague how "num_mon_groups" may look like. That thread
> > (https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/) fizzled
> > out after raising a few options how it may look.
> >
> > Another proposal was to add a "mon_id_includes_control_id" to use as another "guide" to
> > determine how many monitoring groups can be created but at the time it seemed an intermediary
> > step for user to determine the number of monitor groups that resctrl can also provide.
> > https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
>
> Just thinking about it now but the "mon_id_includes_control_id" option
> seems the best to me as it is a single bit option that along with
> "num_rmids" let's you know which monitor groups you can create and if
> it's sensible to move monitor groups between CTRL MON groups.
>
> The "num_mon_groups" per CTRL MON group would also need to be
> interpreted together with "num_rmid" to know if it is a global or per
> CTRL MON upper bound. This option also uses multiple files to give the
> same bit of information.
>
> >
> > Making this consistent across architectures is the goal since resctrl aims to be
> > a generic interface. Users should not need to do things like infer which system they
> > are running on by looking at output of resctrl files as mentioned.
> >
> > fwiw ... there seems to be a usage by Google to compare num_rmids to num_closids to determine
> > how to interact with resctrl:
> > https://lore.kernel.org/lkml/CALPaoCgSO7HzK9BjyM8yL50oPyq9kBj64Nkgyo1WEJrWy5uHUg@mail.gmail.com/
>
> Unfortunately, it looks like we're about to break this heuristic :( At
> least, until a way to get this information generically in resctrl is
> decided upon.
We actually ended up going with the "mon_id_includes_control_id" approach.
The property it represents is rather fundamental to what a monitoring
group actually is and is a low-level implementation detail that is
difficult to hide. Google generally needs support for as many
monitoring IDs as jobs it expects to be able to run on a machine, so
the number of monitoring groups will be routinely maxed out (and there
will be some jobs that are forever stuck in the default group because
no RMIDs were free at the time they started[1])
Thanks,
-Peter
[1] https://lore.kernel.org/lkml/CALPaoCjTwySGX9i7uAtCWLKQpmELKP55xDLJhHmUve8ptsfFTw@mail.gmail.com/
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-20 15:28 ` Peter Newman
@ 2026-01-21 17:58 ` Reinette Chatre
2026-01-30 11:07 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Reinette Chatre @ 2026-01-21 17:58 UTC (permalink / raw)
To: Peter Newman, Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, rohit.mathew, scott, sdonthineni, tan.shaopeng, xhao,
catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm
Hi Ben and Peter,
On 1/20/26 7:28 AM, Peter Newman wrote:
> Hi Ben,
>
> On Fri, Jan 16, 2026 at 11:29 AM Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi Reinette, Peter,
>>
>> On 1/15/26 18:54, Reinette Chatre wrote:
>>> Hi Ben,
>>>
>>> On 1/15/26 7:43 AM, Ben Horgan wrote:
>>>> On 1/13/26 23:14, Reinette Chatre wrote:
>>>>> On 1/12/26 8:58 AM, Ben Horgan wrote:
>>> ...
>>>>>> +
>>>>>> + /*
>>>>>> + * Unfortunately, num_rmid doesn't mean anything for
>>>>>> + * mpam, and its exposed to user-space!
>>>>>> + *
>>>>>
>>>>> The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
>>>>> times now. I have not heard any objections against doing something like this.
>>>>> https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
>>>>> https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
>>>>
>>>> Hmm, I see now that 'num_rmid' is documented as an upper bound and so
>>>> neither 1 nor mpam_pmg_max + 1 agrees with the documentation.
>>>>
>>>> "
>>>> "num_rmids":
>>>> The number of RMIDs available. This is the
>>>> upper bound for how many "CTRL_MON" + "MON"
>>>> groups can be created.
>>>> "
>>>
>>> Please note that this documentation has been refactored (without changing its
>>> meaning). The above quoted text is specific to L3 monitoring and with the
>>> addition of telemetry monitoring the relevant text now reads:
>>> The upper bound for how many "CTRL_MON" + "MON" can be created
>>> is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
>>>
>>>>
>>>> So, if I understand correctly you're proposing setting
>>>> num_rmids = num_pmg * num_partids on arm platforms and that in the
>>>> interim this can then be used to calculate the num_pmg by calculating
>>>> num_closid/num_rmid but that a per CTRL_MON num_mon_groups should be
>>>> added to make this consistent across architectures?
>>>
>>> Yes for num_rmids = num_pmg * num_partids.
>>
>> Ok, I don't really see another option.
>>
>>> The motivation for this is that to me
>>> this looks like the value that best matches the num_rmids documentation. I understand
>>> the RMID vs PMG is difficult so my proposal is certainly not set in stone and I would like to
>>> hear motivation for different interpretations. "calculating num_pmg" is not obvious
>>> though. I interpret "num_pmg" here as number of monitor groups per control group and on
>>> an Arm system this is indeed num_closid/num_rmids (if num_rmids = num_pmg * num_partids)
>>> but on x86 it is just num_rmids. Having user space depend on such computation to determine how
>>> many monitor groups per control group would thus require that user space knows whether the
>>> underlying system is Arm or x86 and would go against goal of having resctrl as a generic interface.
>>>
>>> The way forward may be to deprecate (somehow) num_rmids and transition to something
>>> like "num_mon_groups" but it is currently vague how "num_mon_groups" may look like. That thread
>>> (https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/) fizzled
>>> out after raising a few options how it may look.
>>>
>>> Another proposal was to add a "mon_id_includes_control_id" to use as another "guide" to
>>> determine how many monitoring groups can be created but at the time it seemed an intermediary
>>> step for user to determine the number of monitor groups that resctrl can also provide.
>>> https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
>>
>> Just thinking about it now but the "mon_id_includes_control_id" option
>> seems the best to me as it is a single bit option that along with
>> "num_rmids" lets you know which monitor groups you can create and if
>> it's sensible to move monitor groups between CTRL MON groups.
>>
>> The "num_mon_groups" per CTRL MON group would also need to be
>> interpreted together with "num_rmid" to know if it is a global or per
>> CTRL MON upper bound. This option also uses multiple files to give the
>> same bit of information.
>>
>>>
>>> Making this consistent across architectures is the goal since resctrl aims to be
>>> a generic interface. Users should not need to do things like infer which system they
>>> are running on by looking at output of resctrl files as mentioned.
>>>
>>> fwiw ... there seems to be a usage by Google to compare num_rmids to num_closids to determine
>>> how to interact with resctrl:
>>> https://lore.kernel.org/lkml/CALPaoCgSO7HzK9BjyM8yL50oPyq9kBj64Nkgyo1WEJrWy5uHUg@mail.gmail.com/
>>
>> Unfortunately, it looks like we're about to break this heuristic :( At
>> least, until a way to get this information generically in resctrl is
>> decided upon.
>
> We actually ended up going with the "mon_id_includes_control_id" approach.
Thank you for confirming. I was hoping we could deprecate num_rmids after introducing a
per resource group file but this does not seem to support all the use cases as highlighted by
Ben.
As I see it, a name like "mon_id_includes_control_id" also implies that "num_rmids", perhaps
linked to a new "num_mon_ids" as Peter suggested in [2], should contain num_pmg * num_partids.
One concern from earlier was that "mon_id_includes_control_id" may be used as a
heuristic for whether monitor groups can be moved or not. Instead I seem to remember that
there was a plan for MPAM to support moving monitor groups, with the caveat that
counters will reset, for which resctrl may need another flag.
> The property it represents is rather fundamental to what a monitoring
> group actually is and is a low-level implementation detail that is
> difficult to hide. Google generally needs support for as many
> monitoring IDs as jobs it expects to be able to run on a machine, so
> the number of monitoring groups will be routinely maxed out (and there
> will be some jobs that are forever stuck in the default group because
> no RMIDs were free at the time they started[1]).
>
> Thanks,
> -Peter
>
> [1] https://lore.kernel.org/lkml/CALPaoCjTwySGX9i7uAtCWLKQpmELKP55xDLJhHmUve8ptsfFTw@mail.gmail.com/
Reinette
[2] https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-21 17:58 ` Reinette Chatre
@ 2026-01-30 11:07 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-30 11:07 UTC (permalink / raw)
To: Reinette Chatre, Peter Newman
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, rohit.mathew, scott, sdonthineni, tan.shaopeng, xhao,
catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm
Hi Reinette, Peter,
On 1/21/26 17:58, Reinette Chatre wrote:
> Hi Ben and Peter,
>
> On 1/20/26 7:28 AM, Peter Newman wrote:
>> Hi Ben,
>>
>> On Fri, Jan 16, 2026 at 11:29 AM Ben Horgan <ben.horgan@arm.com> wrote:
>>>
>>> Hi Reinette, Peter,
>>>
>>> On 1/15/26 18:54, Reinette Chatre wrote:
>>>> Hi Ben,
>>>>
>>>> On 1/15/26 7:43 AM, Ben Horgan wrote:
>>>>> On 1/13/26 23:14, Reinette Chatre wrote:
>>>>>> On 1/12/26 8:58 AM, Ben Horgan wrote:
>>>> ...
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Unfortunately, num_rmid doesn't mean anything for
>>>>>>> + * mpam, and its exposed to user-space!
>>>>>>> + *
>>>>>>
>>>>>> The idea of adding a per MON group "num_mon_groups" file has been floated a couple of
>>>>>> times now. I have not heard any objections against doing something like this.
>>>>>> https://lore.kernel.org/all/cbe665c2-fe83-e446-1696-7115c0f9fd76@arm.com/
>>>>>> https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/
>>>>>
>>>>> Hmm, I see now that 'num_rmid' is documented as an upper bound and so
>>>>> neither 1 nor mpam_pmg_max + 1 agrees with the documentation.
>>>>>
>>>>> "
>>>>> "num_rmids":
>>>>> The number of RMIDs available. This is the
>>>>> upper bound for how many "CTRL_MON" + "MON"
>>>>> groups can be created.
>>>>> "
>>>>
>>>> Please note that this documentation has been refactored (without changing its
>>>> meaning). The above quoted text is specific to L3 monitoring and with the
>>>> addition of telemetry monitoring the relevant text now reads:
>>>> The upper bound for how many "CTRL_MON" + "MON" can be created
>>>> is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
>>>>
>>>>>
>>>>> So, if I understand correctly you're proposing setting
>>>>> num_rmids = num_pmg * num_partids on arm platforms and that in the
>>>>> interim this can then be used to calculate the num_pmg by calculating
>>>>> num_closid/num_rmid but that a per CTRL_MON num_mon_groups should be
>>>>> added to make this consistent across architectures?
>>>>
>>>> Yes for num_rmids = num_pmg * num_partids.
>>>
>>> Ok, I don't really see another option.
>>>
>>>> The motivation for this is that to me
>>>> this looks like the value that best matches the num_rmids documentation. I understand
>>>> the RMID vs PMG is difficult so my proposal is certainly not set in stone and I would like to
>>>> hear motivation for different interpretations. "calculating num_pmg" is not obvious
>>>> though. I interpret "num_pmg" here as number of monitor groups per control group and on
>>>> an Arm system this is indeed num_closid/num_rmids (if num_rmids = num_pmg * num_partids)
>>>> but on x86 it is just num_rmids. Having user space depend on such computation to determine how
>>>> many monitor groups per control group would thus require that user space knows whether the
>>>> underlying system is Arm or x86 and would go against goal of having resctrl as a generic interface.
>>>>
>>>> The way forward may be to deprecate (somehow) num_rmids and transition to something
>>>> like "num_mon_groups" but it is currently vague how "num_mon_groups" may look like. That thread
>>>> (https://lore.kernel.org/lkml/46767ca7-1f1b-48e8-8ce6-be4b00d129f9@intel.com/) fizzled
>>>> out after raising a few options how it may look.
>>>>
>>>> Another proposal was to add a "mon_id_includes_control_id" to use as another "guide" to
>>>> determine how many monitoring groups can be created but at the time it seemed an intermediary
>>>> step for user to determine the number of monitor groups that resctrl can also provide.
>>>> https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
>>>
>>> Just thinking about it now but the "mon_id_includes_control_id" option
>>> seems the best to me as it is a single bit option that along with
>>> "num_rmids" lets you know which monitor groups you can create and if
>>> it's sensible to move monitor groups between CTRL MON groups.
>>>
>>> The "num_mon_groups" per CTRL MON group would also need to be
>>> interpreted together with "num_rmid" to know if it is a global or per
>>> CTRL MON upper bound. This option also uses multiple files to give the
>>> same bit of information.
>>>
>>>>
>>>> Making this consistent across architectures is the goal since resctrl aims to be
>>>> a generic interface. Users should not need to do things like infer which system they
>>>> are running on by looking at output of resctrl files as mentioned.
>>>>
>>>> fwiw ... there seems to be a usage by Google to compare num_rmids to num_closids to determine
>>>> how to interact with resctrl:
>>>> https://lore.kernel.org/lkml/CALPaoCgSO7HzK9BjyM8yL50oPyq9kBj64Nkgyo1WEJrWy5uHUg@mail.gmail.com/
>>>
>>> Unfortunately, it looks like we're about to break this heuristic :( At
>>> least, until a way to get this information generically in resctrl is
>>> decided upon.
>>
>> We actually ended up going with the "mon_id_includes_control_id" approach.
>
> Thank you for confirming. I was hoping we could deprecate num_rmids after introducing a
> per resource group file but this does not seem to support all the use cases as highlighted by
> Ben.
>
> As I see it, a name like "mon_id_includes_control_id" also implies that "num_rmids", perhaps
> linked to a new "num_mon_ids" as Peter suggested in [2], should contain num_pmg * num_partids.
>
> One concern from earlier was that "mon_id_includes_control_id" may be used as a
> heuristic for whether monitor groups can be moved or not. Instead I seem to remember that
> there was a plan for MPAM to support moving monitor groups, with the caveat that
> counters will reset for which resctrl may need another flag.
I had a chat offline with James about this. Currently, userspace expects
either the move to succeed with the counters not glitching, or the move
to fail. If we were going to support a monitor move in MPAM with counter
reset (or a best-effort counter value) we would have to make this opt-in
for userspace. If userspace tried the monitor move while being unaware
of the new flag it would unexpectedly lose counter data. To get this
opt-in behaviour there could be a mount option such as
"destructive_monitor_move". Although this was considered in the past,
we're not currently aware of any use case for this destructive monitor
move and so are not proposing adding it or changing the existing
behaviour around this. That doesn't mean a flag indicating whether
monitor move is supported is not useful; a user may want to know that
monitor move is supported without performing one at the current time.
>
>> The property it represents is rather fundamental to what a monitoring
>> group actually is and is a low-level implementation detail that is
>> difficult to hide. Google generally needs support for as many
>> monitoring IDs as jobs it expects to be able to run on a machine, so
>> the number of monitoring groups will be routinely maxed out (and there
>> will be some jobs that are forever stuck in the default group because
>> no RMIDs were free at the time they started[1]).
>>
>> Thanks,
>> -Peter
>>
>> [1] https://lore.kernel.org/lkml/CALPaoCjTwySGX9i7uAtCWLKQpmELKP55xDLJhHmUve8ptsfFTw@mail.gmail.com/
>
> Reinette
>
> [2] https://lore.kernel.org/lkml/CALPaoChad6=xqz+BQQd=dB915xhj1gusmcrS9ya+T2GyhTQc5Q@mail.gmail.com/
Thanks,
Ben
* Re: [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters
2026-01-12 16:58 ` [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters Ben Horgan
2026-01-13 23:14 ` Reinette Chatre
@ 2026-01-30 11:19 ` Ben Horgan
1 sibling, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-30 11:19 UTC (permalink / raw)
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Replying to myself...
On 1/12/26 16:58, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> resctrl exposes a counter via a file named llc_occupancy. This isn't really
> a counter as its value goes up and down, this is a snapshot of the cache
> storage usage monitor.
>
> Add some picking code to find a cache as close as possible to the L3 that
> supports the CSU monitor.
>
> If there is an L3, but it doesn't have any controls, force the L3 resource
> to exist. The existing topology_matches_l3() and
> mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
> even if the class belongs to a later cache.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Co-developed-by: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Dave Martin <dave.martin@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> Allow csu counters however many partid or pmg there are
> else if -> if
> reduce scope of local variables
> drop has_csu
>
> Changes since v2:
> return -> break so works for mbwu in later patch
> add for_each_mpam_resctrl_mon
> return error from mpam_resctrl_monitor_init(). It may fail when is abmc
> allocation introduced in a later patch.
> Squashed in patch from Dave Martin:
> https://lore.kernel.org/lkml/20250820131621.54983-1-Dave.Martin@arm.com/
> ---
> drivers/resctrl/mpam_internal.h | 6 ++
> drivers/resctrl/mpam_resctrl.c | 173 +++++++++++++++++++++++++++++++-
> 2 files changed, 174 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index f89ceaf7623d..21cc776e57aa 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -349,6 +349,12 @@ struct mpam_resctrl_res {
> struct rdt_resource resctrl_res;
> };
>
> +struct mpam_resctrl_mon {
> + struct mpam_class *class;
> +
> + /* per-class data that resctrl needs will live here */
> +};
> +
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> {
> struct mpam_props *cprops = &class->props;
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 7402bf4293b6..5020a5faed96 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -37,6 +37,21 @@ static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
> /* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
> static DEFINE_MUTEX(domain_list_lock);
>
> +/*
> + * The classes we've picked to map to resctrl events.
> + * Resctrl believes all the world's a Xeon, and these are all on the L3. This
> + * array lets us find the actual class backing the event counters. e.g.
> + * the only memory bandwidth counters may be on the memory controller, but to
> + * make use of them, we pretend they are on L3.
> + * Class pointer may be NULL.
> + */
> +static struct mpam_resctrl_mon mpam_resctrl_counters[QOS_NUM_EVENTS];
> +
> +#define for_each_mpam_resctrl_mon(mon, eventid) \
> + for (eventid = 0, mon = &mpam_resctrl_counters[eventid]; \
> + eventid < QOS_NUM_EVENTS; \
> + eventid++, mon = &mpam_resctrl_counters[eventid])
> +
> static bool exposed_alloc_capable;
> static bool exposed_mon_capable;
>
> @@ -259,6 +274,28 @@ static bool class_has_usable_mba(struct mpam_props *cprops)
> return mba_class_use_mbw_max(cprops);
> }
>
> +static bool cache_has_usable_csu(struct mpam_class *class)
> +{
> + struct mpam_props *cprops;
> +
> + if (!class)
> + return false;
> +
> + cprops = &class->props;
> +
> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
> + return false;
> +
> + /*
> + * CSU counters settle on the value, so we can get away with
> + * having only one.
> + */
> + if (!cprops->num_csu_mon)
> + return false;
> +
> + return true;
> +}
> +
> /*
> * Calculate the worst-case percentage change from each implemented step
> * in the control.
> @@ -507,6 +544,64 @@ static void mpam_resctrl_pick_mba(void)
> }
> }
>
> +static void counter_update_class(enum resctrl_event_id evt_id,
> + struct mpam_class *class)
> +{
> + struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
> +
> + if (existing_class) {
> + if (class->level == 3) {
> + pr_debug("Existing class is L3 - L3 wins\n");
> + return;
> + }
> +
> + if (existing_class->level < class->level) {
> + pr_debug("Existing class is closer to L3, %u versus %u - closer is better\n",
> + existing_class->level, class->level);
> + return;
> + }
> + }
> +
> + mpam_resctrl_counters[evt_id].class = class;
> + exposed_mon_capable = true;
> +}
> +
> +static void mpam_resctrl_pick_counters(void)
> +{
> + struct mpam_class *class;
> +
> + lockdep_assert_cpus_held();
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (class->level < 3) {
> + pr_debug("class %u is before L3", class->level);
> + continue;
> + }
> +
> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
> + pr_debug("class %u does not cover all CPUs",
> + class->level);
> + continue;
> + }
> +
> + if (cache_has_usable_csu(class) && topology_matches_l3(class)) {
> + pr_debug("class %u has usable CSU, and matches L3 topology",
> + class->level);
> +
> + /* CSU counters only make sense on a cache. */
> + switch (class->type) {
> + case MPAM_CLASS_CACHE:
> + counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
As the counter is named llc_occupancy (llc = last-level cache) and we
are naming the resource L3, it would be surprising to have llc_occupancy
on anything other than an L3. Also, that L3 should be a last-level
cache. I'll update this as part of a general push to tighten up the
heuristics in this series, to make sure that when more fitting
user-visible interfaces are added to resctrl in the future we are able
to use them rather than being stuck with something that almost fits.
> + break;
> + default:
> + break;
> + }
> + }
> + }
> +}
> +
[...]
Thanks,
Ben
* [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (27 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 28/47] arm_mpam: resctrl: Add support for csu counters Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-15 15:49 ` Peter Newman
2026-01-12 16:58 ` [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors Ben Horgan
` (22 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl has two types of counters, NUMA-local and global. MPAM has only
bandwidth counters, but the position of the MSC may mean it counts
NUMA-local, or global traffic.
But the topology information is not available.
Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
probably NUMA-local. If the memory controller supports bandwidth monitors,
they are probably global.
This also allows us to assert that we don't have the same class backing two
different resctrl events.
Because the class or component backing the event may not be 'the L3', it is
necessary for mpam_resctrl_get_domain_from_cpu() to search the monitor
domains too. This matters the most for 'monitor only' systems, where 'the
L3' control domains may be empty, and the ctrl_comp pointer NULL.
resctrl expects there to be enough monitors for every possible control and
monitor group to have one. Such a system gets called 'free running' as the
monitors can be programmed once and left running. Any other platform will
need to emulate ABMC.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
drop has_mbwu
Changes since v2:
Iterate over mpam_resctrl_dom directly (Jonathan)
Use for_each_mpam_resctrl_mon
---
drivers/resctrl/mpam_internal.h | 8 ++
drivers/resctrl/mpam_resctrl.c | 133 +++++++++++++++++++++++++++++++-
2 files changed, 139 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 21cc776e57aa..1c5492008fe8 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -340,6 +340,14 @@ struct mpam_msc_ris {
struct mpam_resctrl_dom {
struct mpam_component *ctrl_comp;
+
+ /*
+ * There is no single mon_comp because different events may be backed
+ * by different class/components. mon_comp is indexed by the event
+ * number.
+ */
+ struct mpam_component *mon_comp[QOS_NUM_EVENTS];
+
struct rdt_ctrl_domain resctrl_ctrl_dom;
struct rdt_mon_domain resctrl_mon_dom;
};
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 5020a5faed96..14a8dcaf1366 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -68,6 +68,14 @@ static bool cdp_enabled;
static bool cacheinfo_ready;
static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
+/* Whether this num_mbwu_mon could result in a free_running system */
+static int __mpam_monitors_free_running(u16 num_mbwu_mon)
+{
+ if (num_mbwu_mon >= resctrl_arch_system_num_rmid_idx())
+ return resctrl_arch_system_num_rmid_idx();
+ return 0;
+}
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -296,6 +304,26 @@ static bool cache_has_usable_csu(struct mpam_class *class)
return true;
}
+static bool class_has_usable_mbwu(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+ return false;
+
+ /*
+ * resctrl expects the bandwidth counters to be free running,
+ * which means we need as many monitors as resctrl has
+ * control/monitor groups.
+ */
+ if (__mpam_monitors_free_running(cprops->num_mbwu_mon)) {
+ pr_debug("monitors usable in free-running mode\n");
+ return true;
+ }
+
+ return false;
+}
+
/*
* Calculate the worst-case percentage change from each implemented step
* in the control.
@@ -599,7 +627,36 @@ static void mpam_resctrl_pick_counters(void)
break;
}
}
+
+ if (class_has_usable_mbwu(class) && topology_matches_l3(class)) {
+ pr_debug("class %u has usable MBWU, and matches L3 topology",
+ class->level);
+
+ /*
+ * MBWU counters may be 'local' or 'total' depending on
+ * where they are in the topology. Counters on caches
+ * are assumed to be local. If it's on the memory
+ * controller, it's assumed to be global.
+ */
+ switch (class->type) {
+ case MPAM_CLASS_CACHE:
+ counter_update_class(QOS_L3_MBM_LOCAL_EVENT_ID,
+ class);
+ break;
+ case MPAM_CLASS_MEMORY:
+ counter_update_class(QOS_L3_MBM_TOTAL_EVENT_ID,
+ class);
+ break;
+ default:
+ break;
+ }
+ }
}
+
+ /* Allocation of MBWU monitors assumes that the class is unique... */
+ if (mpam_resctrl_counters[QOS_L3_MBM_LOCAL_EVENT_ID].class)
+ WARN_ON_ONCE(mpam_resctrl_counters[QOS_L3_MBM_LOCAL_EVENT_ID].class ==
+ mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
}
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
@@ -942,6 +999,20 @@ static void mpam_resctrl_domain_insert(struct list_head *list,
list_add_tail_rcu(&new->list, pos);
}
+static struct mpam_component *find_component(struct mpam_class *class, int cpu)
+{
+ struct mpam_component *comp;
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(comp, &class->components, class_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (cpumask_test_cpu(cpu, &comp->affinity))
+ return comp;
+ }
+
+ return NULL;
+}
+
static struct mpam_resctrl_dom *
mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
{
@@ -990,8 +1061,33 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
}
if (exposed_mon_capable) {
+ struct mpam_component *any_mon_comp;
+ struct mpam_resctrl_mon *mon;
+ enum resctrl_event_id eventid;
+
+ /*
+ * Even if the monitor domain is backed by a different
+ * component, the L3 component IDs need to be used... only
+ * there may be no ctrl_comp for the L3.
+ * Search each event's class list for a component with
+ * overlapping CPUs and set up the dom->mon_comp array.
+ */
+
+ for_each_mpam_resctrl_mon(mon, eventid) {
+ struct mpam_component *mon_comp;
+
+ if (!mon->class)
+ continue; // dummy resource
+
+ mon_comp = find_component(mon->class, cpu);
+ dom->mon_comp[eventid] = mon_comp;
+ if (mon_comp)
+ any_mon_comp = mon_comp;
+ }
+ WARN_ON_ONCE(!any_mon_comp);
+
mon_d = &dom->resctrl_mon_dom;
- mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
+ mpam_resctrl_domain_hdr_init(cpu, any_mon_comp, &mon_d->hdr);
mon_d->hdr.type = RESCTRL_MON_DOMAIN;
mpam_resctrl_domain_insert(&r->mon_domains, &mon_d->hdr);
err = resctrl_online_mon_domain(r, mon_d);
@@ -1013,6 +1109,35 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
return dom;
}
+/*
+ * We know all the monitors are associated with the L3, even if there are no
+ * controls and therefore no control component. Find the cache-id for the CPU
+ * and use that to search for existing resctrl domains.
+ * This relies on mpam_resctrl_pick_domain_id() using the L3 cache-id
+ * for anything that is not a cache.
+ */
+static struct mpam_resctrl_dom *mpam_resctrl_get_mon_domain_from_cpu(int cpu)
+{
+ u32 cache_id;
+ struct mpam_resctrl_dom *dom;
+ struct mpam_resctrl_res *l3 = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+
+ lockdep_assert_cpus_held();
+
+ if (!l3->class)
+ return NULL;
+ cache_id = get_cpu_cacheinfo_id(cpu, 3);
+ if (cache_id == ~0)
+ return NULL;
+
+ list_for_each_entry_rcu(dom, &l3->resctrl_res.mon_domains, resctrl_mon_dom.hdr.list) {
+ if (dom->resctrl_mon_dom.hdr.id == cache_id)
+ return dom;
+ }
+
+ return NULL;
+}
+
static struct mpam_resctrl_dom *
mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
{
@@ -1026,7 +1151,11 @@ mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
return dom;
}
- return NULL;
+ if (r->rid != RDT_RESOURCE_L3)
+ return NULL;
+
+ /* Search the mon domain list too - needed on monitor only platforms. */
+ return mpam_resctrl_get_mon_domain_from_cpu(cpu);
}
int mpam_resctrl_online_cpu(unsigned int cpu)
--
2.43.0
* Re: [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-12 16:58 ` [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters Ben Horgan
@ 2026-01-15 15:49 ` Peter Newman
2026-01-19 12:04 ` James Morse
0 siblings, 1 reply; 160+ messages in thread
From: Peter Newman @ 2026-01-15 15:49 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan@arm.com> wrote:
>
> From: James Morse <james.morse@arm.com>
>
> resctrl has two types of counters, NUMA-local and global. MPAM has only
> bandwidth counters, but the position of the MSC may mean it counts
> NUMA-local, or global traffic.
>
> But the topology information is not available.
>
> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
> probably NUMA-local. If the memory controller supports bandwidth monitors,
> they are probably global.
Are remote memory accesses not cached? How do we know an MBWU monitor
residing on a cache won't count remote traffic?
Thanks,
-Peter
* Re: [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-15 15:49 ` Peter Newman
@ 2026-01-19 12:04 ` James Morse
2026-01-19 12:47 ` Peter Newman
0 siblings, 1 reply; 160+ messages in thread
From: James Morse @ 2026-01-19 12:04 UTC (permalink / raw)
To: Peter Newman, Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, sdonthineni, tan.shaopeng,
xhao, catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm
Hi Peter,
On 15/01/2026 15:49, Peter Newman wrote:
> On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan@arm.com> wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> resctrl has two types of counters, NUMA-local and global. MPAM has only
>> bandwidth counters, but the position of the MSC may mean it counts
>> NUMA-local, or global traffic.
>>
>> But the topology information is not available.
>>
>> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
>> probably NUMA-local. If the memory controller supports bandwidth monitors,
>> they are probably global.
> Are remote memory accesses not cached? How do we know an MBWU monitor
> residing on a cache won't count remote traffic?
It will, yes you get double counting. Is forbidding both mbm_total and mbm_local preferable?
I think this comes from 'total' in mbm_total not really having the obvious meaning of the
word:
If I have CPUs in NUMA-A and no memory controllers, then NUMA-B has no CPUs, and all the
memory-controllers.
With MPAM: we've only got one bandwidth counter, it doesn't know where the traffic goes
after the MSC. mbm-local on the L3 would reflect all the bandwidth, and mbm-total on the
memory-controllers would have the same number.
I think on x86 mbm_local on the CPUs would read zero as zero traffic went to the 'local'
memory controller, and mbm_total would reflect all the memory bandwidth. (so 'total'
really means 'other')
I think what MPAM is doing here is still useful as a system normally has both CPUs and
memory controllers in the NUMA nodes, and you can use this to spot a control/monitor group
on a NUMA-node that is hammering all the memory (outlier mbm_local), or the same where a
NUMA-node's memory controller is getting hammered by all the NUMA nodes (outlier
mbm_total)
I've not heard of a platform with both memory bandwidth monitors at L3 and the memory
controller, so this may be a theoretical issue.
Shall we only expose one of mbm-local/total to prevent this being seen by user-space?
Thanks,
James
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-19 12:04 ` James Morse
@ 2026-01-19 12:47 ` Peter Newman
2026-01-26 16:00 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Peter Newman @ 2026-01-19 12:47 UTC (permalink / raw)
To: James Morse
Cc: Ben Horgan, amitsinght, baisheng.gao, baolin.wang, carl,
dave.martin, david, dfustini, fenghuay, gshan, jonathan.cameron,
kobak, lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi James,
On Mon, Jan 19, 2026 at 1:04 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Peter,
>
> On 15/01/2026 15:49, Peter Newman wrote:
> > On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan@arm.com> wrote:
> >> From: James Morse <james.morse@arm.com>
> >>
> >> resctrl has two types of counters, NUMA-local and global. MPAM has only
> >> bandwidth counters, but the position of the MSC may mean it counts
> >> NUMA-local, or global traffic.
> >>
> >> But the topology information is not available.
> >>
> >> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
> >> probably NUMA-local. If the memory controller supports bandwidth monitors,
> >> they are probably global.
>
> > Are remote memory accesses not cached? How do we know an MBWU monitor
> > residing on a cache won't count remote traffic?
>
> It will, yes you get double counting. Is forbidding both mbm_total and mbm_local preferable?
>
> I think this comes from 'total' in mbm_total not really having the obvious meaning of the
> word:
> If I have CPUs in NUMA-A and no memory controllers, then NUMA-B has no CPUs, and all the
> memory-controllers.
> With MPAM: we've only got one bandwidth counter, it doesn't know where the traffic goes
> after the MSC. mbm-local on the L3 would reflect all the bandwidth, and mbm-total on the
> memory-controllers would have the same number.
> I think on x86 mbm_local on the CPUs would read zero as zero traffic went to the 'local'
> memory controller, and mbm_total would reflect all the memory bandwidth. (so 'total'
> really means 'other')
Our software is going off the definition from the Intel SDM:
"This event monitors the L3 external bandwidth satisfied by the local
memory. In most platforms that support this event, L3 requests are
likely serviced by a memory system with non-uniform memory
architecture. This allows bandwidth to off-package memory resources to
be tracked by subtracting local from total bandwidth (for instance,
bandwidth over QPI to a memory controller on another physical
> processor could be tracked by subtraction)."
On NUMA-capable hardware that can support this event where all memory
is local, mbm_local == mbm_total, but in practice you can't read them
at the same time from userspace, so if you read mbm_total first,
you'll probably get a small negative result for remote bandwidth.
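The race Peter describes can be sketched in a few lines of C. This is an
illustrative fragment, not driver code: the values stand in for reads of the
resctrl mbm_total_bytes and mbm_local_bytes files, which user-space cannot
sample at the same instant.

```c
/*
 * Sketch (hypothetical values): remote bandwidth derived by the
 * subtraction described above. Because the total and local counters
 * cannot be read at the same instant, the local reading may exceed a
 * slightly-older total reading; clamp the result so a racy pair of
 * reads doesn't report negative remote bandwidth.
 */
static long long remote_bytes(long long mbm_total, long long mbm_local)
{
	long long remote = mbm_total - mbm_local;

	return remote < 0 ? 0 : remote;
}
```

The clamp only hides small negative artifacts from the read ordering; a
persistently large negative result would indicate a real accounting problem.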
>
> I think what MPAM is doing here is still useful as a system normally has both CPUs and
> memory controllers in the NUMA nodes, and you can use this to spot a control/monitor group
> on a NUMA-node that is hammering all the memory (outlier mbm_local), or the same where a
> NUMA-node's memory controller is getting hammered by all the NUMA nodes (outlier
> mbm_total)
>
> I've not heard of a platform with both memory bandwidth monitors at L3 and the memory
> controller, so this may be a theoretical issue.
>
> Shall we only expose one of mbm-local/total to prevent this being seen by user-space?
I believe in the current software design, MPAM is only able to support
mbm_total, as an individual MSC (or class of MSCs with the same
configuration) can't separate traffic by destination, so it must be
the combined value. On a hardware design where MSCs were placed such
that one only counts local traffic and another only counts remote, the
resctrl MPAM driver would have to understand the hardware
configuration well enough to be able to produce counts following
Intel's definition of mbm_local and mbm_total.
Thanks,
-Peter
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-19 12:47 ` Peter Newman
@ 2026-01-26 16:00 ` Ben Horgan
2026-01-30 13:04 ` Peter Newman
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-26 16:00 UTC (permalink / raw)
To: Peter Newman, James Morse
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, sdonthineni, tan.shaopeng,
xhao, catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm
Hi Peter, James,
On 1/19/26 12:47, Peter Newman wrote:
> Hi James,
>
> On Mon, Jan 19, 2026 at 1:04 PM James Morse <james.morse@arm.com> wrote:
>>
>> Hi Peter,
>>
>> On 15/01/2026 15:49, Peter Newman wrote:
>>> On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan@arm.com> wrote:
>>>> From: James Morse <james.morse@arm.com>
>>>>
>>>> resctrl has two types of counters, NUMA-local and global. MPAM has only
>>>> bandwidth counters, but the position of the MSC may mean it counts
>>>> NUMA-local, or global traffic.
>>>>
>>>> But the topology information is not available.
>>>>
>>>> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
>>>> probably NUMA-local. If the memory controller supports bandwidth monitors,
>>>> they are probably global.
>>
>>> Are remote memory accesses not cached? How do we know an MBWU monitor
>>> residing on a cache won't count remote traffic?
>>
>> It will, yes you get double counting. Is forbidding both mbm_total and mbm_local preferable?
>>
>> I think this comes from 'total' in mbm_total not really having the obvious meaning of the
>> word:
>> If I have CPUs in NUMA-A and no memory controllers, then NUMA-B has no CPUs, and all the
>> memory-controllers.
>> With MPAM: we've only got one bandwidth counter, it doesn't know where the traffic goes
>> after the MSC. mbm-local on the L3 would reflect all the bandwidth, and mbm-total on the
>> memory-controllers would have the same number.
>> I think on x86 mbm_local on the CPUs would read zero as zero traffic went to the 'local'
>> memory controller, and mbm_total would reflect all the memory bandwidth. (so 'total'
>> really means 'other')
>
> Our software is going off the definition from the Intel SDM:
>
> "This event monitors the L3 external bandwidth satisfied by the local
> memory. In most platforms that support this event, L3 requests are
> likely serviced by a memory system with non-uniform memory
> architecture. This allows bandwidth to off-package memory resources to
> be tracked by subtracting local from total bandwidth (for instance,
> bandwidth over QPI to a memory controller on another physical
> processor could be tracked by subtraction).
Indeed we should base our discussion on the event definition in the
Intel SDM. For our reference, the description for the external bandwidth
monitoring event (mbm_total) is:
"This event monitors the L3 total external bandwidth to the next level
of the cache hierarchy, including all demand and prefetch misses from
the L3 to the next hierarchy of the memory system. In most platforms,
this represents memory bandwidth."
>
> On NUMA-capable hardware that can support this event where all memory
> is local, mbm_local == mbm_total, but in practice you can't read them
> at the same time from userspace, so if you read mbm_total first,
> you'll probably get a small negative result for remote bandwidth.
>
>>
>> I think what MPAM is doing here is still useful as a system normally has both CPUs and
>> memory controllers in the NUMA nodes, and you can use this to spot a control/monitor group
>> on a NUMA-node that is hammering all the memory (outlier mbm_local), or the same where a
>> NUMA-node's memory controller is getting hammered by all the NUMA nodes (outlier
>> mbm_total)
>>
>> I've not heard of a platform with both memory bandwidth monitors at L3 and the memory
>> controller, so this may be a theoretical issue.
>>
>> Shall we only expose one of mbm-local/total to prevent this being seen by user-space?
>
> I believe in the current software design, MPAM is only able to support
> mbm_total, as an individual MSC (or class of MSCs with the same
> configuration) can't separate traffic by destination, so it must be
> the combined value. On a hardware design where MSCs were placed such
> that one only counts local traffic and another only counts remote, the
> resctrl MPAM driver would have to understand the hardware
> configuration well enough to be able to produce counts following
> Intel's definition of mbm_local and mbm_total.
On a system with MSCs measuring memory bandwidth on the L3 caches, these
MSCs will measure all bandwidth to the next level of the memory hierarchy,
which matches the definition of mbm_total. (We assume any MSC on an L3
is at the egress even though ACPI/DT doesn't distinguish ingress and
egress.)
MSCs on memory controllers don't distinguish which L3 cache the traffic
came from, so unless there is a single L3 we can't use these memory
bandwidth monitors: they count neither mbm_local nor mbm_total. When
there is a single L3 (and no higher-level caches) they would match both
mbm_total and mbm_local.
Hence, I agree we should just use mbm_total, and update the heuristic so
that MSCs at the memory controller are only considered when there are no
higher-level caches and a single L3.
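The proposed heuristic could be sketched roughly as below. The struct and
helper are illustrative stand-ins, not the mpam_resctrl driver's actual
types or API; they only capture the selection rule being agreed here.

```c
#include <stdbool.h>

/* Illustrative stand-in for a class of MSCs with matching placement. */
struct mon_class {
	int cache_level;	/* 3 for an L3 MSC, 0 otherwise */
	bool is_memory;		/* MSC sits on a memory controller */
	bool has_mbwu;		/* class supports bandwidth monitors */
};

/*
 * Sketch of the revised mbm_total selection: an L3 MSC measures all
 * traffic to the next level of the memory hierarchy, so it always
 * qualifies. A memory-controller MSC only qualifies when there is a
 * single L3 and no higher-level caches, since otherwise its counts
 * match neither mbm_local nor mbm_total.
 */
static bool usable_for_mbm_total(const struct mon_class *c,
				 int num_l3_domains, bool has_higher_caches)
{
	if (!c->has_mbwu)
		return false;
	if (!c->is_memory && c->cache_level == 3)
		return true;
	return c->is_memory && num_l3_domains == 1 && !has_higher_caches;
}
```

Under this rule a multi-L3 system with monitors only at the memory
controllers would expose no mbm files at all, which is the case Peter asks
to revisit below if such hardware shows up.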
The introduction of ABMC muddies the waters, as the "event_filter" file
defines the meaning of mbm_local and mbm_total. To handle this file
properly with MPAM, fs/resctrl changes are needed. We could either make
"event_filter" show the bits that correspond to the mbm counter and be
unchangeable, or decouple the "event_filter" part of ABMC from the
counter assignment part. As more work is needed here to avoid breaking
the ABI, I'll drop the ABMC patches from the next respin of this series.
>
> Thanks,
> -Peter
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-26 16:00 ` Ben Horgan
@ 2026-01-30 13:04 ` Peter Newman
2026-01-30 14:38 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Peter Newman @ 2026-01-30 13:04 UTC (permalink / raw)
To: Ben Horgan
Cc: James Morse, amitsinght, baisheng.gao, baolin.wang, carl,
dave.martin, david, dfustini, fenghuay, gshan, jonathan.cameron,
kobak, lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On Mon, Jan 26, 2026 at 5:00 PM Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Peter, James,
>
> On 1/19/26 12:47, Peter Newman wrote:
> > Hi James,
> >
> > On Mon, Jan 19, 2026 at 1:04 PM James Morse <james.morse@arm.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> On 15/01/2026 15:49, Peter Newman wrote:
> >>> On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan@arm.com> wrote:
> >>>> From: James Morse <james.morse@arm.com>
> >>>>
> >>>> resctrl has two types of counters, NUMA-local and global. MPAM has only
> >>>> bandwidth counters, but the position of the MSC may mean it counts
> >>>> NUMA-local, or global traffic.
> >>>>
> >>>> But the topology information is not available.
> >>>>
> >>>> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
> >>>> probably NUMA-local. If the memory controller supports bandwidth monitors,
> >>>> they are probably global.
> >>
> >>> Are remote memory accesses not cached? How do we know an MBWU monitor
> >>> residing on a cache won't count remote traffic?
> >>
> >> It will, yes you get double counting. Is forbidding both mbm_total and mbm_local preferable?
> >>
> >> I think this comes from 'total' in mbm_total not really having the obvious meaning of the
> >> word:
> >> If I have CPUs in NUMA-A and no memory controllers, then NUMA-B has no CPUs, and all the
> >> memory-controllers.
> >> With MPAM: we've only got one bandwidth counter, it doesn't know where the traffic goes
> >> after the MSC. mbm-local on the L3 would reflect all the bandwidth, and mbm-total on the
> >> memory-controllers would have the same number.
> >> I think on x86 mbm_local on the CPUs would read zero as zero traffic went to the 'local'
> >> memory controller, and mbm_total would reflect all the memory bandwidth. (so 'total'
> >> really means 'other')
> >
> > Our software is going off the definition from the Intel SDM:
> >
> > "This event monitors the L3 external bandwidth satisfied by the local
> > memory. In most platforms that support this event, L3 requests are
> > likely serviced by a memory system with non-uniform memory
> > architecture. This allows bandwidth to off-package memory resources to
> > be tracked by subtracting local from total bandwidth (for instance,
> > bandwidth over QPI to a memory controller on another physical
> > processor could be tracked by subtraction)."
>
> Indeed we should base our discussion on the event definition in the
> Intel SDM. For our reference, the description for the external bandwidth
> monitoring event (mbm_total) is:
>
> "This event monitors the L3 total external bandwidth to the next level
> of the cache hierarchy, including all demand and prefetch misses from
> the L3 to the next hierarchy of the memory system. In most platforms,
> this represents memory bandwidth."
>
> >
> > On NUMA-capable hardware that can support this event where all memory
> > is local, mbm_local == mbm_total, but in practice you can't read them
> > at the same time from userspace, so if you read mbm_total first,
> > you'll probably get a small negative result for remote bandwidth.
> >
> >>
> >> I think what MPAM is doing here is still useful as a system normally has both CPUs and
> >> memory controllers in the NUMA nodes, and you can use this to spot a control/monitor group
> >> on a NUMA-node that is hammering all the memory (outlier mbm_local), or the same where a
> >> NUMA-node's memory controller is getting hammered by all the NUMA nodes (outlier
> >> mbm_total)
> >>
> >> I've not heard of a platform with both memory bandwidth monitors at L3 and the memory
> >> controller, so this may be a theoretical issue.
> >>
> >> Shall we only expose one of mbm-local/total to prevent this being seen by user-space?
> >
> > I believe in the current software design, MPAM is only able to support
> > mbm_total, as an individual MSC (or class of MSCs with the same
> > configuration) can't separate traffic by destination, so it must be
> > the combined value. On a hardware design where MSCs were placed such
> > that one only counts local traffic and another only counts remote, the
> > resctrl MPAM driver would have to understand the hardware
> > configuration well enough to be able to produce counts following
> > Intel's definition of mbm_local and mbm_total.
>
> On a system with MSC measuring memory bandwidth on the L3 caches these
> MSC will measure all bandwidth to the next level of the memory hierarchy
> which matches the definition of mbm_total. (We assume any MSC on an L3
> is at the egress even though acpi/dt doesn't distinguish ingress and
> egress.)
>
> For MSC on memory controllers then they don't distinguish which L3 cache
> the traffic came from and so unless there is a single L3 then we can't
> use these memory bandwidth monitors as they count neither mbm_local nor
> mbm_total. When there is a single L3 (and no higher level caches) then
> it would match both mbm_total and mbm_local.
The text you quoted from Intel was in the context of the L3. I assume
if such an event were implemented at a different level of the memory
system, it would continue to refer to downstream bandwidth.
>
> Hence, I agree we should just use mbm_total and update the heuristics
> such that if the MSC are at the memory only consider them if there are
> no higher caches and a single L3.
That should be ok for now. If I see a system where this makes MBWU
counters inaccessible, we'll continue the discussion then.
>
> The introduction of ABMC muddies the waters as the "event_filter" file
> defines the meaning of mbm_local and mbm_total. In order to handle this
> file properly with MPAM, fs/resctrl changes are needed. We could either
> make "event_filter" show the bits that correspond to the mbm counter and
> unchangeable or decouple the "event_filter" part of ABMC from the
> counter assignment part. As more work is needed to not break abi here
> I'll drop the ABMC patches from the next respin of this series.
I would prefer if you can just leave out the event_filter or make it
unconfigurable on MPAM. The rest of the counter assignment seems to
work well.
Longer term, the event_filter interface is supposed to give us the
ability to define and name our own counter events, but we'll have to
find a way past the decision to define the event filters in terms
copy-pasted from an AMD manual.
Thanks,
-Peter
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters
2026-01-30 13:04 ` Peter Newman
@ 2026-01-30 14:38 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-30 14:38 UTC (permalink / raw)
To: Peter Newman
Cc: James Morse, amitsinght, baisheng.gao, baolin.wang, carl,
dave.martin, david, dfustini, fenghuay, gshan, jonathan.cameron,
kobak, lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Peter,
On 1/30/26 13:04, Peter Newman wrote:
> Hi Ben,
>
> On Mon, Jan 26, 2026 at 5:00 PM Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi Peter, James,
>>
>> On 1/19/26 12:47, Peter Newman wrote:
>>> Hi James,
>>>
>>> On Mon, Jan 19, 2026 at 1:04 PM James Morse <james.morse@arm.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> On 15/01/2026 15:49, Peter Newman wrote:
>>>>> On Mon, Jan 12, 2026 at 6:02 PM Ben Horgan <ben.horgan@arm.com> wrote:
>>>>>> From: James Morse <james.morse@arm.com>
>>>>>>
>>>>>> resctrl has two types of counters, NUMA-local and global. MPAM has only
>>>>>> bandwidth counters, but the position of the MSC may mean it counts
>>>>>> NUMA-local, or global traffic.
>>>>>>
>>>>>> But the topology information is not available.
>>>>>>
>>>>>> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
>>>>>> probably NUMA-local. If the memory controller supports bandwidth monitors,
>>>>>> they are probably global.
>>>>
>>>>> Are remote memory accesses not cached? How do we know an MBWU monitor
>>>>> residing on a cache won't count remote traffic?
>>>>
>>>> It will, yes you get double counting. Is forbidding both mbm_total and mbm_local preferable?
>>>>
>>>> I think this comes from 'total' in mbm_total not really having the obvious meaning of the
>>>> word:
>>>> If I have CPUs in NUMA-A and no memory controllers, then NUMA-B has no CPUs, and all the
>>>> memory-controllers.
>>>> With MPAM: we've only got one bandwidth counter, it doesn't know where the traffic goes
>>>> after the MSC. mbm-local on the L3 would reflect all the bandwidth, and mbm-total on the
>>>> memory-controllers would have the same number.
>>>> I think on x86 mbm_local on the CPUs would read zero as zero traffic went to the 'local'
>>>> memory controller, and mbm_total would reflect all the memory bandwidth. (so 'total'
>>>> really means 'other')
>>>
>>> Our software is going off the definition from the Intel SDM:
>>>
>>> "This event monitors the L3 external bandwidth satisfied by the local
>>> memory. In most platforms that support this event, L3 requests are
>>> likely serviced by a memory system with non-uniform memory
>>> architecture. This allows bandwidth to off-package memory resources to
>>> be tracked by subtracting local from total bandwidth (for instance,
>>> bandwidth over QPI to a memory controller on another physical
> >> processor could be tracked by subtraction)."
>>
>> Indeed we should base our discussion on the event definition in the
>> Intel SDM. For our reference, the description for the external bandwidth
>> monitoring event (mbm_total) is:
>>
>> "This event monitors the L3 total external bandwidth to the next level
>> of the cache hierarchy, including all demand and prefetch misses from
>> the L3 to the next hierarchy of the memory system. In most platforms,
>> this represents memory bandwidth."
>>
>>>
>>> On NUMA-capable hardware that can support this event where all memory
>>> is local, mbm_local == mbm_total, but in practice you can't read them
>>> at the same time from userspace, so if you read mbm_total first,
>>> you'll probably get a small negative result for remote bandwidth.
>>>
>>>>
>>>> I think what MPAM is doing here is still useful as a system normally has both CPUs and
>>>> memory controllers in the NUMA nodes, and you can use this to spot a control/monitor group
>>>> on a NUMA-node that is hammering all the memory (outlier mbm_local), or the same where a
>>>> NUMA-node's memory controller is getting hammered by all the NUMA nodes (outlier
>>>> mbm_total)
>>>>
>>>> I've not heard of a platform with both memory bandwidth monitors at L3 and the memory
>>>> controller, so this may be a theoretical issue.
>>>>
>>>> Shall we only expose one of mbm-local/total to prevent this being seen by user-space?
>>>
>>> I believe in the current software design, MPAM is only able to support
>>> mbm_total, as an individual MSC (or class of MSCs with the same
>>> configuration) can't separate traffic by destination, so it must be
>>> the combined value. On a hardware design where MSCs were placed such
>>> that one only counts local traffic and another only counts remote, the
>>> resctrl MPAM driver would have to understand the hardware
>>> configuration well enough to be able to produce counts following
>>> Intel's definition of mbm_local and mbm_total.
>>
>> On a system with MSCs measuring memory bandwidth on the L3 caches, these
>> MSCs will measure all bandwidth to the next level of the memory hierarchy,
>> which matches the definition of mbm_total. (We assume any MSC on an L3
>> is at the egress even though ACPI/DT doesn't distinguish ingress and
>> egress.)
>>
>> MSCs on memory controllers don't distinguish which L3 cache the traffic
>> came from, so unless there is a single L3 we can't use these memory
>> bandwidth monitors: they count neither mbm_local nor mbm_total. When
>> there is a single L3 (and no higher-level caches) they would match both
>> mbm_total and mbm_local.
>
> The text you quoted from Intel was in the context of the L3. I assume
> if such an event were implemented at a different level of the memory
> system, it would continue to refer to downstream bandwidth.
Yes, that does seem reasonable. That cache level would have to match
with what is reported in resctrl too. I expect that would involve adding
a new entry in enum resctrl_scope.
>
>>
>> Hence, I agree we should just use mbm_total, and update the heuristic so
>> that MSCs at the memory controller are only considered when there are no
>> higher-level caches and a single L3.
>
> That should be ok for now. If I see a system where this makes MBWU
> counters inaccessible, we'll continue the discussion then.
Good to know. I'm looking into tightening the heuristics in general.
Please shout if any of the changes in heuristics mean that any hardware
or features stop being usable.
>
>>
>> The introduction of ABMC muddies the waters, as the "event_filter" file
>> defines the meaning of mbm_local and mbm_total. To handle this file
>> properly with MPAM, fs/resctrl changes are needed. We could either make
>> "event_filter" show the bits that correspond to the mbm counter and be
>> unchangeable, or decouple the "event_filter" part of ABMC from the
>> counter assignment part. As more work is needed here to avoid breaking
>> the ABI, I'll drop the ABMC patches from the next respin of this series.
>
> I would prefer if you can just leave out the event_filter or make it
> unconfigurable on MPAM. The rest of the counter assignment seems to
> work well.
If there is an event_filter file, it should show the "correct" values,
so just leaving it out would be the way to go. However, unless I'm
missing something, even this requires changes in fs/resctrl. As such, I
think it's expedient to defer adding ABMC to the series until we have
decided what to do in fs/resctrl.
>
> Longer term, the event_filter interface is supposed to give us the
> ability to define and name our own counter events, but we'll have to
> find a way past the decision to define the event filters in terms
> copy-pasted from an AMD manual.
>
> Thanks,
> -Peter
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (28 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 29/47] arm_mpam: resctrl: Pick classes for use as mbm counters Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 15:10 ` Jonathan Cameron
2026-01-19 11:57 ` Gavin Shan
2026-01-12 16:58 ` [PATCH v3 31/47] arm_mpam: resctrl: Pre-allocate assignable monitors Ben Horgan
` (21 subsequent siblings)
51 siblings, 2 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
When there are enough monitors, the resctrl mbm local and total files can
be exposed. These need all the monitors that resctrl may use to be
allocated up front.
Add helpers to do this.
If a different candidate class is discovered, the old array should be
free'd and the allocated monitors returned to the driver.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Code flow tidying (Jonathan)
---
drivers/resctrl/mpam_internal.h | 8 +++-
drivers/resctrl/mpam_resctrl.c | 81 ++++++++++++++++++++++++++++++++-
2 files changed, 86 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1c5492008fe8..89f9d374ded0 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -360,7 +360,13 @@ struct mpam_resctrl_res {
struct mpam_resctrl_mon {
struct mpam_class *class;
- /* per-class data that resctrl needs will live here */
+ /*
+ * Array of allocated MBWU monitors, indexed by (closid, rmid).
+ * When ABMC is not in use, this array directly maps (closid, rmid)
+ * to the allocated monitor. Otherwise this array is sparse, and
+ * un-assigned (closid, rmid) are -1.
+ */
+ int *mbwu_idx_to_mon;
};
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 14a8dcaf1366..3af12ad77fba 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -572,10 +572,58 @@ static void mpam_resctrl_pick_mba(void)
}
}
+static void __free_mbwu_mon(struct mpam_class *class, int *array,
+ u16 num_mbwu_mon)
+{
+ for (int i = 0; i < num_mbwu_mon; i++) {
+ if (array[i] < 0)
+ continue;
+
+ mpam_free_mbwu_mon(class, array[i]);
+ array[i] = ~0;
+ }
+}
+
+static int __alloc_mbwu_mon(struct mpam_class *class, int *array,
+ u16 num_mbwu_mon)
+{
+ for (int i = 0; i < num_mbwu_mon; i++) {
+ int mbwu_mon = mpam_alloc_mbwu_mon(class);
+
+ if (mbwu_mon < 0) {
+ __free_mbwu_mon(class, array, num_mbwu_mon);
+ return mbwu_mon;
+ }
+ array[i] = mbwu_mon;
+ }
+
+ return 0;
+}
+
+static int *__alloc_mbwu_array(struct mpam_class *class, u16 num_mbwu_mon)
+{
+ int err;
+ size_t array_size = num_mbwu_mon * sizeof(int);
+ int *array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
+
+ if (!array)
+ return ERR_PTR(-ENOMEM);
+
+ memset(array, -1, array_size);
+
+ err = __alloc_mbwu_mon(class, array, num_mbwu_mon);
+ if (err)
+ return ERR_PTR(err);
+ return_ptr(array);
+}
+
static void counter_update_class(enum resctrl_event_id evt_id,
struct mpam_class *class)
{
- struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evt_id];
+ struct mpam_class *existing_class = mon->class;
+ u16 num_mbwu_mon = class->props.num_mbwu_mon;
+ int *new_array, *existing_array = mon->mbwu_idx_to_mon;
if (existing_class) {
if (class->level == 3) {
@@ -590,8 +638,37 @@ static void counter_update_class(enum resctrl_event_id evt_id,
}
}
- mpam_resctrl_counters[evt_id].class = class;
+ pr_debug("Updating event %u to use class %u\n", evt_id, class->level);
+
+ /* Might not need all the monitors */
+ num_mbwu_mon = __mpam_monitors_free_running(num_mbwu_mon);
+
+ if ((evt_id != QOS_L3_OCCUP_EVENT_ID) && num_mbwu_mon) {
+ /*
+ * This is the pre-allocated free-running monitors path. It always
+ * allocates one monitor per PARTID * PMG.
+ */
+ WARN_ON_ONCE(num_mbwu_mon != resctrl_arch_system_num_rmid_idx());
+
+ new_array = __alloc_mbwu_array(class, num_mbwu_mon);
+ if (IS_ERR(new_array)) {
+ pr_debug("Failed to allocate MBWU array\n");
+ return;
+ }
+ mon->mbwu_idx_to_mon = new_array;
+
+ if (existing_array) {
+ pr_debug("Releasing previous class %u's monitors\n",
+ existing_class->level);
+ __free_mbwu_mon(existing_class, existing_array, num_mbwu_mon);
+ kfree(existing_array);
+ }
+ } else if (evt_id != QOS_L3_OCCUP_EVENT_ID) {
+ pr_debug("Not pre-allocating free-running counters\n");
+ }
+
exposed_mon_capable = true;
+ mon->class = class;
}
static void mpam_resctrl_pick_counters(void)
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* Re: [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors
2026-01-12 16:58 ` [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors Ben Horgan
@ 2026-01-13 15:10 ` Jonathan Cameron
2026-01-19 11:57 ` Gavin Shan
1 sibling, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 15:10 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:57 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> From: James Morse <james.morse@arm.com>
>
> When there are enough monitors, the resctrl mbm local and total files can
> be exposed. These need all the monitors that resctrl may use to be
> allocated up front.
>
> Add helpers to do this.
>
> If a different candidate class is discovered, the old array should be
> free'd and the allocated monitors returned to the driver.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors
2026-01-12 16:58 ` [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors Ben Horgan
2026-01-13 15:10 ` Jonathan Cameron
@ 2026-01-19 11:57 ` Gavin Shan
2026-01-19 20:27 ` Ben Horgan
1 sibling, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 11:57 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On 1/13/26 12:58 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> When there are enough monitors, the resctrl mbm local and total files can
> be exposed. These need all the monitors that resctrl may use to be
> allocated up front.
>
> Add helpers to do this.
>
> If a different candidate class is discovered, the old array should be
> free'd and the allocated monitors returned to the driver.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Code flow tidying (Jonathan)
> ---
> drivers/resctrl/mpam_internal.h | 8 +++-
> drivers/resctrl/mpam_resctrl.c | 81 ++++++++++++++++++++++++++++++++-
> 2 files changed, 86 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 1c5492008fe8..89f9d374ded0 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -360,7 +360,13 @@ struct mpam_resctrl_res {
> struct mpam_resctrl_mon {
> struct mpam_class *class;
>
> - /* per-class data that resctrl needs will live here */
> + /*
> + * Array of allocated MBWU monitors, indexed by (closid, rmid).
> + * When ABMC is not in use, this array directly maps (closid, rmid)
> + * to the allocated monitor. Otherwise this array is sparse, and
> + * un-assigned (closid, rmid) are -1.
> + */
> + int *mbwu_idx_to_mon;
> };
>
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 14a8dcaf1366..3af12ad77fba 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -572,10 +572,58 @@ static void mpam_resctrl_pick_mba(void)
> }
> }
>
> +static void __free_mbwu_mon(struct mpam_class *class, int *array,
> + u16 num_mbwu_mon)
> +{
> + for (int i = 0; i < num_mbwu_mon; i++) {
> + if (array[i] < 0)
> + continue;
> +
> + mpam_free_mbwu_mon(class, array[i]);
> + array[i] = ~0;
> + }
> +}
> +
> +static int __alloc_mbwu_mon(struct mpam_class *class, int *array,
> + u16 num_mbwu_mon)
> +{
> + for (int i = 0; i < num_mbwu_mon; i++) {
> + int mbwu_mon = mpam_alloc_mbwu_mon(class);
> +
> + if (mbwu_mon < 0) {
> + __free_mbwu_mon(class, array, num_mbwu_mon);
> + return mbwu_mon;
> + }
> + array[i] = mbwu_mon;
> + }
> +
> + return 0;
> +}
> +
> +static int *__alloc_mbwu_array(struct mpam_class *class, u16 num_mbwu_mon)
> +{
> + int err;
> + size_t array_size = num_mbwu_mon * sizeof(int);
> + int *array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
> +
A warning reported by checkpatch.pl as below.
WARNING: Missing a blank line after declarations
#84: FILE: drivers/resctrl/mpam_resctrl.c:607:
+ size_t array_size = num_mbwu_mon * sizeof(int);
+ int *array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
> + if (!array)
> + return ERR_PTR(-ENOMEM);
> +
> + memset(array, -1, array_size);
> +
> + err = __alloc_mbwu_mon(class, array, num_mbwu_mon);
> + if (err)
> + return ERR_PTR(err);
> + return_ptr(array);
> +}
> +
> static void counter_update_class(enum resctrl_event_id evt_id,
> struct mpam_class *class)
> {
> - struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
> + struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evt_id];
> + struct mpam_class *existing_class = mon->class;
> + u16 num_mbwu_mon = class->props.num_mbwu_mon;
> + int *new_array, *existing_array = mon->mbwu_idx_to_mon;
>
> if (existing_class) {
> if (class->level == 3) {
> @@ -590,8 +638,37 @@ static void counter_update_class(enum resctrl_event_id evt_id,
> }
> }
>
> - mpam_resctrl_counters[evt_id].class = class;
> + pr_debug("Updating event %u to use class %u\n", evt_id, class->level);
> +
> + /* Might not need all the monitors */
> + num_mbwu_mon = __mpam_monitors_free_running(num_mbwu_mon);
> +
> + if ((evt_id != QOS_L3_OCCUP_EVENT_ID) && num_mbwu_mon) {
> + /*
> + * This is the pre-allocated free-running monitors path. It always
> + * allocates one monitor per PARTID * PMG.
> + */
> + WARN_ON_ONCE(num_mbwu_mon != resctrl_arch_system_num_rmid_idx());
> +
> + new_array = __alloc_mbwu_array(class, num_mbwu_mon);
> + if (IS_ERR(new_array)) {
> + pr_debug("Failed to allocate MBWU array\n");
> + return;
> + }
> + mon->mbwu_idx_to_mon = new_array;
> +
> + if (existing_array) {
> + pr_debug("Releasing previous class %u's monitors\n",
> + existing_class->level);
> + __free_mbwu_mon(existing_class, existing_array, num_mbwu_mon);
> + kfree(existing_array);
> + }
> + } else if (evt_id != QOS_L3_OCCUP_EVENT_ID) {
> + pr_debug("Not pre-allocating free-running counters\n");
> + }
> +
> exposed_mon_capable = true;
> + mon->class = class;
> }
>
> static void mpam_resctrl_pick_counters(void)
Thanks,
Gavin
* Re: [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors
2026-01-19 11:57 ` Gavin Shan
@ 2026-01-19 20:27 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 20:27 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 11:57, Gavin Shan wrote:
> Hi Ben,
>
> On 1/13/26 12:58 AM, Ben Horgan wrote:
>> From: James Morse <james.morse@arm.com>
>>
>> When there are enough monitors, the resctrl mbm local and total files can
>> be exposed. These need all the monitors that resctrl may use to be
>> allocated up front.
>>
>> Add helpers to do this.
>>
>> If a different candidate class is discovered, the old array should be
>> free'd and the allocated monitors returned to the driver.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> +
>> +static int *__alloc_mbwu_array(struct mpam_class *class, u16
>> num_mbwu_mon)
>> +{
>> + int err;
>> + size_t array_size = num_mbwu_mon * sizeof(int);
>> + int *array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
>> +
>
> A warning reported by checkpatch.pl as below.
>
> WARNING: Missing a blank line after declarations
> #84: FILE: drivers/resctrl/mpam_resctrl.c:607:
> + size_t array_size = num_mbwu_mon * sizeof(int);
> + int *array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
>
Similarly to the other blank-line checkpatch.pl warning, I expect this is
due to how checkpatch.pl handles the __free() annotation. I'm not intending
to change this code unless there is some style guideline that I've missed
or some other reason to do so.
Thanks,
Ben
* [PATCH v3 31/47] arm_mpam: resctrl: Pre-allocate assignable monitors
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (29 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 30/47] arm_mpam: resctrl: Pre-allocate free running monitors Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-16 10:34 ` Shaopeng Tan (Fujitsu)
2026-01-12 16:58 ` [PATCH v3 32/47] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions Ben Horgan
` (20 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
When there are not enough monitors, MPAM is able to emulate ABMC by making
a smaller number of monitors assignable. These monitors still need to be
allocated from the driver, and mapped to whichever control/monitor group
resctrl wants to use them with.
Add a second array to hold the monitor values indexed by resctrl's cntr_id.
When CDP is in use, two monitors are needed so the available number of
counters halves. Platforms with one monitor will have zero monitors when
CDP is in use.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Move __free
kmalloc -> kcalloc
Changes since v2:
kcalloc -> kmalloc_array
sizeof(*rmid_array)
Return error from resctrl_arch_mbm_cntr_assign_set()
---
drivers/resctrl/mpam_internal.h | 7 +++
drivers/resctrl/mpam_resctrl.c | 97 +++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 89f9d374ded0..58076abd5de3 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -367,6 +367,13 @@ struct mpam_resctrl_mon {
* un-assigned (closid, rmid) are -1.
*/
int *mbwu_idx_to_mon;
+
+ /*
+ * Array of assigned MBWU monitors, indexed by idx argument.
+ * When ABMC is not in use, this array can be NULL. Otherwise
+ * it maps idx to the allocated monitor.
+ */
+ int *assigned_counters;
};
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 3af12ad77fba..2a9efc6c6fae 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -68,6 +68,12 @@ static bool cdp_enabled;
static bool cacheinfo_ready;
static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
+/*
+ * L3 local/total may come from different classes - what is the number of MBWU
+ * 'on L3'?
+ */
+static unsigned int l3_num_allocated_mbwu = ~0;
+
/* Whether this num_mbw_mon could result in a free_running system */
static int __mpam_monitors_free_running(u16 num_mbwu_mon)
{
@@ -76,6 +82,15 @@ static int __mpam_monitors_free_running(u16 num_mbwu_mon)
return 0;
}
+/*
+ * If l3_num_allocated_mbwu is forced below PARTID * PMG, then the counters
+ * are not free running, and ABMC's user-interface must be used to assign them.
+ */
+static bool mpam_resctrl_abmc_enabled(void)
+{
+ return l3_num_allocated_mbwu < resctrl_arch_system_num_rmid_idx();
+}
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -120,9 +135,26 @@ static void resctrl_reset_task_closids(void)
read_unlock(&tasklist_lock);
}
+static void mpam_resctrl_monitor_sync_abmc_vals(struct rdt_resource *l3)
+{
+ l3->mon.num_mbm_cntrs = l3_num_allocated_mbwu;
+ if (cdp_enabled)
+ l3->mon.num_mbm_cntrs /= 2;
+
+ if (l3->mon.num_mbm_cntrs) {
+ l3->mon.mbm_cntr_assignable = mpam_resctrl_abmc_enabled();
+ l3->mon.mbm_assign_on_mkdir = mpam_resctrl_abmc_enabled();
+ } else {
+ l3->mon.mbm_cntr_assignable = false;
+ l3->mon.mbm_assign_on_mkdir = false;
+ }
+}
+
int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
{
u32 partid_i = RESCTRL_RESERVED_CLOSID, partid_d = RESCTRL_RESERVED_CLOSID;
+ struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ struct rdt_resource *l3 = &res->resctrl_res;
cdp_enabled = enable;
@@ -138,6 +170,7 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
resctrl_reset_task_closids();
+ mpam_resctrl_monitor_sync_abmc_vals(l3);
return 0;
}
@@ -321,6 +354,11 @@ static bool class_has_usable_mbwu(struct mpam_class *class)
return true;
}
+ if (cprops->num_mbwu_mon) {
+ pr_debug("monitors usable via ABMC assignment\n");
+ return true;
+ }
+
return false;
}
@@ -597,6 +635,8 @@ static int __alloc_mbwu_mon(struct mpam_class *class, int *array,
array[i] = mbwu_mon;
}
+ l3_num_allocated_mbwu = min(l3_num_allocated_mbwu, num_mbwu_mon);
+
return 0;
}
@@ -736,6 +776,19 @@ static void mpam_resctrl_pick_counters(void)
mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
}
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
+ return false;
+
+ return mpam_resctrl_abmc_enabled();
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ return -EINVAL;
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
{
struct mpam_class *class = res->class;
@@ -811,6 +864,42 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+/*
+ * This must run after all event counters have been picked so that any free
+ * running counters have already been allocated.
+ */
+static int mpam_resctrl_monitor_init_abmc(struct mpam_resctrl_mon *mon)
+{
+ struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ struct rdt_resource *l3 = &res->resctrl_res;
+ struct mpam_class *class = mon->class;
+ u16 num_mbwu_mon;
+
+ if (mon->mbwu_idx_to_mon) {
+ pr_debug("monitors free running\n");
+ return 0;
+ }
+
+ int *rmid_array __free(kfree) =
+ kmalloc_array(resctrl_arch_system_num_rmid_idx(), sizeof(*rmid_array), GFP_KERNEL);
+
+ if (!rmid_array) {
+ pr_debug("Failed to allocate RMID array\n");
+ return -ENOMEM;
+ }
+ memset(rmid_array, -1, resctrl_arch_system_num_rmid_idx() * sizeof(int));
+
+ num_mbwu_mon = class->props.num_mbwu_mon;
+ mon->assigned_counters = __alloc_mbwu_array(mon->class, num_mbwu_mon);
+ if (IS_ERR(mon->assigned_counters))
+ return PTR_ERR(mon->assigned_counters);
+ mon->mbwu_idx_to_mon = no_free_ptr(rmid_array);
+
+ mpam_resctrl_monitor_sync_abmc_vals(l3);
+
+ return 0;
+}
+
static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
enum resctrl_event_id type)
{
@@ -857,6 +946,14 @@ static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
* group can be allocated freely:
*/
l3->mon.num_rmid = mpam_pmg_max + 1;
+
+ switch (type) {
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return mpam_resctrl_monitor_init_abmc(mon);
+ default:
+ return 0;
+ }
}
return 0;
--
2.43.0
* Re: [PATCH v3 31/47] arm_mpam: resctrl: Pre-allocate assignable monitors
2026-01-12 16:58 ` [PATCH v3 31/47] arm_mpam: resctrl: Pre-allocate assignable monitors Ben Horgan
@ 2026-01-16 10:34 ` Shaopeng Tan (Fujitsu)
2026-01-16 11:04 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2026-01-16 10:34 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hello Ben,
> From: James Morse <james.morse@arm.com>
>
> When there are not enough monitors, MPAM is able to emulate ABMC by making
> a smaller number of monitors assignable. These monitors still need to be
> allocated from the driver, and mapped to whichever control/monitor group
> resctrl wants to use them with.
>
> Add a second array to hold the monitor values indexed by resctrl's cntr_id.
>
> When CDP is in use, two monitors are needed so the available number of
> counters halves. Platforms with one monitor will have zero monitors when
> CDP is in use.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> Move __free
> kmalloc -> kcalloc
>
> Changes since v2:
> kcalloc -> kmalloc_array
> sizeof(*rmid_array)
> Return error from resctrl_arch_mbm_cntr_assign_set()
> ---
> drivers/resctrl/mpam_internal.h | 7 +++
> drivers/resctrl/mpam_resctrl.c | 97 +++++++++++++++++++++++++++++++++
> 2 files changed, 104 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 89f9d374ded0..58076abd5de3 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -367,6 +367,13 @@ struct mpam_resctrl_mon {
> * un-assigned (closid, rmid) are -1.
> */
> int *mbwu_idx_to_mon;
> +
> + /*
> + * Array of assigned MBWU monitors, indexed by idx argument.
> + * When ABMC is not in use, this array can be NULL. Otherwise
> + * it maps idx to the allocated monitor.
> + */
> + int *assigned_counters;
> };
>
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 3af12ad77fba..2a9efc6c6fae 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -68,6 +68,12 @@ static bool cdp_enabled;
> static bool cacheinfo_ready;
> static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
>
> +/*
> + * L3 local/total may come from different classes - what is the number of MBWU
> + * 'on L3'?
> + */
> +static unsigned int l3_num_allocated_mbwu = ~0;
> +
> /* Whether this num_mbw_mon could result in a free_running system */
> static int __mpam_monitors_free_running(u16 num_mbwu_mon)
> {
> @@ -76,6 +82,15 @@ static int __mpam_monitors_free_running(u16 num_mbwu_mon)
> return 0;
> }
>
> +/*
> + * If l3_num_allocated_mbwu is forced below PARTID * PMG, then the counters
> + * are not free running, and ABMC's user-interface must be used to assign them.
> + */
> +static bool mpam_resctrl_abmc_enabled(void)
> +{
> + return l3_num_allocated_mbwu < resctrl_arch_system_num_rmid_idx();
> +}
> +
> bool resctrl_arch_alloc_capable(void)
> {
> return exposed_alloc_capable;
> @@ -120,9 +135,26 @@ static void resctrl_reset_task_closids(void)
> read_unlock(&tasklist_lock);
> }
>
> +static void mpam_resctrl_monitor_sync_abmc_vals(struct rdt_resource *l3)
> +{
> + l3->mon.num_mbm_cntrs = l3_num_allocated_mbwu;
> + if (cdp_enabled)
> + l3->mon.num_mbm_cntrs /= 2;
> +
> + if (l3->mon.num_mbm_cntrs) {
> + l3->mon.mbm_cntr_assignable = mpam_resctrl_abmc_enabled();
> + l3->mon.mbm_assign_on_mkdir = mpam_resctrl_abmc_enabled();
> + } else {
> + l3->mon.mbm_cntr_assignable = false;
> + l3->mon.mbm_assign_on_mkdir = false;
> + }
> +}
> +
> int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
> {
> u32 partid_i = RESCTRL_RESERVED_CLOSID, partid_d = RESCTRL_RESERVED_CLOSID;
> + struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
> + struct rdt_resource *l3 = &res->resctrl_res;
>
> cdp_enabled = enable;
>
> @@ -138,6 +170,7 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
> WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
>
> resctrl_reset_task_closids();
> + mpam_resctrl_monitor_sync_abmc_vals(l3);
>
> return 0;
> }
> @@ -321,6 +354,11 @@ static bool class_has_usable_mbwu(struct mpam_class *class)
> return true;
> }
>
> + if (cprops->num_mbwu_mon) {
> + pr_debug("monitors usable via ABMC assignment\n");
> + return true;
> + }
> +
> return false;
> }
>
> @@ -597,6 +635,8 @@ static int __alloc_mbwu_mon(struct mpam_class *class, int *array,
> array[i] = mbwu_mon;
> }
>
> + l3_num_allocated_mbwu = min(l3_num_allocated_mbwu, num_mbwu_mon);
> +
> return 0;
> }
>
> @@ -736,6 +776,19 @@ static void mpam_resctrl_pick_counters(void)
> mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
> }
>
> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
> +{
> + if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
> + return false;
> +
> + return mpam_resctrl_abmc_enabled();
> +}
> +
> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
> +{
> + return -EINVAL;
> +}
$ echo "default" | sudo tee /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
default
tee: /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:Invalid argument
"return -EOPNOTSUPP;" might be better.
Best regards,
Shaopeng TAN
> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
> {
> struct mpam_class *class = res->class;
> @@ -811,6 +864,42 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> return comp->comp_id;
> }
>
> +/*
> + * This must run after all event counters have been picked so that any free
> + * running counters have already been allocated.
> + */
> +static int mpam_resctrl_monitor_init_abmc(struct mpam_resctrl_mon *mon)
> +{
> + struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
> + struct rdt_resource *l3 = &res->resctrl_res;
> + struct mpam_class *class = mon->class;
> + u16 num_mbwu_mon;
> +
> + if (mon->mbwu_idx_to_mon) {
> + pr_debug("monitors free running\n");
> + return 0;
> + }
> +
> + int *rmid_array __free(kfree) =
> + kmalloc_array(resctrl_arch_system_num_rmid_idx(), sizeof(*rmid_array), GFP_KERNEL);
> +
> + if (!rmid_array) {
> + pr_debug("Failed to allocate RMID array\n");
> + return -ENOMEM;
> + }
> + memset(rmid_array, -1, resctrl_arch_system_num_rmid_idx() * sizeof(int));
> +
> + num_mbwu_mon = class->props.num_mbwu_mon;
> + mon->assigned_counters = __alloc_mbwu_array(mon->class, num_mbwu_mon);
> + if (IS_ERR(mon->assigned_counters))
> + return PTR_ERR(mon->assigned_counters);
> + mon->mbwu_idx_to_mon = no_free_ptr(rmid_array);
> +
> + mpam_resctrl_monitor_sync_abmc_vals(l3);
> +
> + return 0;
> +}
> +
> static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
> enum resctrl_event_id type)
> {
> @@ -857,6 +946,14 @@ static int mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
> * group can be allocated freely:
> */
> l3->mon.num_rmid = mpam_pmg_max + 1;
> +
> + switch (type) {
> + case QOS_L3_MBM_LOCAL_EVENT_ID:
> + case QOS_L3_MBM_TOTAL_EVENT_ID:
> + return mpam_resctrl_monitor_init_abmc(mon);
> + default:
> + return 0;
> + }
> }
>
> return 0;
> --
> 2.43.0
* Re: [PATCH v3 31/47] arm_mpam: resctrl: Pre-allocate assignable monitors
2026-01-16 10:34 ` Shaopeng Tan (Fujitsu)
@ 2026-01-16 11:04 ` Ben Horgan
2026-01-19 20:34 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-16 11:04 UTC (permalink / raw)
To: Shaopeng Tan (Fujitsu)
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hi Shaopeng,
On 1/16/26 10:34, Shaopeng Tan (Fujitsu) wrote:
> Hello Ben,
>
>> From: James Morse <james.morse@arm.com>
>>
>> When there are not enough monitors, MPAM is able to emulate ABMC by making
>> a smaller number of monitors assignable. These monitors still need to be
>> allocated from the driver, and mapped to whichever control/monitor group
>> resctrl wants to use them with.
>>
>> Add a second array to hold the monitor values indexed by resctrl's cntr_id.
>>
>> When CDP is in use, two monitors are needed so the available number of
>> counters halves. Platforms with one monitor will have zero monitors when
>> CDP is in use.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
[...]
>> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>> +{
>> + if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
>> + return false;
>> +
>> + return mpam_resctrl_abmc_enabled();
>> +}
>> +
>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>> +{
>> + return -EINVAL;
>> +}
>
> $ echo "default" | sudo tee /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> default
> tee: /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:Invalid argument
> "return -EOPNOTSUPP;" might be better.
I'll keep this as -EINVAL in the case when 'enable' doesn't match the current mode
and change it to return 0 when there is nothing to change. This will match the
behaviour once this is handled in resctrl. See the outcome of the discussion on
this resctrl patch [1]. Note that I haven't yet updated the patch to match the
discussion. Any objection?
[1] https://lore.kernel.org/lkml/bf8bb682-6a4d-4f39-916c-952719fcf48d@arm.com/
Thanks,
Ben
2026-01-16 11:04 ` Ben Horgan
@ 2026-01-19 20:34 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 20:34 UTC (permalink / raw)
To: Shaopeng Tan (Fujitsu)
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hi Shaopeng,
On 1/16/26 11:04, Ben Horgan wrote:
> Hi Shaopeng,
>
> On 1/16/26 10:34, Shaopeng Tan (Fujitsu) wrote:
>> Hello Ben,
>>
>>> From: James Morse <james.morse@arm.com>
>>>
>>> When there are not enough monitors, MPAM is able to emulate ABMC by making
>>> a smaller number of monitors assignable. These monitors still need to be
>>> allocated from the driver, and mapped to whichever control/monitor group
>>> resctrl wants to use them with.
>>>
>>> Add a second array to hold the monitor values indexed by resctrl's cntr_id.
>>>
>>> When CDP is in use, two monitors are needed so the available number of
>>> counters halves. Platforms with one monitor will have zero monitors when
>>> CDP is in use.
>>>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Signed-off-by: James Morse <james.morse@arm.com>
>>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> [...]
>>> +bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
>>> +{
>>> + if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
>>> + return false;
>>> +
>>> + return mpam_resctrl_abmc_enabled();
>>> +}
>>> +
>>> +int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
>>> +{
>>> + return -EINVAL;
>>> +}
>>
>> $ echo "default" | sudo tee /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> default
>> tee: /sys/fs/resctrl/info/L3_MON/mbm_assign_mode:Invalid argument
>> "return -EOPNOTSUPP;" might be better.
>
> I'll keep this as -EINVAL in the case when 'enable' doesn't match the current mode
> and change it to return 0 when there is nothing to change. This will match the
Actually, resctrl_arch_mbm_cntr_assign_set() only gets called on an attempt
to change the value, so I'll keep it as is.
> behaviour once this is handled in resctrl. See the outcome of the discussion on
> this resctrl patch [1]. Note that I haven't yet updated the patch to match the
> discussion. Any objection?
>
> [1] https://lore.kernel.org/lkml/bf8bb682-6a4d-4f39-916c-952719fcf48d@arm.com/
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
* [PATCH v3 32/47] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (30 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 31/47] arm_mpam: resctrl: Pre-allocate assignable monitors Ben Horgan
@ 2026-01-12 16:58 ` Ben Horgan
2026-01-13 15:26 ` Jonathan Cameron
2026-01-12 16:59 ` [PATCH v3 33/47] arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use Ben Horgan
` (19 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:58 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
ABMC exposes a fun corner case where a platform with one monitor can use
ABMC for assignable counters - but not when CDP is enabled.
Add some tests.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/test_mpam_resctrl.c | 62 +++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/drivers/resctrl/test_mpam_resctrl.c b/drivers/resctrl/test_mpam_resctrl.c
index a20da161d965..fd63f7bbb147 100644
--- a/drivers/resctrl/test_mpam_resctrl.c
+++ b/drivers/resctrl/test_mpam_resctrl.c
@@ -344,6 +344,67 @@ static void test_rmid_idx_encoding(struct kunit *test)
mpam_pmg_max = orig_mpam_pmg_max;
}
+static void test_num_assignable_counters(struct kunit *test)
+{
+ unsigned int orig_l3_num_allocated_mbwu = l3_num_allocated_mbwu;
+ u32 orig_mpam_partid_max = mpam_partid_max;
+ u32 orig_mpam_pmg_max = mpam_pmg_max;
+ bool orig_cdp_enabled = cdp_enabled;
+ struct rdt_resource fake_l3;
+
+ /* Force there to be some PARTID/PMG */
+ mpam_partid_max = 3;
+ mpam_pmg_max = 1;
+
+ cdp_enabled = false;
+
+ /* ABMC off, CDP off */
+ l3_num_allocated_mbwu = resctrl_arch_system_num_rmid_idx();
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, resctrl_arch_system_num_rmid_idx());
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* ABMC on, CDP off */
+ l3_num_allocated_mbwu = 4;
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, 4);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ cdp_enabled = true;
+
+ /* ABMC off, CDP on */
+ l3_num_allocated_mbwu = resctrl_arch_system_num_rmid_idx();
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+
+ /* (value not consumed by resctrl) */
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, resctrl_arch_system_num_rmid_idx() / 2);
+
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* ABMC on, CDP on */
+ l3_num_allocated_mbwu = 4;
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, 2);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* ABMC 'on', CDP on - but not enough counters */
+ l3_num_allocated_mbwu = 1;
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, 0);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* Restore global variables that were messed with */
+ l3_num_allocated_mbwu = orig_l3_num_allocated_mbwu;
+ mpam_partid_max = orig_mpam_partid_max;
+ mpam_pmg_max = orig_mpam_pmg_max;
+ cdp_enabled = orig_cdp_enabled;
+}
+
static struct kunit_case mpam_resctrl_test_cases[] = {
KUNIT_CASE(test_get_mba_granularity),
KUNIT_CASE_PARAM(test_mbw_max_to_percent, test_percent_value_gen_params),
@@ -353,6 +414,7 @@ static struct kunit_case mpam_resctrl_test_cases[] = {
KUNIT_CASE_PARAM(test_percent_max_roundtrip_stability,
test_all_bwa_wd_gen_params),
KUNIT_CASE_PARAM(test_rmid_idx_encoding, test_rmid_idx_gen_params),
+ KUNIT_CASE(test_num_assignable_counters),
{}
};
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 32/47] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions
2026-01-12 16:58 ` [PATCH v3 32/47] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions Ben Horgan
@ 2026-01-13 15:26 ` Jonathan Cameron
0 siblings, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 15:26 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:59 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> From: James Morse <james.morse@arm.com>
>
> ABMC exposes a fun corner case where a platform with one monitor can use
> ABMC for assignable counters - but not when CDP is enabled.
>
> Add some tests.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 33/47] arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (31 preceding siblings ...)
2026-01-12 16:58 ` [PATCH v3 32/47] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 34/47] arm_mpam: resctrl: Allow resctrl to allocate monitors Ben Horgan
` (18 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
ABMC has a helper resctrl_arch_config_cntr() for changing the mapping
between 'cntr_id' and a CLOSID/RMID pair.
Add the helper.
For MPAM this is done by updating the mon->mbwu_idx_to_mon[] array, and as
usual CDP means it needs doing in three different ways.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 37 ++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 2a9efc6c6fae..9198af3221d5 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -776,6 +776,43 @@ static void mpam_resctrl_pick_counters(void)
mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
}
+static void __config_cntr(struct mpam_resctrl_mon *mon, u32 cntr_id,
+ enum resctrl_conf_type cdp_type, u32 closid, u32 rmid,
+ bool assign)
+{
+ u32 mbwu_idx, mon_idx = resctrl_get_config_index(cntr_id, cdp_type);
+
+ closid = resctrl_get_config_index(closid, cdp_type);
+ mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ WARN_ON_ONCE(mon_idx > l3_num_allocated_mbwu);
+
+ if (assign)
+ mon->mbwu_idx_to_mon[mbwu_idx] = mon->assigned_counters[mon_idx];
+ else
+ mon->mbwu_idx_to_mon[mbwu_idx] = -1;
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+
+ if (!mon->mbwu_idx_to_mon || !mon->assigned_counters) {
+ pr_debug("monitor arrays not allocated\n");
+ return;
+ }
+
+ if (cdp_enabled) {
+ __config_cntr(mon, cntr_id, CDP_CODE, closid, rmid, assign);
+ __config_cntr(mon, cntr_id, CDP_DATA, closid, rmid, assign);
+ } else {
+ __config_cntr(mon, cntr_id, CDP_NONE, closid, rmid, assign);
+ }
+
+ resctrl_arch_reset_rmid(r, d, closid, rmid, evtid);
+}
+
bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
{
if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH v3 34/47] arm_mpam: resctrl: Allow resctrl to allocate monitors
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (32 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 33/47] arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 35/47] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() Ben Horgan
` (17 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs to allocate
a monitor on the corresponding resource. Monitors are allocated by class
instead of component.
MBM monitors are much more complicated, if there are enough monitors, they
will be pre-allocated and free-running. If ABMC is in use instead then
'some' are pre-allocated in a different way, and need assigning.
Add helpers to allocate a CSU monitor. These helpers return an out-of-range
value for MBM counters.
Allocating a montitor context is expected to block until hardware resources
become available. This only makes sense for QOS_L3_OCCUP as unallocated MBM
counters are losing data.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
USE_RMID_IDX -> USE_PRE_ALLOCATED in comment
Remove unnecessary arch_mon_ctx = NULL
Changes since v2:
Add include of resctrl_types.h as dropped from earlier patch
---
drivers/resctrl/mpam_internal.h | 14 ++++++-
drivers/resctrl/mpam_resctrl.c | 67 +++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 5 +++
3 files changed, 85 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 58076abd5de3..8f73528414af 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -29,6 +29,14 @@ struct platform_device;
#define PACKED_FOR_KUNIT
#endif
+/*
+ * This 'mon' value must not alias an actual monitor, so must be larger than
+ * U16_MAX, but not be confused with an errno value, so smaller than
+ * (u32)-SZ_4K.
+ * USE_PRE_ALLOCATED is used to avoid confusion with an actual monitor.
+ */
+#define USE_PRE_ALLOCATED (U16_MAX + 1)
+
static inline bool mpam_is_enabled(void)
{
return static_branch_likely(&mpam_enabled);
@@ -216,7 +224,11 @@ enum mon_filter_options {
};
struct mon_cfg {
- u16 mon;
+ /*
+ * mon must be large enough to hold out of range values like
+ * USE_PRE_ALLOCATED
+ */
+ u32 mon;
u8 pmg;
bool match_pmg;
bool csu_exclude_clean;
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 9198af3221d5..c01419c1a381 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -22,6 +22,8 @@
#include "mpam_internal.h"
+DECLARE_WAIT_QUEUE_HEAD(resctrl_mon_ctx_waiters);
+
/*
* The classes we've picked to map to resctrl resources, wrapped
* in with their resctrl structure.
@@ -293,6 +295,71 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
return &mpam_resctrl_controls[l].resctrl_res;
}
+static int resctrl_arch_mon_ctx_alloc_no_wait(enum resctrl_event_id evtid)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+
+ if (!mon->class)
+ return -EINVAL;
+
+ switch (evtid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ /* With CDP, one monitor gets used for both code/data reads */
+ return mpam_alloc_csu_mon(mon->class);
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return USE_PRE_ALLOCATED;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+ enum resctrl_event_id evtid)
+{
+ DEFINE_WAIT(wait);
+ int *ret;
+
+ ret = kmalloc(sizeof(*ret), GFP_KERNEL);
+ if (!ret)
+ return ERR_PTR(-ENOMEM);
+
+ do {
+ prepare_to_wait(&resctrl_mon_ctx_waiters, &wait,
+ TASK_INTERRUPTIBLE);
+ *ret = resctrl_arch_mon_ctx_alloc_no_wait(evtid);
+ if (*ret == -ENOSPC)
+ schedule();
+ } while (*ret == -ENOSPC && !signal_pending(current));
+ finish_wait(&resctrl_mon_ctx_waiters, &wait);
+
+ return ret;
+}
+
+static void resctrl_arch_mon_ctx_free_no_wait(enum resctrl_event_id evtid,
+ u32 mon_idx)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+
+ if (!mon->class)
+ return;
+
+ if (evtid == QOS_L3_OCCUP_EVENT_ID)
+ mpam_free_csu_mon(mon->class, mon_idx);
+
+ wake_up(&resctrl_mon_ctx_waiters);
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+ enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+ u32 mon_idx = *(u32 *)arch_mon_ctx;
+
+ kfree(arch_mon_ctx);
+
+ resctrl_arch_mon_ctx_free_no_wait(evtid, mon_idx);
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 7d23c90f077d..e1461e32af75 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -5,6 +5,7 @@
#define __LINUX_ARM_MPAM_H
#include <linux/acpi.h>
+#include <linux/resctrl_types.h>
#include <linux/types.h>
struct mpam_msc;
@@ -62,6 +63,10 @@ u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
u32 resctrl_arch_system_num_rmid_idx(void);
+struct rdt_resource;
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid, void *ctx);
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH v3 35/47] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (33 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 34/47] arm_mpam: resctrl: Allow resctrl to allocate monitors Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 36/47] arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr() Ben Horgan
` (16 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation means
the counter may need reading in three different ways. The same goes for
reset.
The helpers behind the resctrl_arch_ functions will be re-used for the ABMC
equivalent functions.
Add the rounding helper for checking monitor values while we're here.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
cfg initialisation style
code flow at end of read_mon_cdp_safe()
Changes since v2:
Whitespace changes
---
drivers/resctrl/mpam_resctrl.c | 158 +++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 5 ++
2 files changed, 163 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index c01419c1a381..921be9b53c00 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -360,6 +360,164 @@ void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
resctrl_arch_mon_ctx_free_no_wait(evtid, mon_idx);
}
+static int __read_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ enum mpam_device_features mon_type,
+ int mon_idx,
+ enum resctrl_conf_type cdp_type, u32 closid, u32 rmid, u64 *val)
+{
+ struct mon_cfg cfg;
+
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
+ /* Shift closid to account for CDP */
+ closid = resctrl_get_config_index(closid, cdp_type);
+
+ if (mon_idx == USE_PRE_ALLOCATED) {
+ int mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+
+ mon_idx = mon->mbwu_idx_to_mon[mbwu_idx];
+ if (mon_idx == -1) {
+ if (mpam_resctrl_abmc_enabled()) {
+ /* Report Unassigned */
+ return -ENOENT;
+ }
+ /* Report Unavailable */
+ return -EINVAL;
+ }
+ }
+
+ if (irqs_disabled()) {
+ /* Check if we can access this domain without an IPI */
+ return -EIO;
+ }
+
+ cfg = (struct mon_cfg) {
+ .mon = mon_idx,
+ .match_pmg = true,
+ .partid = closid,
+ .pmg = rmid,
+ };
+
+ return mpam_msmon_read(mon_comp, &cfg, mon_type, val);
+}
+
+static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ enum mpam_device_features mon_type,
+ int mon_idx, u32 closid, u32 rmid, u64 *val)
+{
+ if (cdp_enabled) {
+ u64 code_val = 0, data_val = 0;
+ int err;
+
+ err = __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_CODE, closid, rmid, &code_val);
+ if (err)
+ return err;
+
+ err = __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_DATA, closid, rmid, &data_val);
+ if (err)
+ return err;
+
+ *val += code_val + data_val;
+ return 0;
+ }
+
+ return __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_NONE, closid, rmid, val);
+}
+
+/* MBWU when not in ABMC mode, and CSU counters. */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ u64 *val, void *arch_mon_ctx)
+{
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+ u32 mon_idx = *(u32 *)arch_mon_ctx;
+ enum mpam_device_features mon_type;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+
+ resctrl_arch_rmid_read_context_check();
+
+ if (eventid >= QOS_NUM_EVENTS || !mon->class)
+ return -EINVAL;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ mon_type = mpam_feat_msmon_csu;
+ break;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ mon_type = mpam_feat_msmon_mbwu;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return read_mon_cdp_safe(mon, mon_comp, mon_type, mon_idx,
+ closid, rmid, val);
+}
+
+static void __reset_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ int mon_idx,
+ enum resctrl_conf_type cdp_type, u32 closid, u32 rmid)
+{
+ struct mon_cfg cfg = { };
+
+ if (!mpam_is_enabled())
+ return;
+
+ /* Shift closid to account for CDP */
+ closid = resctrl_get_config_index(closid, cdp_type);
+
+ if (mon_idx == USE_PRE_ALLOCATED) {
+ int mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ mon_idx = mon->mbwu_idx_to_mon[mbwu_idx];
+ }
+
+ if (mon_idx == -1)
+ return;
+ cfg.mon = mon_idx;
+ mpam_msmon_reset_mbwu(mon_comp, &cfg);
+}
+
+static void reset_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ int mon_idx, u32 closid, u32 rmid)
+{
+ if (cdp_enabled) {
+ __reset_mon(mon, mon_comp, mon_idx, CDP_CODE, closid, rmid);
+ __reset_mon(mon, mon_comp, mon_idx, CDP_DATA, closid, rmid);
+ } else {
+ __reset_mon(mon, mon_comp, mon_idx, CDP_NONE, closid, rmid);
+ }
+}
+
+/* Called via IPI. Call with cpus_read_lock() held. */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid)
+{
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+
+ if (!mpam_is_enabled())
+ return;
+
+ /* Only MBWU counters are relevant, and for supported event types. */
+ if (eventid == QOS_L3_OCCUP_EVENT_ID || !mon->class)
+ return;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index e1461e32af75..86d5e326d2bd 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -67,6 +67,11 @@ struct rdt_resource;
void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid, void *ctx);
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ return val;
+}
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH v3 36/47] arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr()
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (34 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 35/47] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 37/47] arm_mpam: resctrl: Update the rmid reallocation limit Ben Horgan
` (15 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
When used in ABMC mode, resctrl uses a different set of helpers to read and
reset the counters.
Add these.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 43 ++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 921be9b53c00..5adc78f9c96f 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -463,6 +463,28 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
closid, rmid, val);
}
+/* MBWU counters when in ABMC mode */
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, int mon_idx,
+ enum resctrl_event_id eventid, u64 *val)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
+ if (eventid == QOS_L3_OCCUP_EVENT_ID || !mon->class)
+ return -EINVAL;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ return read_mon_cdp_safe(mon, mon_comp, mpam_feat_msmon_mbwu, mon_idx,
+ closid, rmid, val);
+}
+
static void __reset_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
int mon_idx,
enum resctrl_conf_type cdp_type, u32 closid, u32 rmid)
@@ -518,6 +540,27 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
}
+/* Reset an assigned counter */
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+
+ if (!mpam_is_enabled())
+ return;
+
+ if (eventid == QOS_L3_OCCUP_EVENT_ID || !mon->class)
+ return;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* [PATCH v3 37/47] arm_mpam: resctrl: Update the rmid reallocation limit
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (35 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 36/47] arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr() Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-15 10:05 ` Shaopeng Tan (Fujitsu)
2026-01-12 16:59 ` [PATCH v3 38/47] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions Ben Horgan
` (14 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
resctrl's limbo code needs to be told when the data left in a cache is
small enough for the partid+pmg value to be re-allocated.
x86 uses the cache size divided by the number of rmid users the cache may
have. Do the same, but for the smallest cache, and with the number of
partid-and-pmg users.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Move waiting for cache info into its own patch
---
drivers/resctrl/mpam_resctrl.c | 35 ++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 5adc78f9c96f..a6be3ce84241 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -561,6 +561,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
}
+/*
+ * The rmid realloc threshold should be for the smallest cache exposed to
+ * resctrl.
+ */
+static int update_rmid_limits(struct mpam_class *class)
+{
+ u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
+ struct mpam_props *cprops = &class->props;
+ struct cacheinfo *ci;
+
+ lockdep_assert_cpus_held();
+
+ /* Assume cache levels are the same size for all CPUs... */
+ ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
+ if (!ci || ci->size == 0) {
+ pr_debug("Could not read cache size for class %u\n",
+ class->level);
+ return -EINVAL;
+ }
+
+ if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+ return 0;
+
+ if (!resctrl_rmid_realloc_limit ||
+ ci->size < resctrl_rmid_realloc_limit) {
+ resctrl_rmid_realloc_limit = ci->size;
+ resctrl_rmid_realloc_threshold = ci->size / num_unique_pmg;
+ }
+
+ return 0;
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
@@ -1006,6 +1038,9 @@ static void mpam_resctrl_pick_counters(void)
/* CSU counters only make sense on a cache. */
switch (class->type) {
case MPAM_CLASS_CACHE:
+ if (update_rmid_limits(class))
+ continue;
+
counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
break;
default:
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 37/47] arm_mpam: resctrl: Update the rmid reallocation limit
2026-01-12 16:59 ` [PATCH v3 37/47] arm_mpam: resctrl: Update the rmid reallocation limit Ben Horgan
@ 2026-01-15 10:05 ` Shaopeng Tan (Fujitsu)
2026-01-15 16:02 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2026-01-15 10:05 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hello Ben,
> From: James Morse <james.morse@arm.com>
>
> resctrl's limbo code needs to be told when the data left in a cache is
> small enough for the partid+pmg value to be re-allocated.
>
> x86 uses the cache size divided by the number of rmid users the cache may
> have. Do the same, but for the smallest cache, and with the number of
> partid-and-pmg users.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Move waiting for cache info into its own patch
> ---
> drivers/resctrl/mpam_resctrl.c | 35 ++++++++++++++++++++++++++++++++++
> 1 file changed, 35 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 5adc78f9c96f..a6be3ce84241 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -561,6 +561,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
> }
>
> +/*
> + * The rmid realloc threshold should be for the smallest cache exposed to
> + * resctrl.
> + */
> +static int update_rmid_limits(struct mpam_class *class)
> +{
> + u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
> + struct mpam_props *cprops = &class->props;
> + struct cacheinfo *ci;
> +
> + lockdep_assert_cpus_held();
> +
> + /* Assume cache levels are the same size for all CPUs... */
> + ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
> + if (!ci || ci->size == 0) {
> + pr_debug("Could not read cache size for class %u\n",
> + class->level);
> + return -EINVAL;
> + }
> +
> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
> + return 0;
Shouldn't it be return -EOPNOTSUPP;?
However, before the function update_rmid_limits() is called, there is a check: if (cache_has_usable_csu(class) && topology_matches_l3(class)).
cache_has_usable_csu(class) already contains an identical check. Therefore, I think it's safe to remove this redundant one.
Best regards,
Shaopeng TAN
> + if (!resctrl_rmid_realloc_limit ||
> + ci->size < resctrl_rmid_realloc_limit) {
> + resctrl_rmid_realloc_limit = ci->size;
> + resctrl_rmid_realloc_threshold = ci->size / num_unique_pmg;
> + }
> +
> + return 0;
> +}
> +
> static bool cache_has_usable_cpor(struct mpam_class *class)
> {
> struct mpam_props *cprops = &class->props;
> @@ -1006,6 +1038,9 @@ static void mpam_resctrl_pick_counters(void)
> /* CSU counters only make sense on a cache. */
> switch (class->type) {
> case MPAM_CLASS_CACHE:
> + if (update_rmid_limits(class))
> + continue;
> +
> counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
> break;
> default:
> --
> 2.43.0
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 37/47] arm_mpam: resctrl: Update the rmid reallocation limit
2026-01-15 10:05 ` Shaopeng Tan (Fujitsu)
@ 2026-01-15 16:02 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-15 16:02 UTC (permalink / raw)
To: Shaopeng Tan (Fujitsu)
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hi Shaopeng,
On 1/15/26 10:05, Shaopeng Tan (Fujitsu) wrote:
> Hello Ben,
>
>> From: James Morse <james.morse@arm.com>
>>
>> resctrl's limbo code needs to be told when the data left in a cache is
>> small enough for the partid+pmg value to be re-allocated.
>>
>> x86 uses the cache size divided by the number of rmid users the cache may
>> have. Do the same, but for the smallest cache, and with the number of
>> partid-and-pmg users.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v2:
>> Move waiting for cache info into its own patch
>> ---
>> drivers/resctrl/mpam_resctrl.c | 35 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 35 insertions(+)
>>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index 5adc78f9c96f..a6be3ce84241 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -561,6 +561,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
>> }
>>
>> +/*
>> + * The rmid realloc threshold should be for the smallest cache exposed to
>> + * resctrl.
>> + */
>> +static int update_rmid_limits(struct mpam_class *class)
>> +{
>> + u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
>> + struct mpam_props *cprops = &class->props;
>> + struct cacheinfo *ci;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + /* Assume cache levels are the same size for all CPUs... */
>> + ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
>> + if (!ci || ci->size == 0) {
>> + pr_debug("Could not read cache size for class %u\n",
>> + class->level);
>> + return -EINVAL;
>> + }
>> +
>> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
>> + return 0;
>
> Shouldn't it be return -EOPNOTSUPP;?
The intent of returning 0 here is that if csu is not supported on this
class then there is nothing to do and hence no error.
>
> However, before the function update_rmid_limits() is called, there is a check: if (cache_has_usable_csu(class) && topology_matches_l3(class)).
> cache_has_usable_csu(class) already contains an identical check. Therefore, I think it's safe to remove this redundant one.
Yes, the check is redundant and can be removed. (If it were to stay, it
should be moved to before the get_cpu_cacheinfo_level() call so that it
wouldn't give spurious errors for non-cache CPUs.)
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 38/47] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (36 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 37/47] arm_mpam: resctrl: Update the rmid reallocation limit Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 39/47] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL Ben Horgan
` (13 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
A few resctrl features and hooks need to be provided, but aren't needed or
supported on MPAM platforms.
resctrl has individual hooks to separately enable and disable the
closid/partid and rmid/pmg context switching code. For MPAM this is all the
same thing, as the value in struct task_struct is used to cache the value
that should be written to hardware. arm64's context switching code is
enabled once MPAM is usable, but doesn't touch the hardware unless the
value has changed.
For now event configuration is not supported, and can be turned off by
returning 'false' from resctrl_arch_is_evt_configurable().
The new io_alloc feature is not supported either; the enable helper
always fails with -EOPNOTSUPP, and the query helper reports it as disabled.
Add this, and empty definitions for the other hooks.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 27 +++++++++++++++++++++++++++
include/linux/arm_mpam.h | 9 +++++++++
2 files changed, 36 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index a6be3ce84241..8e2fea2e51d3 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -103,6 +103,23 @@ bool resctrl_arch_mon_capable(void)
return exposed_mon_capable;
}
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+ return false;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+{
+}
+
bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
{
switch (rid) {
@@ -1129,6 +1146,16 @@ int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
return -EINVAL;
}
+int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
+{
+ return -EOPNOTSUPP;
+}
+
+bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r)
+{
+ return false;
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res)
{
struct mpam_class *class = res->class;
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 86d5e326d2bd..f92a36187a52 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -67,6 +67,15 @@ struct rdt_resource;
void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid, void *ctx);
+/*
+ * The CPU configuration for MPAM is cheap to write, and is only written if it
+ * has changed. No need for fine grained enables.
+ */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
{
return val;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* [PATCH v3 39/47] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (37 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 38/47] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-15 19:16 ` Catalin Marinas
2026-01-12 16:59 ` [PATCH v3 40/47] arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl Ben Horgan
` (12 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL. Let it
rip^Wlink!
ARCH_HAS_CPU_RESCTRL indicates resctrl can be enabled. It is enabled by the
arch code simply because it has 'arch' in its name.
This removes ARM_CPU_RESCTRL as a mimic of X86_CPU_RESCTRL. While here,
move the ACPI dependency to the driver's Kconfig file.
In anticipation of MPAM being useful, remove the CONFIG_EXPERT restriction.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
arch/arm64/Kconfig | 4 ++--
arch/arm64/include/asm/resctrl.h | 2 ++
drivers/resctrl/Kconfig | 9 ++++++++-
drivers/resctrl/Makefile | 2 +-
4 files changed, 13 insertions(+), 4 deletions(-)
create mode 100644 arch/arm64/include/asm/resctrl.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cdcc5b76a110..baeecb88771d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2026,8 +2026,8 @@ config ARM64_TLB_RANGE
config ARM64_MPAM
bool "Enable support for MPAM"
- select ARM64_MPAM_DRIVER if EXPERT # does nothing yet
- select ACPI_MPAM if ACPI
+ select ARM64_MPAM_DRIVER
+ select ARCH_HAS_CPU_RESCTRL
help
Memory System Resource Partitioning and Monitoring (MPAM) is an
optional extension to the Arm architecture that allows each
diff --git a/arch/arm64/include/asm/resctrl.h b/arch/arm64/include/asm/resctrl.h
new file mode 100644
index 000000000000..b506e95cf6e3
--- /dev/null
+++ b/arch/arm64/include/asm/resctrl.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/arm_mpam.h>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index c808e0470394..672abea3b03c 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -1,6 +1,7 @@
menuconfig ARM64_MPAM_DRIVER
bool "MPAM driver"
- depends on ARM64 && ARM64_MPAM && EXPERT
+ depends on ARM64 && ARM64_MPAM
+ select ACPI_MPAM if ACPI
help
Memory System Resource Partitioning and Monitoring (MPAM) driver for
System IP, e.g. caches and memory controllers.
@@ -22,3 +23,9 @@ config MPAM_KUNIT_TEST
If unsure, say N.
endif
+
+config ARM64_MPAM_RESCTRL_FS
+ bool
+ default y if ARM64_MPAM_DRIVER && RESCTRL_FS
+ select RESCTRL_RMID_DEPENDS_ON_CLOSID
+ select RESCTRL_ASSIGN_FIXED
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 40beaf999582..4f6d0e81f9b8 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -1,5 +1,5 @@
obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
mpam-y += mpam_devices.o
-mpam-$(CONFIG_ARM_CPU_RESCTRL) += mpam_resctrl.o
+mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* Re: [PATCH v3 39/47] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
2026-01-12 16:59 ` [PATCH v3 39/47] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL Ben Horgan
@ 2026-01-15 19:16 ` Catalin Marinas
0 siblings, 0 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-15 19:16 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, Jan 12, 2026 at 04:59:06PM +0000, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL. Let it
> rip^Wlink!
>
> ARCH_HAS_CPU_RESCTRL indicates resctrl can be enabled. It is enabled by the
> arch code simply because it has 'arch' in its name.
>
> This removes ARM_CPU_RESCTRL as a mimic of X86_CPU_RESCTRL. While here,
> move the ACPI dependency to the driver's Kconfig file.
>
> In anticipation of MPAM being useful remove the CONFIG_EXPERT restriction.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 40/47] arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (38 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 39/47] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls Ben Horgan
` (11 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
Now that MPAM links against resctrl, call resctrl_init() to register the
filesystem and setup resctrl's structures.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Use for_each_mpam...
error path tidying
---
drivers/resctrl/mpam_devices.c | 32 +++++++++++--
drivers/resctrl/mpam_internal.h | 4 ++
drivers/resctrl/mpam_resctrl.c | 82 ++++++++++++++++++++++++++++++++-
3 files changed, 113 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c2127570cf37..9fbe4fe3b13a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -73,6 +73,14 @@ static DECLARE_WORK(mpam_broken_work, &mpam_disable);
/* When mpam is disabled, the printed reason to aid debugging */
static char *mpam_disable_reason;
+/*
+ * Whether resctrl has been setup. Used by cpuhp in preference to
+ * mpam_is_enabled(). The disable call after an error interrupt makes
+ * mpam_is_enabled() false before the cpuhp callbacks are made.
+ * Reads/writes should hold mpam_cpuhp_state_lock, (or be cpuhp callbacks).
+ */
+static bool mpam_resctrl_enabled;
+
/*
* An MSC is a physical container for controls and monitors, each identified by
* their RIS index. These share a base-address, interrupts and some MMIO
@@ -1635,7 +1643,7 @@ static int mpam_cpu_online(unsigned int cpu)
mpam_reprogram_msc(msc);
}
- if (mpam_is_enabled())
+ if (mpam_resctrl_enabled)
return mpam_resctrl_online_cpu(cpu);
return 0;
@@ -1681,7 +1689,7 @@ static int mpam_cpu_offline(unsigned int cpu)
{
struct mpam_msc *msc;
- if (mpam_is_enabled())
+ if (mpam_resctrl_enabled)
mpam_resctrl_offline_cpu(cpu);
guard(srcu)(&mpam_srcu);
@@ -2543,6 +2551,7 @@ static void mpam_enable_once(void)
}
static_branch_enable(&mpam_enabled);
+ mpam_resctrl_enabled = true;
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
"mpam:online");
@@ -2602,24 +2611,39 @@ static void mpam_reset_class(struct mpam_class *class)
void mpam_disable(struct work_struct *ignored)
{
int idx;
+ bool do_resctrl_exit;
struct mpam_class *class;
struct mpam_msc *msc, *tmp;
+ if (mpam_is_enabled())
+ static_branch_disable(&mpam_enabled);
+
mutex_lock(&mpam_cpuhp_state_lock);
if (mpam_cpuhp_state) {
cpuhp_remove_state(mpam_cpuhp_state);
mpam_cpuhp_state = 0;
}
+
+ /*
+ * Removing the cpuhp state called mpam_cpu_offline() and told resctrl
+ * all the CPUs are offline.
+ */
+ do_resctrl_exit = mpam_resctrl_enabled;
+ mpam_resctrl_enabled = false;
mutex_unlock(&mpam_cpuhp_state_lock);
- static_branch_disable(&mpam_enabled);
+ if (do_resctrl_exit)
+ mpam_resctrl_exit();
mpam_unregister_irqs();
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(class, &mpam_classes, classes_list,
- srcu_read_lock_held(&mpam_srcu))
+ srcu_read_lock_held(&mpam_srcu)) {
mpam_reset_class(class);
+ if (do_resctrl_exit)
+ mpam_resctrl_teardown_class(class);
+ }
srcu_read_unlock(&mpam_srcu, idx);
mutex_lock(&mpam_list_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 8f73528414af..d9f52023d730 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -450,12 +450,16 @@ int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
#ifdef CONFIG_RESCTRL_FS
int mpam_resctrl_setup(void);
+void mpam_resctrl_exit(void);
int mpam_resctrl_online_cpu(unsigned int cpu);
void mpam_resctrl_offline_cpu(unsigned int cpu);
+void mpam_resctrl_teardown_class(struct mpam_class *class);
#else
static inline int mpam_resctrl_setup(void) { return 0; }
+static inline void mpam_resctrl_exit(void) { }
static inline int mpam_resctrl_online_cpu(unsigned int cpu) { return 0; }
static inline void mpam_resctrl_offline_cpu(unsigned int cpu) { }
+static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
#endif /* CONFIG_RESCTRL_FS */
/*
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 8e2fea2e51d3..e7b839c478fd 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -70,6 +70,12 @@ static bool cdp_enabled;
static bool cacheinfo_ready;
static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
+/*
+ * If resctrl_init() succeeded, resctrl_exit() can be used to remove support
+ * for the filesystem in the event of an error.
+ */
+static bool resctrl_enabled;
+
/*
* L3 local/total may come from different classes - what is the number of MBWU
* 'on L3'?
@@ -316,6 +322,9 @@ static int resctrl_arch_mon_ctx_alloc_no_wait(enum resctrl_event_id evtid)
{
struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
if (!mon->class)
return -EINVAL;
@@ -358,6 +367,9 @@ static void resctrl_arch_mon_ctx_free_no_wait(enum resctrl_event_id evtid,
{
struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+ if (!mpam_is_enabled())
+ return;
+
if (!mon->class)
return;
@@ -458,6 +470,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
resctrl_arch_rmid_read_context_check();
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
if (eventid >= QOS_NUM_EVENTS || !mon->class)
return -EINVAL;
@@ -1398,6 +1413,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
lockdep_assert_cpus_held();
lockdep_assert_irqs_enabled();
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
/*
* No need to check the CPU as mpam_apply_config() doesn't care, and
* resctrl_arch_update_domains() relies on this.
@@ -1460,6 +1478,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
lockdep_assert_cpus_held();
lockdep_assert_irqs_enabled();
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
list_for_each_entry_rcu(d, &r->ctrl_domains, hdr.list) {
for (enum resctrl_conf_type t = 0; t < CDP_NUM_TYPES; t++) {
struct resctrl_staged_config *cfg = &d->staged_config[t];
@@ -1824,7 +1845,11 @@ int mpam_resctrl_setup(void)
return -EOPNOTSUPP;
}
- /* TODO: call resctrl_init() */
+ err = resctrl_init();
+ if (err)
+ return err;
+
+ WRITE_ONCE(resctrl_enabled, true);
return 0;
@@ -1834,6 +1859,61 @@ int mpam_resctrl_setup(void)
return err;
}
+void mpam_resctrl_exit(void)
+{
+ if (!READ_ONCE(resctrl_enabled))
+ return;
+
+ WRITE_ONCE(resctrl_enabled, false);
+ resctrl_exit();
+}
+
+static void mpam_resctrl_teardown_mon(struct mpam_resctrl_mon *mon, struct mpam_class *class)
+{
+ u32 num_mbwu_mon = l3_num_allocated_mbwu;
+
+ if (!mon->mbwu_idx_to_mon)
+ return;
+
+ if (mon->assigned_counters) {
+ __free_mbwu_mon(class, mon->assigned_counters, num_mbwu_mon);
+ mon->assigned_counters = NULL;
+ kfree(mon->mbwu_idx_to_mon);
+ } else {
+ __free_mbwu_mon(class, mon->mbwu_idx_to_mon, num_mbwu_mon);
+ }
+ mon->mbwu_idx_to_mon = NULL;
+}
+
+/*
+ * The driver is detaching an MSC from this class, if resctrl was using it,
+ * pull on resctrl_exit().
+ */
+void mpam_resctrl_teardown_class(struct mpam_class *class)
+{
+ struct mpam_resctrl_res *res;
+ enum resctrl_res_level rid;
+ struct mpam_resctrl_mon *mon;
+ enum resctrl_event_id eventid;
+
+ might_sleep();
+
+ for_each_mpam_resctrl_control(res, rid) {
+ if (res->class == class) {
+ res->class = NULL;
+ break;
+ }
+ }
+ for_each_mpam_resctrl_mon(mon, eventid) {
+ if (mon->class == class) {
+ mon->class = NULL;
+
+ mpam_resctrl_teardown_mon(mon, class);
+ break;
+ }
+ }
+}
+
static int __init __cacheinfo_ready(void)
{
cacheinfo_ready = true;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (39 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 40/47] arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-13 15:39 ` Jonathan Cameron
2026-01-12 16:59 ` [PATCH v3 42/47] arm_mpam: resctrl: Add kunit test for mbw min control generation Ben Horgan
` (10 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
From: James Morse <james.morse@arm.com>
MPAM supports a minimum and maximum control for memory bandwidth. The
purpose of the minimum control is to give priority to tasks that are below
their minimum value. Resctrl only provides one value for the bandwidth
configuration, which is used for the maximum.
The minimum control is always programmed to zero on hardware that supports
it.
Generate a minimum bandwidth value that is 5% lower than the value provided
by resctrl. This means tasks that are not receiving their target bandwidth
can be prioritised by the hardware.
For component reset reuse the same calculation so that the default is a
value resctrl can set.
CC: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
Add reset_mbw_min
Clear min cfg when setting max
use mpam_extend_config on component reset
Changes since v2:
bwa_wd limit to 16 moved to earlier patch
restrict scope of min and delta variables
move code out of loop so smaller change for quirking min
move testing into its own commit
---
drivers/resctrl/mpam_devices.c | 69 ++++++++++++++++++++++++++++++---
drivers/resctrl/mpam_internal.h | 3 ++
drivers/resctrl/mpam_resctrl.c | 2 +
3 files changed, 69 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 9fbe4fe3b13a..37bd8efc6ecf 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1394,8 +1394,12 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
}
if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
- mpam_has_feature(mpam_feat_mbw_min, cfg))
- mpam_write_partsel_reg(msc, MBW_MIN, 0);
+ mpam_has_feature(mpam_feat_mbw_min, cfg)) {
+ if (cfg->reset_mbw_min)
+ mpam_write_partsel_reg(msc, MBW_MIN, 0);
+ else
+ mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
+ }
if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
mpam_has_feature(mpam_feat_mbw_max, cfg)) {
@@ -1510,6 +1514,7 @@ static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
.reset_cpbm = true,
.reset_mbw_pbm = true,
.reset_mbw_max = true,
+ .reset_mbw_min = true,
};
bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
}
@@ -2408,6 +2413,45 @@ static void __destroy_component_cfg(struct mpam_component *comp)
}
}
+static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
+{
+ struct mpam_props *cprops = &class->props;
+ u16 min_hw_granule, max_hw_value, res0_bits;
+
+ /*
+ * Calculate the values the 'min' control can hold.
+ * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff because
+ * those bits are RES0. Configurations of this value are effectively
+ * zero. But configurations need to saturate at min_hw_granule on
+ * systems with mismatched bwa_wd, where the 'less than 0' values are
+ * implemented on some MSC, but not others.
+ */
+ res0_bits = 16 - cprops->bwa_wd;
+ max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
+ min_hw_granule = ~max_hw_value;
+
+ /*
+ * MAX and MIN should be set together. If only one is provided,
+ * generate a configuration for the other. If only one control
+ * type is supported, the other value will be ignored.
+ *
+ * Resctrl can only configure the MAX.
+ */
+ if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
+ !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
+ u16 min, delta;
+
+ delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
+ if (cfg->mbw_max > delta)
+ min = cfg->mbw_max - delta;
+ else
+ min = 0;
+
+ cfg->mbw_min = max(min, min_hw_granule);
+ mpam_set_feature(mpam_feat_mbw_min, cfg);
+ }
+}
+
static void mpam_reset_component_cfg(struct mpam_component *comp)
{
int i;
@@ -2426,6 +2470,8 @@ static void mpam_reset_component_cfg(struct mpam_component *comp)
comp->cfg[i].mbw_pbm = GENMASK(cprops->mbw_pbm_bits - 1, 0);
if (cprops->bwa_wd)
comp->cfg[i].mbw_max = GENMASK(15, 16 - cprops->bwa_wd);
+
+ mpam_extend_config(comp->class, &comp->cfg[i]);
}
}
@@ -2701,24 +2747,37 @@ static bool mpam_update_config(struct mpam_config *cfg,
maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
+ maybe_update_config(cfg, mpam_feat_mbw_min, newcfg, mbw_min, has_changes);
return has_changes;
}
int mpam_apply_config(struct mpam_component *comp, u16 partid,
- struct mpam_config *cfg)
+ struct mpam_config *user_cfg)
{
struct mpam_write_config_arg arg;
struct mpam_msc_ris *ris;
+ struct mpam_config cfg;
struct mpam_vmsc *vmsc;
struct mpam_msc *msc;
lockdep_assert_cpus_held();
/* Don't pass in the current config! */
- WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+ WARN_ON_ONCE(&comp->cfg[partid] == user_cfg);
+
+ /*
+ * Copy the config to avoid writing back the 'extended' version to
+ * the caller.
+ * This avoids mpam_devices.c setting an mbw_min that mpam_resctrl.c
+ * is unaware of ... when it then changes mbw_max to be lower than
+ * mbw_min.
+ */
+ cfg = *user_cfg;
+
+ mpam_extend_config(comp->class, &cfg);
- if (!mpam_update_config(&comp->cfg[partid], cfg))
+ if (!mpam_update_config(&comp->cfg[partid], &cfg))
return 0;
arg.comp = comp;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d9f52023d730..69cb75616561 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -278,10 +278,12 @@ struct mpam_config {
u32 cpbm;
u32 mbw_pbm;
u16 mbw_max;
+ u16 mbw_min;
bool reset_cpbm;
bool reset_mbw_pbm;
bool reset_mbw_max;
+ bool reset_mbw_min;
struct mpam_garbage garbage;
};
@@ -618,6 +620,7 @@ static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
* MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
* register
*/
+#define MPAMCFG_MBW_MAX_MAX_NR_BITS 16
#define MPAMCFG_MBW_MAX_MAX GENMASK(15, 0)
#define MPAMCFG_MBW_MAX_HARDLIM BIT(31)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index e7b839c478fd..019f7a1d74fd 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1446,6 +1446,8 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
mpam_set_feature(mpam_feat_mbw_max, &cfg);
+ /* Allow the min to be calculated from the max */
+ mpam_clear_feature(mpam_feat_mbw_min, &cfg);
break;
}
fallthrough;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread* Re: [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-01-12 16:59 ` [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls Ben Horgan
@ 2026-01-13 15:39 ` Jonathan Cameron
2026-01-30 14:17 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 15:39 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
On Mon, 12 Jan 2026 16:59:08 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> From: James Morse <james.morse@arm.com>
>
> MPAM supports a minimum and maximum control for memory bandwidth. The
> purpose of the minimum control is to give priority to tasks that are below
> their minimum value. Resctrl only provides one value for the bandwidth
> configuration, which is used for the maximum.
>
> The minimum control is always programmed to zero on hardware that supports
> it.
>
> Generate a minimum bandwidth value that is 5% lower than the value provided
> by resctrl. This means tasks that are not receiving their target bandwidth
> can be prioritised by the hardware.
>
> For component reset reuse the same calculation so that the default is a
> value resctrl can set.
>
> CC: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
I'm interested to see how this plays out as a default choice
vs what people elect to run. Seems harmless to start with this.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-01-13 15:39 ` Jonathan Cameron
@ 2026-01-30 14:17 ` Ben Horgan
2026-01-31 2:30 ` Shanker Donthineni
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-30 14:17 UTC (permalink / raw)
To: Jonathan Cameron, fenghuay@nvidia.com
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm, Zeng Heng
Hi Fenghua, Jonathan,
On 1/13/26 15:39, Jonathan Cameron wrote:
> On Mon, 12 Jan 2026 16:59:08 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
>
>> From: James Morse <james.morse@arm.com>
>>
>> MPAM supports a minimum and maximum control for memory bandwidth. The
>> purpose of the minimum control is to give priority to tasks that are below
>> their minimum value. Resctrl only provides one value for the bandwidth
>> configuration, which is used for the maximum.
>>
>> The minimum control is always programmed to zero on hardware that supports
>> it.
>>
>> Generate a minimum bandwidth value that is 5% lower than the value provided
>> by resctrl. This means tasks that are not receiving their target bandwidth
>> can be prioritised by the hardware.
>>
>> For component reset reuse the same calculation so that the default is a
>> value resctrl can set.
>>
>> CC: Zeng Heng <zengheng4@huawei.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>
> I'm interested to see how this plays out as a default choice
> vs what people elect to run. Seems harmless to start with this.
I've realised it's not harmless :(
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
In the discussion on a platform quirk, arm_mpam: Add workaround for
T241-MPAM-4, Fenghua raised the following issues.
"
MBW_MIN is 1% or 5% less than MBW_MAX.
The lower MBW_MIN hints hardware to lower mem bandwidth when mem access
contention. That causes memory performance degradation.
Is it possible to do the following changes to fix the performance issue?
1. By default min mbw is equal to max mbw. So hardware won't lower
performance unless it's needed. This can fix the current performance issue.
2. Add a new schemata line (e.g. MBI:<id>=x;<id>=y;...) to specify min
mbw just like max mbw specified by schemata line "MB:...". User can use
this line to change min mbw per partition per node. This could be added
in the future.
"
On 1.
Thinking about this again, I think adding any heuristic tied to mbw_max
to determine what mbw_min should be is undesirable. Loading the mpam
driver or mounting resctrl shouldn't change the defaults away from the
defaults for h/w partid 0, or performance characteristics may change
unexpectedly.
The spec only gives us suggestions for these but we should go with
those. See table 3.8 in the IH0099B.a MPAM System Component
Specification, which suggests an MBW_MIN default of 0xFFFF. Also,
having mbw_min doesn't necessarily
mean that there is mbw_max. A system that doesn't advertise mbw_min
support to the user should act as if there is no mbw_min support.
On 2.
Yes, adding a new user interface in resctrl is the way to deal with
this. See [1] for a discussion on adding new schema.
Hence, I'll drop this patch, and update the mbw_min default to be 0xFFFF
and for the value not to change even if mbw_max changes. I think this
leaves us in the best position going forward without any heuristics that
may come back to bite us later when proper support for a schema
supporting mbw_min is added to resctrl.
[1] https://lore.kernel.org/lkml/aPtfMFfLV1l%2FRB0L@e133380.arm.com/
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-01-30 14:17 ` Ben Horgan
@ 2026-01-31 2:30 ` Shanker Donthineni
2026-02-02 10:21 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Shanker Donthineni @ 2026-01-31 2:30 UTC (permalink / raw)
To: Ben Horgan, Jonathan Cameron, fenghuay@nvidia.com
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, kobak, lcherian, linux-arm-kernel,
linux-kernel, peternewman, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, tan.shaopeng, xhao,
catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm, Zeng Heng
Hi Ben,
On 1/30/2026 8:17 AM, Ben Horgan wrote:
> External email: Use caution opening links or attachments
>
>
> Hi Fenghua, Jonathan,
>
> On 1/13/26 15:39, Jonathan Cameron wrote:
>> On Mon, 12 Jan 2026 16:59:08 +0000
>> Ben Horgan <ben.horgan@arm.com> wrote:
>>
>>> From: James Morse <james.morse@arm.com>
>>>
>>> MPAM supports a minimum and maximum control for memory bandwidth. The
>>> purpose of the minimum control is to give priority to tasks that are below
>>> their minimum value. Resctrl only provides one value for the bandwidth
>>> configuration, which is used for the maximum.
>>>
>>>
>>> Hence, I'll drop this patch, and update the mbw_min default to be 0xFFFF
>>> and for the value not to change even if mbw_max changes. I think this
>>> leaves us in the best position going forward without any heuristics that
>>> may come back to bite us later when proper support for a schema
>>> supporting mbw_min is added to resctrl.
Background: I previously shared the original fix (see code snippet
below) with James Morse ~2 years ago to address the errata, which
explicitly recommends using a 5% gap for mitigation of the hardware
issue (the problem described in the commit text of T241-MPAM-4).
For some reason the original implementation was split into two patches:
- Generic change applicable to all chips
- Specific fix for Grace errata T241-MPAM-4
Issue: Dropping this patch impacts [PATCH v3 45/47] for the errata fix.
If removal is necessary, please merge this change into the
T241-MPAM-4-specific patch.
--- a/drivers/platform/mpam/mpam_devices.c
+++ b/drivers/platform/mpam/mpam_devices.c
@@ -1190,8 +1190,12 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
 					      rprops->mbw_pbm_bits);
 	}

-	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
-		mpam_write_partsel_reg(msc, MBW_MIN, 0);
+	if (mpam_has_feature(mpam_feat_mbw_min, rprops)) {
+		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
+			mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
+		else
+			mpam_write_partsel_reg(msc, MBW_MIN, 0);
+	}

 	if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
 		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
@@ -2332,6 +2336,31 @@ static int __write_config(void *arg)
 	return 0;
 }

+static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
+{
+	struct mpam_props *cprops = &class->props;
+	u32 min, delta;
+
+	/*
+	 * MAX and MIN should be set together. If only one is provided,
+	 * generate a configuration for the other. If only one control
+	 * type is supported, the other value will be ignored.
+	 *
+	 * Resctrl can only configure the MAX.
+	 *
+	 * Parts affected by Nvidia's T241-MPAM-4 depend on this occurring,
+	 * and recommend a 5% difference.
+	 */
+	if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
+	    !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
+		delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
+		min = max_t(s32, cfg->mbw_max - delta, BIT(cprops->bwa_wd));
+
+		cfg->mbw_min = max_t(s32, cfg->mbw_max - delta, BIT(16 - cprops->bwa_wd));
+		mpam_set_feature(mpam_feat_mbw_min, cfg);
+	}
+}
Shanker
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-01-31 2:30 ` Shanker Donthineni
@ 2026-02-02 10:21 ` Ben Horgan
2026-02-02 16:34 ` Shanker Donthineni
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-02-02 10:21 UTC (permalink / raw)
To: Shanker Donthineni, Jonathan Cameron, fenghuay@nvidia.com
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, kobak, lcherian, linux-arm-kernel,
linux-kernel, peternewman, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, tan.shaopeng, xhao,
catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm, Zeng Heng
Hi Shanker,
On 1/31/26 02:30, Shanker Donthineni wrote:
> Hi Ben,
>
> On 1/30/2026 8:17 AM, Ben Horgan wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hi Fenghua, Jonathan,
>>
>> On 1/13/26 15:39, Jonathan Cameron wrote:
>>> On Mon, 12 Jan 2026 16:59:08 +0000
>>> Ben Horgan <ben.horgan@arm.com> wrote:
>>>
>>>> From: James Morse <james.morse@arm.com>
>>>>
>>>> MPAM supports a minimum and maximum control for memory bandwidth. The
>>>> purpose of the minimum control is to give priority to tasks that are
>>>> below
>>>> their minimum value. Resctrl only provides one value for the bandwidth
>>>> configuration, which is used for the maximum.
>>>>
>>>>
>>>> Hence, I'll drop this patch, and update the mbw_min default to be
>>>> 0xFFFF
>>>> and for the value not to change even if mbw_max changes. I think this
>>>> leaves us in the best position going forward without any heuristics
>>>> that
>>>> may come back to bite us later when proper support for a schema
>>>> supporting mbw_min is added to resctrl.
>
> Background: I previously shared the original fix (see code snippet
> below) with James Morse ~2 years ago to address the errata, which
> explicitly recommends using a 5% gap for mitigation of the hardware
> issue (the problem described in the commit text of T241-MPAM-4).
>
> For some reason the original implementation was split into two patches:
> - Generic change applicable to all chips
> - Specific fix for Grace errata T241-MPAM-4
> >
> Issue: Dropping this patch impacts [PATCH v3 45/47] for the errata
> fix. If removal is necessary, please merge this change into the
> T241-MPAM-4-specific patch.
What's the behaviour on T241 when MBW_MIN is always 0xFFFF?
I'm worried if we make a policy decision of how to set MBW_MIN based on
MBW_MAX for this platform then we won't be able to support a
configurable MBW_MIN in the future for this platform. As when MBW_MIN
support is added in resctrl the user's configuration for this platform
would change meaning on kernel upgrade.
>
> --- a/drivers/platform/mpam/mpam_devices.c
> +++ b/drivers/platform/mpam/mpam_devices.c
> @@ -1190,8 +1190,12 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>  					      rprops->mbw_pbm_bits);
>  	}
>
> -	if (mpam_has_feature(mpam_feat_mbw_min, rprops))
> -		mpam_write_partsel_reg(msc, MBW_MIN, 0);
> +	if (mpam_has_feature(mpam_feat_mbw_min, rprops)) {
> +		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
> +			mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
> +		else
> +			mpam_write_partsel_reg(msc, MBW_MIN, 0);
> +	}
>
>  	if (mpam_has_feature(mpam_feat_mbw_max, rprops)) {
>  		if (mpam_has_feature(mpam_feat_mbw_max, cfg))
> @@ -2332,6 +2336,31 @@ static int __write_config(void *arg)
>  	return 0;
>  }
>
> +static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
> +{
> +	struct mpam_props *cprops = &class->props;
> +	u32 min, delta;
> +
> +	/*
> +	 * MAX and MIN should be set together. If only one is provided,
> +	 * generate a configuration for the other. If only one control
> +	 * type is supported, the other value will be ignored.
> +	 *
> +	 * Resctrl can only configure the MAX.
> +	 *
> +	 * Parts affected by Nvidia's T241-MPAM-4 depend on this occurring,
> +	 * and recommend a 5% difference.
> +	 */
> +	if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
> +	    !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
> +		delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
> +		min = max_t(s32, cfg->mbw_max - delta, BIT(cprops->bwa_wd));
> +
> +		cfg->mbw_min = max_t(s32, cfg->mbw_max - delta, BIT(16 - cprops->bwa_wd));
> +		mpam_set_feature(mpam_feat_mbw_min, cfg);
> +	}
> +}
>
> Shanker
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-02-02 10:21 ` Ben Horgan
@ 2026-02-02 16:34 ` Shanker Donthineni
2026-02-03 9:33 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Shanker Donthineni @ 2026-02-02 16:34 UTC (permalink / raw)
To: Ben Horgan, Jonathan Cameron, fenghuay@nvidia.com
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, kobak, lcherian, linux-arm-kernel,
linux-kernel, peternewman, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, tan.shaopeng, xhao,
catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm, Zeng Heng, jasjits, Jason Sequeira,
Vikram Sethi
Hi Ben,
On 2/2/2026 4:21 AM, Ben Horgan wrote:
> External email: Use caution opening links or attachments
>
>
> Hi Shanker,
>
> On 1/31/26 02:30, Shanker Donthineni wrote:
>> Hi Ben,
>>
>> On 1/30/2026 8:17 AM, Ben Horgan wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hi Fenghua, Jonathan,
>>>
>>> On 1/13/26 15:39, Jonathan Cameron wrote:
>>>> On Mon, 12 Jan 2026 16:59:08 +0000
>>>> Ben Horgan <ben.horgan@arm.com> wrote:
>>>>
>>>>> From: James Morse <james.morse@arm.com>
>>>>>
>>>>> MPAM supports a minimum and maximum control for memory bandwidth. The
>>>>> purpose of the minimum control is to give priority to tasks that are
>>>>> below
>>>>> their minimum value. Resctrl only provides one value for the bandwidth
>>>>> configuration, which is used for the maximum.
>>>>>
>>>>>
>>>>> Hence, I'll drop this patch, and update the mbw_min default to be
>>>>> 0xFFFF
>>>>> and for the value not to change even if mbw_max changes. I think this
>>>>> leaves us in the best position going forward without any heuristics
>>>>> that
>>>>> may come back to bite us later when proper support for a schema
>>>>> supporting mbw_min is added to resctrl.
>> Background: I previously shared the original fix (see code snippet
>> below) with James Morse ~2 years ago to address the errata, which
>> explicitly recommends using a 5% gap for mitigation of the hardware
>> issue (the problem described in the commit text of T241-MPAM-4).
>>
>> For some reason the original implementation was split into two patches:
>> - Generic change applicable to all chips
>> - Specific fix for Grace errata T241-MPAM-4
>> Issue: Dropping this patch impacts [PATCH v3 45/47] for the errata
>> fix. If removal is necessary, please merge this change into the
>> T241-MPAM-4-specific patch.
>
> What's the behaviour on T241 when MBW_MIN is always 0xFFFF?
Memory bandwidth throttling will not function correctly. The MPAM hardware
monitors MIN and MAX values for each active partition to maintain memory
bandwidth usage between MBW_MIN and MBW_MAX. Therefore, MBW_MIN must be
less than MBW_MAX (IMO, setting MBW_MIN to always 0xFFFF is incorrect)
Grace errata T241-MPAM-4 has two issues:
- MBW_MIN must be greater than 0 (WAR: set to one when it's zero)
- In the Grace implementation of memory-bandwidth partitioning (MPAM),
in the absence of contention for bandwidth, the minimum bandwidth
setting can affect the amount of achieved bandwidth. Specifically,
the achieved bandwidth in the absence of contention can settle to any
value between the values of MIN and MAX. This means if the gap between
MIN and MAX is large then the BW can settle closer to MIN. To achieve
BW closer to MAX in the absence of contention, software should configure
a relatively narrow gap between MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.
The recommendation is to use a 5% gap, corresponding to an absolute
difference of (0xFFFF * 0.05) = 0xCCC between MPAMCFG_MBW_MIN and
MPAMCFG_MBW_MAX.
> I'm worried if we make a policy decision of how to set MBW_MIN based on
> MBW_MAX for this platform then we won't be able to support a
> configurable MBW_MIN in the future for this platform.
Yes, we can't support generic programmable MBW_MIN for Grace chip. The
current resctrl interface does not expose MBW_MIN, preventing users from
configuring the recommended 5% gap. Without this interface support,
the only way to apply the workaround is through driver-level changes.
> As when MBW_MIN
> support is added in resctrl the user's configuration for this platform
> would change meaning on kernel upgrade.
What is the timeline for adding MBW_MIN support? We have two options.
Option-A: Keep the current WAR 5% gap and don't allow users to program MBW_MIN.
Option-B: Remove the 5% gap workaround and rely on users to program MBW_MIN
according to the Grace recommendations when the interface becomes available.
We'll prefer option-B.
Thanks,
Shanker
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
2026-02-02 16:34 ` Shanker Donthineni
@ 2026-02-03 9:33 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-02-03 9:33 UTC (permalink / raw)
To: Shanker Donthineni, Jonathan Cameron, fenghuay@nvidia.com
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, kobak, lcherian, linux-arm-kernel,
linux-kernel, peternewman, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, tan.shaopeng, xhao,
catalin.marinas, will, corbet, maz, oupton, joey.gouly,
suzuki.poulose, kvmarm, Zeng Heng, jasjits, Jason Sequeira,
Vikram Sethi
Hi Shanker, Fenghua,
On 2/2/26 16:34, Shanker Donthineni wrote:
> Hi Ben,
>
> On 2/2/2026 4:21 AM, Ben Horgan wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hi Shanker,
>>
>> On 1/31/26 02:30, Shanker Donthineni wrote:
>>> Hi Ben,
>>>
>>> On 1/30/2026 8:17 AM, Ben Horgan wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Hi Fenghua, Jonathan,
>>>>
>>>> On 1/13/26 15:39, Jonathan Cameron wrote:
>>>>> On Mon, 12 Jan 2026 16:59:08 +0000
>>>>> Ben Horgan <ben.horgan@arm.com> wrote:
>>>>>
>>>>>> From: James Morse <james.morse@arm.com>
>>>>>>
>>>>>> MPAM supports a minimum and maximum control for memory bandwidth. The
>>>>>> purpose of the minimum control is to give priority to tasks that are
>>>>>> below
>>>>>> their minimum value. Resctrl only provides one value for the
>>>>>> bandwidth
>>>>>> configuration, which is used for the maximum.
>>>>>>
>>>>>>
>>>>>> Hence, I'll drop this patch, and update the mbw_min default to be
>>>>>> 0xFFFF
>>>>>> and for the value not to change even if mbw_max changes. I think this
>>>>>> leaves us in the best position going forward without any heuristics
>>>>>> that
>>>>>> may come back to bite us later when proper support for a schema
>>>>>> supporting mbw_min is added to resctrl.
>>> Background: I previously shared the original fix (see code snippet
>>> below) with James Morse ~2 years ago to address the errata, which
>>> explicitly recommends using a 5% gap for mitigation of the hardware
>>> issue (the problem described in the commit text of T241-MPAM-4).
>>>
>>> For some reason the original implementation was split into two patches:
>>> - Generic change applicable to all chips
>>> - Specific fix for Grace errata T241-MPAM-4
>>> Issue: Dropping this patch impacts [PATCH v3 45/47] for the errata
>>> fix. If removal is necessary, please merge this change into the
>>> T241-MPAM-4-specific patch.
>>
>> What's the behaviour on T241 when MBW_MIN is always 0xFFFF?
>
> Memory bandwidth throttling will not function correctly. The MPAM hardware
> monitors MIN and MAX values for each active partition to maintain memory
> bandwidth usage between MBW_MIN and MBW_MAX. Therefore, MBW_MIN must be
> less than MBW_MAX (IMO, setting MBW_MIN to always 0xFFFF is incorrect)
Ah, yes. 0xFFFF is indeed a bad default. Looking at Table 5-3 in the
MPAM System Component Specification B.a, I see that, as all bandwidth
would be below the minimum and so have high preference, MBW_MAX would
have no effect. I'll keep the default for MBW_MIN as 0 (or the minimum
for Grace).
>
> Grace errata T241-MPAM-4 has two issues:
> - MBW_MIN must be greater than 0 (WAR: set to one when it's zero)
> - In the Grace implementation of memory-bandwidth partitioning (MPAM),
> in the absence of contention for bandwidth, the minimum bandwidth
> setting can affect the amount of achieved bandwidth. Specifically,
> the achieved bandwidth in the absence of contention can settle to any
> value between the values of MIN and MAX. This means if the gap between
> MIN and MAX is large then the BW can settle closer to MIN. To achieve
> BW closer to MAX in the absence of contention, software should configure
> a relatively narrow gap between MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.
> The recommendation is to use a 5% gap, corresponding to an absolute
> difference of (0xFFFF * 0.05) = 0xCCC between MPAMCFG_MBW_MIN and
> MPAMCFG_MBW_MAX.
Ok, thanks. I understand the issue more now.
>
>> I'm worried if we make a policy decision of how to set MBW_MIN based on
>> MBW_MAX for this platform then we won't be able to support a
>> configurable MBW_MIN in the future for this platform.
>
> Yes, we can't support generic programmable MBW_MIN for Grace chip. The
> current resctrl interface does not expose MBW_MIN, preventing users from
> configuring the recommended 5% gap. Without this interface support,
> the only way to apply the workaround is through driver-level changes.
>
>> As when MBW_MIN
>> support is added in resctrl the user's configuration for this platform
>> would change meaning on kernel upgrade.
>
> What is the timeline for adding MBW_MIN support? We have two options.
> Option-A: Keep the current WAR 5% gap and don't allow users to program
> MBW_MIN.
> Option-B: Remove the 5% gap workaround and rely on users to program
> MBW_MIN according to the Grace recommendations
> when the interface becomes available.
>
> We'll prefer option-B.
The problem with option-B is that the transition introduces a change in
user-visible behaviour for any existing MBW_MAX configuration.
If option-A is preferable to disabling MBW_MAX on Grace until we have
proper MBW_MIN support in resctrl, then I think we should assume option-A.
The work to decide how new schemas are added is underway but it's
difficult to say how long it will take.
See: https://lore.kernel.org/lkml/aPtfMFfLV1l%2FRB0L@e133380.arm.com/
Assuming that you're sure that the 5% gap is the best policy and that
there are no other objections, I'll add that policy back into the
T241-MPAM-4 workaround and look into a way to ensure that we don't
accidentally enable MBW_MIN support for Grace when the proper support
is added.
>
> Thanks,
> Shanker
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 42/47] arm_mpam: resctrl: Add kunit test for mbw min control generation
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (40 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 41/47] arm_mpam: Generate a configuration for min controls Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-13 15:43 ` Jonathan Cameron
2026-01-12 16:59 ` [PATCH v3 43/47] arm_mpam: Add quirk framework Ben Horgan
` (9 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
By default we generate a minimum bandwidth value that is 5% lower
than the maximum bandwidth value given by resctrl.
Add a test for this.
Signed-off-by: James Morse <james.morse@arm.com>
[horgan: Split test into separate patch]
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
drivers/resctrl/test_mpam_devices.c | 66 +++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 3e8d564a0c64..2f802fd9f249 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -322,6 +322,71 @@ static void test_mpam_enable_merge_features(struct kunit *test)
mutex_unlock(&mpam_list_lock);
}
+static void test_mpam_extend_config(struct kunit *test)
+{
+ struct mpam_config fake_cfg = { };
+ struct mpam_class fake_class = { };
+
+ /* Configurations with both are not modified */
+ fake_class.props.bwa_wd = 16;
+ fake_cfg.mbw_max = 0xfeef;
+ fake_cfg.mbw_min = 0xfeef;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_set_feature(mpam_feat_mbw_min, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xfeef);
+
+ /* When a min is missing, it is generated */
+ fake_class.props.bwa_wd = 16;
+ fake_cfg.mbw_max = 0xfeef;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf224);
+
+ fake_class.props.bwa_wd = 8;
+ fake_cfg.mbw_max = 0xfeef;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf224);
+
+ /* 5% below the minimum granule, is still the minimum granule */
+ fake_class.props.bwa_wd = 12;
+ fake_cfg.mbw_max = 0xf;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xf);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf);
+
+ fake_class.props.bwa_wd = 16;
+ fake_cfg.mbw_max = 0x4;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0x4);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0x0);
+}
+
static void test_mpam_reset_msc_bitmap(struct kunit *test)
{
char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -378,6 +443,7 @@ static struct kunit_case mpam_devices_test_cases[] = {
KUNIT_CASE(test_mpam_reset_msc_bitmap),
KUNIT_CASE(test_mpam_enable_merge_features),
KUNIT_CASE(test__props_mismatch),
+ KUNIT_CASE(test_mpam_extend_config),
{}
};
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 42/47] arm_mpam: resctrl: Add kunit test for mbw min control generation
2026-01-12 16:59 ` [PATCH v3 42/47] arm_mpam: resctrl: Add kunit test for mbw min control generation Ben Horgan
@ 2026-01-13 15:43 ` Jonathan Cameron
0 siblings, 0 replies; 160+ messages in thread
From: Jonathan Cameron @ 2026-01-13 15:43 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:59:09 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> By default we generate a minimum bandwidth value that is 5% lower
> than the maximum bandwidth value given by resctrl.
>
> Add a test for this.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> [horgan: Split test into separate patch]
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 43/47] arm_mpam: Add quirk framework
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (41 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 42/47] arm_mpam: resctrl: Add kunit test for mbw min control generation Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-19 12:14 ` Gavin Shan
2026-01-12 16:59 ` [PATCH v3 44/47] arm_mpam: Add workaround for T241-MPAM-1 Ben Horgan
` (8 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: Shanker Donthineni <sdonthineni@nvidia.com>
The MPAM specification includes the MPAMF_IIDR, which serves to uniquely
identify the MSC implementation through a combination of implementer
details, product ID, variant, and revision. Certain hardware issues/errata
can be resolved using software workarounds.
Introduce a quirk framework to allow workarounds to be enabled based on the
MPAMF_IIDR value.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Co-developed-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Co-developed-by: James Morse <james.morse@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes by James:
Stash the IIDR so this doesn't need an IPI, enable quirks only
once, move the description to the callback so it can be pr_once()d, add an
enum of workarounds for popular errata. Add macros for making lists of
product/revision/vendor half readable
Changes since rfc:
remove trailing commas in last element of enums
Make mpam_enable_quirks() in charge of mpam_set_quirk() even if there
is an enable.
---
drivers/resctrl/mpam_devices.c | 32 ++++++++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 25 +++++++++++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 37bd8efc6ecf..5f741df9abcc 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -630,6 +630,30 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
return ERR_PTR(-ENOENT);
}
+static const struct mpam_quirk mpam_quirks[] = {
+ { NULL } /* Sentinel */
+};
+
+static void mpam_enable_quirks(struct mpam_msc *msc)
+{
+ const struct mpam_quirk *quirk;
+
+ for (quirk = &mpam_quirks[0]; quirk->iidr_mask; quirk++) {
+ int err = 0;
+
+ if (quirk->iidr != (msc->iidr & quirk->iidr_mask))
+ continue;
+
+ if (quirk->init)
+ err = quirk->init(msc, quirk);
+
+ if (err)
+ continue;
+
+ mpam_set_quirk(quirk->workaround, msc);
+ }
+}
+
/*
* IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
* of NRDY, software can use this bit for any purpose" - so hardware might not
@@ -864,8 +888,11 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
/* Grab an IDR value to find out how many RIS there are */
mutex_lock(&msc->part_sel_lock);
idr = mpam_msc_read_idr(msc);
+ msc->iidr = mpam_read_partsel_reg(msc, IIDR);
mutex_unlock(&msc->part_sel_lock);
+ mpam_enable_quirks(msc);
+
msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
/* Use these values so partid/pmg always starts with a valid value */
@@ -1993,6 +2020,7 @@ static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
* resulting safe value must be compatible with both. When merging values in
* the tree, all the aliasing resources must be handled first.
* On mismatch, parent is modified.
+ * Quirks on an MSC will apply to all MSC in that class.
*/
static void __props_mismatch(struct mpam_props *parent,
struct mpam_props *child, bool alias)
@@ -2112,6 +2140,7 @@ static void __props_mismatch(struct mpam_props *parent,
* nobble the class feature, as we can't configure all the resources.
* e.g. The L3 cache is composed of two resources with 13 and 17 portion
* bitmaps respectively.
+ * Quirks on an MSC will apply to all MSC in that class.
*/
static void
__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
@@ -2125,6 +2154,9 @@ __class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
dev_dbg(dev, "Merging features for class:0x%lx &= vmsc:0x%lx\n",
(long)cprops->features, (long)vprops->features);
+ /* Merge quirks */
+ class->quirks |= vmsc->msc->quirks;
+
/* Take the safe value for any common features */
__props_mismatch(cprops, vprops, false);
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 69cb75616561..d60a3caf6f6e 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -85,6 +85,8 @@ struct mpam_msc {
u8 pmg_max;
unsigned long ris_idxs;
u32 ris_max;
+ u32 iidr;
+ u16 quirks;
/*
* error_irq_lock is taken when registering/unregistering the error
@@ -216,6 +218,28 @@ struct mpam_props {
#define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
#define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
+/* Workaround bits for msc->quirks */
+enum mpam_device_quirks {
+ MPAM_QUIRK_LAST
+};
+
+#define mpam_has_quirk(_quirk, x) ((1 << (_quirk) & (x)->quirks))
+#define mpam_set_quirk(_quirk, x) ((x)->quirks |= (1 << (_quirk)))
+
+struct mpam_quirk {
+ int (*init)(struct mpam_msc *msc, const struct mpam_quirk *quirk);
+
+ u32 iidr;
+ u32 iidr_mask;
+
+ enum mpam_device_quirks workaround;
+};
+
+#define MPAM_IIDR_MATCH_ONE FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0xfff) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0xf) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
+
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
COUNT_BOTH = 0,
@@ -259,6 +283,7 @@ struct mpam_class {
struct mpam_props props;
u32 nrdy_usec;
+ u16 quirks;
u8 level;
enum mpam_class_types type;
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 43/47] arm_mpam: Add quirk framework
2026-01-12 16:59 ` [PATCH v3 43/47] arm_mpam: Add quirk framework Ben Horgan
@ 2026-01-19 12:14 ` Gavin Shan
2026-01-19 20:48 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 12:14 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:59 AM, Ben Horgan wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> The MPAM specification includes the MPAMF_IIDR, which serves to uniquely
> identify the MSC implementation through a combination of implementer
> details, product ID, variant, and revision. Certain hardware issues/errata
> can be resolved using software workarounds.
>
> Introduce a quirk framework to allow workarounds to be enabled based on the
> MPAMF_IIDR value.
>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Co-developed-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Co-developed-by: James Morse <james.morse@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes by James:
> Stash the IIDR so this doesn't need an IPI, enable quirks only
> once, move the description to the callback so it can be pr_once()d, add an
> enum of workarounds for popular errata. Add macros for making lists of
> product/revision/vendor half readable
>
> Changes since rfc:
> remove trailing commas in last element of enums
> Make mpam_enable_quirks() in charge of mpam_set_quirk() even if there
> is an enable.
> ---
> drivers/resctrl/mpam_devices.c | 32 ++++++++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 25 +++++++++++++++++++++++++
> 2 files changed, 57 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 37bd8efc6ecf..5f741df9abcc 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -630,6 +630,30 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> return ERR_PTR(-ENOENT);
> }
>
> +static const struct mpam_quirk mpam_quirks[] = {
> + { NULL } /* Sentinel */
> +};
> +
> +static void mpam_enable_quirks(struct mpam_msc *msc)
> +{
> + const struct mpam_quirk *quirk;
> +
> + for (quirk = &mpam_quirks[0]; quirk->iidr_mask; quirk++) {
> + int err = 0;
> +
> + if (quirk->iidr != (msc->iidr & quirk->iidr_mask))
> + continue;
> +
> + if (quirk->init)
> + err = quirk->init(msc, quirk);
> +
> + if (err)
> + continue;
> +
> + mpam_set_quirk(quirk->workaround, msc);
> + }
> +}
> +
> /*
> * IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
> * of NRDY, software can use this bit for any purpose" - so hardware might not
> @@ -864,8 +888,11 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
> /* Grab an IDR value to find out how many RIS there are */
> mutex_lock(&msc->part_sel_lock);
> idr = mpam_msc_read_idr(msc);
> + msc->iidr = mpam_read_partsel_reg(msc, IIDR);
> mutex_unlock(&msc->part_sel_lock);
>
> + mpam_enable_quirks(msc);
> +
> msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>
> /* Use these values so partid/pmg always starts with a valid value */
> @@ -1993,6 +2020,7 @@ static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
> * resulting safe value must be compatible with both. When merging values in
> * the tree, all the aliasing resources must be handled first.
> * On mismatch, parent is modified.
> + * Quirks on an MSC will apply to all MSC in that class.
> */
> static void __props_mismatch(struct mpam_props *parent,
> struct mpam_props *child, bool alias)
> @@ -2112,6 +2140,7 @@ static void __props_mismatch(struct mpam_props *parent,
> * nobble the class feature, as we can't configure all the resources.
> * e.g. The L3 cache is composed of two resources with 13 and 17 portion
> * bitmaps respectively.
> + * Quirks on an MSC will apply to all MSC in that class.
> */
> static void
> __class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
> @@ -2125,6 +2154,9 @@ __class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
> dev_dbg(dev, "Merging features for class:0x%lx &= vmsc:0x%lx\n",
> (long)cprops->features, (long)vprops->features);
>
> + /* Merge quirks */
> + class->quirks |= vmsc->msc->quirks;
> +
> /* Take the safe value for any common features */
> __props_mismatch(cprops, vprops, false);
> }
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 69cb75616561..d60a3caf6f6e 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -85,6 +85,8 @@ struct mpam_msc {
> u8 pmg_max;
> unsigned long ris_idxs;
> u32 ris_max;
> + u32 iidr;
> + u16 quirks;
>
> /*
> * error_irq_lock is taken when registering/unregistering the error
> @@ -216,6 +218,28 @@ struct mpam_props {
> #define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
> #define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
>
> +/* Workaround bits for msc->quirks */
> +enum mpam_device_quirks {
> + MPAM_QUIRK_LAST
> +};
> +
> +#define mpam_has_quirk(_quirk, x) ((1 << (_quirk) & (x)->quirks))
> +#define mpam_set_quirk(_quirk, x) ((x)->quirks |= (1 << (_quirk)))
> +
> +struct mpam_quirk {
> + int (*init)(struct mpam_msc *msc, const struct mpam_quirk *quirk);
> +
> + u32 iidr;
> + u32 iidr_mask;
> +
> + enum mpam_device_quirks workaround;
> +};
> +
> +#define MPAM_IIDR_MATCH_ONE FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0xfff) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0xf) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
> +
An error is reported by checkpatch.pl, as below.
ERROR: Macros with complex values should be enclosed in parentheses
#135: FILE: drivers/resctrl/mpam_internal.h:238:
+#define MPAM_IIDR_MATCH_ONE FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0xfff) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0xf) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
> /* The values for MSMON_CFG_MBWU_FLT.RWBW */
> enum mon_filter_options {
> COUNT_BOTH = 0,
> @@ -259,6 +283,7 @@ struct mpam_class {
>
> struct mpam_props props;
> u32 nrdy_usec;
> + u16 quirks;
> u8 level;
> enum mpam_class_types type;
>
Thanks,
Gavin
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 43/47] arm_mpam: Add quirk framework
2026-01-19 12:14 ` Gavin Shan
@ 2026-01-19 20:48 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 20:48 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 12:14, Gavin Shan wrote:
> On 1/13/26 12:59 AM, Ben Horgan wrote:
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> The MPAM specification includes the MPAMF_IIDR, which serves to uniquely
>> identify the MSC implementation through a combination of implementer
>> details, product ID, variant, and revision. Certain hardware issues/errata
>> can be resolved using software workarounds.
>>
>> Introduce a quirk framework to allow workarounds to be enabled based
>> on the
>> MPAMF_IIDR value.
>>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Co-developed-by: Shanker Donthineni <sdonthineni@nvidia.com>
>> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
>> Co-developed-by: James Morse <james.morse@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes by James:
>> Stash the IIDR so this doesn't need an IPI, enable quirks only
>> once, move the description to the callback so it can be pr_once()d,
>> add an
>> enum of workarounds for popular errata. Add macros for making lists of
>> product/revision/vendor half readable
>>
>> Changes since rfc:
>> remove trailing commas in last element of enums
>> Make mpam_enable_quirks() in charge of mpam_set_quirk() even if there
>> is an enable.
>> ---
>> drivers/resctrl/mpam_devices.c | 32 ++++++++++++++++++++++++++++++++
>> drivers/resctrl/mpam_internal.h | 25 +++++++++++++++++++++++++
>> 2 files changed, 57 insertions(+)
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/
>> mpam_devices.c
>> index 37bd8efc6ecf..5f741df9abcc 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -630,6 +630,30 @@ static struct mpam_msc_ris
>> *mpam_get_or_create_ris(struct mpam_msc *msc,
>> return ERR_PTR(-ENOENT);
>> }
>> +static const struct mpam_quirk mpam_quirks[] = {
>> + { NULL } /* Sentinel */
>> +};
>> +
>> +static void mpam_enable_quirks(struct mpam_msc *msc)
>> +{
>> + const struct mpam_quirk *quirk;
>> +
>> + for (quirk = &mpam_quirks[0]; quirk->iidr_mask; quirk++) {
>> + int err = 0;
>> +
>> + if (quirk->iidr != (msc->iidr & quirk->iidr_mask))
>> + continue;
>> +
>> + if (quirk->init)
>> + err = quirk->init(msc, quirk);
>> +
>> + if (err)
>> + continue;
>> +
>> + mpam_set_quirk(quirk->workaround, msc);
>> + }
>> +}
>> +
>> /*
>> * IHI009A.a has this nugget: "If a monitor does not support
>> automatic behaviour
>> * of NRDY, software can use this bit for any purpose" - so hardware
>> might not
>> @@ -864,8 +888,11 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
>> /* Grab an IDR value to find out how many RIS there are */
>> mutex_lock(&msc->part_sel_lock);
>> idr = mpam_msc_read_idr(msc);
>> + msc->iidr = mpam_read_partsel_reg(msc, IIDR);
>> mutex_unlock(&msc->part_sel_lock);
>> + mpam_enable_quirks(msc);
>> +
>> msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
>> /* Use these values so partid/pmg always starts with a valid
>> value */
>> @@ -1993,6 +2020,7 @@ static bool mpam_has_cmax_wd_feature(struct
>> mpam_props *props)
>> * resulting safe value must be compatible with both. When merging
>> values in
>> * the tree, all the aliasing resources must be handled first.
>> * On mismatch, parent is modified.
>> + * Quirks on an MSC will apply to all MSC in that class.
>> */
>> static void __props_mismatch(struct mpam_props *parent,
>> struct mpam_props *child, bool alias)
>> @@ -2112,6 +2140,7 @@ static void __props_mismatch(struct mpam_props
>> *parent,
>> * nobble the class feature, as we can't configure all the resources.
>> * e.g. The L3 cache is composed of two resources with 13 and 17
>> portion
>> * bitmaps respectively.
>> + * Quirks on an MSC will apply to all MSC in that class.
>> */
>> static void
>> __class_props_mismatch(struct mpam_class *class, struct mpam_vmsc
>> *vmsc)
>> @@ -2125,6 +2154,9 @@ __class_props_mismatch(struct mpam_class *class,
>> struct mpam_vmsc *vmsc)
>> dev_dbg(dev, "Merging features for class:0x%lx &= vmsc:0x%lx\n",
>> (long)cprops->features, (long)vprops->features);
>> + /* Merge quirks */
>> + class->quirks |= vmsc->msc->quirks;
>> +
>> /* Take the safe value for any common features */
>> __props_mismatch(cprops, vprops, false);
>> }
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/
>> mpam_internal.h
>> index 69cb75616561..d60a3caf6f6e 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -85,6 +85,8 @@ struct mpam_msc {
>> u8 pmg_max;
>> unsigned long ris_idxs;
>> u32 ris_max;
>> + u32 iidr;
>> + u16 quirks;
>> /*
>> * error_irq_lock is taken when registering/unregistering the error
>> @@ -216,6 +218,28 @@ struct mpam_props {
>> #define mpam_set_feature(_feat, x) __set_bit(_feat, (x)->features)
>> #define mpam_clear_feature(_feat, x) __clear_bit(_feat, (x)->features)
>> +/* Workaround bits for msc->quirks */
>> +enum mpam_device_quirks {
>> + MPAM_QUIRK_LAST
>> +};
>> +
>> +#define mpam_has_quirk(_quirk, x) ((1 << (_quirk) & (x)->quirks))
>> +#define mpam_set_quirk(_quirk, x) ((x)->quirks |= (1 << (_quirk)))
>> +
>> +struct mpam_quirk {
>> + int (*init)(struct mpam_msc *msc, const struct mpam_quirk *quirk);
>> +
>> + u32 iidr;
>> + u32 iidr_mask;
>> +
>> + enum mpam_device_quirks workaround;
>> +};
>> +
>> +#define MPAM_IIDR_MATCH_ONE FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0xfff) | \
>> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0xf) | \
>> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
>> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
>> +
>
> An error reported by checkpatch.pl as below.
>
> ERROR: Macros with complex values should be enclosed in parentheses
> #135: FILE: drivers/resctrl/mpam_internal.h:238:
> +#define MPAM_IIDR_MATCH_ONE FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0xfff) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0xf) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
>
>
That's a real error. Fixed.
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 44/47] arm_mpam: Add workaround for T241-MPAM-1
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (42 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 43/47] arm_mpam: Add quirk framework Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-19 12:16 ` Gavin Shan
2026-01-12 16:59 ` [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4 Ben Horgan
` (7 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: Shanker Donthineni <sdonthineni@nvidia.com>
Due to erratum T241-MPAM-1, the MPAM bandwidth partitioning controls will
not be correctly configured: the hardware retains its default configuration
register values, meaning generally that bandwidth will remain unprovisioned.
To address the issue, perform the following steps after updating the
MBW_MIN and/or MBW_MAX registers.
- Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
(0x360048 + slice*0x10000 + partid*8). These registers are read-only.
- Loop until all 12 shadow register values match. pr_warn_once() if the
values still differ after 1000 iterations.
- Perform 64b writes with the value 0x0 to the two spare registers at
offsets 0x1b0000 and 0x1c0000.
In the hardware, writes to the MPAMCFG_MBW_MAX and MPAMCFG_MBW_MIN registers
are transformed into broadcast writes to the 12 shadow registers. The
final two writes to the spare registers cause a final rank of downstream
micro-architectural MPAM registers to be updated from the shadow copies.
The intervening loop to read the 12 shadow registers helps avoid a race
condition where writes to the spare registers occur before all shadow
registers have been updated.
[ morse: Merged the min/max update into a single
mpam_quirk_post_config_change() helper. Stashed the t241_id in the msc
instead of carrying the physical address around. Test the msc quirk bit
instead of a static key. ]
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
return err from init
Be specific about the errata in the init name,
mpam_enable_quirk_nvidia_t241 -> mpam_enable_quirk_nvidia_t241_1
---
Documentation/arch/arm64/silicon-errata.rst | 2 +
drivers/resctrl/mpam_devices.c | 88 +++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 9 +++
3 files changed, 99 insertions(+)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..4e86b85fe3d6 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -246,6 +246,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 5f741df9abcc..bdf13a22d98f 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,6 +29,16 @@
#include "mpam_internal.h"
+/* Values for the T241 errata workaround */
+#define T241_CHIPS_MAX 4
+#define T241_CHIP_NSLICES 12
+#define T241_SPARE_REG0_OFF 0x1b0000
+#define T241_SPARE_REG1_OFF 0x1c0000
+#define T241_CHIP_ID(phys) FIELD_GET(GENMASK_ULL(44, 43), phys)
+#define T241_SHADOW_REG_OFF(sidx, pid) (0x360048 + (sidx) * 0x10000 + (pid) * 8)
+#define SMCCC_SOC_ID_T241 0x036b0241
+static void __iomem *t241_scratch_regs[T241_CHIPS_MAX];
+
/*
* mpam_list_lock protects the SRCU lists when writing. Once the
* mpam_enabled key is enabled these lists are read-only,
@@ -630,7 +640,45 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
return ERR_PTR(-ENOENT);
}
+static int mpam_enable_quirk_nvidia_t241_1(struct mpam_msc *msc,
+ const struct mpam_quirk *quirk)
+{
+ s32 soc_id = arm_smccc_get_soc_id_version();
+ struct resource *r;
+ phys_addr_t phys;
+
+ /*
+ * A mapping to a device other than the MSC is needed, check
+ * SOC_ID is NVIDIA T241 chip (036b:0241)
+ */
+ if (soc_id < 0 || soc_id != SMCCC_SOC_ID_T241)
+ return -EINVAL;
+
+ r = platform_get_resource(msc->pdev, IORESOURCE_MEM, 0);
+ if (!r)
+ return -EINVAL;
+
+ /* Find the internal registers base addr from the CHIP ID */
+ msc->t241_id = T241_CHIP_ID(r->start);
+ phys = FIELD_PREP(GENMASK_ULL(45, 44), msc->t241_id) | 0x19000000ULL;
+
+ t241_scratch_regs[msc->t241_id] = ioremap(phys, SZ_8M);
+ if (WARN_ON_ONCE(!t241_scratch_regs[msc->t241_id]))
+ return -EINVAL;
+
+ pr_info_once("Enabled workaround for NVIDIA T241 erratum T241-MPAM-1\n");
+
+ return 0;
+}
+
static const struct mpam_quirk mpam_quirks[] = {
+ {
+ /* NVIDIA t241 erratum T241-MPAM-1 */
+ .init = mpam_enable_quirk_nvidia_t241_1,
+ .iidr = MPAM_IIDR_NVIDIA_T241,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = T241_SCRUB_SHADOW_REGS,
+ },
{ NULL } /* Sentinel */
};
@@ -1378,6 +1426,44 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
__mpam_write_reg(msc, reg, bm);
}
+static void mpam_apply_t241_erratum(struct mpam_msc_ris *ris, u16 partid)
+{
+ int sidx, i, lcount = 1000;
+ void __iomem *regs;
+ u64 val0, val;
+
+ regs = t241_scratch_regs[ris->vmsc->msc->t241_id];
+
+ for (i = 0; i < lcount; i++) {
+ /* Read the shadow register at index 0 */
+ val0 = readq_relaxed(regs + T241_SHADOW_REG_OFF(0, partid));
+
+ /* Check if all the shadow registers have the same value */
+ for (sidx = 1; sidx < T241_CHIP_NSLICES; sidx++) {
+ val = readq_relaxed(regs +
+ T241_SHADOW_REG_OFF(sidx, partid));
+ if (val != val0)
+ break;
+ }
+ if (sidx == T241_CHIP_NSLICES)
+ break;
+ }
+
+ if (i == lcount)
+ pr_warn_once("t241: inconsistent values in shadow regs");
+
+ /* Write a value zero to spare registers to take effect of MBW conf */
+ writeq_relaxed(0, regs + T241_SPARE_REG0_OFF);
+ writeq_relaxed(0, regs + T241_SPARE_REG1_OFF);
+}
+
+static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
+ struct mpam_config *cfg)
+{
+ if (mpam_has_quirk(T241_SCRUB_SHADOW_REGS, ris->vmsc->msc))
+ mpam_apply_t241_erratum(ris, partid);
+}
+
/* Called via IPI. Call while holding an SRCU reference */
static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
struct mpam_config *cfg)
@@ -1465,6 +1551,8 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
mpam_write_partsel_reg(msc, PRI, pri_val);
}
+ mpam_quirk_post_config_change(ris, partid, cfg);
+
mutex_unlock(&msc->part_sel_lock);
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d60a3caf6f6e..9d15d37d4b5a 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -130,6 +130,9 @@ struct mpam_msc {
void __iomem *mapped_hwpage;
size_t mapped_hwpage_sz;
+ /* Values only used on some platforms for quirks */
+ u32 t241_id;
+
struct mpam_garbage garbage;
};
@@ -220,6 +223,7 @@ struct mpam_props {
/* Workaround bits for msc->quirks */
enum mpam_device_quirks {
+ T241_SCRUB_SHADOW_REGS,
MPAM_QUIRK_LAST
};
@@ -240,6 +244,11 @@ struct mpam_quirk {
FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
+#define MPAM_IIDR_NVIDIA_T241 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
+
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
COUNT_BOTH = 0,
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread
* Re: [PATCH v3 44/47] arm_mpam: Add workaround for T241-MPAM-1
2026-01-12 16:59 ` [PATCH v3 44/47] arm_mpam: Add workaround for T241-MPAM-1 Ben Horgan
@ 2026-01-19 12:16 ` Gavin Shan
2026-01-19 20:54 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 12:16 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:59 AM, Ben Horgan wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> The MPAM bandwidth partitioning controls will not be correctly configured,
> and hardware will retain default configuration register values, meaning
> generally that bandwidth will remain unprovisioned.
>
> To address the issue, follow the below steps after updating the MBW_MIN
> and/or MBW_MAX registers.
>
> - Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
> (0x360048 + slice*0x10000 + partid*8). These registers are read-only.
> - Continue iterating until all 12 shadow register values match in a loop.
> pr_warn_once if the values fail to match within the loop count 1000.
> - Perform 64b writes with the value 0x0 to the two spare registers at
> offsets 0x1b0000 and 0x1c0000.
>
> In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers
> are transformed into broadcast writes to the 12 shadow registers. The
> final two writes to the spare registers cause a final rank of downstream
> micro-architectural MPAM registers to be updated from the shadow copies.
> The intervening loop to read the 12 shadow registers helps avoid a race
> condition where writes to the spare registers occur before all shadow
> registers have been updated.
>
> [ morse: Merged the min/max update into a single
> mpam_quirk_post_config_change() helper. Stashed the t241_id in the msc
> instead of carrying the physical address around. Test the msc quirk bit
> instead of a static key. ]
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
> return err from init
> Be specific about the errata in the init name,
> mpam_enable_quirk_nvidia_t241 -> mpam_enable_quirk_nvidia_t241_1
> ---
> Documentation/arch/arm64/silicon-errata.rst | 2 +
> drivers/resctrl/mpam_devices.c | 88 +++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 9 +++
> 3 files changed, 99 insertions(+)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index a7ec57060f64..4e86b85fe3d6 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -246,6 +246,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 5f741df9abcc..bdf13a22d98f 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -29,6 +29,16 @@
>
> #include "mpam_internal.h"
>
> +/* Values for the T241 errata workaround */
> +#define T241_CHIPS_MAX 4
> +#define T241_CHIP_NSLICES 12
> +#define T241_SPARE_REG0_OFF 0x1b0000
> +#define T241_SPARE_REG1_OFF 0x1c0000
> +#define T241_CHIP_ID(phys) FIELD_GET(GENMASK_ULL(44, 43), phys)
> +#define T241_SHADOW_REG_OFF(sidx, pid) (0x360048 + (sidx) * 0x10000 + (pid) * 8)
> +#define SMCCC_SOC_ID_T241 0x036b0241
> +static void __iomem *t241_scratch_regs[T241_CHIPS_MAX];
> +
> /*
> * mpam_list_lock protects the SRCU lists when writing. Once the
> * mpam_enabled key is enabled these lists are read-only,
> @@ -630,7 +640,45 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> return ERR_PTR(-ENOENT);
> }
>
> +static int mpam_enable_quirk_nvidia_t241_1(struct mpam_msc *msc,
> + const struct mpam_quirk *quirk)
> +{
> + s32 soc_id = arm_smccc_get_soc_id_version();
> + struct resource *r;
> + phys_addr_t phys;
> +
> + /*
> + * A mapping to a device other than the MSC is needed, check
> + * SOC_ID is NVIDIA T241 chip (036b:0241)
> + */
> + if (soc_id < 0 || soc_id != SMCCC_SOC_ID_T241)
> + return -EINVAL;
> +
> + r = platform_get_resource(msc->pdev, IORESOURCE_MEM, 0);
> + if (!r)
> + return -EINVAL;
> +
> + /* Find the internal registers base addr from the CHIP ID */
> + msc->t241_id = T241_CHIP_ID(r->start);
> + phys = FIELD_PREP(GENMASK_ULL(45, 44), msc->t241_id) | 0x19000000ULL;
> +
> + t241_scratch_regs[msc->t241_id] = ioremap(phys, SZ_8M);
> + if (WARN_ON_ONCE(!t241_scratch_regs[msc->t241_id]))
> + return -EINVAL;
> +
> + pr_info_once("Enabled workaround for NVIDIA T241 erratum T241-MPAM-1\n");
> +
> + return 0;
> +}
> +
> static const struct mpam_quirk mpam_quirks[] = {
> + {
> + /* NVIDIA t241 erratum T241-MPAM-1 */
> + .init = mpam_enable_quirk_nvidia_t241_1,
> + .iidr = MPAM_IIDR_NVIDIA_T241,
> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
> + .workaround = T241_SCRUB_SHADOW_REGS,
> + },
> { NULL } /* Sentinel */
> };
>
> @@ -1378,6 +1426,44 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> __mpam_write_reg(msc, reg, bm);
> }
>
> +static void mpam_apply_t241_erratum(struct mpam_msc_ris *ris, u16 partid)
> +{
> + int sidx, i, lcount = 1000;
> + void __iomem *regs;
> + u64 val0, val;
> +
> + regs = t241_scratch_regs[ris->vmsc->msc->t241_id];
> +
> + for (i = 0; i < lcount; i++) {
> + /* Read the shadow register at index 0 */
> + val0 = readq_relaxed(regs + T241_SHADOW_REG_OFF(0, partid));
> +
> + /* Check if all the shadow registers have the same value */
> + for (sidx = 1; sidx < T241_CHIP_NSLICES; sidx++) {
> + val = readq_relaxed(regs +
> + T241_SHADOW_REG_OFF(sidx, partid));
> + if (val != val0)
> + break;
> + }
> + if (sidx == T241_CHIP_NSLICES)
> + break;
> + }
> +
> + if (i == lcount)
> + pr_warn_once("t241: inconsistent values in shadow regs");
> +
> + /* Write a value zero to spare registers to take effect of MBW conf */
> + writeq_relaxed(0, regs + T241_SPARE_REG0_OFF);
> + writeq_relaxed(0, regs + T241_SPARE_REG1_OFF);
> +}
> +
> +static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
> + struct mpam_config *cfg)
> +{
> + if (mpam_has_quirk(T241_SCRUB_SHADOW_REGS, ris->vmsc->msc))
> + mpam_apply_t241_erratum(ris, partid);
> +}
> +
> /* Called via IPI. Call while holding an SRCU reference */
> static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> struct mpam_config *cfg)
> @@ -1465,6 +1551,8 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> mpam_write_partsel_reg(msc, PRI, pri_val);
> }
>
> + mpam_quirk_post_config_change(ris, partid, cfg);
> +
> mutex_unlock(&msc->part_sel_lock);
> }
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d60a3caf6f6e..9d15d37d4b5a 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -130,6 +130,9 @@ struct mpam_msc {
> void __iomem *mapped_hwpage;
> size_t mapped_hwpage_sz;
>
> + /* Values only used on some platforms for quirks */
> + u32 t241_id;
> +
> struct mpam_garbage garbage;
> };
>
> @@ -220,6 +223,7 @@ struct mpam_props {
>
> /* Workaround bits for msc->quirks */
> enum mpam_device_quirks {
> + T241_SCRUB_SHADOW_REGS,
> MPAM_QUIRK_LAST
> };
>
> @@ -240,6 +244,11 @@ struct mpam_quirk {
> FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf) | \
> FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
>
> +#define MPAM_IIDR_NVIDIA_T241 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
> +
An error is reported by checkpatch.pl, as below.
ERROR: Macros with complex values should be enclosed in parentheses
#205: FILE: drivers/resctrl/mpam_internal.h:247:
+#define MPAM_IIDR_NVIDIA_T241 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
> /* The values for MSMON_CFG_MBWU_FLT.RWBW */
> enum mon_filter_options {
> COUNT_BOTH = 0,
Thanks,
Gavin
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 44/47] arm_mpam: Add workaround for T241-MPAM-1
2026-01-19 12:16 ` Gavin Shan
@ 2026-01-19 20:54 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 20:54 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 12:16, Gavin Shan wrote:
> On 1/13/26 12:59 AM, Ben Horgan wrote:
>> +#define MPAM_IIDR_NVIDIA_T241
>> FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
>> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
>> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
>> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
>> +
>
> An error reported by checkpatch.pl as below.
>
> ERROR: Macros with complex values should be enclosed in parentheses
> #205: FILE: drivers/resctrl/mpam_internal.h:247:
> +#define MPAM_IIDR_NVIDIA_T241
> FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
Fixed
>
>
>> /* The values for MSMON_CFG_MBWU_FLT.RWBW */
>> enum mon_filter_options {
>> COUNT_BOTH = 0,
>
> Thanks,
> Gavin
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (43 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 44/47] arm_mpam: Add workaround for T241-MPAM-1 Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-15 23:20 ` Fenghua Yu
2026-01-12 16:59 ` [PATCH v3 46/47] arm_mpam: Add workaround for T241-MPAM-6 Ben Horgan
` (6 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: Shanker Donthineni <sdonthineni@nvidia.com>
In the T241 implementation of memory-bandwidth partitioning, in the absence
of contention for bandwidth, the minimum bandwidth setting can affect the
amount of achieved bandwidth. Specifically, the achieved bandwidth in the
absence of contention can settle to any value between the values of
MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set to
zero (below 0.78125%), once a core enters a throttled state, it will never
leave that state.
The first issue is not a concern if the MPAM software allows
MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
ensures MBW_MIN=1 (0.78125%) is programmed whenever MPAMCFG_MBW_MIN=0 is
requested.
In the scenario where resctrl doesn't support the MBW_MIN interface via
sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention,
software should configure a relatively narrow gap between MBW_MIN and
MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
[ morse: Added as second quirk, adapted to use the new intermediate values
in mpam_extend_config() ]
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
Handling when reset_mbw_min is set
---
Documentation/arch/arm64/silicon-errata.rst | 2 ++
drivers/resctrl/mpam_devices.c | 34 +++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 1 +
3 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 4e86b85fe3d6..b18bc704d4a1 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -248,6 +248,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index bdf13a22d98f..884ca6a6d8f3 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -679,6 +679,12 @@ static const struct mpam_quirk mpam_quirks[] = {
.iidr_mask = MPAM_IIDR_MATCH_ONE,
.workaround = T241_SCRUB_SHADOW_REGS,
},
+ {
+ /* NVIDIA t241 erratum T241-MPAM-4 */
+ .iidr = MPAM_IIDR_NVIDIA_T241,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = T241_FORCE_MBW_MIN_TO_ONE,
+ },
{ NULL } /* Sentinel */
};
@@ -1464,6 +1470,17 @@ static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
mpam_apply_t241_erratum(ris, partid);
}
+static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
+{
+ u16 max_hw_value, min_hw_granule, res0_bits;
+
+ res0_bits = 16 - props->bwa_wd;
+ max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
+ min_hw_granule = ~max_hw_value;
+
+ return min_hw_granule + 1;
+}
+
/* Called via IPI. Call while holding an SRCU reference */
static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
struct mpam_config *cfg)
@@ -1508,10 +1525,15 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
mpam_has_feature(mpam_feat_mbw_min, cfg)) {
- if (cfg->reset_mbw_min)
- mpam_write_partsel_reg(msc, MBW_MIN, 0);
- else
+ if (cfg->reset_mbw_min) {
+ u16 reset = 0;
+
+ if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, msc))
+ reset = mpam_wa_t241_force_mbw_min_to_one(rprops);
+ mpam_write_partsel_reg(msc, MBW_MIN, reset);
+ } else {
mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
+ }
}
if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
@@ -2570,6 +2592,12 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
cfg->mbw_min = max(min, min_hw_granule);
mpam_set_feature(mpam_feat_mbw_min, cfg);
}
+
+ if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
+ cfg->mbw_min <= min_hw_granule) {
+ cfg->mbw_min = min_hw_granule + 1;
+ mpam_set_feature(mpam_feat_mbw_min, cfg);
+ }
}
static void mpam_reset_component_cfg(struct mpam_component *comp)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 9d15d37d4b5a..7b4566814945 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -224,6 +224,7 @@ struct mpam_props {
/* Workaround bits for msc->quirks */
enum mpam_device_quirks {
T241_SCRUB_SHADOW_REGS,
+ T241_FORCE_MBW_MIN_TO_ONE,
MPAM_QUIRK_LAST
};
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
2026-01-12 16:59 ` [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4 Ben Horgan
@ 2026-01-15 23:20 ` Fenghua Yu
2026-01-19 20:56 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Fenghua Yu @ 2026-01-15 23:20 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi, Shanker and Ben,
On 1/12/26 08:59, Ben Horgan wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> In the T241 implementation of memory-bandwidth partitioning, in the absence
> of contention for bandwidth, the minimum bandwidth setting can affect the
> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
> absence of contention can settle to any value between the values of
> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
> zero (below 0.78125%), once a core enters a throttled state, it will never
> leave that state.
>
> The first issue is not a concern if the MPAM software allows to program
> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
When MBW_MIN=1, the minimum memory bandwidth can be very low under
contention. This may hurt memory access performance. Is it possible to set
MBW_MIN higher, so that the floor on memory bandwidth stays high?
Thanks.
-Fenghua
>
> In the scenario where the resctrl doesn't support the MBW_MIN interface via
> sysfs, to achieve bandwidth closer to MBW_MAX in the absence of contention,
> software should configure a relatively narrow gap between MBW_MIN and
> MBW_MAX. The recommendation is to use a 5% gap to mitigate the problem.
>
> [ morse: Added as second quirk, adapted to use the new intermediate values
> in mpam_extend_config() ]
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since rfc:
> MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
> Handling when reset_mbw_min is set
> ---
> Documentation/arch/arm64/silicon-errata.rst | 2 ++
> drivers/resctrl/mpam_devices.c | 34 +++++++++++++++++++--
> drivers/resctrl/mpam_internal.h | 1 +
> 3 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index 4e86b85fe3d6..b18bc704d4a1 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -248,6 +248,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index bdf13a22d98f..884ca6a6d8f3 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -679,6 +679,12 @@ static const struct mpam_quirk mpam_quirks[] = {
> .iidr_mask = MPAM_IIDR_MATCH_ONE,
> .workaround = T241_SCRUB_SHADOW_REGS,
> },
> + {
> + /* NVIDIA t241 erratum T241-MPAM-4 */
> + .iidr = MPAM_IIDR_NVIDIA_T241,
> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
> + .workaround = T241_FORCE_MBW_MIN_TO_ONE,
> + },
> { NULL } /* Sentinel */
> };
>
> @@ -1464,6 +1470,17 @@ static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
> mpam_apply_t241_erratum(ris, partid);
> }
>
> +static u16 mpam_wa_t241_force_mbw_min_to_one(struct mpam_props *props)
> +{
> + u16 max_hw_value, min_hw_granule, res0_bits;
> +
> + res0_bits = 16 - props->bwa_wd;
> + max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
> + min_hw_granule = ~max_hw_value;
> +
> + return min_hw_granule + 1;
> +}
> +
> /* Called via IPI. Call while holding an SRCU reference */
> static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> struct mpam_config *cfg)
> @@ -1508,10 +1525,15 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>
> if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
> mpam_has_feature(mpam_feat_mbw_min, cfg)) {
> - if (cfg->reset_mbw_min)
> - mpam_write_partsel_reg(msc, MBW_MIN, 0);
> - else
> + if (cfg->reset_mbw_min) {
> + u16 reset = 0;
> +
> + if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, msc))
> + reset = mpam_wa_t241_force_mbw_min_to_one(rprops);
> + mpam_write_partsel_reg(msc, MBW_MIN, reset);
> + } else {
> mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
> + }
> }
>
> if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
> @@ -2570,6 +2592,12 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
> cfg->mbw_min = max(min, min_hw_granule);
> mpam_set_feature(mpam_feat_mbw_min, cfg);
> }
> +
> + if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
> + cfg->mbw_min <= min_hw_granule) {
> + cfg->mbw_min = min_hw_granule + 1;
> + mpam_set_feature(mpam_feat_mbw_min, cfg);
> + }
> }
>
> static void mpam_reset_component_cfg(struct mpam_component *comp)
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 9d15d37d4b5a..7b4566814945 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -224,6 +224,7 @@ struct mpam_props {
> /* Workaround bits for msc->quirks */
> enum mpam_device_quirks {
> T241_SCRUB_SHADOW_REGS,
> + T241_FORCE_MBW_MIN_TO_ONE,
> MPAM_QUIRK_LAST
> };
>
^ permalink raw reply [flat|nested] 160+ messages in thread* Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
2026-01-15 23:20 ` Fenghua Yu
@ 2026-01-19 20:56 ` Ben Horgan
2026-01-29 22:14 ` Fenghua Yu
0 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 20:56 UTC (permalink / raw)
To: Fenghua Yu
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Fenghua,
On 1/15/26 23:20, Fenghua Yu wrote:
> Hi, Shanker and Ben,
>
> On 1/12/26 08:59, Ben Horgan wrote:
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> In the T241 implementation of memory-bandwidth partitioning, in the
>> absence
>> of contention for bandwidth, the minimum bandwidth setting can affect the
>> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
>> absence of contention can settle to any value between the values of
>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
>> zero (below 0.78125%), once a core enters a throttled state, it will
>> never
>> leave that state.
>>
>> The first issue is not a concern if the MPAM software allows to program
>> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
>> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
>
> When MBW_MIN=1, min mem bw can be very low when contention. This may
> drop mem access performance. Is it possible to set MBW_MIN bigger so
> that ensure the floor of mem access is high?
Isn't that a policy decision rather than something we should be putting
in a quirk framework?
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
2026-01-19 20:56 ` Ben Horgan
@ 2026-01-29 22:14 ` Fenghua Yu
2026-01-30 12:21 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Fenghua Yu @ 2026-01-29 22:14 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi, Ben,
On 1/19/26 12:56, Ben Horgan wrote:
> Hi Fenghua,
>
> On 1/15/26 23:20, Fenghua Yu wrote:
>> Hi, Shanker and Ben,
>>
>> On 1/12/26 08:59, Ben Horgan wrote:
>>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>>
>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>> absence
>>> of contention for bandwidth, the minimum bandwidth setting can affect the
>>> amount of achieved bandwidth. Specifically, the achieved bandwidth in the
>>> absence of contention can settle to any value between the values of
>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
>>> zero (below 0.78125%), once a core enters a throttled state, it will
>>> never
>>> leave that state.
>>>
>>> The first issue is not a concern if the MPAM software allows to program
>>> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
>>> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
>>
>> When MBW_MIN=1, min mem bw can be very low when contention. This may
>> drop mem access performance. Is it possible to set MBW_MIN bigger so
>> that ensure the floor of mem access is high?
>
> Isn't that a policy decision rather than something we should be putting
> in a quirk framework?
MBW_MIN is 1% or 5% less than MBW_MAX.
The lower MBW_MIN hints the hardware to lower memory bandwidth under
contention. That causes memory performance degradation.
Is it possible to make the following changes to fix the performance issue?
1. By default, min mem bw is equal to max mem bw, so the hardware won't
lower performance unless it's needed. This can fix the current performance
issue.
2. Add a new schemata line (e.g. MBI:<id>=x;<id>=y;...) to specify min
mem bw, just like max mem bw is specified by the schemata line "MB:...".
Users can use this line to change min mem bw per partition per node. This
could be added in the future.
Thanks.
-Fenghua
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4
2026-01-29 22:14 ` Fenghua Yu
@ 2026-01-30 12:21 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-30 12:21 UTC (permalink / raw)
To: Fenghua Yu
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, gshan, james.morse, jonathan.cameron, kobak, lcherian,
linux-arm-kernel, linux-kernel, peternewman, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Fenghua,
On 1/29/26 22:14, Fenghua Yu wrote:
> Hi, Ben,
>
> On 1/19/26 12:56, Ben Horgan wrote:
>> Hi Fenghua,
>>
>> On 1/15/26 23:20, Fenghua Yu wrote:
>>> Hi, Shanker and Ben,
>>>
>>> On 1/12/26 08:59, Ben Horgan wrote:
>>>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>>>
>>>> In the T241 implementation of memory-bandwidth partitioning, in the
>>>> absence
>>>> of contention for bandwidth, the minimum bandwidth setting can
>>>> affect the
>>>> amount of achieved bandwidth. Specifically, the achieved bandwidth
>>>> in the
>>>> absence of contention can settle to any value between the values of
>>>> MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX. Also, if MPAMCFG_MBW_MIN is set
>>>> zero (below 0.78125%), once a core enters a throttled state, it will
>>>> never
>>>> leave that state.
>>>>
>>>> The first issue is not a concern if the MPAM software allows to program
>>>> MPAMCFG_MBW_MIN through the sysfs interface. This patch ensures program
>>>> MBW_MIN=1 (0.78125%) whenever MPAMCFG_MBW_MIN=0 is programmed.
>>>
>>> When MBW_MIN=1, min mem bw can be very low when contention. This may
>>> drop mem access performance. Is it possible to set MBW_MIN bigger so
>>> that ensure the floor of mem access is high?
>>
>> Isn't that a policy decision rather than something we should be putting
>> in a quirk framework?
>
> MBW_MIN is 1% or 5% less than MBW_MAX.
>
> The lower MBW_MIN hints hardware to lower mem bandwidth when mem access
> contention. That causes memory performance degradation.
>
> Is it possible to do the following changes to fix the performance issue?
> 1. By default min mbw is equal to max mbw. So hardware won't lower
> performance unless it's needed. This can fix the current performance issue.
> 2. Add a new schemata line (e.g. MBI:<id>=x;<id>=y;...) to specify min
> mbw just like max mbw specified by schemata line "MB:...". User can use
> this line to change min mbw per partition per node. This could be added
> in the future.
Thanks for bringing this up. It raises some more general queries about
the handling of mbw_min and so I'll move the discussion to
[PATCH v3 41/47] arm_mpam: Generate a configuration for min controls
as it isn't specific to the nvidia quirks.
>
> Thanks.
>
> -Fenghua
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* [PATCH v3 46/47] arm_mpam: Add workaround for T241-MPAM-6
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (44 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 45/47] arm_mpam: Add workaround for T241-MPAM-4 Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-12 16:59 ` [PATCH v3 47/47] arm_mpam: Quirk CMN-650's CSU NRDY behaviour Ben Horgan
` (5 subsequent siblings)
51 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: Shanker Donthineni <sdonthineni@nvidia.com>
The registers MSMON_MBWU_L and MSMON_MBWU return the number of requests
rather than the number of bytes transferred.
Bandwidth resource monitoring is performed at the last level cache, where
each request arrives at a 64-byte granularity. The current implementation
returns the number of transactions received at the last level cache but
does not provide the value in bytes. Scaling by 64 gives an accurate byte
count, matching the MPAM specification for the MSMON_MBWU and MSMON_MBWU_L
registers. This patch fixes the issue by reporting the actual number of
bytes instead of the number of transactions from __ris_msmon_read().
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since rfc:
MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
Don't apply workaround to MSMON_MBWU_LWD
---
Documentation/arch/arm64/silicon-errata.rst | 2 ++
drivers/resctrl/mpam_devices.c | 26 +++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 1 +
3 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index b18bc704d4a1..e810b2a8f40e 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -250,6 +250,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA | T241 MPAM | T241-MPAM-6 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 884ca6a6d8f3..7409cb7edab4 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -685,6 +685,12 @@ static const struct mpam_quirk mpam_quirks[] = {
.iidr_mask = MPAM_IIDR_MATCH_ONE,
.workaround = T241_FORCE_MBW_MIN_TO_ONE,
},
+ {
+ /* NVIDIA t241 erratum T241-MPAM-6 */
+ .iidr = MPAM_IIDR_NVIDIA_T241,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = T241_MBW_COUNTER_SCALE_64,
+ },
{ NULL } /* Sentinel */
};
@@ -1146,7 +1152,7 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
}
}
-static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
+static u64 __mpam_msmon_overflow_val(enum mpam_device_features type)
{
/* TODO: implement scaling counters */
switch (type) {
@@ -1161,6 +1167,18 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
}
}
+static u64 mpam_msmon_overflow_val(enum mpam_device_features type,
+ struct mpam_msc *msc)
+{
+ u64 overflow_val = __mpam_msmon_overflow_val(type);
+
+ if (mpam_has_quirk(T241_MBW_COUNTER_SCALE_64, msc) &&
+ type != mpam_feat_msmon_mbwu_63counter)
+ overflow_val *= 64;
+
+ return overflow_val;
+}
+
static void __ris_msmon_read(void *arg)
{
u64 now;
@@ -1251,13 +1269,17 @@ static void __ris_msmon_read(void *arg)
now = FIELD_GET(MSMON___VALUE, now);
}
+ if (mpam_has_quirk(T241_MBW_COUNTER_SCALE_64, msc) &&
+ m->type != mpam_feat_msmon_mbwu_63counter)
+ now *= 64;
+
if (nrdy)
break;
mbwu_state = &ris->mbwu_state[ctx->mon];
if (overflow)
- mbwu_state->correction += mpam_msmon_overflow_val(m->type);
+ mbwu_state->correction += mpam_msmon_overflow_val(m->type, msc);
/*
* Include bandwidth consumed before the last hardware reset and
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 7b4566814945..1680d1036472 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -225,6 +225,7 @@ struct mpam_props {
enum mpam_device_quirks {
T241_SCRUB_SHADOW_REGS,
T241_FORCE_MBW_MIN_TO_ONE,
+ T241_MBW_COUNTER_SCALE_64,
MPAM_QUIRK_LAST
};
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* [PATCH v3 47/47] arm_mpam: Quirk CMN-650's CSU NRDY behaviour
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (45 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 46/47] arm_mpam: Add workaround for T241-MPAM-6 Ben Horgan
@ 2026-01-12 16:59 ` Ben Horgan
2026-01-19 12:18 ` Gavin Shan
2026-01-14 6:51 ` [PATCH RESEND v2 0/45] arm_mpam: Add KVM/arm64 and resctrl glue code Zeng Heng
` (4 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Ben Horgan @ 2026-01-12 16:59 UTC (permalink / raw)
To: ben.horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
From: James Morse <james.morse@arm.com>
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.
Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.
This means accesses to this counter will always retry, even if the counter
was previously programmed to the same values.
The counter value is not expected to be stable; it drifts up and down with
each allocation and eviction. The CSU register provides the value at a
point in time.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Documentation/arch/arm64/silicon-errata.rst | 3 +++
drivers/resctrl/mpam_devices.c | 12 ++++++++++++
drivers/resctrl/mpam_internal.h | 6 ++++++
3 files changed, 21 insertions(+)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index e810b2a8f40e..3667650036fb 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | CMN-650 | #3642720 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 7409cb7edab4..e6c9ddaa60e2 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -691,6 +691,12 @@ static const struct mpam_quirk mpam_quirks[] = {
.iidr_mask = MPAM_IIDR_MATCH_ONE,
.workaround = T241_MBW_COUNTER_SCALE_64,
},
+ {
+ /* ARM CMN-650 CSU erratum 3642720 */
+ .iidr = MPAM_IIDR_ARM_CMN_650,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = IGNORE_CSU_NRDY,
+ },
{ NULL } /* Sentinel */
};
@@ -1003,6 +1009,7 @@ struct mon_read {
enum mpam_device_features type;
u64 *val;
int err;
+ bool waited_timeout;
};
static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
@@ -1249,6 +1256,10 @@ static void __ris_msmon_read(void *arg)
if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
nrdy = now & MSMON___NRDY;
now = FIELD_GET(MSMON___VALUE, now);
+
+ if (mpam_has_quirk(IGNORE_CSU_NRDY, msc) && m->waited_timeout)
+ nrdy = false;
+
break;
case mpam_feat_msmon_mbwu_31counter:
case mpam_feat_msmon_mbwu_44counter:
@@ -1386,6 +1397,7 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
.ctx = ctx,
.type = type,
.val = val,
+ .waited_timeout = true,
};
*val = 0;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 1680d1036472..0bd323728b53 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -226,6 +226,7 @@ enum mpam_device_quirks {
T241_SCRUB_SHADOW_REGS,
T241_FORCE_MBW_MIN_TO_ONE,
T241_MBW_COUNTER_SCALE_64,
+ IGNORE_CSU_NRDY,
MPAM_QUIRK_LAST
};
@@ -251,6 +252,11 @@ struct mpam_quirk {
FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
+#define MPAM_IIDR_ARM_CMN_650 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x43b)
+
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
COUNT_BOTH = 0,
--
2.43.0
^ permalink raw reply related [flat|nested] 160+ messages in thread

* Re: [PATCH v3 47/47] arm_mpam: Quirk CMN-650's CSU NRDY behaviour
2026-01-12 16:59 ` [PATCH v3 47/47] arm_mpam: Quirk CMN-650's CSU NRDY behaviour Ben Horgan
@ 2026-01-19 12:18 ` Gavin Shan
2026-01-19 20:58 ` Ben Horgan
0 siblings, 1 reply; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 12:18 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:59 AM, Ben Horgan wrote:
> From: James Morse <james.morse@arm.com>
>
> CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
> This tells us the monitor never finishes scanning the cache. The erratum
> document says to wait the maximum time, then ignore the field.
>
> Add a flag to indicate whether this is the final attempt to read the
> counter, and when this quirk is applied, ignore the NRDY field.
>
> This means accesses to this counter will always retry, even if the counter
> was previously programmed to the same values.
>
> The counter value is not expected to be stable; it drifts up and down with
> each allocation and eviction. The CSU register provides the value for a
> point in time.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Documentation/arch/arm64/silicon-errata.rst | 3 +++
> drivers/resctrl/mpam_devices.c | 12 ++++++++++++
> drivers/resctrl/mpam_internal.h | 6 ++++++
> 3 files changed, 21 insertions(+)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index e810b2a8f40e..3667650036fb 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -213,6 +213,9 @@ stable kernels.
> | ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
> +----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> +| ARM | CMN-650 | #3642720 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> ++----------------+-----------------+-----------------+-----------------------------+
> | Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
> +----------------+-----------------+-----------------+-----------------------------+
> | Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 7409cb7edab4..e6c9ddaa60e2 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -691,6 +691,12 @@ static const struct mpam_quirk mpam_quirks[] = {
> .iidr_mask = MPAM_IIDR_MATCH_ONE,
> .workaround = T241_MBW_COUNTER_SCALE_64,
> },
> + {
> + /* ARM CMN-650 CSU erratum 3642720 */
> + .iidr = MPAM_IIDR_ARM_CMN_650,
> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
> + .workaround = IGNORE_CSU_NRDY,
> + },
> { NULL } /* Sentinel */
> };
>
> @@ -1003,6 +1009,7 @@ struct mon_read {
> enum mpam_device_features type;
> u64 *val;
> int err;
> + bool waited_timeout;
> };
>
> static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
> @@ -1249,6 +1256,10 @@ static void __ris_msmon_read(void *arg)
> if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
> nrdy = now & MSMON___NRDY;
> now = FIELD_GET(MSMON___VALUE, now);
> +
> + if (mpam_has_quirk(IGNORE_CSU_NRDY, msc) && m->waited_timeout)
> + nrdy = false;
> +
> break;
> case mpam_feat_msmon_mbwu_31counter:
> case mpam_feat_msmon_mbwu_44counter:
> @@ -1386,6 +1397,7 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
> .ctx = ctx,
> .type = type,
> .val = val,
> + .waited_timeout = true,
> };
> *val = 0;
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 1680d1036472..0bd323728b53 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -226,6 +226,7 @@ enum mpam_device_quirks {
> T241_SCRUB_SHADOW_REGS,
> T241_FORCE_MBW_MIN_TO_ONE,
> T241_MBW_COUNTER_SCALE_64,
> + IGNORE_CSU_NRDY,
> MPAM_QUIRK_LAST
> };
>
> @@ -251,6 +252,11 @@ struct mpam_quirk {
> FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
> FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
>
> +#define MPAM_IIDR_ARM_CMN_650 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x43b)
> +
An error is reported by checkpatch.pl, as below.
ERROR: Macros with complex values should be enclosed in parentheses
#105: FILE: drivers/resctrl/mpam_internal.h:255:
+#define MPAM_IIDR_ARM_CMN_650 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x43b)
> /* The values for MSMON_CFG_MBWU_FLT.RWBW */
> enum mon_filter_options {
> COUNT_BOTH = 0,
Thanks,
Gavin
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 47/47] arm_mpam: Quirk CMN-650's CSU NRDY behaviour
2026-01-19 12:18 ` Gavin Shan
@ 2026-01-19 20:58 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-19 20:58 UTC (permalink / raw)
To: Gavin Shan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
Hi Gavin,
On 1/19/26 12:18, Gavin Shan wrote:
> On 1/13/26 12:59 AM, Ben Horgan wrote:
> An error reported by checkpatch.pl as below.
>
> ERROR: Macros with complex values should be enclosed in parentheses
> #105: FILE: drivers/resctrl/mpam_internal.h:255:
> +#define MPAM_IIDR_ARM_CMN_650
> FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x43b)
>
Fixed
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH RESEND v2 0/45] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (46 preceding siblings ...)
2026-01-12 16:59 ` [PATCH v3 47/47] arm_mpam: Quirk CMN-650's CSU NRDY behaviour Ben Horgan
@ 2026-01-14 6:51 ` Zeng Heng
2026-01-15 14:37 ` Ben Horgan
2026-01-15 11:14 ` [PATCH v3 00/47] " Peter Newman
` (3 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Zeng Heng @ 2026-01-14 6:51 UTC (permalink / raw)
To: ben.horgan, james.morse
Cc: amitsinght, baisheng.gao, baolin.wang, carl, catalin.marinas,
corbet, dave.martin, david, dfustini, fenghuay, gshan, joey.gouly,
jonathan.cameron, kobak, kvmarm, lcherian, linux-arm-kernel,
linux-kernel, maz, oupton, peternewman, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, sdonthineni, suzuki.poulose,
tan.shaopeng, will, sunnanyong, zengheng4
> From: Ben Horgan <ben.horgan@arm.com>
> Date: Fri, 19 Dec 2025 18:11:02 +0000
> Subject: [PATCH v2 00/45] arm_mpam: Add KVM/arm64 and resctrl glue code
>
> One major departure from the previous snapshot branches referenced in the
> base driver series is that the same MPAM settings are used for kernel-space
> and user-space. That is, MPAM1_EL1 is set to the same value as MPAM0_EL1
> rather than keeping the default value. The advantages of this are that it
> is closer to the x86 model where the closid is globally applicable, all
> partids are usable from user-space and user-space can't bypass MPAM
> controls by doing the work in the kernel. However, this causes some
> priority inversion where a high priority task waits to take a mutex held by
> another whose resources are restricted by MPAM. It also adds some extra
> isb(). I would be interested in opinions/data on the policy for MPAM in
> kernel space, i.e how MPAM1_EL1 is set.
Another advantage is that, given the small size of the L2 cache,
frequent switching of MPAM configurations between kernel and user modes
can cause cache-capacity jitter, making it difficult to isolate
interference from noisy neighbors.
However, in addition to the issues mentioned above, updating the
MPAM1_EL1 configuration also exposes interrupt handling to the MPAM
settings of the current task.
I still agree with the current modification of setting MPAM1_EL1 to the
same value as MPAM0_EL1. However, the ARM MPAM hardware supports more
flexible configuration schemes than x86 RDT and another approach is also
worth considering: Software can let a control group choose whether
kernel mode follows the user mode MPAM settings, or whether the kernel
mode configuration is delegated to the default control group, though
this may change the existing user interface.
At the LPC resctrl micro-conference, Babu also mentioned the PLZA proposal
as an attempt to address the issues raised above. It seems no clear
interface has been presented yet; we will have to wait and see what
interface that solution introduces.
One last thing, please add me to the CC list for future MPAM patch series.
I'll provide timely testing on my local aarch64 environment and review
feedback. Thanks.
Best Regards,
Zeng Heng
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH RESEND v2 0/45] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-14 6:51 ` [PATCH RESEND v2 0/45] arm_mpam: Add KVM/arm64 and resctrl glue code Zeng Heng
@ 2026-01-15 14:37 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-15 14:37 UTC (permalink / raw)
To: Zeng Heng, james.morse
Cc: amitsinght, baisheng.gao, baolin.wang, carl, catalin.marinas,
corbet, dave.martin, david, dfustini, fenghuay, gshan, joey.gouly,
jonathan.cameron, kobak, kvmarm, lcherian, linux-arm-kernel,
linux-kernel, maz, oupton, peternewman, punit.agrawal, quic_jiles,
reinette.chatre, rohit.mathew, scott, sdonthineni, suzuki.poulose,
tan.shaopeng, will, sunnanyong, Babu Moger
Hi Zeng,
+CC Babu (Comments on PLZA)
On 1/14/26 06:51, Zeng Heng wrote:
>> From: Ben Horgan <ben.horgan@arm.com>
>> Date: Fri, 19 Dec 2025 18:11:02 +0000
>> Subject: [PATCH v2 00/45] arm_mpam: Add KVM/arm64 and resctrl glue code
>>
>> One major departure from the previous snapshot branches referenced in the
>> base driver series is that the same MPAM settings are used for kernel-space
>> and user-space. That is, MPAM1_EL1 is set to the same value as MPAM0_EL1
>> rather than keeping the default value. The advantages of this are that it
>> is closer to the x86 model where the closid is globally applicable, all
>> partids are usable from user-space and user-space can't bypass MPAM
>> controls by doing the work in the kernel. However, this causes some
>> priority inversion where a high priority task waits to take a mutex held by
>> another whose resources are restricted by MPAM. It also adds some extra
>> isb(). I would be interested in opinions/data on the policy for MPAM in
>> kernel space, i.e how MPAM1_EL1 is set.
>
> Another advantage is that, given the small size of the L2 cache,
> frequent switching of MPAM configurations between kernel and user modes
> can cause cache-capacity jitter, making it difficult to isolate
> interference from noisy neighborhood.
>
> However, in addition to the issues mentioned above, updating the
> MPAM1_EL1 configuration also exposes interrupt handling to the MPAM
> settings of the current task.
Makes sense, thanks for these two observations.
>
> I still agree with the current modification of setting MPAM1_EL1 to the
> same value as MPAM0_EL1. However, the ARM MPAM hardware supports more
> flexible configuration schemes than x86 RDT and another approach is also
> worth considering: Software can let a control group choose whether
> kernel mode follows the user mode MPAM settings, or whether the kernel
> mode configuration is delegated to the default control group, though
> this may change the existing user interface.
I wonder if this would be possible in AMD PLZA as well. Babu?
>
> At the LPC resctrl micro-conference, Babu also mentioned the PLZA proposal
> as an attempt to address the issues raised above. Seems like no clear
> interface was presented yet. Wait to see what new interface that solution
> will introduce.
Yes, I watched a recording of that. :)
>
> One last thing, please add me to the CC list for future MPAM patch series.
> I'll provide timely testing on my local aarch64 environment and review
> feedback. Thanks.
Will do. Apologies for not doing this earlier and thank you for the
promise of testing and reviews :) There is a v3 which you have hopefully
seen.
>
>
> Best Regards,
> Zeng Heng
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (47 preceding siblings ...)
2026-01-14 6:51 ` [PATCH RESEND v2 0/45] arm_mpam: Add KVM/arm64 and resctrl glue code Zeng Heng
@ 2026-01-15 11:14 ` Peter Newman
2026-01-15 11:36 ` Ben Horgan
2026-01-16 10:47 ` Shaopeng Tan (Fujitsu)
` (2 subsequent siblings)
51 siblings, 1 reply; 160+ messages in thread
From: Peter Newman @ 2026-01-15 11:14 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Ben,
On Mon, Jan 12, 2026 at 5:59 PM Ben Horgan <ben.horgan@arm.com> wrote:
>
> This new version of the mpam missing pieces has no major rework from the
> previous version. It's mainly small corrections and code tidying based on
> review and things I spotted along the way. To be able to merge this we need
> review from more people, and people to start testing on their platforms and
> give some Tested-by tags.
>
> Change list in patches.
>
> As mentioned in the cover letter for v2, one major departure from the
> previous snapshot branches referenced in the base driver series is that the
> same MPAM setting are used for kernel-space and user-space. There are pros
> and cons of choosing this policy but I think it is the best thing to start
> with as there are AMD plans for adding a resctrl feature to allow a
> different closid/rmid configuration for user-space from kernel space. The
> AMD feature is called PLZA and is mentioned in this lpc slide deck[1]. This
> gives us a path forward to add support for having the EL1 and EL0 MPAM
> partid/pmg configuration differ from each other.
>
> From James' cover letter:
>
> This is the missing piece to make MPAM usable via resctrl from user-space. This has
> shed its debugfs code and the read/write 'event configuration' for the monitors
> to make the series smaller.
>
> This adds the arch code and KVM support first. I anticipate the whole thing
> going via arm64, but if it goes via tip instead, an immutable branch with those
> patches should be easy to do.
>
> Generally the resctrl glue code works by picking what MPAM features it can expose
> from the MPAM driver, then configuring the structs that back the resctrl helpers.
> If your platform is sufficiently Xeon shaped, you should be able to get L2/L3 CPOR
> bitmaps exposed via resctrl. CSU counters work if they are on/after the L3. MBWU
> counters are considerably more hairy, and depend on heuristics around the topology,
> and a bunch of stuff trying to emulate ABMC.
> If it didn't pick what you wanted it to, please share the debug messages produced
> when enabling dynamic debug and booting with:
> | dyndbg="file mpam_resctrl.c +pl"
>
> I've not found a platform that can test all the behaviours around the monitors,
> so this is where I'd expect the most bugs.
>
> The MPAM spec that describes all the system and MMIO registers can be found here:
> https://developer.arm.com/documentation/ddi0598/db/?lang=en
> (Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
> This document has the best overview)
>
>
> Based on v6.19-rc5
> This series can be retrieved from:
> https://gitlab.arm.com/linux-arm/linux-bh.git mpam_resctrl_glue_v3
>
> v2 can be found at:
> https://lore.kernel.org/linux-arm-kernel/20251219181147.3404071-1-ben.horgan@arm.com/
>
> rfc can be found at:
> https://lore.kernel.org/linux-arm-kernel/20251205215901.17772-1-james.morse@arm.com/
>
> [1] https://lpc.events/event/19/contributions/2093/attachments/1958/4172/resctrl%20Microconference%20LPC%202025%20Tokyo.pdf
>
> Ben Horgan (10):
> arm_mpam: Use non-atomic bitops when modifying feature bitmap
> arm64/sysreg: Add MPAMSM_EL1 register
> KVM: arm64: Preserve host MPAM configuration when changing traps
> KVM: arm64: Make MPAMSM_EL1 accesses UNDEF
> arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
> KVM: arm64: Use kernel-space partid configuration for hypercalls
> arm_mpam: resctrl: Add rmid index helpers
> arm_mpam: resctrl: Add kunit test for rmid idx conversions
> arm_mpam: resctrl: Wait for cacheinfo to be ready
> arm_mpam: resctrl: Add kunit test for mbw min control generation
>
> Dave Martin (2):
> arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
> arm_mpam: resctrl: Add kunit test for control format conversions
>
> James Morse (30):
> arm64: mpam: Context switch the MPAM registers
> arm64: mpam: Re-initialise MPAM regs when CPU comes online
> arm64: mpam: Advertise the CPUs MPAM limits to the driver
> arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
> arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG
> values
> KVM: arm64: Force guest EL1 to use user-space's partid configuration
> arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
> arm_mpam: resctrl: Sort the order of the domain lists
> arm_mpam: resctrl: Pick the caches we will use as resctrl resources
> arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls()
> arm_mpam: resctrl: Add resctrl_arch_get_config()
> arm_mpam: resctrl: Implement helpers to update configuration
> arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks
> arm_mpam: resctrl: Add CDP emulation
> arm_mpam: resctrl: Add support for 'MB' resource
> arm_mpam: resctrl: Add support for csu counters
> arm_mpam: resctrl: Pick classes for use as mbm counters
> arm_mpam: resctrl: Pre-allocate free running monitors
> arm_mpam: resctrl: Pre-allocate assignable monitors
> arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions
> arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use
> arm_mpam: resctrl: Allow resctrl to allocate monitors
> arm_mpam: resctrl: Add resctrl_arch_rmid_read() and
> resctrl_arch_reset_rmid()
> arm_mpam: resctrl: Add resctrl_arch_cntr_read() &
> resctrl_arch_reset_cntr()
> arm_mpam: resctrl: Update the rmid reallocation limit
> arm_mpam: resctrl: Add empty definitions for assorted resctrl
> functions
> arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
> arm_mpam: resctrl: Call resctrl_init() on platforms that can support
> resctrl
> arm_mpam: Generate a configuration for min controls
> arm_mpam: Quirk CMN-650's CSU NRDY behaviour
>
> Jiapeng Chong (1):
> arm_mpam: Remove duplicate linux/srcu.h header
>
> Shanker Donthineni (4):
> arm_mpam: Add quirk framework
> arm_mpam: Add workaround for T241-MPAM-1
> arm_mpam: Add workaround for T241-MPAM-4
> arm_mpam: Add workaround for T241-MPAM-6
>
> Documentation/arch/arm64/silicon-errata.rst | 9 +
> arch/arm64/Kconfig | 6 +-
> arch/arm64/include/asm/el2_setup.h | 3 +-
> arch/arm64/include/asm/mpam.h | 98 +
> arch/arm64/include/asm/resctrl.h | 2 +
> arch/arm64/include/asm/thread_info.h | 3 +
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/cpufeature.c | 21 +-
> arch/arm64/kernel/mpam.c | 58 +
> arch/arm64/kernel/process.c | 7 +
> arch/arm64/kvm/hyp/include/hyp/switch.h | 12 +-
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 +
> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +
> arch/arm64/kvm/sys_regs.c | 2 +
> arch/arm64/tools/sysreg | 8 +
> drivers/resctrl/Kconfig | 9 +-
> drivers/resctrl/Makefile | 1 +
> drivers/resctrl/mpam_devices.c | 306 ++-
> drivers/resctrl/mpam_internal.h | 131 +-
> drivers/resctrl/mpam_resctrl.c | 1930 +++++++++++++++++++
> drivers/resctrl/test_mpam_devices.c | 66 +
> drivers/resctrl/test_mpam_resctrl.c | 426 ++++
> include/linux/arm_mpam.h | 32 +
> 23 files changed, 3119 insertions(+), 33 deletions(-)
> create mode 100644 arch/arm64/include/asm/mpam.h
> create mode 100644 arch/arm64/include/asm/resctrl.h
> create mode 100644 arch/arm64/kernel/mpam.c
> create mode 100644 drivers/resctrl/mpam_resctrl.c
> create mode 100644 drivers/resctrl/test_mpam_resctrl.c
Like before, I applied the patches and successfully booted a kernel on
a baremetal Google Cloud C4A instance. I was able to confirm that the
resources we expect were present and I was able to successfully run
the monitor assignment test cases I used to validate ABMC on AMD
systems. (Though I had to hack the driver to pretend there were fewer
MBWU monitors so that the counter assignment interfaces would become
available.)
My use cases only cover resctrl in the host, so I didn't try any of
the KVM integration, but I can at least say there wasn't any evidence
that it interfered with resctrl.
Tested-by: Peter Newman <peternewman@google.com>
Thanks,
-Peter
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-15 11:14 ` [PATCH v3 00/47] " Peter Newman
@ 2026-01-15 11:36 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-15 11:36 UTC (permalink / raw)
To: Peter Newman
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, punit.agrawal,
quic_jiles, reinette.chatre, rohit.mathew, scott, sdonthineni,
tan.shaopeng, xhao, catalin.marinas, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
Hi Peter,
On 1/15/26 11:14, Peter Newman wrote:
> Hi Ben,
>
> On Mon, Jan 12, 2026 at 5:59 PM Ben Horgan <ben.horgan@arm.com> wrote:
[...]
>
> Like before, I applied the patches and successfully booted a kernel on
> a baremetal Google Cloud C4A instance. I was able to confirm that the
> resources we expect were present and I was able to successfully run
> the monitor assignment test cases I used to validate ABMC on AMD
> systems. (Though I had to hack the driver to pretend there were fewer
> MBWU monitors so that the counter assignment interfaces would become
> available.)
>
> My use cases only cover resctrl in the host, so I didn't try any of
> the KVM integration, but I can at least say there wasn't any evidence
> that it interfered with resctrl.
>
> Tested-by: Peter Newman <peternewman@google.com>
Thanks for the testing!
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (48 preceding siblings ...)
2026-01-15 11:14 ` [PATCH v3 00/47] " Peter Newman
@ 2026-01-16 10:47 ` Shaopeng Tan (Fujitsu)
2026-01-16 11:05 ` Ben Horgan
2026-01-16 15:47 ` (subset) " Catalin Marinas
2026-01-19 1:30 ` Gavin Shan
51 siblings, 1 reply; 160+ messages in thread
From: Shaopeng Tan (Fujitsu) @ 2026-01-16 10:47 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hello Ben,
I ran the MPAM driver on NVIDIA's Grace machine, and it seems to be working fine.
Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-16 10:47 ` Shaopeng Tan (Fujitsu)
@ 2026-01-16 11:05 ` Ben Horgan
0 siblings, 0 replies; 160+ messages in thread
From: Ben Horgan @ 2026-01-16 11:05 UTC (permalink / raw)
To: Shaopeng Tan (Fujitsu)
Cc: amitsinght@marvell.com, baisheng.gao@unisoc.com,
baolin.wang@linux.alibaba.com, carl@os.amperecomputing.com,
dave.martin@arm.com, david@kernel.org, dfustini@baylibre.com,
fenghuay@nvidia.com, gshan@redhat.com, james.morse@arm.com,
jonathan.cameron@huawei.com, kobak@nvidia.com,
lcherian@marvell.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, peternewman@google.com,
punit.agrawal@oss.qualcomm.com, quic_jiles@quicinc.com,
reinette.chatre@intel.com, rohit.mathew@arm.com,
scott@os.amperecomputing.com, sdonthineni@nvidia.com,
xhao@linux.alibaba.com, catalin.marinas@arm.com, will@kernel.org,
corbet@lwn.net, maz@kernel.org, oupton@kernel.org,
joey.gouly@arm.com, suzuki.poulose@arm.com,
kvmarm@lists.linux.dev
Hi Shaopeng,
On 1/16/26 10:47, Shaopeng Tan (Fujitsu) wrote:
> Hello Ben,
>
> I ran the MPAM driver on NVIDIA's Grace machine, and it seems to be working fine.
>
> Tested-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Thanks for the testing!
Ben
^ permalink raw reply [flat|nested] 160+ messages in thread
* Re: (subset) [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (49 preceding siblings ...)
2026-01-16 10:47 ` Shaopeng Tan (Fujitsu)
@ 2026-01-16 15:47 ` Catalin Marinas
2026-01-19 1:30 ` Gavin Shan
51 siblings, 0 replies; 160+ messages in thread
From: Catalin Marinas @ 2026-01-16 15:47 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, gshan, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, will, corbet, maz, oupton,
joey.gouly, suzuki.poulose, kvmarm
On Mon, 12 Jan 2026 16:58:27 +0000, Ben Horgan wrote:
> This new version of the mpam missing pieces has no major rework from the
> previous version. It's mainly small corrections and code tidying based on
> review and things I spotted along the way. To be able to merge this we need
> review from more people and people to start testing on their platforms and
> giving some Tested-by tags.
>
> Change list in patches.
>
> [...]
Applied to arm64 (for-next/fixes), thanks!
[01/47] arm_mpam: Remove duplicate linux/srcu.h header
https://git.kernel.org/arm64/c/b5a69c486921
[02/47] arm_mpam: Use non-atomic bitops when modifying feature bitmap
https://git.kernel.org/arm64/c/b9f5c38e4af1
--
Catalin
^ permalink raw reply	[flat|nested] 160+ messages in thread
* Re: [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code
2026-01-12 16:58 [PATCH v3 00/47] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
` (50 preceding siblings ...)
2026-01-16 15:47 ` (subset) " Catalin Marinas
@ 2026-01-19 1:30 ` Gavin Shan
51 siblings, 0 replies; 160+ messages in thread
From: Gavin Shan @ 2026-01-19 1:30 UTC (permalink / raw)
To: Ben Horgan
Cc: amitsinght, baisheng.gao, baolin.wang, carl, dave.martin, david,
dfustini, fenghuay, james.morse, jonathan.cameron, kobak,
lcherian, linux-arm-kernel, linux-kernel, peternewman,
punit.agrawal, quic_jiles, reinette.chatre, rohit.mathew, scott,
sdonthineni, tan.shaopeng, xhao, catalin.marinas, will, corbet,
maz, oupton, joey.gouly, suzuki.poulose, kvmarm
On 1/13/26 12:58 AM, Ben Horgan wrote:
> This new version of the mpam missing pieces has no major rework from the
> previous version. It's mainly small corrections and code tidying based on
> review and things I spotted along the way. To be able to merge this we need
> review from more people, and people to start testing on their platforms and
> give some Tested-by tags.
>
> Change list in patches.
>
> As mentioned in the cover letter for v2, one major departure from the
> previous snapshot branches referenced in the base driver series is that the
> same MPAM settings are used for kernel-space and user-space. There are pros
> and cons of choosing this policy but I think it is the best thing to start
> with as there are AMD plans for adding a resctrl feature to allow a
> different closid/rmid configuration for user-space from kernel space. The
> AMD feature is called PLZA and is mentioned in this lpc slide deck[1]. This
> gives us a path forward to add support for having the EL1 and EL0 MPAM
> partid/pmg configuration differ from each other.
>
> From James' cover letter:
>
> This is the missing piece to make MPAM usable via resctrl from user-space. This has
> shed its debugfs code and the read/write 'event configuration' for the monitors
> to make the series smaller.
>
> This adds the arch code and KVM support first. I anticipate the whole thing
> going via arm64, but if it goes via tip instead, an immutable branch with those
> patches should be easy to do.
>
> Generally the resctrl glue code works by picking what MPAM features it can expose
> from the MPAM driver, then configuring the structs that back the resctrl helpers.
> If your platform is sufficiently Xeon shaped, you should be able to get L2/L3 CPOR
> bitmaps exposed via resctrl. CSU counters work if they are on/after the L3. MBWU
> counters are considerably more hairy, and depend on heuristics around the topology,
> and a bunch of stuff trying to emulate ABMC.
> If it didn't pick what you wanted it to, please share the debug messages produced
> when enabling dynamic debug and booting with:
> | dyndbg="file mpam_resctrl.c +pl"
>
> I've not found a platform that can test all the behaviours around the monitors,
> so this is where I'd expect the most bugs.
>
> The MPAM spec that describes all the system and MMIO registers can be found here:
> https://developer.arm.com/documentation/ddi0598/db/?lang=en
> (Ignore the 'RETIRED' warning - that is just Arm moving the documentation around.
> This document has the best overview)
>
>
> Based on v6.19-rc5
> This series can be retrieved from:
> https://gitlab.arm.com/linux-arm/linux-bh.git mpam_resctrl_glue_v3
>
> v2 can be found at:
> https://lore.kernel.org/linux-arm-kernel/20251219181147.3404071-1-ben.horgan@arm.com/
>
> rfc can be found at:
> https://lore.kernel.org/linux-arm-kernel/20251205215901.17772-1-james.morse@arm.com/
>
> [1] https://lpc.events/event/19/contributions/2093/attachments/1958/4172/resctrl%20Microconference%20LPC%202025%20Tokyo.pdf
>
> Ben Horgan (10):
> arm_mpam: Use non-atomic bitops when modifying feature bitmap
> arm64/sysreg: Add MPAMSM_EL1 register
> KVM: arm64: Preserve host MPAM configuration when changing traps
> KVM: arm64: Make MPAMSM_EL1 accesses UNDEF
> arm64: mpam: Initialise and context switch the MPAMSM_EL1 register
> KVM: arm64: Use kernel-space partid configuration for hypercalls
> arm_mpam: resctrl: Add rmid index helpers
> arm_mpam: resctrl: Add kunit test for rmid idx conversions
> arm_mpam: resctrl: Wait for cacheinfo to be ready
> arm_mpam: resctrl: Add kunit test for mbw min control generation
>
> Dave Martin (2):
> arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
> arm_mpam: resctrl: Add kunit test for control format conversions
>
> James Morse (30):
> arm64: mpam: Context switch the MPAM registers
> arm64: mpam: Re-initialise MPAM regs when CPU comes online
> arm64: mpam: Advertise the CPUs MPAM limits to the driver
> arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
> arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG
> values
> KVM: arm64: Force guest EL1 to use user-space's partid configuration
> arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
> arm_mpam: resctrl: Sort the order of the domain lists
> arm_mpam: resctrl: Pick the caches we will use as resctrl resources
> arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls()
> arm_mpam: resctrl: Add resctrl_arch_get_config()
> arm_mpam: resctrl: Implement helpers to update configuration
> arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks
> arm_mpam: resctrl: Add CDP emulation
> arm_mpam: resctrl: Add support for 'MB' resource
> arm_mpam: resctrl: Add support for csu counters
> arm_mpam: resctrl: Pick classes for use as mbm counters
> arm_mpam: resctrl: Pre-allocate free running monitors
> arm_mpam: resctrl: Pre-allocate assignable monitors
> arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions
> arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use
> arm_mpam: resctrl: Allow resctrl to allocate monitors
> arm_mpam: resctrl: Add resctrl_arch_rmid_read() and
> resctrl_arch_reset_rmid()
> arm_mpam: resctrl: Add resctrl_arch_cntr_read() &
> resctrl_arch_reset_cntr()
> arm_mpam: resctrl: Update the rmid reallocation limit
> arm_mpam: resctrl: Add empty definitions for assorted resctrl
> functions
> arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
> arm_mpam: resctrl: Call resctrl_init() on platforms that can support
> resctrl
> arm_mpam: Generate a configuration for min controls
> arm_mpam: Quirk CMN-650's CSU NRDY behaviour
>
> Jiapeng Chong (1):
> arm_mpam: Remove duplicate linux/srcu.h header
>
> Shanker Donthineni (4):
> arm_mpam: Add quirk framework
> arm_mpam: Add workaround for T241-MPAM-1
> arm_mpam: Add workaround for T241-MPAM-4
> arm_mpam: Add workaround for T241-MPAM-6
>
> Documentation/arch/arm64/silicon-errata.rst | 9 +
> arch/arm64/Kconfig | 6 +-
> arch/arm64/include/asm/el2_setup.h | 3 +-
> arch/arm64/include/asm/mpam.h | 98 +
> arch/arm64/include/asm/resctrl.h | 2 +
> arch/arm64/include/asm/thread_info.h | 3 +
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/cpufeature.c | 21 +-
> arch/arm64/kernel/mpam.c | 58 +
> arch/arm64/kernel/process.c | 7 +
> arch/arm64/kvm/hyp/include/hyp/switch.h | 12 +-
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 8 +
> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 13 +
> arch/arm64/kvm/sys_regs.c | 2 +
> arch/arm64/tools/sysreg | 8 +
> drivers/resctrl/Kconfig | 9 +-
> drivers/resctrl/Makefile | 1 +
> drivers/resctrl/mpam_devices.c | 306 ++-
> drivers/resctrl/mpam_internal.h | 131 +-
> drivers/resctrl/mpam_resctrl.c | 1930 +++++++++++++++++++
> drivers/resctrl/test_mpam_devices.c | 66 +
> drivers/resctrl/test_mpam_resctrl.c | 426 ++++
> include/linux/arm_mpam.h | 32 +
> 23 files changed, 3119 insertions(+), 33 deletions(-)
> create mode 100644 arch/arm64/include/asm/mpam.h
> create mode 100644 arch/arm64/include/asm/resctrl.h
> create mode 100644 arch/arm64/kernel/mpam.c
> create mode 100644 drivers/resctrl/mpam_resctrl.c
> create mode 100644 drivers/resctrl/test_mpam_resctrl.c
>
The L3 cache partitioning and MBW (soft) limiting work fine on NVIDIA's Grace Hopper machine.
Tested-by: Gavin Shan <gshan@redhat.com>
Thanks,
Gavin