* [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 23:53 ` Fenghua Yu
` (3 more replies)
2025-12-05 21:58 ` [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online James Morse
` (37 subsequent siblings)
38 siblings, 4 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
MPAM allows traffic in the SoC to be labeled by the OS, these labels
are used to apply policy in caches and bandwidth regulators, and to
monitor traffic in the SoC. The label is made up of a PARTID and PMG
value. The x86 equivalent calls these CLOSID and RMID, but they don't
map precisely.
MPAM has two CPU system registers that are used to hold the PARTID and PMG
values that traffic generated at each exception level will use. These can be
set per-task by the resctrl file system. (resctrl is the de facto interface
for controlling this stuff).
Add a helper to switch this.
struct task_struct's separate CLOSID and RMID fields are insufficient
to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
and PMG (sort of like the RMID) separately. On x86, the rmid is an
independent number, so a race that writes a mismatched closid and rmid
into hardware is benign. On arm64, the pmg bits extend the partid.
(i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
In this case, mismatching the values will 'dirty' a pmg value that
resctrl believes is clean, and is not tracking with its 'limbo' code.
To avoid this, the partid and pmg are always read and written as a pair.
Instead of making struct task_struct's closid and rmid fields an
endian-unsafe union, add the value to struct thread_info and always use
READ_ONCE()/WRITE_ONCE() when accessing this field.
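As a rough user-space illustration of the "read and written as a pair" idea (this is not code from the patch): the sketch below packs a partid and a pmg into one 64-bit value and accesses it with a single relaxed atomic load or store, which stands in for READ_ONCE()/WRITE_ONCE(). The bit positions are an assumption modelled on the MPAM0_EL1 layout, the same partid/pmg is assumed for instruction and data traffic (no CDP), and struct fake_thread_info and the mpam_pack()/mpam_set_task()/mpam_get_task() helpers are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed field positions, modelled on the MPAM0_EL1 register layout. */
#define PARTID_I_SHIFT	0	/* instruction PARTID, bits [15:0]  */
#define PARTID_D_SHIFT	16	/* data PARTID,        bits [31:16] */
#define PMG_I_SHIFT	32	/* instruction PMG,    bits [39:32] */
#define PMG_D_SHIFT	40	/* data PMG,           bits [47:40] */

/* Hypothetical stand-in for the new thread_info field. */
struct fake_thread_info {
	_Atomic uint64_t mpam_partid_pmg;
};

/* Build one register-format value from a partid/pmg pair (no CDP assumed). */
static inline uint64_t mpam_pack(uint16_t partid, uint8_t pmg)
{
	return ((uint64_t)partid << PARTID_I_SHIFT) |
	       ((uint64_t)partid << PARTID_D_SHIFT) |
	       ((uint64_t)pmg << PMG_I_SHIFT) |
	       ((uint64_t)pmg << PMG_D_SHIFT);
}

/* One store updates both fields, so a reader never sees a mismatched pair. */
static inline void mpam_set_task(struct fake_thread_info *ti,
				 uint16_t partid, uint8_t pmg)
{
	atomic_store_explicit(&ti->mpam_partid_pmg, mpam_pack(partid, pmg),
			      memory_order_relaxed);
}

/* One load returns either the old pair or the new pair, never a mix. */
static inline uint64_t mpam_get_task(struct fake_thread_info *ti)
{
	return atomic_load_explicit(&ti->mpam_partid_pmg,
				    memory_order_relaxed);
}

static inline uint16_t mpam_partid_i(uint64_t v)
{
	return (uint16_t)(v >> PARTID_I_SHIFT);
}

static inline uint8_t mpam_pmg_i(uint64_t v)
{
	return (uint8_t)(v >> PMG_I_SHIFT);
}
```

With two separate 32-bit fields, a reader racing with a writer could observe the new partid with the old pmg; a single 64-bit access rules that out in this model.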
Resctrl allows a per-cpu 'default' value to be set, this overrides the
values when scheduling a task in the default control-group, which has
PARTID 0. The way 'code data prioritisation' gets emulated means the
register value for the default group needs to be a variable.
The current system register value is kept in a per-cpu variable to
avoid writing to the system register if the value isn't going to change.
Writes to this register may reset the hardware state for regulating
bandwidth.
Finally, there is no reason to context switch these registers unless
there is a driver changing the values in struct task_struct. Hide
the whole thing behind a static key. This also allows the driver to
disable MPAM in response to errors reported by hardware. Move the
existing static key to belong to the arch code, as in the future
the MPAM driver may become a loadable module.
All this should depend on whether there is an MPAM driver, hide
it behind CONFIG_MPAM.
CC: Amit Singh Tomar <amitsinght@marvell.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/mpam.h | 74 ++++++++++++++++++++++++++++
arch/arm64/include/asm/thread_info.h | 3 ++
arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/mpam.c | 13 +++++
arch/arm64/kernel/process.c | 7 +++
drivers/resctrl/mpam_devices.c | 2 -
drivers/resctrl/mpam_internal.h | 2 +
8 files changed, 102 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/include/asm/mpam.h
create mode 100644 arch/arm64/kernel/mpam.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 004d58cfbff8..558baa9e7c08 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2048,6 +2048,8 @@ config ARM64_MPAM
MPAM is exposed to user-space via the resctrl pseudo filesystem.
+ This option enables the extra context switch code.
+
endmenu # "ARMv8.4 architectural features"
menu "ARMv8.5 architectural features"
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
new file mode 100644
index 000000000000..86a55176f884
--- /dev/null
+++ b/arch/arm64/include/asm/mpam.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2025 Arm Ltd. */
+
+#ifndef __ASM__MPAM_H
+#define __ASM__MPAM_H
+
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+#include <linux/sched.h>
+
+#include <asm/cpucaps.h>
+#include <asm/cpufeature.h>
+#include <asm/sysreg.h>
+
+DECLARE_STATIC_KEY_FALSE(mpam_enabled);
+DECLARE_PER_CPU(u64, arm64_mpam_default);
+DECLARE_PER_CPU(u64, arm64_mpam_current);
+
+/*
+ * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
+ * This is used by the context switch code to use the resctrl CPU property
+ * instead. The value is modified when CDP is enabled/disabled by mounting
+ * the resctrl filesystem.
+ */
+extern u64 arm64_mpam_global_default;
+
+/*
+ * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
+ * which may race with reads in __mpam_sched_in(). Ensure only one of the old
+ * or new values are used. Particular care should be taken with the pmg field
+ * as __mpam_sched_in() may read a partid and pmg that don't match, causing
+ * this value to be stored with cache allocations, despite being considered
+ * 'free' by resctrl.
+ *
+ * A value in struct thread_info is used instead of struct task_struct as the
+ * cpu's u64 register format is used, but struct task_struct has two u32'.
+ */
+static inline u64 mpam_get_regval(struct task_struct *tsk)
+{
+#ifdef CONFIG_ARM64_MPAM
+ return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
+#else
+ return 0;
+#endif
+}
+
+static inline void mpam_thread_switch(struct task_struct *tsk)
+{
+ u64 oldregval;
+ int cpu = smp_processor_id();
+ u64 regval = mpam_get_regval(tsk);
+
+ if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
+ !static_branch_likely(&mpam_enabled))
+ return;
+
+ if (regval == READ_ONCE(arm64_mpam_global_default))
+ regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
+
+ oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+ if (oldregval == regval)
+ return;
+
+ write_sysreg_s(regval, SYS_MPAM1_EL1);
+ isb();
+
+ /* Synchronising the EL0 write is left until the ERET to EL0 */
+ write_sysreg_s(regval, SYS_MPAM0_EL1);
+
+ WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
+}
+#endif /* __ASM__MPAM_H */
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index f241b8601ebd..c226dabd5019 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -41,6 +41,9 @@ struct thread_info {
#ifdef CONFIG_SHADOW_CALL_STACK
void *scs_base;
void *scs_sp;
+#endif
+#ifdef CONFIG_ARM64_MPAM
+ u64 mpam_partid_pmg;
#endif
u32 cpu;
};
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 76f32e424065..15979f366519 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
+obj-$(CONFIG_ARM64_MPAM) += mpam.o
obj-$(CONFIG_ARM64_MTE) += mte.o
obj-y += vdso-wrap.o
obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
new file mode 100644
index 000000000000..9866d2ca0faa
--- /dev/null
+++ b/arch/arm64/kernel/mpam.c
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2025 Arm Ltd. */
+
+#include <asm/mpam.h>
+
+#include <linux/jump_label.h>
+#include <linux/percpu.h>
+
+DEFINE_STATIC_KEY_FALSE(mpam_enabled);
+DEFINE_PER_CPU(u64, arm64_mpam_default);
+DEFINE_PER_CPU(u64, arm64_mpam_current);
+
+u64 arm64_mpam_global_default;
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index fba7ca102a8c..b510c0699313 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -51,6 +51,7 @@
#include <asm/fpsimd.h>
#include <asm/gcs.h>
#include <asm/mmu_context.h>
+#include <asm/mpam.h>
#include <asm/mte.h>
#include <asm/processor.h>
#include <asm/pointer_auth.h>
@@ -737,6 +738,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
if (prev->thread.sctlr_user != next->thread.sctlr_user)
update_sctlr_el1(next->thread.sctlr_user);
+ /*
+ * MPAM thread switch happens after the DSB to ensure prev's accesses
+ * use prev's MPAM settings.
+ */
+ mpam_thread_switch(next);
+
/* the actual thread switch */
last = cpu_switch_to(prev, next);
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 0b5b158e1aaf..2996ad93fc3e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,8 +29,6 @@
#include "mpam_internal.h"
-DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
-
/*
* mpam_list_lock protects the SRCU lists when writing. Once the
* mpam_enabled key is enabled these lists are read-only,
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index e79c3c47259c..4508a6654fe0 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -17,6 +17,8 @@
#include <linux/srcu.h>
#include <linux/types.h>
+#include <asm/mpam.h>
+
#define MPAM_MSC_MAX_NUM_RIS 16
struct platform_device;
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-05 21:58 ` [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers James Morse
@ 2025-12-05 23:53 ` Fenghua Yu
2025-12-09 15:08 ` Ben Horgan
2025-12-09 14:49 ` Ben Horgan
` (2 subsequent siblings)
3 siblings, 1 reply; 95+ messages in thread
From: Fenghua Yu @ 2025-12-05 23:53 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, baisheng.gao, Jonathan Cameron, Gavin Shan,
Ben Horgan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi, James,
On 12/5/25 13:58, James Morse wrote:
> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
>
> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> values that traffic generated at each exception level will use. These can be
> set per-task by the resctrl file system. (resctrl is the defacto interface
> for controlling this stuff).
>
> Add a helper to switch this.
>
> struct task_struct's separate CLOSID and RMID fields are insufficient
> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
> and PMG (sort of like the RMID) separately. On x86, the rmid is an
> independent number, so a race that writes a mismatched closid and rmid
> into hardware is benign. On arm64, the pmg bits extend the partid.
> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
> In this case, mismatching the values will 'dirty' a pmg value that
> resctrl believes is clean, and is not tracking with its 'limbo' code.
>
> To avoid this, the partid and pmg are always read and written as a pair.
> Instead of making struct task_struct's closid and rmid fields an
> endian-unsafe union, add the value to struct thread_info and always use
> READ_ONCE()/WRITE_ONCE() when accessing this field.
>
> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> values when scheduling a task in the default control-group, which has
> PARTID 0. The way 'code data prioritisation' gets emulated means the
> register value for the default group needs to be a variable.
>
> The current system register value is kept in a per-cpu variable to
> avoid writing to the system register if the value isn't going to change.
> Writes to this register may reset the hardware state for regulating
> bandwidth.
>
> Finally, there is no reason to context switch these registers unless
> there is a driver changing the values in struct task_struct. Hide
> the whole thing behind a static key. This also allows the driver to
> disable MPAM in response to errors reported by hardware. Move the
> existing static key to belong to the arch code, as in the future
> the MPAM driver may become a loadable module.
>
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.
>
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/Kconfig | 2 +
> arch/arm64/include/asm/mpam.h | 74 ++++++++++++++++++++++++++++
> arch/arm64/include/asm/thread_info.h | 3 ++
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/mpam.c | 13 +++++
> arch/arm64/kernel/process.c | 7 +++
> drivers/resctrl/mpam_devices.c | 2 -
> drivers/resctrl/mpam_internal.h | 2 +
> 8 files changed, 102 insertions(+), 2 deletions(-)
> create mode 100644 arch/arm64/include/asm/mpam.h
> create mode 100644 arch/arm64/kernel/mpam.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 004d58cfbff8..558baa9e7c08 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2048,6 +2048,8 @@ config ARM64_MPAM
>
> MPAM is exposed to user-space via the resctrl pseudo filesystem.
>
> + This option enables the extra context switch code.
> +
> endmenu # "ARMv8.4 architectural features"
>
> menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..86a55176f884
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/bitops.h>
> +#include <linux/init.h>
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/cpucaps.h>
> +#include <asm/cpufeature.h>
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field
> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> + * this value to be stored with cache allocations, despite being considered
> + * 'free' by resctrl.
> + *
> + * A value in struct thread_info is used instead of struct task_struct as the
> + * cpu's u64 register format is used, but struct task_struct has two u32'.
> + */
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +#else
> + return 0;
> +#endif
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> + u64 oldregval;
> + int cpu = smp_processor_id();
> + u64 regval = mpam_get_regval(tsk);
> +
> + if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
> + !static_branch_likely(&mpam_enabled))
> + return;
> +
> + if (regval == READ_ONCE(arm64_mpam_global_default))
Why is this check needed? We need to read arm64_mpam_default in any
case, right?
> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + if (oldregval == regval)
> + return;
> +
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + /* Synchronising the EL0 write is left until the ERET to EL0 */
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
> +#endif /* __ASM__MPAM_H */
[SNIP]
Thanks.
-Fenghua
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-05 23:53 ` Fenghua Yu
@ 2025-12-09 15:08 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 15:08 UTC (permalink / raw)
To: Fenghua Yu, James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, baisheng.gao, Jonathan Cameron, Gavin Shan,
rohit.mathew, reinette.chatre, Punit Agrawal
Hi Fenghua,
On 12/5/25 23:53, Fenghua Yu wrote:
> Hi, James,
>
> On 12/5/25 13:58, James Morse wrote:
[...]
>>
>> Resctrl allows a per-cpu 'default' value to be set, this overrides the
>> values when scheduling a task in the default control-group, which has
>> PARTID 0. The way 'code data prioritisation' gets emulated means the
>> register value for the default group needs to be a variable.
>>
[...]
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> new file mode 100644
>> index 000000000000..86a55176f884
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mpam.h
[...]
>> +/*
>> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
>> + * This is used by the context switch code to use the resctrl CPU property
>> + * instead. The value is modified when CDP is enabled/disabled by mounting
>> + * the resctrl filesystem.
>> + */
>> +extern u64 arm64_mpam_global_default;
>> +
>> +/*
>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>> + * or new values are used. Particular care should be taken with the pmg field
>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>> + * this value to be stored with cache allocations, despite being considered
>> + * 'free' by resctrl.
>> + *
>> + * A value in struct thread_info is used instead of struct task_struct as the
>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
>> + */
>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> +#else
>> + return 0;
>> +#endif
>> +}
>> +
>> +static inline void mpam_thread_switch(struct task_struct *tsk)
>> +{
>> + u64 oldregval;
>> + int cpu = smp_processor_id();
>> + u64 regval = mpam_get_regval(tsk);
>> +
>> + if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
>> + !static_branch_likely(&mpam_enabled))
>> + return;
>> +
>> + if (regval == READ_ONCE(arm64_mpam_global_default))
> Why is this check needed? We need to read arm64_mpam_default in any
> case, right?
If a task is in the default resctrl group then the per-cpu resctrl
configuration determines the mpam configuration. The way this is dealt
with here is that for a task in the default group mpam_partid_pmg is set
to arm64_mpam_global_default. When mpam_partid_pmg is
arm64_mpam_global_default we know the task is in the default group,
otherwise it would have a different partid, and so we consider the
per-cpu configuration, arm64_mpam_default. When the task is not in the
default resctrl group then we can just consider the per-task configuration.
As mentioned in the commit message, this is complicated by code data
prioritisation (CDP): the value of arm64_mpam_global_default depends on
whether CDP is enabled. See resctrl_arch_set_cdp_enabled(), added in a
later patch. I'm not sure of a way to make this clearer in the code while
keeping the arch and drivers/resctrl patches separate.
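To make the selection logic described above concrete, here is a small user-space model (a sketch only, not the kernel code): a task whose value equals the global default is treated as being in the default group and picks up the per-CPU default, and the modelled register write is skipped when the value would not change. struct fake_cpu, pick_regval() and model_thread_switch() are hypothetical names, and the write counter stands in for the real system register writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the model. */
struct fake_cpu {
	uint64_t mpam_default;		/* models per-cpu arm64_mpam_default */
	uint64_t mpam_current;		/* models per-cpu arm64_mpam_current */
	unsigned int sysreg_writes;	/* counts modelled register updates */
};

/*
 * A task still holding the global default value is in the default
 * group, so the per-CPU default applies instead of the task's value.
 */
static inline uint64_t pick_regval(uint64_t task_val, uint64_t global_default,
				   const struct fake_cpu *cpu)
{
	if (task_val == global_default)
		return cpu->mpam_default;

	return task_val;
}

/* Returns true if the modelled register had to be rewritten. */
static inline bool model_thread_switch(struct fake_cpu *cpu, uint64_t task_val,
				       uint64_t global_default)
{
	uint64_t regval = pick_regval(task_val, global_default, cpu);

	/* Skip the write when nothing changes. */
	if (cpu->mpam_current == regval)
		return false;

	cpu->sysreg_writes++;
	cpu->mpam_current = regval;
	return true;
}
```

In this model, switching between two default-group tasks on the same CPU results in no write at all, which is the case the cached per-CPU value is there to catch.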
>> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
>> +
>> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>> + if (oldregval == regval)
>> + return;
>> +
>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + isb();
>> +
>> + /* Synchronising the EL0 write is left until the ERET to EL0 */
>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>> +
>> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
>> +}
>> +#endif /* __ASM__MPAM_H */
>
> [SNIP]
> Thanks.
>
> -Fenghua
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-05 21:58 ` [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers James Morse
2025-12-05 23:53 ` Fenghua Yu
@ 2025-12-09 14:49 ` Ben Horgan
2025-12-12 12:30 ` Ben Horgan
2025-12-18 10:35 ` Jonathan Cameron
3 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 14:49 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
>
[...]
>
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.
CONFIG_MPAM -> CONFIG_ARM64_MPAM
>
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/Kconfig | 2 +
> arch/arm64/include/asm/mpam.h | 74 ++++++++++++++++++++++++++++
> arch/arm64/include/asm/thread_info.h | 3 ++
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/mpam.c | 13 +++++
> arch/arm64/kernel/process.c | 7 +++
> drivers/resctrl/mpam_devices.c | 2 -
> drivers/resctrl/mpam_internal.h | 2 +
> 8 files changed, 102 insertions(+), 2 deletions(-)
> create mode 100644 arch/arm64/include/asm/mpam.h
> create mode 100644 arch/arm64/kernel/mpam.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 004d58cfbff8..558baa9e7c08 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2048,6 +2048,8 @@ config ARM64_MPAM
>
> MPAM is exposed to user-space via the resctrl pseudo filesystem.
>
> + This option enables the extra context switch code.
> +
> endmenu # "ARMv8.4 architectural features"
>
> menu "ARMv8.5 architectural features"
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..86a55176f884
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#ifndef __ASM__MPAM_H
> +#define __ASM__MPAM_H
> +
> +#include <linux/bitops.h>
> +#include <linux/init.h>
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +#include <linux/sched.h>
> +
> +#include <asm/cpucaps.h>
> +#include <asm/cpufeature.h>
> +#include <asm/sysreg.h>
> +
> +DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> +DECLARE_PER_CPU(u64, arm64_mpam_default);
> +DECLARE_PER_CPU(u64, arm64_mpam_current);
> +
> +/*
> + * The value of the MPAM0_EL1 sysreg when a task is in resctrl's default group.
> + * This is used by the context switch code to use the resctrl CPU property
> + * instead. The value is modified when CDP is enabled/disabled by mounting
> + * the resctrl filesystem.
> + */
> +extern u64 arm64_mpam_global_default;
> +
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field
> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> + * this value to be stored with cache allocations, despite being considered
> + * 'free' by resctrl.
__mpam_sched_in() -> mpam_thread_switch()
This is called from resctrl_arch_sched_in().
> + *
> + * A value in struct thread_info is used instead of struct task_struct as the
> + * cpu's u64 register format is used, but struct task_struct has two u32'.
> + */
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +#else
> + return 0;
> +#endif
> +}
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> + u64 oldregval;
> + int cpu = smp_processor_id();
> + u64 regval = mpam_get_regval(tsk);
> +
> + if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
> + !static_branch_likely(&mpam_enabled))
> + return;
> +
> + if (regval == READ_ONCE(arm64_mpam_global_default))
> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + if (oldregval == regval)
> + return;
> +
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + /* Synchronising the EL0 write is left until the ERET to EL0 */
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
> +#endif /* __ASM__MPAM_H */
> diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
> index f241b8601ebd..c226dabd5019 100644
> --- a/arch/arm64/include/asm/thread_info.h
> +++ b/arch/arm64/include/asm/thread_info.h
> @@ -41,6 +41,9 @@ struct thread_info {
> #ifdef CONFIG_SHADOW_CALL_STACK
> void *scs_base;
> void *scs_sp;
> +#endif
> +#ifdef CONFIG_ARM64_MPAM
> + u64 mpam_partid_pmg;
> #endif
> u32 cpu;
> };
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 76f32e424065..15979f366519 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -67,6 +67,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
> obj-$(CONFIG_VMCORE_INFO) += vmcore_info.o
> obj-$(CONFIG_ARM_SDE_INTERFACE) += sdei.o
> obj-$(CONFIG_ARM64_PTR_AUTH) += pointer_auth.o
> +obj-$(CONFIG_ARM64_MPAM) += mpam.o
> obj-$(CONFIG_ARM64_MTE) += mte.o
> obj-y += vdso-wrap.o
> obj-$(CONFIG_COMPAT_VDSO) += vdso32-wrap.o
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> new file mode 100644
> index 000000000000..9866d2ca0faa
> --- /dev/null
> +++ b/arch/arm64/kernel/mpam.c
> @@ -0,0 +1,13 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2025 Arm Ltd. */
> +
> +#include <asm/mpam.h>
> +
> +#include <linux/jump_label.h>
> +#include <linux/percpu.h>
> +
> +DEFINE_STATIC_KEY_FALSE(mpam_enabled);
> +DEFINE_PER_CPU(u64, arm64_mpam_default);
> +DEFINE_PER_CPU(u64, arm64_mpam_current);
> +
> +u64 arm64_mpam_global_default;
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index fba7ca102a8c..b510c0699313 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -51,6 +51,7 @@
> #include <asm/fpsimd.h>
> #include <asm/gcs.h>
> #include <asm/mmu_context.h>
> +#include <asm/mpam.h>
> #include <asm/mte.h>
> #include <asm/processor.h>
> #include <asm/pointer_auth.h>
> @@ -737,6 +738,12 @@ struct task_struct *__switch_to(struct task_struct *prev,
> if (prev->thread.sctlr_user != next->thread.sctlr_user)
> update_sctlr_el1(next->thread.sctlr_user);
>
> + /*
> + * MPAM thread switch happens after the DSB to ensure prev's accesses
> + * use prev's MPAM settings.
> + */
> + mpam_thread_switch(next);
> +
> /* the actual thread switch */
> last = cpu_switch_to(prev, next);
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 0b5b158e1aaf..2996ad93fc3e 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -29,8 +29,6 @@
>
> #include "mpam_internal.h"
>
> -DEFINE_STATIC_KEY_FALSE(mpam_enabled); /* This moves to arch code */
> -
> /*
> * mpam_list_lock protects the SRCU lists when writing. Once the
> * mpam_enabled key is enabled these lists are read-only,
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index e79c3c47259c..4508a6654fe0 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -17,6 +17,8 @@
> #include <linux/srcu.h>
> #include <linux/types.h>
>
> +#include <asm/mpam.h>
> +
> #define MPAM_MSC_MAX_NUM_RIS 16
>
> struct platform_device;
The DECLARE_STATIC_KEY_FALSE(mpam_enabled) on the next line can now be
removed.
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-05 21:58 ` [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers James Morse
2025-12-05 23:53 ` Fenghua Yu
2025-12-09 14:49 ` Ben Horgan
@ 2025-12-12 12:30 ` Ben Horgan
2025-12-18 10:35 ` Jonathan Cameron
3 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-12 12:30 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> MPAM allows traffic in the SoC to be labeled by the OS, these labels
> are used to apply policy in caches and bandwidth regulators, and to
> monitor traffic in the SoC. The label is made up of a PARTID and PMG
> value. The x86 equivalent calls these CLOSID and RMID, but they don't
> map precisely.
>
> MPAM has two CPU system registers that is used to hold the PARTID and PMG
> values that traffic generated at each exception level will use. These can be
> set per-task by the resctrl file system. (resctrl is the defacto interface
> for controlling this stuff).
>
> Add a helper to switch this.
>
> struct task_struct's separate CLOSID and RMID fields are insufficient
> to implement resctrl using MPAM, as resctrl can change the PARTID (CLOSID)
> and PMG (sort of like the RMID) separately. On x86, the rmid is an
> independent number, so a race that writes a mismatched closid and rmid
> into hardware is benign. On arm64, the pmg bits extend the partid.
> (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0).
> In this case, mismatching the values will 'dirty' a pmg value that
> resctrl believes is clean, and is not tracking with its 'limbo' code.
>
> To avoid this, the partid and pmg are always read and written as a pair.
> Instead of making struct task_struct's closid and rmid fields an
> endian-unsafe union, add the value to struct thread_info and always use
> READ_ONCE()/WRITE_ONCE() when accessing this field.
>
> Resctrl allows a per-cpu 'default' value to be set, this overrides the
> values when scheduling a task in the default control-group, which has
> PARTID 0. The way 'code data prioritisation' gets emulated means the
> register value for the default group needs to be a variable.
>
> The current system register value is kept in a per-cpu variable to
> avoid writing to the system register if the value isn't going to change.
> Writes to this register may reset the hardware state for regulating
> bandwidth.
>
> Finally, there is no reason to context switch these registers unless
> there is a driver changing the values in struct task_struct. Hide
> the whole thing behind a static key. This also allows the driver to
> disable MPAM in response to errors reported by hardware. Move the
> existing static key to belong to the arch code, as in the future
> the MPAM driver may become a loadable module.
>
> All this should depend on whether there is an MPAM driver, hide
> it behind CONFIG_MPAM.
>
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
[...]
> +
> +static inline void mpam_thread_switch(struct task_struct *tsk)
> +{
> + u64 oldregval;
> + int cpu = smp_processor_id();
> + u64 regval = mpam_get_regval(tsk);
> +
> + if (!IS_ENABLED(CONFIG_ARM64_MPAM) ||
> + !static_branch_likely(&mpam_enabled))
> + return;
> +
> + if (regval == READ_ONCE(arm64_mpam_global_default))
> + regval = READ_ONCE(per_cpu(arm64_mpam_default, cpu));
> +
> + oldregval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + if (oldregval == regval)
> + return;
> +
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + /* Synchronising the EL0 write is left until the ERET to EL0 */
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
SYS_MPAMSM_EL1 needs to be written here too, to account for accesses
generated by SME loads and, when in streaming mode, by SVE and FP loads
and stores and SVE prefetches.
SYS_MPAMSM_EL1 should also be considered in the initialisation code.
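As a rough illustration of the point above (a userspace sketch, not kernel
code): the model below mimics the cached-value check in mpam_thread_switch()
and adds a third write for MPAMSM_EL1. All mock_* names are hypothetical
stand-ins; real code would use write_sysreg_s() with the isb() shown in the
patch. Whether the same regval can be written to MPAMSM_EL1 unchanged, and
whether the write should depend on SME being present, are assumptions left
open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three system registers. */
enum mock_reg { MOCK_MPAM1_EL1, MOCK_MPAM0_EL1, MOCK_MPAMSM_EL1, MOCK_NR_REGS };

static uint64_t mock_regs[MOCK_NR_REGS];
static unsigned int mock_writes;	/* counts every mock register write */

static void mock_write_sysreg(uint64_t val, enum mock_reg reg)
{
	mock_regs[reg] = val;
	mock_writes++;
}

/* Models the per-CPU arm64_mpam_current cache for a single CPU. */
static uint64_t mock_mpam_current;

/* Sketch of the switch path: update all three registers, or none. */
static void mock_mpam_thread_switch(uint64_t regval)
{
	if (regval == mock_mpam_current)
		return;		/* value unchanged: skip the register writes */

	mock_write_sysreg(regval, MOCK_MPAM1_EL1);
	mock_write_sysreg(regval, MOCK_MPAM0_EL1);
	/* Assumption: MPAMSM_EL1 is updated alongside the other two. */
	mock_write_sysreg(regval, MOCK_MPAMSM_EL1);

	mock_mpam_current = regval;
}
```

Switching to 0x5 twice and then to 0x6 performs six mock register writes in
total: the repeated value is filtered by the cache, and every real update
reaches all three registers.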
> +
> + WRITE_ONCE(per_cpu(arm64_mpam_current, cpu), regval);
> +}
Thanks,
Ben
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-05 21:58 ` [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers James Morse
` (2 preceding siblings ...)
2025-12-12 12:30 ` Ben Horgan
@ 2025-12-18 10:35 ` Jonathan Cameron
2025-12-18 14:52 ` Ben Horgan
3 siblings, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 10:35 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:24 +0000
James Morse <james.morse@arm.com> wrote:
> [...]
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> new file mode 100644
> index 000000000000..86a55176f884
> --- /dev/null
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -0,0 +1,74 @@
...
> +/*
> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> + * or new values are used. Particular care should be taken with the pmg field
> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> + * this value to be stored with cache allocations, despite being considered
> + * 'free' by resctrl.
> + *
> + * A value in struct thread_info is used instead of struct task_struct as the
> + * cpu's u64 register format is used, but struct task_struct has two u32'.
This comment probably wants to provide a little more info if it is to be useful,
Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
I'm not immediately understanding why that matters given you could slap
a union on it without (I think) resulting in anything else moving.
Now having it in thread_info moves it into arch header territory so
might make sense for that reason.
> + */
> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> +#else
> + return 0;
> +#endif
> +}
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-18 10:35 ` Jonathan Cameron
@ 2025-12-18 14:52 ` Ben Horgan
2025-12-18 14:55 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-18 14:52 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
On 12/18/25 10:35, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:24 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> [...]
>
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> new file mode 100644
>> index 000000000000..86a55176f884
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -0,0 +1,74 @@
> ...
>
>> +/*
>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>> + * or new values are used. Particular care should be taken with the pmg field
>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>> + * this value to be stored with cache allocations, despite being considered
>> + * 'free' by resctrl.
>> + *
>> + * A value in struct thread_info is used instead of struct task_struct as the
>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
>
> This comment probably wants to provide a little more info if it is to be useful,
>
> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
> I'm not immediately understanding why that matters given you could slap
> a union on it without (I think) resulting in anything else moving.
Yes, the fields referred to are those closid and rmid. As James writes
in the commit message, a union is an alternative, but it would be endian
unsafe. Unlikely to matter, but let's not break things.
I'm replying for James as he is otherwise engaged. Thanks for the review
of this series and all your review on the previous MPAM series.
>
> Now having it in thread_info moves it into arch header territory so
> might make sense for that reason.
>
>> + */
>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>> +#else
>> + return 0;
>> +#endif
>> +}
>
Thanks,
Ben
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-18 14:52 ` Ben Horgan
@ 2025-12-18 14:55 ` Ben Horgan
2025-12-18 15:38 ` Jonathan Cameron
0 siblings, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-18 14:55 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
On 12/18/25 14:52, Ben Horgan wrote:
> Hi Jonathan,
>
> On 12/18/25 10:35, Jonathan Cameron wrote:
>> On Fri, 5 Dec 2025 21:58:24 +0000
>> James Morse <james.morse@arm.com> wrote:
>>
>>> [...]
>>
>>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>>> new file mode 100644
>>> index 000000000000..86a55176f884
>>> --- /dev/null
>>> +++ b/arch/arm64/include/asm/mpam.h
>>> @@ -0,0 +1,74 @@
>> ...
>>
>>> +/*
>>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>>> + * or new values are used. Particular care should be taken with the pmg field
>>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>>> + * this value to be stored with cache allocations, despite being considered
>>> + * 'free' by resctrl.
>>> + *
>>> + * A value in struct thread_info is used instead of struct task_struct as the
>>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
>>
>> This comment probably wants to provide a little more info if it is to be useful,
>>
>> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
>> I'm not immediately understanding why that matters given you could slap
>> a union on it without (I think) resulting in anything else moving.
>
> Yes, the fields referred to are those closid and rmid. As James writes
> in the commit message a union is an alternative, but it would be endian
> unsafe. Unlikely to matter but lets not break things.
Meant to say... I'll add clarification in this vein to the comment.
>
> I'm replying for James as he is otherwise engaged. Thanks for the review
> of this series and all your review on the previous MPAM series.
>
>>
>> Now having it in thread_info moves it into arch header territory so
>> might make sense for that reason.
>>
>>> + */
>>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>>> +{
>>> +#ifdef CONFIG_ARM64_MPAM
>>> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>>> +#else
>>> + return 0;
>>> +#endif
>>> +}
>>
>
> Thanks,
>
> Ben
>
--
Thanks,
Ben
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-18 14:55 ` Ben Horgan
@ 2025-12-18 15:38 ` Jonathan Cameron
2025-12-18 15:54 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 15:38 UTC (permalink / raw)
To: Ben Horgan
Cc: James Morse, linux-kernel, linux-arm-kernel, D Scott Phillips OS,
carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
fenghuay, baisheng.gao, Gavin Shan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Thu, 18 Dec 2025 14:55:23 +0000
Ben Horgan <ben.horgan@arm.com> wrote:
> On 12/18/25 14:52, Ben Horgan wrote:
> > Hi Jonathan,
> >
> > On 12/18/25 10:35, Jonathan Cameron wrote:
> >> On Fri, 5 Dec 2025 21:58:24 +0000
> >> James Morse <james.morse@arm.com> wrote:
> >>
> >>> [...]
> >>
> >>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> >>> new file mode 100644
> >>> index 000000000000..86a55176f884
> >>> --- /dev/null
> >>> +++ b/arch/arm64/include/asm/mpam.h
> >>> @@ -0,0 +1,74 @@
> >> ...
> >>
> >>> +/*
> >>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
> >>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
> >>> + * or new values are used. Particular care should be taken with the pmg field
> >>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
> >>> + * this value to be stored with cache allocations, despite being considered
> >>> + * 'free' by resctrl.
> >>> + *
> >>> + * A value in struct thread_info is used instead of struct task_struct as the
> >>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
> >>
> >> This comment probably wants to provide a little more info if it is to be useful,
> >>
> >> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
> >> I'm not immediately understanding why that matters given you could slap
> >> a union on it without (I think) resulting in anything else moving.
> >
> > Yes, the fields referred to are those closid and rmid. As James writes
> > in the commit message a union is an alternative, but it would be endian
> > unsafe. Unlikely to matter but lets not break things.
>
> Meant to say... I'll add clarification in this vein to the comment.
Goes to show I didn't read the patch description. Oops.
I'm probably just being slow today, but why would it be endian unsafe?
I didn't think the alternative would be to assume the two uses of the storage
were valid at the same time, but rather just to reuse the space (which would
have 64-bit alignment anyway). For that matter, we could just put a u64 in
under a separate ifdef CONFIG_...
Obviously if the code made use of the access to closid / rmid for arm64
systems it would be a problem.
Anyway just expanding on the comment a bit should do the job with no
need for any other changes.
Thanks,
Jonathan
>
> >
> > I'm replying for James as he is otherwise engaged. Thanks for the review
> > of this series and all your review on the previous MPAM series.
> >
> >>
> >> Now having it in thread_info moves it into arch header territory so
> >> might make sense for that reason.
> >>
> >>> + */
> >>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
> >>> +{
> >>> +#ifdef CONFIG_ARM64_MPAM
> >>> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
> >>> +#else
> >>> + return 0;
> >>> +#endif
> >>> +}
> >>
> >
> > Thanks,
> >
> > Ben
> >
>
* Re: [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers
2025-12-18 15:38 ` Jonathan Cameron
@ 2025-12-18 15:54 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-18 15:54 UTC (permalink / raw)
To: Jonathan Cameron
Cc: James Morse, linux-kernel, linux-arm-kernel, D Scott Phillips OS,
carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
fenghuay, baisheng.gao, Gavin Shan, rohit.mathew, reinette.chatre,
Punit Agrawal
On 12/18/25 15:38, Jonathan Cameron wrote:
> On Thu, 18 Dec 2025 14:55:23 +0000
> Ben Horgan <ben.horgan@arm.com> wrote:
>
>> On 12/18/25 14:52, Ben Horgan wrote:
>>> Hi Jonathan,
>>>
>>> On 12/18/25 10:35, Jonathan Cameron wrote:
>>>> On Fri, 5 Dec 2025 21:58:24 +0000
>>>> James Morse <james.morse@arm.com> wrote:
>>>>
>>>>> [...]
>>>>
>>>>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>>>>> new file mode 100644
>>>>> index 000000000000..86a55176f884
>>>>> --- /dev/null
>>>>> +++ b/arch/arm64/include/asm/mpam.h
>>>>> @@ -0,0 +1,74 @@
>>>> ...
>>>>
>>>>> +/*
>>>>> + * The resctrl filesystem writes to the partid/pmg values for threads and CPUs,
>>>>> + * which may race with reads in __mpam_sched_in(). Ensure only one of the old
>>>>> + * or new values are used. Particular care should be taken with the pmg field
>>>>> + * as __mpam_sched_in() may read a partid and pmg that don't match, causing
>>>>> + * this value to be stored with cache allocations, despite being considered
>>>>> + * 'free' by resctrl.
>>>>> + *
>>>>> + * A value in struct thread_info is used instead of struct task_struct as the
>>>>> + * cpu's u64 register format is used, but struct task_struct has two u32'.
>>>>
>>>> This comment probably wants to provide a little more info if it is to be useful,
>>>>
>>>> Is it a reference to the closid and rmid fields under CONFIG_X86_CPU_RESCTRL?
>>>> I'm not immediately understanding why that matters given you could slap
>>>> a union on it without (I think) resulting in anything else moving.
>>>
>>> Yes, the fields referred to are those closid and rmid. As James writes
>>> in the commit message a union is an alternative, but it would be endian
>>> unsafe. Unlikely to matter but lets not break things.
>>
>> Meant to say... I'll add clarification in this vein to the comment.
>
> Goes to show I didn't read the patch description. Oops.
>
> I'm probably just being slow today, but why would it be endian unsafe?
> I didn't think the alternative would be to assume the two uses of the storage
> were valid at the same time but rather just to reuse the space (which would
> have 64bit alignment anyway). For that matter we could just put a u64 in
> under a separate ifdef CONFIG_...
>
> Obviously if the code made use of the access to closid / rmid for arm64
> systems it would be a problem.
Yes, I think it would only be unsafe if closid / rmid were accessed, but
if they're not, just reusing the spot is cleaner. I assume James' point
is that, as we can't use what we've already got, it's OK just to do
something new.
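To make the endianness concern concrete, here is a minimal userspace sketch
(not from the patch). The union, the mock_* names and the packing of partid
into the low 32 bits are assumptions for illustration only; they are not the
real register layout or the struct task_struct definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical overlay of one u64 on two u32 fields, as a union would do. */
union mock_ids {
	struct {
		uint32_t closid;
		uint32_t rmid;
	} split;
	uint64_t regval;
};

/* Illustrative packing only: partid in the low 32 bits, pmg in the high 32. */
static uint64_t mock_pack(uint32_t partid, uint32_t pmg)
{
	return ((uint64_t)pmg << 32) | partid;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, sizeof(first));
	return first == 1;
}

/* What a reader of the 'closid' member sees after a whole-u64 store. */
static uint32_t mock_closid_after_store(uint32_t partid, uint32_t pmg)
{
	union mock_ids ids;

	ids.regval = mock_pack(partid, pmg);
	return ids.split.closid;
}
```

On a little-endian host mock_closid_after_store(5, 1) returns 5; on a
big-endian host it returns 1. That is why mixing whole-u64 accesses with
accesses to the individual u32 fields would be fragile, and why it is
harmless if the u32 fields are never touched.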
>
> Anyway just expanding on the comment a bit should do the job with no
> need for any other changes.
>
> Thanks,
>
> Jonathan
>
>
>>
>>>
>>> I'm replying for James as he is otherwise engaged. Thanks for the review
>>> of this series and all your review on the previous MPAM series.
>>>
>>>>
>>>> Now having it in thread_info moves it into arch header territory so
>>>> might make sense for that reason.
>>>>
>>>>> + */
>>>>> +static inline u64 mpam_get_regval(struct task_struct *tsk)
>>>>> +{
>>>>> +#ifdef CONFIG_ARM64_MPAM
>>>>> + return READ_ONCE(task_thread_info(tsk)->mpam_partid_pmg);
>>>>> +#else
>>>>> + return 0;
>>>>> +#endif
>>>>> +}
>>>>
>>>
>>> Thanks,
>>>
>>> Ben
>>>
>>
>
Thanks,
Ben
* [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
2025-12-05 21:58 ` [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 15:13 ` Ben Horgan
2025-12-05 21:58 ` [RFC PATCH 03/38] arm64: mpam: Advertise the CPUs MPAM limits to the driver James Morse
` (36 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Now that the MPAM system registers are expected to have values that change,
reprogram them based on struct task_struct when a CPU is brought online.
Previously MPAM's 'default PARTID' of 0 was used; this is the PARTID that
hardware guarantees to reset. Because there are a limited number of
PARTIDs, this value is exposed to user space, meaning resctrl changes
to the resctrl default group would also affect kernel threads.
Instead, use the task's PARTID value for kernel work on behalf of
user-space too.
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/kernel/cpufeature.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 5ed401ff79e3..429128a181ac 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -86,6 +86,7 @@
#include <asm/kvm_host.h>
#include <asm/mmu.h>
#include <asm/mmu_context.h>
+#include <asm/mpam.h>
#include <asm/mte.h>
#include <asm/hypervisor.h>
#include <asm/processor.h>
@@ -2439,13 +2440,16 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
static void
cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
{
- /*
- * Access by the kernel (at EL1) should use the reserved PARTID
- * which is configured unrestricted. This avoids priority-inversion
- * where latency sensitive tasks have to wait for a task that has
- * been throttled to release the lock.
- */
- write_sysreg_s(0, SYS_MPAM1_EL1);
+ int cpu = smp_processor_id();
+ u64 regval = 0;
+
+ if (IS_ENABLED(CONFIG_MPAM))
+ regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+
+ write_sysreg_s(regval, SYS_MPAM1_EL1);
+ isb();
+
+ write_sysreg_s(regval, SYS_MPAM0_EL1);
}
static bool
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2025-12-05 21:58 ` [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online James Morse
@ 2025-12-09 15:13 ` Ben Horgan
2025-12-11 11:23 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 15:13 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> Now that the MPAM system registers are expected to have values that change,
> reprogram them based on struct task_struct when a CPU is brought online.
>
> Previously MPAM's 'default PARTID' of 0 was used this is the PARTID that
> hardware guarantees to reset. Because there are a limited number of
> PARTID, this value is exposed to user space, meaning resctrl changes
> to the resctrl default group would also affect kernel threads.
> Instead, use the task's PARTID value for kernel work on behalf of
> user-space too.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/kernel/cpufeature.c | 18 +++++++++++-------
> 1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 5ed401ff79e3..429128a181ac 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -86,6 +86,7 @@
> #include <asm/kvm_host.h>
> #include <asm/mmu.h>
> #include <asm/mmu_context.h>
> +#include <asm/mpam.h>
> #include <asm/mte.h>
> #include <asm/hypervisor.h>
> #include <asm/processor.h>
> @@ -2439,13 +2440,16 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
> static void
> cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
> {
> - /*
> - * Access by the kernel (at EL1) should use the reserved PARTID
> - * which is configured unrestricted. This avoids priority-inversion
> - * where latency sensitive tasks have to wait for a task that has
> - * been throttled to release the lock.
> - */
> - write_sysreg_s(0, SYS_MPAM1_EL1);
> + int cpu = smp_processor_id();
> + u64 regval = 0;
> +
> + if (IS_ENABLED(CONFIG_MPAM))
> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
CONFIG_MPAM -> CONFIG_ARM64_MPAM
> +
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> }
>
> static bool
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2025-12-09 15:13 ` Ben Horgan
@ 2025-12-11 11:23 ` Ben Horgan
2025-12-11 11:32 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-11 11:23 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/9/25 15:13, Ben Horgan wrote:
> Hi James,
>
> On 12/5/25 21:58, James Morse wrote:
>> Now that the MPAM system registers are expected to have values that change,
>> reprogram them based on struct task_struct when a CPU is brought online.
>>
>> Previously MPAM's 'default PARTID' of 0 was used this is the PARTID that
>> hardware guarantees to reset. Because there are a limited number of
>> PARTID, this value is exposed to user space, meaning resctrl changes
>> to the resctrl default group would also affect kernel threads.
>> Instead, use the task's PARTID value for kernel work on behalf of
>> user-space too.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> arch/arm64/kernel/cpufeature.c | 18 +++++++++++-------
>> 1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index 5ed401ff79e3..429128a181ac 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -86,6 +86,7 @@
>> #include <asm/kvm_host.h>
>> #include <asm/mmu.h>
>> #include <asm/mmu_context.h>
>> +#include <asm/mpam.h>
>> #include <asm/mte.h>
>> #include <asm/hypervisor.h>
>> #include <asm/processor.h>
>> @@ -2439,13 +2440,16 @@ test_has_mpam(const struct arm64_cpu_capabilities *entry, int scope)
>> static void
>> cpu_enable_mpam(const struct arm64_cpu_capabilities *entry)
>> {
>> - /*
>> - * Access by the kernel (at EL1) should use the reserved PARTID
>> - * which is configured unrestricted. This avoids priority-inversion
>> - * where latency sensitive tasks have to wait for a task that has
>> - * been throttled to release the lock.
>> - */
>> - write_sysreg_s(0, SYS_MPAM1_EL1);
>> + int cpu = smp_processor_id();
>> + u64 regval = 0;
>> +
>> + if (IS_ENABLED(CONFIG_MPAM))
>> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
>
> CONFIG_MPAM -> CONFIG_ARM64_MPAM
Actually, this code is only run before the mpam enablement is finished,
importantly before the mpam_enabled static key is set, and so
arm64_mpam_current is still 0 for every cpu. For cpus that are brought
up after boot time this is never run.
As SYS_MPAM0_EL1 and SYS_MPAM1_EL1 are unknown out of reset we should
set them to 0 whenever a cpu comes online to make sure they initially
use PARTID 0 as that is the only one guaranteed to have sensible
defaults in the MSC. Once the mpam driver has configured the MSC we can
start setting other values.
>
>> +
>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>> + isb();
>> +
>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>> }
>>
>> static bool
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online
2025-12-11 11:23 ` Ben Horgan
@ 2025-12-11 11:32 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-11 11:32 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Retraction... sorry
On 12/11/25 11:23, Ben Horgan wrote:
> Hi James,
>
> On 12/9/25 15:13, Ben Horgan wrote:
>> Hi James,
>>
>> On 12/5/25 21:58, James Morse wrote:
>
> Actually, this code is only run before the mpam enablement is finished,
> importantly before the mpam_enabled static key is set, and so
> arm64_mpam_current is still 0 for every cpu. For cpus that are brought
> up after boot time this is never run.
Actually, actually, this is run whenever a cpu comes online, even if by
hotplug later, and so arm64_mpam_current will be 0 as needed on the first
switch on and the previous value for the cpu when it has previously been
off.
>
> As SYS_MPAM0_EL1 and SYS_MPAM1_EL1 are unknown out of reset we should
> set them to 0 whenever a cpu comes online to make sure they initially
> use PARTID 0 as that is the only one guaranteed to have sensible
> defaults in the MSC. Once the mpam driver has configured the MSC we can
> start setting other values.
>
>>
>>> +
>>> + write_sysreg_s(regval, SYS_MPAM1_EL1);
>>> + isb();
>>> +
>>> + write_sysreg_s(regval, SYS_MPAM0_EL1);
>>> }
>>>
>>> static bool
>>
>> Thanks,
>>
>> Ben
>>
>>
>
> Thanks,
>
> Ben
>
>
--
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 03/38] arm64: mpam: Advertise the CPUs MPAM limits to the driver
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
2025-12-05 21:58 ` [RFC PATCH 01/38] arm64: mpam: Context switch the MPAM registers James Morse
2025-12-05 21:58 ` [RFC PATCH 02/38] arm64: mpam: Re-initialise MPAM regs when CPU comes online James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 10:38 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 04/38] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs James Morse
` (35 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Requestors need to populate the MPAM fields for any traffic they send
on the interconnect. For the CPUs these values are taken from the
corresponding MPAMy_ELx register. Each requestor may have a limit on
the largest PARTID or PMG value that can be used. The MPAM driver has
to determine the system-wide minimum supported PARTID and PMG values.
To do this, the driver needs to be told what each requestor's
limit is.
CPUs are special, but this infrastructure is also needed for the
SMMU and GIC ITS. Call the helper to tell the MPAM driver what the
CPUs can do.
The return value can be ignored by the arch code as it runs well
before the MPAM driver starts probing.
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/kernel/mpam.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
index 9866d2ca0faa..e6feff2324ac 100644
--- a/arch/arm64/kernel/mpam.c
+++ b/arch/arm64/kernel/mpam.c
@@ -3,6 +3,7 @@
#include <asm/mpam.h>
+#include <linux/arm_mpam.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
@@ -11,3 +12,14 @@ DEFINE_PER_CPU(u64, arm64_mpam_default);
DEFINE_PER_CPU(u64, arm64_mpam_current);
u64 arm64_mpam_global_default;
+
+static int __init arm64_mpam_register_cpus(void)
+{
+ u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
+ u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
+ u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
+
+ return mpam_register_requestor(partid_max, pmg_max);
+}
+/* Must occur before mpam_msc_driver_init() from subsys_initcall() */
+arch_initcall(arm64_mpam_register_cpus)
--
2.39.5
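
The "system-wide minimum" described in the commit message above can be
illustrated with a small userspace sketch. This is not the driver's
implementation: sketch_register_requestor() and the two sketch_* variables
are hypothetical stand-ins, assumed here only to show that each requestor
reports its own maximum and the usable limit is the smallest one reported.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the system-wide limits (not driver symbols). */
static uint16_t sketch_partid_max = UINT16_MAX;
static uint8_t sketch_pmg_max = UINT8_MAX;

/*
 * Record one requestor's limits. A PARTID or PMG is only usable if every
 * requestor can generate it, so keep the smallest maximum seen so far.
 */
static void sketch_register_requestor(uint16_t partid_max, uint8_t pmg_max)
{
	if (partid_max < sketch_partid_max)
		sketch_partid_max = partid_max;
	if (pmg_max < sketch_pmg_max)
		sketch_pmg_max = pmg_max;
}
```

Registering the CPUs first and, say, an SMMU later would leave the limits at
whichever of the two is more restrictive for each field.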
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 03/38] arm64: mpam: Advertise the CPUs MPAM limits to the driver
2025-12-05 21:58 ` [RFC PATCH 03/38] arm64: mpam: Advertise the CPUs MPAM limits to the driver James Morse
@ 2025-12-18 10:38 ` Jonathan Cameron
0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 10:38 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:26 +0000
James Morse <james.morse@arm.com> wrote:
> Requestors need to populate the MPAM fields for any traffic they send
> on the interconnect. For the CPUs these values are taken from the
> corresponding MPAMy_ELx register. Each requestor may have a limit on
> the largest PARTID or PMG value that can be used. The MPAM driver has
> to determine the system-wide minimum supported PARTID and PMG values.
>
> To do this, the driver needs to be told what each requestor's
> limit is.
Trivial but this commit message line wrap isn't very consistent.
>
> CPUs are special, but this infrastructure is also needed for the
> SMMU and GIC ITS. Call the helper to tell the MPAM driver what the
> CPUs can do.
>
> The return value can be ignored by the arch code as it runs well
> before the MPAM driver starts probing.
>
> Signed-off-by: James Morse <james.morse@arm.com>
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 04/38] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (2 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 03/38] arm64: mpam: Advertise the CPUs MPAM limits to the driver James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-11 13:41 ` Ben Horgan
2025-12-05 21:58 ` [RFC PATCH 05/38] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values James Morse
` (34 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
The MPAM system registers will be lost if the CPU is reset during
PSCI's CPU_SUSPEND.
Add a PM notifier to restore them.
mpam_thread_switch(current) can't be used as this won't make any
changes if the in-memory copy says the register already has the
correct value. In reality the system register is UNKNOWN out of reset.
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/kernel/mpam.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
index e6feff2324ac..dbe0a2d05abb 100644
--- a/arch/arm64/kernel/mpam.c
+++ b/arch/arm64/kernel/mpam.c
@@ -4,6 +4,7 @@
#include <asm/mpam.h>
#include <linux/arm_mpam.h>
+#include <linux/cpu_pm.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
@@ -13,12 +14,41 @@ DEFINE_PER_CPU(u64, arm64_mpam_current);
u64 arm64_mpam_global_default;
+static int mpam_pm_notifier(struct notifier_block *self,
+ unsigned long cmd, void *v)
+{
+ u64 regval;
+ int cpu = smp_processor_id();
+
+ switch (cmd) {
+ case CPU_PM_EXIT:
+ /*
+ * Don't use mpam_thread_switch() as the system register
+ * value has changed under our feet.
+ */
+ regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
+ write_sysreg_s(regval, SYS_MPAM1_EL1);
+ isb();
+
+ write_sysreg_s(regval, SYS_MPAM0_EL1);
+
+ return NOTIFY_OK;
+ default:
+ return NOTIFY_DONE;
+ }
+}
+
+static struct notifier_block mpam_pm_nb = {
+ .notifier_call = mpam_pm_notifier,
+};
+
static int __init arm64_mpam_register_cpus(void)
{
u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
+ cpu_pm_register_notifier(&mpam_pm_nb);
return mpam_register_requestor(partid_max, pmg_max);
}
/* Must occur before mpam_msc_driver_init() from subsys_initcall() */
--
2.39.5
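
The reason the commit message above gives for not reusing the context-switch
path can be shown with a userspace model. This is only a sketch under
assumptions: hw_reg and cached_copy are hypothetical variables standing in
for the system register and the in-memory copy, and the sketch_* functions
are not kernel APIs.

```c
#include <stdint.h>

static uint64_t hw_reg;      /* stands in for the MPAM system register */
static uint64_t cached_copy; /* stands in for the in-memory copy */

/* Context-switch style update: skips the write when the cache matches. */
static void sketch_thread_switch(uint64_t want)
{
	if (want == cached_copy)
		return;
	hw_reg = want;
	cached_copy = want;
}

/* A CPU reset changes the register but leaves the in-memory copy alone. */
static void sketch_cpu_reset(uint64_t unknown)
{
	hw_reg = unknown;
}

/* Resume-style restore: always rewrite the register from memory. */
static void sketch_pm_exit(void)
{
	hw_reg = cached_copy;
}
```

After sketch_cpu_reset(), calling sketch_thread_switch() with the value that
was already cached leaves hw_reg stale, whereas sketch_pm_exit() repairs it.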
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 04/38] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs
2025-12-05 21:58 ` [RFC PATCH 04/38] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs James Morse
@ 2025-12-11 13:41 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-11 13:41 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> The MPAM system registers will be lost if the CPU is reset during
> PSCI's CPU_SUSPEND.
>
> Add a PM notifier to restore them.
>
> mpam_thread_switch(current) can't be used as this won't make any
> changes if the in-memory copy says the register already has the
> correct value. In reality the system register is UNKNOWN out of reset.
If CONFIG_ARM64_MPAM is not enabled then the PM notifier is never
registered but the MPAMx_ELy will still come out of reset with UNKNOWN
values and we need to set the PARTID values back to 0 as the MSC are
only guaranteed to provide good defaults for that value. (We aren't
using the MPAM driver to initialise the other PARTID.)
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/kernel/mpam.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/arch/arm64/kernel/mpam.c b/arch/arm64/kernel/mpam.c
> index e6feff2324ac..dbe0a2d05abb 100644
> --- a/arch/arm64/kernel/mpam.c
> +++ b/arch/arm64/kernel/mpam.c
> @@ -4,6 +4,7 @@
> #include <asm/mpam.h>
>
> #include <linux/arm_mpam.h>
> +#include <linux/cpu_pm.h>
> #include <linux/jump_label.h>
> #include <linux/percpu.h>
>
> @@ -13,12 +14,41 @@ DEFINE_PER_CPU(u64, arm64_mpam_current);
>
> u64 arm64_mpam_global_default;
>
> +static int mpam_pm_notifier(struct notifier_block *self,
> + unsigned long cmd, void *v)
> +{
> + u64 regval;
> + int cpu = smp_processor_id();
> +
> + switch (cmd) {
> + case CPU_PM_EXIT:
> + /*
> + * Don't use mpam_thread_switch() as the system register
> + * value has changed under our feet.
> + */
> + regval = READ_ONCE(per_cpu(arm64_mpam_current, cpu));
> + write_sysreg_s(regval, SYS_MPAM1_EL1);
> + isb();
> +
> + write_sysreg_s(regval, SYS_MPAM0_EL1);
> +
> + return NOTIFY_OK;
> + default:
> + return NOTIFY_DONE;
> + }
> +}
> +
> +static struct notifier_block mpam_pm_nb = {
> + .notifier_call = mpam_pm_notifier,
> +};
> +
> static int __init arm64_mpam_register_cpus(void)
> {
> u64 mpamidr = read_sanitised_ftr_reg(SYS_MPAMIDR_EL1);
> u16 partid_max = FIELD_GET(MPAMIDR_EL1_PARTID_MAX, mpamidr);
> u8 pmg_max = FIELD_GET(MPAMIDR_EL1_PMG_MAX, mpamidr);
>
> + cpu_pm_register_notifier(&mpam_pm_nb);
> return mpam_register_requestor(partid_max, pmg_max);
> }
> /* Must occur before mpam_msc_driver_init() from subsys_initcall() */
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 05/38] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (3 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 04/38] arm64: mpam: Add cpu_pm notifier to restore MPAM sysregs James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 10:44 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration James Morse
` (33 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal, Dave Martin
Care must be taken when modifying the PARTID and PMG of a task in any
per-task structure as writing these values may race with the task being
scheduled in, and reading the modified values.
Add helpers to set the task properties, and the CPU default value.
These use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval()
to avoid causing torn values.
CC: Dave Martin <Dave.Martin@arm.com>
CC: Ben Horgan <ben.horgan@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/include/asm/mpam.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
index 86a55176f884..2960ffaf6574 100644
--- a/arch/arm64/include/asm/mpam.h
+++ b/arch/arm64/include/asm/mpam.h
@@ -5,6 +5,7 @@
#define __ASM__MPAM_H
#include <linux/bitops.h>
+#include <linux/bitfield.h>
#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/percpu.h>
@@ -37,6 +38,35 @@ extern u64 arm64_mpam_global_default;
* A value in struct thread_info is used instead of struct task_struct as the
* cpu's u64 register format is used, but struct task_struct has two u32'.
*/
+static inline void mpam_set_cpu_defaults(int cpu, u16 partid_d, u16 partid_i,
+ u8 pmg_d, u8 pmg_i)
+{
+ u64 default_val;
+
+ default_val = FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d);
+ default_val |= FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i);
+ default_val |= FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d);
+ default_val |= FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
+
+ WRITE_ONCE(per_cpu(arm64_mpam_default, cpu), default_val);
+}
+
+static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
+ u16 partid_d, u16 partid_i,
+ u8 pmg_d, u8 pmg_i)
+{
+#ifdef CONFIG_ARM64_MPAM
+ u64 regval;
+
+ regval = FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d);
+ regval |= FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i);
+ regval |= FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d);
+ regval |= FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
+
+ WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
+#endif
+}
+
static inline u64 mpam_get_regval(struct task_struct *tsk)
{
#ifdef CONFIG_ARM64_MPAM
--
2.39.5
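
The pairing described in the commit message above relies on PARTID and PMG
living in one 64-bit word that is stored and loaded in a single access. A
rough userspace analogue, using C11 relaxed atomics in place of the kernel's
WRITE_ONCE()/READ_ONCE() (a swapped-in technique, not what the patch uses),
might look like this. sketch_task_regval and both functions are hypothetical
names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-task slot: PARTID and PMG packed into one 64-bit word. */
static _Atomic uint64_t sketch_task_regval;

/* Writer side: publish the whole pair with one untorn store. */
static void sketch_set_regval(uint64_t regval)
{
	atomic_store_explicit(&sketch_task_regval, regval,
			      memory_order_relaxed);
}

/* Reader side: fetch the whole pair with one untorn load. */
static uint64_t sketch_get_regval(void)
{
	return atomic_load_explicit(&sketch_task_regval,
				    memory_order_relaxed);
}
```

A reader racing with the writer sees either the old pair or the new pair,
never a PARTID from one and a PMG from the other.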
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 05/38] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2025-12-05 21:58 ` [RFC PATCH 05/38] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values James Morse
@ 2025-12-18 10:44 ` Jonathan Cameron
2025-12-19 11:56 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 10:44 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:28 +0000
James Morse <james.morse@arm.com> wrote:
> Care must be taken when modifying the PARTID and PMG of a task in any
> per-task structure as writing these values may race with the task being
> scheduled in, and reading the modified values.
>
> Add helpers to set the task properties, and the CPU default value.
> These use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval()
> to avoid causing torn values.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> CC: Ben Horgan <ben.horgan@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/include/asm/mpam.h | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> index 86a55176f884..2960ffaf6574 100644
> --- a/arch/arm64/include/asm/mpam.h
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -5,6 +5,7 @@
> #define __ASM__MPAM_H
>
> #include <linux/bitops.h>
> +#include <linux/bitfield.h>
> #include <linux/init.h>
> #include <linux/jump_label.h>
> #include <linux/percpu.h>
> @@ -37,6 +38,35 @@ extern u64 arm64_mpam_global_default;
> * A value in struct thread_info is used instead of struct task_struct as the
> * cpu's u64 register format is used, but struct task_struct has two u32'.
> */
I'd be tempted to reorder this so the comment ends up near at least
one use of task_thread_info(tsk)->mpam_partid_pmg. Otherwise it is even
less obvious what it is referring to.
Easiest might be to put the mpam_get_regval() before the mpam_set_task_partid_pmg()
> +static inline void mpam_set_cpu_defaults(int cpu, u16 partid_d, u16 partid_i,
> + u8 pmg_d, u8 pmg_i)
> +{
> + u64 default_val;
> +
> + default_val = FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d);
> + default_val |= FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i);
> + default_val |= FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d);
> + default_val |= FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
> +
> + WRITE_ONCE(per_cpu(arm64_mpam_default, cpu), default_val);
> +}
> +
> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
> + u16 partid_d, u16 partid_i,
> + u8 pmg_d, u8 pmg_i)
> +{
> +#ifdef CONFIG_ARM64_MPAM
> + u64 regval;
> +
> + regval = FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d);
> + regval |= FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i);
> + regval |= FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d);
> + regval |= FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
Maybe a macro or helper to build regval given this replicates the block in
mpam_set_cpu_defaults. Obviously ignore if this changes in later patches!
> +
> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
> +#endif
> +}
> +
> static inline u64 mpam_get_regval(struct task_struct *tsk)
> {
> #ifdef CONFIG_ARM64_MPAM
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 05/38] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values
2025-12-18 10:44 ` Jonathan Cameron
@ 2025-12-19 11:56 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-19 11:56 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
On 12/18/25 10:44, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:28 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> Care must be taken when modifying the PARTID and PMG of a task in any
>> per-task structure as writing these values may race with the task being
>> scheduled in, and reading the modified values.
>>
>> Add helpers to set the task properties, and the CPU default value.
>> These use WRITE_ONCE() that pairs with the READ_ONCE() in mpam_get_regval()
>> to avoid causing torn values.
>>
>> CC: Dave Martin <Dave.Martin@arm.com>
>> CC: Ben Horgan <ben.horgan@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> arch/arm64/include/asm/mpam.h | 30 ++++++++++++++++++++++++++++++
>> 1 file changed, 30 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
>> index 86a55176f884..2960ffaf6574 100644
>> --- a/arch/arm64/include/asm/mpam.h
>> +++ b/arch/arm64/include/asm/mpam.h
>> @@ -5,6 +5,7 @@
>> #define __ASM__MPAM_H
>>
>> #include <linux/bitops.h>
>> +#include <linux/bitfield.h>
>> #include <linux/init.h>
>> #include <linux/jump_label.h>
>> #include <linux/percpu.h>
>> @@ -37,6 +38,35 @@ extern u64 arm64_mpam_global_default;
>> * A value in struct thread_info is used instead of struct task_struct as the
>> * cpu's u64 register format is used, but struct task_struct has two u32'.
>> */
> I'd be tempted to reorder this so the comment ends up near at least
> one use of task_thread_info(tsk)->mpam_partid_pmg. Otherwise it is even
> less obvious what it is referring to.
>
> Easiest might be to put the mpam_get_regval() before the mpam_set_task_partid_pmg()
Yes, this comment positioning was confusing.
>
>
>> +static inline void mpam_set_cpu_defaults(int cpu, u16 partid_d, u16 partid_i,
>> + u8 pmg_d, u8 pmg_i)
>> +{
>> + u64 default_val;
>> +
>> + default_val = FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d);
>> + default_val |= FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i);
>> + default_val |= FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d);
>> + default_val |= FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
>> +
>> + WRITE_ONCE(per_cpu(arm64_mpam_default, cpu), default_val);
>> +}
>> +
>> +static inline void mpam_set_task_partid_pmg(struct task_struct *tsk,
>> + u16 partid_d, u16 partid_i,
>> + u8 pmg_d, u8 pmg_i)
>> +{
>> +#ifdef CONFIG_ARM64_MPAM
>> + u64 regval;
>> +
>> + regval = FIELD_PREP(MPAM0_EL1_PARTID_D, partid_d);
>> + regval |= FIELD_PREP(MPAM0_EL1_PARTID_I, partid_i);
>> + regval |= FIELD_PREP(MPAM0_EL1_PMG_D, pmg_d);
>> + regval |= FIELD_PREP(MPAM0_EL1_PMG_I, pmg_i);
>
> Maybe a macro or helper to build regval given this replicates the block in
> mpam_set_cpu_defaults. Obviously ignore if this changes in later patches!
I've added a helper.
>
>> +
>> + WRITE_ONCE(task_thread_info(tsk)->mpam_partid_pmg, regval);
>> +#endif
>> +}
>> +
>> static inline u64 mpam_get_regval(struct task_struct *tsk)
>> {
>> #ifdef CONFIG_ARM64_MPAM
>
Thanks,
Ben
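
For readers following the thread, a shared helper of the kind discussed above
could be sketched in userspace as below. This is not the helper that was
actually added: sketch_mpam_regval() and the SKETCH_* shifts are hypothetical,
and the bit positions are an assumption based on the architectural MPAM0_EL1
layout (PARTID_I [15:0], PARTID_D [31:16], PMG_I [39:32], PMG_D [47:40])
rather than taken from the kernel's generated sysreg definitions.

```c
#include <stdint.h>

/* Assumed field positions; in the kernel these come from FIELD_PREP() masks. */
#define SKETCH_PARTID_I_SHIFT	0
#define SKETCH_PARTID_D_SHIFT	16
#define SKETCH_PMG_I_SHIFT	32
#define SKETCH_PMG_D_SHIFT	40

/* Build the register-format value once, for both the task and CPU setters. */
static inline uint64_t sketch_mpam_regval(uint16_t partid_d, uint16_t partid_i,
					  uint8_t pmg_d, uint8_t pmg_i)
{
	return ((uint64_t)partid_i << SKETCH_PARTID_I_SHIFT) |
	       ((uint64_t)partid_d << SKETCH_PARTID_D_SHIFT) |
	       ((uint64_t)pmg_i << SKETCH_PMG_I_SHIFT) |
	       ((uint64_t)pmg_d << SKETCH_PMG_D_SHIFT);
}
```

Both setters would then reduce to one call to the helper followed by their
respective WRITE_ONCE().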
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (4 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 05/38] arm64: mpam: Add helpers to change a task or cpu's MPAM PARTID/PMG values James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 15:32 ` Ben Horgan
2025-12-12 11:31 ` Ben Horgan
2025-12-05 21:58 ` [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation James Morse
` (32 subsequent siblings)
38 siblings, 2 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
While we trap the guest's attempts to read/write the MPAM control
registers, the hardware continues to use them. Guest-EL0 uses KVM's
user-space's configuration, as the value is left in the register, and
guest-EL1 uses either the host kernel's configuration, or in the case of
VHE, the UNKNOWN reset value of MPAM1_EL1.
On nVHE systems, EL2 continues to use partid-0 for world-switch, even
when the host may have configured its kernel threads to use a different
partid. 0 may have been assigned to another task.
We want to force the guest-EL1 to use KVM's user-space's MPAM
configuration, and EL2 to match the host's EL1 config.
On a nVHE system, copy the EL1 MPAM register to EL2. This ensures
world-switch uses the same partid as the kernel thread does on the host.
When loading the guest's EL1 registers, copy the VMM's EL0 partid to
the EL1 register.
For VHE systems, we can skip restoring the EL1 register for the host,
as it is out-of-context once HCR_EL2.TGE is set.
This is done outside the usual sysreg save/restore as the values can
change behind KVM's back, so should not be stored in the guest context.
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 16 ++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 12 ++++++++++++
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 1 +
4 files changed, 30 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index b763293281c8..baba23b7ce97 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -447,6 +447,7 @@ enum vcpu_sysreg {
MDCCINT_EL1, /* Monitor Debug Comms Channel Interrupt Enable Reg */
OSLSR_EL1, /* OS Lock Status Register */
DISR_EL1, /* Deferred Interrupt Status Register */
+ MPAM1_EL1, /* Memory Partitioning And Monitoring register */
/* Performance Monitors Registers */
PMCR_EL0, /* Control Register */
diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
index a17cbe7582de..d8ab0ced0403 100644
--- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
@@ -166,6 +166,9 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
}
+ if (system_supports_mpam())
+ ctxt_sys_reg(ctxt, MPAM1_EL1) = read_sysreg_el1(SYS_MPAM1);
+
ctxt_sys_reg(ctxt, SP_EL1) = read_sysreg(sp_el1);
ctxt_sys_reg(ctxt, ELR_EL1) = read_sysreg_el1(SYS_ELR);
ctxt_sys_reg(ctxt, SPSR_EL1) = read_sysreg_el1(SYS_SPSR);
@@ -261,6 +264,9 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
}
+ if (system_supports_mpam())
+ write_sysreg_el1(ctxt_sys_reg(ctxt, MPAM1_EL1), SYS_MPAM1);
+
if (!has_vhe() &&
cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
ctxt->__hyp_running_vcpu) {
@@ -374,4 +380,14 @@ static inline void __sysreg32_restore_state(struct kvm_vcpu *vcpu)
write_sysreg(__vcpu_sys_reg(vcpu, DBGVCR32_EL2), dbgvcr32_el2);
}
+/*
+ * The _EL0 value was written by the host's context switch and belongs to the
+ * VMM. Copy this into the guest's _EL1 register.
+ */
+static inline void __mpam_guest_load(void)
+{
+ if (system_supports_mpam())
+ write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1), SYS_MPAM1);
+}
+
#endif /* __ARM64_KVM_HYP_SYSREG_SR_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index d3b9ec8a7c28..b785977aa61e 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -238,6 +238,15 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
return __fixup_guest_exit(vcpu, exit_code, handlers);
}
+/* Use the host thread's partid and pmg for world switch */
+static void __mpam_copy_el1_to_el2(void)
+{
+ if (system_supports_mpam()) {
+ write_sysreg_s(read_sysreg_s(SYS_MPAM1_EL1), SYS_MPAM2_EL2);
+ isb();
+ }
+}
+
/* Switch to the guest for legacy non-VHE systems */
int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
{
@@ -247,6 +256,8 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
bool pmu_switch_needed;
u64 exit_code;
+ __mpam_copy_el1_to_el2();
+
/*
* Having IRQs masked via PMR when entering the guest means the GIC
* will not signal the CPU of interrupts of lower priority, and the
@@ -306,6 +317,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
__timer_enable_traps(vcpu);
__debug_switch_to_guest(vcpu);
+ __mpam_guest_load();
do {
/* Jump in the fire! */
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index f28c6cf4fe1b..2a84edc90465 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -222,6 +222,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
*/
__sysreg32_restore_state(vcpu);
__sysreg_restore_user_state(guest_ctxt);
+ __mpam_guest_load();
if (unlikely(is_hyp_ctxt(vcpu))) {
__sysreg_restore_vel2_state(vcpu);
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2025-12-05 21:58 ` [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration James Morse
@ 2025-12-09 15:32 ` Ben Horgan
2025-12-12 11:31 ` Ben Horgan
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 15:32 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> While we trap the guest's attempts to read/write the MPAM control
> registers, the hardware continues to use them. Guest-EL0 uses KVM's
> user-space's configuration, as the value is left in the register, and
> guest-EL1 uses either the host kernel's configuration, or in the case of
> VHE, the UNKNOWN reset value of MPAM1_EL1.
>
> On nVHE systems, EL2 continues to use partid-0 for world-switch, even
> when the host may have configured its kernel threads to use a different
> partid. partid-0 may have been assigned to another task.
>
> We want to force the guest-EL1 to use KVM's user-space's MPAM
> configuration, and EL2 to match the host's EL1 config.
>
> On an nVHE system, copy the EL1 MPAM register to EL2. This ensures
> world-switch uses the same partid as the kernel thread does on the host.
>
> When loading the guest's EL1 registers, copy the VMM's EL0 partid to
> the EL1 register.
>
> For VHE systems, we can skip restoring the EL1 register for the host,
> as it is out-of-context once HCR_EL2.TGE is set.
>
> This is done outside the usual sysreg save/restore as the values can
> change behind KVM's back, so should not be stored in the guest context.
>
> Signed-off-by: James Morse <james.morse@arm.com>
One thing that is needed in KVM, now that we are changing MPAM for EL1,
is an update to __*activate_traps_mpam() so that it no longer assumes the
partids and pmgs are 0. I have an existing patch to deal with this.
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration
2025-12-05 21:58 ` [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration James Morse
2025-12-09 15:32 ` Ben Horgan
@ 2025-12-12 11:31 ` Ben Horgan
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-12 11:31 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> While we trap the guest's attempts to read/write the MPAM control
> registers, the hardware continues to use them. Guest-EL0 uses KVM's
> user-space's configuration, as the value is left in the register, and
> guest-EL1 uses either the host kernel's configuration, or in the case of
> VHE, the UNKNOWN reset value of MPAM1_EL1.
>
> On nVHE systems, EL2 continues to use partid-0 for world-switch, even
> when the host may have configured its kernel threads to use a different
> partid. partid-0 may have been assigned to another task.
>
> We want to force the guest-EL1 to use KVM's user-space's MPAM
> configuration, and EL2 to match the host's EL1 config.
>
> On an nVHE system, copy the EL1 MPAM register to EL2. This ensures
> world-switch uses the same partid as the kernel thread does on the host.
>
> When loading the guest's EL1 registers, copy the VMM's EL0 partid to
> the EL1 register.
>
> For VHE systems, we can skip restoring the EL1 register for the host,
> as it is out-of-context once HCR_EL2.TGE is set.
>
> This is done outside the usual sysreg save/restore as the values can
> change behind KVM's back, so should not be stored in the guest context.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 16 ++++++++++++++++
> arch/arm64/kvm/hyp/nvhe/switch.c | 12 ++++++++++++
> arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 1 +
> 4 files changed, 30 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index b763293281c8..baba23b7ce97 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -447,6 +447,7 @@ enum vcpu_sysreg {
> MDCCINT_EL1, /* Monitor Debug Comms Channel Interrupt Enable Reg */
> OSLSR_EL1, /* OS Lock Status Register */
> DISR_EL1, /* Deferred Interrupt Status Register */
> + MPAM1_EL1, /* Memory Partitioning And Monitoring register */
>
> /* Performance Monitors Registers */
> PMCR_EL0, /* Control Register */
> diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> index a17cbe7582de..d8ab0ced0403 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h
> @@ -166,6 +166,9 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt)
> ctxt_sys_reg(ctxt, TFSRE0_EL1) = read_sysreg_s(SYS_TFSRE0_EL1);
> }
>
> + if (system_supports_mpam())
> + ctxt_sys_reg(ctxt, MPAM1_EL1) = read_sysreg_el1(SYS_MPAM1);
> +
> ctxt_sys_reg(ctxt, SP_EL1) = read_sysreg(sp_el1);
> ctxt_sys_reg(ctxt, ELR_EL1) = read_sysreg_el1(SYS_ELR);
> ctxt_sys_reg(ctxt, SPSR_EL1) = read_sysreg_el1(SYS_SPSR);
> @@ -261,6 +264,9 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt,
> write_sysreg_s(ctxt_sys_reg(ctxt, TFSRE0_EL1), SYS_TFSRE0_EL1);
> }
>
> + if (system_supports_mpam())
> + write_sysreg_el1(ctxt_sys_reg(ctxt, MPAM1_EL1), SYS_MPAM1);
> +
I don't think this MPAM change to
__sysreg_save_el1_state()/__sysreg_restore_el1_state() adds anything.
Assuming MPAM0 and MPAM1 are always set together, and that we continue to
trap accesses from the guest, there is nothing to change the value away
from that of the task. If MPAM0 and MPAM1 were set separately, then we
would need a way to restore the host value of MPAM1 in the nVHE case,
due to the copy from MPAM0 to MPAM1.
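To illustrate the point about MPAM0 and MPAM1, here is a toy user-space
model, not kernel code. The struct and function names are invented for
this sketch, and it assumes the host's context switch always writes the
same value to both registers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the two system registers (hypothetical model). */
struct mpam_regs {
	uint64_t mpam0_el1;	/* value used by EL0 */
	uint64_t mpam1_el1;	/* value used by EL1 */
};

/* Assumption: the host context switch writes both registers as a pair. */
static void host_switch_to_task(struct mpam_regs *r, uint64_t val)
{
	r->mpam0_el1 = val;
	r->mpam1_el1 = val;
}

/* Models __mpam_guest_load(): copy the EL0 value into the EL1 register. */
static void guest_load(struct mpam_regs *r)
{
	r->mpam1_el1 = r->mpam0_el1;
}

/* Returns true if the EL1 value is unchanged by guest_load(). */
static bool el1_survives_guest_load(uint64_t el0_val, uint64_t el1_val)
{
	struct mpam_regs r = { .mpam0_el1 = el0_val, .mpam1_el1 = el1_val };

	guest_load(&r);
	return r.mpam1_el1 == el1_val;
}
```

With both registers in sync the copy changes nothing, so there is no
guest-specific value to save or restore. Only if the two were ever set
separately would the host's MPAM1 value be lost.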
> if (!has_vhe() &&
> cpus_have_final_cap(ARM64_WORKAROUND_SPECULATIVE_AT) &&
> ctxt->__hyp_running_vcpu) {
> @@ -374,4 +380,14 @@ static inline void __sysreg32_restore_state(struct kvm_vcpu *vcpu)
> write_sysreg(__vcpu_sys_reg(vcpu, DBGVCR32_EL2), dbgvcr32_el2);
> }
>
> +/*
> + * The _EL0 value was written by the host's context switch and belongs to the
> + * VMM. Copy this into the guest's _EL1 register.
> + */
> +static inline void __mpam_guest_load(void)
> +{
> + if (system_supports_mpam())
> + write_sysreg_el1(read_sysreg_s(SYS_MPAM0_EL1), SYS_MPAM1);
> +}
> +
> #endif /* __ARM64_KVM_HYP_SYSREG_SR_H__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index d3b9ec8a7c28..b785977aa61e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -238,6 +238,15 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
> return __fixup_guest_exit(vcpu, exit_code, handlers);
> }
>
> +/* Use the host thread's partid and pmg for world switch */
> +static void __mpam_copy_el1_to_el2(void)
> +{
> + if (system_supports_mpam()) {
> + write_sysreg_s(read_sysreg_s(SYS_MPAM1_EL1), SYS_MPAM2_EL2);
> + isb();
> + }
> +}
> +
> /* Switch to the guest for legacy non-VHE systems */
> int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> {
> @@ -247,6 +256,8 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> bool pmu_switch_needed;
> u64 exit_code;
>
> + __mpam_copy_el1_to_el2();
> +
What about the other hypercalls? E.g. __pkvm_init_vm(). Don't we end up
just running them all with the MPAM settings of the previous vcpu that ran?
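A toy model of this concern, in user-space C. The names are made up for
illustration, and it assumes that the vcpu-run path is the only place
that refreshes the EL2 register, as in this patch:

```c
#include <stdint.h>

/* Hypothetical model of the EL1 and EL2 MPAM registers on one CPU. */
struct cpu_model {
	uint64_t mpam1_el1;	/* written by the host's context switch */
	uint64_t mpam2_el2;	/* only refreshed when a vcpu is run */
};

/* The host scheduler picks a new task; only the EL1 register changes. */
static void host_switch_to_task(struct cpu_model *c, uint64_t val)
{
	c->mpam1_el1 = val;
}

/* Models __kvm_vcpu_run(): EL2 adopts the calling thread's value. */
static uint64_t hyp_vcpu_run(struct cpu_model *c)
{
	c->mpam2_el2 = c->mpam1_el1;
	return c->mpam2_el2;
}

/* Models any other hypercall: it runs with whatever EL2 last held. */
static uint64_t hyp_other_call(const struct cpu_model *c)
{
	return c->mpam2_el2;
}
```

In this model, a hypercall made by a different host thread after a vcpu
run is still labelled with the value of the thread that last ran a vcpu.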
> /*
> * Having IRQs masked via PMR when entering the guest means the GIC
> * will not signal the CPU of interrupts of lower priority, and the
> @@ -306,6 +317,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> __timer_enable_traps(vcpu);
>
> __debug_switch_to_guest(vcpu);
> + __mpam_guest_load();
As MPAM0 and MPAM1 are kept in sync, this doesn't do anything in the nVHE
case.
>
> do {
> /* Jump in the fire! */
> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> index f28c6cf4fe1b..2a84edc90465 100644
> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> @@ -222,6 +222,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
> */
> __sysreg32_restore_state(vcpu);
> __sysreg_restore_user_state(guest_ctxt);
> + __mpam_guest_load();
>
> if (unlikely(is_hyp_ctxt(vcpu))) {
> __sysreg_restore_vel2_state(vcpu);
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (5 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 06/38] KVM: arm64: Force guest EL1 to use user-space's partid configuration James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 15:43 ` Ben Horgan
2025-12-18 11:30 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources James Morse
` (31 subsequent siblings)
38 siblings, 2 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl has its own data structures to describe its resources. We
can't use these directly as we play tricks with the 'MBA' resource,
picking the MPAM controls or monitors that best apply. We may export
the same component as both L3 and MBA.
Add mpam_resctrl_controls[] as the array of class->resctrl mappings we
are exporting, and add the cpuhp hooks that allocate and free the
resctrl domain structures.
While we're here, plumb in a few other obvious things.
CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built
even though it can't yet be linked against resctrl.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/Makefile | 1 +
drivers/resctrl/mpam_devices.c | 12 ++
drivers/resctrl/mpam_internal.h | 22 +++
drivers/resctrl/mpam_resctrl.c | 329 ++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 3 +
5 files changed, 367 insertions(+)
create mode 100644 drivers/resctrl/mpam_resctrl.c
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 898199dcf80d..40beaf999582 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -1,4 +1,5 @@
obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
mpam-y += mpam_devices.o
+mpam-$(CONFIG_ARM_CPU_RESCTRL) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 2996ad93fc3e..efaf7633bc35 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -1627,6 +1627,9 @@ static int mpam_cpu_online(unsigned int cpu)
mpam_reprogram_msc(msc);
}
+ if (mpam_is_enabled())
+ mpam_resctrl_online_cpu(cpu);
+
return 0;
}
@@ -1670,6 +1673,9 @@ static int mpam_cpu_offline(unsigned int cpu)
{
struct mpam_msc *msc;
+ if (mpam_is_enabled())
+ mpam_resctrl_offline_cpu(cpu);
+
guard(srcu)(&mpam_srcu);
list_for_each_entry_srcu(msc, &mpam_all_msc, all_msc_list,
srcu_read_lock_held(&mpam_srcu)) {
@@ -2516,6 +2522,12 @@ static void mpam_enable_once(void)
mutex_unlock(&mpam_list_lock);
cpus_read_unlock();
+ if (!err) {
+ err = mpam_resctrl_setup();
+ if (err)
+ pr_err("Failed to initialise resctrl: %d\n", err);
+ }
+
if (err) {
mpam_disable_reason = "Failed to enable.";
schedule_work(&mpam_broken_work);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 4508a6654fe0..dfd3512ac924 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -12,6 +12,7 @@
#include <linux/jump_label.h>
#include <linux/llist.h>
#include <linux/mutex.h>
+#include <linux/resctrl.h>
#include <linux/srcu.h>
#include <linux/spinlock.h>
#include <linux/srcu.h>
@@ -336,6 +337,17 @@ struct mpam_msc_ris {
struct mpam_garbage garbage;
};
+struct mpam_resctrl_dom {
+ struct mpam_component *ctrl_comp;
+ struct rdt_ctrl_domain resctrl_ctrl_dom;
+ struct rdt_mon_domain resctrl_mon_dom;
+};
+
+struct mpam_resctrl_res {
+ struct mpam_class *class;
+ struct rdt_resource resctrl_res;
+};
+
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
@@ -390,6 +402,16 @@ void mpam_msmon_reset_mbwu(struct mpam_component *comp, struct mon_cfg *ctx);
int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
cpumask_t *affinity);
+#ifdef CONFIG_RESCTRL_FS
+int mpam_resctrl_setup(void);
+int mpam_resctrl_online_cpu(unsigned int cpu);
+void mpam_resctrl_offline_cpu(unsigned int cpu);
+#else
+static inline int mpam_resctrl_setup(void) { return 0; }
+static inline int mpam_resctrl_online_cpu(unsigned int cpu) { return 0; }
+static inline void mpam_resctrl_offline_cpu(unsigned int cpu) { }
+#endif /* CONFIG_RESCTRL_FS */
+
/*
* MPAM MSCs have the following register layout. See:
* Arm Memory System Resource Partitioning and Monitoring (MPAM) System
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
new file mode 100644
index 000000000000..320cebbd37ce
--- /dev/null
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -0,0 +1,329 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+
+#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
+
+#include <linux/arm_mpam.h>
+#include <linux/cacheinfo.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/printk.h>
+#include <linux/rculist.h>
+#include <linux/resctrl.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <asm/mpam.h>
+
+#include "mpam_internal.h"
+
+/*
+ * The classes we've picked to map to resctrl resources, wrapped
+ * in with their resctrl structure.
+ * Class pointer may be NULL.
+ */
+static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
+
+/* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
+static DEFINE_MUTEX(domain_list_lock);
+
+static bool exposed_alloc_capable;
+static bool exposed_mon_capable;
+
+bool resctrl_arch_alloc_capable(void)
+{
+ return exposed_alloc_capable;
+}
+
+bool resctrl_arch_mon_capable(void)
+{
+ return exposed_mon_capable;
+}
+
+/*
+ * MSC may raise an error interrupt if it sees an out of range partid/pmg,
+ * and go on to truncate the value. Regardless of what the hardware supports,
+ * only the system wide safe value is safe to use.
+ */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
+{
+ return mpam_partid_max + 1;
+}
+
+struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
+{
+ if (l >= RDT_NUM_RESOURCES)
+ return NULL;
+
+ return &mpam_resctrl_controls[l].resctrl_res;
+}
+
+static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
+ enum resctrl_res_level type)
+{
+ /* TODO: initialise the resctrl resources */
+
+ return 0;
+}
+
+static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
+{
+ struct mpam_class *class = comp->class;
+
+ if (class->type == MPAM_CLASS_CACHE)
+ return comp->comp_id;
+
+ /* TODO: repaint domain ids to match the L3 domain ids */
+ /*
+ * Otherwise, expose the ID used by the firmware table code.
+ */
+ return comp->comp_id;
+}
+
+static void mpam_resctrl_domain_hdr_init(int cpu, struct mpam_component *comp,
+ struct rdt_domain_hdr *hdr)
+{
+ lockdep_assert_cpus_held();
+
+ INIT_LIST_HEAD(&hdr->list);
+ hdr->id = mpam_resctrl_pick_domain_id(cpu, comp);
+ cpumask_set_cpu(cpu, &hdr->cpu_mask);
+}
+
+/**
+ * mpam_resctrl_offline_domain_hdr() - Update the domain header to remove a CPU.
+ * @cpu: The CPU to remove from the domain.
+ * @hdr: The domain's header.
+ *
+ * Removes @cpu from the header mask. If this was the last CPU in the domain,
+ * the domain header is removed from its parent list and true is returned,
+ * indicating the parent structure can be freed.
+ * If there are other CPUs in the domain, returns false.
+ */
+static bool mpam_resctrl_offline_domain_hdr(unsigned int cpu,
+ struct rdt_domain_hdr *hdr)
+{
+ cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+ if (cpumask_empty(&hdr->cpu_mask)) {
+ list_del(&hdr->list);
+ return true;
+ }
+
+ return false;
+}
+
+static struct mpam_resctrl_dom *
+mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
+{
+ int err;
+ struct mpam_resctrl_dom *dom;
+ struct rdt_mon_domain *mon_d;
+ struct rdt_ctrl_domain *ctrl_d;
+ struct mpam_class *class = res->class;
+ struct mpam_component *comp_iter, *ctrl_comp;
+ struct rdt_resource *r = &res->resctrl_res;
+
+ lockdep_assert_held(&domain_list_lock);
+
+ ctrl_comp = NULL;
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(comp_iter, &class->components, class_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
+ ctrl_comp = comp_iter;
+ break;
+ }
+ }
+
+ /* class has no component for this CPU */
+ if (WARN_ON_ONCE(!ctrl_comp))
+ return ERR_PTR(-EINVAL);
+
+ dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
+ if (!dom)
+ return ERR_PTR(-ENOMEM);
+
+ if (exposed_alloc_capable) {
+ dom->ctrl_comp = ctrl_comp;
+
+ ctrl_d = &dom->resctrl_ctrl_dom;
+ mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
+ ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
+ /* TODO: this list should be sorted */
+ list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
+ err = resctrl_online_ctrl_domain(r, ctrl_d);
+ if (err) {
+ dom = ERR_PTR(err);
+ goto offline_ctrl_domain;
+ }
+ } else {
+ pr_debug("Skipped control domain online - no controls\n");
+ }
+
+ if (exposed_mon_capable) {
+ mon_d = &dom->resctrl_mon_dom;
+ mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
+ mon_d->hdr.type = RESCTRL_MON_DOMAIN;
+ /* TODO: this list should be sorted */
+ list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
+ err = resctrl_online_mon_domain(r, mon_d);
+ if (err) {
+ dom = ERR_PTR(err);
+ goto offline_mon_hdr;
+ }
+ } else {
+ pr_debug("Skipped monitor domain online - no monitors\n");
+ }
+ goto out;
+
+offline_mon_hdr:
+ mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
+offline_ctrl_domain:
+ resctrl_offline_ctrl_domain(r, ctrl_d);
+out:
+ return dom;
+}
+
+static struct mpam_resctrl_dom *
+mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
+{
+ struct mpam_resctrl_dom *dom;
+ struct rdt_ctrl_domain *ctrl_d;
+
+ lockdep_assert_cpus_held();
+
+ list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
+ hdr.list) {
+ dom = container_of(ctrl_d, struct mpam_resctrl_dom,
+ resctrl_ctrl_dom);
+
+ if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
+ return dom;
+ }
+
+ return NULL;
+}
+
+int mpam_resctrl_online_cpu(unsigned int cpu)
+{
+ int i;
+ struct mpam_resctrl_dom *dom;
+ struct mpam_resctrl_res *res;
+
+ guard(mutex)(&domain_list_lock);
+ for (i = 0; i < RDT_NUM_RESOURCES; i++) {
+ res = &mpam_resctrl_controls[i];
+ if (!res->class)
+ continue; // dummy resource
+
+ dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
+ if (!dom)
+ dom = mpam_resctrl_alloc_domain(cpu, res);
+ if (IS_ERR(dom))
+ return PTR_ERR(dom);
+ }
+
+ resctrl_online_cpu(cpu);
+
+ return 0;
+}
+
+void mpam_resctrl_offline_cpu(unsigned int cpu)
+{
+ int i;
+ struct mpam_resctrl_res *res;
+ struct mpam_resctrl_dom *dom;
+ struct rdt_mon_domain *mon_d;
+ struct rdt_ctrl_domain *ctrl_d;
+ bool ctrl_dom_empty, mon_dom_empty;
+
+ resctrl_offline_cpu(cpu);
+
+ guard(mutex)(&domain_list_lock);
+ for (i = 0; i < RDT_NUM_RESOURCES; i++) {
+ res = &mpam_resctrl_controls[i];
+ if (!res->class)
+ continue; // dummy resource
+
+ dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
+ if (WARN_ON_ONCE(!dom))
+ continue;
+
+ ctrl_dom_empty = true;
+ if (exposed_alloc_capable) {
+ ctrl_d = &dom->resctrl_ctrl_dom;
+ ctrl_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &ctrl_d->hdr);
+ if (ctrl_dom_empty)
+ resctrl_offline_ctrl_domain(&res->resctrl_res, ctrl_d);
+ }
+
+ mon_dom_empty = true;
+ if (exposed_mon_capable) {
+ mon_d = &dom->resctrl_mon_dom;
+ mon_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
+ if (mon_dom_empty)
+ resctrl_offline_mon_domain(&res->resctrl_res, mon_d);
+ }
+
+ if (ctrl_dom_empty && mon_dom_empty)
+ kfree(dom);
+ }
+}
+
+int mpam_resctrl_setup(void)
+{
+ int err = 0;
+ enum resctrl_res_level i;
+ struct mpam_resctrl_res *res;
+
+ cpus_read_lock();
+ for (i = 0; i < RDT_NUM_RESOURCES; i++) {
+ res = &mpam_resctrl_controls[i];
+ INIT_LIST_HEAD_RCU(&res->resctrl_res.ctrl_domains);
+ INIT_LIST_HEAD_RCU(&res->resctrl_res.mon_domains);
+ res->resctrl_res.rid = i;
+ }
+
+ /* TODO: pick MPAM classes to map to resctrl resources */
+
+ /* Initialise the resctrl structures from the classes */
+ for (i = 0; i < RDT_NUM_RESOURCES; i++) {
+ res = &mpam_resctrl_controls[i];
+ if (!res->class)
+ continue; // dummy resource
+
+ err = mpam_resctrl_control_init(res, i);
+ if (err) {
+ pr_debug("Failed to initialise rid %u\n", i);
+ break;
+ }
+ }
+ cpus_read_unlock();
+
+ if (err || (!exposed_alloc_capable && !exposed_mon_capable)) {
+ if (err)
+ pr_debug("Internal error %d - resctrl not supported\n",
+ err);
+ else
+ pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
+ exposed_alloc_capable, exposed_mon_capable);
+ err = -EOPNOTSUPP;
+ }
+
+ if (!err) {
+ if (!is_power_of_2(mpam_pmg_max + 1)) {
+ /*
+ * If not all the partid*pmg values are valid indexes,
+ * resctrl may allocate pmg that don't exist. This
+ * should cause an error interrupt.
+ */
+ pr_warn("Number of PMG is not a power of 2! resctrl may misbehave\n");
+ }
+
+ /* TODO: call resctrl_init() */
+ }
+
+ return err;
+}
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 7f00c5285a32..2c7d1413a401 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -49,6 +49,9 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
}
#endif
+bool resctrl_arch_alloc_capable(void);
+bool resctrl_arch_mon_capable(void);
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-05 21:58 ` [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation James Morse
@ 2025-12-09 15:43 ` Ben Horgan
2025-12-18 11:30 ` Jonathan Cameron
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 15:43 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> resctrl has its own data structures to describe its resources. We
> can't use these directly as we play tricks with the 'MBA' resource,
> picking the MPAM controls or monitors that best apply. We may export
> the same component as both L3 and MBA.
>
> Add mpam_resctrl_controls[] as the array of class->resctrl mappings we
> are exporting, and add the cpuhp hooks that allocate and free the
> resctrl domain structures.
>
> While we're here, plumb in a few other obvious things.
>
> CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built
> even though it can't yet be linked against resctrl.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/Makefile | 1 +
> drivers/resctrl/mpam_devices.c | 12 ++
> drivers/resctrl/mpam_internal.h | 22 +++
> drivers/resctrl/mpam_resctrl.c | 329 ++++++++++++++++++++++++++++++++
> include/linux/arm_mpam.h | 3 +
> 5 files changed, 367 insertions(+)
> create mode 100644 drivers/resctrl/mpam_resctrl.c
>
[...]
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> new file mode 100644
> index 000000000000..320cebbd37ce
> --- /dev/null
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -0,0 +1,329 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/errno.h>
> +#include <linux/list.h>
> +#include <linux/printk.h>
> +#include <linux/rculist.h>
> +#include <linux/resctrl.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include <asm/mpam.h>
> +
> +#include "mpam_internal.h"
> +
> +/*
> + * The classes we've picked to map to resctrl resources, wrapped
> + * in with their resctrl structure.
> + * Class pointer may be NULL.
> + */
> +static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
> +
> +/* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
> +static DEFINE_MUTEX(domain_list_lock);
> +
> +static bool exposed_alloc_capable;
> +static bool exposed_mon_capable;
> +
> +bool resctrl_arch_alloc_capable(void)
> +{
> + return exposed_alloc_capable;
> +}
> +
> +bool resctrl_arch_mon_capable(void)
> +{
> + return exposed_mon_capable;
> +}
> +
> +/*
> + * MSC may raise an error interrupt if it sees an out of range partid/pmg,
> + * and go on to truncate the value. Regardless of what the hardware supports,
> + * only the system wide safe value is safe to use.
> + */
> +u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
> +{
> + return mpam_partid_max + 1;
> +}
> +
> +struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
> +{
> + if (l >= RDT_NUM_RESOURCES)
> + return NULL;
> +
> + return &mpam_resctrl_controls[l].resctrl_res;
> +}
> +
> +static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
> + enum resctrl_res_level type)
> +{
> + /* TODO: initialise the resctrl resources */
> +
> + return 0;
> +}
> +
> +static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> +{
> + struct mpam_class *class = comp->class;
> +
> + if (class->type == MPAM_CLASS_CACHE)
> + return comp->comp_id;
> +
> + /* TODO: repaint domain ids to match the L3 domain ids */
> + /*
> + * Otherwise, expose the ID used by the firmware table code.
> + */
> + return comp->comp_id;
> +}
> +
> +static void mpam_resctrl_domain_hdr_init(int cpu, struct mpam_component *comp,
> + struct rdt_domain_hdr *hdr)
> +{
> + lockdep_assert_cpus_held();
> +
> + INIT_LIST_HEAD(&hdr->list);
> + hdr->id = mpam_resctrl_pick_domain_id(cpu, comp);
> + cpumask_set_cpu(cpu, &hdr->cpu_mask);
> +}
> +
> +/**
> + * mpam_resctrl_offline_domain_hdr() - Update the domain header to remove a CPU.
> + * @cpu: The CPU to remove from the domain.
> + * @hdr: The domain's header.
> + *
> + * Removes @cpu from the header mask. If this was the last CPU in the domain,
> + * the domain header is removed from its parent list and true is returned,
> + * indicating the parent structure can be freed.
> + * If there are other CPUs in the domain, returns false.
> + */
> +static bool mpam_resctrl_offline_domain_hdr(unsigned int cpu,
> + struct rdt_domain_hdr *hdr)
> +{
> + cpumask_clear_cpu(cpu, &hdr->cpu_mask);
> + if (cpumask_empty(&hdr->cpu_mask)) {
> + list_del(&hdr->list);
This should be list_del_rcu(), to match the list_add_tail_rcu() used when
the header is added. I'll check some more as I'm not yet convinced we need
RCU for these lists.
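For reference, the helper's contract can be sketched in plain user-space
C. This is a single-threaded model with an invented list type and a
64-bit mask standing in for the cpumask, so it says nothing about whether
RCU is needed; it only shows the "last CPU out unlinks the header"
behaviour:

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal circular doubly-linked list (hypothetical, not <linux/list.h>). */
struct list_node {
	struct list_node *prev, *next;
};

/* Stand-in for struct rdt_domain_hdr; supports CPUs 0..63 only. */
struct domain_hdr {
	struct list_node list;
	uint64_t cpu_mask;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

/*
 * Remove @cpu from the header's mask. If it was the last CPU, unlink the
 * header and return true so the caller knows the parent can be freed.
 */
static bool offline_domain_hdr(unsigned int cpu, struct domain_hdr *hdr)
{
	hdr->cpu_mask &= ~(UINT64_C(1) << cpu);
	if (hdr->cpu_mask == 0) {
		list_del(&hdr->list);
		return true;
	}

	return false;
}
```

If readers can walk the list concurrently, the unlink would be
list_del_rcu() and the free would have to wait for a grace period, which
this model does not attempt to show.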
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static struct mpam_resctrl_dom *
> +mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
> +{
> + int err;
> + struct mpam_resctrl_dom *dom;
> + struct rdt_mon_domain *mon_d;
> + struct rdt_ctrl_domain *ctrl_d;
> + struct mpam_class *class = res->class;
> + struct mpam_component *comp_iter, *ctrl_comp;
> + struct rdt_resource *r = &res->resctrl_res;
> +
> + lockdep_assert_held(&domain_list_lock);
> +
> + ctrl_comp = NULL;
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(comp_iter, &class->components, class_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
> + ctrl_comp = comp_iter;
> + break;
> + }
> + }
> +
> + /* class has no component for this CPU */
> + if (WARN_ON_ONCE(!ctrl_comp))
> + return ERR_PTR(-EINVAL);
> +
> + dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
> + if (!dom)
> + return ERR_PTR(-ENOMEM);
> +
> + if (exposed_alloc_capable) {
> + dom->ctrl_comp = ctrl_comp;
> +
> + ctrl_d = &dom->resctrl_ctrl_dom;
> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
> + ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
> + /* TODO: this list should be sorted */
> + list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
> + err = resctrl_online_ctrl_domain(r, ctrl_d);
> + if (err) {
> + dom = ERR_PTR(err);
> + goto offline_ctrl_domain;
> + }
> + } else {
> + pr_debug("Skipped control domain online - no controls\n");
> + }
> +
> + if (exposed_mon_capable) {
> + mon_d = &dom->resctrl_mon_dom;
> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
> + mon_d->hdr.type = RESCTRL_MON_DOMAIN;
> + /* TODO: this list should be sorted */
> + list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
> + err = resctrl_online_mon_domain(r, mon_d);
> + if (err) {
> + dom = ERR_PTR(err);
> + goto offline_mon_hdr;
> + }
> + } else {
> + pr_debug("Skipped monitor domain online - no monitors\n");
> + }
> + goto out;
> +
> +offline_mon_hdr:
> + mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
> +offline_ctrl_domain:
> + resctrl_offline_ctrl_domain(r, ctrl_d);
> +out:
> + return dom;
> +}
> +
> +static struct mpam_resctrl_dom *
> +mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
> +{
> + struct mpam_resctrl_dom *dom;
> + struct rdt_ctrl_domain *ctrl_d;
> +
> + lockdep_assert_cpus_held();
> +
> + list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
> + hdr.list) {
> + dom = container_of(ctrl_d, struct mpam_resctrl_dom,
> + resctrl_ctrl_dom);
> +
> + if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
> + return dom;
> + }
> +
> + return NULL;
> +}
> +
> +int mpam_resctrl_online_cpu(unsigned int cpu)
> +{
> + int i;
> + struct mpam_resctrl_dom *dom;
> + struct mpam_resctrl_res *res;
> +
> + guard(mutex)(&domain_list_lock);
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
> + res = &mpam_resctrl_controls[i];
> + if (!res->class)
> + continue; // dummy_resource;
> +
> + dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
> + if (!dom)
> + dom = mpam_resctrl_alloc_domain(cpu, res);
> + if (IS_ERR(dom))
> + return PTR_ERR(dom);
> + }
> +
> + resctrl_online_cpu(cpu);
> +
> + return 0;
> +}
> +
> +void mpam_resctrl_offline_cpu(unsigned int cpu)
> +{
> + int i;
> + struct mpam_resctrl_res *res;
> + struct mpam_resctrl_dom *dom;
> + struct rdt_mon_domain *mon_d;
> + struct rdt_ctrl_domain *ctrl_d;
> + bool ctrl_dom_empty, mon_dom_empty;
> +
> + resctrl_offline_cpu(cpu);
> +
> + guard(mutex)(&domain_list_lock);
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
> + res = &mpam_resctrl_controls[i];
> + if (!res->class)
> + continue; // dummy resource
> +
> + dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
> + if (WARN_ON_ONCE(!dom))
> + continue;
> +
> + ctrl_dom_empty = true;
> + if (exposed_alloc_capable) {
> + ctrl_d = &dom->resctrl_ctrl_dom;
> + ctrl_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &ctrl_d->hdr);
> + if (ctrl_dom_empty)
> + resctrl_offline_ctrl_domain(&res->resctrl_res, ctrl_d);
> + }
> +
> + mon_dom_empty = true;
> + if (exposed_mon_capable) {
> + mon_d = &dom->resctrl_mon_dom;
> + mon_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
> + if (mon_dom_empty)
> + resctrl_offline_mon_domain(&res->resctrl_res, mon_d);
> + }
> +
> + if (ctrl_dom_empty && mon_dom_empty)
> + kfree(dom);
> + }
> +}
> +
> +int mpam_resctrl_setup(void)
> +{
> + int err = 0;
> + enum resctrl_res_level i;
> + struct mpam_resctrl_res *res;
> +
> + cpus_read_lock();
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
> + res = &mpam_resctrl_controls[i];
> + INIT_LIST_HEAD_RCU(&res->resctrl_res.ctrl_domains);
> + INIT_LIST_HEAD_RCU(&res->resctrl_res.mon_domains);
> + res->resctrl_res.rid = i;
> + }
> +
> + /* TODO: pick MPAM classes to map to resctrl resources */
> +
> + /* Initialise the resctrl structures from the classes */
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
> + res = &mpam_resctrl_controls[i];
> + if (!res->class)
> + continue; // dummy resource
> +
> + err = mpam_resctrl_control_init(res, i);
> + if (err) {
> + pr_debug("Failed to initialise rid %u\n", i);
> + break;
> + }
> + }
> + cpus_read_unlock();
> +
> + if (err || (!exposed_alloc_capable && !exposed_mon_capable)) {
> + if (err)
> + pr_debug("Internal error %d - resctrl not supported\n",
> + err);
> + else
> + pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
> + exposed_alloc_capable, exposed_mon_capable);
> + err = -EOPNOTSUPP;
> + }
> +
> + if (!err) {
> + if (!is_power_of_2(mpam_pmg_max + 1)) {
> + /*
> + * If not all the partid*pmg values are valid indexes,
> + * resctrl may allocate pmg that don't exist. This
> + * should cause an error interrupt.
> + */
> + pr_warn("Number of PMG is not a power of 2! resctrl may misbehave");
> + }
> +
> + /* TODO: call resctrl_init() */
> + }
> +
> + return err;
> +}
> diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
> index 7f00c5285a32..2c7d1413a401 100644
> --- a/include/linux/arm_mpam.h
> +++ b/include/linux/arm_mpam.h
> @@ -49,6 +49,9 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
> }
> #endif
>
> +bool resctrl_arch_alloc_capable(void);
> +bool resctrl_arch_mon_capable(void);
> +
> /**
> * mpam_register_requestor() - Register a requestor with the MPAM driver
> * @partid_max: The maximum PARTID value the requestor can generate.
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-05 21:58 ` [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation James Morse
2025-12-09 15:43 ` Ben Horgan
@ 2025-12-18 11:30 ` Jonathan Cameron
2025-12-19 12:02 ` Ben Horgan
2025-12-19 12:17 ` Ben Horgan
1 sibling, 2 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 11:30 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:30 +0000
James Morse <james.morse@arm.com> wrote:
> resctrl has its own data structures to describe its resources. We
> can't use these directly as we play tricks with the 'MBA' resource,
> picking the MPAM controls or monitors that best apply. We may export
> the same component as both L3 and MBA.
>
> Add mpam_resctrl_exports[] as the array of class->resctrl mappings we
> are exporting, and add the cpuhp hooks that allocate and free the
> resctrl domain structures.
>
> While we're here, plumb in a few other obvious things.
>
> CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built
> even though it can't yet be linked against resctrl.
>
> Signed-off-by: James Morse <james.morse@arm.com>
Hi,
A few code flow related comments. Fairly trivial stuff but I think
some parts of this can be made more readable / maintainable with
minor reorganization.
Jonathan
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 2996ad93fc3e..efaf7633bc35 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
...
> @@ -2516,6 +2522,12 @@ static void mpam_enable_once(void)
> mutex_unlock(&mpam_list_lock);
> cpus_read_unlock();
>
> + if (!err) {
> + err = mpam_resctrl_setup();
> + if (err)
> + pr_err("Failed to initialise resctrl: %d\n", err);
> + }
> +
> if (err) {
> mpam_disable_reason = "Failed to enable.";
> schedule_work(&mpam_broken_work);
I'd be tempted to move this to an error handling block via a goto,
making this bit:

	if (err)
		goto err_disable_mpam;

	err = mpam_resctrl_setup();
	if (err) {
		pr_err();
		goto err_disable_mpam;
	}

Up to you though. Personally I like all my good paths as straight line
code with the errors handled in if (err) as that consistency really helps
readability.
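
A rough user-space sketch of that shape, for illustration only. The names
enable_step(), resctrl_setup_step() and mpam_disabled are hypothetical
stand-ins, not the driver's API, and the function returns int here (the real
mpam_enable_once() is void) purely so the result can be checked:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's "MPAM is broken" state. */
static int mpam_disabled;

/* Hypothetical setup steps; a non-zero argument makes the step fail. */
static int enable_step(int fail) { return fail ? -5 : 0; }
static int resctrl_setup_step(int fail) { return fail ? -95 : 0; }

/*
 * The good path runs top to bottom. Every failure jumps to a single
 * label that performs the shared "disable" handling.
 */
static int enable_once(int fail_enable, int fail_setup)
{
	int err;

	err = enable_step(fail_enable);
	if (err)
		goto err_disable;

	err = resctrl_setup_step(fail_setup);
	if (err) {
		fprintf(stderr, "Failed to initialise resctrl: %d\n", err);
		goto err_disable;
	}

	return 0;

err_disable:
	assert(err != 0);	/* only reached on failure */
	mpam_disabled = 1;
	return err;
}
```

Whether this reads better than the nested if (!err) form is a matter of taste;
the behaviour is meant to be the same.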
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> new file mode 100644
> index 000000000000..320cebbd37ce
> --- /dev/null
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -0,0 +1,329 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2025 Arm Ltd.
> +
> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
> +
> +#include <linux/arm_mpam.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/cpu.h>
> +#include <linux/cpumask.h>
> +#include <linux/errno.h>
> +#include <linux/list.h>
> +#include <linux/printk.h>
> +#include <linux/rculist.h>
> +#include <linux/resctrl.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include <asm/mpam.h>
> +
> +#include "mpam_internal.h"
> +static struct mpam_resctrl_dom *
> +mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
> +{
> + int err;
> + struct mpam_resctrl_dom *dom;
> + struct rdt_mon_domain *mon_d;
> + struct rdt_ctrl_domain *ctrl_d;
> + struct mpam_class *class = res->class;
> + struct mpam_component *comp_iter, *ctrl_comp;
> + struct rdt_resource *r = &res->resctrl_res;
> +
> + lockdep_assert_held(&domain_list_lock);
> +
> + ctrl_comp = NULL;
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(comp_iter, &class->components, class_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
> + ctrl_comp = comp_iter;
> + break;
> + }
> + }
> +
> + /* class has no component for this CPU */
> + if (WARN_ON_ONCE(!ctrl_comp))
> + return ERR_PTR(-EINVAL);
> +
> + dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
> + if (!dom)
> + return ERR_PTR(-ENOMEM);
> +
> + if (exposed_alloc_capable) {
> + dom->ctrl_comp = ctrl_comp;
> +
> + ctrl_d = &dom->resctrl_ctrl_dom;
> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
> + ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
> + /* TODO: this list should be sorted */
> + list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
> + err = resctrl_online_ctrl_domain(r, ctrl_d);
> + if (err) {
> + dom = ERR_PTR(err);
> + goto offline_ctrl_domain;
> + }
> + } else {
> + pr_debug("Skipped control domain online - no controls\n");
> + }
> +
> + if (exposed_mon_capable) {
> + mon_d = &dom->resctrl_mon_dom;
> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
> + mon_d->hdr.type = RESCTRL_MON_DOMAIN;
> + /* TODO: this list should be sorted */
> + list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
> + err = resctrl_online_mon_domain(r, mon_d);
> + if (err) {
> + dom = ERR_PTR(err);
> + goto offline_mon_hdr;
> + }
> + } else {
> + pr_debug("Skipped monitor domain online - no monitors\n");
> + }
> + goto out;
To keep flow simple, return here. I thought maybe there was more stuff
that was always done (added in later patches) but not seeing that.
If there were then it would be a fairly strong indicator that a different
code structure makes more sense - probably with some helper functions.
> +
> +offline_mon_hdr:
> + mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
> +offline_ctrl_domain:
> + resctrl_offline_ctrl_domain(r, ctrl_d);
> +out:
> + return dom;
> +}
> +
> +static struct mpam_resctrl_dom *
> +mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
> +{
> + struct mpam_resctrl_dom *dom;
> + struct rdt_ctrl_domain *ctrl_d;
> +
> + lockdep_assert_cpus_held();
> +
> + list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
> + hdr.list) {
> + dom = container_of(ctrl_d, struct mpam_resctrl_dom,
> + resctrl_ctrl_dom);
I'm lazy so haven't checked for more code here in later patches, but
if not, why not iterate the list to access the domain directly rather
than jumping through the rdt_ctrl_domain?
Something along the lines of:

	list_for_each_entry_rcu(dom, &res->resctrl_res.ctrl_domains,
				resctrl_ctrl_dom.hdr.list) {
	}
> +
> + if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
> + return dom;
> + }
> +
> + return NULL;
> +}
> +
> +int mpam_resctrl_online_cpu(unsigned int cpu)
> +{
> + int i;
> + struct mpam_resctrl_dom *dom;
> + struct mpam_resctrl_res *res;
> +
> + guard(mutex)(&domain_list_lock);
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
I'd narrow the scope for dom and res to inside the loop.
Maybe put the iterator in the for loop init (now considered
acceptable in kernel code).
Similar applies in various other places. Not that important
for functions that more or less just consist of a loop though.
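
A minimal user-space sketch of the narrower scoping, for illustration only.
NUM_RESOURCES, struct res and controls[] are hypothetical stand-ins for
RDT_NUM_RESOURCES, struct mpam_resctrl_res and mpam_resctrl_controls[]:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RESOURCES 4	/* hypothetical stand-in for RDT_NUM_RESOURCES */

/* Hypothetical stand-in for struct mpam_resctrl_res. */
struct res {
	const void *cls;
};

static struct res controls[NUM_RESOURCES];

/* Count the resources that have a class, skipping the dummy ones. */
static int count_used(void)
{
	int used = 0;

	/* The iterator lives in the for loop init. */
	for (int i = 0; i < NUM_RESOURCES; i++) {
		/* res only exists inside the loop body that needs it. */
		struct res *res = &controls[i];

		if (!res->cls)
			continue;	/* dummy resource */
		used++;
	}

	assert(used <= NUM_RESOURCES);
	return used;
}
```

Neither i nor res is visible after the loop, so a later edit can't
accidentally use a stale value.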
> + res = &mpam_resctrl_controls[i];
> + if (!res->class)
> + continue; // dummy_resource;
> +
> + dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
> + if (!dom)
> + dom = mpam_resctrl_alloc_domain(cpu, res);
> + if (IS_ERR(dom))
> + return PTR_ERR(dom);
> + }
> +
> + resctrl_online_cpu(cpu);
> +
> + return 0;
> +}
> +int mpam_resctrl_setup(void)
> +{
> + int err = 0;
> + enum resctrl_res_level i;
> + struct mpam_resctrl_res *res;
> +
> + cpus_read_lock();
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
> + res = &mpam_resctrl_controls[i];
> + INIT_LIST_HEAD_RCU(&res->resctrl_res.ctrl_domains);
> + INIT_LIST_HEAD_RCU(&res->resctrl_res.mon_domains);
> + res->resctrl_res.rid = i;
> + }
> +
> + /* TODO: pick MPAM classes to map to resctrl resources */
> +
> + /* Initialise the resctrl structures from the classes */
> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
> + res = &mpam_resctrl_controls[i];
> + if (!res->class)
> + continue; // dummy resource
> +
> + err = mpam_resctrl_control_init(res, i);
> + if (err) {
> + pr_debug("Failed to initialise rid %u\n", i);
> + break;
> + }
> + }
> + cpus_read_unlock();
> +
> + if (err || (!exposed_alloc_capable && !exposed_mon_capable)) {
> + if (err)
> + pr_debug("Internal error %d - resctrl not supported\n",
> + err);
> + else
> + pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
> + exposed_alloc_capable, exposed_mon_capable);
> + err = -EOPNOTSUPP;
return -EOPNOTSUPP; here to make the code flow simpler.
Mind you, it would be nice to avoid eating err if it is set, and the sharing
here doesn't seem all that useful, so perhaps just make this:

	if (err) {
		pr_debug("Internal error %d - resctrl not supported\n", err);
		return err;
	}

	if (!exposed_alloc_capable && !exposed_mon_capable) {
		pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
			 exposed_alloc_capable, exposed_mon_capable);
		return -EOPNOTSUPP;
	}
> + }
> +
> + if (!err) {
> + if (!is_power_of_2(mpam_pmg_max + 1)) {
> + /*
> + * If not all the partid*pmg values are valid indexes,
> + * resctrl may allocate pmg that don't exist. This
> + * should cause an error interrupt.
> + */
> + pr_warn("Number of PMG is not a power of 2! resctrl may misbehave");
> + }
> +
> + /* TODO: call resctrl_init() */
> + }
> +
> + return err;
> +}
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-18 11:30 ` Jonathan Cameron
@ 2025-12-19 12:02 ` Ben Horgan
2025-12-22 11:48 ` Jonathan Cameron
2025-12-19 12:17 ` Ben Horgan
1 sibling, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-19 12:02 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
On 12/18/25 11:30, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:30 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> resctrl has its own data structures to describe its resources. We
>> can't use these directly as we play tricks with the 'MBA' resource,
>> picking the MPAM controls or monitors that best apply. We may export
>> the same component as both L3 and MBA.
>>
>> Add mpam_resctrl_exports[] as the array of class->resctrl mappings we
>> are exporting, and add the cpuhp hooks that allocate and free the
>> resctrl domain structures.
>>
>> While we're here, plumb in a few other obvious things.
>>
>> CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built
>> even though it can't yet be linked against resctrl.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> Hi,
>
> A few code flow related comments. Fairly trivial stuff but I think
> some parts of this can be made more readable / maintainable with
> minor reorganization.
>
> Jonathan
>
>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 2996ad93fc3e..efaf7633bc35 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
> ...
>
>> @@ -2516,6 +2522,12 @@ static void mpam_enable_once(void)
>> mutex_unlock(&mpam_list_lock);
>> cpus_read_unlock();
>>
>> + if (!err) {
>> + err = mpam_resctrl_setup();
>> + if (err)
>> + pr_err("Failed to initialise resctrl: %d\n", err);
>> + }
>> +
>> if (err) {
>> mpam_disable_reason = "Failed to enable.";
>> schedule_work(&mpam_broken_work);
>
> I'd be tempted to move this to an error handling block via a goto
> making this bit
> if (err)
> goto err_disable_mpam;
>
> err = mpam_resctrl_setup();
> if (err) {
> pr_err();
> goto err_disable_mpam;
> }
>
> Up to you though. Personally I like all my good paths as straight line
> code with the errors handled in if (err) as that consistency really helps
> readability.
>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> new file mode 100644
>> index 000000000000..320cebbd37ce
>> --- /dev/null
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -0,0 +1,329 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2025 Arm Ltd.
>> +
>> +#define pr_fmt(fmt) "%s:%s: " fmt, KBUILD_MODNAME, __func__
>> +
>> +#include <linux/arm_mpam.h>
>> +#include <linux/cacheinfo.h>
>> +#include <linux/cpu.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/errno.h>
>> +#include <linux/list.h>
>> +#include <linux/printk.h>
>> +#include <linux/rculist.h>
>> +#include <linux/resctrl.h>
>> +#include <linux/slab.h>
>> +#include <linux/types.h>
>> +
>> +#include <asm/mpam.h>
>> +
>> +#include "mpam_internal.h"
>
>
>> +static struct mpam_resctrl_dom *
>> +mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
>> +{
>> + int err;
>> + struct mpam_resctrl_dom *dom;
>> + struct rdt_mon_domain *mon_d;
>> + struct rdt_ctrl_domain *ctrl_d;
>> + struct mpam_class *class = res->class;
>> + struct mpam_component *comp_iter, *ctrl_comp;
>> + struct rdt_resource *r = &res->resctrl_res;
>> +
>> + lockdep_assert_held(&domain_list_lock);
>> +
>> + ctrl_comp = NULL;
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(comp_iter, &class->components, class_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + if (cpumask_test_cpu(cpu, &comp_iter->affinity)) {
>> + ctrl_comp = comp_iter;
>> + break;
>> + }
>> + }
>> +
>> + /* class has no component for this CPU */
>> + if (WARN_ON_ONCE(!ctrl_comp))
>> + return ERR_PTR(-EINVAL);
>> +
>> + dom = kzalloc_node(sizeof(*dom), GFP_KERNEL, cpu_to_node(cpu));
>> + if (!dom)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + if (exposed_alloc_capable) {
>> + dom->ctrl_comp = ctrl_comp;
>> +
>> + ctrl_d = &dom->resctrl_ctrl_dom;
>> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
>> + ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
>> + /* TODO: this list should be sorted */
>> + list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
>> + err = resctrl_online_ctrl_domain(r, ctrl_d);
>> + if (err) {
>> + dom = ERR_PTR(err);
>> + goto offline_ctrl_domain;
>> + }
>> + } else {
>> + pr_debug("Skipped control domain online - no controls\n");
>> + }
>> +
>> + if (exposed_mon_capable) {
>> + mon_d = &dom->resctrl_mon_dom;
>> + mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
>> + mon_d->hdr.type = RESCTRL_MON_DOMAIN;
>> + /* TODO: this list should be sorted */
>> + list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
>> + err = resctrl_online_mon_domain(r, mon_d);
>> + if (err) {
>> + dom = ERR_PTR(err);
>> + goto offline_mon_hdr;
>> + }
>> + } else {
>> + pr_debug("Skipped monitor domain online - no monitors\n");
>> + }
>> + goto out;
>
> To keep flow simple, return here. I thought maybe there was more stuff
> that was always done (added in later patches) but not seeing that.
> If there were then it would be a fairly strong indicator that a different
> code structure makes more sense - probably with some helper functions.
Makes sense.
>
>> +
>> +offline_mon_hdr:
>> + mpam_resctrl_offline_domain_hdr(cpu, &mon_d->hdr);
>> +offline_ctrl_domain:
>> + resctrl_offline_ctrl_domain(r, ctrl_d);
>> +out:
>> + return dom;
>> +}
>> +
>> +static struct mpam_resctrl_dom *
>> +mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
>> +{
>> + struct mpam_resctrl_dom *dom;
>> + struct rdt_ctrl_domain *ctrl_d;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
>> + hdr.list) {
>> + dom = container_of(ctrl_d, struct mpam_resctrl_dom,
>> + resctrl_ctrl_dom);
>
> I'm lazy so haven't checked for more code here in later patches, but
> if not, why not iterate the list to access the domain directly rather
> than jumping through the rdt_ctrl_domain?
>
> Something along lines of:
>
> list_for_each_entry_rcu(dom, &res->resctrl_res.ctrl_domains,
> resctrl_ctrl_dom.hdr.list) {
> }
>
Unless I've misunderstood I don't think this works because it's not what
the fs/resctrl code expects.
>> +
>> + if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
>> + return dom;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +int mpam_resctrl_online_cpu(unsigned int cpu)
>> +{
>> + int i;
>> + struct mpam_resctrl_dom *dom;
>> + struct mpam_resctrl_res *res;
>> +
>> + guard(mutex)(&domain_list_lock);
>> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
>
> I'd narrow the scope for dom and res to inside the loop.
> Maybe put the iterator in the for loop init (now considered
> acceptable in kernel code)
>
> Similar applies in various other places. Not that important
> for functions that more or less just consist of a loop though.
I've done a bit of scope reducing here and in some other places.
>
>> + res = &mpam_resctrl_controls[i];
>> + if (!res->class)
>> + continue; // dummy_resource;
>> +
>> + dom = mpam_resctrl_get_domain_from_cpu(cpu, res);
>> + if (!dom)
>> + dom = mpam_resctrl_alloc_domain(cpu, res);
>> + if (IS_ERR(dom))
>> + return PTR_ERR(dom);
>> + }
>> +
>> + resctrl_online_cpu(cpu);
>> +
>> + return 0;
>> +}
>
>> +int mpam_resctrl_setup(void)
>> +{
>> + int err = 0;
>> + enum resctrl_res_level i;
>> + struct mpam_resctrl_res *res;
>> +
>> + cpus_read_lock();
>> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
>> + res = &mpam_resctrl_controls[i];
>> + INIT_LIST_HEAD_RCU(&res->resctrl_res.ctrl_domains);
>> + INIT_LIST_HEAD_RCU(&res->resctrl_res.mon_domains);
>> + res->resctrl_res.rid = i;
>> + }
>> +
>> + /* TODO: pick MPAM classes to map to resctrl resources */
>> +
>> + /* Initialise the resctrl structures from the classes */
>> + for (i = 0; i < RDT_NUM_RESOURCES; i++) {
>> + res = &mpam_resctrl_controls[i];
>> + if (!res->class)
>> + continue; // dummy resource
>> +
>> + err = mpam_resctrl_control_init(res, i);
>> + if (err) {
>> + pr_debug("Failed to initialise rid %u\n", i);
>> + break;
>> + }
>> + }
>> + cpus_read_unlock();
>> +
>> + if (err || (!exposed_alloc_capable && !exposed_mon_capable)) {
>> + if (err)
>> + pr_debug("Internal error %d - resctrl not supported\n",
>> + err);
>> + else
>> + pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
>> + exposed_alloc_capable, exposed_mon_capable);
>> + err = -EOPNOTSUPP;
>
> return -EOPNOTSUPP; here to make the code flow simpler.
> Mind you nice to avoid eating err if it is set and the sharing here doesn't seem
> all that useful so perhaps just make this:
>
> if (err) {
> pr_debug("Internal error %d - resctrl not supported\n", err);
> return err;
> }
>
> if (!exposed_alloc_capable && !exposed_mon_capable) {
> pr_debug("No alloc(%u) or monitor(%u) found - resctrl not supported\n",
> exposed_alloc_capable, exposed_mon_capable);
> return -EOPNOTSUPP;
> }
I've gone for the second option.
>
>
>> + }
>> +
>> + if (!err) {
>> + if (!is_power_of_2(mpam_pmg_max + 1)) {
>> + /*
>> + * If not all the partid*pmg values are valid indexes,
>> + * resctrl may allocate pmg that don't exist. This
>> + * should cause an error interrupt.
>> + */
>> + pr_warn("Number of PMG is not a power of 2! resctrl may misbehave");
>> + }
>> +
>> + /* TODO: call resctrl_init() */
>> + }
>> +
>> + return err;
>> +}
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-19 12:02 ` Ben Horgan
@ 2025-12-22 11:48 ` Jonathan Cameron
2026-01-02 11:07 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-22 11:48 UTC (permalink / raw)
To: Ben Horgan
Cc: James Morse, linux-kernel, linux-arm-kernel, D Scott Phillips OS,
carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
fenghuay, baisheng.gao, Gavin Shan, rohit.mathew, reinette.chatre,
Punit Agrawal
> >> +static struct mpam_resctrl_dom *
> >> +mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
> >> +{
> >> + struct mpam_resctrl_dom *dom;
> >> + struct rdt_ctrl_domain *ctrl_d;
> >> +
> >> + lockdep_assert_cpus_held();
> >> +
> >> + list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
> >> + hdr.list) {
> >> + dom = container_of(ctrl_d, struct mpam_resctrl_dom,
> >> + resctrl_ctrl_dom);
> >
> > I'm lazy so haven't checked for more code here in later patches, but
> > if not, why not iterate the list to access the domain directly rather
> > than jumping through the rdt_ctrl_domain?
> >
> > Something along lines of:
> >
> > list_for_each_entry_rcu(dom, &res->resctrl_res.ctrl_domains,
> > resctrl_ctrl_dom.hdr.list) {
> > }
> >
>
> Unless I've misunderstood I don't think this works because it's not what
> the fs/resctrl code expects.
I think I explained this one badly.
This should be functionally identical to the above so no visible side
effects outside of this code. All this change is meant to do is wrap the
container_of() in the list iterator. When using the _entry_ variants
it is wrapping container_of() anyway so just going one level further
up the hierarchy of nested structures.
	struct a {
		struct b {
			struct list_head l;
		} b;
	};

It's actually a list of struct a, as all elements on this list are
struct b instances within struct a, but you are treating it as a list
of struct b and then using a container_of() to get to struct a on
each one.
The change is to treat it as a list of struct a with the list_head happening
to be wrapped in struct b. Results in slightly simpler code and makes
the point that these are always struct a instances.
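
A self-contained user-space sketch of the two styles, for illustration only.
The list_head, container_of() and list_for_each_entry() definitions below are
minimal re-implementations assumed for this example (they are not the kernel
headers, and they rely on the GNU __typeof__ extension). The id field and the
sum_via_*() helpers are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Minimal stand-ins for the kernel helpers of the same names. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

struct b { struct list_head l; };
struct a { int id; struct b b; };

/* Two steps: walk a list of struct b, then container_of() up to struct a. */
static int sum_via_b(struct list_head *head)
{
	struct b *b;
	int sum = 0;

	assert(head != NULL);
	list_for_each_entry(b, head, l) {
		struct a *a = container_of(b, struct a, b);

		sum += a->id;
	}
	return sum;
}

/* One step: walk a list of struct a, naming the nested member path. */
static int sum_via_a(struct list_head *head)
{
	struct a *a;
	int sum = 0;

	assert(head != NULL);
	list_for_each_entry(a, head, b.l)
		sum += a->id;
	return sum;
}
```

Both helpers visit the same objects in the same order. The second just lets
the iterator's own container_of() go straight to the outer structure.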
Jonathan
>
>
> >> +
> >> + if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
> >> + return dom;
> >> + }
> >> +
> >> + return NULL;
> >> +}
> >> +
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-22 11:48 ` Jonathan Cameron
@ 2026-01-02 11:07 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2026-01-02 11:07 UTC (permalink / raw)
To: Jonathan Cameron
Cc: James Morse, linux-kernel, linux-arm-kernel, D Scott Phillips OS,
carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
Jamie Iles, Xin Hao, peternewman, dfustini, amitsinght,
David Hildenbrand, Dave Martin, Koba Ko, Shanker Donthineni,
fenghuay, baisheng.gao, Gavin Shan, rohit.mathew, reinette.chatre,
Punit Agrawal
Hi Jonathan,
On 12/22/25 11:48, Jonathan Cameron wrote:
>
>>>> +static struct mpam_resctrl_dom *
>>>> +mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
>>>> +{
>>>> + struct mpam_resctrl_dom *dom;
>>>> + struct rdt_ctrl_domain *ctrl_d;
>>>> +
>>>> + lockdep_assert_cpus_held();
>>>> +
>>>> + list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
>>>> + hdr.list) {
>>>> + dom = container_of(ctrl_d, struct mpam_resctrl_dom,
>>>> + resctrl_ctrl_dom);
>>>
>>> I'm lazy so haven't checked for more code here in later patches, but
>>> if not, why not iterate the list to access the domain directly rather
>>> than jumping through the rdt_ctrl_domain?
>>>
>>> Something along lines of:
>>>
>>> list_for_each_entry_rcu(dom, &res->resctrl_res.ctrl_domains,
>>> resctrl_ctrl_dom.hdr.list) {
>>> }
>>>
>>
>> Unless I've misunderstood I don't think this works because it's not what
>> the fs/resctrl code expects.
>
> I think I explained this one badly.
>
> This should be functionally identical to the above so no visible side
> effects outside of this code. All this change is meant to do is wrap the
> container_of() in the list iterator. When using the _entry_ variants
> it is wrapping container_of() anyway so just going one level further
> up the hierarchy of nested structures.
>
> struct a {
> struct b {
> struct list_head l;
> }
> }
>
> It's actually a list of struct a as all elements on this list are
> struct b instances within struct a, but you are treating it as a list
> of struct b and then using a container_of() to get to struct a on
> each one.
>
> The change is treat it as a list of struct a with the list_head happening
> to be wrapped in struct b. Results in slightly simpler code and makes
> the point these are always struct a instances.
Thanks for the detailed explanation. This makes sense to me now and I'll
make the change.
>
> Jonathan
>
>
>
>>
>>
>>>> +
>>>> + if (cpumask_test_cpu(cpu, &dom->ctrl_comp->affinity))
>>>> + return dom;
>>>> + }
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation
2025-12-18 11:30 ` Jonathan Cameron
2025-12-19 12:02 ` Ben Horgan
@ 2025-12-19 12:17 ` Ben Horgan
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-19 12:17 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
On 12/18/25 11:30, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:30 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> resctrl has its own data structures to describe its resources. We
>> can't use these directly as we play tricks with the 'MBA' resource,
>> picking the MPAM controls or monitors that best apply. We may export
>> the same component as both L3 and MBA.
>>
>> Add mpam_resctrl_exports[] as the array of class->resctrl mappings we
>> are exporting, and add the cpuhp hooks that allocate and free the
>> resctrl domain structures.
>>
>> While we're here, plumb in a few other obvious things.
>>
>> CONFIG_ARM_CPU_RESCTRL is used to allow this code to be built
>> even though it can't yet be linked against resctrl.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
> Hi,
>
> A few code flow related comments. Fairly trivial stuff but I think
> some parts of this can be made more readable / maintainable with
> minor reorganization.
>
> Jonathan
>
>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 2996ad93fc3e..efaf7633bc35 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
> ...
>
>> @@ -2516,6 +2522,12 @@ static void mpam_enable_once(void)
>> mutex_unlock(&mpam_list_lock);
>> cpus_read_unlock();
>>
>> + if (!err) {
>> + err = mpam_resctrl_setup();
>> + if (err)
>> + pr_err("Failed to initialise resctrl: %d\n", err);
>> + }
>> +
>> if (err) {
>> mpam_disable_reason = "Failed to enable.";
>> schedule_work(&mpam_broken_work);
>
> I'd be tempted to move this to an error handling block via a goto
> making this bit
> if (err)
> goto err_disable_mpam;
>
> err = mpam_resctrl_setup();
> if (err) {
> pr_err();
> goto err_disable_mpam;
> }
>
> Up to you though. Personally I like all my good paths as straight line
> code with the errors handled in if (err) as that consistency really helps
> readability.
>
I'll leave this one as is. It looks like James tried hard to avoid a goto.
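For anyone comparing the two styles discussed above, here is a minimal
stand-alone sketch. It is not the driver code: fake_enable(),
fake_resctrl_setup() and the 'disabled' flag are hypothetical stand-ins,
and the real mpam_enable_once() returns void and schedules work instead
of returning an error. Both variants are meant to behave identically; only
the shape of the error path differs.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the enable and resctrl setup steps. */
static int fake_enable(int fail) { return fail ? -5 : 0; }
static int fake_resctrl_setup(int fail) { return fail ? -22 : 0; }

static int disabled;	/* set when the shared error path has run */

/* Shape used in the patch: one shared "if (err)" block, no goto. */
static int enable_once_flat(int fail_enable, int fail_setup)
{
	int err = fake_enable(fail_enable);

	if (!err) {
		err = fake_resctrl_setup(fail_setup);
		if (err)
			fprintf(stderr, "Failed to initialise resctrl: %d\n", err);
	}

	if (err) {
		disabled = 1;
		return err;
	}

	return 0;
}

/* Shape suggested in review: straight-line good path, errors via goto. */
static int enable_once_goto(int fail_enable, int fail_setup)
{
	int err = fake_enable(fail_enable);

	if (err)
		goto err_disable;

	err = fake_resctrl_setup(fail_setup);
	if (err) {
		fprintf(stderr, "Failed to initialise resctrl: %d\n", err);
		goto err_disable;
	}

	return 0;

err_disable:
	assert(err != 0);	/* the label is only reached on failure */
	disabled = 1;
	return err;
}
```

The goto form keeps the success path unindented at the cost of a label; the
flat form avoids the label but nests the second step.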
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (6 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 07/38] arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 15:57 ` Ben Horgan
2025-12-18 11:38 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 09/38] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls() James Morse
` (30 subsequent siblings)
38 siblings, 2 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Systems with MPAM support may have a variety of control types at any
point of their system layout. We can only expose certain types of
control, and only if they exist at particular locations.
Start with the well-known caches. These have to be depth 2 or 3
and support MPAM's cache portion bitmap controls, with a number
of portions fewer than resctrl's limit.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 91 +++++++++++++++++++++++++++++++++-
1 file changed, 89 insertions(+), 2 deletions(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 320cebbd37ce..ceaf11af4fc1 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -60,10 +60,96 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
return &mpam_resctrl_controls[l].resctrl_res;
}
+static bool cache_has_usable_cpor(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_cpor_part, cprops))
+ return false;
+
+ /* resctrl uses u32 for all bitmap configurations */
+ return (class->props.cpbm_wd <= 32);
+}
+
+/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
+static void mpam_resctrl_pick_caches(void)
+{
+ struct mpam_class *class;
+ struct mpam_resctrl_res *res;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (class->type != MPAM_CLASS_CACHE) {
+ pr_debug("class %u is not a cache\n", class->level);
+ continue;
+ }
+
+ if (class->level != 2 && class->level != 3) {
+ pr_debug("class %u is not L2 or L3\n", class->level);
+ continue;
+ }
+
+ if (!cache_has_usable_cpor(class)) {
+ pr_debug("class %u cache misses CPOR\n", class->level);
+ continue;
+ }
+
+ if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
+ pr_debug("class %u has missing CPUs\n", class->level);
+ pr_debug("class %u mask %*pb != %*pb\n", class->level,
+ cpumask_pr_args(&class->affinity),
+ cpumask_pr_args(cpu_possible_mask));
+ continue;
+ }
+
+ if (class->level == 2)
+ res = &mpam_resctrl_controls[RDT_RESOURCE_L2];
+ else
+ res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ res->class = class;
+ exposed_alloc_capable = true;
+ }
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
enum resctrl_res_level type)
{
- /* TODO: initialise the resctrl resources */
+ struct mpam_class *class = res->class;
+ struct rdt_resource *r = &res->resctrl_res;
+
+ switch (res->resctrl_res.rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ r->alloc_capable = true;
+ r->schema_fmt = RESCTRL_SCHEMA_BITMAP;
+ r->cache.arch_has_sparse_bitmasks = true;
+
+ r->cache.cbm_len = class->props.cpbm_wd;
+ /* mpam_devices will reject empty bitmaps */
+ r->cache.min_cbm_bits = 1;
+
+ if (r->rid == RDT_RESOURCE_L2) {
+ r->name = "L2";
+ r->ctrl_scope = RESCTRL_L2_CACHE;
+ } else {
+ r->name = "L3";
+ r->ctrl_scope = RESCTRL_L3_CACHE;
+ }
+
+ /*
+ * Which bits are shared with other ...things...
+ * Unknown devices use partid-0 which uses all the bitmap
+ * fields. Until we configured the SMMU and GIC not to do this
+ * 'all the bits' is the correct answer here.
+ */
+ r->cache.shareable_bits = resctrl_get_default_ctrl(r);
+ break;
+ default:
+ break;
+ }
return 0;
}
@@ -286,7 +372,8 @@ int mpam_resctrl_setup(void)
res->resctrl_res.rid = i;
}
- /* TODO: pick MPAM classes to map to resctrl resources */
+ /* Find some classes to use for controls */
+ mpam_resctrl_pick_caches();
/* Initialise the resctrl structures from the classes */
for (i = 0; i < RDT_NUM_RESOURCES; i++) {
--
2.39.5
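The selection rules in mpam_resctrl_pick_caches() can be restated as a
single predicate. The sketch below is a user-space illustration only: the
sketch_* types are hypothetical simplifications (a 64-bit word stands in
for a cpumask, a bool for mpam_has_feature()), not the driver's
structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum sketch_class_type { SKETCH_CLASS_CACHE, SKETCH_CLASS_MEMORY };

/* Hypothetical, flattened view of the fields the patch inspects. */
struct sketch_class {
	enum sketch_class_type type;
	unsigned int level;	/* cache depth, e.g. 2 or 3 */
	bool has_cpor;		/* cache portion bitmap control present */
	unsigned int cpbm_wd;	/* number of portions in the bitmap */
	uint64_t affinity;	/* one bit per CPU the class can reach */
};

/* True if the class passes every check the patch applies, in order. */
static bool sketch_cache_is_exportable(const struct sketch_class *c,
				       uint64_t possible_cpus)
{
	assert(c != 0);

	if (c->type != SKETCH_CLASS_CACHE)
		return false;
	if (c->level != 2 && c->level != 3)
		return false;
	/* resctrl stores bitmap configurations in a u32 */
	if (!c->has_cpor || c->cpbm_wd > 32)
		return false;
	/* the class must cover every possible CPU */
	return c->affinity == possible_cpus;
}
```

A class that fails any one check is skipped, so an L3 whose affinity misses
a single possible CPU is not exported even if it has a usable bitmap.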
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources
2025-12-05 21:58 ` [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources James Morse
@ 2025-12-09 15:57 ` Ben Horgan
2025-12-16 10:14 ` Ben Horgan
2025-12-18 11:38 ` Jonathan Cameron
1 sibling, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 15:57 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> Systems with MPAM support may have a variety of control types at any
> point of their system layout. We can only expose certain types of
> control, and only if they exist at particular locations.
>
> Start with the well-known caches. These have to be depth 2 or 3
> and support MPAM's cache portion bitmap controls, with a number
> of portions fewer than resctrl's limit.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_resctrl.c | 91 +++++++++++++++++++++++++++++++++-
> 1 file changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 320cebbd37ce..ceaf11af4fc1 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -60,10 +60,96 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
> return &mpam_resctrl_controls[l].resctrl_res;
> }
>
> +static bool cache_has_usable_cpor(struct mpam_class *class)
> +{
> + struct mpam_props *cprops = &class->props;
> +
> + if (!mpam_has_feature(mpam_feat_cpor_part, cprops))
> + return false;
> +
> + /* resctrl uses u32 for all bitmap configurations */
> + return (class->props.cpbm_wd <= 32);
> +}
cpbm_wd > 32 is not supported properly in mpam_devices.c (e.g. the reset
and config values are limited to 32 bits), so we should consider just not
adding the feature to the class in that case.
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources
2025-12-09 15:57 ` Ben Horgan
@ 2025-12-16 10:14 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-16 10:14 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
On 12/9/25 15:57, Ben Horgan wrote:
> Hi James,
>
> On 12/5/25 21:58, James Morse wrote:
>> Systems with MPAM support may have a variety of control types at any
>> point of their system layout. We can only expose certain types of
>> control, and only if they exist at particular locations.
>>
>> Start with the well-known caches. These have to be depth 2 or 3
>> and support MPAM's cache portion bitmap controls, with a number
>> of portions fewer than resctrl's limit.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> drivers/resctrl/mpam_resctrl.c | 91 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 89 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index 320cebbd37ce..ceaf11af4fc1 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -60,10 +60,96 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
>> return &mpam_resctrl_controls[l].resctrl_res;
>> }
>>
>> +static bool cache_has_usable_cpor(struct mpam_class *class)
>> +{
>> + struct mpam_props *cprops = &class->props;
>> +
>> + if (!mpam_has_feature(mpam_feat_cpor_part, cprops))
>> + return false;
>> +
>> + /* resctrl uses u32 for all bitmap configurations */
>> + return (class->props.cpbm_wd <= 32);
>> +}
>
> cpbm_wd > 32 is not supported properly in mpam_devices.c (e.g. the reset
> and config values are limited to 32 bits), so we should consider just not
> adding the feature to the class in that case.
I'll keep this as is because the value still needs to be reset correctly
by mpam_devices.c even if it is not usable by resctrl.
>
> Thanks,
>
> Ben
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources
2025-12-05 21:58 ` [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources James Morse
2025-12-09 15:57 ` Ben Horgan
@ 2025-12-18 11:38 ` Jonathan Cameron
2025-12-19 12:04 ` Ben Horgan
1 sibling, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 11:38 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:31 +0000
James Morse <james.morse@arm.com> wrote:
> Systems with MPAM support may have a variety of control types at any
> point of their system layout. We can only expose certain types of
> control, and only if they exist at particular locations.
>
> Start with the well-known caches. These have to be depth 2 or 3
> and support MPAM's cache portion bitmap controls, with a number
> of portions fewer than resctrl's limit.
Another one that is a bit random on wrap point. Probably worth tidying
up formatting of all the commit messages so fussy people like me stop
moaning ;)
Otherwise trivial stuff inline.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_resctrl.c | 91 +++++++++++++++++++++++++++++++++-
> 1 file changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 320cebbd37ce..ceaf11af4fc1 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -60,10 +60,96 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
> return &mpam_resctrl_controls[l].resctrl_res;
> }
>
> +static bool cache_has_usable_cpor(struct mpam_class *class)
> +{
> + struct mpam_props *cprops = &class->props;
> +
> + if (!mpam_has_feature(mpam_feat_cpor_part, cprops))
> + return false;
> +
> + /* resctrl uses u32 for all bitmap configurations */
> + return (class->props.cpbm_wd <= 32);
For me those brackets aren't adding anything. It's not like
anyone forgets precedence wrt return vs operators.
> +}
> +
> +/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
> +static void mpam_resctrl_pick_caches(void)
> +{
> + struct mpam_class *class;
> + struct mpam_resctrl_res *res;
> +
> + lockdep_assert_cpus_held();
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (class->type != MPAM_CLASS_CACHE) {
> + pr_debug("class %u is not a cache\n", class->level);
Lots of things aren't caches and seems unlikely that is a case that will
make people wonder why pick caches didn't happen. So maybe this debug
print is excessive?
> + continue;
> + }
> +
> + if (class->level != 2 && class->level != 3) {
> + pr_debug("class %u is not L2 or L3\n", class->level);
> + continue;
> + }
> +
> + if (!cache_has_usable_cpor(class)) {
> + pr_debug("class %u cache misses CPOR\n", class->level);
> + continue;
> + }
> +
> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
> + pr_debug("class %u has missing CPUs\n", class->level);
> + pr_debug("class %u mask %*pb != %*pb\n", class->level,
> + cpumask_pr_args(&class->affinity),
> + cpumask_pr_args(cpu_possible_mask));
Unless this is getting more complex in later patches, doesn't seem like
pr_debug("class %u has missing CPUs, mask %*pb != %*pb\n",
is too much to put on one line.
> + continue;
> + }
> +
> + if (class->level == 2)
> + res = &mpam_resctrl_controls[RDT_RESOURCE_L2];
> + else
> + res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
> + res->class = class;
> + exposed_alloc_capable = true;
> + }
> +}
> +
> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
> enum resctrl_res_level type)
> {
> - /* TODO: initialise the resctrl resources */
> + struct mpam_class *class = res->class;
> + struct rdt_resource *r = &res->resctrl_res;
> +
> + switch (res->resctrl_res.rid) {
switch (r->rid) { ?
> + case RDT_RESOURCE_L2:
> + case RDT_RESOURCE_L3:
> + r->alloc_capable = true;
> + r->schema_fmt = RESCTRL_SCHEMA_BITMAP;
> + r->cache.arch_has_sparse_bitmasks = true;
> +
> + r->cache.cbm_len = class->props.cpbm_wd;
> + /* mpam_devices will reject empty bitmaps */
> + r->cache.min_cbm_bits = 1;
> +
> + if (r->rid == RDT_RESOURCE_L2) {
> + r->name = "L2";
> + r->ctrl_scope = RESCTRL_L2_CACHE;
> + } else {
> + r->name = "L3";
> + r->ctrl_scope = RESCTRL_L3_CACHE;
> + }
> +
> + /*
> + * Which bits are shared with other ...things...
> + * Unknown devices use partid-0 which uses all the bitmap
> + * fields. Until we configured the SMMU and GIC not to do this
> + * 'all the bits' is the correct answer here.
> + */
> + r->cache.shareable_bits = resctrl_get_default_ctrl(r);
> + break;
> + default:
> + break;
> + }
>
> return 0;
> }
> @@ -286,7 +372,8 @@ int mpam_resctrl_setup(void)
> res->resctrl_res.rid = i;
> }
>
> - /* TODO: pick MPAM classes to map to resctrl resources */
> + /* Find some classes to use for controls */
> + mpam_resctrl_pick_caches();
>
> /* Initialise the resctrl structures from the classes */
> for (i = 0; i < RDT_NUM_RESOURCES; i++) {
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources
2025-12-18 11:38 ` Jonathan Cameron
@ 2025-12-19 12:04 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-19 12:04 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
On 12/18/25 11:38, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:31 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> Systems with MPAM support may have a variety of control types at any
>> point of their system layout. We can only expose certain types of
>> control, and only if they exist at particular locations.
>>
>> Start with the well-known caches. These have to be depth 2 or 3
>> and support MPAM's cache portion bitmap controls, with a number
>> of portions fewer than resctrl's limit.
>
> Another one that is a bit random on wrap point. Probably worth tidying
> up formatting of all the commit messages so fussy people like me stop
> moaning ;)
Will do.
>
> Otherwise trivial stuff inline.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> drivers/resctrl/mpam_resctrl.c | 91 +++++++++++++++++++++++++++++++++-
>> 1 file changed, 89 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index 320cebbd37ce..ceaf11af4fc1 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -60,10 +60,96 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
>> return &mpam_resctrl_controls[l].resctrl_res;
>> }
>>
>> +static bool cache_has_usable_cpor(struct mpam_class *class)
>> +{
>> + struct mpam_props *cprops = &class->props;
>> +
>> + if (!mpam_has_feature(mpam_feat_cpor_part, cprops))
>> + return false;
>> +
>> + /* resctrl uses u32 for all bitmap configurations */
>> + return (class->props.cpbm_wd <= 32);
>
> For me those brackets aren't adding anything. It's not like
> anyone forgets precedence wrt return vs operators.
>
>> +}
>> +
>> +/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
>> +static void mpam_resctrl_pick_caches(void)
>> +{
>> + struct mpam_class *class;
>> + struct mpam_resctrl_res *res;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + if (class->type != MPAM_CLASS_CACHE) {
>> + pr_debug("class %u is not a cache\n", class->level);
>
> Lots of things aren't caches and seems unlikely that is a case that will
> make people wonder why pick caches didn't happen. So maybe this debug
> print is excessive?
Only memory or cache so far. I'll leave it here for now.
>
>> + continue;
>> + }
>> +
>> + if (class->level != 2 && class->level != 3) {
>> + pr_debug("class %u is not L2 or L3\n", class->level);
>> + continue;
>> + }
>> +
>> + if (!cache_has_usable_cpor(class)) {
>> + pr_debug("class %u cache misses CPOR\n", class->level);
>> + continue;
>> + }
>> +
>> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
>> + pr_debug("class %u has missing CPUs\n", class->level);
>> + pr_debug("class %u mask %*pb != %*pb\n", class->level,
>> + cpumask_pr_args(&class->affinity),
>> + cpumask_pr_args(cpu_possible_mask));
>
> Unless this is getting more complex in later patches, doesn't seem like
> pr_debug("class %u has missing CPUs, mask %*pb != %*pb\n",
> is too much to put on one line.
Done.
>
>
>> + continue;
>> + }
>> +
>> + if (class->level == 2)
>> + res = &mpam_resctrl_controls[RDT_RESOURCE_L2];
>> + else
>> + res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
>> + res->class = class;
>> + exposed_alloc_capable = true;
>> + }
>> +}
>> +
>> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
>> enum resctrl_res_level type)
>> {
>> - /* TODO: initialise the resctrl resources */
>> + struct mpam_class *class = res->class;
>> + struct rdt_resource *r = &res->resctrl_res;
>> +
>> + switch (res->resctrl_res.rid) {
>
> switch (r->rid) { ?
>
>> + case RDT_RESOURCE_L2:
>> + case RDT_RESOURCE_L3:
>> + r->alloc_capable = true;
>> + r->schema_fmt = RESCTRL_SCHEMA_BITMAP;
>> + r->cache.arch_has_sparse_bitmasks = true;
>> +
>> + r->cache.cbm_len = class->props.cpbm_wd;
>> + /* mpam_devices will reject empty bitmaps */
>> + r->cache.min_cbm_bits = 1;
>> +
>> + if (r->rid == RDT_RESOURCE_L2) {
>> + r->name = "L2";
>> + r->ctrl_scope = RESCTRL_L2_CACHE;
>> + } else {
>> + r->name = "L3";
>> + r->ctrl_scope = RESCTRL_L3_CACHE;
>> + }
>> +
>> + /*
>> + * Which bits are shared with other ...things...
>> + * Unknown devices use partid-0 which uses all the bitmap
>> + * fields. Until we configured the SMMU and GIC not to do this
>> + * 'all the bits' is the correct answer here.
>> + */
>> + r->cache.shareable_bits = resctrl_get_default_ctrl(r);
>> + break;
>> + default:
>> + break;
>> + }
>>
>> return 0;
>> }
>> @@ -286,7 +372,8 @@ int mpam_resctrl_setup(void)
>> res->resctrl_res.rid = i;
>> }
>>
>> - /* TODO: pick MPAM classes to map to resctrl resources */
>> + /* Find some classes to use for controls */
>> + mpam_resctrl_pick_caches();
>>
>> /* Initialise the resctrl structures from the classes */
>> for (i = 0; i < RDT_NUM_RESOURCES; i++) {
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 09/38] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls()
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (7 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 08/38] arm_mpam: resctrl: Pick the caches we will use as resctrl resources James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 10/38] arm_mpam: resctrl: Add resctrl_arch_get_config() James Morse
` (29 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
We already have a helper for resetting an mpam class and component.
Hook it up to resctrl_arch_reset_all_ctrls() and the domain offline
path.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 6 +++---
drivers/resctrl/mpam_internal.h | 7 +++++++
drivers/resctrl/mpam_resctrl.c | 15 +++++++++++++++
3 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index efaf7633bc35..fccebfd980d8 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -2543,7 +2543,7 @@ static void mpam_enable_once(void)
mpam_partid_max + 1, mpam_pmg_max + 1);
}
-static void mpam_reset_component_locked(struct mpam_component *comp)
+void mpam_reset_component_locked(struct mpam_component *comp)
{
struct mpam_vmsc *vmsc;
@@ -2567,7 +2567,7 @@ static void mpam_reset_component_locked(struct mpam_component *comp)
}
}
-static void mpam_reset_class_locked(struct mpam_class *class)
+void mpam_reset_class_locked(struct mpam_class *class)
{
struct mpam_component *comp;
@@ -2579,7 +2579,7 @@ static void mpam_reset_class_locked(struct mpam_class *class)
mpam_reset_component_locked(comp);
}
-static void mpam_reset_class(struct mpam_class *class)
+void mpam_reset_class(struct mpam_class *class)
{
cpus_read_lock();
mpam_reset_class_locked(class);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index dfd3512ac924..8684bd35d4ab 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -392,6 +392,13 @@ extern u8 mpam_pmg_max;
void mpam_enable(struct work_struct *work);
void mpam_disable(struct work_struct *work);
+/* Reset all the RIS in a class, optionally while holding cpus_read_lock() */
+void mpam_reset_class_locked(struct mpam_class *class);
+void mpam_reset_class(struct mpam_class *class);
+
+/* Reset all the RIS in a component under cpus_read_lock() */
+void mpam_reset_component_locked(struct mpam_component *comp);
+
int mpam_apply_config(struct mpam_component *comp, u16 partid,
struct mpam_config *cfg);
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index ceaf11af4fc1..a2deea1f4818 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -168,6 +168,19 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
+{
+ struct mpam_resctrl_res *res;
+
+ lockdep_assert_cpus_held();
+
+ if (!mpam_is_enabled())
+ return;
+
+ res = container_of(r, struct mpam_resctrl_res, resctrl_res);
+ mpam_reset_class_locked(res->class);
+}
+
static void mpam_resctrl_domain_hdr_init(int cpu, struct mpam_component *comp,
struct rdt_domain_hdr *hdr)
{
@@ -339,6 +352,8 @@ void mpam_resctrl_offline_cpu(unsigned int cpu)
ctrl_dom_empty = true;
if (exposed_alloc_capable) {
+ mpam_reset_component_locked(dom->ctrl_comp);
+
ctrl_d = &dom->resctrl_ctrl_dom;
ctrl_dom_empty = mpam_resctrl_offline_domain_hdr(cpu, &ctrl_d->hdr);
if (ctrl_dom_empty)
--
2.39.5
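The split between mpam_reset_class() and mpam_reset_class_locked() follows
the usual "_locked" convention: the plain helper takes the lock itself,
the _locked one expects the caller to hold it. The sketch below models
that with a boolean in place of cpus_read_lock(); every sketch_* name is
hypothetical, and the flag is deliberately non-recursive, which is a
simplification of the real lock.

```c
#include <assert.h>
#include <stdbool.h>

static bool sketch_lock_held;	/* stands in for cpus_read_lock() state */
static int sketch_resets;	/* counts how often a reset was performed */

static void sketch_lock(void)
{
	assert(!sketch_lock_held);	/* taking it twice would be a bug here */
	sketch_lock_held = true;
}

static void sketch_unlock(void)
{
	assert(sketch_lock_held);
	sketch_lock_held = false;
}

/* Caller must already hold the lock. */
static void sketch_reset_class_locked(void)
{
	assert(sketch_lock_held);	/* plays the role of a lockdep assertion */
	sketch_resets++;
}

/* Convenience wrapper for callers that do not hold the lock. */
static void sketch_reset_class(void)
{
	sketch_lock();
	sketch_reset_class_locked();
	sketch_unlock();
}

/* A caller entered with the lock held must use the _locked variant. */
static void sketch_arch_reset_all_ctrls(void)
{
	assert(sketch_lock_held);
	sketch_reset_class_locked();
}
```

In this model, calling sketch_reset_class() from sketch_arch_reset_all_ctrls()
would trip the assertion in sketch_lock(), which is the mistake the naming
convention is meant to prevent.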
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 10/38] arm_mpam: resctrl: Add resctrl_arch_get_config()
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (8 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 09/38] arm_mpam: resctrl: Implement resctrl_arch_reset_all_ctrls() James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 11/38] arm_mpam: resctrl: Implement helpers to update configuration James Morse
` (28 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Implement resctrl_arch_get_config() by testing the live configuration for
a CPOR bitmap. For any other configuration type return the default.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 43 ++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index a2deea1f4818..a26eb1f3efd0 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -168,6 +168,49 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type type)
+{
+ u32 partid;
+ struct mpam_config *cfg;
+ struct mpam_props *cprops;
+ struct mpam_resctrl_res *res;
+ struct mpam_resctrl_dom *dom;
+ enum mpam_device_features configured_by;
+
+ lockdep_assert_cpus_held();
+
+ if (!mpam_is_enabled())
+ return resctrl_get_default_ctrl(r);
+
+ res = container_of(r, struct mpam_resctrl_res, resctrl_res);
+ dom = container_of(d, struct mpam_resctrl_dom, resctrl_ctrl_dom);
+ cprops = &res->class->props;
+
+ partid = resctrl_get_config_index(closid, type);
+ cfg = &dom->ctrl_comp->cfg[partid];
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ configured_by = mpam_feat_cpor_part;
+ break;
+ default:
+ return resctrl_get_default_ctrl(r);
+ }
+
+ if (!r->alloc_capable || partid >= resctrl_arch_get_num_closid(r) ||
+ !mpam_has_feature(configured_by, cfg))
+ return resctrl_get_default_ctrl(r);
+
+ switch (configured_by) {
+ case mpam_feat_cpor_part:
+ return cfg->cpbm;
+ default:
+ return resctrl_get_default_ctrl(r);
+ }
+}
+
void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
{
struct mpam_resctrl_res *res;
--
2.39.5
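The lookup in resctrl_arch_get_config() boils down to "return the stored
bitmap if one was configured, otherwise the default". A user-space sketch
of that rule follows; struct sketch_config and sketch_get_config() are
hypothetical, with a bool standing in for the per-config feature bit and
the default passed in by the caller rather than derived from the resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-PARTID configuration record. */
struct sketch_config {
	bool has_cpbm;		/* was a cache portion bitmap ever written? */
	uint32_t cpbm;		/* the bitmap, valid only if has_cpbm */
};

/*
 * Return the configured bitmap for 'partid', or 'default_ctrl' when the
 * index is out of range or nothing has been configured yet.
 */
static uint32_t sketch_get_config(const struct sketch_config *cfgs,
				  size_t num_partid, size_t partid,
				  uint32_t default_ctrl)
{
	assert(cfgs != NULL);

	if (partid >= num_partid || !cfgs[partid].has_cpbm)
		return default_ctrl;

	return cfgs[partid].cpbm;
}
```

Note that a stale cpbm value is ignored unless its flag is set, mirroring
the mpam_has_feature() check on the config in the patch.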
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 11/38] arm_mpam: resctrl: Implement helpers to update configuration
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (9 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 10/38] arm_mpam: resctrl: Add resctrl_arch_get_config() James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 11:47 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 12/38] arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks James Morse
` (27 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl has two helpers for updating the configuration.
resctrl_arch_update_one() updates a single value, and is used by the
software-controller to apply feedback to the bandwidth controls,
it has to be called on one of the CPUs in the resctrl:domain.
resctrl_arch_update_domains() copies multiple staged configurations,
it can be called from anywhere.
Both helpers should update any changes to the underlying hardware.
Imlpement resctrl_arch_update_domains() to use
resctrl_arch_update_one(). Neither need to be called on a specific
CPU as the mpam driver will send IPIs as needed.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 71 ++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index a26eb1f3efd0..ae0d17857b78 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -211,6 +211,77 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
}
}
+int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
+ u32 closid, enum resctrl_conf_type t, u32 cfg_val)
+{
+ u32 partid;
+ struct mpam_config cfg;
+ struct mpam_props *cprops;
+ struct mpam_resctrl_res *res;
+ struct mpam_resctrl_dom *dom;
+
+ lockdep_assert_cpus_held();
+ lockdep_assert_irqs_enabled();
+
+ /*
+ * No need to check the CPU as mpam_apply_config() doesn't care, and
+ * resctrl_arch_update_domains() relies on this.
+ */
+ res = container_of(r, struct mpam_resctrl_res, resctrl_res);
+ dom = container_of(d, struct mpam_resctrl_dom, resctrl_ctrl_dom);
+ cprops = &res->class->props;
+
+ partid = resctrl_get_config_index(closid, t);
+ if (!r->alloc_capable || partid >= resctrl_arch_get_num_closid(r)) {
+ pr_debug("Not alloc capable or computed PARTID out of range\n");
+ return -EINVAL;
+ }
+
+ /*
+ * Copy the current config to avoid clearing other resources when the
+ * same component is exposed multiple times through resctrl.
+ */
+ cfg = dom->ctrl_comp->cfg[partid];
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ cfg.cpbm = cfg_val;
+ mpam_set_feature(mpam_feat_cpor_part, &cfg);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+}
+
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
+{
+ int err = 0;
+ enum resctrl_conf_type t;
+ struct rdt_ctrl_domain *d;
+ struct resctrl_staged_config *cfg;
+
+ lockdep_assert_cpus_held();
+ lockdep_assert_irqs_enabled();
+
+ list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
+ for (t = 0; t < CDP_NUM_TYPES; t++) {
+ cfg = &d->staged_config[t];
+ if (!cfg->have_new_ctrl)
+ continue;
+
+ err = resctrl_arch_update_one(r, d, closid, t,
+ cfg->new_ctrl);
+ if (err)
+ return err;
+ }
+ }
+
+ return err;
+}
+
void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
{
struct mpam_resctrl_res *res;
--
2.39.5
* Re: [RFC PATCH 11/38] arm_mpam: resctrl: Implement helpers to update configuration
2025-12-05 21:58 ` [RFC PATCH 11/38] arm_mpam: resctrl: Implement helpers to update configuration James Morse
@ 2025-12-18 11:47 ` Jonathan Cameron
0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 11:47 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:34 +0000
James Morse <james.morse@arm.com> wrote:
> resctrl has two helpers for updating the configuration.
> resctrl_arch_update_one() updates a single value, and is used by the
> software-controller to apply feedback to the bandwidth controls,
> it has to be called on one of the CPUs in the resctrl:domain.
>
> resctrl_arch_update_domains() copies multiple staged configurations,
> it can be called from anywhere.
>
> Both helpers should update any changes to the underlying hardware.
>
> Imlpement resctrl_arch_update_domains() to use
Implement
> resctrl_arch_update_one(). Neither need to be called on a specific
> CPU as the mpam driver will send IPIs as needed.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_resctrl.c | 71 ++++++++++++++++++++++++++++++++++
> 1 file changed, 71 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index a26eb1f3efd0..ae0d17857b78 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> +int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
> +{
> + int err = 0;
> + enum resctrl_conf_type t;
> + struct rdt_ctrl_domain *d;
> + struct resctrl_staged_config *cfg;
> +
> + lockdep_assert_cpus_held();
> + lockdep_assert_irqs_enabled();
> +
> + list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
> + for (t = 0; t < CDP_NUM_TYPES; t++) {
> + cfg = &d->staged_config[t];
> + if (!cfg->have_new_ctrl)
> + continue;
> +
> + err = resctrl_arch_update_one(r, d, closid, t,
> + cfg->new_ctrl);
> + if (err)
> + return err;
> + }
> + }
> +
> + return err;
If it stays this simple in later patches
return 0;
to make it clear this is the good path. Also can then avoid initializing
err at the top.
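A minimal userspace sketch of that shape, assuming nothing else ends up
needing err after the loop. The kernel types are replaced by hypothetical
stand-ins: struct staged_config, update_fn, update_staged() and the two
example callbacks are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct resctrl_staged_config. */
struct staged_config {
	bool have_new_ctrl;
	unsigned int new_ctrl;
};

/* Hypothetical callback standing in for resctrl_arch_update_one(). */
typedef int (*update_fn)(size_t type, unsigned int new_ctrl);

static int update_staged(const struct staged_config *staged, size_t n,
			 update_fn update)
{
	int err;	/* no initialiser: only read after it is assigned */

	assert(update != NULL);

	for (size_t t = 0; t < n; t++) {
		if (!staged[t].have_new_ctrl)
			continue;

		err = update(t, staged[t].new_ctrl);
		if (err)
			return err;
	}

	return 0;	/* the good path is explicit */
}

/* Example callbacks, only here to exercise the sketch. */
static int accept_all(size_t type, unsigned int new_ctrl)
{
	(void)type;
	(void)new_ctrl;
	return 0;
}

static int reject_second(size_t type, unsigned int new_ctrl)
{
	(void)new_ctrl;
	return type == 1 ? -EINVAL : 0;
}
```

With this layout the compiler can also warn if a later change reads err
before assigning it, which the "= 0" initialiser would hide.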
> +}
> +
> void resctrl_arch_reset_all_ctrls(struct rdt_resource *r)
> {
> struct mpam_resctrl_res *res;
* [RFC PATCH 12/38] arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (10 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 11/38] arm_mpam: resctrl: Implement helpers to update configuration James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation James Morse
` (26 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
arm64 provides helpers for changing a task's and a CPU's MPAM partid/pmg
values.
These are used to back a number of resctrl_arch_ functions. Connect them
up.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 58 ++++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 5 +++
2 files changed, 63 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index ae0d17857b78..c2650abb99ec 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -8,6 +8,7 @@
#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/errno.h>
+#include <linux/limits.h>
#include <linux/list.h>
#include <linux/printk.h>
#include <linux/rculist.h>
@@ -32,6 +33,8 @@ static DEFINE_MUTEX(domain_list_lock);
static bool exposed_alloc_capable;
static bool exposed_mon_capable;
+static bool cdp_enabled;
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -52,6 +55,61 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
return mpam_partid_max + 1;
}
+void resctrl_arch_sched_in(struct task_struct *tsk)
+{
+ lockdep_assert_preemption_disabled();
+
+ mpam_thread_switch(tsk);
+}
+
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid)
+{
+ WARN_ON_ONCE(closid > U16_MAX);
+ WARN_ON_ONCE(rmid > U8_MAX);
+
+ if (!cdp_enabled) {
+ mpam_set_cpu_defaults(cpu, closid, closid, rmid, rmid);
+ } else {
+ /*
+ * When CDP is enabled, resctrl halves the closid range and we
+ * use odd/even partid for one closid.
+ */
+ u32 partid_d = resctrl_get_config_index(closid, CDP_DATA);
+ u32 partid_i = resctrl_get_config_index(closid, CDP_CODE);
+
+ mpam_set_cpu_defaults(cpu, partid_d, partid_i, rmid, rmid);
+ }
+}
+
+void resctrl_arch_sync_cpu_closid_rmid(void *info)
+{
+ struct resctrl_cpu_defaults *r = info;
+
+ lockdep_assert_preemption_disabled();
+
+ if (r) {
+ resctrl_arch_set_cpu_default_closid_rmid(smp_processor_id(),
+ r->closid, r->rmid);
+ }
+
+ resctrl_arch_sched_in(current);
+}
+
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ WARN_ON_ONCE(closid > U16_MAX);
+ WARN_ON_ONCE(rmid > U8_MAX);
+
+ if (!cdp_enabled) {
+ mpam_set_task_partid_pmg(tsk, closid, closid, rmid, rmid);
+ } else {
+ u32 partid_d = resctrl_get_config_index(closid, CDP_DATA);
+ u32 partid_i = resctrl_get_config_index(closid, CDP_CODE);
+
+ mpam_set_task_partid_pmg(tsk, partid_d, partid_i, rmid, rmid);
+ }
+}
+
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
{
if (l >= RDT_NUM_RESOURCES)
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 2c7d1413a401..5a78299ec464 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -52,6 +52,11 @@ static inline int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
bool resctrl_arch_alloc_capable(void);
bool resctrl_arch_mon_capable(void);
+void resctrl_arch_set_cpu_default_closid(int cpu, u32 closid);
+void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
+void resctrl_arch_sched_in(struct task_struct *tsk);
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.39.5
* [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (11 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 12/38] arm_mpam: resctrl: Add plumbing against arm64 task and cpu hooks James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-16 13:49 ` Ben Horgan
` (2 more replies)
2025-12-05 21:58 ` [RFC PATCH 14/38] arm_mpam: resctrl: Add rmid index helpers James Morse
` (25 subsequent siblings)
38 siblings, 3 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal, Dave Martin
Intel RDT's CDP feature allows the cache to use a different control value
depending on whether the access was for instruction fetch or a data
access. MPAM's equivalent feature is the other way up: the CPU assigns a
different partid label to traffic depending on whether it was instruction
fetch or a data access, which causes the cache to use a different control
value based solely on the partid.
MPAM can emulate CDP, with the side effect that the alternative partid is
seen by all MSC, it can't be enabled per-MSC.
Add the resctrl hooks to turn this on or off. Add the helpers that
match a closid against a task, which need to be aware that the value
written to hardware is not the same as the one resctrl is using.
Update the 'arm64_mpam_global_default' variable the arch code uses
during context switch to know when the per-cpu value should be used
instead.
Awkwardly, the MB controls don't implement CDP. To emulate this, the
MPAM equivalent needs programming twice by the resctrl glue, as
resctrl expects the bandwidth controls to be applied independently for
both data and instruction-fetch.
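As a rough illustration of the mapping described above, here is a
userspace sketch. It assumes resctrl_get_config_index() gives each closid
an even PARTID for data and the following odd PARTID for code; that
assignment, and the names conf_type, config_index() and
partid_to_closid(), are stand-ins for this example rather than kernel
API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for enum resctrl_conf_type. */
enum conf_type { CONF_NONE, CONF_CODE, CONF_DATA };

/*
 * Assumed behaviour of resctrl_get_config_index(): with CDP, one closid
 * owns a pair of PARTIDs, even for data and odd for code.
 */
static uint32_t config_index(uint32_t closid, enum conf_type t)
{
	switch (t) {
	case CONF_CODE:
		/* the doubled value must still fit a 16-bit PARTID field */
		assert(closid <= UINT16_MAX / 2);
		return closid * 2 + 1;
	case CONF_DATA:
		assert(closid <= UINT16_MAX / 2);
		return closid * 2;
	default:
		return closid;
	}
}

/* Recover resctrl's closid from a PARTID that was written to hardware. */
static uint32_t partid_to_closid(uint32_t partid, bool cdp_enabled)
{
	return cdp_enabled ? partid >> 1 : partid;
}
```

For closid 3 this yields PARTID 6 for data and 7 for code, and shifting
either value right by one gives 3 again. A resource that doesn't do CDP
would have the same control value written for both 6 and 7.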
CC: Dave Martin <Dave.Martin@arm.com>
CC: Ben Horgan <ben.horgan@arm.com>
CC: Amit Singh Tomar <amitsinght@marvell.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/include/asm/mpam.h | 1 +
drivers/resctrl/mpam_resctrl.c | 115 ++++++++++++++++++++++++++++++++-
include/linux/arm_mpam.h | 3 +
3 files changed, 118 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
index 2960ffaf6574..689285c4c61f 100644
--- a/arch/arm64/include/asm/mpam.h
+++ b/arch/arm64/include/asm/mpam.h
@@ -4,6 +4,7 @@
#ifndef __ASM__MPAM_H
#define __ASM__MPAM_H
+#include <linux/arm_mpam.h>
#include <linux/bitops.h>
#include <linux/bitfield.h>
#include <linux/init.h>
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index c2650abb99ec..d5f75ed67e46 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -33,6 +33,10 @@ static DEFINE_MUTEX(domain_list_lock);
static bool exposed_alloc_capable;
static bool exposed_mon_capable;
+/*
+ * MPAM emulates CDP by setting different PARTID in the I/D fields of MPAM0_EL1.
+ * This applies globally to all traffic the CPU generates.
+ */
static bool cdp_enabled;
bool resctrl_arch_alloc_capable(void)
@@ -45,6 +49,67 @@ bool resctrl_arch_mon_capable(void)
return exposed_mon_capable;
}
+bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
+{
+ switch (rid) {
+ case RDT_RESOURCE_L2:
+ case RDT_RESOURCE_L3:
+ return cdp_enabled;
+ case RDT_RESOURCE_MBA:
+ default:
+ /*
+ * x86's MBA control doesn't support CDP, so user-space doesn't
+ * expect it.
+ */
+ return false;
+ }
+}
+
+/**
+ * resctrl_reset_task_closids() - Reset the PARTID/PMG values for all tasks.
+ *
+ * At boot, all existing tasks use partid zero for D and I.
+ * To enable/disable CDP emulation, all these tasks need relabelling.
+ */
+static void resctrl_reset_task_closids(void)
+{
+ struct task_struct *p, *t;
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(p, t) {
+ resctrl_arch_set_closid_rmid(t, RESCTRL_RESERVED_CLOSID,
+ RESCTRL_RESERVED_RMID);
+ }
+ read_unlock(&tasklist_lock);
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
+{
+ u32 partid_i, partid_d;
+
+ cdp_enabled = enable;
+ partid_i = partid_d = RESCTRL_RESERVED_CLOSID;
+
+ if (enable) {
+ u32 partid = RESCTRL_RESERVED_CLOSID;
+
+ partid_d = resctrl_get_config_index(partid, CDP_CODE);
+ partid_i = resctrl_get_config_index(partid, CDP_DATA);
+ }
+
+ mpam_set_task_partid_pmg(current, partid_d, partid_i, 0, 0);
+ WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
+
+ resctrl_reset_task_closids();
+
+ return 0;
+}
+
+static bool mpam_resctrl_hide_cdp(enum resctrl_res_level rid)
+{
+ return cdp_enabled && !resctrl_arch_get_cdp_enabled(rid);
+}
+
/*
* MSC may raise an error interrupt if it sees an out or range partid/pmg,
* and go on to truncate the value. Regardless of what the hardware supports,
@@ -110,6 +175,30 @@ void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
}
}
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid)
+{
+ u64 regval = mpam_get_regval(tsk);
+ u32 tsk_closid = FIELD_GET(MPAM0_EL1_PARTID_D, regval);
+
+ if (cdp_enabled)
+ tsk_closid >>= 1;
+
+ return tsk_closid == closid;
+}
+
+/* The task's pmg is not unique, the partid must be considered too */
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid)
+{
+ u64 regval = mpam_get_regval(tsk);
+ u32 tsk_closid = FIELD_GET(MPAM0_EL1_PARTID_D, regval);
+ u32 tsk_rmid = FIELD_GET(MPAM0_EL1_PMG_D, regval);
+
+ if (cdp_enabled)
+ tsk_closid >>= 1;
+
+ return (tsk_closid == closid) && (tsk_rmid == rmid);
+}
+
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
{
if (l >= RDT_NUM_RESOURCES)
@@ -245,6 +334,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
dom = container_of(d, struct mpam_resctrl_dom, resctrl_ctrl_dom);
cprops = &res->class->props;
+ /*
+ * When CDP is enabled, but the resource doesn't support it,
+ * the control is cloned across both partids.
+ * Pick one at random to read:
+ */
+ if (mpam_resctrl_hide_cdp(r->rid))
+ type = CDP_DATA;
+
partid = resctrl_get_config_index(closid, type);
cfg = &dom->ctrl_comp->cfg[partid];
@@ -272,6 +369,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type t, u32 cfg_val)
{
+ int err;
u32 partid;
struct mpam_config cfg;
struct mpam_props *cprops;
@@ -311,7 +409,22 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
return -EINVAL;
}
- return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+ /*
+ * When CDP is enabled, but the resource doesn't support it, we need to
+ * apply the same configuration to the other partid.
+ */
+ if (mpam_resctrl_hide_cdp(r->rid)) {
+ partid = resctrl_get_config_index(closid, CDP_CODE);
+ err = mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+ if (err)
+ return err;
+
+ partid = resctrl_get_config_index(closid, CDP_DATA);
+ return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+
+ } else {
+ return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
+ }
}
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 5a78299ec464..ba0312b55d9f 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -5,6 +5,7 @@
#define __LINUX_ARM_MPAM_H
#include <linux/acpi.h>
+#include <linux/resctrl_types.h>
#include <linux/types.h>
struct mpam_msc;
@@ -56,6 +57,8 @@ void resctrl_arch_set_cpu_default_closid(int cpu, u32 closid);
void resctrl_arch_set_closid_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
void resctrl_arch_sched_in(struct task_struct *tsk);
+bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
+bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
--
2.39.5
* Re: [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation
2025-12-05 21:58 ` [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation James Morse
@ 2025-12-16 13:49 ` Ben Horgan
2025-12-16 14:24 ` Ben Horgan
2025-12-18 11:58 ` Jonathan Cameron
2 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-16 13:49 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> Intel RDT's CDP feature allows the cache to use a different control value
> depending on whether the access was for instruction fetch or a data
> access. MPAM's equivalent feature is the other way up: the CPU assigns a
> different partid label to traffic depending on whether it was instruction
> fetch or a data access, which causes the cache to use a different control
> value based solely on the partid.
>
> MPAM can emulate CDP, with the side effect that the alternative partid is
> seen by all MSC, it can't be enabled per-MSC.
>
> Add the resctrl hooks to turn this on or off. Add the helpers that
> match a closid against a task, which need to be aware that the value
> written to hardware is not the same as the one resctrl is using.
>
> Update the 'arm64_mpam_global_default' variable the arch code uses
> during context switch to know when the per-cpu value should be used
> instead.
>
> Awkwardly, the MB controls don't implement CDP. To emulate this, the
> MPAM equivalent needs programming twice by the resctrl glue, as
> resctrl expects the bandwidth controls to be applied independently for
> both data and instruction-fetch.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> CC: Ben Horgan <ben.horgan@arm.com>
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/include/asm/mpam.h | 1 +
> drivers/resctrl/mpam_resctrl.c | 115 ++++++++++++++++++++++++++++++++-
> include/linux/arm_mpam.h | 3 +
> 3 files changed, 118 insertions(+), 1 deletion(-)
>
[...]
>
> @@ -272,6 +369,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> {
> + int err;
> u32 partid;
> struct mpam_config cfg;
> struct mpam_props *cprops;
> @@ -311,7 +409,22 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> return -EINVAL;
> }
>
> - return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
> + /*
> + * When CDP is enabled, but the resource doesn't support it, we need to
> + * apply the same configuration to the other partid.
> + */
> + if (mpam_resctrl_hide_cdp(r->rid)) {
> + partid = resctrl_get_config_index(closid, CDP_CODE);
> + err = mpam_apply_config(dom->ctrl_comp, partid, &cfg);
> + if (err)
> + return err;
> +
> + partid = resctrl_get_config_index(closid, CDP_DATA);
> + return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
This is indeed awkward. As we are programming twice, if instruction and
data use bandwidth equally then the MB setting will be as if it was
doubled. However, if we halved the configured value, and most of the
bandwidth was for data, then it would be as if it was halved. For
controls where the actual parts are chosen, rather than just the size,
programming twice will work as expected. Therefore, I think your chosen
policy is the least surprising and I'll keep this as is.
> +
> + } else {
> + return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
> + }
> }
[...]
Thanks,
Ben
* Re: [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation
2025-12-05 21:58 ` [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation James Morse
2025-12-16 13:49 ` Ben Horgan
@ 2025-12-16 14:24 ` Ben Horgan
2025-12-18 11:58 ` Jonathan Cameron
2 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-16 14:24 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> Intel RDT's CDP feature allows the cache to use a different control value
> depending on whether the access was for instruction fetch or a data
> access. MPAM's equivalent feature is the other way up: the CPU assigns a
> different partid label to traffic depending on whether it was instruction
> fetch or a data access, which causes the cache to use a different control
> value based solely on the partid.
>
> MPAM can emulate CDP, with the side effect that the alternative partid is
> seen by all MSC, it can't be enabled per-MSC.
>
> Add the resctrl hooks to turn this on or off. Add the helpers that
> match a closid against a task, which need to be aware that the value
> written to hardware is not the same as the one resctrl is using.
>
> Update the 'arm64_mpam_global_default' variable the arch code uses
> during context switch to know when the per-cpu value should be used
> instead.
>
> Awkwardly, the MB controls don't implement CDP. To emulate this, the
> MPAM equivalent needs programming twice by the resctrl glue, as
> resctrl expects the bandwidth controls to be applied independently for
> both data and instruction-fetch.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> CC: Ben Horgan <ben.horgan@arm.com>
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/include/asm/mpam.h | 1 +
> drivers/resctrl/mpam_resctrl.c | 115 ++++++++++++++++++++++++++++++++-
> include/linux/arm_mpam.h | 3 +
> 3 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/mpam.h b/arch/arm64/include/asm/mpam.h
> index 2960ffaf6574..689285c4c61f 100644
> --- a/arch/arm64/include/asm/mpam.h
> +++ b/arch/arm64/include/asm/mpam.h
> @@ -4,6 +4,7 @@
> #ifndef __ASM__MPAM_H
> #define __ASM__MPAM_H
>
> +#include <linux/arm_mpam.h>
> #include <linux/bitops.h>
> #include <linux/bitfield.h>
> #include <linux/init.h>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index c2650abb99ec..d5f75ed67e46 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -33,6 +33,10 @@ static DEFINE_MUTEX(domain_list_lock);
> static bool exposed_alloc_capable;
> static bool exposed_mon_capable;
>
> +/*
> + * MPAM emulates CDP by setting different PARTID in the I/D fields of MPAM0_EL1.
> + * This applies globally to all traffic the CPU generates.
> + */
> static bool cdp_enabled;
>
> bool resctrl_arch_alloc_capable(void)
> @@ -45,6 +49,67 @@ bool resctrl_arch_mon_capable(void)
> return exposed_mon_capable;
> }
>
> +bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
> +{
> + switch (rid) {
> + case RDT_RESOURCE_L2:
> + case RDT_RESOURCE_L3:
> + return cdp_enabled;
> + case RDT_RESOURCE_MBA:
> + default:
> + /*
> + * x86's MBA control doesn't support CDP, so user-space doesn't
> + * expect it.
> + */
> + return false;
> + }
> +}
> +
> +/**
> + * resctrl_reset_task_closids() - Reset the PARTID/PMG values for all tasks.
> + *
> + * At boot, all existing tasks use partid zero for D and I.
> + * To enable/disable CDP emulation, all these tasks need relabelling.
> + */
> +static void resctrl_reset_task_closids(void)
> +{
> + struct task_struct *p, *t;
> +
> + read_lock(&tasklist_lock);
> + for_each_process_thread(p, t) {
> + resctrl_arch_set_closid_rmid(t, RESCTRL_RESERVED_CLOSID,
> + RESCTRL_RESERVED_RMID);
> + }
> + read_unlock(&tasklist_lock);
> +}
> +
> +int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
> +{
> + u32 partid_i, partid_d;
> +
> + cdp_enabled = enable;
> + partid_i = partid_d = RESCTRL_RESERVED_CLOSID;
> +
> + if (enable) {
> + u32 partid = RESCTRL_RESERVED_CLOSID;
> +
> + partid_d = resctrl_get_config_index(partid, CDP_CODE);
> + partid_i = resctrl_get_config_index(partid, CDP_DATA);
These are back to front, partid_d is data and partid_i is code. Also, if
there is only one partid then we don't have 2 different partids and we
should fail to enable cdp.
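A hedged userspace sketch of what that could look like. The helper name
cdp_default_partids(), the choice of -EINVAL, and the even-data/odd-code
assignment are assumptions made for illustration; RESERVED_CLOSID stands
in for RESCTRL_RESERVED_CLOSID and is assumed to be zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RESERVED_CLOSID	0u	/* stand-in for RESCTRL_RESERVED_CLOSID */

/*
 * Pick the default data/code PARTIDs. partid_max is the largest PARTID
 * the system supports, so partid_max == 0 means there is only one.
 */
static int cdp_default_partids(bool enable, uint32_t partid_max,
			       uint32_t *partid_d, uint32_t *partid_i)
{
	assert(partid_d != NULL && partid_i != NULL);

	if (!enable) {
		*partid_d = RESERVED_CLOSID;
		*partid_i = RESERVED_CLOSID;
		return 0;
	}

	/* Hypothetical check: CDP needs two distinct PARTIDs. */
	if (partid_max < 1)
		return -EINVAL;

	*partid_d = RESERVED_CLOSID * 2;	/* data: even PARTID */
	*partid_i = RESERVED_CLOSID * 2 + 1;	/* code: odd PARTID */
	return 0;
}
```

On failure the outputs are left untouched, so a caller would not flip
cdp_enabled or rewrite the global default until the check has passed.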
> + }
> +
> + mpam_set_task_partid_pmg(current, partid_d, partid_i, 0, 0);
> + WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
> +
> + resctrl_reset_task_closids();
> +
> + return 0;
> +}
Thanks,
Ben
* Re: [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation
2025-12-05 21:58 ` [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation James Morse
2025-12-16 13:49 ` Ben Horgan
2025-12-16 14:24 ` Ben Horgan
@ 2025-12-18 11:58 ` Jonathan Cameron
2 siblings, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 11:58 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:36 +0000
James Morse <james.morse@arm.com> wrote:
> Intel RDT's CDP feature allows the cache to use a different control value
> depending on whether the access was for instruction fetch or a data
> access. MPAM's equivalent feature is the other way up: the CPU assigns a
> different partid label to traffic depending on whether it was instruction
> fetch or a data access, which causes the cache to use a different control
> value based solely on the partid.
>
> MPAM can emulate CDP, with the side effect that the alternative partid is
> seen by all MSC, it can't be enabled per-MSC.
>
> Add the resctrl hooks to turn this on or off. Add the helpers that
> match a closid against a task, which need to be aware that the value
> written to hardware is not the same as the one resctrl is using.
>
> Update the 'arm64_mpam_global_default' variable the arch code uses
> during context switch to know when the per-cpu value should be used
> instead.
>
> Awkwardly, the MB controls don't implement CDP. To emulate this, the
> MPAM equivalent needs programming twice by the resctrl glue, as
> resctrl expects the bandwidth controls to be applied independently for
> both data and instruction-fetch.
>
> CC: Dave Martin <Dave.Martin@arm.com>
> CC: Ben Horgan <ben.horgan@arm.com>
> CC: Amit Singh Tomar <amitsinght@marvell.com>
> Signed-off-by: James Morse <james.morse@arm.com>
More trivial stuff. It is clearly that kind of a day.
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index c2650abb99ec..d5f75ed67e46 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> +int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
> +{
> + u32 partid_i, partid_d;
> +
> + cdp_enabled = enable;
> + partid_i = partid_d = RESCTRL_RESERVED_CLOSID;
I'd do these at the declarations. The kernel style guide also says
not to do multiple assignments on one line. This one is fairly
obviously fine, but nonetheless...
> +
> + if (enable) {
> + u32 partid = RESCTRL_RESERVED_CLOSID;
> +
> + partid_d = resctrl_get_config_index(partid, CDP_CODE);
> + partid_i = resctrl_get_config_index(partid, CDP_DATA);
> + }
> +
> + mpam_set_task_partid_pmg(current, partid_d, partid_i, 0, 0);
> + WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
> +
> + resctrl_reset_task_closids();
> +
> + return 0;
> +}
> @@ -272,6 +369,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> u32 closid, enum resctrl_conf_type t, u32 cfg_val)
> {
> + int err;
> u32 partid;
> struct mpam_config cfg;
> struct mpam_props *cprops;
> @@ -311,7 +409,22 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> return -EINVAL;
> }
>
> - return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
> + /*
> + * When CDP is enabled, but the resource doesn't support it, we need to
> + * apply the same configuration to the other partid.
> + */
> + if (mpam_resctrl_hide_cdp(r->rid)) {
> + partid = resctrl_get_config_index(closid, CDP_CODE);
> + err = mpam_apply_config(dom->ctrl_comp, partid, &cfg);
> + if (err)
> + return err;
> +
> + partid = resctrl_get_config_index(closid, CDP_DATA);
> + return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
Given you returned, maybe drop the else which will reduce the diff
as well as a bonus. Definitely drop this blank line.
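Roughly the following shape, shown as a userspace sketch. apply_fn,
apply_one() and record() are hypothetical stand-ins for
mpam_apply_config() and its caller, and the code/data PARTIDs are
assumed to be the odd/even pair for the closid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for mpam_apply_config(). */
typedef int (*apply_fn)(uint32_t partid, uint32_t cfg);

static int apply_one(uint32_t closid, uint32_t partid, uint32_t cfg,
		     bool hide_cdp, apply_fn apply)
{
	int err;

	assert(apply != NULL);

	if (hide_cdp) {
		/* assumed mapping: odd PARTID for code, even for data */
		err = apply(closid * 2 + 1, cfg);
		if (err)
			return err;

		return apply(closid * 2, cfg);
	}

	/* no else needed: the branch above always returns */
	return apply(partid, cfg);
}

/* Example callback that records which PARTIDs were programmed. */
static uint32_t seen[4];
static size_t nseen;

static int record(uint32_t partid, uint32_t cfg)
{
	(void)cfg;
	if (nseen < sizeof(seen) / sizeof(seen[0]))
		seen[nseen++] = partid;
	return 0;
}
```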
> +
> + } else {
> + return mpam_apply_config(dom->ctrl_comp, partid, &cfg);
> + }
> }
* [RFC PATCH 14/38] arm_mpam: resctrl: Add rmid index helpers
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (12 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 13/38] arm_mpam: resctrl: Add CDP emulation James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 15/38] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats James Morse
` (24 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Because MPAM's pmg isn't identical to RDT's rmid, resctrl handles
some data structures by index. This allows x86 to map indexes to
RMID, and MPAM to map them to partid-and-pmg.
Add the helpers to do this.
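The index packing can be tried out in userspace with a sketch like the
one below. It follows the same arithmetic as the helpers in this patch,
but takes partid_max and pmg_max as parameters instead of reading the
driver's globals; fls32() is a local stand-in for the kernel's fls() and
the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for fls(): 1-based position of the highest set bit, 0 for 0. */
static unsigned int fls32(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* The closid sits above the bits needed to hold any pmg up to pmg_max. */
static uint32_t rmid_idx_encode(uint32_t closid, uint32_t rmid,
				uint32_t pmg_max)
{
	unsigned int shift = fls32(pmg_max);

	assert(shift <= 8);	/* mirrors the WARN_ON_ONCE() in the patch */
	return (closid << shift) | rmid;
}

static void rmid_idx_decode(uint32_t idx, uint32_t pmg_max,
			    uint32_t *closid, uint32_t *rmid)
{
	unsigned int shift = fls32(pmg_max);
	uint32_t pmg_mask = (1u << shift) - 1;

	assert(shift <= 8);
	*closid = idx >> shift;
	*rmid = idx & pmg_mask;
}

/* Size of the index space for partid_max + 1 PARTIDs. */
static uint32_t num_rmid_idx(uint32_t partid_max, uint32_t pmg_max)
{
	return (partid_max + 1) << fls32(pmg_max);
}
```

With pmg_max = 3 the shift is 2, so closid 5 with rmid 2 encodes to
index 22 and decodes back to the same pair. If pmg_max + 1 is not a
power of two, some indexes in the range are simply never used.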
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 28 ++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 3 +++
2 files changed, 31 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index d5f75ed67e46..19d70d00bbcc 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -120,6 +120,34 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *ignored)
return mpam_partid_max + 1;
}
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+ u8 closid_shift = fls(mpam_pmg_max);
+ u32 num_partid = resctrl_arch_get_num_closid(NULL);
+
+ return num_partid << closid_shift;
+}
+
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
+{
+ u8 closid_shift = fls(mpam_pmg_max);
+
+ WARN_ON_ONCE(closid_shift > 8);
+
+ return (closid << closid_shift) | rmid;
+}
+
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
+{
+ u8 closid_shift = fls(mpam_pmg_max);
+ u32 pmg_mask = ~(~0 << closid_shift);
+
+ WARN_ON_ONCE(closid_shift > 8);
+
+ *closid = idx >> closid_shift;
+ *rmid = idx & pmg_mask;
+}
+
void resctrl_arch_sched_in(struct task_struct *tsk)
{
lockdep_assert_preemption_disabled();
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index ba0312b55d9f..385554ceb452 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -59,6 +59,9 @@ void resctrl_arch_set_cpu_default_closid_rmid(int cpu, u32 closid, u32 rmid);
void resctrl_arch_sched_in(struct task_struct *tsk);
bool resctrl_arch_match_closid(struct task_struct *tsk, u32 closid);
bool resctrl_arch_match_rmid(struct task_struct *tsk, u32 closid, u32 rmid);
+u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
+void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
+u32 resctrl_arch_system_num_rmid_idx(void);
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
--
2.39.5
* [RFC PATCH 15/38] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (13 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 14/38] arm_mpam: resctrl: Add rmid index helpers James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 16/38] arm_mpam: resctrl: Add support for 'MB' resource James Morse
` (23 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal, Dave Martin
From: Dave Martin <Dave.Martin@arm.com>
MPAM uses fixed-point formats for some hardware controls.
Resctrl provides the bandwidth controls as a percentage. Add helpers to
convert between these.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 41 ++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 19d70d00bbcc..55576d0caf12 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -10,6 +10,7 @@
#include <linux/errno.h>
#include <linux/limits.h>
#include <linux/list.h>
+#include <linux/math.h>
#include <linux/printk.h>
#include <linux/rculist.h>
#include <linux/resctrl.h>
@@ -246,6 +247,46 @@ static bool cache_has_usable_cpor(struct mpam_class *class)
return (class->props.cpbm_wd <= 32);
}
+/*
+ * Each fixed-point hardware value architecturally represents a range
+ * of values: the full range 0% - 100% is split contiguously into
+ * (1 << cprops->bwa_wd) equal bands.
+ * Find the nearest percentage value to the upper bound of the selected band:
+ */
+static u32 mbw_max_to_percent(u16 mbw_max, struct mpam_props *cprops)
+{
+ u32 val = mbw_max;
+
+ val >>= 16 - cprops->bwa_wd;
+ val += 1;
+ val *= MAX_MBA_BW;
+ val = DIV_ROUND_CLOSEST(val, 1 << cprops->bwa_wd);
+
+ return val;
+}
+
+/*
+ * Find the band whose upper bound is closest to the specified percentage.
+ *
+ * A round-to-nearest policy is followed here as a balanced compromise
+ * between unexpected under-commit of the resource (where the total of
+ * a set of resource allocations after conversion is less than the
+ * expected total, due to rounding of the individual converted
+ * percentages) and over-commit (where the total of the converted
+ * allocations is greater than expected).
+ */
+static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
+{
+ u32 val = pc;
+
+ val <<= cprops->bwa_wd;
+ val = DIV_ROUND_CLOSEST(val, MAX_MBA_BW);
+ val = max(val, 1) - 1;
+ val <<= 16 - cprops->bwa_wd;
+
+ return val;
+}
+
/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
static void mpam_resctrl_pick_caches(void)
{
--
2.39.5
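For anyone wanting to check the rounding by hand, the two conversions can be reproduced in user space roughly as below. This is only a sketch: MAX_MBA_BW is assumed to be 100, div_round_closest() is a local stand-in for the kernel's unsigned DIV_ROUND_CLOSEST(), and bwa_wd is passed directly instead of through struct mpam_props.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MBA_BW 100u	/* assumed: resctrl's 100% ceiling */

/* Unsigned round-to-nearest division */
uint32_t div_round_closest(uint32_t x, uint32_t d)
{
	return (x + d / 2) / d;
}

/* Percentage nearest to the upper bound of the band selected by mbw_max */
uint32_t mbw_max_to_percent(uint16_t mbw_max, unsigned int bwa_wd)
{
	uint32_t val = mbw_max;

	assert(bwa_wd >= 1 && bwa_wd <= 16);
	val >>= 16 - bwa_wd;		/* keep only the implemented bits */
	val += 1;			/* upper bound of this band */
	val *= MAX_MBA_BW;
	return div_round_closest(val, 1u << bwa_wd);
}

/* Band whose upper bound is closest to the requested percentage */
uint16_t percent_to_mbw_max(uint8_t pc, unsigned int bwa_wd)
{
	uint32_t val = pc;

	assert(bwa_wd >= 1 && bwa_wd <= 16);
	val <<= bwa_wd;
	val = div_round_closest(val, MAX_MBA_BW);
	val = (val > 1 ? val : 1) - 1;	/* band index, clamped at 0 */
	val <<= 16 - bwa_wd;		/* back to the 16-bit field position */
	return (uint16_t)val;
}
```

With bwa_wd = 7, a request for 98% becomes 0xf800 and reads back as 98%, and 2% becomes 0x0400 and reads back as 2%. A request for 0% lands in band 0, which reads back as 1%.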
* [RFC PATCH 16/38] arm_mpam: resctrl: Add support for 'MB' resource
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (14 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 15/38] arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-12 4:27 ` Gavin Shan
2025-12-05 21:58 ` [RFC PATCH 17/38] arm_mpam: resctrl: Add kunit test for control format conversions James Morse
` (22 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal, Zeng Heng, Dave Martin
resctrl supports 'MB' as a percentage throttling of traffic somewhere
after the L3. This is the control that mba_sc uses, so ideally the
class chosen should be as close as possible to the counters used for
mbm_local.
MB's percentage control is backed by the fixed-point fraction MBW_MAX.
The bandwidth portion bitmap is not used, as it's tricky to pick which
bits to use to avoid contention, and it may be possible to expose this
as something other than a percentage in the future.
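To make the percentage mapping a little more concrete, the sketch below shows how the step size and the smallest reportable value fall out of the number of implemented fraction bits. It mirrors the arithmetic of get_mba_granularity() and get_mba_min() in this patch, but the function names here are made up for illustration and MAX_MBA_BW is assumed to be 100.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MBA_BW 100u	/* assumed: resctrl's 100% ceiling */

/* Worst-case percentage step between bands: 100 / 2^bwa_wd, rounded up */
uint32_t mba_granularity(unsigned int bwa_wd)
{
	uint32_t bands;

	assert(bwa_wd >= 1 && bwa_wd <= 16);
	bands = 1u << bwa_wd;
	return (MAX_MBA_BW + bands - 1) / bands;
}

/* Upper bound of band 0 as a percentage, rounded to nearest */
uint32_t mba_min_percent(unsigned int bwa_wd)
{
	uint32_t bands;

	assert(bwa_wd >= 1 && bwa_wd <= 16);
	bands = 1u << bwa_wd;
	return (MAX_MBA_BW + bands / 2) / bands;
}
```

One implemented bit gives a 50% step, two bits give 25%, and seven bits give a 1% step with a 1% minimum. From eight bits upwards the minimum rounds down to 0% while the reported step stays at 1%.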
CC: Zeng Heng <zengheng4@huawei.com>
Co-developed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 212 ++++++++++++++++++++++++++++++++-
1 file changed, 211 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 55576d0caf12..b9f3f00d8cad 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -247,6 +247,33 @@ static bool cache_has_usable_cpor(struct mpam_class *class)
return (class->props.cpbm_wd <= 32);
}
+static bool mba_class_use_mbw_max(struct mpam_props *cprops)
+{
+ return (mpam_has_feature(mpam_feat_mbw_max, cprops) &&
+ cprops->bwa_wd);
+}
+
+static bool class_has_usable_mba(struct mpam_props *cprops)
+{
+ return mba_class_use_mbw_max(cprops);
+}
+
+/*
+ * Calculate the worst-case percentage change from each implemented step
+ * in the control.
+ */
+static u32 get_mba_granularity(struct mpam_props *cprops)
+{
+ if (!mba_class_use_mbw_max(cprops))
+ return 0;
+
+ /*
+ * bwa_wd is the number of bits implemented in the 0.xxx
+ * fixed point fraction. 1 bit is 50%, 2 is 25% etc.
+ */
+ return DIV_ROUND_UP(MAX_MBA_BW, 1 << cprops->bwa_wd);
+}
+
/*
* Each fixed-point hardware value architecturally represents a range
* of values: the full range 0% - 100% is split contiguously into
@@ -287,6 +314,96 @@ static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
return val;
}
+static u32 get_mba_min(struct mpam_props *cprops)
+{
+ u32 val = 0;
+
+ if (mba_class_use_mbw_max(cprops))
+ val = mbw_max_to_percent(val, cprops);
+ else
+ WARN_ON_ONCE(1);
+
+ return val;
+}
+
+/* Find the L3 cache that has affinity with this CPU */
+static int find_l3_equivalent_bitmask(int cpu, cpumask_var_t tmp_cpumask)
+{
+ u32 cache_id = get_cpu_cacheinfo_id(cpu, 3);
+
+ lockdep_assert_cpus_held();
+
+ return mpam_get_cpumask_from_cache_id(cache_id, 3, tmp_cpumask);
+}
+
+/*
+ * topology_matches_l3() - Is the provided class the same shape as L3
+ * @victim: The class we'd like to pretend is L3.
+ *
+ * resctrl expects all the world's a Xeon, and all counters are on the
+ * L3. We play fast and loose with this, mapping counters on other
+ * classes - provided the CPU->domain mapping is the same kind of shape.
+ *
+ * Using cacheinfo directly would make this work even if resctrl can't
+ * use the L3 - but cacheinfo can't tell us anything about offline CPUs.
+ * Using the L3 resctrl domain list also depends on CPUs being online.
+ * Using the mpam_class we picked for L3 so we can use its domain list
+ * assumes that there are MPAM controls on the L3.
+ * Instead, this path eventually uses the mpam_get_cpumask_from_cache_id()
+ * helper which can tell us about offline CPUs ... but getting the cache_id
+ * to start with relies on at least one CPU per L3 cache being online at
+ * boot.
+ *
+ * Walk the victim component list and compare the affinity mask with the
+ * corresponding L3. The topology matches if each victim:component's affinity
+ * mask is the same as the CPU's corresponding L3's. These lists/masks are
+ * computed from firmware tables so don't change at runtime.
+ */
+static bool topology_matches_l3(struct mpam_class *victim)
+{
+ int cpu, err;
+ struct mpam_component *victim_iter;
+ cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
+
+ if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL))
+ return false;
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(victim_iter, &victim->components, class_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (cpumask_empty(&victim_iter->affinity)) {
+ pr_debug("class %u has CPU-less component %u - can't match L3!\n",
+ victim->level, victim_iter->comp_id);
+ return false;
+ }
+
+ cpu = cpumask_any(&victim_iter->affinity);
+ if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
+ return false;
+
+ cpumask_clear(tmp_cpumask);
+ err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
+ if (err) {
+ pr_debug("Failed to find L3's equivalent component to class %u component %u\n",
+ victim->level, victim_iter->comp_id);
+ return false;
+ }
+
+ /* Any differing bits in the affinity mask? */
+ if (!cpumask_equal(tmp_cpumask, &victim_iter->affinity)) {
+ pr_debug("class %u component %u has Mismatched CPU mask with L3 equivalent\n"
+ "L3:%*pbl != victim:%*pbl\n",
+ victim->level, victim_iter->comp_id,
+ cpumask_pr_args(tmp_cpumask),
+ cpumask_pr_args(&victim_iter->affinity));
+
+ return false;
+ }
+ }
+
+ return true;
+}
+
/* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
static void mpam_resctrl_pick_caches(void)
{
@@ -330,10 +447,63 @@ static void mpam_resctrl_pick_caches(void)
}
}
+static void mpam_resctrl_pick_mba(void)
+{
+ struct mpam_class *class, *candidate_class = NULL;
+ struct mpam_resctrl_res *res;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ struct mpam_props *cprops = &class->props;
+
+ if (class->level < 3) {
+ pr_debug("class %u is before L3\n", class->level);
+ continue;
+ }
+
+ if (!class_has_usable_mba(cprops)) {
+ pr_debug("class %u has no bandwidth control\n",
+ class->level);
+ continue;
+ }
+
+ if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
+ pr_debug("class %u has missing CPUs\n", class->level);
+ continue;
+ }
+
+ if (!topology_matches_l3(class)) {
+ pr_debug("class %u topology doesn't match L3\n",
+ class->level);
+ continue;
+ }
+
+ /*
+ * mba_sc reads the mbm_local counter, and waggles the MBA
+ * controls. mbm_local is implicitly part of the L3, pick a
+ * resource to be MBA that is as close as possible to the L3.
+ */
+ if (!candidate_class || class->level < candidate_class->level)
+ candidate_class = class;
+ }
+
+ if (candidate_class) {
+ pr_debug("selected class %u to back MBA\n",
+ candidate_class->level);
+ res = &mpam_resctrl_controls[RDT_RESOURCE_MBA];
+ res->class = candidate_class;
+ exposed_alloc_capable = true;
+ }
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
enum resctrl_res_level type)
{
struct mpam_class *class = res->class;
+ struct mpam_props *cprops = &class->props;
struct rdt_resource *r = &res->resctrl_res;
switch (res->resctrl_res.rid) {
@@ -362,6 +532,20 @@ static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
* 'all the bits' is the correct answer here.
*/
r->cache.shareable_bits = resctrl_get_default_ctrl(r);
+ break;
+ case RDT_RESOURCE_MBA:
+ r->alloc_capable = true;
+ r->schema_fmt = RESCTRL_SCHEMA_RANGE;
+ r->ctrl_scope = RESCTRL_L3_CACHE;
+
+ r->membw.delay_linear = true;
+ r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
+ r->membw.min_bw = get_mba_min(cprops);
+ r->membw.max_bw = MAX_MBA_BW;
+ r->membw.bw_gran = get_mba_granularity(cprops);
+
+ r->name = "MB";
+
break;
default:
break;
@@ -377,7 +561,17 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
if (class->type == MPAM_CLASS_CACHE)
return comp->comp_id;
- /* TODO: repaint domain ids to match the L3 domain ids */
+ if (topology_matches_l3(class)) {
+ /* Use the corresponding L3 component ID as the domain ID */
+ int id = get_cpu_cacheinfo_id(cpu, 3);
+
+ /* Implies topology_matches_l3() made a mistake */
+ if (WARN_ON_ONCE(id == -1))
+ return comp->comp_id;
+
+ return id;
+ }
+
/*
* Otherwise, expose the ID used by the firmware table code.
*/
@@ -419,6 +613,12 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
case RDT_RESOURCE_L3:
configured_by = mpam_feat_cpor_part;
break;
+ case RDT_RESOURCE_MBA:
+ if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
+ configured_by = mpam_feat_mbw_max;
+ break;
+ }
+ fallthrough;
default:
return resctrl_get_default_ctrl(r);
}
@@ -430,6 +630,8 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
switch (configured_by) {
case mpam_feat_cpor_part:
return cfg->cpbm;
+ case mpam_feat_mbw_max:
+ return mbw_max_to_percent(cfg->mbw_max, cprops);
default:
return resctrl_get_default_ctrl(r);
}
@@ -474,6 +676,13 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
cfg.cpbm = cfg_val;
mpam_set_feature(mpam_feat_cpor_part, &cfg);
break;
+ case RDT_RESOURCE_MBA:
+ if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
+ cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
+ mpam_set_feature(mpam_feat_mbw_max, &cfg);
+ break;
+ }
+ fallthrough;
default:
return -EINVAL;
}
@@ -743,6 +952,7 @@ int mpam_resctrl_setup(void)
/* Find some classes to use for controls */
mpam_resctrl_pick_caches();
+ mpam_resctrl_pick_mba();
/* Initialise the resctrl structures from the classes */
for (i = 0; i < RDT_NUM_RESOURCES; i++) {
--
2.39.5
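The selection policy in mpam_resctrl_pick_mba() boils down to "filter, then take the lowest level at or beyond L3". A toy model of that loop is sketched below; struct toy_class and its boolean fields are invented for the sketch and simply stand in for the real checks (mpam_has_feature(), the cpu_possible_mask comparison and topology_matches_l3()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of an MPAM class for illustration only */
struct toy_class {
	unsigned int level;	/* cache level, or higher for memory-side */
	bool has_mbw_max;	/* MBW_MAX control present */
	unsigned int bwa_wd;	/* implemented fraction bits */
	bool covers_all_cpus;	/* affinity equals the possible CPU mask */
	bool matches_l3;	/* same CPU->domain shape as the L3 */
};

/* Return the usable class closest to the L3, or NULL if none qualifies */
const struct toy_class *pick_mba(const struct toy_class *classes, size_t n)
{
	const struct toy_class *candidate = NULL;

	assert(classes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct toy_class *c = &classes[i];

		if (c->level < 3)			/* before the L3 */
			continue;
		if (!c->has_mbw_max || !c->bwa_wd)	/* no usable control */
			continue;
		if (!c->covers_all_cpus)		/* missing CPUs */
			continue;
		if (!c->matches_l3)			/* wrong topology */
			continue;

		/* Prefer the class nearest the L3, where mbm_local lives */
		if (!candidate || c->level < candidate->level)
			candidate = c;
	}

	return candidate;
}
```

Given an L2 class, a usable memory-side class at level 4 and a usable L3 class, the L3 class wins regardless of list order, since only the level decides among classes that pass every filter.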
* Re: [RFC PATCH 16/38] arm_mpam: resctrl: Add support for 'MB' resource
2025-12-05 21:58 ` [RFC PATCH 16/38] arm_mpam: resctrl: Add support for 'MB' resource James Morse
@ 2025-12-12 4:27 ` Gavin Shan
2025-12-16 15:56 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Gavin Shan @ 2025-12-12 4:27 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Ben Horgan, rohit.mathew, reinette.chatre, Punit Agrawal,
Zeng Heng
Hi James and Ben,
On 12/6/25 7:58 AM, James Morse wrote:
> resctrl supports 'MB', as a percentage throttling of traffic somewhere
> after the L3. This is the control that mba_sc uses, so ideally the
> class chosen should be as close as possible to the counters used for
> mba_local.
>
> MB's percentage control should be backed either with the fixed point
> fraction MBW_MAX. The bandwidth portion bitmaps is not used as its
> tricky to pick which bits to use to avoid contention, and may be
> possible to expose this as something other than a percentage in the
> future.
>
> CC: Zeng Heng <zengheng4@huawei.com>
> Co-developed-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: James Morse <james.morse@arm.com>>
> ---
> drivers/resctrl/mpam_resctrl.c | 212 ++++++++++++++++++++++++++++++++-
> 1 file changed, 211 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 55576d0caf12..b9f3f00d8cad 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -247,6 +247,33 @@ static bool cache_has_usable_cpor(struct mpam_class *class)
> return (class->props.cpbm_wd <= 32);
> }
>
> +static bool mba_class_use_mbw_max(struct mpam_props *cprops)
> +{
> + return (mpam_has_feature(mpam_feat_mbw_max, cprops) &&
> + cprops->bwa_wd);
> +}
> +
> +static bool class_has_usable_mba(struct mpam_props *cprops)
> +{
> + return mba_class_use_mbw_max(cprops);
> +}
> +
> +/*
> + * Calculate the worst-case percentage change from each implemented step
> + * in the control.
> + */
> +static u32 get_mba_granularity(struct mpam_props *cprops)
> +{
> + if (!mba_class_use_mbw_max(cprops))
> + return 0;
> +
> + /*
> + * bwa_wd is the number of bits implemented in the 0.xxx
> + * fixed point fraction. 1 bit is 50%, 2 is 25% etc.
> + */
> + return DIV_ROUND_UP(MAX_MBA_BW, 1 << cprops->bwa_wd);
> +}
> +
> /*
> * Each fixed-point hardware value architecturally represents a range
> * of values: the full range 0% - 100% is split contiguously into
> @@ -287,6 +314,96 @@ static u16 percent_to_mbw_max(u8 pc, struct mpam_props *cprops)
> return val;
> }
>
> +static u32 get_mba_min(struct mpam_props *cprops)
> +{
> + u32 val = 0;
> +
> + if (mba_class_use_mbw_max(cprops))
> + val = mbw_max_to_percent(val, cprops);
> + else
> + WARN_ON_ONCE(1);
> +
> + return val;
> +}
> +
> +/* Find the L3 cache that has affinity with this CPU */
> +static int find_l3_equivalent_bitmask(int cpu, cpumask_var_t tmp_cpumask)
> +{
> + u32 cache_id = get_cpu_cacheinfo_id(cpu, 3);
> +
> + lockdep_assert_cpus_held();
> +
> + return mpam_get_cpumask_from_cache_id(cache_id, 3, tmp_cpumask);
> +}
> +
> +/*
> + * topology_matches_l3() - Is the provided class the same shape as L3
> + * @victim: The class we'd like to pretend is L3.
> + *
> + * resctrl expects all the world's a Xeon, and all counters are on the
> + * L3. We play fast and loose with this, mapping counters on other
> + * classes - provided the CPU->domain mapping is the same kind of shape.
> + *
> + * Using cacheinfo directly would make this work even if resctrl can't
> + * use the L3 - but cacheinfo can't tell us anything about offline CPUs.
> + * Using the L3 resctrl domain list also depends on CPUs being online.
> + * Using the mpam_class we picked for L3 so we can use its domain list
> + * assumes that there are MPAM controls on the L3.
> + * Instead, this path eventually uses the mpam_get_cpumask_from_cache_id()
> + * helper which can tell us about offline CPUs ... but getting the cache_id
> + * to start with relies on at least one CPU per L3 cache being online at
> + * boot.
> + *
> + * Walk the victim component list and compare the affinity mask with the
> + * corresponding L3. The topology matches if each victim:component's affinity
> + * mask is the same as the CPU's corresponding L3's. These lists/masks are
> + * computed from firmware tables so don't change at runtime.
> + */
> +static bool topology_matches_l3(struct mpam_class *victim)
> +{
> + int cpu, err;
> + struct mpam_component *victim_iter;
> + cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
> +
> + if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL))
> + return false;
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(victim_iter, &victim->components, class_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + if (cpumask_empty(&victim_iter->affinity)) {
> + pr_debug("class %u has CPU-less component %u - can't match L3!\n",
> + victim->level, victim_iter->comp_id);
> + return false;
> + }
> +
> + cpu = cpumask_any(&victim_iter->affinity);
> + if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
> + return false;
> +
> + cpumask_clear(tmp_cpumask);
> + err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
> + if (err) {
> + pr_debug("Failed to find L3's equivalent component to class %u component %u\n",
> + victim->level, victim_iter->comp_id);
> + return false;
> + }
> +
> + /* Any differing bits in the affinity mask? */
> + if (!cpumask_equal(tmp_cpumask, &victim_iter->affinity)) {
> + pr_debug("class %u component %u has Mismatched CPU mask with L3 equivalent\n"
> + "L3:%*pbl != victim:%*pbl\n",
> + victim->level, victim_iter->comp_id,
> + cpumask_pr_args(tmp_cpumask),
> + cpumask_pr_args(&victim_iter->affinity));
> +
> + return false;
> + }
> + }
> +
> + return true;
> +}
> +
> /* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
> static void mpam_resctrl_pick_caches(void)
> {
> @@ -330,10 +447,63 @@ static void mpam_resctrl_pick_caches(void)
> }
> }
>
> +static void mpam_resctrl_pick_mba(void)
> +{
> + struct mpam_class *class, *candidate_class = NULL;
> + struct mpam_resctrl_res *res;
> +
> + lockdep_assert_cpus_held();
> +
> + guard(srcu)(&mpam_srcu);
> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
> + srcu_read_lock_held(&mpam_srcu)) {
> + struct mpam_props *cprops = &class->props;
> +
> + if (class->level < 3) {
> + pr_debug("class %u is before L3\n", class->level);
> + continue;
> + }
> +
> + if (!class_has_usable_mba(cprops)) {
> + pr_debug("class %u has no bandwidth control\n",
> + class->level);
> + continue;
> + }
> +
> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
> + pr_debug("class %u has missing CPUs\n", class->level);
> + continue;
> + }
> +
> + if (!topology_matches_l3(class)) {
> + pr_debug("class %u topology doesn't match L3\n",
> + class->level);
> + continue;
> + }
> +
> + /*
> + * mba_sc reads the mbm_local counter, and waggles the MBA
> + * controls. mbm_local is implicitly part of the L3, pick a
> + * resource to be MBA that as close as possible to the L3.
> + */
> + if (!candidate_class || class->level < candidate_class->level)
> + candidate_class = class;
> + }
> +
> + if (candidate_class) {
> + pr_debug("selected class %u to back MBA\n",
> + candidate_class->level);
> + res = &mpam_resctrl_controls[RDT_RESOURCE_MBA];
> + res->class = candidate_class;
> + exposed_alloc_capable = true;
> + }
> +}
> +
> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
> enum resctrl_res_level type)
> {
> struct mpam_class *class = res->class;
> + struct mpam_props *cprops = &class->props;
> struct rdt_resource *r = &res->resctrl_res;
>
> switch (res->resctrl_res.rid) {
> @@ -362,6 +532,20 @@ static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
> * 'all the bits' is the correct answer here.
> */
> r->cache.shareable_bits = resctrl_get_default_ctrl(r);
> + break;
> + case RDT_RESOURCE_MBA:
> + r->alloc_capable = true;
> + r->schema_fmt = RESCTRL_SCHEMA_RANGE;
> + r->ctrl_scope = RESCTRL_L3_CACHE;
> +
> + r->membw.delay_linear = true;
> + r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
> + r->membw.min_bw = get_mba_min(cprops);
> + r->membw.max_bw = MAX_MBA_BW;
> + r->membw.bw_gran = get_mba_granularity(cprops);
> +
> + r->name = "MB";
> +
> break;
> default:
> break;
> @@ -377,7 +561,17 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> if (class->type == MPAM_CLASS_CACHE)
> return comp->comp_id;
>
> - /* TODO: repaint domain ids to match the L3 domain ids */
> + if (topology_matches_l3(class)) {
> + /* Use the corresponding L3 component ID as the domain ID */
> + int id = get_cpu_cacheinfo_id(cpu, 3);
> +
> + /* Implies topology_matches_l3() made a mistake */
> + if (WARN_ON_ONCE(id == -1))
> + return comp->comp_id;
> +
> + return id;
> + }
> +
> /*
> * Otherwise, expose the ID used by the firmware table code.
> */
> @@ -419,6 +613,12 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> case RDT_RESOURCE_L3:
> configured_by = mpam_feat_cpor_part;
> break;
> + case RDT_RESOURCE_MBA:
> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
> + configured_by = mpam_feat_mbw_max;
> + break;
> + }
> + fallthrough;
> default:
> return resctrl_get_default_ctrl(r);
> }
> @@ -430,6 +630,8 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> switch (configured_by) {
> case mpam_feat_cpor_part:
> return cfg->cpbm;
> + case mpam_feat_mbw_max:
> + return mbw_max_to_percent(cfg->mbw_max, cprops);
> default:
> return resctrl_get_default_ctrl(r);
> }
> @@ -474,6 +676,13 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
> cfg.cpbm = cfg_val;
> mpam_set_feature(mpam_feat_cpor_part, &cfg);
> break;
> + case RDT_RESOURCE_MBA:
> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
> + cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
> + mpam_set_feature(mpam_feat_mbw_max, &cfg);
> + break;
> + }
> + fallthrough;
I think mpam_feat_mbw_min probably needs to be cleared in '&cfg', whose content
is copied from that of the component. mpam_feat_mbw_min may already be set
in '&cfg', in which case struct mpam_config::mbw_min won't be updated correctly
in the subsequent call to mpam_extend_config(). That means the MPAMCFG_MBW_MIN
register isn't updated correctly.
On NVIDIA's Grace Hopper machine, I got:
host$ mount none -tresctrl /sys/fs/resctrl/
host$ mkdir -p /sys/fs/resctrl/all
host$ mkdir -p /sys/fs/resctrl/test
host$ cat /proc/dump_feat_regs
MPAMF_IDR 0000008057010027
MAPMF_MBW_IDR 00000c07
host$ echo "MB:1=98" > /sys/fs/resctrl/test/schemata
host$ cat /proc/dump_cfg_regs
MPAMCFG_PART_SEL 00000002
MPAMCFG_MBW_MAX 0000f9ff
MPAMCFG_MBW_MIN 0000f000
host$ echo "MB:1=2" > /sys/fs/resctrl/test/schemata
host$ cat /proc/dump_cfg_regs
MPAMCFG_PART_SEL 00000002
MPAMCFG_MBW_MAX 000005ff
MPAMCFG_MBW_MIN 0000f000
With 'mpam_clear_feature(mpam_feat_mbw_min, &cfg);' applied here, the register
can be updated correctly. It also makes my (soft) MBW limiting tests happy.
host$ echo "MB:1=98" > /sys/fs/resctrl/test/schemata
host$ cat /proc/dump_cfg_regs
MPAMCFG_PART_SEL 00000002
MPAMCFG_MBW_MAX 0000f9ff
MPAMCFG_MBW_MIN 0000ea00
host$ echo "MB:1=2" > /sys/fs/resctrl/test/schemata
host$ cat /proc/dump_cfg_regs
MPAMCFG_PART_SEL 00000002
MPAMCFG_MBW_MAX 000005ff
MPAMCFG_MBW_MIN 00000200
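For reference, the register dumps above can be decoded into band numbers with something like the sketch below. It assumes BWA_WD is bits [5:0] of MPAMF_MBW_IDR and that the implemented fraction bits are the top BWA_WD bits of the 16-bit MBW_MAX/MBW_MIN fields; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of implemented fraction bits (assumed: MPAMF_MBW_IDR[5:0]) */
unsigned int mbw_idr_bwa_wd(uint32_t mbw_idr)
{
	return mbw_idr & 0x3f;
}

/* Band index held in the top bwa_wd bits of a 16-bit bandwidth field */
uint32_t mbw_band(uint16_t reg, unsigned int bwa_wd)
{
	assert(bwa_wd >= 1 && bwa_wd <= 16);
	return (uint32_t)reg >> (16 - bwa_wd);
}
```

The 0x00000c07 value gives BWA_WD = 7, i.e. 128 bands. Without the change, "MB:1=2" leaves MBW_MAX in band 2 (0x05ff) while MBW_MIN stays in band 120 (0xf000), so the minimum sits far above the maximum. With the feature cleared, MBW_MIN moves to band 1 (0x0200), below the maximum as expected.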
Thanks,
Gavin
> default:
> return -EINVAL;
> }
> @@ -743,6 +952,7 @@ int mpam_resctrl_setup(void)
>
> /* Find some classes to use for controls */
> mpam_resctrl_pick_caches();
> + mpam_resctrl_pick_mba();
>
> /* Initialise the resctrl structures from the classes */
> for (i = 0; i < RDT_NUM_RESOURCES; i++) {
* Re: [RFC PATCH 16/38] arm_mpam: resctrl: Add support for 'MB' resource
2025-12-12 4:27 ` Gavin Shan
@ 2025-12-16 15:56 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-16 15:56 UTC (permalink / raw)
To: Gavin Shan, James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
rohit.mathew, reinette.chatre, Punit Agrawal, Zeng Heng
Hi Gavin,
On 12/12/25 04:27, Gavin Shan wrote:
> Hi James and Ben,
>
> On 12/6/25 7:58 AM, James Morse wrote:
>> resctrl supports 'MB', as a percentage throttling of traffic somewhere
>> after the L3. This is the control that mba_sc uses, so ideally the
>> class chosen should be as close as possible to the counters used for
>> mba_local.
>>
>> MB's percentage control should be backed either with the fixed point
>> fraction MBW_MAX. The bandwidth portion bitmaps is not used as its
>> tricky to pick which bits to use to avoid contention, and may be
>> possible to expose this as something other than a percentage in the
>> future.
>>
>> CC: Zeng Heng <zengheng4@huawei.com>
>> Co-developed-by: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
>> Signed-off-by: James Morse <james.morse@arm.com>>
>> ---
>> drivers/resctrl/mpam_resctrl.c | 212 ++++++++++++++++++++++++++++++++-
>> 1 file changed, 211 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/
>> mpam_resctrl.c
>> index 55576d0caf12..b9f3f00d8cad 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -247,6 +247,33 @@ static bool cache_has_usable_cpor(struct
>> mpam_class *class)
>> return (class->props.cpbm_wd <= 32);
>> }
>> +static bool mba_class_use_mbw_max(struct mpam_props *cprops)
>> +{
>> + return (mpam_has_feature(mpam_feat_mbw_max, cprops) &&
>> + cprops->bwa_wd);
>> +}
>> +
>> +static bool class_has_usable_mba(struct mpam_props *cprops)
>> +{
>> + return mba_class_use_mbw_max(cprops);
>> +}
>> +
>> +/*
>> + * Calculate the worst-case percentage change from each implemented step
>> + * in the control.
>> + */
>> +static u32 get_mba_granularity(struct mpam_props *cprops)
>> +{
>> + if (!mba_class_use_mbw_max(cprops))
>> + return 0;
>> +
>> + /*
>> + * bwa_wd is the number of bits implemented in the 0.xxx
>> + * fixed point fraction. 1 bit is 50%, 2 is 25% etc.
>> + */
>> + return DIV_ROUND_UP(MAX_MBA_BW, 1 << cprops->bwa_wd);
>> +}
>> +
>> /*
>> * Each fixed-point hardware value architecturally represents a range
>> * of values: the full range 0% - 100% is split contiguously into
>> @@ -287,6 +314,96 @@ static u16 percent_to_mbw_max(u8 pc, struct
>> mpam_props *cprops)
>> return val;
>> }
>> +static u32 get_mba_min(struct mpam_props *cprops)
>> +{
>> + u32 val = 0;
>> +
>> + if (mba_class_use_mbw_max(cprops))
>> + val = mbw_max_to_percent(val, cprops);
>> + else
>> + WARN_ON_ONCE(1);
>> +
>> + return val;
>> +}
>> +
>> +/* Find the L3 cache that has affinity with this CPU */
>> +static int find_l3_equivalent_bitmask(int cpu, cpumask_var_t
>> tmp_cpumask)
>> +{
>> + u32 cache_id = get_cpu_cacheinfo_id(cpu, 3);
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + return mpam_get_cpumask_from_cache_id(cache_id, 3, tmp_cpumask);
>> +}
>> +
>> +/*
>> + * topology_matches_l3() - Is the provided class the same shape as L3
>> + * @victim: The class we'd like to pretend is L3.
>> + *
>> + * resctrl expects all the world's a Xeon, and all counters are on the
>> + * L3. We play fast and loose with this, mapping counters on other
>> + * classes - provided the CPU->domain mapping is the same kind of shape.
>> + *
>> + * Using cacheinfo directly would make this work even if resctrl can't
>> + * use the L3 - but cacheinfo can't tell us anything about offline CPUs.
>> + * Using the L3 resctrl domain list also depends on CPUs being online.
>> + * Using the mpam_class we picked for L3 so we can use its domain list
>> + * assumes that there are MPAM controls on the L3.
>> + * Instead, this path eventually uses the
>> mpam_get_cpumask_from_cache_id()
>> + * helper which can tell us about offline CPUs ... but getting the
>> cache_id
>> + * to start with relies on at least one CPU per L3 cache being online at
>> + * boot.
>> + *
>> + * Walk the victim component list and compare the affinity mask with the
>> + * corresponding L3. The topology matches if each victim:component's
>> affinity
>> + * mask is the same as the CPU's corresponding L3's. These lists/
>> masks are
>> + * computed from firmware tables so don't change at runtime.
>> + */
>> +static bool topology_matches_l3(struct mpam_class *victim)
>> +{
>> + int cpu, err;
>> + struct mpam_component *victim_iter;
>> + cpumask_var_t __free(free_cpumask_var) tmp_cpumask;
>> +
>> + if (!alloc_cpumask_var(&tmp_cpumask, GFP_KERNEL))
>> + return false;
>> +
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(victim_iter, &victim->components,
>> class_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + if (cpumask_empty(&victim_iter->affinity)) {
>> + pr_debug("class %u has CPU-less component %u - can't
>> match L3!\n",
>> + victim->level, victim_iter->comp_id);
>> + return false;
>> + }
>> +
>> + cpu = cpumask_any(&victim_iter->affinity);
>> + if (WARN_ON_ONCE(cpu >= nr_cpu_ids))
>> + return false;
>> +
>> + cpumask_clear(tmp_cpumask);
>> + err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
>> + if (err) {
>> + pr_debug("Failed to find L3's equivalent component to
>> class %u component %u\n",
>> + victim->level, victim_iter->comp_id);
>> + return false;
>> + }
>> +
>> + /* Any differing bits in the affinity mask? */
>> + if (!cpumask_equal(tmp_cpumask, &victim_iter->affinity)) {
>> + pr_debug("class %u component %u has Mismatched CPU mask
>> with L3 equivalent\n"
>> + "L3:%*pbl != victim:%*pbl\n",
>> + victim->level, victim_iter->comp_id,
>> + cpumask_pr_args(tmp_cpumask),
>> + cpumask_pr_args(&victim_iter->affinity));
>> +
>> + return false;
>> + }
>> + }
>> +
>> + return true;
>> +}
>> +
>> /* Test whether we can export MPAM_CLASS_CACHE:{2,3}? */
>> static void mpam_resctrl_pick_caches(void)
>> {
>> @@ -330,10 +447,63 @@ static void mpam_resctrl_pick_caches(void)
>> }
>> }
>> +static void mpam_resctrl_pick_mba(void)
>> +{
>> + struct mpam_class *class, *candidate_class = NULL;
>> + struct mpam_resctrl_res *res;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + guard(srcu)(&mpam_srcu);
>> + list_for_each_entry_srcu(class, &mpam_classes, classes_list,
>> + srcu_read_lock_held(&mpam_srcu)) {
>> + struct mpam_props *cprops = &class->props;
>> +
>> + if (class->level < 3) {
>> + pr_debug("class %u is before L3\n", class->level);
>> + continue;
>> + }
>> +
>> + if (!class_has_usable_mba(cprops)) {
>> + pr_debug("class %u has no bandwidth control\n",
>> + class->level);
>> + continue;
>> + }
>> +
>> + if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
>> + pr_debug("class %u has missing CPUs\n", class->level);
>> + continue;
>> + }
>> +
>> + if (!topology_matches_l3(class)) {
>> + pr_debug("class %u topology doesn't match L3\n",
>> + class->level);
>> + continue;
>> + }
>> +
>> + /*
>> + * mba_sc reads the mbm_local counter, and waggles the MBA
>> + * controls. mbm_local is implicitly part of the L3, pick a
>> + * resource to be MBA that is as close as possible to the L3.
>> + */
>> + if (!candidate_class || class->level < candidate_class->level)
>> + candidate_class = class;
>> + }
>> +
>> + if (candidate_class) {
>> + pr_debug("selected class %u to back MBA\n",
>> + candidate_class->level);
>> + res = &mpam_resctrl_controls[RDT_RESOURCE_MBA];
>> + res->class = candidate_class;
>> + exposed_alloc_capable = true;
>> + }
>> +}
>> +
>> static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
>> enum resctrl_res_level type)
>> {
>> struct mpam_class *class = res->class;
>> + struct mpam_props *cprops = &class->props;
>> struct rdt_resource *r = &res->resctrl_res;
>> switch (res->resctrl_res.rid) {
>> @@ -362,6 +532,20 @@ static int mpam_resctrl_control_init(struct
>> mpam_resctrl_res *res,
>> * 'all the bits' is the correct answer here.
>> */
>> r->cache.shareable_bits = resctrl_get_default_ctrl(r);
>> + break;
>> + case RDT_RESOURCE_MBA:
>> + r->alloc_capable = true;
>> + r->schema_fmt = RESCTRL_SCHEMA_RANGE;
>> + r->ctrl_scope = RESCTRL_L3_CACHE;
>> +
>> + r->membw.delay_linear = true;
>> + r->membw.throttle_mode = THREAD_THROTTLE_UNDEFINED;
>> + r->membw.min_bw = get_mba_min(cprops);
>> + r->membw.max_bw = MAX_MBA_BW;
>> + r->membw.bw_gran = get_mba_granularity(cprops);
>> +
>> + r->name = "MB";
>> +
>> break;
>> default:
>> break;
>> @@ -377,7 +561,17 @@ static int mpam_resctrl_pick_domain_id(int cpu,
>> struct mpam_component *comp)
>> if (class->type == MPAM_CLASS_CACHE)
>> return comp->comp_id;
>> - /* TODO: repaint domain ids to match the L3 domain ids */
>> + if (topology_matches_l3(class)) {
>> + /* Use the corresponding L3 component ID as the domain ID */
>> + int id = get_cpu_cacheinfo_id(cpu, 3);
>> +
>> + /* Implies topology_matches_l3() made a mistake */
>> + if (WARN_ON_ONCE(id == -1))
>> + return comp->comp_id;
>> +
>> + return id;
>> + }
>> +
>> /*
>> * Otherwise, expose the ID used by the firmware table code.
>> */
>> @@ -419,6 +613,12 @@ u32 resctrl_arch_get_config(struct rdt_resource
>> *r, struct rdt_ctrl_domain *d,
>> case RDT_RESOURCE_L3:
>> configured_by = mpam_feat_cpor_part;
>> break;
>> + case RDT_RESOURCE_MBA:
>> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
>> + configured_by = mpam_feat_mbw_max;
>> + break;
>> + }
>> + fallthrough;
>> default:
>> return resctrl_get_default_ctrl(r);
>> }
>> @@ -430,6 +630,8 @@ u32 resctrl_arch_get_config(struct rdt_resource
>> *r, struct rdt_ctrl_domain *d,
>> switch (configured_by) {
>> case mpam_feat_cpor_part:
>> return cfg->cpbm;
>> + case mpam_feat_mbw_max:
>> + return mbw_max_to_percent(cfg->mbw_max, cprops);
>> default:
>> return resctrl_get_default_ctrl(r);
>> }
>> @@ -474,6 +676,13 @@ int resctrl_arch_update_one(struct rdt_resource
>> *r, struct rdt_ctrl_domain *d,
>> cfg.cpbm = cfg_val;
>> mpam_set_feature(mpam_feat_cpor_part, &cfg);
>> break;
>> + case RDT_RESOURCE_MBA:
>> + if (mpam_has_feature(mpam_feat_mbw_max, cprops)) {
>> + cfg.mbw_max = percent_to_mbw_max(cfg_val, cprops);
>> + mpam_set_feature(mpam_feat_mbw_max, &cfg);
>> + break;
>> + }
>> + fallthrough;
>
> I think mpam_feat_mbw_min probably needs to be cleared in '&cfg', whose
> content is copied from that of the component. mpam_feat_mbw_min may already
> be set in '&cfg', in which case struct mpam_config::mbw_min won't be updated
> correctly in the subsequent call to mpam_extend_config(). It means register
> MPAMCFG_MBW_MIN isn't updated correctly.
Thanks for the report. The suggested fix makes sense too. I'll fix this
in the later patch where mbw_min is introduced.
>
> On NVidia's grace-hopper machine, I got:
>
> host$ mount none -tresctrl /sys/fs/resctrl/
> host$ mkdir -p /sys/fs/resctrl/all
> host$ mkdir -p /sys/fs/resctrl/test
> host$ cat /proc/dump_feat_regs
> MPAMF_IDR 0000008057010027
> MAPMF_MBW_IDR 00000c07
>
> host$ echo "MB:1=98" > /sys/fs/resctrl/test/schemata
> host$ cat /proc/dump_cfg_regs
> MPAMCFG_PART_SEL 00000002
> MPAMCFG_MBW_MAX 0000f9ff
> MPAMCFG_MBW_MIN 0000f000
>
> host$ echo "MB:1=2" > /sys/fs/resctrl/test/schemata
> host$ cat /proc/dump_cfg_regs
> MPAMCFG_PART_SEL 00000002
> MPAMCFG_MBW_MAX 000005ff
> MPAMCFG_MBW_MIN 0000f000
>
> With 'mpam_clear_feature(mpam_feat_mbw_min, &cfg);' applied here, the
> register can be updated correctly. It also makes my (soft) MBW limiting
> tests happy.
>
> host$ echo "MB:1=98" > /sys/fs/resctrl/test/schemata
> host$ cat /proc/dump_cfg_regs
> MPAMCFG_PART_SEL 00000002
> MPAMCFG_MBW_MAX 0000f9ff
> MPAMCFG_MBW_MIN 0000ea00
>
> host$ echo "MB:1=2" > /sys/fs/resctrl/test/schemata
> host$ cat /proc/dump_cfg_regs
> MPAMCFG_PART_SEL 00000002
> MPAMCFG_MBW_MAX 000005ff
> MPAMCFG_MBW_MIN 00000200
>
>
> Thanks,
> Gavin
>
>> default:
>> return -EINVAL;
>> }
>> @@ -743,6 +952,7 @@ int mpam_resctrl_setup(void)
>> /* Find some classes to use for controls */
>> mpam_resctrl_pick_caches();
>> + mpam_resctrl_pick_mba();
>> /* Initialise the resctrl structures from the classes */
>> for (i = 0; i < RDT_NUM_RESOURCES; i++) {
>
Thanks,
Ben
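
The stale-flag problem reported above can be sketched in standalone C. This is an illustration only: the names (sketch_cfg, sketch_extend_config, the FEAT_* bits) and the fixed margin are hypothetical, and the "derive mbw_min only when it has not been set" behaviour is an assumption taken from the report, not from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real driver uses its own bitmap helpers. */
#define FEAT_MBW_MAX (1u << 0)
#define FEAT_MBW_MIN (1u << 1)

/* Hypothetical stand-in for the per-component configuration. */
struct sketch_cfg {
	uint32_t features;
	uint16_t mbw_max;
	uint16_t mbw_min;
};

/*
 * Assumed behaviour of the "extend" step: derive mbw_min from mbw_max,
 * but only when no mbw_min is marked as present in the config.
 */
static void sketch_extend_config(struct sketch_cfg *cfg)
{
	const uint16_t margin = 0x0400;	/* arbitrary value for the sketch */

	if ((cfg->features & FEAT_MBW_MAX) && !(cfg->features & FEAT_MBW_MIN)) {
		cfg->mbw_min = cfg->mbw_max > margin ? cfg->mbw_max - margin : 0;
		cfg->features |= FEAT_MBW_MIN;
	}
}

/* Update mbw_max starting from a copy of the current configuration. */
static struct sketch_cfg sketch_update_max(struct sketch_cfg current,
					   uint16_t new_max, bool clear_min)
{
	struct sketch_cfg cfg = current;	/* inherits any stale MIN flag */

	cfg.mbw_max = new_max;
	cfg.features |= FEAT_MBW_MAX;
	if (clear_min)
		cfg.features &= ~FEAT_MBW_MIN;	/* the suggested fix */

	sketch_extend_config(&cfg);
	return cfg;
}
```

With clear_min false the old mbw_min survives a new mbw_max, which is the shape of the MPAMCFG_MBW_MIN readings quoted above; with clear_min true the minimum is derived again from the new maximum.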
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 17/38] arm_mpam: resctrl: Add kunit test for control format conversions
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (15 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 16/38] arm_mpam: resctrl: Add support for 'MB' resource James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters James Morse
` (21 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal, Dave Martin
From: Dave Martin <Dave.Martin@arm.com>
resctrl specifies the format of the control schemes, and these don't
match the hardware.
Some of the conversions are a bit hairy - add some kunit tests.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
[morse: squashed enough of Dave's fixes in here that it's his patch now!]
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 4 +
drivers/resctrl/test_mpam_resctrl.c | 312 ++++++++++++++++++++++++++++
2 files changed, 316 insertions(+)
create mode 100644 drivers/resctrl/test_mpam_resctrl.c
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index b9f3f00d8cad..fe830524639e 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -993,3 +993,7 @@ int mpam_resctrl_setup(void)
return err;
}
+
+#ifdef CONFIG_MPAM_KUNIT_TEST
+#include "test_mpam_resctrl.c"
+#endif
diff --git a/drivers/resctrl/test_mpam_resctrl.c b/drivers/resctrl/test_mpam_resctrl.c
new file mode 100644
index 000000000000..d0615aa7671c
--- /dev/null
+++ b/drivers/resctrl/test_mpam_resctrl.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2025 Arm Ltd.
+/* This file is intended to be included into mpam_resctrl.c */
+
+#include <kunit/test.h>
+#include <linux/array_size.h>
+#include <linux/bits.h>
+#include <linux/math.h>
+#include <linux/sprintf.h>
+
+struct percent_value_case {
+ u8 pc;
+ u8 width;
+ u16 value;
+};
+
+/*
+ * Mysterious inscriptions taken from ARM DDI 0598D.b,
+ * "Arm Architecture Reference Manual Supplement - Memory System
+ * Resource Partitioning and Monitoring (MPAM), for A-profile
+ * architecture", Section 9.8, "About the fixed-point fractional
+ * format" (exact percentage entries only):
+ */
+static const struct percent_value_case percent_value_cases[] = {
+ /* Architectural cases: */
+ { 1, 8, 1 }, { 1, 12, 0x27 }, { 1, 16, 0x28e },
+ { 25, 8, 0x3f }, { 25, 12, 0x3ff }, { 25, 16, 0x3fff },
+ { 35, 8, 0x58 }, { 35, 12, 0x598 }, { 35, 16, 0x5998 },
+ { 45, 8, 0x72 }, { 45, 12, 0x732 }, { 45, 16, 0x7332 },
+ { 50, 8, 0x7f }, { 50, 12, 0x7ff }, { 50, 16, 0x7fff },
+ { 52, 8, 0x84 }, { 52, 12, 0x850 }, { 52, 16, 0x851d },
+ { 55, 8, 0x8b }, { 55, 12, 0x8cb }, { 55, 16, 0x8ccb },
+ { 58, 8, 0x93 }, { 58, 12, 0x946 }, { 58, 16, 0x9479 },
+ { 75, 8, 0xbf }, { 75, 12, 0xbff }, { 75, 16, 0xbfff },
+ { 88, 8, 0xe0 }, { 88, 12, 0xe13 }, { 88, 16, 0xe146 },
+ { 95, 8, 0xf2 }, { 95, 12, 0xf32 }, { 95, 16, 0xf332 },
+ { 100, 8, 0xff }, { 100, 12, 0xfff }, { 100, 16, 0xffff },
+
+};
+
+static void test_percent_value_desc(const struct percent_value_case *param,
+ char *desc)
+{
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE,
+ "pc=%d, width=%d, value=0x%.*x\n",
+ param->pc, param->width,
+ DIV_ROUND_UP(param->width, 4), param->value);
+}
+
+KUNIT_ARRAY_PARAM(test_percent_value, percent_value_cases,
+ test_percent_value_desc);
+
+struct percent_value_test_info {
+ u32 pc; /* result of value-to-percent conversion */
+ u32 value; /* result of percent-to-value conversion */
+ u32 max_value; /* maximum raw value allowed by test params */
+ unsigned int shift; /* promotes raw testcase value to 16 bits */
+};
+
+/*
+ * Convert a reference percentage to a fixed-point MAX value and
+ * vice-versa, based on param (not test->param_value!)
+ */
+static void __prepare_percent_value_test(struct kunit *test,
+ struct percent_value_test_info *res,
+ const struct percent_value_case *param)
+{
+ struct mpam_props fake_props = { };
+
+ /* Reject bogus test parameters that would break the tests: */
+ KUNIT_ASSERT_GE(test, param->width, 1);
+ KUNIT_ASSERT_LE(test, param->width, 16);
+ KUNIT_ASSERT_LT(test, param->value, 1 << param->width);
+
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+ fake_props.bwa_wd = param->width;
+
+ res->shift = 16 - param->width;
+ res->max_value = GENMASK_U32(param->width - 1, 0);
+ res->value = percent_to_mbw_max(param->pc, &fake_props);
+ res->pc = mbw_max_to_percent(param->value << res->shift, &fake_props);
+}
+
+static void test_get_mba_granularity(struct kunit *test)
+{
+ int ret;
+ struct mpam_props fake_props = { };
+
+ /* Use MBW_MAX */
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+
+ fake_props.bwa_wd = 0;
+ KUNIT_EXPECT_FALSE(test, mba_class_use_mbw_max(&fake_props));
+
+ fake_props.bwa_wd = 1;
+ KUNIT_EXPECT_TRUE(test, mba_class_use_mbw_max(&fake_props));
+
+ /* Architectural maximum: */
+ fake_props.bwa_wd = 16;
+ KUNIT_EXPECT_TRUE(test, mba_class_use_mbw_max(&fake_props));
+
+ /* No usable control... */
+ fake_props.bwa_wd = 0;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 0);
+
+ fake_props.bwa_wd = 1;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 50); /* DIV_ROUND_UP(100, 1 << 1)% = 50% */
+
+ fake_props.bwa_wd = 2;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 25); /* DIV_ROUND_UP(100, 1 << 2)% = 25% */
+
+ fake_props.bwa_wd = 3;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 13); /* DIV_ROUND_UP(100, 1 << 3)% = 13% */
+
+ fake_props.bwa_wd = 6;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 2); /* DIV_ROUND_UP(100, 1 << 6)% = 2% */
+
+ fake_props.bwa_wd = 7;
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 1); /* DIV_ROUND_UP(100, 1 << 7)% = 1% */
+
+ /* Granularity saturates at 1% */
+ fake_props.bwa_wd = 16; /* architectural maximum */
+ ret = get_mba_granularity(&fake_props);
+ KUNIT_EXPECT_EQ(test, ret, 1); /* DIV_ROUND_UP(100, 1 << 16)% = 1% */
+}
+
+static void test_mbw_max_to_percent(struct kunit *test)
+{
+ const struct percent_value_case *param = test->param_value;
+ struct percent_value_test_info res;
+
+ /*
+ * Since the reference values in percent_value_cases[] all
+ * correspond to exact percentages, round-to-nearest will
+ * always give the exact percentage back when the MPAM max
+ * value has precision of 0.5% or finer. (Always true for the
+ * reference data, since they all specify 8 bits or more of
+ * precision.)
+ *
+ * So, keep it simple and demand an exact match:
+ */
+ __prepare_percent_value_test(test, &res, param);
+ KUNIT_EXPECT_EQ(test, res.pc, param->pc);
+}
+
+static void test_percent_to_mbw_max(struct kunit *test)
+{
+ const struct percent_value_case *param = test->param_value;
+ struct percent_value_test_info res;
+
+ __prepare_percent_value_test(test, &res, param);
+
+ KUNIT_EXPECT_GE(test, res.value, param->value << res.shift);
+ KUNIT_EXPECT_LE(test, res.value, (param->value + 1) << res.shift);
+ KUNIT_EXPECT_LE(test, res.value, res.max_value << res.shift);
+
+ /* No flexibility allowed for 0% and 100%! */
+
+ if (param->pc == 0)
+ KUNIT_EXPECT_EQ(test, res.value, 0);
+
+ if (param->pc == 100)
+ KUNIT_EXPECT_EQ(test, res.value, res.max_value << res.shift);
+}
+
+static const void *test_all_bwa_wd_gen_params(struct kunit *test, const void *prev,
+ char *desc)
+{
+ uintptr_t param = (uintptr_t)prev;
+
+ if (param > 15)
+ return NULL;
+
+ param++;
+
+ snprintf(desc, KUNIT_PARAM_DESC_SIZE, "wd=%u\n", (unsigned int)param);
+
+ return (void *)param;
+}
+
+static unsigned int test_get_bwa_wd(struct kunit *test)
+{
+ uintptr_t param = (uintptr_t)test->param_value;
+
+ KUNIT_ASSERT_GE(test, param, 1);
+ KUNIT_ASSERT_LE(test, param, 16);
+
+ return param;
+}
+
+static void test_mbw_max_to_percent_limits(struct kunit *test)
+{
+ struct mpam_props fake_props = {0};
+ u32 max_value;
+
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+ fake_props.bwa_wd = test_get_bwa_wd(test);
+ max_value = GENMASK(15, 16 - fake_props.bwa_wd);
+
+ KUNIT_EXPECT_EQ(test, mbw_max_to_percent(max_value, &fake_props),
+ MAX_MBA_BW);
+ KUNIT_EXPECT_EQ(test, mbw_max_to_percent(0, &fake_props),
+ get_mba_min(&fake_props));
+
+ /*
+ * Rounding policy dependent 0% sanity-check:
+ * With round-to-nearest, the minimum mbw_max value really
+ * should map to 0% if there are at least 200 steps.
+ * (100 steps may be enough for some other rounding policies.)
+ */
+ if (fake_props.bwa_wd >= 8)
+ KUNIT_EXPECT_EQ(test, mbw_max_to_percent(0, &fake_props), 0);
+
+ if (fake_props.bwa_wd < 8 &&
+ mbw_max_to_percent(0, &fake_props) == 0)
+ kunit_warn(test, "wd=%d: Testsuite/driver Rounding policy mismatch?",
+ fake_props.bwa_wd);
+}
+
+/*
+ * Check that converting a percentage to mbw_max and back again (or, as
+ * appropriate, vice-versa) always restores the original value:
+ */
+static void test_percent_max_roundtrip_stability(struct kunit *test)
+{
+ struct mpam_props fake_props = {0};
+ unsigned int shift;
+ u32 pc, max, pc2, max2;
+
+ mpam_set_feature(mpam_feat_mbw_max, &fake_props);
+ fake_props.bwa_wd = test_get_bwa_wd(test);
+ shift = 16 - fake_props.bwa_wd;
+
+ /*
+ * Converting a valid value from the coarser scale to the finer
+ * scale and back again must yield the original value:
+ */
+ if (fake_props.bwa_wd >= 7) {
+ /* More than 100 steps: only test exact pc values: */
+ for (pc = get_mba_min(&fake_props); pc <= MAX_MBA_BW; pc++) {
+ max = percent_to_mbw_max(pc, &fake_props);
+ pc2 = mbw_max_to_percent(max, &fake_props);
+ KUNIT_EXPECT_EQ(test, pc2, pc);
+ }
+ } else {
+ /* Fewer than 100 steps: only test exact mbw_max values: */
+ for (max = 0; max < 1 << 16; max += 1 << shift) {
+ pc = mbw_max_to_percent(max, &fake_props);
+ max2 = percent_to_mbw_max(pc, &fake_props);
+ KUNIT_EXPECT_EQ(test, max2, max);
+ }
+ }
+}
+
+static void test_percent_to_max_rounding(struct kunit *test)
+{
+ const struct percent_value_case *param = test->param_value;
+ unsigned int num_rounded_up = 0, total = 0;
+ struct percent_value_test_info res;
+
+ for (param = percent_value_cases, total = 0;
+ param < &percent_value_cases[ARRAY_SIZE(percent_value_cases)];
+ param++, total++) {
+ __prepare_percent_value_test(test, &res, param);
+ if (res.value > param->value << res.shift)
+ num_rounded_up++;
+ }
+
+ /*
+ * The MPAM driver applies a round-to-nearest policy, whereas a
+ * round-down policy seems to have been applied in the
+ * reference table from which the test vectors were selected.
+ *
+ * For a large and well-distributed suite of test vectors,
+ * about half should be rounded up and half down compared with
+ * the reference table. The actual test vectors are few in
+ * number and probably not very well distributed however, so
+ * tolerate a round-up rate of between 1/4 and 3/4 before
+ * crying foul:
+ */
+
+ kunit_info(test, "Round-up rate: %u%% (%u/%u)\n",
+ DIV_ROUND_CLOSEST(num_rounded_up * 100, total),
+ num_rounded_up, total);
+
+ KUNIT_EXPECT_GE(test, 4 * num_rounded_up, 1 * total);
+ KUNIT_EXPECT_LE(test, 4 * num_rounded_up, 3 * total);
+}
+
+static struct kunit_case mpam_resctrl_test_cases[] = {
+ KUNIT_CASE(test_get_mba_granularity),
+ KUNIT_CASE_PARAM(test_mbw_max_to_percent, test_percent_value_gen_params),
+ KUNIT_CASE_PARAM(test_percent_to_mbw_max, test_percent_value_gen_params),
+ KUNIT_CASE_PARAM(test_mbw_max_to_percent_limits, test_all_bwa_wd_gen_params),
+ KUNIT_CASE(test_percent_to_max_rounding),
+ KUNIT_CASE_PARAM(test_percent_max_roundtrip_stability,
+ test_all_bwa_wd_gen_params),
+ {}
+};
+
+static struct kunit_suite mpam_resctrl_test_suite = {
+ .name = "mpam_resctrl_test_suite",
+ .test_cases = mpam_resctrl_test_cases,
+};
+
+kunit_test_suites(&mpam_resctrl_test_suite);
--
2.39.5
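
For readers without the MPAM specification to hand, the arithmetic behind the reference table and the granularity checks can be sketched in standalone C. This is not the driver's percent_to_mbw_max() or mbw_max_to_percent(); the sketch_* helpers are hypothetical, and the round-down formula is an assumption that happens to reproduce the table entries checked here. The patch comments say the driver itself rounds to nearest, which is why the tests accept a result up to one step above the reference value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference encoding: the largest width-bit value v for
 * which (v + 1) / 2^width does not exceed pc / 100 (round down).
 */
static uint32_t sketch_percent_to_fixed(uint32_t pc, unsigned int width)
{
	uint32_t steps;

	assert(pc <= 100 && width >= 1 && width <= 16);
	steps = (pc << width) / 100;	/* floor(pc * 2^width / 100) */

	return steps ? steps - 1 : 0;
}

/* Hypothetical inverse: the nearest whole percentage for a width-bit value. */
static uint32_t sketch_fixed_to_percent(uint32_t value, unsigned int width)
{
	uint32_t scale;

	assert(width >= 1 && width <= 16);
	scale = 1u << width;
	assert(value < scale);

	return ((value + 1) * 100 + scale / 2) / scale;
}

/* Worst-case step size in percent, at least 1%; 0 means no usable control. */
static uint32_t sketch_granularity(unsigned int width)
{
	uint32_t scale;

	assert(width <= 16);
	if (!width)
		return 0;
	scale = 1u << width;

	return (100 + scale - 1) / scale;	/* DIV_ROUND_UP(100, 2^width) */
}
```

For example, 25% at 8 bits gives floor(25 * 256 / 100) - 1 = 63 = 0x3f, and converting 0x3f back gives (64 * 100 + 128) / 256 = 25.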
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (16 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 17/38] arm_mpam: resctrl: Add kunit test for control format conversions James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-16 13:55 ` Ben Horgan
2025-12-18 13:20 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 19/38] arm_mpam: resctrl: pick classes for use as mbm counters James Morse
` (20 subsequent siblings)
38 siblings, 2 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl exposes a counter via a file named llc_occupancy. This isn't really
a counter as its value goes up and down, this is a snapshot of the cache
storage usage monitor.
Add some picking code to find a cache as close as possible to the L3 that
supports the CSU monitor.
If there is an L3, but it doesn't have any controls, force the L3 resource
to exist. The existing topology_matches_l3() and
mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
even if the class belongs to a later cache.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_internal.h | 6 ++
drivers/resctrl/mpam_resctrl.c | 148 ++++++++++++++++++++++++++++++++
2 files changed, 154 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 8684bd35d4ab..f9d2a1004c32 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -348,6 +348,12 @@ struct mpam_resctrl_res {
struct rdt_resource resctrl_res;
};
+struct mpam_resctrl_mon {
+ struct mpam_class *class;
+
+ /* per-class data that resctrl needs will live here */
+};
+
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index fe830524639e..fc1f054f187e 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -31,6 +31,16 @@ static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
/* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
static DEFINE_MUTEX(domain_list_lock);
+/*
+ * The classes we've picked to map to resctrl events.
+ * Resctrl believes all the world's a Xeon, and these are all on the L3. This
+ * array lets us find the actual class backing the event counters. e.g.
+ * the only memory bandwidth counters may be on the memory controller, but to
+ * make use of them, we pretend they are on L3.
+ * Class pointer may be NULL.
+ */
+static struct mpam_resctrl_mon mpam_resctrl_counters[QOS_NUM_EVENTS];
+
static bool exposed_alloc_capable;
static bool exposed_mon_capable;
@@ -258,6 +268,28 @@ static bool class_has_usable_mba(struct mpam_props *cprops)
return mba_class_use_mbw_max(cprops);
}
+static bool cache_has_usable_csu(struct mpam_class *class)
+{
+ struct mpam_props *cprops;
+
+ if (!class)
+ return false;
+
+ cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+ return false;
+
+ /*
+ * CSU counters settle on the value, so we can get away with
+ * having only one.
+ */
+ if (!cprops->num_csu_mon)
+ return false;
+
+ return (mpam_partid_max > 1) || (mpam_pmg_max != 0);
+}
+
/*
* Calculate the worst-case percentage change from each implemented step
* in the control.
@@ -499,6 +531,64 @@ static void mpam_resctrl_pick_mba(void)
}
}
+static void counter_update_class(enum resctrl_event_id evt_id,
+ struct mpam_class *class)
+{
+ struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
+
+ if (existing_class) {
+ if (class->level == 3) {
+ pr_debug("Existing class is L3 - L3 wins\n");
+ return;
+ } else if (existing_class->level < class->level) {
+ pr_debug("Existing class is closer to L3, %u versus %u - closer is better\n",
+ existing_class->level, class->level);
+ return;
+ }
+ }
+
+ mpam_resctrl_counters[evt_id].class = class;
+ exposed_mon_capable = true;
+}
+
+static void mpam_resctrl_pick_counters(void)
+{
+ struct mpam_class *class;
+ bool has_csu;
+
+ lockdep_assert_cpus_held();
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(class, &mpam_classes, classes_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (class->level < 3) {
+ pr_debug("class %u is before L3\n", class->level);
+ continue;
+ }
+
+ if (!cpumask_equal(&class->affinity, cpu_possible_mask)) {
+ pr_debug("class %u does not cover all CPUs\n",
+ class->level);
+ continue;
+ }
+
+ has_csu = cache_has_usable_csu(class);
+ if (has_csu && topology_matches_l3(class)) {
+ pr_debug("class %u has usable CSU, and matches L3 topology\n",
+ class->level);
+
+ /* CSU counters only make sense on a cache. */
+ switch (class->type) {
+ case MPAM_CLASS_CACHE:
+ counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
+ return;
+ default:
+ return;
+ }
+ }
+ }
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
enum resctrl_res_level type)
{
@@ -578,6 +668,50 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+static void mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
+ enum resctrl_event_id type)
+{
+ struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ struct rdt_resource *l3 = &res->resctrl_res;
+
+ lockdep_assert_cpus_held();
+
+ /* There also needs to be an L3 cache present */
+ if (get_cpu_cacheinfo_id(smp_processor_id(), 3) == -1)
+ return;
+
+ /*
+ * If there are no MPAM resources on L3, force it into existence.
+ * topology_matches_l3() already ensures this looks like the L3.
+ * The domain-ids will be fixed up by mpam_resctrl_domain_hdr_init().
+ */
+ if (!res->class) {
+ pr_warn_once("Faking L3 MSC to enable counters.\n");
+ res->class = mpam_resctrl_counters[type].class;
+ }
+
+ /* Called multiple times, once per event type */
+ if (exposed_mon_capable) {
+ l3->mon_capable = true;
+
+ /* Setting name is necessary on monitor only platforms */
+ l3->name = "L3";
+ l3->mon_scope = RESCTRL_L3_CACHE;
+
+ resctrl_enable_mon_event(type);
+
+ /*
+ * Unfortunately, num_rmid doesn't mean anything for
+ * mpam, and it's exposed to user-space!
+ * num-rmid is supposed to mean the number of groups
+ * that can be created, both control or monitor groups.
+ * For mpam, each control group has its own pmg/rmid
+ * space.
+ */
+ l3->mon.num_rmid = 1;
+ }
+}
+
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type)
{
@@ -939,8 +1073,10 @@ void mpam_resctrl_offline_cpu(unsigned int cpu)
int mpam_resctrl_setup(void)
{
int err = 0;
+ enum resctrl_event_id j;
enum resctrl_res_level i;
struct mpam_resctrl_res *res;
+ struct mpam_resctrl_mon *mon;
cpus_read_lock();
for (i = 0; i < RDT_NUM_RESOURCES; i++) {
@@ -966,6 +1102,18 @@ int mpam_resctrl_setup(void)
break;
}
}
+
+ /* Find some classes to use for monitors */
+ mpam_resctrl_pick_counters();
+
+ for (j = 0; j < QOS_NUM_EVENTS; j++) {
+ mon = &mpam_resctrl_counters[j];
+ if (!mon->class)
+ continue; // dummy resource
+
+ mpam_resctrl_monitor_init(mon, j);
+ }
+
cpus_read_unlock();
if (err || (!exposed_alloc_capable && !exposed_mon_capable)) {
--
2.39.5
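
The selection rule described in the commit message ("a cache as close as possible to the L3 that supports the CSU monitor") can be sketched in standalone C. This is a simplification under stated assumptions: struct sketch_class and its boolean fields are hypothetical stand-ins for the checks the patch performs through class->level, class->type, the affinity mask, cache_has_usable_csu() and topology_matches_l3(), and the sketch scans every candidate, whereas the patch's loop returns once it has handled the first matching class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one MPAM class; fields mirror the patch's checks. */
struct sketch_class {
	unsigned int level;		/* cache level, 3 == L3 */
	bool is_cache;			/* class type is a cache */
	bool covers_all_cpus;		/* affinity equals the possible CPUs */
	bool has_csu;			/* usable CSU monitor present */
	bool matches_l3_topology;	/* components line up with the L3 */
};

/* Return the eligible class closest to the L3, or NULL if there is none. */
static const struct sketch_class *
sketch_pick_csu_class(const struct sketch_class *classes, size_t n)
{
	const struct sketch_class *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct sketch_class *c = &classes[i];

		if (c->level < 3 || !c->is_cache || !c->covers_all_cpus ||
		    !c->has_csu || !c->matches_l3_topology)
			continue;

		/* A lower level is closer to the L3. */
		if (!best || c->level < best->level)
			best = c;
	}

	return best;
}
```

Given an L3 without a CSU monitor and eligible caches at levels 4 and 5, the sketch returns the level 4 class, which is the case where the commit message says the L3 resource is forced into existence.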
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters
2025-12-05 21:58 ` [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters James Morse
@ 2025-12-16 13:55 ` Ben Horgan
2025-12-18 13:20 ` Jonathan Cameron
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-16 13:55 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> resctrl exposes a counter via a file named llc_occupancy. This isn't really
> a counter as its value goes up and down, this is a snapshot of the cache
> storage usage monitor.
>
> Add some picking code to find a cache as close as possible to the L3 that
> supports the CSU monitor.
>
> If there is an L3, but it doesn't have any controls, force the L3 resource
> to exist. The existing topology_matches_l3() and
> mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
> even if the class belongs to a later cache.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_internal.h | 6 ++
> drivers/resctrl/mpam_resctrl.c | 148 ++++++++++++++++++++++++++++++++
> 2 files changed, 154 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 8684bd35d4ab..f9d2a1004c32 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -348,6 +348,12 @@ struct mpam_resctrl_res {
> struct rdt_resource resctrl_res;
> };
>
> +struct mpam_resctrl_mon {
> + struct mpam_class *class;
> +
> + /* per-class data that resctrl needs will live here */
> +};
> +
> static inline int mpam_alloc_csu_mon(struct mpam_class *class)
> {
> struct mpam_props *cprops = &class->props;
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index fe830524639e..fc1f054f187e 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -31,6 +31,16 @@ static struct mpam_resctrl_res mpam_resctrl_controls[RDT_NUM_RESOURCES];
> /* The lock for modifying resctrl's domain lists from cpuhp callbacks. */
> static DEFINE_MUTEX(domain_list_lock);
>
> +/*
> + * The classes we've picked to map to resctrl events.
> + * Resctrl believes all the world's a Xeon, and these are all on the L3. This
> + * array lets us find the actual class backing the event counters. e.g.
> + * the only memory bandwidth counters may be on the memory controller, but to
> + * make use of them, we pretend they are on L3.
> + * Class pointer may be NULL.
> + */
> +static struct mpam_resctrl_mon mpam_resctrl_counters[QOS_NUM_EVENTS];
> +
> static bool exposed_alloc_capable;
> static bool exposed_mon_capable;
>
> @@ -258,6 +268,28 @@ static bool class_has_usable_mba(struct mpam_props *cprops)
> return mba_class_use_mbw_max(cprops);
> }
>
> +static bool cache_has_usable_csu(struct mpam_class *class)
> +{
> + struct mpam_props *cprops;
> +
> + if (!class)
> + return false;
> +
> + cprops = &class->props;
> +
> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
> + return false;
> +
> + /*
> + * CSU counters settle on the value, so we can get away with
> + * having only one.
> + */
> + if (!cprops->num_csu_mon)
> + return false;
> +
> + return (mpam_partid_max > 1) || (mpam_pmg_max != 0);
> +}
Why not allow csu when partid_max and pmg_max are both zero?
Thanks,
Ben
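
To make the question concrete, the final condition in cache_has_usable_csu() can be tabulated with a small standalone C sketch. The function name is hypothetical and it takes the two limits as parameters, whereas the patch reads them from mpam_partid_max and mpam_pmg_max.

```c
#include <assert.h>
#include <stdbool.h>

/* Same expression as the patch's return statement, with explicit inputs. */
static bool sketch_csu_ids_usable(unsigned int partid_max, unsigned int pmg_max)
{
	return (partid_max > 1) || (pmg_max != 0);
}
```

As written, the expression rejects partid_max of 0 or 1 when pmg_max is 0, and accepts every other combination.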
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters
2025-12-05 21:58 ` [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters James Morse
2025-12-16 13:55 ` Ben Horgan
@ 2025-12-18 13:20 ` Jonathan Cameron
2025-12-19 12:06 ` Ben Horgan
1 sibling, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 13:20 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:41 +0000
James Morse <james.morse@arm.com> wrote:
> resctrl exposes a counter via a file named llc_occupancy. This isn't really
> a counter as its value goes up and down, this is a snapshot of the cache
> storage usage monitor.
>
> Add some picking code to find a cache as close as possible to the L3 that
> supports the CSU monitor.
>
> If there is an L3, but it doesn't have any controls, force the L3 resource
> to exist. The existing topology_matches_l3() and
> mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
> even if the class belongs to a later cache.
>
> Signed-off-by: James Morse <james.morse@arm.com>
More triviality from me. I'll take a separate look at whether this actually
works for all the systems we care about. It feels like maybe a top level
MPAM to resctrl mapping document might be useful as people are going to fall
into traps around these various heuristics, particularly as more exciting
topologies arrive in the future.
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index fe830524639e..fc1f054f187e 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> /*
> * Calculate the worst-case percentage change from each implemented step
> * in the control.
> @@ -499,6 +531,64 @@ static void mpam_resctrl_pick_mba(void)
> }
> }
>
> +static void counter_update_class(enum resctrl_event_id evt_id,
> + struct mpam_class *class)
> +{
> + struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
> +
> + if (existing_class) {
> + if (class->level == 3) {
> + pr_debug("Existing class is L3 - L3 wins\n");
> + return;
As the branch above returns, the else isn't adding anything much.
> + } else if (existing_class->level < class->level) {
> + pr_debug("Existing class is closer to L3, %u versus %u - closer is better\n",
> + existing_class->level, class->level);
> + return;
> + }
> + }
> +
> + mpam_resctrl_counters[evt_id].class = class;
> + exposed_mon_capable = true;
> +}
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters
2025-12-18 13:20 ` Jonathan Cameron
@ 2025-12-19 12:06 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-19 12:06 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
On 12/18/25 13:20, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:41 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> resctrl exposes a counter via a file named llc_occupancy. This isn't really
>> a counter as its value goes up and down, this is a snapshot of the cache
>> storage usage monitor.
>>
>> Add some picking code to find a cache as close as possible to the L3 that
>> supports the CSU monitor.
>>
>> If there is an L3, but it doesn't have any controls, force the L3 resource
>> to exist. The existing topology_matches_l3() and
>> mpam_resctrl_domain_hdr_init() code will ensure this looks like the L3,
>> even if the class belongs to a later cache.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>
> More triviality from me. I'll take a separate look at whether this actually
> works for all the systems we care about. It feels like maybe a top level
> MPAM to resctrl mapping document might be useful as people are going to fall
> into traps around these various heuristics, particularly as more exciting
> topologies arrive in the future.
Seems sensible, this is one of the tricky things.
>
>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index fe830524639e..fc1f054f187e 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>
>> /*
>> * Calculate the worst-case percentage change from each implemented step
>> * in the control.
>> @@ -499,6 +531,64 @@ static void mpam_resctrl_pick_mba(void)
>> }
>> }
>>
>> +static void counter_update_class(enum resctrl_event_id evt_id,
>> + struct mpam_class *class)
>> +{
>> + struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
>> +
>> + if (existing_class) {
>> + if (class->level == 3) {
>> + pr_debug("Existing class is L3 - L3 wins\n");
>> + return;
>
> As the branch above returns, the else isn't adding anything much.
>
>> + } else if (existing_class->level < class->level) {
>> + pr_debug("Existing class is closer to L3, %u versus %u - closer is better\n",
>> + existing_class->level, class->level);
>> + return;
>> + }
>> + }
>> +
>> + mpam_resctrl_counters[evt_id].class = class;
>> + exposed_mon_capable = true;
>> +}
>
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 19/38] arm_mpam: resctrl: pick classes for use as mbm counters
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (17 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 18/38] arm_mpam: resctrl: Add support for csu counters James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 13:36 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 20/38] arm_mpam: resctrl: Pre-allocate free running monitors James Morse
` (19 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl has two types of counters, NUMA-local and global. MPAM has only
bandwidth counters, but the position of the MSC may mean it counts
NUMA-local, or global traffic.
But the topology information is not available.
Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
probably NUMA-local. If the memory controller supports bandwidth
monitors, they are probably global.
This also allows us to assert that we don't have the same class
backing two different resctrl events.
Because the class or component backing the event may not be 'the L3',
it is necessary for mpam_resctrl_get_domain_from_cpu() to search
the monitor domains too. This matters the most for 'monitor only'
systems, where 'the L3' control domains may be empty, and the
ctrl_comp pointer NULL.
resctrl expects there to be enough monitors for every possible control
and monitor group to have one. Such a system gets called 'free running'
as the monitors can be programmed once and left running.
Any other platform will need to emulate ABMC.
Signed-off-by: James Morse <james.morse@arm.com>
---
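Not part of the patch: a minimal userspace sketch of the heuristic described
above, with no kernel dependencies. All sketch_* names are made up for
illustration; they only stand in for the MPAM_CLASS_* types, the
QOS_L3_MBM_*_EVENT_ID values and the free-running check used in the diff
below, and are not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MPAM_CLASS_CACHE / MPAM_CLASS_MEMORY. */
enum sketch_class_type {
	SKETCH_CLASS_CACHE,
	SKETCH_CLASS_MEMORY,
	SKETCH_CLASS_UNKNOWN,
};

/* Hypothetical stand-ins for the resctrl MBM event IDs. */
enum sketch_event {
	SKETCH_EVENT_NONE,
	SKETCH_EVENT_MBM_LOCAL,
	SKETCH_EVENT_MBM_TOTAL,
};

/*
 * Free running means one monitor for every (PARTID, PMG) index resctrl
 * may use, so the monitors can be programmed once and left alone.
 */
static bool sketch_free_running(unsigned int num_mbwu_mon,
				unsigned int num_rmid_idx)
{
	return num_mbwu_mon >= num_rmid_idx;
}

/*
 * Counters on a cache are guessed to be NUMA-local, counters on the
 * memory controller are guessed to be global. Anything else is unused.
 */
static enum sketch_event sketch_pick_mbm_event(enum sketch_class_type type)
{
	switch (type) {
	case SKETCH_CLASS_CACHE:
		return SKETCH_EVENT_MBM_LOCAL;
	case SKETCH_CLASS_MEMORY:
		return SKETCH_EVENT_MBM_TOTAL;
	default:
		return SKETCH_EVENT_NONE;
	}
}

/* Small self-check showing the intended use. */
static void sketch_demo(void)
{
	assert(sketch_free_running(32, 32));
	assert(!sketch_free_running(4, 32));
	assert(sketch_pick_mbm_event(SKETCH_CLASS_CACHE) == SKETCH_EVENT_MBM_LOCAL);
}
```

Because each class type maps to exactly one event, the same class can never
back both the local and the total event, which is what the WARN_ON_ONCE() in
the diff below asserts.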
drivers/resctrl/mpam_internal.h | 8 ++
drivers/resctrl/mpam_resctrl.c | 141 ++++++++++++++++++++++++++++++--
2 files changed, 144 insertions(+), 5 deletions(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index f9d2a1004c32..0984ac32f303 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -339,6 +339,14 @@ struct mpam_msc_ris {
struct mpam_resctrl_dom {
struct mpam_component *ctrl_comp;
+
+ /*
+ * There is no single mon_comp because different events may be backed
+ * by different class/components. mon_comp is indexed by the event
+ * number.
+ */
+ struct mpam_component *mon_comp[QOS_NUM_EVENTS];
+
struct rdt_ctrl_domain resctrl_ctrl_dom;
struct rdt_mon_domain resctrl_mon_dom;
};
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index fc1f054f187e..9978eb48c1f4 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -50,6 +50,14 @@ static bool exposed_mon_capable;
*/
static bool cdp_enabled;
+/* Whether this num_mbw_mon could result in a free_running system */
+static int __mpam_monitors_free_running(u16 num_mbwu_mon)
+{
+ if (num_mbwu_mon >= resctrl_arch_system_num_rmid_idx())
+ return resctrl_arch_system_num_rmid_idx();
+ return 0;
+}
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -290,6 +298,26 @@ static bool cache_has_usable_csu(struct mpam_class *class)
return (mpam_partid_max > 1) || (mpam_pmg_max != 0);
}
+static bool class_has_usable_mbwu(struct mpam_class *class)
+{
+ struct mpam_props *cprops = &class->props;
+
+ if (!mpam_has_feature(mpam_feat_msmon_mbwu, cprops))
+ return false;
+
+ /*
+ * resctrl expects the bandwidth counters to be free running,
+ * which means we need as many monitors as resctrl has
+ * control/monitor groups.
+ */
+ if (__mpam_monitors_free_running(cprops->num_mbwu_mon)) {
+ pr_debug("monitors usable in free-running mode\n");
+ return true;
+ }
+
+ return false;
+}
+
/*
* Calculate the worst-case percentage change from each implemented step
* in the control.
@@ -554,7 +582,7 @@ static void counter_update_class(enum resctrl_event_id evt_id,
static void mpam_resctrl_pick_counters(void)
{
struct mpam_class *class;
- bool has_csu;
+ bool has_csu, has_mbwu;
lockdep_assert_cpus_held();
@@ -586,7 +614,37 @@ static void mpam_resctrl_pick_counters(void)
return;
}
}
+
+ has_mbwu = class_has_usable_mbwu(class);
+ if (has_mbwu && topology_matches_l3(class)) {
+ pr_debug("class %u has usable MBWU, and matches L3 topology",
+ class->level);
+
+ /*
+ * MBWU counters may be 'local' or 'total' depending on
+ * where they are in the topology. Counters on caches
+ * are assumed to be local. If it's on the memory
+ * controller, its assumed to be global.
+ */
+ switch (class->type) {
+ case MPAM_CLASS_CACHE:
+ counter_update_class(QOS_L3_MBM_LOCAL_EVENT_ID,
+ class);
+ break;
+ case MPAM_CLASS_MEMORY:
+ counter_update_class(QOS_L3_MBM_TOTAL_EVENT_ID,
+ class);
+ break;
+ default:
+ break;
+ }
+ }
}
+
+ /* Allocation of MBWU monitors assumes that the class is unique... */
+ if (mpam_resctrl_counters[QOS_L3_MBM_LOCAL_EVENT_ID].class)
+ WARN_ON_ONCE(mpam_resctrl_counters[QOS_L3_MBM_LOCAL_EVENT_ID].class ==
+ mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
}
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
@@ -910,6 +968,20 @@ static bool mpam_resctrl_offline_domain_hdr(unsigned int cpu,
return false;
}
+static struct mpam_component *find_component(struct mpam_class *victim, int cpu)
+{
+ struct mpam_component *victim_comp;
+
+ guard(srcu)(&mpam_srcu);
+ list_for_each_entry_srcu(victim_comp, &victim->components, class_list,
+ srcu_read_lock_held(&mpam_srcu)) {
+ if (cpumask_test_cpu(cpu, &victim_comp->affinity))
+ return victim_comp;
+ }
+
+ return NULL;
+}
+
static struct mpam_resctrl_dom *
mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
{
@@ -959,8 +1031,32 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
}
if (exposed_mon_capable) {
+ int i;
+ struct mpam_component *mon_comp, *any_mon_comp = NULL;
+
+ /*
+ * Even if the monitor domain is backed by a different
+ * component, the L3 component IDs need to be used... only
+ * there may be no ctrl_comp for the L3.
+ * Search each event's class list for a component with
+ * overlapping CPUs and set up the dom->mon_comp array.
+ */
+ for (i = 0; i < QOS_NUM_EVENTS; i++) {
+ struct mpam_resctrl_mon *mon;
+
+ mon = &mpam_resctrl_counters[i];
+ if (!mon->class)
+ continue; // dummy resource
+
+ mon_comp = find_component(mon->class, cpu);
+ dom->mon_comp[i] = mon_comp;
+ if (mon_comp)
+ any_mon_comp = mon_comp;
+ }
+ WARN_ON_ONCE(!any_mon_comp);
+
mon_d = &dom->resctrl_mon_dom;
- mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &mon_d->hdr);
+ mpam_resctrl_domain_hdr_init(cpu, any_mon_comp, &mon_d->hdr);
mon_d->hdr.type = RESCTRL_MON_DOMAIN;
/* TODO: this list should be sorted */
list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
@@ -982,16 +1078,47 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
return dom;
}
+/*
+ * We know all the monitors are associated with the L3, even if there are no
+ * controls and therefore no control component. Find the cache-id for the CPU
+ * and use that to search for existing resctrl domains.
+ * This relies on mpam_resctrl_pick_domain_id() using the L3 cache-id
+ * for anything that is not a cache.
+ */
+static struct mpam_resctrl_dom *mpam_resctrl_get_mon_domain_from_cpu(int cpu)
+{
+ u32 cache_id;
+ struct rdt_mon_domain *mon_d;
+ struct mpam_resctrl_dom *dom;
+ struct mpam_resctrl_res *l3 = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+
+ if (!l3->class)
+ return NULL;
+ /* TODO: how does this order with cacheinfo updates under cpuhp? */
+ cache_id = get_cpu_cacheinfo_id(cpu, 3);
+ if (cache_id == ~0)
+ return NULL;
+
+ list_for_each_entry(mon_d, &l3->resctrl_res.mon_domains, hdr.list) {
+ dom = container_of(mon_d, struct mpam_resctrl_dom, resctrl_mon_dom);
+
+ if (mon_d->hdr.id == cache_id)
+ return dom;
+ }
+
+ return NULL;
+}
+
static struct mpam_resctrl_dom *
mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
{
struct mpam_resctrl_dom *dom;
struct rdt_ctrl_domain *ctrl_d;
+ struct rdt_resource *r = &res->resctrl_res;
lockdep_assert_cpus_held();
- list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
- hdr.list) {
+ list_for_each_entry_rcu(ctrl_d, &r->ctrl_domains, hdr.list) {
dom = container_of(ctrl_d, struct mpam_resctrl_dom,
resctrl_ctrl_dom);
@@ -999,7 +1126,11 @@ mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
return dom;
}
- return NULL;
+ if (r->rid != RDT_RESOURCE_L3)
+ return NULL;
+
+ /* Search the mon domain list too - needed on monitor only platforms. */
+ return mpam_resctrl_get_mon_domain_from_cpu(cpu);
}
int mpam_resctrl_online_cpu(unsigned int cpu)
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 19/38] arm_mpam: resctrl: pick classes for use as mbm counters
2025-12-05 21:58 ` [RFC PATCH 19/38] arm_mpam: resctrl: pick classes for use as mbm counters James Morse
@ 2025-12-18 13:36 ` Jonathan Cameron
0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 13:36 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:42 +0000
James Morse <james.morse@arm.com> wrote:
> resctrl has two types of counters, NUMA-local and global. MPAM has only
> bandwidth counters, but the position of the MSC may mean it counts
> NUMA-local, or global traffic.
>
> But the topology information is not available.
>
> Apply a heuristic: if the L2 or L3 supports bandwidth monitors, these are
> probably NUMA-local. If the memory controller supports bandwidth
> monitors, they are probably global.
>
> This also allows us to assert that we don't have the same class
> backing two different resctrl events.
>
> Because the class or component backing the event may not be 'the L3',
> it is necessary for mpam_resctrl_get_domain_from_cpu() to search
> the monitor domains too. This matters the most for 'monitor only'
> systems, where 'the L3' control domains may be empty, and the
> ctrl_comp pointer NULL.
>
> resctrl expects there to be enough monitors for every possible control
> and monitor group to have one. Such a system gets called 'free running'
> as the monitors can be programmed once and left running.
> Any other platform will need to emulate ABMC.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index fc1f054f187e..9978eb48c1f4 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -586,7 +614,37 @@ static void mpam_resctrl_pick_counters(void)
> return;
> }
> }
> +
> + has_mbwu = class_has_usable_mbwu(class);
> + if (has_mbwu && topology_matches_l3(class)) {
has_mbwu might get reused in later patches. If not, drop the local variable:
if (class_has_usable_mbwu(class) && topology_matches_l3(class))
> + pr_debug("class %u has usable MBWU, and matches L3 topology",
> + class->level);
> +
> + /*
> + * MBWU counters may be 'local' or 'total' depending on
> + * where they are in the topology. Counters on caches
> + * are assumed to be local. If it's on the memory
> + * controller, its assumed to be global.
> + */
> + switch (class->type) {
> + case MPAM_CLASS_CACHE:
> + counter_update_class(QOS_L3_MBM_LOCAL_EVENT_ID,
> + class);
> + break;
> + case MPAM_CLASS_MEMORY:
> + counter_update_class(QOS_L3_MBM_TOTAL_EVENT_ID,
> + class);
> + break;
> + default:
> + break;
> + }
> + }
> }
> +
> + /* Allocation of MBWU monitors assumes that the class is unique... */
> + if (mpam_resctrl_counters[QOS_L3_MBM_LOCAL_EVENT_ID].class)
> + WARN_ON_ONCE(mpam_resctrl_counters[QOS_L3_MBM_LOCAL_EVENT_ID].class ==
> + mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
> }
>
> +/*
> + * We know all the monitors are associated with the L3, even if there are no
> + * controls and therefore no control component. Find the cache-id for the CPU
> + * and use that to search for existing resctrl domains.
> + * This relies on mpam_resctrl_pick_domain_id() using the L3 cache-id
> + * for anything that is not a cache.
> + */
> +static struct mpam_resctrl_dom *mpam_resctrl_get_mon_domain_from_cpu(int cpu)
> +{
> + u32 cache_id;
> + struct rdt_mon_domain *mon_d;
> + struct mpam_resctrl_dom *dom;
> + struct mpam_resctrl_res *l3 = &mpam_resctrl_controls[RDT_RESOURCE_L3];
> +
> + if (!l3->class)
> + return NULL;
> + /* TODO: how does this order with cacheinfo updates under cpuhp? */
> + cache_id = get_cpu_cacheinfo_id(cpu, 3);
> + if (cache_id == ~0)
> + return NULL;
> +
> + list_for_each_entry(mon_d, &l3->resctrl_res.mon_domains, hdr.list) {
> + dom = container_of(mon_d, struct mpam_resctrl_dom, resctrl_mon_dom);
Similar comment to the one on an earlier patch: the list iterator can directly
provide dom, as that's what it's actually a list of, not rdt_mon_domain structures.
> +
> + if (mon_d->hdr.id == cache_id)
> + return dom;
> + }
> +
> + return NULL;
> +}
> +
> static struct mpam_resctrl_dom *
> mpam_resctrl_get_domain_from_cpu(int cpu, struct mpam_resctrl_res *res)
> {
> struct mpam_resctrl_dom *dom;
> struct rdt_ctrl_domain *ctrl_d;
> + struct rdt_resource *r = &res->resctrl_res;
Push back to original patch.
>
> lockdep_assert_cpus_held();
>
> - list_for_each_entry_rcu(ctrl_d, &res->resctrl_res.ctrl_domains,
> - hdr.list) {
> + list_for_each_entry_rcu(ctrl_d, &r->ctrl_domains, hdr.list) {
> dom = container_of(ctrl_d, struct mpam_resctrl_dom,
> resctrl_ctrl_dom);
>
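For illustration only, a minimal userspace sketch of the list iterator
suggestion above: iterate the outer structure directly by naming the nested
list member, rather than iterating the inner structure and calling
container_of() in the loop body. The sketch_* helpers and structures are
hypothetical, cut-down stand-ins and not the kernel's list API or the
driver's types; the macro takes the entry type explicitly so it builds
without typeof.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Like list_for_each_entry(), but with the entry type passed explicitly. */
#define sketch_for_each_entry(pos, type, head, member)			\
	for (pos = sketch_container_of((head)->next, type, member);	\
	     &pos->member != (head);					\
	     pos = sketch_container_of(pos->member.next, type, member))

/* Hypothetical, cut-down versions of the structures in the patch. */
struct sketch_hdr {
	struct list_head list;
	int id;
};

struct sketch_mon_domain {
	struct sketch_hdr hdr;
};

struct sketch_dom {
	int payload;
	struct sketch_mon_domain mon_dom;
};

static void sketch_list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* The iterator hands back the outer structure; no container_of() in the body. */
static struct sketch_dom *sketch_find_dom(struct list_head *head, int id)
{
	struct sketch_dom *dom;

	sketch_for_each_entry(dom, struct sketch_dom, head, mon_dom.hdr.list) {
		if (dom->mon_dom.hdr.id == id)
			return dom;
	}

	return NULL;
}
```

In the driver this would presumably mean passing resctrl_mon_dom.hdr.list as
the member argument of the list iterator, but that is an assumption about how
the suggestion would be applied.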
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 20/38] arm_mpam: resctrl: Pre-allocate free running monitors
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (18 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 19/38] arm_mpam: resctrl: pick classes for use as mbm counters James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 21/38] arm_mpam: resctrl: Pre-allocate assignable monitors James Morse
` (18 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
When there are enough monitors, the resctrl mbm local and total
files can be exposed. These need all the monitors that resctrl
may use to be allocated up front.
Add helpers to do this.
If a different candidate class is discovered, the old array
should be freed and the allocated monitors returned to the
driver.
Signed-off-by: James Morse <james.morse@arm.com>
---
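Not part of the patch: a minimal userspace sketch of the all-or-nothing
allocation pattern the helpers below follow. The array starts out filled
with -1, monitors are taken one at a time, and on the first failure
everything taken so far is handed back. The sketch_pool allocator and all
sketch_* names are hypothetical stand-ins for the driver's monitor
allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical monitor pool standing in for the MPAM driver's allocator. */
struct sketch_pool {
	int free_mons;
};

static int sketch_alloc_mon(struct sketch_pool *pool)
{
	if (pool->free_mons == 0)
		return -1;	/* stands in for a negative errno */
	return --pool->free_mons;
}

static void sketch_free_mon(struct sketch_pool *pool, int mon)
{
	(void)mon;
	pool->free_mons++;
}

/* Hand back every monitor recorded in the array, leaving -1 behind. */
static void sketch_free_all(struct sketch_pool *pool, int *array, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (array[i] < 0)
			continue;
		sketch_free_mon(pool, array[i]);
		array[i] = -1;
	}
}

/*
 * Allocate n monitors or none: on failure everything taken so far is
 * returned to the pool and NULL is returned.
 */
static int *sketch_alloc_array(struct sketch_pool *pool, size_t n)
{
	int *array = malloc(n * sizeof(*array));

	if (!array)
		return NULL;

	/* All bytes 0xff reads back as -1 in every int slot. */
	memset(array, -1, n * sizeof(*array));

	for (size_t i = 0; i < n; i++) {
		int mon = sketch_alloc_mon(pool);

		if (mon < 0) {
			sketch_free_all(pool, array, n);
			free(array);
			return NULL;
		}
		array[i] = mon;
	}

	return array;
}
```

Pre-filling with -1 is what lets the rollback walk the whole array safely:
slots that were never filled are skipped.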
drivers/resctrl/mpam_internal.h | 8 +++-
drivers/resctrl/mpam_resctrl.c | 84 ++++++++++++++++++++++++++++++++-
2 files changed, 89 insertions(+), 3 deletions(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 0984ac32f303..b7c914febeb4 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -359,7 +359,13 @@ struct mpam_resctrl_res {
struct mpam_resctrl_mon {
struct mpam_class *class;
- /* per-class data that resctrl needs will live here */
+ /*
+ * Array of allocated MBWU monitors, indexed by (closid, rmid).
+ * When ABMC is not in use, this array directly maps (closid, rmid)
+ * to the allocated monitor. Otherwise this array is sparse, and
+ * un-assigned (closid, rmid) are -1.
+ */
+ int *mbwu_idx_to_mon;
};
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 9978eb48c1f4..de5220fed97d 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -559,10 +559,58 @@ static void mpam_resctrl_pick_mba(void)
}
}
+static void __free_mbwu_mon(struct mpam_class *class, int *array,
+ u16 num_mbwu_mon)
+{
+ for (int i = 0; i < num_mbwu_mon; i++) {
+ if (array[i] < 0)
+ continue;
+
+ mpam_free_mbwu_mon(class, array[i]);
+ array[i] = ~0;
+ }
+}
+
+static int __alloc_mbwu_mon(struct mpam_class *class, int *array,
+ u16 num_mbwu_mon)
+{
+ for (int i = 0; i < num_mbwu_mon; i++) {
+ int mbwu_mon = mpam_alloc_mbwu_mon(class);
+
+ if (mbwu_mon < 0) {
+ __free_mbwu_mon(class, array, num_mbwu_mon);
+ return mbwu_mon;
+ }
+ array[i] = mbwu_mon;
+ }
+
+ return 0;
+}
+
+static int *__alloc_mbwu_array(struct mpam_class *class, u16 num_mbwu_mon)
+{
+ int err;
+ size_t array_size = num_mbwu_mon * sizeof(int);
+ int *array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
+
+ if (!array)
+ return ERR_PTR(-ENOMEM);
+
+ memset(array, -1, array_size);
+
+ err = __alloc_mbwu_mon(class, array, num_mbwu_mon);
+ if (err)
+ return ERR_PTR(err);
+ return_ptr(array);
+}
+
static void counter_update_class(enum resctrl_event_id evt_id,
struct mpam_class *class)
{
- struct mpam_class *existing_class = mpam_resctrl_counters[evt_id].class;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evt_id];
+ struct mpam_class *existing_class = mon->class;
+ u16 num_mbwu_mon = class->props.num_mbwu_mon;
+ int *existing_array = mon->mbwu_idx_to_mon;
if (existing_class) {
if (class->level == 3) {
@@ -575,8 +623,40 @@ static void counter_update_class(enum resctrl_event_id evt_id,
}
}
- mpam_resctrl_counters[evt_id].class = class;
+ pr_debug("Updating event %u to use class %u\n", evt_id, class->level);
+ mon->class = class;
exposed_mon_capable = true;
+
+ if (evt_id == QOS_L3_OCCUP_EVENT_ID)
+ return;
+
+ /* Might not need all the monitors */
+ num_mbwu_mon = __mpam_monitors_free_running(num_mbwu_mon);
+ if (!num_mbwu_mon) {
+ pr_debug("Not pre-allocating free-running counters\n");
+ return;
+ }
+
+ /*
+ * This is the pre-allocated free-running monitors path. It always
+ * allocates one monitor per PARTID * PMG.
+ */
+ WARN_ON_ONCE(num_mbwu_mon != resctrl_arch_system_num_rmid_idx());
+
+ mon->mbwu_idx_to_mon = __alloc_mbwu_array(class, num_mbwu_mon);
+ if (IS_ERR(mon->mbwu_idx_to_mon)) {
+ pr_debug("Failed to allocate MBWU array\n");
+ mon->class = existing_class;
+ mon->mbwu_idx_to_mon = existing_array;
+ return;
+ }
+
+ if (existing_array) {
+ pr_debug("Releasing previous class %u's monitors\n",
+ existing_class->level);
+ __free_mbwu_mon(existing_class, existing_array, num_mbwu_mon);
+ kfree(existing_array);
+ }
}
static void mpam_resctrl_pick_counters(void)
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 21/38] arm_mpam: resctrl: Pre-allocate assignable monitors
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (19 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 20/38] arm_mpam: resctrl: Pre-allocate free running monitors James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 13:42 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 22/38] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions James Morse
` (17 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
When there are not enough monitors, MPAM is able to emulate ABMC by making
a smaller number of monitors assignable. These monitors still need to be
allocated from the driver, and mapped to whichever control/monitor group
resctrl wants to use them with.
Add a second array to hold the monitor values indexed by resctrl's
cntr_id.
When CDP is in use, two monitors are needed so the available number of
counters halves. Platforms with one monitor will have zero monitors
when CDP is in use.
Signed-off-by: James Morse <james.morse@arm.com>
---
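Not part of the patch: a minimal userspace sketch of the counter arithmetic
described above, including the one-monitor corner case. The struct and
function names are hypothetical; the logic is only meant to mirror what
mpam_resctrl_monitor_sync_abmc_vals() does in the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what resctrl is told about assignable counters. */
struct sketch_abmc_vals {
	unsigned int num_mbm_cntrs;
	bool cntr_assignable;
};

static struct sketch_abmc_vals sketch_sync_abmc(unsigned int allocated_mbwu,
						unsigned int num_rmid_idx,
						bool cdp_enabled)
{
	struct sketch_abmc_vals v;
	/* ABMC is only needed with fewer monitors than (PARTID, PMG) indexes. */
	bool abmc = allocated_mbwu < num_rmid_idx;

	v.num_mbm_cntrs = allocated_mbwu;
	if (cdp_enabled)
		v.num_mbm_cntrs /= 2;	/* CDP needs two monitors per counter */

	/* With no counters left there is nothing to assign. */
	v.cntr_assignable = v.num_mbm_cntrs && abmc;

	return v;
}

/* Small self-check for the one-monitor corner case. */
static void sketch_demo(void)
{
	assert(sketch_sync_abmc(1, 16, false).cntr_assignable);
	assert(!sketch_sync_abmc(1, 16, true).cntr_assignable);
}
```

With a single monitor, the integer division by two leaves zero counters once
CDP is enabled, so the counters stop being assignable.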
drivers/resctrl/mpam_internal.h | 7 +++
drivers/resctrl/mpam_resctrl.c | 102 ++++++++++++++++++++++++++++++++
2 files changed, 109 insertions(+)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b7c914febeb4..05101186af17 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -366,6 +366,13 @@ struct mpam_resctrl_mon {
* un-assigned (closid, rmid) are -1.
*/
int *mbwu_idx_to_mon;
+
+ /*
+ * Array of assigned MBWU monitors, indexed by idx argument.
+ * When ABMC is not in use, this array can be NULL. Otherwise
+ * it maps idx to the allocated monitor.
+ */
+ int *assigned_counters;
};
static inline int mpam_alloc_csu_mon(struct mpam_class *class)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index de5220fed97d..f607feaf0126 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -50,6 +50,12 @@ static bool exposed_mon_capable;
*/
static bool cdp_enabled;
+/*
+ * L3 local/total may come from different classes - what is the number of MBWU
+ * 'on L3'?
+ */
+static unsigned int l3_num_allocated_mbwu = ~0;
+
/* Whether this num_mbw_mon could result in a free_running system */
static int __mpam_monitors_free_running(u16 num_mbwu_mon)
{
@@ -58,6 +64,15 @@ static int __mpam_monitors_free_running(u16 num_mbwu_mon)
return 0;
}
+/*
+ * If l3_num_allocated_mbwu is forced below PARTID * PMG, then the counters
+ * are not free running, and ABMC's user-interface must be used to assign them.
+ */
+static bool mpam_resctrl_abmc_enabled(void)
+{
+ return l3_num_allocated_mbwu < resctrl_arch_system_num_rmid_idx();
+}
+
bool resctrl_arch_alloc_capable(void)
{
return exposed_alloc_capable;
@@ -102,8 +117,25 @@ static void resctrl_reset_task_closids(void)
read_unlock(&tasklist_lock);
}
+static void mpam_resctrl_monitor_sync_abmc_vals(struct rdt_resource *l3)
+{
+ l3->mon.num_mbm_cntrs = l3_num_allocated_mbwu;
+ if (cdp_enabled)
+ l3->mon.num_mbm_cntrs /= 2;
+
+ if (l3->mon.num_mbm_cntrs) {
+ l3->mon.mbm_cntr_assignable = mpam_resctrl_abmc_enabled();
+ l3->mon.mbm_assign_on_mkdir = mpam_resctrl_abmc_enabled();
+ } else {
+ l3->mon.mbm_cntr_assignable = false;
+ l3->mon.mbm_assign_on_mkdir = false;
+ }
+}
+
int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
{
+ struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ struct rdt_resource *l3 = &res->resctrl_res;
u32 partid_i, partid_d;
cdp_enabled = enable;
@@ -120,6 +152,7 @@ int resctrl_arch_set_cdp_enabled(enum resctrl_res_level ignored, bool enable)
WRITE_ONCE(arm64_mpam_global_default, mpam_get_regval(current));
resctrl_reset_task_closids();
+ mpam_resctrl_monitor_sync_abmc_vals(l3);
return 0;
}
@@ -315,6 +348,11 @@ static bool class_has_usable_mbwu(struct mpam_class *class)
return true;
}
+ if (cprops->num_mbwu_mon) {
+ pr_debug("monitors usable via ABMC assignment\n");
+ return true;
+ }
+
return false;
}
@@ -584,6 +622,8 @@ static int __alloc_mbwu_mon(struct mpam_class *class, int *array,
array[i] = mbwu_mon;
}
+ l3_num_allocated_mbwu = min(l3_num_allocated_mbwu, num_mbwu_mon);
+
return 0;
}
@@ -727,6 +767,23 @@ static void mpam_resctrl_pick_counters(void)
mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
}
+bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
+{
+ if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
+ return false;
+
+ return mpam_resctrl_abmc_enabled();
+}
+
+int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable)
+{
+ lockdep_assert_cpus_held();
+
+ WARN_ON_ONCE(1);
+
+ return 0;
+}
+
static int mpam_resctrl_control_init(struct mpam_resctrl_res *res,
enum resctrl_res_level type)
{
@@ -806,6 +863,41 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
return comp->comp_id;
}
+/*
+ * This must run after all event counters have been picked so that any free
+ * running counters have already been allocated.
+ */
+static int mpam_resctrl_monitor_init_abmc(struct mpam_resctrl_mon *mon)
+{
+ struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
+ size_t array_size = resctrl_arch_system_num_rmid_idx() * sizeof(int);
+ int *rmid_array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
+ struct rdt_resource *l3 = &res->resctrl_res;
+ struct mpam_class *class = mon->class;
+ u16 num_mbwu_mon;
+
+ if (mon->mbwu_idx_to_mon) {
+ pr_debug("monitors free running\n");
+ return 0;
+ }
+
+ if (!rmid_array) {
+ pr_debug("Failed to allocate RMID array\n");
+ return -ENOMEM;
+ }
+ memset(rmid_array, -1, array_size);
+
+ num_mbwu_mon = class->props.num_mbwu_mon;
+ mon->assigned_counters = __alloc_mbwu_array(mon->class, num_mbwu_mon);
+ if (IS_ERR(mon->assigned_counters))
+ return PTR_ERR(mon->assigned_counters);
+ mon->mbwu_idx_to_mon = no_free_ptr(rmid_array);
+
+ mpam_resctrl_monitor_sync_abmc_vals(l3);
+
+ return 0;
+}
+
static void mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
enum resctrl_event_id type)
{
@@ -847,6 +939,16 @@ static void mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
* space.
*/
l3->mon.num_rmid = 1;
+
+ switch (type) {
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ mpam_resctrl_monitor_init_abmc(mon);
+
+ return;
+ default:
+ return;
+ }
}
}
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 21/38] arm_mpam: resctrl: Pre-allocate assignable monitors
2025-12-05 21:58 ` [RFC PATCH 21/38] arm_mpam: resctrl: Pre-allocate assignable monitors James Morse
@ 2025-12-18 13:42 ` Jonathan Cameron
0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 13:42 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:44 +0000
James Morse <james.morse@arm.com> wrote:
> When there are not enough monitors, MPAM is able to emulate ABMC by making
> a smaller number of monitors assignable. These monitors still need to be
> allocated from the driver, and mapped to whichever control/monitor group
> resctrl wants to use them with.
>
> Add a second array to hold the monitor values indexed by resctrl's
> cntr_id.
>
> When CDP is in use, two monitors are needed so the available number of
> counters halves. Platforms with one monitor will have zero monitors
> when CDP is in use.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index de5220fed97d..f607feaf0126 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -806,6 +863,41 @@ static int mpam_resctrl_pick_domain_id(int cpu, struct mpam_component *comp)
> return comp->comp_id;
> }
>
> +/*
> + * This must run after all event counters have been picked so that any free
> + * running counters have already been allocated.
> + */
> +static int mpam_resctrl_monitor_init_abmc(struct mpam_resctrl_mon *mon)
> +{
> + struct mpam_resctrl_res *res = &mpam_resctrl_controls[RDT_RESOURCE_L3];
> + size_t array_size = resctrl_arch_system_num_rmid_idx() * sizeof(int);
> + int *rmid_array __free(kfree) = kmalloc(array_size, GFP_KERNEL);
kcalloc? Given you call it an array I assume it is one. That will zero it though,
and if you really care about optimizing that out, add a comment.
Move this...
> + struct rdt_resource *l3 = &res->resctrl_res;
> + struct mpam_class *class = mon->class;
> + u16 num_mbwu_mon;
> +
> + if (mon->mbwu_idx_to_mon) {
> + pr_debug("monitors free running\n");
> + return 0;
> + }
Down to here. See the notes in cleanup.h that summarize how much Linus wants
these declared inline, because of how much not doing so hurts readability.
> +
> + if (!rmid_array) {
> + pr_debug("Failed to allocate RMID array\n");
> + return -ENOMEM;
> + }
> + memset(rmid_array, -1, array_size);
> +
> + num_mbwu_mon = class->props.num_mbwu_mon;
> + mon->assigned_counters = __alloc_mbwu_array(mon->class, num_mbwu_mon);
> + if (IS_ERR(mon->assigned_counters))
> + return PTR_ERR(mon->assigned_counters);
> + mon->mbwu_idx_to_mon = no_free_ptr(rmid_array);
> +
> + mpam_resctrl_monitor_sync_abmc_vals(l3);
> +
> + return 0;
> +}
> +
> static void mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
> enum resctrl_event_id type)
> {
> @@ -847,6 +939,16 @@ static void mpam_resctrl_monitor_init(struct mpam_resctrl_mon *mon,
> * space.
> */
> l3->mon.num_rmid = 1;
> +
> + switch (type) {
> + case QOS_L3_MBM_LOCAL_EVENT_ID:
> + case QOS_L3_MBM_TOTAL_EVENT_ID:
> + mpam_resctrl_monitor_init_abmc(mon);
> +
> + return;
> + default:
> + return;
> + }
> }
> }
>
* [RFC PATCH 22/38] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (20 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 21/38] arm_mpam: resctrl: Pre-allocate assignable monitors James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 23/38] arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use James Morse
` (16 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
ABMC exposes a fun corner case where a platform with one monitor can
use ABMC for assignable counters - but not when CDP is enabled.
Add some tests.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/test_mpam_resctrl.c | 62 +++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/drivers/resctrl/test_mpam_resctrl.c b/drivers/resctrl/test_mpam_resctrl.c
index d0615aa7671c..c83d54f21fa0 100644
--- a/drivers/resctrl/test_mpam_resctrl.c
+++ b/drivers/resctrl/test_mpam_resctrl.c
@@ -293,6 +293,67 @@ static void test_percent_to_max_rounding(struct kunit *test)
KUNIT_EXPECT_LE(test, 4 * num_rounded_up, 3 * total);
}
+static void test_num_assignable_counters(struct kunit *test)
+{
+ unsigned int orig_l3_num_allocated_mbwu = l3_num_allocated_mbwu;
+ u32 orig_mpam_partid_max = mpam_partid_max;
+ u32 orig_mpam_pmg_max = mpam_pmg_max;
+ bool orig_cdp_enabled = cdp_enabled;
+ struct rdt_resource fake_l3;
+
+ /* Force there to be some PARTID/PMG */
+ mpam_partid_max = 3;
+ mpam_pmg_max = 1;
+
+ cdp_enabled = false;
+
+ /* ABMC off, CDP off */
+ l3_num_allocated_mbwu = resctrl_arch_system_num_rmid_idx();
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, resctrl_arch_system_num_rmid_idx());
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* ABMC on, CDP off */
+ l3_num_allocated_mbwu = 4;
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, 4);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ cdp_enabled = true;
+
+ /* ABMC off, CDP on */
+ l3_num_allocated_mbwu = resctrl_arch_system_num_rmid_idx();
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+
+ /* (value not consumed by resctrl) */
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, resctrl_arch_system_num_rmid_idx() / 2);
+
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* ABMC on, CDP on */
+ l3_num_allocated_mbwu = 4;
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, 2);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_TRUE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* ABMC 'on', CDP on - but not enough counters */
+ l3_num_allocated_mbwu = 1;
+ mpam_resctrl_monitor_sync_abmc_vals(&fake_l3);
+ KUNIT_EXPECT_EQ(test, fake_l3.mon.num_mbm_cntrs, 0);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_cntr_assignable);
+ KUNIT_EXPECT_FALSE(test, fake_l3.mon.mbm_assign_on_mkdir);
+
+ /* Restore global variables that were messed with */
+ l3_num_allocated_mbwu = orig_l3_num_allocated_mbwu;
+ mpam_partid_max = orig_mpam_partid_max;
+ mpam_pmg_max = orig_mpam_pmg_max;
+ cdp_enabled = orig_cdp_enabled;
+}
+
static struct kunit_case mpam_resctrl_test_cases[] = {
KUNIT_CASE(test_get_mba_granularity),
KUNIT_CASE_PARAM(test_mbw_max_to_percent, test_percent_value_gen_params),
@@ -301,6 +362,7 @@ static struct kunit_case mpam_resctrl_test_cases[] = {
KUNIT_CASE(test_percent_to_max_rounding),
KUNIT_CASE_PARAM(test_percent_max_roundtrip_stability,
test_all_bwa_wd_gen_params),
+ KUNIT_CASE(test_num_assignable_counters),
{}
};
--
2.39.5
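The expectations in the test above can be summarised by a small model. This is a sketch inferred only from the KUNIT_EXPECT lines, not the body of mpam_resctrl_monitor_sync_abmc_vals(), which is not shown in this patch. The struct, the function name and the rule "ABMC is in use when fewer monitors were allocated than there are RMID indexes" are assumptions made for illustration:

```c
#include <stdbool.h>

/* Hypothetical mirror of the three fields the test checks. */
struct mon_summary {
	unsigned int num_mbm_cntrs;
	bool mbm_cntr_assignable;
	bool mbm_assign_on_mkdir;
};

static struct mon_summary sync_abmc_model(unsigned int allocated_mbwu,
					  unsigned int num_rmid_idx,
					  bool cdp)
{
	struct mon_summary s = { 0 };
	/* Assumption: a monitor per RMID index means free-running counters. */
	bool abmc = allocated_mbwu < num_rmid_idx;

	/* CDP needs a code and a data monitor per counter, so halve. */
	s.num_mbm_cntrs = cdp ? allocated_mbwu / 2 : allocated_mbwu;

	/* One monitor with CDP rounds down to zero: nothing to assign. */
	if (abmc && s.num_mbm_cntrs > 0) {
		s.mbm_cntr_assignable = true;
		s.mbm_assign_on_mkdir = true;
	}

	return s;
}
```

With mpam_partid_max = 3 and mpam_pmg_max = 1, the test appears to assume eight RMID indexes, which is the value a caller would pass as num_rmid_idx here.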
* [RFC PATCH 23/38] arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (21 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 22/38] arm_mpam: resctrl: Add kunit test for ABMC/CDP interactions James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors James Morse
` (15 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
ABMC has a helper resctrl_arch_config_cntr() for changing the mapping
between 'cntr_id' and a CLOSID/RMID pair.
Add the helper.
For MPAM this is done by updating the mon->mbwu_idx_to_mon[] array,
and as usual CDP means it needs doing in three different ways.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 37 ++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index f607feaf0126..22ad5dd3c383 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -767,6 +767,43 @@ static void mpam_resctrl_pick_counters(void)
mpam_resctrl_counters[QOS_L3_MBM_TOTAL_EVENT_ID].class);
}
+static void __config_cntr(struct mpam_resctrl_mon *mon, u32 cntr_id,
+ enum resctrl_conf_type cdp_type, u32 closid, u32 rmid,
+ bool assign)
+{
+ u32 mbwu_idx, mon_idx = resctrl_get_config_index(cntr_id, cdp_type);
+
+ closid = resctrl_get_config_index(closid, cdp_type);
+ mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ WARN_ON_ONCE(mon_idx > l3_num_allocated_mbwu);
+
+ if (assign)
+ mon->mbwu_idx_to_mon[mbwu_idx] = mon->assigned_counters[mon_idx];
+ else
+ mon->mbwu_idx_to_mon[mbwu_idx] = -1;
+}
+
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ enum resctrl_event_id evtid, u32 rmid, u32 closid,
+ u32 cntr_id, bool assign)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+
+ if (!mon->mbwu_idx_to_mon || !mon->assigned_counters) {
+ pr_debug("monitor arrays not allocated\n");
+ return;
+ }
+
+ if (cdp_enabled) {
+ __config_cntr(mon, cntr_id, CDP_CODE, closid, rmid, assign);
+ __config_cntr(mon, cntr_id, CDP_DATA, closid, rmid, assign);
+ } else {
+ __config_cntr(mon, cntr_id, CDP_NONE, closid, rmid, assign);
+ }
+
+ resctrl_arch_reset_rmid(r, d, closid, rmid, evtid);
+}
+
bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r)
{
if (r != &mpam_resctrl_controls[RDT_RESOURCE_L3].resctrl_res)
--
2.39.5
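As a rough illustration of the index arithmetic in __config_cntr(), here is a standalone sketch. config_index() follows my understanding of resctrl_get_config_index() (data on the even index, code on the odd one). rmid_idx_encode() and its pmg_bits parameter are assumptions, since the real resctrl_arch_rmid_idx_encode() is not part of this patch. All names are hypothetical:

```c
#include <stdint.h>

enum conf_type { CONF_NONE, CONF_CODE, CONF_DATA };

#define NO_MON (-1)

/* CDP doubles the index space: data uses 2n, code uses 2n + 1. */
static uint32_t config_index(uint32_t id, enum conf_type t)
{
	switch (t) {
	case CONF_CODE:
		return id * 2 + 1;
	case CONF_DATA:
		return id * 2;
	default:
		return id;
	}
}

/* Assumed encoding: PARTID in the high bits, PMG in the low pmg_bits. */
static uint32_t rmid_idx_encode(uint32_t partid, uint32_t pmg,
				unsigned int pmg_bits)
{
	return (partid << pmg_bits) | pmg;
}

/* Point one PARTID/PMG slot at a pre-allocated monitor, or clear it. */
static void config_cntr(int *idx_to_mon, const int *assigned,
			uint32_t cntr_id, enum conf_type t,
			uint32_t closid, uint32_t rmid,
			unsigned int pmg_bits, int assign)
{
	uint32_t mon_idx = config_index(cntr_id, t);
	uint32_t mbwu_idx = rmid_idx_encode(config_index(closid, t), rmid,
					    pmg_bits);

	idx_to_mon[mbwu_idx] = assign ? assigned[mon_idx] : NO_MON;
}
```

With CDP enabled, one cntr_id therefore consumes two entries of the assigned array, which is why the usable counter count halves.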
* [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (22 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 23/38] arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-16 16:58 ` Ben Horgan
2025-12-18 13:49 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 25/38] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() James Morse
` (14 subsequent siblings)
38 siblings, 2 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs
to allocate a monitor on the corresponding resource. Monitors are
allocated by class instead of component.
MBM monitors are much more complicated: if there are enough monitors,
they will be pre-allocated and free-running. If ABMC is in use instead
then 'some' are pre-allocated in a different way, and need assigning.
Add helpers to allocate a CSU monitor. These helpers return an out
of range value for MBM counters.
Allocating a monitor context is expected to block until hardware
resources become available. This only makes sense for QOS_L3_OCCUP
as unallocated MBM counters are losing data.
Signed-off-by: James Morse <james.morse@arm.com>
---
Since ABMC got merged it may be possible to remove the monitor alloc
call for MBM counters from resctrl as this work is now done by ABMC's
assign call.
---
drivers/resctrl/mpam_internal.h | 14 ++++++-
drivers/resctrl/mpam_resctrl.c | 68 +++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 4 ++
3 files changed, 85 insertions(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 05101186af17..3a68ebd498fa 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -32,6 +32,14 @@ DECLARE_STATIC_KEY_FALSE(mpam_enabled);
#define PACKED_FOR_KUNIT
#endif
+/*
+ * These 'mon' values must not alias an actual monitor, so must be larger than
+ * U16_MAX, but not be confused with an errno value, so smaller than
+ * (u32)-SZ_4K.
+ * USE_PRE_ALLOCATED is used to avoid confusion with an actual monitor.
+ */
+#define USE_PRE_ALLOCATED (U16_MAX + 1)
+
static inline bool mpam_is_enabled(void)
{
return static_branch_likely(&mpam_enabled);
@@ -215,7 +223,11 @@ enum mon_filter_options {
};
struct mon_cfg {
- u16 mon;
+ /*
+ * mon must be large enough to hold out of range values like
+ * USE_RMID_IDX
+ */
+ u32 mon;
u8 pmg;
bool match_pmg;
bool csu_exclude_clean;
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 22ad5dd3c383..a2b238d47117 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -21,6 +21,8 @@
#include "mpam_internal.h"
+DECLARE_WAIT_QUEUE_HEAD(resctrl_mon_ctx_waiters);
+
/*
* The classes we've picked to map to resctrl resources, wrapped
* in with their resctrl structure.
@@ -287,6 +289,72 @@ struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
return &mpam_resctrl_controls[l].resctrl_res;
}
+static int resctrl_arch_mon_ctx_alloc_no_wait(enum resctrl_event_id evtid)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+
+ if (!mon->class)
+ return -EINVAL;
+
+ switch (evtid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ /* With CDP, one monitor gets used for both code/data reads */
+ return mpam_alloc_csu_mon(mon->class);
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ return USE_PRE_ALLOCATED;
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r,
+ enum resctrl_event_id evtid)
+{
+ DEFINE_WAIT(wait);
+ int *ret;
+
+ ret = kmalloc(sizeof(*ret), GFP_KERNEL);
+ if (!ret)
+ return ERR_PTR(-ENOMEM);
+
+ do {
+ prepare_to_wait(&resctrl_mon_ctx_waiters, &wait,
+ TASK_INTERRUPTIBLE);
+ *ret = resctrl_arch_mon_ctx_alloc_no_wait(evtid);
+ if (*ret == -ENOSPC)
+ schedule();
+ } while (*ret == -ENOSPC && !signal_pending(current));
+ finish_wait(&resctrl_mon_ctx_waiters, &wait);
+
+ return ret;
+}
+
+static void resctrl_arch_mon_ctx_free_no_wait(enum resctrl_event_id evtid,
+ u32 mon_idx)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+
+ if (!mon->class)
+ return;
+
+ if (evtid == QOS_L3_OCCUP_EVENT_ID)
+ mpam_free_csu_mon(mon->class, mon_idx);
+
+ wake_up(&resctrl_mon_ctx_waiters);
+}
+
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
+ enum resctrl_event_id evtid, void *arch_mon_ctx)
+{
+ u32 mon_idx = *(u32 *)arch_mon_ctx;
+
+ kfree(arch_mon_ctx);
+ arch_mon_ctx = NULL;
+
+ resctrl_arch_mon_ctx_free_no_wait(evtid, mon_idx);
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 385554ceb452..e1461e32af75 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -63,6 +63,10 @@ u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid);
void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid);
u32 resctrl_arch_system_num_rmid_idx(void);
+struct rdt_resource;
+void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
+void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid, void *ctx);
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.39.5
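The constraint described in the USE_PRE_ALLOCATED comment can be checked with a few lines of standalone C. This is only a sketch of the value ranges: the _MODEL names and the three predicates are hypothetical, and the errno window is modelled on the kernel's MAX_ERRNO of 4095:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO_MODEL		4095u
#define USE_PRE_ALLOCATED_MODEL	((uint32_t)UINT16_MAX + 1)

/* Real monitor indexes fit in 16 bits. */
static bool is_real_monitor(uint32_t v)
{
	return v <= UINT16_MAX;
}

/* A negative errno stored in 32 bits lands in the top 4095 values. */
static bool is_errno_value(uint32_t v)
{
	return v >= (uint32_t)-MAX_ERRNO_MODEL;
}

/* The sentinel sits between the two ranges, so it aliases neither. */
static bool is_pre_allocated(uint32_t v)
{
	return v == USE_PRE_ALLOCATED_MODEL;
}
```

Any value strictly between U16_MAX and the errno window would work as a sentinel; U16_MAX + 1 is simply the smallest one.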
* Re: [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors
2025-12-05 21:58 ` [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors James Morse
@ 2025-12-16 16:58 ` Ben Horgan
2025-12-18 13:49 ` Jonathan Cameron
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-16 16:58 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs
> to allocate a monitor on the corresponding resource. Monitors are
> allocated by class instead of component.
>
> MBM monitors are much more complicated: if there are enough monitors,
> they will be pre-allocated and free-running. If ABMC is in use instead
> then 'some' are pre-allocated in a different way, and need assigning.
>
> Add helpers to allocate a CSU monitor. These helpers return an out
> of range value for MBM counters.
>
> Allocating a monitor context is expected to block until hardware
> resources become available. This only makes sense for QOS_L3_OCCUP
> as unallocated MBM counters are losing data.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Since ABMC got merged it may be possible to remove the monitor alloc
> call for MBM counters from resctrl as this work is now done by ABMC's
> assign call.
> ---
> drivers/resctrl/mpam_internal.h | 14 ++++++-
> drivers/resctrl/mpam_resctrl.c | 68 +++++++++++++++++++++++++++++++++
> include/linux/arm_mpam.h | 4 ++
> 3 files changed, 85 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 05101186af17..3a68ebd498fa 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -32,6 +32,14 @@ DECLARE_STATIC_KEY_FALSE(mpam_enabled);
> #define PACKED_FOR_KUNIT
> #endif
>
> +/*
> + * These 'mon' values must not alias an actual monitor, so must be larger than
> + * U16_MAX, but not be confused with an errno value, so smaller than
> + * (u32)-SZ_4K.
> + * USE_PRE_ALLOCATED is used to avoid confusion with an actual monitor.
> + */
> +#define USE_PRE_ALLOCATED (U16_MAX + 1)
> +
> static inline bool mpam_is_enabled(void)
> {
> return static_branch_likely(&mpam_enabled);
> @@ -215,7 +223,11 @@ enum mon_filter_options {
> };
>
> struct mon_cfg {
> - u16 mon;
> + /*
> + * mon must be large enough to hold out of range values like
> + * USE_RMID_IDX
> + */
USE_RMID_IDX -> USE_PRE_ALLOCATED
Thanks,
Ben
* Re: [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors
2025-12-05 21:58 ` [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors James Morse
2025-12-16 16:58 ` Ben Horgan
@ 2025-12-18 13:49 ` Jonathan Cameron
1 sibling, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 13:49 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:47 +0000
James Morse <james.morse@arm.com> wrote:
> When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs
> to allocate a monitor on the corresponding resource. Monitors are
> allocated by class instead of component.
>
> MBM monitors are much more complicated: if there are enough monitors,
> they will be pre-allocated and free-running. If ABMC is in use instead
> then 'some' are pre-allocated in a different way, and need assigning.
>
> Add helpers to allocate a CSU monitor. These helpers return an out
> of range value for MBM counters.
>
> Allocating a monitor context is expected to block until hardware
> resources become available. This only makes sense for QOS_L3_OCCUP
> as unallocated MBM counters are losing data.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Since ABMC got merged it may be possible to remove the monitor alloc
> call for MBM counters from resctrl as this work is now done by ABMC's
> assign call.
> ---
> drivers/resctrl/mpam_internal.h | 14 ++++++-
> drivers/resctrl/mpam_resctrl.c | 68 +++++++++++++++++++++++++++++++++
> include/linux/arm_mpam.h | 4 ++
> 3 files changed, 85 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 22ad5dd3c383..a2b238d47117 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> +
> +void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
> + enum resctrl_event_id evtid, void *arch_mon_ctx)
> +{
> + u32 mon_idx = *(u32 *)arch_mon_ctx;
> +
> + kfree(arch_mon_ctx);
> + arch_mon_ctx = NULL;
Why is this useful? It only updates the local copy of the pointer, which is
not used again. Maybe I need more coffee.
> +
> + resctrl_arch_mon_ctx_free_no_wait(evtid, mon_idx);
> +}
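For reference, a tiny standalone sketch of the point about the local pointer copy. The two helper names are hypothetical. C passes the pointer by value, so clearing the parameter is invisible to the caller; only a pointer-to-pointer lets the callee clear the caller's variable:

```c
#include <stddef.h>

/* Clears only the callee's copy; the caller's pointer is untouched. */
static void clear_local(int *p)
{
	p = NULL;
	(void)p;	/* silence unused-value warnings */
}

/* Clears the caller's pointer through one extra level of indirection. */
static void clear_caller(int **pp)
{
	*pp = NULL;
}
```

Since resctrl_arch_mon_ctx_free() takes the context as a plain void *, the assignment after kfree() has no effect outside the function.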
* [RFC PATCH 25/38] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (23 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 24/38] arm_mpam: resctrl: Allow resctrl to allocate monitors James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 13:53 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 26/38] arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr() James Morse
` (13 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation
means the counter may need reading in three different ways. The same
goes for reset.
The helpers behind the resctrl_arch_ functions will be re-used for the
ABMC equivalent functions.
Add the rounding helper for checking monitor values while we're here.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 154 +++++++++++++++++++++++++++++++++
include/linux/arm_mpam.h | 5 ++
2 files changed, 159 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index a2b238d47117..dc8a819d0976 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -355,6 +355,160 @@ void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
resctrl_arch_mon_ctx_free_no_wait(evtid, mon_idx);
}
+static int
+__read_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ enum mpam_device_features mon_type,
+ int mon_idx,
+ enum resctrl_conf_type cdp_type, u32 closid, u32 rmid, u64 *val)
+{
+ struct mon_cfg cfg = { };
+
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
+ /* Shift closid to account for CDP */
+ closid = resctrl_get_config_index(closid, cdp_type);
+
+ if (mon_idx == USE_PRE_ALLOCATED) {
+ int mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ mon_idx = mon->mbwu_idx_to_mon[mbwu_idx];
+ if (mon_idx == -1) {
+ if (mpam_resctrl_abmc_enabled()) {
+ /* Report Unassigned */
+ return -ENOENT;
+ }
+ /* Report Unavailable */
+ return -EINVAL;
+ }
+ }
+
+ cfg.mon = mon_idx;
+ cfg.match_pmg = true;
+ cfg.partid = closid;
+ cfg.pmg = rmid;
+
+ if (irqs_disabled()) {
+ /* Check if we can access this domain without an IPI */
+ return -EIO;
+ }
+
+ return mpam_msmon_read(mon_comp, &cfg, mon_type, val);
+}
+
+static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ enum mpam_device_features mon_type,
+ int mon_idx, u32 closid, u32 rmid, u64 *val)
+{
+ if (cdp_enabled) {
+ u64 code_val = 0, data_val = 0;
+ int err;
+
+ err = __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_CODE, closid, rmid, &code_val);
+ if (err)
+ return err;
+
+ err = __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_DATA, closid, rmid, &data_val);
+ if (!err)
+ *val += code_val + data_val;
+ return err;
+ }
+
+ return __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_NONE, closid, rmid, val);
+}
+
+/* MBWU when not in ABMC mode, and CSU counters. */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ u64 *val, void *arch_mon_ctx)
+{
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+ u32 mon_idx = *(u32 *)arch_mon_ctx;
+ enum mpam_device_features mon_type;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+
+ resctrl_arch_rmid_read_context_check();
+
+ if (eventid >= QOS_NUM_EVENTS || !mon->class)
+ return -EINVAL;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ mon_type = mpam_feat_msmon_csu;
+ break;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ mon_type = mpam_feat_msmon_mbwu;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return read_mon_cdp_safe(mon, mon_comp, mon_type, mon_idx,
+ closid, rmid, val);
+}
+
+static void __reset_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ int mon_idx,
+ enum resctrl_conf_type cdp_type, u32 closid, u32 rmid)
+{
+ struct mon_cfg cfg = { };
+
+ if (!mpam_is_enabled())
+ return;
+
+ /* Shift closid to account for CDP */
+ closid = resctrl_get_config_index(closid, cdp_type);
+
+ if (mon_idx == USE_PRE_ALLOCATED) {
+ int mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ mon_idx = mon->mbwu_idx_to_mon[mbwu_idx];
+ }
+
+ if (mon_idx == -1)
+ return;
+ cfg.mon = mon_idx;
+ mpam_msmon_reset_mbwu(mon_comp, &cfg);
+}
+
+static void reset_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ int mon_idx, u32 closid, u32 rmid)
+{
+ if (cdp_enabled) {
+ __reset_mon(mon, mon_comp, mon_idx, CDP_CODE, closid, rmid);
+ __reset_mon(mon, mon_comp, mon_idx, CDP_DATA, closid, rmid);
+ } else {
+ __reset_mon(mon, mon_comp, mon_idx, CDP_NONE, closid, rmid);
+ }
+}
+
+/* Called via IPI. Call with read_cpus_lock() held. */
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid)
+{
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+
+ if (!mpam_is_enabled())
+ return;
+
+ /* Only MBWU counters are relevant, and for supported event types. */
+ if (eventid == QOS_L3_OCCUP_EVENT_ID || !mon->class)
+ return;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index e1461e32af75..86d5e326d2bd 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -67,6 +67,11 @@ struct rdt_resource;
void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid, void *ctx);
+static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
+{
+ return val;
+}
+
/**
* mpam_register_requestor() - Register a requestor with the MPAM driver
* @partid_max: The maximum PARTID value the requestor can generate.
--
2.39.5
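The monitor lookup at the top of __read_mon() can be modelled in isolation. This is a sketch under assumptions, not the driver code: resolve_mon(), PRE_ALLOCATED and the flat idx_to_mon array are hypothetical stand-ins for the USE_PRE_ALLOCATED handling and mon->mbwu_idx_to_mon[]:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PRE_ALLOCATED ((uint32_t)UINT16_MAX + 1)

/*
 * Resolve the monitor to program. A caller-supplied index is used as is;
 * the sentinel means "look it up by PARTID/PMG index". An empty slot is
 * reported as unassigned (-ENOENT) in ABMC mode and as unavailable
 * (-EINVAL) otherwise.
 */
static int resolve_mon(uint32_t mon_idx, const int *idx_to_mon,
		       uint32_t mbwu_idx, bool abmc, uint32_t *out)
{
	if (mon_idx != PRE_ALLOCATED) {
		*out = mon_idx;
		return 0;
	}

	if (idx_to_mon[mbwu_idx] == -1)
		return abmc ? -ENOENT : -EINVAL;

	*out = (uint32_t)idx_to_mon[mbwu_idx];
	return 0;
}
```

The two error codes let the caller distinguish a counter that could be assigned from one that does not exist at all.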
* Re: [RFC PATCH 25/38] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()
2025-12-05 21:58 ` [RFC PATCH 25/38] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() James Morse
@ 2025-12-18 13:53 ` Jonathan Cameron
0 siblings, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 13:53 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:48 +0000
James Morse <james.morse@arm.com> wrote:
> resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation
> means the counter may need reading in three different ways. The same
> goes for reset.
>
> The helpers behind the resctrl_arch_ functions will be re-used for the
> ABMC equivalent functions.
>
> Add the rounding helper for checking monitor values while we're here.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_resctrl.c | 154 +++++++++++++++++++++++++++++++++
> include/linux/arm_mpam.h | 5 ++
> 2 files changed, 159 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index a2b238d47117..dc8a819d0976 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -355,6 +355,160 @@ void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
> resctrl_arch_mon_ctx_free_no_wait(evtid, mon_idx);
> }
>
> +static int
> +__read_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
> + enum mpam_device_features mon_type,
> + int mon_idx,
> + enum resctrl_conf_type cdp_type, u32 closid, u32 rmid, u64 *val)
> +{
> + struct mon_cfg cfg = { };
> +
> + if (!mpam_is_enabled())
> + return -EINVAL;
> +
> + /* Shift closid to account for CDP */
> + closid = resctrl_get_config_index(closid, cdp_type);
> +
> + if (mon_idx == USE_PRE_ALLOCATED) {
> + int mbwu_idx = resctrl_arch_rmid_idx_encode(closid, rmid);
> + mon_idx = mon->mbwu_idx_to_mon[mbwu_idx];
> + if (mon_idx == -1) {
> + if (mpam_resctrl_abmc_enabled()) {
> + /* Report Unassigned */
> + return -ENOENT;
> + }
> + /* Report Unavailable */
> + return -EINVAL;
> + }
> + }
> +
> + cfg.mon = mon_idx;
> + cfg.match_pmg = true;
> + cfg.partid = closid;
> + cfg.pmg = rmid;
Maybe use
cfg = (struct mon_cfg) {
.mon = ...
};
and drop the earlier initialization.
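Spelled out, the compound-literal form might look like the sketch below. struct mon_cfg_model is a cut-down, hypothetical stand-in for struct mon_cfg: the field names come from the patch, but the type of partid is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of the driver's struct mon_cfg. */
struct mon_cfg_model {
	uint32_t mon;
	uint8_t pmg;
	bool match_pmg;
	bool csu_exclude_clean;
	uint16_t partid;
};

static struct mon_cfg_model make_cfg(uint32_t mon_idx, uint16_t partid,
				     uint8_t pmg)
{
	struct mon_cfg_model cfg;

	/* Members not named here, like csu_exclude_clean, become zero. */
	cfg = (struct mon_cfg_model) {
		.mon = mon_idx,
		.match_pmg = true,
		.partid = partid,
		.pmg = pmg,
	};

	return cfg;
}
```

Because the compound literal zero-initialises every unnamed member, the earlier "= { }" at the declaration becomes redundant.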
> +
> + if (irqs_disabled()) {
> + /* Check if we can access this domain without an IPI */
> + return -EIO;
> + }
> +
> + return mpam_msmon_read(mon_comp, &cfg, mon_type, val);
> +}
> +
> +static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
> + enum mpam_device_features mon_type,
> + int mon_idx, u32 closid, u32 rmid, u64 *val)
> +{
> + if (cdp_enabled) {
> + u64 code_val = 0, data_val = 0;
> + int err;
> +
> + err = __read_mon(mon, mon_comp, mon_type, mon_idx,
> + CDP_CODE, closid, rmid, &code_val);
> + if (err)
> + return err;
> +
> + err = __read_mon(mon, mon_comp, mon_type, mon_idx,
> + CDP_DATA, closid, rmid, &data_val);
> + if (!err)
> + *val += code_val + data_val;
I'd stick to out of line error handling and just spend a couple more lines
for readability.
if (err)
return err;
*val += ..
return 0;
> + return err;
> + }
> +
> + return __read_mon(mon, mon_comp, mon_type, mon_idx,
> + CDP_NONE, closid, rmid, val);
> +}
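As a self-contained illustration of the out-of-line error handling suggested above, the CDP branch could be shaped like this. It is a sketch: read_fn, read_both() and the two fake readers are hypothetical and only stand in for the __read_mon() calls:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reader: is_code selects the code or the data monitor. */
typedef int (*read_fn)(int is_code, uint64_t *val);

/* Sum the code and data counters; *val is only touched on full success. */
static int read_both(read_fn rd, uint64_t *val)
{
	uint64_t code_val = 0, data_val = 0;
	int err;

	err = rd(1, &code_val);
	if (err)
		return err;

	err = rd(0, &data_val);
	if (err)
		return err;

	*val += code_val + data_val;

	return 0;
}

/* Fake readers for demonstration: code reads 10, data reads 20. */
static int fake_read_ok(int is_code, uint64_t *val)
{
	*val = is_code ? 10 : 20;
	return 0;
}

/* Same as above, but the data read fails with -EIO. */
static int fake_read_data_fails(int is_code, uint64_t *val)
{
	if (!is_code)
		return -EIO;
	*val = 10;
	return 0;
}
```

Both error paths look the same, and the success path ends in an explicit "return 0", which is the readability gain being asked for.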
* [RFC PATCH 26/38] arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr()
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (24 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 25/38] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid() James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 27/38] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions James Morse
` (12 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
When used in ABMC mode, resctrl uses a different set of helpers to
read and reset the counters.
Add these.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 43 ++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index dc8a819d0976..1333dc40714a 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -454,6 +454,28 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
closid, rmid, val);
}
+/* MBWU counters when in ABMC mode */
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, int mon_idx,
+ enum resctrl_event_id eventid, u64 *val)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
+ if (eventid == QOS_L3_OCCUP_EVENT_ID || !mon->class)
+ return -EINVAL;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ return read_mon_cdp_safe(mon, mon_comp, mpam_feat_msmon_mbwu, mon_idx,
+ closid, rmid, val);
+}
+
static void __reset_mon(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
int mon_idx,
enum resctrl_conf_type cdp_type, u32 closid, u32 rmid)
@@ -509,6 +531,27 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
}
+/* Reset an assigned counter */
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+ u32 closid, u32 rmid, int cntr_id,
+ enum resctrl_event_id eventid)
+{
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+
+ if (!mpam_is_enabled())
+ return;
+
+ if (eventid == QOS_L3_OCCUP_EVENT_ID || !mon->class)
+ return;
+
+ l3_dom = container_of(d, struct mpam_resctrl_dom, resctrl_mon_dom);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 27/38] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (25 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 26/38] arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr() James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 16:31 ` Ben Horgan
2025-12-05 21:58 ` [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL James Morse
` (11 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
A few resctrl features and hooks need to be provided, but aren't needed
or supported on MPAM platforms.
resctrl has individual hooks to separately enable and disable the
closid/partid and rmid/pmg context switching code. For MPAM this is all
the same thing, as the value in struct task_struct is used to cache the
value that should be written to hardware. arm64's context switching code
is enabled once MPAM is usable, but doesn't touch the hardware unless
the value has changed.
For now event configuration is not supported, and can be turned off
by returning 'false' from resctrl_arch_is_evt_configurable().
Add this, and empty definitions for the other hooks.
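As a rough illustration of why no fine-grained enables are needed, here is
a user-space sketch of the "only write when the value changed" idea. It is
not the arm64 code: struct fake_cpu, fake_sysreg_write() and
fake_mpam_switch() are names made up for this example, and the combined
partid+pmg value is modelled as a plain integer.

```c
#include <stdint.h>

/* Hypothetical stand-in for one CPU's MPAM state; not a kernel structure. */
struct fake_cpu {
	uint64_t hw_partid_pmg;	/* value currently held in the "register" */
	unsigned int hw_writes;	/* number of times the register was written */
};

/* Pretend system-register write, counted so the cost is visible. */
void fake_sysreg_write(struct fake_cpu *cpu, uint64_t val)
{
	cpu->hw_partid_pmg = val;
	cpu->hw_writes++;
}

/*
 * Called on every context switch with the incoming task's cached
 * partid+pmg pair. The register is left alone when the pair matches.
 */
void fake_mpam_switch(struct fake_cpu *cpu, uint64_t task_partid_pmg)
{
	if (cpu->hw_partid_pmg == task_partid_pmg)
		return;

	fake_sysreg_write(cpu, task_partid_pmg);
}
```

Switching between two tasks that share a partid+pmg pair costs one compare
and no register write in this model, which is the reason separate mon/alloc
enables would add nothing.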
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 17 +++++++++++++++++
include/linux/arm_mpam.h | 9 +++++++++
2 files changed, 26 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 1333dc40714a..00455de3efe9 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -85,6 +85,23 @@ bool resctrl_arch_mon_capable(void)
return exposed_mon_capable;
}
+bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
+{
+ return false;
+}
+
+void resctrl_arch_mon_event_config_read(void *info)
+{
+}
+
+void resctrl_arch_mon_event_config_write(void *info)
+{
+}
+
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+{
+}
+
bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
{
switch (rid) {
diff --git a/include/linux/arm_mpam.h b/include/linux/arm_mpam.h
index 86d5e326d2bd..f92a36187a52 100644
--- a/include/linux/arm_mpam.h
+++ b/include/linux/arm_mpam.h
@@ -67,6 +67,15 @@ struct rdt_resource;
void *resctrl_arch_mon_ctx_alloc(struct rdt_resource *r, enum resctrl_event_id evtid);
void resctrl_arch_mon_ctx_free(struct rdt_resource *r, enum resctrl_event_id evtid, void *ctx);
+/*
+ * The CPU configuration for MPAM is cheap to write, and is only written if it
+ * has changed. No need for fine grained enables.
+ */
+static inline void resctrl_arch_enable_mon(void) { }
+static inline void resctrl_arch_disable_mon(void) { }
+static inline void resctrl_arch_enable_alloc(void) { }
+static inline void resctrl_arch_disable_alloc(void) { }
+
static inline unsigned int resctrl_arch_round_mon_val(unsigned int val)
{
return val;
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 27/38] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions
2025-12-05 21:58 ` [RFC PATCH 27/38] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions James Morse
@ 2025-12-09 16:31 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 16:31 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> A few resctrl features and hooks need to be provided, but aren't needed
> or supported on MPAM platforms.
>
> resctrl has individual hooks to separately enable and disable the
> closid/partid and rmid/pmg context switching code. For MPAM this is all
> the same thing, as the value in struct task_struct is used to cache the
> value that should be written to hardware. arm64's context switching code
> is enabled once MPAM is usable, but doesn't touch the hardware unless
> the value has changed.
>
> For now event configuration is not supported, and can be turned off
> by returning 'false' from resctrl_arch_is_evt_configurable().
>
> Add this, and empty definitions for the other hooks.
>
> Signed-off-by: James Morse <james.morse@arm.com>
The newly added io_alloc support means we need a couple more dummy
resctrl arch functions in here, resctrl_arch_io_alloc_enable() and
resctrl_arch_io_alloc_enabled().
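A minimal sketch of what those two stubs might look like follows. The
prototypes are an assumption and have not been checked against the resctrl
tree, and refusing the enable request with -EOPNOTSUPP is just one plausible
choice for a platform without io_alloc. struct rdt_resource is left opaque
so the fragment stands on its own.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct rdt_resource;	/* opaque here; the real definition lives in resctrl */

/* Assumed prototype. MPAM has no io_alloc, so only "disable" succeeds. */
int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
{
	(void)r;
	return enable ? -EOPNOTSUPP : 0;
}

/* Assumed prototype. io_alloc is never reported as enabled. */
bool resctrl_arch_io_alloc_enabled(struct rdt_resource *r)
{
	(void)r;
	return false;
}
```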
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (26 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 27/38] arm_mpam: resctrl: Add empty definitions for assorted resctrl functions James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 16:33 ` Ben Horgan
2025-12-18 13:55 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 29/38] arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl James Morse
` (10 subsequent siblings)
38 siblings, 2 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL.
Let it rip^Wlink!
ARCH_HAS_CPU_RESCTRL indicates resctrl can be enabled. It is enabled
by the arch code simply because it has 'arch' in its name.
This removes ARM_CPU_RESCTRL as a mimic of X86_CPU_RESCTRL.
While here, move the ACPI dependency to the driver's Kconfig file.
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/arm64/Kconfig | 4 ++--
arch/arm64/include/asm/resctrl.h | 2 ++
drivers/resctrl/Kconfig | 9 ++++++++-
drivers/resctrl/Makefile | 2 +-
4 files changed, 13 insertions(+), 4 deletions(-)
create mode 100644 arch/arm64/include/asm/resctrl.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 558baa9e7c08..e67885ac7717 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2025,8 +2025,8 @@ config ARM64_TLB_RANGE
config ARM64_MPAM
bool "Enable support for MPAM"
- select ARM64_MPAM_DRIVER if EXPERT # does nothing yet
- select ACPI_MPAM if ACPI
+ select ARM64_MPAM_DRIVER
+ select ARCH_HAS_CPU_RESCTRL
help
Memory System Resource Partitioning and Monitoring (MPAM) is an
optional extension to the Arm architecture that allows each
diff --git a/arch/arm64/include/asm/resctrl.h b/arch/arm64/include/asm/resctrl.h
new file mode 100644
index 000000000000..b506e95cf6e3
--- /dev/null
+++ b/arch/arm64/include/asm/resctrl.h
@@ -0,0 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/arm_mpam.h>
diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
index c808e0470394..672abea3b03c 100644
--- a/drivers/resctrl/Kconfig
+++ b/drivers/resctrl/Kconfig
@@ -1,6 +1,7 @@
menuconfig ARM64_MPAM_DRIVER
bool "MPAM driver"
- depends on ARM64 && ARM64_MPAM && EXPERT
+ depends on ARM64 && ARM64_MPAM
+ select ACPI_MPAM if ACPI
help
Memory System Resource Partitioning and Monitoring (MPAM) driver for
System IP, e.g. caches and memory controllers.
@@ -22,3 +23,9 @@ config MPAM_KUNIT_TEST
If unsure, say N.
endif
+
+config ARM64_MPAM_RESCTRL_FS
+ bool
+ default y if ARM64_MPAM_DRIVER && RESCTRL_FS
+ select RESCTRL_RMID_DEPENDS_ON_CLOSID
+ select RESCTRL_ASSIGN_FIXED
diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
index 40beaf999582..4f6d0e81f9b8 100644
--- a/drivers/resctrl/Makefile
+++ b/drivers/resctrl/Makefile
@@ -1,5 +1,5 @@
obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
mpam-y += mpam_devices.o
-mpam-$(CONFIG_ARM_CPU_RESCTRL) += mpam_resctrl.o
+mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS) += mpam_resctrl.o
ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
2025-12-05 21:58 ` [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL James Morse
@ 2025-12-09 16:33 ` Ben Horgan
2025-12-18 13:55 ` Jonathan Cameron
1 sibling, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 16:33 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL.
> Let it rip^Wlink!
>
> ARCH_HAS_CPU_RESCTRL indicates resctrl can be enabled. It is enabled
> by the arch code sipmly because it has 'arch' in its name.
>
> This removes ARM_CPU_RESCTRL as a mimic of X86_CPU_RESCTRL.
> While here, move the ACPI dependency to the driver's Kconfig file.
Mention we're no longer hiding behind CONFIG_EXPERT.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> arch/arm64/Kconfig | 4 ++--
> arch/arm64/include/asm/resctrl.h | 2 ++
> drivers/resctrl/Kconfig | 9 ++++++++-
> drivers/resctrl/Makefile | 2 +-
> 4 files changed, 13 insertions(+), 4 deletions(-)
> create mode 100644 arch/arm64/include/asm/resctrl.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 558baa9e7c08..e67885ac7717 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -2025,8 +2025,8 @@ config ARM64_TLB_RANGE
>
> config ARM64_MPAM
> bool "Enable support for MPAM"
> - select ARM64_MPAM_DRIVER if EXPERT # does nothing yet
> - select ACPI_MPAM if ACPI
> + select ARM64_MPAM_DRIVER
> + select ARCH_HAS_CPU_RESCTRL
> help
> Memory System Resource Partitioning and Monitoring (MPAM) is an
> optional extension to the Arm architecture that allows each
> diff --git a/arch/arm64/include/asm/resctrl.h b/arch/arm64/include/asm/resctrl.h
> new file mode 100644
> index 000000000000..b506e95cf6e3
> --- /dev/null
> +++ b/arch/arm64/include/asm/resctrl.h
> @@ -0,0 +1,2 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#include <linux/arm_mpam.h>
> diff --git a/drivers/resctrl/Kconfig b/drivers/resctrl/Kconfig
> index c808e0470394..672abea3b03c 100644
> --- a/drivers/resctrl/Kconfig
> +++ b/drivers/resctrl/Kconfig
> @@ -1,6 +1,7 @@
> menuconfig ARM64_MPAM_DRIVER
> bool "MPAM driver"
> - depends on ARM64 && ARM64_MPAM && EXPERT
> + depends on ARM64 && ARM64_MPAM
> + select ACPI_MPAM if ACPI
> help
> Memory System Resource Partitioning and Monitoring (MPAM) driver for
> System IP, e.g. caches and memory controllers.
> @@ -22,3 +23,9 @@ config MPAM_KUNIT_TEST
> If unsure, say N.
>
> endif
> +
> +config ARM64_MPAM_RESCTRL_FS
> + bool
> + default y if ARM64_MPAM_DRIVER && RESCTRL_FS
> + select RESCTRL_RMID_DEPENDS_ON_CLOSID
> + select RESCTRL_ASSIGN_FIXED
> diff --git a/drivers/resctrl/Makefile b/drivers/resctrl/Makefile
> index 40beaf999582..4f6d0e81f9b8 100644
> --- a/drivers/resctrl/Makefile
> +++ b/drivers/resctrl/Makefile
> @@ -1,5 +1,5 @@
> obj-$(CONFIG_ARM64_MPAM_DRIVER) += mpam.o
> mpam-y += mpam_devices.o
> -mpam-$(CONFIG_ARM_CPU_RESCTRL) += mpam_resctrl.o
> +mpam-$(CONFIG_ARM64_MPAM_RESCTRL_FS) += mpam_resctrl.o
>
> ccflags-$(CONFIG_ARM64_MPAM_DRIVER_DEBUG) += -DDEBUG
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL
2025-12-05 21:58 ` [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL James Morse
2025-12-09 16:33 ` Ben Horgan
@ 2025-12-18 13:55 ` Jonathan Cameron
1 sibling, 0 replies; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 13:55 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:51 +0000
James Morse <james.morse@arm.com> wrote:
> Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL.
> Let it rip^Wlink!
>
> ARCH_HAS_CPU_RESCTRL indicates resctrl can be enabled. It is enabled
> by the arch code sipmly because it has 'arch' in its name.
simply
>
> This removes ARM_CPU_RESCTRL as a mimic of X86_CPU_RESCTRL.
> While here, move the ACPI dependency to the driver's Kconfig file.
>
> Signed-off-by: James Morse <james.morse@arm.com>
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 29/38] arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (27 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 28/38] arm64: mpam: Select ARCH_HAS_CPU_RESCTRL James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 30/38] arm_mpam: resctrl: Call resctrl_exit() in the event of errors James Morse
` (9 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
Now that MPAM links against resctrl, call resctrl_init() to register
the filesystem and set up resctrl's structures.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 00455de3efe9..eb3caee45470 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1768,7 +1768,7 @@ int mpam_resctrl_setup(void)
pr_warn("Number of PMG is not a power of 2! resctrl may misbehave");
}
- /* TODO: call resctrl_init() */
+ err = resctrl_init();
}
return err;
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 30/38] arm_mpam: resctrl: Call resctrl_exit() in the event of errors
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (28 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 29/38] arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit James Morse
` (8 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
All of MPAM's errors indicate a software bug, e.g. an out-of-bounds
partid has been generated. When this happens, the mpam driver
is disabled.
If resctrl_init() succeeded, also call resctrl_exit() to
remove resctrl.
mpam_devices.c calls mpam_resctrl_teardown_class() when a class
becomes incomplete, and can no longer be used by resctrl. If
resctrl was using this class, then resctrl_exit() is called.
This in turn removes the kernfs hierarchy from the filesystem
and free()s memory that was allocated by resctrl.
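The "only undo what was set up, and only once" behaviour can be sketched in
user space as below. This is a simplified model, not the driver code:
fake_setup(), fake_resctrl_exit(), fake_mpam_resctrl_exit() and
fake_exit_calls() are hypothetical names, and the locking and
READ_ONCE()/WRITE_ONCE() annotations are left out.

```c
#include <stdbool.h>

static bool resctrl_enabled;	/* set only if the init step succeeded */
static unsigned int exit_calls;	/* counts calls to the stand-in exit */

/* Stand-in for resctrl_exit(); the real one removes the filesystem. */
void fake_resctrl_exit(void)
{
	exit_calls++;
}

/* Stand-in for the tail of setup: remember whether init worked. */
int fake_setup(int init_err)
{
	if (!init_err)
		resctrl_enabled = true;

	return init_err;
}

/* Safe to call from any error path, any number of times. */
void fake_mpam_resctrl_exit(void)
{
	if (!resctrl_enabled)
		return;

	resctrl_enabled = false;
	fake_resctrl_exit();
}

/* Accessor so a caller can observe how often the exit path really ran. */
unsigned int fake_exit_calls(void)
{
	return exit_calls;
}
```

In this model an error before init does nothing, and a second error after a
successful init does not call the exit path again.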
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 32 +++++++++++--
drivers/resctrl/mpam_internal.h | 4 ++
drivers/resctrl/mpam_resctrl.c | 80 +++++++++++++++++++++++++++++++++
3 files changed, 112 insertions(+), 4 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index fccebfd980d8..1334093fc03e 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -73,6 +73,14 @@ static DECLARE_WORK(mpam_broken_work, &mpam_disable);
/* When mpam is disabled, the printed reason to aid debugging */
static char *mpam_disable_reason;
+/*
+ * Whether resctrl has been setup. Used by cpuhp in preference to
+ * mpam_is_enabled(). The disable call after an error interrupt makes
+ * mpam_is_enabled() false before the cpuhp callbacks are made.
+ * Reads/writes should hold mpam_cpuhp_state_lock, (or be cpuhp callbacks).
+ */
+static bool mpam_resctrl_enabled;
+
/*
* An MSC is a physical container for controls and monitors, each identified by
* their RIS index. These share a base-address, interrupts and some MMIO
@@ -1627,7 +1635,7 @@ static int mpam_cpu_online(unsigned int cpu)
mpam_reprogram_msc(msc);
}
- if (mpam_is_enabled())
+ if (mpam_resctrl_enabled)
mpam_resctrl_online_cpu(cpu);
return 0;
@@ -1673,7 +1681,7 @@ static int mpam_cpu_offline(unsigned int cpu)
{
struct mpam_msc *msc;
- if (mpam_is_enabled())
+ if (mpam_resctrl_enabled)
mpam_resctrl_offline_cpu(cpu);
guard(srcu)(&mpam_srcu);
@@ -2535,6 +2543,7 @@ static void mpam_enable_once(void)
}
static_branch_enable(&mpam_enabled);
+ mpam_resctrl_enabled = true;
mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline,
"mpam:online");
@@ -2594,24 +2603,39 @@ void mpam_reset_class(struct mpam_class *class)
void mpam_disable(struct work_struct *ignored)
{
int idx;
+ bool do_resctrl_exit;
struct mpam_class *class;
struct mpam_msc *msc, *tmp;
+ if (mpam_is_enabled())
+ static_branch_disable(&mpam_enabled);
+
mutex_lock(&mpam_cpuhp_state_lock);
if (mpam_cpuhp_state) {
cpuhp_remove_state(mpam_cpuhp_state);
mpam_cpuhp_state = 0;
}
+
+ /*
+ * Removing the cpuhp state called mpam_cpu_offline() and told resctrl
+ * all the CPUs are offline.
+ */
+ do_resctrl_exit = mpam_resctrl_enabled;
+ mpam_resctrl_enabled = false;
mutex_unlock(&mpam_cpuhp_state_lock);
- static_branch_disable(&mpam_enabled);
+ if (do_resctrl_exit)
+ mpam_resctrl_exit();
mpam_unregister_irqs();
idx = srcu_read_lock(&mpam_srcu);
list_for_each_entry_srcu(class, &mpam_classes, classes_list,
- srcu_read_lock_held(&mpam_srcu))
+ srcu_read_lock_held(&mpam_srcu)) {
mpam_reset_class(class);
+ if (do_resctrl_exit)
+ mpam_resctrl_teardown_class(class);
+ }
srcu_read_unlock(&mpam_srcu, idx);
mutex_lock(&mpam_list_lock);
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 3a68ebd498fa..b13d5e55e701 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -450,12 +450,16 @@ int mpam_get_cpumask_from_cache_id(unsigned long cache_id, u32 cache_level,
#ifdef CONFIG_RESCTRL_FS
int mpam_resctrl_setup(void);
+void mpam_resctrl_exit(void);
int mpam_resctrl_online_cpu(unsigned int cpu);
void mpam_resctrl_offline_cpu(unsigned int cpu);
+void mpam_resctrl_teardown_class(struct mpam_class *class);
#else
static inline int mpam_resctrl_setup(void) { return 0; }
+static inline void mpam_resctrl_exit(void) { }
static inline int mpam_resctrl_online_cpu(unsigned int cpu) { return 0; }
static inline void mpam_resctrl_offline_cpu(unsigned int cpu) { }
+static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
#endif /* CONFIG_RESCTRL_FS */
/*
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index eb3caee45470..506063bd3348 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -52,6 +52,12 @@ static bool exposed_mon_capable;
*/
static bool cdp_enabled;
+/*
+ * If resctrl_init() succeeded, resctrl_exit() can be used to remove support
+ * for the filesystem in the event of an error.
+ */
+static bool resctrl_enabled;
+
/*
* L3 local/total may come from different classes - what is the number of MBWU
* 'on L3'?
@@ -310,6 +316,9 @@ static int resctrl_arch_mon_ctx_alloc_no_wait(enum resctrl_event_id evtid)
{
struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
if (!mon->class)
return -EINVAL;
@@ -352,6 +361,9 @@ static void resctrl_arch_mon_ctx_free_no_wait(enum resctrl_event_id evtid,
{
struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[evtid];
+ if (!mpam_is_enabled())
+ return;
+
if (!mon->class)
return;
@@ -449,6 +461,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
resctrl_arch_rmid_read_context_check();
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
if (eventid >= QOS_NUM_EVENTS || !mon->class)
return -EINVAL;
@@ -1343,6 +1358,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
lockdep_assert_cpus_held();
lockdep_assert_irqs_enabled();
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
/*
* No need to check the CPU as mpam_apply_config() doesn't care, and
* resctrl_arch_update_domains() relies on this.
@@ -1408,6 +1426,9 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
lockdep_assert_cpus_held();
lockdep_assert_irqs_enabled();
+ if (!mpam_is_enabled())
+ return -EINVAL;
+
list_for_each_entry(d, &r->ctrl_domains, hdr.list) {
for (t = 0; t < CDP_NUM_TYPES; t++) {
cfg = &d->staged_config[t];
@@ -1769,11 +1790,70 @@ int mpam_resctrl_setup(void)
}
err = resctrl_init();
+ if (!err)
+ WRITE_ONCE(resctrl_enabled, true);
}
return err;
}
+void mpam_resctrl_exit(void)
+{
+ if (!READ_ONCE(resctrl_enabled))
+ return;
+
+ WRITE_ONCE(resctrl_enabled, false);
+ resctrl_exit();
+}
+
+static void mpam_resctrl_teardown_mon(struct mpam_resctrl_mon *mon, struct mpam_class *class)
+{
+ u32 num_mbwu_mon = l3_num_allocated_mbwu;
+
+ if (!mon->mbwu_idx_to_mon)
+ return;
+
+ if (mon->assigned_counters) {
+ __free_mbwu_mon(class, mon->assigned_counters, num_mbwu_mon);
+ mon->assigned_counters = NULL;
+ kfree(mon->mbwu_idx_to_mon);
+ } else {
+ __free_mbwu_mon(class, mon->mbwu_idx_to_mon, num_mbwu_mon);
+ }
+ mon->mbwu_idx_to_mon = NULL;
+}
+
+/*
+ * The driver is detaching an MSC from this class, if resctrl was using it,
+ * pull on resctrl_exit().
+ */
+void mpam_resctrl_teardown_class(struct mpam_class *class)
+{
+ int i;
+ struct mpam_resctrl_res *res;
+ struct mpam_resctrl_mon *mon;
+
+ might_sleep();
+
+ for (i = 0; i < RDT_NUM_RESOURCES; i++) {
+ res = &mpam_resctrl_controls[i];
+ if (res->class == class) {
+ res->class = NULL;
+ break;
+ }
+ }
+ for (i = 0; i < QOS_NUM_EVENTS; i++) {
+ mon = &mpam_resctrl_counters[i];
+ if (mon->class == class) {
+ mon->class = NULL;
+
+ mpam_resctrl_teardown_mon(mon, class);
+
+ break;
+ }
+ }
+}
+
#ifdef CONFIG_MPAM_KUNIT_TEST
#include "test_mpam_resctrl.c"
#endif
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (29 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 30/38] arm_mpam: resctrl: Call resctrl_exit() in the event of errors James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-06 0:06 ` Fenghua Yu
2025-12-05 21:58 ` [RFC PATCH 32/38] arm_mpam: resctrl: Sort the order of the domain lists James Morse
` (7 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl's limbo code needs to be told when the data left in a cache is
small enough for the partid+pmg value to be re-allocated.
x86 uses the cache size divided by the number of rmid users the cache
may have. Do the same, but for the smallest cache, and with the
number of partid-and-pmg users.
Querying the cache size can't happen until after cacheinfo_sysfs_init()
has run, so mpam_resctrl_setup() must wait until then.
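The arithmetic can be sketched in user space as follows. This is only a
model of the calculation described above: struct fake_cache, struct
fake_limits and fake_update_limits() are invented for the example, and the
user count is passed in rather than queried from resctrl.

```c
#include <stdint.h>

/* Hypothetical description of one cache level exposed to resctrl. */
struct fake_cache {
	uint32_t size_bytes;
};

/* Models the realloc limit and threshold as a pair of plain integers. */
struct fake_limits {
	uint32_t limit;		/* size of the smallest cache seen so far */
	uint32_t threshold;	/* limit divided by the partid+pmg users */
};

/*
 * Track the smallest cache: the threshold is that cache's size divided
 * by the number of partid-and-pmg combinations that may occupy it.
 */
void fake_update_limits(struct fake_limits *lim, const struct fake_cache *c,
			uint32_t num_users)
{
	if (c->size_bytes == 0 || num_users == 0)
		return;

	if (lim->limit == 0 || c->size_bytes < lim->limit) {
		lim->limit = c->size_bytes;
		lim->threshold = c->size_bytes / num_users;
	}
}
```

For instance, with a 32MiB cache, a 1MiB cache and 256 partid+pmg
combinations, the model settles on a limit of 1MiB and a threshold of 4KiB
whichever order the caches are visited in.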
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 54 ++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 506063bd3348..ccdf8db742c9 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -16,6 +16,7 @@
#include <linux/resctrl.h>
#include <linux/slab.h>
#include <linux/types.h>
+#include <linux/wait.h>
#include <asm/mpam.h>
@@ -58,6 +59,13 @@ static bool cdp_enabled;
*/
static bool resctrl_enabled;
+/*
+ * mpam_resctrl_pick_caches() needs to know the size of the caches. cacheinfo
+ * populates this from a device_initcall(). mpam_resctrl_setup() must wait.
+ */
+static bool cacheinfo_ready;
+static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
+
/*
* L3 local/total may come from different classes - what is the number of MBWU
* 'on L3'?
@@ -584,6 +592,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
}
+/*
+ * The rmid realloc threshold should be for the smallest cache exposed to
+ * resctrl.
+ */
+static int update_rmid_limits(struct mpam_class *class)
+{
+ u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
+ struct mpam_props *cprops = &class->props;
+ struct cacheinfo *ci;
+
+ lockdep_assert_cpus_held();
+
+ /* Assume cache levels are the same size for all CPUs... */
+ ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
+ if (!ci || ci->size == 0) {
+ pr_debug("Could not read cache size for class %u\n",
+ class->level);
+ return -EINVAL;
+ }
+
+ if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
+ return 0;
+
+ if (!resctrl_rmid_realloc_limit ||
+ ci->size < resctrl_rmid_realloc_limit) {
+ resctrl_rmid_realloc_limit = ci->size;
+ resctrl_rmid_realloc_threshold = ci->size / num_unique_pmg;
+ }
+
+ return 0;
+}
+
static bool cache_has_usable_cpor(struct mpam_class *class)
{
struct mpam_props *cprops = &class->props;
@@ -1025,6 +1065,9 @@ static void mpam_resctrl_pick_counters(void)
/* CSU counters only make sense on a cache. */
switch (class->type) {
case MPAM_CLASS_CACHE:
+ if (update_rmid_limits(class))
+ continue;
+
counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
return;
default:
@@ -1731,6 +1774,8 @@ int mpam_resctrl_setup(void)
struct mpam_resctrl_res *res;
struct mpam_resctrl_mon *mon;
+ wait_event(wait_cacheinfo_ready, cacheinfo_ready);
+
cpus_read_lock();
for (i = 0; i < RDT_NUM_RESOURCES; i++) {
res = &mpam_resctrl_controls[i];
@@ -1854,6 +1899,15 @@ void mpam_resctrl_teardown_class(struct mpam_class *class)
}
}
+static int __init __cacheinfo_ready(void)
+{
+ cacheinfo_ready = true;
+ wake_up(&wait_cacheinfo_ready);
+
+ return 0;
+}
+device_initcall_sync(__cacheinfo_ready);
+
#ifdef CONFIG_MPAM_KUNIT_TEST
#include "test_mpam_resctrl.c"
#endif
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit
2025-12-05 21:58 ` [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit James Morse
@ 2025-12-06 0:06 ` Fenghua Yu
2025-12-09 16:36 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Fenghua Yu @ 2025-12-06 0:06 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, baisheng.gao, Jonathan Cameron, Gavin Shan,
Ben Horgan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi, James,
On 12/5/25 13:58, James Morse wrote:
> resctrl's limbo code needs to be told when the data left in a cache is
> small enough for the partid+pmg value to be re-allocated.
>
> x86 uses the cache size divided by the number of rmid users the cache
> may have. Do the same, but for the smallest cache, and with the
> number of partid-and-pmg users.
>
> Querying the cache size can't happen until after cacheinfo_sysfs_init()
> has run, so mpam_resctrl_setup() must wait until then.
>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_resctrl.c | 54 ++++++++++++++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 506063bd3348..ccdf8db742c9 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -16,6 +16,7 @@
> #include <linux/resctrl.h>
> #include <linux/slab.h>
> #include <linux/types.h>
> +#include <linux/wait.h>
>
> #include <asm/mpam.h>
>
> @@ -58,6 +59,13 @@ static bool cdp_enabled;
> */
> static bool resctrl_enabled;
>
> +/*
> + * mpam_resctrl_pick_caches() needs to know the size of the caches. cacheinfo
> + * populates this from a device_initcall(). mpam_resctrl_setup() must wait.
> + */
> +static bool cacheinfo_ready;
> +static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
> +
> /*
> * L3 local/total may come from different classes - what is the number of MBWU
> * 'on L3'?
> @@ -584,6 +592,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
> reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
> }
>
> +/*
> + * The rmid realloc threshold should be for the smallest cache exposed to
> + * resctrl.
> + */
> +static int update_rmid_limits(struct mpam_class *class)
> +{
> + u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
> + struct mpam_props *cprops = &class->props;
> + struct cacheinfo *ci;
> +
> + lockdep_assert_cpus_held();
> +
> + /* Assume cache levels are the same size for all CPUs... */
> + ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
> + if (!ci || ci->size == 0) {
> + pr_debug("Could not read cache size for class %u\n",
> + class->level);
> + return -EINVAL;
> + }
> +
> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
> + return 0;
> +
> + if (!resctrl_rmid_realloc_limit ||
> + ci->size < resctrl_rmid_realloc_limit) {
> + resctrl_rmid_realloc_limit = ci->size;
> + resctrl_rmid_realloc_threshold = ci->size / num_unique_pmg;
> + }
> +
> + return 0;
> +}
> +
> static bool cache_has_usable_cpor(struct mpam_class *class)
> {
> struct mpam_props *cprops = &class->props;
> @@ -1025,6 +1065,9 @@ static void mpam_resctrl_pick_counters(void)
> /* CSU counters only make sense on a cache. */
> switch (class->type) {
> case MPAM_CLASS_CACHE:
> + if (update_rmid_limits(class))
> + continue;
> +
> counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
> return;
> default:
> @@ -1731,6 +1774,8 @@ int mpam_resctrl_setup(void)
> struct mpam_resctrl_res *res;
> struct mpam_resctrl_mon *mon;
>
> + wait_event(wait_cacheinfo_ready, cacheinfo_ready);
This may hang the system if any hw/fw issue causes cacheinfo to fail.
Instead of hanging, would it be better to wait with a timeout here? Like
wait_event_timeout(wait_cacheinfo_ready, cacheinfo_ready, 5 * HZ);
and report a failure when cacheinfo is not ready.
Thanks.
-Fenghua
* Re: [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit
2025-12-06 0:06 ` Fenghua Yu
@ 2025-12-09 16:36 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 16:36 UTC (permalink / raw)
To: Fenghua Yu, James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, baisheng.gao, Jonathan Cameron, Gavin Shan,
rohit.mathew, reinette.chatre, Punit Agrawal
Hi Fenghua,
On 12/6/25 00:06, Fenghua Yu wrote:
> Hi, James,
>
> On 12/5/25 13:58, James Morse wrote:
>> resctrl's limbo code needs to be told when the data left in a cache is
>> small enough for the partid+pmg value to be re-allocated.
>>
>> x86 uses the cache size divided by the number of rmid users the cache
>> may have. Do the same, but for the smallest cache, and with the
>> number of partid-and-pmg users.
>>
>> Querying the cache size can't happen until after cacheinfo_sysfs_init()
>> has run, so mpam_resctrl_setup() must wait until then.
>>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> drivers/resctrl/mpam_resctrl.c | 54 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 54 insertions(+)
>>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/
>> mpam_resctrl.c
>> index 506063bd3348..ccdf8db742c9 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -16,6 +16,7 @@
>> #include <linux/resctrl.h>
>> #include <linux/slab.h>
>> #include <linux/types.h>
>> +#include <linux/wait.h>
>> #include <asm/mpam.h>
>> @@ -58,6 +59,13 @@ static bool cdp_enabled;
>> */
>> static bool resctrl_enabled;
>> +/*
>> + * mpam_resctrl_pick_caches() needs to know the size of the caches.
>> cacheinfo
>> + * populates this from a device_initcall(). mpam_resctrl_setup() must
>> wait.
>> + */
>> +static bool cacheinfo_ready;
>> +static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
>> +
>> /*
>> * L3 local/total may come from different classes - what is the
>> number of MBWU
>> * 'on L3'?
>> @@ -584,6 +592,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource
>> *r, struct rdt_mon_domain *d,
>> reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
>> }
>> +/*
>> + * The rmid realloc threshold should be for the smallest cache
>> exposed to
>> + * resctrl.
>> + */
>> +static int update_rmid_limits(struct mpam_class *class)
>> +{
>> + u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
>> + struct mpam_props *cprops = &class->props;
>> + struct cacheinfo *ci;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + /* Assume cache levels are the same size for all CPUs... */
>> + ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
>> + if (!ci || ci->size == 0) {
>> + pr_debug("Could not read cache size for class %u\n",
>> + class->level);
>> + return -EINVAL;
>> + }
>> +
>> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
>> + return 0;
>> +
>> + if (!resctrl_rmid_realloc_limit ||
>> + ci->size < resctrl_rmid_realloc_limit) {
>> + resctrl_rmid_realloc_limit = ci->size;
>> + resctrl_rmid_realloc_threshold = ci->size / num_unique_pmg;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static bool cache_has_usable_cpor(struct mpam_class *class)
>> {
>> struct mpam_props *cprops = &class->props;
>> @@ -1025,6 +1065,9 @@ static void mpam_resctrl_pick_counters(void)
>> /* CSU counters only make sense on a cache. */
>> switch (class->type) {
>> case MPAM_CLASS_CACHE:
>> + if (update_rmid_limits(class))
>> + continue;
>> +
>> counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
>> return;
>> default:
>> @@ -1731,6 +1774,8 @@ int mpam_resctrl_setup(void)
>> struct mpam_resctrl_res *res;
>> struct mpam_resctrl_mon *mon;
>> + wait_event(wait_cacheinfo_ready, cacheinfo_ready);
>
> This may hang the system if any hw/fw issue causes cacheinfo to fail.
> Instead of hanging, would it be better to wait with a timeout here? Like
> wait_event_timeout(wait_cacheinfo_ready, cacheinfo_ready, 5 * HZ);
> and report a failure when cacheinfo is not ready.
This is just waiting on everything in device_initcall(). I think we've
got bigger problems if that doesn't finish.
>
> Thanks.
>
> -Fenghua
Thanks,
Ben
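
The limit calculation in the quoted patch can be modelled in userspace
as below. This is only a sketch: update_limits() and the two globals are
illustrative stand-ins, the cache size and the number of partid-and-pmg
users are passed in by hand instead of coming from cacheinfo, and none of
the locking or feature checks from the patch are reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the two resctrl limits the patch updates. */
uint32_t rmid_realloc_limit;
uint32_t rmid_realloc_threshold;

/*
 * Track the smallest cache seen so far and derive the share of it that a
 * single partid-and-pmg user may still occupy before being re-allocated.
 * Returns 0 on success, -1 if the inputs are unusable.
 */
int update_limits(uint32_t cache_size, uint32_t num_rmid_idx)
{
	if (cache_size == 0 || num_rmid_idx == 0)
		return -1;

	if (rmid_realloc_limit == 0 || cache_size < rmid_realloc_limit) {
		rmid_realloc_limit = cache_size;
		rmid_realloc_threshold = cache_size / num_rmid_idx;
	}

	return 0;
}
```

Calling update_limits() once per cache leaves the limit at the size of
the smallest cache, whichever order the caches are visited in.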
* [RFC PATCH 32/38] arm_mpam: resctrl: Sort the order of the domain lists
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (30 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-05 21:58 ` [RFC PATCH 33/38] arm_mpam: Generate a configuration for min controls James Morse
` (6 subsequent siblings)
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
resctrl documents that the domains appear in numeric order in the
schemata file. This means a little more work is needed when bringing
a domain online.
Add the support for this, using resctrl_find_domain() to find the
point to insert in the list.
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_resctrl.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index ccdf8db742c9..84ef3c1b4f53 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1547,6 +1547,21 @@ static struct mpam_component *find_component(struct mpam_class *victim, int cpu)
return NULL;
}
+static void mpam_resctrl_domain_insert(struct list_head *list,
+ struct rdt_domain_hdr *new)
+{
+ struct rdt_domain_hdr *err;
+ struct list_head *pos = NULL;
+
+ lockdep_assert_held(&domain_list_lock);
+
+ err = resctrl_find_domain(list, new->id, &pos);
+ if (WARN_ON_ONCE(err))
+ return;
+
+ list_add_tail_rcu(&new->list, pos);
+}
+
static struct mpam_resctrl_dom *
mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
{
@@ -1584,8 +1599,7 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
ctrl_d = &dom->resctrl_ctrl_dom;
mpam_resctrl_domain_hdr_init(cpu, ctrl_comp, &ctrl_d->hdr);
ctrl_d->hdr.type = RESCTRL_CTRL_DOMAIN;
- /* TODO: this list should be sorted */
- list_add_tail_rcu(&ctrl_d->hdr.list, &r->ctrl_domains);
+ mpam_resctrl_domain_insert(&r->ctrl_domains, &ctrl_d->hdr);
err = resctrl_online_ctrl_domain(r, ctrl_d);
if (err) {
dom = ERR_PTR(err);
@@ -1623,8 +1637,7 @@ mpam_resctrl_alloc_domain(unsigned int cpu, struct mpam_resctrl_res *res)
mon_d = &dom->resctrl_mon_dom;
mpam_resctrl_domain_hdr_init(cpu, any_mon_comp, &mon_d->hdr);
mon_d->hdr.type = RESCTRL_MON_DOMAIN;
- /* TODO: this list should be sorted */
- list_add_tail_rcu(&mon_d->hdr.list, &r->mon_domains);
+ mpam_resctrl_domain_insert(&r->mon_domains, &mon_d->hdr);
err = resctrl_online_mon_domain(r, mon_d);
if (err) {
dom = ERR_PTR(err);
--
2.39.5
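
The find-then-insert pattern used above can be modelled in userspace as
below. This is a sketch under assumptions: struct dom, dom_find() and
dom_insert_sorted() are made-up names, the list is a plain circular
doubly linked list, and there is no RCU or locking.

```c
#include <assert.h>
#include <stddef.h>

struct dom {
	int id;
	struct dom *prev, *next;
};

/* Initialise an empty circular list; the head itself carries no id. */
void dom_list_init(struct dom *head)
{
	head->prev = head->next = head;
}

/*
 * Look for the entry with @id. If there is none, *pos is set to the node
 * the new entry belongs in front of (the head if @id is the largest).
 */
struct dom *dom_find(struct dom *head, int id, struct dom **pos)
{
	struct dom *d;

	for (d = head->next; d != head; d = d->next) {
		if (d->id == id)
			return d;
		if (d->id > id)
			break;
	}

	*pos = d;
	return NULL;
}

/* Insert @entry in ascending id order. Returns -1 on a duplicate id. */
int dom_insert_sorted(struct dom *head, struct dom *entry)
{
	struct dom *pos = NULL;

	if (dom_find(head, entry->id, &pos))
		return -1;

	/* Same effect as a tail-add relative to pos: link in just before it. */
	entry->prev = pos->prev;
	entry->next = pos;
	pos->prev->next = entry;
	pos->prev = entry;

	return 0;
}
```

Domains brought online in any order end up in numeric order, and a
duplicate id is rejected instead of being linked in twice.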
* [RFC PATCH 33/38] arm_mpam: Generate a configuration for min controls
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (31 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 32/38] arm_mpam: resctrl: Sort the order of the domain lists James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 16:45 ` Ben Horgan
2025-12-05 21:58 ` [RFC PATCH 34/38] arm_mpam: Add quirk framework James Morse
` (5 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal, Zeng Heng
MPAM supports a minimum and maximum control for memory bandwidth. The
purpose of the minimum control is to give priority to tasks that are
below their minimum value. Resctrl only provides one value for the
bandwidth configuration, which is used for the maximum.
The minimum control is always programmed to zero on hardware that
supports it.
Generate a minimum bandwidth value that is 5% lower than the
value provided by resctrl. This means tasks that are not
receiving their target bandwidth can be prioritised by the
hardware.
CC: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 68 +++++++++++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 2 +
drivers/resctrl/test_mpam_devices.c | 66 ++++++++++++++++++++++++++++
3 files changed, 132 insertions(+), 4 deletions(-)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 1334093fc03e..741e14e1e6cf 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -721,6 +721,13 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
mpam_set_feature(mpam_feat_mbw_part, props);
props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
+
+ /*
+ * The BWA_WD field can represent 0-63, but the control fields it
+ * describes have a maximum of 16 bits.
+ */
+ props->bwa_wd = min(props->bwa_wd, 16);
+
if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
mpam_set_feature(mpam_feat_mbw_max, props);
@@ -1387,7 +1394,7 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
mpam_has_feature(mpam_feat_mbw_min, cfg))
- mpam_write_partsel_reg(msc, MBW_MIN, 0);
+ mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
mpam_has_feature(mpam_feat_mbw_max, cfg)) {
@@ -2693,24 +2700,77 @@ static bool mpam_update_config(struct mpam_config *cfg,
maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
+ maybe_update_config(cfg, mpam_feat_mbw_min, newcfg, mbw_min, has_changes);
return has_changes;
}
+static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
+{
+ struct mpam_props *cprops = &class->props;
+ u16 min, min_hw_granule, delta;
+ u16 max_hw_value, res0_bits;
+
+ /*
+ * MAX and MIN should be set together. If only one is provided,
+ * generate a configuration for the other. If only one control
+ * type is supported, the other value will be ignored.
+ *
+ * Resctrl can only configure the MAX.
+ */
+ if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
+ !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
+ /*
+ * Calculate the values the 'min' control can hold.
+ * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff
+ * because those bits are RES0. Configurations of this value
+ * are effectively zero. But configurations need to saturate
+ * at min_hw_granule on systems with mismatched bwa_wd, where
+ * the 'less than 0' values are implemented on some MSC, but
+ * not others.
+ */
+ res0_bits = 16 - cprops->bwa_wd;
+ max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
+ min_hw_granule = ~max_hw_value;
+
+ delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
+ if (cfg->mbw_max > delta)
+ min = cfg->mbw_max - delta;
+ else
+ min = 0;
+
+ cfg->mbw_min = max(min, min_hw_granule);
+ mpam_set_feature(mpam_feat_mbw_min, cfg);
+ }
+}
+
int mpam_apply_config(struct mpam_component *comp, u16 partid,
- struct mpam_config *cfg)
+ struct mpam_config *user_cfg)
{
struct mpam_write_config_arg arg;
struct mpam_msc_ris *ris;
+ struct mpam_config cfg;
struct mpam_vmsc *vmsc;
struct mpam_msc *msc;
lockdep_assert_cpus_held();
+
/* Don't pass in the current config! */
- WARN_ON_ONCE(&comp->cfg[partid] == cfg);
+ WARN_ON_ONCE(&comp->cfg[partid] == user_cfg);
- if (!mpam_update_config(&comp->cfg[partid], cfg))
+ /*
+ * Copy the config to avoid writing back the 'extended' version to
+ * the caller.
+ * This avoids mpam_devices.c setting a mbw_min that mpam_resctrl.c
+ * is unaware of ... when it then changes mbw_max to be lower than
+ * mbw_min.
+ */
+ cfg = *user_cfg;
+
+ mpam_extend_config(comp->class, &cfg);
+
+ if (!mpam_update_config(&comp->cfg[partid], &cfg))
return 0;
arg.comp = comp;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index b13d5e55e701..d381906545ed 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -277,6 +277,7 @@ struct mpam_config {
u32 cpbm;
u32 mbw_pbm;
u16 mbw_max;
+ u16 mbw_min;
bool reset_cpbm;
bool reset_mbw_pbm;
@@ -618,6 +619,7 @@ static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
* MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
* register
*/
+#define MPAMCFG_MBW_MAX_MAX_NR_BITS 16
#define MPAMCFG_MBW_MAX_MAX GENMASK(15, 0)
#define MPAMCFG_MBW_MAX_HARDLIM BIT(31)
diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
index 3e8d564a0c64..2f802fd9f249 100644
--- a/drivers/resctrl/test_mpam_devices.c
+++ b/drivers/resctrl/test_mpam_devices.c
@@ -322,6 +322,71 @@ static void test_mpam_enable_merge_features(struct kunit *test)
mutex_unlock(&mpam_list_lock);
}
+static void test_mpam_extend_config(struct kunit *test)
+{
+ struct mpam_config fake_cfg = { };
+ struct mpam_class fake_class = { };
+
+ /* Configurations with both are not modified */
+ fake_class.props.bwa_wd = 16;
+ fake_cfg.mbw_max = 0xfeef;
+ fake_cfg.mbw_min = 0xfeef;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_set_feature(mpam_feat_mbw_min, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xfeef);
+
+ /* When a min is missing, it is generated */
+ fake_class.props.bwa_wd = 16;
+ fake_cfg.mbw_max = 0xfeef;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf224);
+
+ fake_class.props.bwa_wd = 8;
+ fake_cfg.mbw_max = 0xfeef;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf224);
+
+ /* 5% below the minimum granule, is still the minimum granule */
+ fake_class.props.bwa_wd = 12;
+ fake_cfg.mbw_max = 0xf;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xf);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf);
+
+ fake_class.props.bwa_wd = 16;
+ fake_cfg.mbw_max = 0x4;
+ fake_cfg.mbw_min = 0;
+ bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
+ mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
+ mpam_extend_config(&fake_class, &fake_cfg);
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
+ KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0x4);
+ KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0x0);
+}
+
static void test_mpam_reset_msc_bitmap(struct kunit *test)
{
char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
@@ -378,6 +443,7 @@ static struct kunit_case mpam_devices_test_cases[] = {
KUNIT_CASE(test_mpam_reset_msc_bitmap),
KUNIT_CASE(test_mpam_enable_merge_features),
KUNIT_CASE(test__props_mismatch),
+ KUNIT_CASE(test_mpam_extend_config),
{}
};
--
2.39.5
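
The arithmetic in mpam_extend_config() can be checked in userspace with
the sketch below. derive_mbw_min() and MBW_MAX_MAX are illustrative names
that mirror the patch's calculation; the feature-bit handling is left
out, and bwa_wd is assumed to be in the range 1..16.

```c
#include <assert.h>
#include <stdint.h>

#define MBW_MAX_MAX 0xffffu	/* full-scale value of the 16-bit field */

/*
 * Derive a 'min' value from the 'max' value that was supplied.
 * @bwa_wd: number of implemented bits in the control field, 1..16.
 */
uint16_t derive_mbw_min(uint16_t mbw_max, unsigned int bwa_wd)
{
	unsigned int res0_bits;
	uint16_t max_hw_value, min_hw_granule, delta, min;

	assert(bwa_wd >= 1 && bwa_wd <= 16);

	/* The top bwa_wd bits are implemented; the low bits are RES0. */
	res0_bits = 16 - bwa_wd;
	max_hw_value = (uint16_t)(((1u << bwa_wd) - 1) << res0_bits);
	min_hw_granule = (uint16_t)~max_hw_value;

	/* Roughly 5% of the full-scale value: 3275 for a 16-bit field. */
	delta = (uint16_t)((5 * MBW_MAX_MAX) / 100 - 1);

	min = mbw_max > delta ? (uint16_t)(mbw_max - delta) : 0;

	/* Saturate at the smallest value the hardware can distinguish. */
	return min > min_hw_granule ? min : min_hw_granule;
}
```

For mbw_max = 0xfeef the delta is 0xccb, so the generated min is 0xf224,
which matches the value the KUnit test expects. Note that the delta is 5%
of full scale, not 5% of the configured max.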
* Re: [RFC PATCH 33/38] arm_mpam: Generate a configuration for min controls
2025-12-05 21:58 ` [RFC PATCH 33/38] arm_mpam: Generate a configuration for min controls James Morse
@ 2025-12-09 16:45 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 16:45 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal,
Zeng Heng
Hi James,
On 12/5/25 21:58, James Morse wrote:
> MPAM supports a minimum and maximum control for memory bandwidth. The
> purpose of the minimum control is to give priority to tasks that are
> below their minimum value. Resctrl only provides one value for the
> bandwidth configuration, which is used for the maximum.
>
> The minimum control is always programmed to zero on hardware that
> supports it.
>
> Generate a minimum bandwidth value that is 5% lower than the
> value provided by resctrl. This means tasks that are not
> receiving their target bandwidth can be prioritised by the
> hardware.
To ensure that the min is always programmed, we need to add a
reset_mbw_min to the reset_cfg for the RIS-level reset and provide a
value in mpam_reset_component_cfg() for the component-level reset.
>
> CC: Zeng Heng <zengheng4@huawei.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 68 +++++++++++++++++++++++++++--
> drivers/resctrl/mpam_internal.h | 2 +
> drivers/resctrl/test_mpam_devices.c | 66 ++++++++++++++++++++++++++++
> 3 files changed, 132 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 1334093fc03e..741e14e1e6cf 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -721,6 +721,13 @@ static void mpam_ris_hw_probe(struct mpam_msc_ris *ris)
> mpam_set_feature(mpam_feat_mbw_part, props);
>
> props->bwa_wd = FIELD_GET(MPAMF_MBW_IDR_BWA_WD, mbw_features);
> +
> + /*
> + * The BWA_WD field can represent 0-63, but the control fields it
> + * describes have a maximum of 16 bits.
> + */
> + props->bwa_wd = min(props->bwa_wd, 16);
> +
> if (props->bwa_wd && FIELD_GET(MPAMF_MBW_IDR_HAS_MAX, mbw_features))
> mpam_set_feature(mpam_feat_mbw_max, props);
>
> @@ -1387,7 +1394,7 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
>
> if (mpam_has_feature(mpam_feat_mbw_min, rprops) &&
> mpam_has_feature(mpam_feat_mbw_min, cfg))
> - mpam_write_partsel_reg(msc, MBW_MIN, 0);
> + mpam_write_partsel_reg(msc, MBW_MIN, cfg->mbw_min);
>
> if (mpam_has_feature(mpam_feat_mbw_max, rprops) &&
> mpam_has_feature(mpam_feat_mbw_max, cfg)) {
> @@ -2693,24 +2700,77 @@ static bool mpam_update_config(struct mpam_config *cfg,
> maybe_update_config(cfg, mpam_feat_cpor_part, newcfg, cpbm, has_changes);
> maybe_update_config(cfg, mpam_feat_mbw_part, newcfg, mbw_pbm, has_changes);
> maybe_update_config(cfg, mpam_feat_mbw_max, newcfg, mbw_max, has_changes);
> + maybe_update_config(cfg, mpam_feat_mbw_min, newcfg, mbw_min, has_changes);
>
> return has_changes;
> }
>
> +static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg)
> +{
> + struct mpam_props *cprops = &class->props;
> + u16 min, min_hw_granule, delta;
> + u16 max_hw_value, res0_bits;
> +
> + /*
> + * MAX and MIN should be set together. If only one is provided,
> + * generate a configuration for the other. If only one control
> + * type is supported, the other value will be ignored.
> + *
> + * Resctrl can only configure the MAX.
> + */
> + if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
> + !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
> + /*
> + * Calculate the values the 'min' control can hold.
> + * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff
> + * because those bits are RES0. Configurations of this value
> + * are effectively zero. But configurations need to saturate
> + * at min_hw_granule on systems with mismatched bwa_wd, where
> + * the 'less than 0' values are implemented on some MSC, but
> + * not others.
> + */
> + res0_bits = 16 - cprops->bwa_wd;
> + max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
> + min_hw_granule = ~max_hw_value;
> +
> + delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
> + if (cfg->mbw_max > delta)
> + min = cfg->mbw_max - delta;
> + else
> + min = 0;
> +
> + cfg->mbw_min = max(min, min_hw_granule);
> + mpam_set_feature(mpam_feat_mbw_min, cfg);
> + }
> +}
> +
> int mpam_apply_config(struct mpam_component *comp, u16 partid,
> - struct mpam_config *cfg)
> + struct mpam_config *user_cfg)
> {
> struct mpam_write_config_arg arg;
> struct mpam_msc_ris *ris;
> + struct mpam_config cfg;
> struct mpam_vmsc *vmsc;
> struct mpam_msc *msc;
>
> lockdep_assert_cpus_held();
>
> +
> /* Don't pass in the current config! */
> - WARN_ON_ONCE(&comp->cfg[partid] == cfg);
> + WARN_ON_ONCE(&comp->cfg[partid] == user_cfg);
>
> - if (!mpam_update_config(&comp->cfg[partid], cfg))
> + /*
> + * Copy the config to avoid writing back the 'extended' version to
> + * the caller.
> + * This avoids mpam_devices.c setting a mbw_min that mpam_resctrl.c
> + * is unaware of ... when it then changes mbw_max to be lower than
> + * mbw_min.
> + */
> + cfg = *user_cfg;
> +
> + mpam_extend_config(comp->class, &cfg);
> +
> + if (!mpam_update_config(&comp->cfg[partid], &cfg))
> return 0;
>
> arg.comp = comp;
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index b13d5e55e701..d381906545ed 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -277,6 +277,7 @@ struct mpam_config {
> u32 cpbm;
> u32 mbw_pbm;
> u16 mbw_max;
> + u16 mbw_min;
>
> bool reset_cpbm;
> bool reset_mbw_pbm;
> @@ -618,6 +619,7 @@ static inline void mpam_resctrl_teardown_class(struct mpam_class *class) { }
> * MPAMCFG_MBW_MAX - MPAM memory maximum bandwidth partitioning configuration
> * register
> */
> +#define MPAMCFG_MBW_MAX_MAX_NR_BITS 16
> #define MPAMCFG_MBW_MAX_MAX GENMASK(15, 0)
> #define MPAMCFG_MBW_MAX_HARDLIM BIT(31)
>
> diff --git a/drivers/resctrl/test_mpam_devices.c b/drivers/resctrl/test_mpam_devices.c
> index 3e8d564a0c64..2f802fd9f249 100644
> --- a/drivers/resctrl/test_mpam_devices.c
> +++ b/drivers/resctrl/test_mpam_devices.c
> @@ -322,6 +322,71 @@ static void test_mpam_enable_merge_features(struct kunit *test)
> mutex_unlock(&mpam_list_lock);
> }
>
> +static void test_mpam_extend_config(struct kunit *test)
> +{
> + struct mpam_config fake_cfg = { };
> + struct mpam_class fake_class = { };
> +
> + /* Configurations with both are not modified */
> + fake_class.props.bwa_wd = 16;
> + fake_cfg.mbw_max = 0xfeef;
> + fake_cfg.mbw_min = 0xfeef;
> + bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
> + mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
> + mpam_set_feature(mpam_feat_mbw_min, &fake_cfg);
> + mpam_extend_config(&fake_class, &fake_cfg);
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xfeef);
> +
> + /* When a min is missing, it is generated */
> + fake_class.props.bwa_wd = 16;
> + fake_cfg.mbw_max = 0xfeef;
> + fake_cfg.mbw_min = 0;
> + bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
> + mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
> + mpam_extend_config(&fake_class, &fake_cfg);
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf224);
> +
> + fake_class.props.bwa_wd = 8;
> + fake_cfg.mbw_max = 0xfeef;
> + fake_cfg.mbw_min = 0;
> + bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
> + mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
> + mpam_extend_config(&fake_class, &fake_cfg);
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xfeef);
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf224);
> +
> + /* 5% below the minimum granule, is still the minimum granule */
> + fake_class.props.bwa_wd = 12;
> + fake_cfg.mbw_max = 0xf;
> + fake_cfg.mbw_min = 0;
> + bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
> + mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
> + mpam_extend_config(&fake_class, &fake_cfg);
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0xf);
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0xf);
> +
> + fake_class.props.bwa_wd = 16;
> + fake_cfg.mbw_max = 0x4;
> + fake_cfg.mbw_min = 0;
> + bitmap_zero(fake_cfg.features, MPAM_FEATURE_LAST);
> + mpam_set_feature(mpam_feat_mbw_max, &fake_cfg);
> + mpam_extend_config(&fake_class, &fake_cfg);
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_max, &fake_cfg));
> + KUNIT_EXPECT_TRUE(test, mpam_has_feature(mpam_feat_mbw_min, &fake_cfg));
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_max, 0x4);
> + KUNIT_EXPECT_EQ(test, fake_cfg.mbw_min, 0x0);
> +}
> +
> static void test_mpam_reset_msc_bitmap(struct kunit *test)
> {
> char __iomem *buf = kunit_kzalloc(test, SZ_16K, GFP_KERNEL);
> @@ -378,6 +443,7 @@ static struct kunit_case mpam_devices_test_cases[] = {
> KUNIT_CASE(test_mpam_reset_msc_bitmap),
> KUNIT_CASE(test_mpam_enable_merge_features),
> KUNIT_CASE(test__props_mismatch),
> + KUNIT_CASE(test_mpam_extend_config),
> {}
> };
>
Thanks,
Ben
* [RFC PATCH 34/38] arm_mpam: Add quirk framework
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (32 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 33/38] arm_mpam: Generate a configuration for min controls James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-18 14:04 ` Jonathan Cameron
2025-12-05 21:58 ` [RFC PATCH 35/38] arm_mpam: Add workaround for T241-MPAM-1 James Morse
` (4 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
From: Shanker Donthineni <sdonthineni@nvidia.com>
The MPAM specification includes the MPAMF_IIDR, which serves to
uniquely identify the MSC implementation through a combination of
implementer details, product ID, variant, and revision. Certain
hardware issues/errata can be resolved using software workarounds.
Introduce a quirk framework to allow workarounds to be enabled based
on the MPAMF_IIDR value.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: Stash the IIDR so this doesn't need an IPI, enable quirks only
once, move the description to the callback so it can be pr_once()d, add
an enum of workarounds for popular errata. Add macros for making lists
of product/revision/vendor half readable ]
Signed-off-by: James Morse <james.morse@arm.com>
---
drivers/resctrl/mpam_devices.c | 27 +++++++++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 26 ++++++++++++++++++++++++++
2 files changed, 53 insertions(+)
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 741e14e1e6cf..f0f6f9b55ad4 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -630,6 +630,25 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
return ERR_PTR(-ENOENT);
}
+static const struct mpam_quirk mpam_quirks[] = {
+ { NULL }, /* Sentinel */
+};
+
+static void mpam_enable_quirks(struct mpam_msc *msc)
+{
+ const struct mpam_quirk *quirk;
+
+ for (quirk = &mpam_quirks[0]; quirk->iidr_mask; quirk++) {
+ if (quirk->iidr != (msc->iidr & quirk->iidr_mask))
+ continue;
+
+ if (quirk->init)
+ quirk->init(msc, quirk);
+ else
+ mpam_set_quirk(quirk->workaround, msc);
+ }
+}
+
/*
* IHI009A.a has this nugget: "If a monitor does not support automatic behaviour
* of NRDY, software can use this bit for any purpose" - so hardware might not
@@ -864,8 +883,11 @@ static int mpam_msc_hw_probe(struct mpam_msc *msc)
/* Grab an IDR value to find out how many RIS there are */
mutex_lock(&msc->part_sel_lock);
idr = mpam_msc_read_idr(msc);
+ msc->iidr = mpam_read_partsel_reg(msc, IIDR);
mutex_unlock(&msc->part_sel_lock);
+ mpam_enable_quirks(msc);
+
msc->ris_max = FIELD_GET(MPAMF_IDR_RIS_MAX, idr);
/* Use these values so partid/pmg always starts with a valid value */
@@ -1987,6 +2009,7 @@ static bool mpam_has_cmax_wd_feature(struct mpam_props *props)
* resulting safe value must be compatible with both. When merging values in
* the tree, all the aliasing resources must be handled first.
* On mismatch, parent is modified.
+ * Quirks on an MSC will apply to all MSC in that class.
*/
static void __props_mismatch(struct mpam_props *parent,
struct mpam_props *child, bool alias)
@@ -2106,6 +2129,7 @@ static void __props_mismatch(struct mpam_props *parent,
* nobble the class feature, as we can't configure all the resources.
* e.g. The L3 cache is composed of two resources with 13 and 17 portion
* bitmaps respectively.
+ * Quirks on an MSC will apply to all MSC in that class.
*/
static void
__class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
@@ -2119,6 +2143,9 @@ __class_props_mismatch(struct mpam_class *class, struct mpam_vmsc *vmsc)
dev_dbg(dev, "Merging features for class:0x%lx &= vmsc:0x%lx\n",
(long)cprops->features, (long)vprops->features);
+ /* Merge quirks */
+ class->quirks |= vmsc->msc->quirks;
+
/* Take the safe value for any common features */
__props_mismatch(cprops, vprops, false);
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index d381906545ed..de3e5faa12b2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -88,6 +88,8 @@ struct mpam_msc {
u8 pmg_max;
unsigned long ris_idxs;
u32 ris_max;
+ u32 iidr;
+ u16 quirks;
/*
* error_irq_lock is taken when registering/unregistering the error
@@ -215,6 +217,29 @@ struct mpam_props {
#define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
#define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
+/* Workaround bits for msc->quirks */
+enum mpam_device_quirks {
+ MPAM_QUIRK_LAST,
+};
+
+#define mpam_has_quirk(_quirk, x) ((1 << (_quirk) & (x)->quirks))
+#define mpam_set_quirk(_quirk, x) ((x)->quirks |= (1 << (_quirk)))
+
+struct mpam_quirk {
+ void (*init)(struct mpam_msc *msc, const struct mpam_quirk *quirk);
+
+ u32 iidr;
+ u32 iidr_mask;
+
+ enum mpam_device_quirks workaround;
+};
+
+#define MPAM_IIDR_MATCH_ONE FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0xfff) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0xf ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
+
+
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
COUNT_BOTH = 0,
@@ -258,6 +283,7 @@ struct mpam_class {
struct mpam_props props;
u32 nrdy_usec;
+ u16 quirks;
u8 level;
enum mpam_class_types type;
--
2.39.5
* Re: [RFC PATCH 34/38] arm_mpam: Add quirk framework
2025-12-05 21:58 ` [RFC PATCH 34/38] arm_mpam: Add quirk framework James Morse
@ 2025-12-18 14:04 ` Jonathan Cameron
2025-12-19 12:19 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Jonathan Cameron @ 2025-12-18 14:04 UTC (permalink / raw)
To: James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
On Fri, 5 Dec 2025 21:58:57 +0000
James Morse <james.morse@arm.com> wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> The MPAM specification includes the MPAMF_IIDR, which serves to
> uniquely identify the MSC implementation through a combination of
> implementer details, product ID, variant, and revision. Certain
> hardware issues/errata can be resolved using software workarounds.
>
> Introduce a quirk framework to allow workarounds to be enabled based
> on the MPAMF_IIDR value.
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> [ morse: Stash the IIDR so this doesn't need an IPI, enable quirks only
> once, move the description to the callback so it can be pr_once()d, add
> an enum of workarounds for popular errata. Add macros for making lists
> of product/revision/vendor half readable ]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> drivers/resctrl/mpam_devices.c | 27 +++++++++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 26 ++++++++++++++++++++++++++
> 2 files changed, 53 insertions(+)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 741e14e1e6cf..f0f6f9b55ad4 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -630,6 +630,25 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> return ERR_PTR(-ENOENT);
> }
>
> +static const struct mpam_quirk mpam_quirks[] = {
> + { NULL }, /* Sentinel */
Drop the trailing , given I assume whole point is nothing after this?
> +};
> +
> +static void mpam_enable_quirks(struct mpam_msc *msc)
> +{
> + const struct mpam_quirk *quirk;
> +
> + for (quirk = &mpam_quirks[0]; quirk->iidr_mask; quirk++) {
> + if (quirk->iidr != (msc->iidr & quirk->iidr_mask))
> + continue;
> +
> + if (quirk->init)
> + quirk->init(msc, quirk);
I'm curious why you don't return a bool from this and call
mpam_set_quirk() if that's not indicating it should not be set.
Seems a bit odd to push the tracking that is relevant to the generic
framework (mpam_set_quirk) down into the particular quirk inits.
> + else
> + mpam_set_quirk(quirk->workaround, msc);
> + }
> +}
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index d381906545ed..de3e5faa12b2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -88,6 +88,8 @@ struct mpam_msc {
> u8 pmg_max;
> unsigned long ris_idxs;
> u32 ris_max;
> + u32 iidr;
> + u16 quirks;
>
> /*
> * error_irq_lock is taken when registering/unregistering the error
> @@ -215,6 +217,29 @@ struct mpam_props {
> #define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
> #define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
>
> +/* Workaround bits for msc->quirks */
> +enum mpam_device_quirks {
> + MPAM_QUIRK_LAST,
Dropping this comma should make it harder for anyone to stick an entry
after this.
* Re: [RFC PATCH 34/38] arm_mpam: Add quirk framework
2025-12-18 14:04 ` Jonathan Cameron
@ 2025-12-19 12:19 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-19 12:19 UTC (permalink / raw)
To: Jonathan Cameron, James Morse
Cc: linux-kernel, linux-arm-kernel, D Scott Phillips OS, carl,
lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang, Jamie Iles,
Xin Hao, peternewman, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi Jonathan,
Thanks for your review. I haven't replied to every point but unless
commented I have acted on them.
On 12/18/25 14:04, Jonathan Cameron wrote:
> On Fri, 5 Dec 2025 21:58:57 +0000
> James Morse <james.morse@arm.com> wrote:
>
>> From: Shanker Donthineni <sdonthineni@nvidia.com>
>>
>> The MPAM specification includes the MPAMF_IIDR, which serves to
>> uniquely identify the MSC implementation through a combination of
>> implementer details, product ID, variant, and revision. Certain
>> hardware issues/errata can be resolved using software workarounds.
>>
>> Introduce a quirk framework to allow workarounds to be enabled based
>> on the MPAMF_IIDR value.
>>
>> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
>> [ morse: Stash the IIDR so this doesn't need an IPI, enable quirks only
>> once, move the description to the callback so it can be pr_once()d, add
>> an enum of workarounds for popular errata. Add macros for making lists
>> of product/revision/vendor half readable ]
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>> drivers/resctrl/mpam_devices.c | 27 +++++++++++++++++++++++++++
>> drivers/resctrl/mpam_internal.h | 26 ++++++++++++++++++++++++++
>> 2 files changed, 53 insertions(+)
>>
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 741e14e1e6cf..f0f6f9b55ad4 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -630,6 +630,25 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
>> return ERR_PTR(-ENOENT);
>> }
>>
>> +static const struct mpam_quirk mpam_quirks[] = {
>> + { NULL }, /* Sentinel */
>
> Drop the trailing , given I assume whole point is nothing after this?
>
>> +};
>> +
>> +static void mpam_enable_quirks(struct mpam_msc *msc)
>> +{
>> + const struct mpam_quirk *quirk;
>> +
>> + for (quirk = &mpam_quirks[0]; quirk->iidr_mask; quirk++) {
>> + if (quirk->iidr != (msc->iidr & quirk->iidr_mask))
>> + continue;
>> +
>> + if (quirk->init)
>> + quirk->init(msc, quirk);
>
> I'm curious why you don't return a bool from this and call
> mpam_set_quirk() if that's not indicating it should not be set.
> Seems a bit odd to push the tracking that is relevant to the generic
> framework (mpam_set_quirk) down into the particular quirk inits.
Good point. I've changed the init to return 0 or an error.
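As a rough illustration of the shape this takes (a standalone sketch, not
the respun patch: the struct and function names below are simplified
stand-ins for struct mpam_msc, struct mpam_quirk and mpam_enable_quirks(),
and init_ok()/init_fail() are hypothetical hooks), the framework keeps
ownership of the quirk bit and only sets it when init is absent or
returns 0:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct mpam_msc */
struct msc {
	uint32_t iidr;
	uint16_t quirks;
};

/* Simplified stand-in for struct mpam_quirk, with init returning int */
struct quirk {
	int (*init)(struct msc *msc, const struct quirk *quirk);
	uint32_t iidr;
	uint32_t iidr_mask;
	unsigned int workaround;
};

static void set_quirk(unsigned int bit, struct msc *msc)
{
	msc->quirks |= 1u << bit;
}

/* Walk the table until the zero-mask sentinel; the framework sets the bit */
static void enable_quirks(struct msc *msc, const struct quirk *table)
{
	for (const struct quirk *q = table; q->iidr_mask; q++) {
		if (q->iidr != (msc->iidr & q->iidr_mask))
			continue;

		/* A non-zero return from init means: do not enable */
		if (q->init && q->init(msc, q))
			continue;

		set_quirk(q->workaround, msc);
	}
}

/* Hypothetical hooks, only here to exercise both outcomes */
static int init_ok(struct msc *msc, const struct quirk *quirk)
{
	(void)msc; (void)quirk;
	return 0;
}

static int init_fail(struct msc *msc, const struct quirk *quirk)
{
	(void)msc; (void)quirk;
	return -1;
}

static const struct quirk demo_quirks[] = {
	{ init_ok,   0x1234, 0xffff, 0 },
	{ init_fail, 0x1234, 0xffff, 1 },
	{ NULL,      0x1234, 0xffff, 2 },
	{ NULL, 0, 0, 0 } /* Sentinel */
};
```

With the demo table, an MSC whose IIDR is 0x1234 ends up with bits 0 and 2
set; bit 1 stays clear because its init hook reported an error.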
>
>> + else
>> + mpam_set_quirk(quirk->workaround, msc);
>> + }
>> +}
>
>> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
>> index d381906545ed..de3e5faa12b2 100644
>> --- a/drivers/resctrl/mpam_internal.h
>> +++ b/drivers/resctrl/mpam_internal.h
>> @@ -88,6 +88,8 @@ struct mpam_msc {
>> u8 pmg_max;
>> unsigned long ris_idxs;
>> u32 ris_max;
>> + u32 iidr;
>> + u16 quirks;
>>
>> /*
>> * error_irq_lock is taken when registering/unregistering the error
>> @@ -215,6 +217,29 @@ struct mpam_props {
>> #define mpam_set_feature(_feat, x) set_bit(_feat, (x)->features)
>> #define mpam_clear_feature(_feat, x) clear_bit(_feat, (x)->features)
>>
>> +/* Workaround bits for msc->quirks */
>> +enum mpam_device_quirks {
>> + MPAM_QUIRK_LAST,
>
> Dropping this comma should make it harder for anyone to stick an entry
> after this.
Commas removed.
Thanks,
Ben
* [RFC PATCH 35/38] arm_mpam: Add workaround for T241-MPAM-1
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (33 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 34/38] arm_mpam: Add quirk framework James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-10 12:20 ` Ben Horgan
2025-12-05 21:58 ` [RFC PATCH 36/38] arm_mpam: Add workaround for T241-MPAM-4 James Morse
` (3 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
From: Shanker Donthineni <sdonthineni@nvidia.com>
On NVIDIA T241, without a workaround, the MPAM bandwidth partitioning
controls are not correctly configured: the hardware retains the default
configuration register values, which generally means that bandwidth
remains unprovisioned.
To address the issue, follow the steps below after updating the MBW_MIN
and/or MBW_MAX registers.
- Perform 64-bit reads from all 12 bridge MPAM shadow registers at offsets
(0x360048 + slice*0x10000 + partid*8). These registers are read-only.
- Repeat the reads until all 12 shadow register values match.
pr_warn_once() if the values still do not match after 1000 iterations.
- Perform 64-bit writes of the value 0x0 to the two spare registers at
offsets 0x1b0000 and 0x1c0000.
In the hardware, writes to the MPAMCFG_MBW_MAX and MPAMCFG_MBW_MIN
registers are transformed into broadcast writes to the 12 shadow
registers. The final two writes to the spare registers cause a final rank
of downstream micro-architectural MPAM registers to be updated from the
shadow copies. The intervening loop that reads the 12 shadow registers
helps avoid a race condition where the writes to the spare registers
occur before all shadow registers have been updated.
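The sequence above can be sketched in plain C as follows. This is a
userspace model under stated assumptions, not the driver code: the
read64_fn callback and the fake_hw device stand in for readq_relaxed() on
the ioremap()ed region, and are hypothetical names used only so the
polling logic can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSLICES 12
/* Offset of the shadow register for a given slice and PARTID */
#define SHADOW_REG_OFF(slice, partid) \
	(0x360048u + (slice) * 0x10000u + (partid) * 8u)

/* Hypothetical 64-bit register accessor */
typedef uint64_t (*read64_fn)(void *ctx, uint32_t off);

/*
 * Poll until all 12 shadow registers for this PARTID hold the same value.
 * Returns true if they agree within max_tries passes, false otherwise.
 */
static bool shadows_settled(read64_fn rd, void *ctx, uint16_t partid,
			    int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		uint64_t first = rd(ctx, SHADOW_REG_OFF(0, partid));
		int slice;

		for (slice = 1; slice < NSLICES; slice++) {
			if (rd(ctx, SHADOW_REG_OFF(slice, partid)) != first)
				break;
		}
		if (slice == NSLICES)
			return true;
	}
	return false;
}

/* Fake device: the last slice returns a stale value for a few reads */
struct fake_hw {
	int stale_reads;
};

static uint64_t fake_read(void *ctx, uint32_t off)
{
	struct fake_hw *hw = ctx;
	uint32_t slice = (off - 0x360048u) / 0x10000u;

	if (slice == NSLICES - 1 && hw->stale_reads > 0) {
		hw->stale_reads--;
		return 0;	/* broadcast write not yet visible */
	}
	return 0x1234;		/* value after the broadcast write */
}
```

On a true return the caller would go on to write 0 to the two spare
registers. The patch itself warns once and performs those writes
regardless when the loop runs out.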
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: Merged the min/max update into a single
mpam_quirk_post_config_change() helper. Stashed the t241_id in the msc
instead of carrying the physical address around. Test the msc quirk bit
instead of a static key. ]
Signed-off-by: James Morse <james.morse@arm.com>
---
Documentation/arch/arm64/silicon-errata.rst | 2 +
drivers/resctrl/mpam_devices.c | 87 +++++++++++++++++++++
drivers/resctrl/mpam_internal.h | 9 +++
3 files changed, 98 insertions(+)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..4e86b85fe3d6 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -246,6 +246,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index f0f6f9b55ad4..f1f03ceade0a 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -29,6 +29,16 @@
#include "mpam_internal.h"
+/* Values for the T241 errata workaround */
+#define T241_CHIPS_MAX 4
+#define T241_CHIP_NSLICES 12
+#define T241_SPARE_REG0_OFF 0x1b0000
+#define T241_SPARE_REG1_OFF 0x1c0000
+#define T241_CHIP_ID(phys) FIELD_GET(GENMASK_ULL(44, 43), phys)
+#define T241_SHADOW_REG_OFF(sidx, pid) (0x360048 + (sidx) * 0x10000 + (pid) * 8)
+#define SMCCC_SOC_ID_T241 0x036b0241
+static void __iomem *t241_scratch_regs[T241_CHIPS_MAX];
+
/*
* mpam_list_lock protects the SRCU lists when writing. Once the
* mpam_enabled key is enabled these lists are read-only,
@@ -630,7 +640,44 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
return ERR_PTR(-ENOENT);
}
+static void mpam_enable_quirk_nvidia_t241(struct mpam_msc *msc,
+ const struct mpam_quirk *quirk)
+{
+ s32 soc_id = arm_smccc_get_soc_id_version();
+ struct resource *r;
+ phys_addr_t phys;
+
+ /*
+ * A mapping to a device other than the MSC is needed, check
+ * SOC_ID is NVIDIA T241 chip (036b:0241)
+ */
+ if (soc_id < 0 || soc_id != SMCCC_SOC_ID_T241)
+ return;
+
+ r = platform_get_resource(msc->pdev, IORESOURCE_MEM, 0);
+ if (!r)
+ return;
+
+ /* Find the internal registers base addr from the CHIP ID */
+ msc->t241_id = T241_CHIP_ID(r->start);
+ phys = FIELD_PREP(GENMASK_ULL(45, 44), msc->t241_id) | 0x19000000ULL;
+
+ t241_scratch_regs[msc->t241_id] = ioremap(phys, SZ_8M);
+ if (WARN_ON_ONCE(!t241_scratch_regs[msc->t241_id]))
+ return;
+
+ mpam_set_quirk(quirk->workaround, msc);
+ pr_info_once("Enabled workaround for NVIDIA T241 erratum T241-MPAM-1\n");
+}
+
static const struct mpam_quirk mpam_quirks[] = {
+ {
+ /* NVIDIA t241 erratum T241-MPAM-1 */
+ .init = mpam_enable_quirk_nvidia_t241,
+ .iidr = MPAM_IIDR_NVIDIA_T421,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = T241_SCRUB_SHADOW_REGS,
+ },
{ NULL }, /* Sentinel */
};
@@ -1372,6 +1419,44 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
__mpam_write_reg(msc, reg, bm);
}
+static void mpam_apply_t241_erratum(struct mpam_msc_ris *ris, u16 partid)
+{
+ int sidx, i, lcount = 1000;
+ void __iomem *regs;
+ u64 val0, val;
+
+ regs = t241_scratch_regs[ris->vmsc->msc->t241_id];
+
+ for (i = 0; i < lcount; i++) {
+ /* Read the shadow register at index 0 */
+ val0 = readq_relaxed(regs + T241_SHADOW_REG_OFF(0, partid));
+
+ /* Check if all the shadow registers have the same value */
+ for (sidx = 1; sidx < T241_CHIP_NSLICES; sidx++) {
+ val = readq_relaxed(regs +
+ T241_SHADOW_REG_OFF(sidx, partid));
+ if (val != val0)
+ break;
+ }
+ if (sidx == T241_CHIP_NSLICES)
+ break;
+ }
+
+ if (i == lcount)
+ pr_warn_once("t241: inconsistent values in shadow regs\n");
+
+ /* Write a value zero to spare registers to take effect of MBW conf */
+ writeq_relaxed(0, regs + T241_SPARE_REG0_OFF);
+ writeq_relaxed(0, regs + T241_SPARE_REG1_OFF);
+}
+
+static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
+ struct mpam_config *cfg)
+{
+ if (mpam_has_quirk(T241_SCRUB_SHADOW_REGS, ris->vmsc->msc))
+ mpam_apply_t241_erratum(ris, partid);
+}
+
/* Called via IPI. Call while holding an SRCU reference */
static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
struct mpam_config *cfg)
@@ -1455,6 +1540,8 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
mpam_write_partsel_reg(msc, PRI, pri_val);
}
+ mpam_quirk_post_config_change(ris, partid, cfg);
+
mutex_unlock(&msc->part_sel_lock);
}
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index de3e5faa12b2..70b78cfd1f5b 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -133,6 +133,9 @@ struct mpam_msc {
void __iomem *mapped_hwpage;
size_t mapped_hwpage_sz;
+ /* Values only used on some platforms for quirks */
+ u32 t241_id;
+
struct mpam_garbage garbage;
};
@@ -219,6 +222,7 @@ struct mpam_props {
/* Workaround bits for msc->quirks */
enum mpam_device_quirks {
+ T241_SCRUB_SHADOW_REGS,
MPAM_QUIRK_LAST,
};
@@ -239,6 +243,11 @@ struct mpam_quirk {
FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf ) | \
FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
+#define MPAM_IIDR_NVIDIA_T421 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0 ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0 ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
+
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
--
2.39.5
* Re: [RFC PATCH 35/38] arm_mpam: Add workaround for T241-MPAM-1
2025-12-05 21:58 ` [RFC PATCH 35/38] arm_mpam: Add workaround for T241-MPAM-1 James Morse
@ 2025-12-10 12:20 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-10 12:20 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> The MPAM bandwidth partitioning controls will not be correctly configured,
> and hardware will retain default configuration register values, meaning
> generally that bandwidth will remain unprovisioned.
>
> To address the issue, follow the below steps after updating the MBW_MIN
> and/or MBW_MAX registers.
>
> - Perform 64b reads from all 12 bridge MPAM shadow registers at offsets
> (0x360048 + slice*0x10000 + partid*8). These registers are read-only.
> - Continue iterating until all 12 shadow register values match in a loop.
> pr_warn_once if the values fail to match within the loop count 1000.
> - Perform 64b writes with the value 0x0 to the two spare registers at
> offsets 0x1b0000 and 0x1c0000.
>
> In the hardware, writes to the MPAMCFG_MBW_MAX MPAMCFG_MBW_MIN registers
> are transformed into broadcast writes to the 12 shadow registers. The
> final two writes to the spare registers cause a final rank of downstream
> micro-architectural MPAM registers to be updated from the shadow copies.
> The intervening loop to read the 12 shadow registers helps avoid a race
> condition where writes to the spare registers occur before all shadow
> registers have been updated.
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> [ morse: Merged the min/max update into a single
> mpam_quirk_post_config_change() helper. Stashed the t241_id in the msc
> instead of carrying the physical address around. Test the msc quirk bit
> instead of a static key. ]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Documentation/arch/arm64/silicon-errata.rst | 2 +
> drivers/resctrl/mpam_devices.c | 87 +++++++++++++++++++++
> drivers/resctrl/mpam_internal.h | 9 +++
> 3 files changed, 98 insertions(+)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index a7ec57060f64..4e86b85fe3d6 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -246,6 +246,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 GICv3/4.x | T241-FABRIC-4 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index f0f6f9b55ad4..f1f03ceade0a 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -29,6 +29,16 @@
>
> #include "mpam_internal.h"
>
> +/* Values for the T241 errata workaround */
> +#define T241_CHIPS_MAX 4
> +#define T241_CHIP_NSLICES 12
> +#define T241_SPARE_REG0_OFF 0x1b0000
> +#define T241_SPARE_REG1_OFF 0x1c0000
> +#define T241_CHIP_ID(phys) FIELD_GET(GENMASK_ULL(44, 43), phys)
> +#define T241_SHADOW_REG_OFF(sidx, pid) (0x360048 + (sidx) * 0x10000 + (pid) * 8)
> +#define SMCCC_SOC_ID_T241 0x036b0241
> +static void __iomem *t241_scratch_regs[T241_CHIPS_MAX];
> +
> /*
> * mpam_list_lock protects the SRCU lists when writing. Once the
> * mpam_enabled key is enabled these lists are read-only,
> @@ -630,7 +640,44 @@ static struct mpam_msc_ris *mpam_get_or_create_ris(struct mpam_msc *msc,
> return ERR_PTR(-ENOENT);
> }
>
> +static void mpam_enable_quirk_nvidia_t241(struct mpam_msc *msc,
> + const struct mpam_quirk *quirk)
> +{
> + s32 soc_id = arm_smccc_get_soc_id_version();
> + struct resource *r;
> + phys_addr_t phys;
> +
> + /*
> + * A mapping to a device other than the MSC is needed, check
> + * SOC_ID is NVIDIA T241 chip (036b:0241)
> + */
> + if (soc_id < 0 || soc_id != SMCCC_SOC_ID_T241)
> + return;
> +
> + r = platform_get_resource(msc->pdev, IORESOURCE_MEM, 0);
> + if (!r)
> + return;
> +
> + /* Find the internal registers base addr from the CHIP ID */
> + msc->t241_id = T241_CHIP_ID(r->start);
> + phys = FIELD_PREP(GENMASK_ULL(45, 44), msc->t241_id) | 0x19000000ULL;
> +
> + t241_scratch_regs[msc->t241_id] = ioremap(phys, SZ_8M);
> + if (WARN_ON_ONCE(!t241_scratch_regs[msc->t241_id]))
> + return;
> +
> + mpam_set_quirk(quirk->workaround, msc);
> + pr_info_once("Enabled workaround for NVIDIA T241 erratum T241-MPAM-1\n");
> +}
> +
> static const struct mpam_quirk mpam_quirks[] = {
> + {
> + /* NVIDIA t241 erratum T241-MPAM-1 */
> + .init = mpam_enable_quirk_nvidia_t241,
> + .iidr = MPAM_IIDR_NVIDIA_T421,
MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
> + .workaround = T241_SCRUB_SHADOW_REGS,
> + },
> { NULL }, /* Sentinel */
> };
>
> @@ -1372,6 +1419,44 @@ static void mpam_reset_msc_bitmap(struct mpam_msc *msc, u16 reg, u16 wd)
> __mpam_write_reg(msc, reg, bm);
> }
>
> +static void mpam_apply_t241_erratum(struct mpam_msc_ris *ris, u16 partid)
> +{
> + int sidx, i, lcount = 1000;
> + void __iomem *regs;
> + u64 val0, val;
> +
> + regs = t241_scratch_regs[ris->vmsc->msc->t241_id];
> +
> + for (i = 0; i < lcount; i++) {
> + /* Read the shadow register at index 0 */
> + val0 = readq_relaxed(regs + T241_SHADOW_REG_OFF(0, partid));
> +
> + /* Check if all the shadow registers have the same value */
> + for (sidx = 1; sidx < T241_CHIP_NSLICES; sidx++) {
> + val = readq_relaxed(regs +
> + T241_SHADOW_REG_OFF(sidx, partid));
> + if (val != val0)
> + break;
> + }
> + if (sidx == T241_CHIP_NSLICES)
> + break;
> + }
> +
> + if (i == lcount)
> + pr_warn_once("t241: inconsistent values in shadow regs");
> +
> + /* Write a value zero to spare registers to take effect of MBW conf */
> + writeq_relaxed(0, regs + T241_SPARE_REG0_OFF);
> + writeq_relaxed(0, regs + T241_SPARE_REG1_OFF);
> +}
> +
> +static void mpam_quirk_post_config_change(struct mpam_msc_ris *ris, u16 partid,
> + struct mpam_config *cfg)
> +{
> + if (mpam_has_quirk(T241_SCRUB_SHADOW_REGS, ris->vmsc->msc))
> + mpam_apply_t241_erratum(ris, partid);
> +}
> +
> /* Called via IPI. Call while holding an SRCU reference */
> static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> struct mpam_config *cfg)
> @@ -1455,6 +1540,8 @@ static void mpam_reprogram_ris_partid(struct mpam_msc_ris *ris, u16 partid,
> mpam_write_partsel_reg(msc, PRI, pri_val);
> }
>
> + mpam_quirk_post_config_change(ris, partid, cfg);
> +
> mutex_unlock(&msc->part_sel_lock);
> }
>
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index de3e5faa12b2..70b78cfd1f5b 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -133,6 +133,9 @@ struct mpam_msc {
> void __iomem *mapped_hwpage;
> size_t mapped_hwpage_sz;
>
> + /* Values only used on some platforms for quirks */
> + u32 t241_id;
> +
> struct mpam_garbage garbage;
> };
>
> @@ -219,6 +222,7 @@ struct mpam_props {
>
> /* Workaround bits for msc->quirks */
> enum mpam_device_quirks {
> + T241_SCRUB_SHADOW_REGS,
> MPAM_QUIRK_LAST,
> };
>
> @@ -239,6 +243,11 @@ struct mpam_quirk {
> FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0xf ) | \
> FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0xfff)
>
> +#define MPAM_IIDR_NVIDIA_T421 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0x241) | \
MPAM_IIDR_NVIDIA_T421 -> MPAM_IIDR_NVIDIA_T241
> + FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0 ) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0 ) | \
> + FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
> +
>
> /* The values for MSMON_CFG_MBWU_FLT.RWBW */
> enum mon_filter_options {
Thanks,
Ben
* [RFC PATCH 36/38] arm_mpam: Add workaround for T241-MPAM-4
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (34 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 35/38] arm_mpam: Add workaround for T241-MPAM-1 James Morse
@ 2025-12-05 21:58 ` James Morse
2025-12-09 16:58 ` Ben Horgan
2025-12-05 21:59 ` [RFC PATCH 37/38] arm_mpam: Add workaround for T241-MPAM-6 James Morse
` (2 subsequent siblings)
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:58 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
From: Shanker Donthineni <sdonthineni@nvidia.com>
In the T241 implementation of memory-bandwidth partitioning, in the
absence of contention for bandwidth, the minimum bandwidth setting
can affect the amount of achieved bandwidth. Specifically, the
achieved bandwidth in the absence of contention can settle to any
value between the values of MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.
Also, if MPAMCFG_MBW_MIN is set to zero (below 0.78125%), once a core
enters a throttled state, it will never leave that state.
The first issue is not a concern if the MPAM software allows
MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This patch
ensures MBW_MIN=1 (0.78125%) is programmed whenever MPAMCFG_MBW_MIN=0
is requested.
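For reference, the arithmetic behind "MBW_MIN=1" can be worked through in
a few lines of C. The sketch assumes the 16-bit fixed-point encoding where
only the top bwa_wd bits are implemented, as in
mpam_wa_t241_force_mbw_min_to_one(); the helper names are made up for
illustration, and a bwa_wd of 7 is inferred from the 0.78125% (1/128)
figure rather than stated anywhere in this patch.

```c
#include <stdint.h>

/* Largest register value that the hardware still treats as zero */
static uint16_t mbw_min_granule(unsigned int bwa_wd)
{
	unsigned int res0_bits = 16 - bwa_wd;
	uint16_t max_hw_value = (uint16_t)(((1u << bwa_wd) - 1) << res0_bits);

	/* The low res0_bits bits are RES0, so they carry no information */
	return (uint16_t)~max_hw_value;
}

/* Smallest register value the hardware sees as non-zero */
static uint16_t mbw_min_smallest_nonzero(unsigned int bwa_wd)
{
	return (uint16_t)(mbw_min_granule(bwa_wd) + 1);
}

/* Interpret a 16-bit fixed-point fraction as a percentage */
static double mbw_fraction_percent(uint16_t val)
{
	return val * 100.0 / 65536.0;
}
```

Under that assumption, a 7-bit field gives a smallest non-zero value of
0x0200, which is 1/128 of full scale.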
In the scenario where resctrl doesn't support the MBW_MIN interface
via sysfs, to achieve bandwidth closer to MBW_MAX in the absence of
contention, software should configure a relatively narrow gap between
MBW_MIN and MBW_MAX. The recommendation is to use a 5% gap to mitigate
the problem.
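A sketch of how the 5% gap and the forced non-zero minimum combine,
modelled on mpam_extend_config(). It covers only the case where MBW_MIN is
derived from MBW_MAX. Assumptions: MPAMCFG_MBW_MAX_MAX is taken to be
0xffff, the branch not visible in the diff context is taken to set min to
0, and derive_mbw_min() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define MBW_MAX_MAX 0xffffu	/* assumed value of MPAMCFG_MBW_MAX_MAX */

/*
 * Derive an MBW_MIN value roughly 5% below mbw_max, saturating at the
 * largest value the hardware treats as zero. With force_nonzero set
 * (the T241-MPAM-4 case), bump an effectively-zero result up by one
 * hardware step.
 */
static uint16_t derive_mbw_min(uint16_t mbw_max, unsigned int bwa_wd,
			       bool force_nonzero)
{
	uint16_t max_hw_value = (uint16_t)(((1u << bwa_wd) - 1) << (16 - bwa_wd));
	uint16_t min_hw_granule = (uint16_t)~max_hw_value;
	uint16_t delta = (uint16_t)((5 * MBW_MAX_MAX) / 100 - 1);
	uint16_t min = 0;

	if (mbw_max > delta)
		min = (uint16_t)(mbw_max - delta);

	/* Values at or below the granule all read as zero in hardware */
	if (min < min_hw_granule)
		min = min_hw_granule;

	if (force_nonzero && min <= min_hw_granule)
		min = (uint16_t)(min_hw_granule + 1);

	return min;
}
```

For a large mbw_max the quirk changes nothing; it only matters when the
derived minimum would otherwise collapse to an effectively-zero value.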
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[ morse: Added as second quirk, adapted to use the new intermediate values
in mpam_extend_config() ]
Signed-off-by: James Morse <james.morse@arm.com>
---
Documentation/arch/arm64/silicon-errata.rst | 2 +
drivers/resctrl/mpam_devices.c | 60 ++++++++++++++++-----
drivers/resctrl/mpam_internal.h | 1 +
3 files changed, 49 insertions(+), 14 deletions(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index 4e86b85fe3d6..b18bc704d4a1 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -248,6 +248,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index f1f03ceade0a..5ba0aa703807 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -678,6 +678,12 @@ static const struct mpam_quirk mpam_quirks[] = {
.iidr_mask = MPAM_IIDR_MATCH_ONE,
.workaround = T241_SCRUB_SHADOW_REGS,
},
+ {
+ /* NVIDIA t241 erratum T241-MPAM-4 */
+ .iidr = MPAM_IIDR_NVIDIA_T421,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = T241_FORCE_MBW_MIN_TO_ONE,
+ },
{ NULL }, /* Sentinel */
};
@@ -1622,6 +1628,22 @@ static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
}
+/*
+ * This is not part of mpam_init_reset_cfg() as high level callers have the
+ * class, and low level callers a ris.
+ */
+static void mpam_wa_t241_force_mbw_min_to_one(struct mpam_config *cfg,
+ struct mpam_props *props)
+{
+ u16 max_hw_value, min_hw_granule, res0_bits;
+
+ res0_bits = 16 - props->bwa_wd;
+ max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
+ min_hw_granule = ~max_hw_value;
+
+ cfg->mbw_min = min_hw_granule + 1;
+}
+
/*
* Called via smp_call_on_cpu() to prevent migration, while still being
* pre-emptible. Caller must hold mpam_srcu.
@@ -2524,7 +2546,8 @@ static void __destroy_component_cfg(struct mpam_component *comp)
static void mpam_reset_component_cfg(struct mpam_component *comp)
{
int i;
- struct mpam_props *cprops = &comp->class->props;
+ struct mpam_class *class = comp->class;
+ struct mpam_props *cprops = &class->props;
mpam_assert_partid_sizes_fixed();
@@ -2539,6 +2562,10 @@ static void mpam_reset_component_cfg(struct mpam_component *comp)
comp->cfg[i].mbw_pbm = GENMASK(cprops->mbw_pbm_bits - 1, 0);
if (cprops->bwa_wd)
comp->cfg[i].mbw_max = GENMASK(15, 16 - cprops->bwa_wd);
+
+ if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class))
+ mpam_wa_t241_force_mbw_min_to_one(&comp->cfg[i],
+ &class->props);
}
}
@@ -2825,6 +2852,18 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
u16 min, min_hw_granule, delta;
u16 max_hw_value, res0_bits;
+ /*
+ * Calculate the values the 'min' control can hold.
+ * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff because
+ * those bits are RES0. Configurations of this value are effectively
+ * zero. But configurations need to saturate at min_hw_granule on
+ * systems with mismatched bwa_wd, where the 'less than 0' values are
+ * implemented on some MSC, but not others.
+ */
+ res0_bits = 16 - cprops->bwa_wd;
+ max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
+ min_hw_granule = ~max_hw_value;
+
/*
* MAX and MIN should be set together. If only one is provided,
* generate a configuration for the other. If only one control
@@ -2834,19 +2873,6 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
*/
if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
!mpam_has_feature(mpam_feat_mbw_min, cfg)) {
- /*
- * Calculate the values the 'min' control can hold.
- * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff
- * because those bits are RES0. Configurations of this value
- * are effectively zero. But configurations need to saturate
- * at min_hw_granule on systems with mismatched bwa_wd, where
- * the 'less than 0' values are implemented on some MSC, but
- * not others.
- */
- res0_bits = 16 - cprops->bwa_wd;
- max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
- min_hw_granule = ~max_hw_value;
-
delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
if (cfg->mbw_max > delta)
min = cfg->mbw_max - delta;
@@ -2856,6 +2882,12 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
cfg->mbw_min = max(min, min_hw_granule);
mpam_set_feature(mpam_feat_mbw_min, cfg);
}
+
+ if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
+ cfg->mbw_min <= min_hw_granule) {
+ cfg->mbw_min = min_hw_granule + 1;
+ mpam_set_feature(mpam_feat_mbw_min, cfg);
+ }
}
int mpam_apply_config(struct mpam_component *comp, u16 partid,
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 70b78cfd1f5b..01882f0acee2 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -223,6 +223,7 @@ struct mpam_props {
/* Workaround bits for msc->quirks */
enum mpam_device_quirks {
T241_SCRUB_SHADOW_REGS,
+ T241_FORCE_MBW_MIN_TO_ONE,
MPAM_QUIRK_LAST,
};
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 36/38] arm_mpam: Add workaround for T241-MPAM-4
2025-12-05 21:58 ` [RFC PATCH 36/38] arm_mpam: Add workaround for T241-MPAM-4 James Morse
@ 2025-12-09 16:58 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 16:58 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James,
On 12/5/25 21:58, James Morse wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> In the T241 implementation of memory-bandwidth partitioning, in the
> absence of contention for bandwidth, the minimum bandwidth setting
> can affect the amount of achieved bandwidth. Specifically, the
> achieved bandwidth in the absence of contention can settle to any
> value between the values of MPAMCFG_MBW_MIN and MPAMCFG_MBW_MAX.
> Also, if MPAMCFG_MBW_MIN is set to zero (below 0.78125%), once a core
> enters a throttled state, it will never leave that state.
>
> The first issue is not a concern if the MPAM software allows
> MPAMCFG_MBW_MIN to be programmed through the sysfs interface. This
> patch ensures MBW_MIN=1 (0.78125%) is programmed whenever
> MPAMCFG_MBW_MIN=0 is requested.
>
> In the scenario where resctrl doesn't support the MBW_MIN
> interface via sysfs, to achieve bandwidth closer to MBW_MAX in the
> absence of contention, software should configure a relatively narrow
> gap between MBW_MIN and MBW_MAX. The recommendation is to use a 5%
> gap to mitigate the problem.
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> [ morse: Added as second quirk, adapted to use the new intermediate values
> in mpam_extend_config() ]
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Documentation/arch/arm64/silicon-errata.rst | 2 +
> drivers/resctrl/mpam_devices.c | 60 ++++++++++++++++-----
> drivers/resctrl/mpam_internal.h | 1 +
> 3 files changed, 49 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index 4e86b85fe3d6..b18bc704d4a1 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -248,6 +248,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 MPAM | T241-MPAM-1 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index f1f03ceade0a..5ba0aa703807 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -678,6 +678,12 @@ static const struct mpam_quirk mpam_quirks[] = {
> .iidr_mask = MPAM_IIDR_MATCH_ONE,
> .workaround = T241_SCRUB_SHADOW_REGS,
> },
> + {
> + /* NVIDIA t241 erratum T241-MPAM-4 */
> + .iidr = MPAM_IIDR_NVIDIA_T421,
> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
> + .workaround = T241_FORCE_MBW_MIN_TO_ONE,
> + },
> { NULL }, /* Sentinel */
> };
>
> @@ -1622,6 +1628,22 @@ static void mpam_init_reset_cfg(struct mpam_config *reset_cfg)
> bitmap_fill(reset_cfg->features, MPAM_FEATURE_LAST);
> }
>
> +/*
> + * This is not part of mpam_init_reset_cfg() as high level callers have the
> + * class, and low level callers a ris.
> + */
> +static void mpam_wa_t241_force_mbw_min_to_one(struct mpam_config *cfg,
> + struct mpam_props *props)
> +{
> + u16 max_hw_value, min_hw_granule, res0_bits;
> +
> + res0_bits = 16 - props->bwa_wd;
> + max_hw_value = ((1 << props->bwa_wd) - 1) << res0_bits;
> + min_hw_granule = ~max_hw_value;
> +
> + cfg->mbw_min = min_hw_granule + 1;
> +}
> +
> /*
> * Called via smp_call_on_cpu() to prevent migration, while still being
> * pre-emptible. Caller must hold mpam_srcu.
> @@ -2524,7 +2546,8 @@ static void __destroy_component_cfg(struct mpam_component *comp)
> static void mpam_reset_component_cfg(struct mpam_component *comp)
> {
> int i;
> - struct mpam_props *cprops = &comp->class->props;
> + struct mpam_class *class = comp->class;
> + struct mpam_props *cprops = &class->props;
>
> mpam_assert_partid_sizes_fixed();
>
> @@ -2539,6 +2562,10 @@ static void mpam_reset_component_cfg(struct mpam_component *comp)
> comp->cfg[i].mbw_pbm = GENMASK(cprops->mbw_pbm_bits - 1, 0);
> if (cprops->bwa_wd)
> comp->cfg[i].mbw_max = GENMASK(15, 16 - cprops->bwa_wd);
> +
> + if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class))
> + mpam_wa_t241_force_mbw_min_to_one(&comp->cfg[i],
> + &class->props);
> }
> }
Also need to consider the mbw_min in mpam_reset_ris() reset config.
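
For reference, the arithmetic in the quoted helper can be checked in
isolation. This is only a user-space sketch: mbw_min_floor() is a
hypothetical name, and it assumes bwa_wd is in the range 1..16.

```c
#include <stdint.h>

/*
 * Stand-alone sketch of the arithmetic in mpam_wa_t241_force_mbw_min_to_one().
 * mbw_min_floor() is a hypothetical name; bwa_wd is assumed to be 1..16.
 */
static uint16_t mbw_min_floor(unsigned int bwa_wd)
{
	unsigned int res0_bits = 16 - bwa_wd;
	uint16_t max_hw_value = (uint16_t)(((1u << bwa_wd) - 1) << res0_bits);
	uint16_t min_hw_granule = (uint16_t)~max_hw_value;

	/* One step above the RES0 bits: the smallest non-zero setting. */
	return (uint16_t)(min_hw_granule + 1);
}
```

With bwa_wd = 7 this gives 0x0200, which is 1/128 of the 16-bit range,
i.e. the 0.78125% mentioned in the commit message.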
>
> @@ -2825,6 +2852,18 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
> u16 min, min_hw_granule, delta;
> u16 max_hw_value, res0_bits;
>
> + /*
> + * Calculate the values the 'min' control can hold.
> + * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff because
> + * those bits are RES0. Configurations of this value are effectively
> + * zero. But configurations need to saturate at min_hw_granule on
> + * systems with mismatched bwa_wd, where the 'less than 0' values are
> + * implemented on some MSC, but not others.
> + */
> + res0_bits = 16 - cprops->bwa_wd;
> + max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
> + min_hw_granule = ~max_hw_value;
> +
> /*
> * MAX and MIN should be set together. If only one is provided,
> * generate a configuration for the other. If only one control
> @@ -2834,19 +2873,6 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
> */
> if (mpam_has_feature(mpam_feat_mbw_max, cfg) &&
> !mpam_has_feature(mpam_feat_mbw_min, cfg)) {
> - /*
> - * Calculate the values the 'min' control can hold.
> - * e.g. on a platform with bwa_wd = 8, min_hw_granule is 0x00ff
> - * because those bits are RES0. Configurations of this value
> - * are effectively zero. But configurations need to saturate
> - * at min_hw_granule on systems with mismatched bwa_wd, where
> - * the 'less than 0' values are implemented on some MSC, but
> - * not others.
> - */
> - res0_bits = 16 - cprops->bwa_wd;
> - max_hw_value = ((1 << cprops->bwa_wd) - 1) << res0_bits;
> - min_hw_granule = ~max_hw_value;
> -
> delta = ((5 * MPAMCFG_MBW_MAX_MAX) / 100) - 1;
> if (cfg->mbw_max > delta)
> min = cfg->mbw_max - delta;
> @@ -2856,6 +2882,12 @@ static void mpam_extend_config(struct mpam_class *class, struct mpam_config *cfg
> cfg->mbw_min = max(min, min_hw_granule);
> mpam_set_feature(mpam_feat_mbw_min, cfg);
> }
> +
> + if (mpam_has_quirk(T241_FORCE_MBW_MIN_TO_ONE, class) &&
> + cfg->mbw_min <= min_hw_granule) {
> + cfg->mbw_min = min_hw_granule + 1;
> + mpam_set_feature(mpam_feat_mbw_min, cfg);
> + }
> }
>
> int mpam_apply_config(struct mpam_component *comp, u16 partid,
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 70b78cfd1f5b..01882f0acee2 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -223,6 +223,7 @@ struct mpam_props {
> /* Workaround bits for msc->quirks */
> enum mpam_device_quirks {
> T241_SCRUB_SHADOW_REGS,
> + T241_FORCE_MBW_MIN_TO_ONE,
> MPAM_QUIRK_LAST,
> };
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 37/38] arm_mpam: Add workaround for T241-MPAM-6
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (35 preceding siblings ...)
2025-12-05 21:58 ` [RFC PATCH 36/38] arm_mpam: Add workaround for T241-MPAM-4 James Morse
@ 2025-12-05 21:59 ` James Morse
2025-12-09 17:06 ` Ben Horgan
2025-12-05 21:59 ` [RFC PATCH 38/38] arm_mpam: Quirk CMN-650's CSU NRDY behaviour James Morse
2025-12-09 14:40 ` [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
38 siblings, 1 reply; 95+ messages in thread
From: James Morse @ 2025-12-05 21:59 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
From: Shanker Donthineni <sdonthineni@nvidia.com>
The registers MSMON_MBWU_L and MSMON_MBWU return the number of
requests rather than the number of bytes transferred.
Bandwidth resource monitoring is performed at the last level cache,
where each request arrives at 64-byte granularity. The current
implementation returns the number of transactions received at the
last level cache but does not provide the value in bytes. Scaling
by 64 gives an accurate byte count to match the MPAM specification
for the MSMON_MBWU and MSMON_MBWU_L registers. This patch fixes
the issue by reporting the actual number of bytes instead of the
number of transactions from __ris_msmon_read().
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Documentation/arch/arm64/silicon-errata.rst | 2 ++
drivers/resctrl/mpam_devices.c | 24 +++++++++++++++++++--
drivers/resctrl/mpam_internal.h | 1 +
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index b18bc704d4a1..e810b2a8f40e 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -250,6 +250,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| NVIDIA | T241 MPAM | T241-MPAM-6 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index 5ba0aa703807..c17a6fdea982 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -684,6 +684,12 @@ static const struct mpam_quirk mpam_quirks[] = {
.iidr_mask = MPAM_IIDR_MATCH_ONE,
.workaround = T241_FORCE_MBW_MIN_TO_ONE,
},
+ {
+ /* NVIDIA t241 erratum T241-MPAM-6 */
+ .iidr = MPAM_IIDR_NVIDIA_T421,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = T241_MBW_COUNTER_SCALE_64,
+ },
{ NULL }, /* Sentinel */
};
@@ -1140,7 +1146,7 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
}
}
-static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
+static u64 __mpam_msmon_overflow_val(enum mpam_device_features type)
{
/* TODO: implement scaling counters */
switch (type) {
@@ -1155,6 +1161,17 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
}
}
+static u64 mpam_msmon_overflow_val(enum mpam_device_features type,
+ struct mpam_msc *msc)
+{
+ u64 overflow_val = __mpam_msmon_overflow_val(type);
+
+ if (mpam_has_quirk(T241_MBW_COUNTER_SCALE_64, msc))
+ overflow_val *= 64;
+
+ return overflow_val;
+}
+
static void __ris_msmon_read(void *arg)
{
u64 now;
@@ -1245,13 +1262,16 @@ static void __ris_msmon_read(void *arg)
now = FIELD_GET(MSMON___VALUE, now);
}
+ if (mpam_has_quirk(T241_MBW_COUNTER_SCALE_64, msc))
+ now *= 64;
+
if (nrdy)
break;
mbwu_state = &ris->mbwu_state[ctx->mon];
if (overflow)
- mbwu_state->correction += mpam_msmon_overflow_val(m->type);
+ mbwu_state->correction += mpam_msmon_overflow_val(m->type, msc);
/*
* Include bandwidth consumed before the last hardware reset and
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 01882f0acee2..108a8373901c 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -224,6 +224,7 @@ struct mpam_props {
enum mpam_device_quirks {
T241_SCRUB_SHADOW_REGS,
T241_FORCE_MBW_MIN_TO_ONE,
+ T241_MBW_COUNTER_SCALE_64,
MPAM_QUIRK_LAST,
};
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 37/38] arm_mpam: Add workaround for T241-MPAM-6
2025-12-05 21:59 ` [RFC PATCH 37/38] arm_mpam: Add workaround for T241-MPAM-6 James Morse
@ 2025-12-09 17:06 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 17:06 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James, Shanker,
On 12/5/25 21:59, James Morse wrote:
> From: Shanker Donthineni <sdonthineni@nvidia.com>
>
> The registers MSMON_MBWU_L and MSMON_MBWU return the number of
> requests rather than the number of bytes transferred.
>
> Bandwidth resource monitoring is performed at the last level cache,
> where each request arrives at 64-byte granularity. The current
> implementation returns the number of transactions received at the
> last level cache but does not provide the value in bytes. Scaling
> by 64 gives an accurate byte count to match the MPAM specification
> for the MSMON_MBWU and MSMON_MBWU_L registers. This patch fixes
> the issue by reporting the actual number of bytes instead of the
> number of transactions from __ris_msmon_read().
>
> Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Documentation/arch/arm64/silicon-errata.rst | 2 ++
> drivers/resctrl/mpam_devices.c | 24 +++++++++++++++++++--
> drivers/resctrl/mpam_internal.h | 1 +
> 3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
> index b18bc704d4a1..e810b2a8f40e 100644
> --- a/Documentation/arch/arm64/silicon-errata.rst
> +++ b/Documentation/arch/arm64/silicon-errata.rst
> @@ -250,6 +250,8 @@ stable kernels.
> +----------------+-----------------+-----------------+-----------------------------+
> | NVIDIA | T241 MPAM | T241-MPAM-4 | N/A |
> +----------------+-----------------+-----------------+-----------------------------+
> +| NVIDIA | T241 MPAM | T241-MPAM-6 | N/A |
> ++----------------+-----------------+-----------------+-----------------------------+
> +----------------+-----------------+-----------------+-----------------------------+
> | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 |
> +----------------+-----------------+-----------------+-----------------------------+
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
> index 5ba0aa703807..c17a6fdea982 100644
> --- a/drivers/resctrl/mpam_devices.c
> +++ b/drivers/resctrl/mpam_devices.c
> @@ -684,6 +684,12 @@ static const struct mpam_quirk mpam_quirks[] = {
> .iidr_mask = MPAM_IIDR_MATCH_ONE,
> .workaround = T241_FORCE_MBW_MIN_TO_ONE,
> },
> + {
> + /* NVIDIA t241 erratum T241-MPAM-6 */
> + .iidr = MPAM_IIDR_NVIDIA_T421,
> + .iidr_mask = MPAM_IIDR_MATCH_ONE,
> + .workaround = T241_MBW_COUNTER_SCALE_64,
> + },
> { NULL }, /* Sentinel */
> };
>
> @@ -1140,7 +1146,7 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
> }
> }
>
> -static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
> +static u64 __mpam_msmon_overflow_val(enum mpam_device_features type)
> {
> /* TODO: implement scaling counters */
> switch (type) {
> @@ -1155,6 +1161,17 @@ static u64 mpam_msmon_overflow_val(enum mpam_device_features type)
> }
> }
>
> +static u64 mpam_msmon_overflow_val(enum mpam_device_features type,
> + struct mpam_msc *msc)
> +{
> + u64 overflow_val = __mpam_msmon_overflow_val(type);
> +
> + if (mpam_has_quirk(T241_MBW_COUNTER_SCALE_64, msc))
> + overflow_val *= 64;
> +
> + return overflow_val;
overflow_val wraps around for 63-bit counters. Do those need to be
considered for this erratum?
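
A minimal sketch of the wrap, assuming the unscaled overflow value for
an n-bit counter is 2^n (scaled_overflow_val() and counter_bits are
hypothetical names, not the driver's):

```c
#include <stdint.h>

/*
 * Sketch only: scale a counter's overflow value by 64, as the quirk does.
 * Assumes the unscaled overflow value is 2^counter_bits, counter_bits < 64.
 */
static uint64_t scaled_overflow_val(unsigned int counter_bits)
{
	uint64_t overflow_val = UINT64_C(1) << counter_bits;

	/* Unsigned multiplication wraps modulo 2^64. */
	return overflow_val * 64;
}
```

Under that assumption the 31-bit and 44-bit cases still fit in a u64
(2^37 and 2^50), but the 63-bit case would be 2^69, which wraps to 0.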
> +}
> +
> static void __ris_msmon_read(void *arg)
> {
> u64 now;
> @@ -1245,13 +1262,16 @@ static void __ris_msmon_read(void *arg)
> now = FIELD_GET(MSMON___VALUE, now);
> }
>
> + if (mpam_has_quirk(T241_MBW_COUNTER_SCALE_64, msc))
> + now *= 64;
> +
> if (nrdy)
> break;
>
> mbwu_state = &ris->mbwu_state[ctx->mon];
>
> if (overflow)
> - mbwu_state->correction += mpam_msmon_overflow_val(m->type);
> + mbwu_state->correction += mpam_msmon_overflow_val(m->type, msc);
>
> /*
> * Include bandwidth consumed before the last hardware reset and
> diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
> index 01882f0acee2..108a8373901c 100644
> --- a/drivers/resctrl/mpam_internal.h
> +++ b/drivers/resctrl/mpam_internal.h
> @@ -224,6 +224,7 @@ struct mpam_props {
> enum mpam_device_quirks {
> T241_SCRUB_SHADOW_REGS,
> T241_FORCE_MBW_MIN_TO_ONE,
> + T241_MBW_COUNTER_SCALE_64,
> MPAM_QUIRK_LAST,
> };
>
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* [RFC PATCH 38/38] arm_mpam: Quirk CMN-650's CSU NRDY behaviour
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (36 preceding siblings ...)
2025-12-05 21:59 ` [RFC PATCH 37/38] arm_mpam: Add workaround for T241-MPAM-6 James Morse
@ 2025-12-05 21:59 ` James Morse
2025-12-09 14:40 ` [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
38 siblings, 0 replies; 95+ messages in thread
From: James Morse @ 2025-12-05 21:59 UTC (permalink / raw)
To: linux-kernel, linux-arm-kernel
Cc: James Morse, D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, Ben Horgan, rohit.mathew, reinette.chatre,
Punit Agrawal
CMN-650 is afflicted with an erratum where the CSU NRDY bit never clears.
This tells us the monitor never finishes scanning the cache. The erratum
document says to wait the maximum time, then ignore the field.
Add a flag to indicate whether this is the final attempt to read the
counter, and when this quirk is applied, ignore the NRDY field.
This means accesses to this counter will always retry, even if the
counter was previously programmed to the same values.
The counter value is not expected to be stable; it drifts up and down
with each allocation and eviction. The CSU register provides the value
for a point in time.
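
The decision described above can be sketched as a small predicate. This
is an illustration only: csu_value_not_ready() and its parameters are
hypothetical names, not the driver's interface.

```c
#include <stdbool.h>

/*
 * Sketch of the NRDY decision described above. csu_value_not_ready() and its
 * parameters are hypothetical names, not the driver's interface.
 */
static bool csu_value_not_ready(bool hw_nrdy, bool has_quirk,
				bool waited_timeout)
{
	/* On affected parts NRDY never clears, so stop trusting it after the wait. */
	if (has_quirk && waited_timeout)
		return false;

	return hw_nrdy;
}
```

Without the quirk, or before the maximum wait has elapsed, the hardware
NRDY value is passed through unchanged.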
Signed-off-by: James Morse <james.morse@arm.com>
---
Documentation/arch/arm64/silicon-errata.rst | 3 +++
drivers/resctrl/mpam_devices.c | 12 ++++++++++++
drivers/resctrl/mpam_internal.h | 5 +++++
3 files changed, 20 insertions(+)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index e810b2a8f40e..3667650036fb 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | CMN-650 | #3642720 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
index c17a6fdea982..174a0224ed62 100644
--- a/drivers/resctrl/mpam_devices.c
+++ b/drivers/resctrl/mpam_devices.c
@@ -690,6 +690,12 @@ static const struct mpam_quirk mpam_quirks[] = {
.iidr_mask = MPAM_IIDR_MATCH_ONE,
.workaround = T241_MBW_COUNTER_SCALE_64,
},
+ {
+ /* ARM CMN-650 CSU erratum 3642720 */
+ .iidr = MPAM_IIDR_ARM_CMN_650,
+ .iidr_mask = MPAM_IIDR_MATCH_ONE,
+ .workaround = IGNORE_CSU_NRDY,
+ },
{ NULL }, /* Sentinel */
};
@@ -997,6 +1003,7 @@ struct mon_read {
enum mpam_device_features type;
u64 *val;
int err;
+ bool waited_timeout;
};
static bool mpam_ris_has_mbwu_long_counter(struct mpam_msc_ris *ris)
@@ -1242,6 +1249,10 @@ static void __ris_msmon_read(void *arg)
if (mpam_has_feature(mpam_feat_msmon_csu_hw_nrdy, rprops))
nrdy = now & MSMON___NRDY;
now = FIELD_GET(MSMON___VALUE, now);
+
+ if (mpam_has_quirk(IGNORE_CSU_NRDY, msc) && m->waited_timeout)
+ nrdy = false;
+
break;
case mpam_feat_msmon_mbwu_31counter:
case mpam_feat_msmon_mbwu_44counter:
@@ -1377,6 +1388,7 @@ int mpam_msmon_read(struct mpam_component *comp, struct mon_cfg *ctx,
.ctx = ctx,
.type = type,
.val = val,
+ .waited_timeout = true,
};
*val = 0;
diff --git a/drivers/resctrl/mpam_internal.h b/drivers/resctrl/mpam_internal.h
index 108a8373901c..00a61e4277f5 100644
--- a/drivers/resctrl/mpam_internal.h
+++ b/drivers/resctrl/mpam_internal.h
@@ -225,6 +225,7 @@ enum mpam_device_quirks {
T241_SCRUB_SHADOW_REGS,
T241_FORCE_MBW_MIN_TO_ONE,
T241_MBW_COUNTER_SCALE_64,
+ IGNORE_CSU_NRDY,
MPAM_QUIRK_LAST,
};
@@ -250,6 +251,10 @@ struct mpam_quirk {
FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0 ) | \
FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x36b)
+#define MPAM_IIDR_ARM_CMN_650 FIELD_PREP_CONST(MPAMF_IIDR_PRODUCTID, 0 ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_VARIANT, 0 ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_REVISION, 0 ) | \
+ FIELD_PREP_CONST(MPAMF_IIDR_IMPLEMENTER, 0x43b)
/* The values for MSMON_CFG_MBWU_FLT.RWBW */
enum mon_filter_options {
--
2.39.5
^ permalink raw reply related [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code
2025-12-05 21:58 [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code James Morse
` (37 preceding siblings ...)
2025-12-05 21:59 ` [RFC PATCH 38/38] arm_mpam: Quirk CMN-650's CSU NRDY behaviour James Morse
@ 2025-12-09 14:40 ` Ben Horgan
2025-12-09 15:53 ` Peter Newman
38 siblings, 1 reply; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 14:40 UTC (permalink / raw)
To: James Morse, linux-kernel, linux-arm-kernel
Cc: D Scott Phillips OS, carl, lcherian, bobo.shaobowang,
tan.shaopeng, baolin.wang, Jamie Iles, Xin Hao, peternewman,
dfustini, amitsinght, David Hildenbrand, Dave Martin, Koba Ko,
Shanker Donthineni, fenghuay, baisheng.gao, Jonathan Cameron,
Gavin Shan, rohit.mathew, reinette.chatre, Punit Agrawal
Hi James and all,
As James is otherwise occupied, I am planning to post a follow up
version of this series once it's had time to be reviewed. I will be
posting my own review comments; please give them extra scrutiny.
On 12/5/25 21:58, James Morse wrote:
> This is the missing piece to make MPAM usable via resctrl in user-space. This has
> shed its debugfs code and the read/write 'event configuration' for the monitors
> to make the series smaller.
>
[...]
>
> This series is based on arm64/for-next/core, and can be retrieved from:
> https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/glue/rfc
>
> There is no snapshot branch - this is it!
> I'll push the extras branch once I've gotten a handle on the DT shaped mess in
> there.
One major departure from the previous snapshot branches referenced in
the base driver series is that the same MPAM settings are used for
kernel-space and user-space. That is, MPAM1_EL1 is set to the same value
as MPAM0_EL1 rather than keeping the default value. The advantages of
this are that it is closer to the x86 model where the closid is globally
applicable, all partids are usable from user-space and user-space can't
bypass MPAM controls by doing the work in the kernel. However, this
causes some priority inversion where a high priority task waits to take
a mutex held by another whose resources are restricted by MPAM.
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code
2025-12-09 14:40 ` [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code Ben Horgan
@ 2025-12-09 15:53 ` Peter Newman
2025-12-09 16:14 ` Ben Horgan
0 siblings, 1 reply; 95+ messages in thread
From: Peter Newman @ 2025-12-09 15:53 UTC (permalink / raw)
To: Ben Horgan
Cc: James Morse, linux-kernel, linux-arm-kernel, D Scott Phillips OS,
carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
Jamie Iles, Xin Hao, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Gavin Shan, rohit.mathew, reinette.chatre,
Punit Agrawal
Hi Ben,
On Tue, Dec 9, 2025 at 3:40 PM Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi James and all,
>
> As James is otherwise occupied, I am planning to post a follow up
> version of this series once it's had time to be reviewed. I will be
> posting my own review comments; please give them extra scrutiny.
>
> On 12/5/25 21:58, James Morse wrote:
> > This is the missing piece to make MPAM usable via resctrl in user-space. This has
> > shed its debugfs code and the read/write 'event configuration' for the monitors
> > to make the series smaller.
> >
> [...]
> >
> > This series is based on arm64/for-next/core, and can be retrieved from:
> > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/glue/rfc
> >
> > There is no snapshot branch - this is it!
> > I'll push the extras branch once I've gotten a handle on the DT shaped mess in
> > there.
>
> One major departure from the previous snapshot branches referenced in
> the base driver series is that the same MPAM settings are used for
> kernel-space and user-space. That is, MPAM1_EL1 is set to the same value
> as MPAM0_EL1 rather than keeping the default value. The advantages of
> this are that it is closer to the x86 model where the closid is globally
> applicable, all partids are usable from user-space and user-space can't
> bypass MPAM controls by doing the work in the kernel. However, this
> causes some priority inversion where a high priority task waits to take
> a mutex held by another whose resources are restricted by MPAM.
> Thanks,
In our experience, the disadvantages of the x86 model were worse
because they triggered on hosts unintentionally, while making the
kernel do work unrestricted on behalf of the user at least requires
intentional abuse.
-Peter
^ permalink raw reply [flat|nested] 95+ messages in thread
* Re: [RFC PATCH 00/38] arm_mpam: Add KVM/arm64 and resctrl glue code
2025-12-09 15:53 ` Peter Newman
@ 2025-12-09 16:14 ` Ben Horgan
0 siblings, 0 replies; 95+ messages in thread
From: Ben Horgan @ 2025-12-09 16:14 UTC (permalink / raw)
To: Peter Newman
Cc: James Morse, linux-kernel, linux-arm-kernel, D Scott Phillips OS,
carl, lcherian, bobo.shaobowang, tan.shaopeng, baolin.wang,
Jamie Iles, Xin Hao, dfustini, amitsinght, David Hildenbrand,
Dave Martin, Koba Ko, Shanker Donthineni, fenghuay, baisheng.gao,
Jonathan Cameron, Gavin Shan, rohit.mathew, reinette.chatre,
Punit Agrawal
Hi Peter,
On 12/9/25 15:53, Peter Newman wrote:
> Hi Ben,
>
> On Tue, Dec 9, 2025 at 3:40 PM Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi James and all,
[...]
>>
>> One major departure from the previous snapshot branches referenced in
>> the base driver series is that the same MPAM settings are used for
>> kernel-space and user-space. That is, MPAM1_EL1 is set to the same value
>> as MPAM0_EL1 rather than keeping the default value. The advantages of
>> this are that it is closer to the x86 model where the closid is globally
>> applicable, all partids are usable from user-space and user-space can't
>> bypass MPAM controls by doing the work in the kernel. However, this
>> causes some priority inversion where a high priority task waits to take
>> a mutex held by another whose resources are restricted by MPAM.
>> Thanks,
>
> In our experience, the disadvantages of the x86 model were worse
> because they triggered on hosts unintentionally, while making the
> kernel do work unrestricted on behalf of the user at least requires
> intentional abuse.
>
> -Peter
Thanks for the quick feedback. Do you have any more data/information on
this that you can share?
Thanks,
Ben
^ permalink raw reply [flat|nested] 95+ messages in thread