* [PATCH v4 1/7] fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs()
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:06 ` Reinette Chatre
2026-03-26 17:25 ` [PATCH v4 2/7] x86,fs/resctrl: Make 'event_filter' files read only if they're not configurable Ben Horgan
` (5 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
The error path in resctrl_mkdir_event_configs() is unnecessarily
complicated. Simplify it to just return directly on error.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
fs/resctrl/rdtgroup.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5da305bd36c9..4753841c2ca3 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2331,20 +2331,19 @@ static int resctrl_mkdir_event_configs(struct rdt_resource *r, struct kernfs_nod
kn_subdir2 = kernfs_create_dir(kn_subdir, mevt->name, kn_subdir->mode, mevt);
if (IS_ERR(kn_subdir2)) {
ret = PTR_ERR(kn_subdir2);
- goto out;
+ return ret;
}
ret = rdtgroup_kn_set_ugid(kn_subdir2);
if (ret)
- goto out;
+ return ret;
ret = rdtgroup_add_files(kn_subdir2, RFTYPE_ASSIGN_CONFIG);
if (ret)
- break;
+ return ret;
}
-out:
- return ret;
+ return 0;
}
static int rdtgroup_mkdir_info_resdir(void *priv, char *name,
--
2.43.0
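The shape this patch moves to can be sketched outside the kernel as below. This is only an illustration: `struct node`, `create_dir()`, `set_owner()`, `add_files()` and `setup_all()` are hypothetical stand-ins, not the kernfs or resctrl helpers, and the sketch assumes (as in the patched function) that nothing needs to be undone at this level when a step fails.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory to set up; not a kernel type. */
struct node {
	int ok;		/* non-zero if creating this node should succeed */
};

/* Hypothetical helpers, each returning 0 or a negative errno value. */
static int create_dir(const struct node *n) { return n->ok ? 0 : -ENOMEM; }
static int set_owner(const struct node *n)  { (void)n; return 0; }
static int add_files(const struct node *n)  { (void)n; return 0; }

/*
 * Set up every node in turn. No cleanup is owed here, so each failure
 * returns immediately and the success path returns 0 explicitly.
 */
static int setup_all(const struct node *nodes, size_t count)
{
	size_t i;
	int ret;

	for (i = 0; i < count; i++) {
		ret = create_dir(&nodes[i]);
		if (ret)
			return ret;

		ret = set_owner(&nodes[i]);
		if (ret)
			return ret;

		ret = add_files(&nodes[i]);
		if (ret)
			return ret;
	}

	return 0;
}
```

With no shared label, mixing `goto out` and `break` for the same purpose is no longer possible, and a reader can see at each call site exactly what is returned.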
* Re: [PATCH v4 1/7] fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs()
2026-03-26 17:25 ` [PATCH v4 1/7] fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs() Ben Horgan
@ 2026-03-27 16:06 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:06 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 3/26/26 10:25 AM, Ben Horgan wrote:
> The error path in resctrl_mkdir_event_configs() is unnecessarily
> complicated. Simplify it to just return directly on error.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reinette
* [PATCH v4 2/7] x86,fs/resctrl: Make 'event_filter' files read only if they're not configurable
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
2026-03-26 17:25 ` [PATCH v4 1/7] fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs() Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:13 ` Reinette Chatre
2026-03-26 17:25 ` [PATCH v4 3/7] fs/resctrl: Disallow the software controller when MBM counters are assignable Ben Horgan
` (4 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
When the counter assignment mode is mbm_event resctrl assumes the MBM
events are configurable and exposes the 'event_filter' files. These files
live at info/L3_MON/event_configs/<event>/event_filter and are used to
display and set the event configuration. The MPAM architecture has support
for configuring the memory bandwidth utilization (MBWU) counters to only
count reads or only count writes. However, in MPAM, this event filtering
support is optional in the hardware (and not yet implemented in the MPAM
driver) but MBM counter assignment is always possible for MPAM MBWU
counters.
In order to support mbm_event mode with MPAM, make the 'event_filter' files
read only if the event configuration can't be changed. A user can still
chmod the file and so also return early with an error from
event_filter_write().
Introduce a new monitor property, mbm_cntr_configurable, to indicate
whether or not assignable MBM counters are configurable. On x86, set this
to true whenever mbm_cntr_assignable is true to keep existing behaviour.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Use property, mbm_cntr_configurable, rather than arch hook
Change the event_filter mode to read only in res_common_files[]
Add resctrl_file_mode_init() and use in resctrl_l3_mon_resource_init()
set mbm_cntr_configurable for x86 ABMC and mention in commit message
---
arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
fs/resctrl/internal.h | 2 ++
fs/resctrl/monitor.c | 7 +++++++
fs/resctrl/rdtgroup.c | 11 ++++++++++-
include/linux/resctrl.h | 16 +++++++++-------
5 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9bd87bae4983..794a6fb175e4 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -454,6 +454,7 @@ int __init rdt_get_l3_mon_config(struct rdt_resource *r)
(rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
r->mon.mbm_cntr_assignable = true;
+ r->mon.mbm_cntr_configurable = true;
cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
hw_res->mbm_cntr_assign_enabled = true;
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 1a9b29119f88..48af75b9dc85 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -408,6 +408,8 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);
void resctrl_file_fflags_init(const char *config, unsigned long fflags);
+void resctrl_file_mode_init(const char *config, umode_t mode);
+
void rdt_staged_configs_clear(void);
bool closid_allocated(unsigned int closid);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 49f3f6b846b2..8fec3dea33c3 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1420,6 +1420,11 @@ ssize_t event_filter_write(struct kernfs_open_file *of, char *buf, size_t nbytes
ret = -EINVAL;
goto out_unlock;
}
+ if (!r->mon.mbm_cntr_configurable) {
+ rdt_last_cmd_puts("event_filter is not configurable\n");
+ ret = -EPERM;
+ goto out_unlock;
+ }
ret = resctrl_parse_mem_transactions(buf, &evt_cfg);
if (!ret && mevt->evt_cfg != evt_cfg) {
@@ -1884,6 +1889,8 @@ int resctrl_l3_mon_resource_init(void)
resctrl_file_fflags_init("available_mbm_cntrs",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
+ if (r->mon.mbm_cntr_configurable)
+ resctrl_file_mode_init("event_filter", 0644);
resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
RFTYPE_RES_CACHE);
resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 4753841c2ca3..fa5712db3778 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2020,7 +2020,7 @@ static struct rftype res_common_files[] = {
},
{
.name = "event_filter",
- .mode = 0644,
+ .mode = 0444,
.kf_ops = &rdtgroup_kf_single_ops,
.seq_show = event_filter_show,
.write = event_filter_write,
@@ -2213,6 +2213,15 @@ void resctrl_file_fflags_init(const char *config, unsigned long fflags)
rft->fflags = fflags;
}
+void resctrl_file_mode_init(const char *config, umode_t mode)
+{
+ struct rftype *rft;
+
+ rft = rdtgroup_get_rftype_by_name(config);
+ if (rft)
+ rft->mode = mode;
+}
+
/**
* rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
* @r: The resource group with which the file is associated.
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..06e8c72e8660 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -286,13 +286,14 @@ enum resctrl_schema_fmt {
/**
* struct resctrl_mon - Monitoring related data of a resctrl resource.
- * @num_rmid: Number of RMIDs available.
- * @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
- * monitoring events can be configured.
- * @num_mbm_cntrs: Number of assignable counters.
- * @mbm_cntr_assignable:Is system capable of supporting counter assignment?
- * @mbm_assign_on_mkdir:True if counters should automatically be assigned to MBM
- * events of monitor groups created via mkdir.
+ * @num_rmid: Number of RMIDs available.
+ * @mbm_cfg_mask: Memory transactions that can be tracked when
+ * bandwidth monitoring events can be configured.
+ * @num_mbm_cntrs: Number of assignable counters.
+ * @mbm_cntr_assignable: Is system capable of supporting counter assignment?
+ * @mbm_assign_on_mkdir: True if counters should automatically be assigned to MBM
+ * events of monitor groups created via mkdir.
+ * @mbm_cntr_configurable: True if assignable counters are configurable.
*/
struct resctrl_mon {
u32 num_rmid;
@@ -300,6 +301,7 @@ struct resctrl_mon {
int num_mbm_cntrs;
bool mbm_cntr_assignable;
bool mbm_assign_on_mkdir;
+ bool mbm_cntr_configurable;
};
/**
--
2.43.0
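The two halves of this patch (choosing the file mode before the file is created, and still refusing writes after a chmod) can be sketched in plain C as below. This is a simplified model, not the kernel code: `struct file_template`, `templates[]`, `file_mode_init()`, `mon_resource_init()` and `filter_write()` are hypothetical names that only mirror the roles of struct rftype, res_common_files[], resctrl_file_mode_init(), resctrl_l3_mon_resource_init() and event_filter_write().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified file template; the real struct rftype has more fields. */
struct file_template {
	const char *name;
	unsigned int mode;
};

/* The template defaults to read only, as res_common_files[] does after the patch. */
static struct file_template templates[] = {
	{ "event_filter", 0444 },
};

/* Override a template's mode by name, before any file is created from it. */
static void file_mode_init(const char *name, unsigned int mode)
{
	size_t i;

	for (i = 0; i < sizeof(templates) / sizeof(templates[0]); i++) {
		if (!strcmp(templates[i].name, name))
			templates[i].mode = mode;
	}
}

/* Init-time decision: only a configurable resource gets a writable file. */
static void mon_resource_init(bool configurable)
{
	if (configurable)
		file_mode_init("event_filter", 0644);
}

/* Write-side guard: refuse even if the user has chmod'ed the file writable. */
static int filter_write(bool configurable)
{
	if (!configurable)
		return -EPERM;

	return 0;
}
```

Because the mode is settled in the template, files are created with the right permissions from the start rather than being changed afterwards, and the guard in the write handler covers the case where a user changes the mode later.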
* Re: [PATCH v4 2/7] x86,fs/resctrl: Make 'event_filter' files read only if they're not configurable
2026-03-26 17:25 ` [PATCH v4 2/7] x86,fs/resctrl: Make 'event_filter' files read only if they're not configurable Ben Horgan
@ 2026-03-27 16:13 ` Reinette Chatre
2026-03-27 17:15 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:13 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
nit: Since there is some precedent for resctrl files changing permissions
after creation I think it would support this work to make clear that the
permissions are set before the files are created and as such they are
created with correct permissions.
This could just be a simple: "Make" -> "Create" in subject.
On 3/26/26 10:25 AM, Ben Horgan wrote:
> When the counter assignment mode is mbm_event resctrl assumes the MBM
> events are configurable and exposes the 'event_filter' files. These files
> live at info/L3_MON/event_configs/<event>/event_filter and are used to
> display and set the event configuration.
<split here into separate paragraphs>
The MPAM architecture has support
> for configuring the memory bandwidth utilization (MBWU) counters to only
> count reads or only count writes. However, in MPAM, this event filtering
> support is optional in the hardware (and not yet implemented in the MPAM
> driver) but MBM counter assignment is always possible for MPAM MBWU
> counters.
>
> In order to support mbm_event mode with MPAM, make the 'event_filter' files
"make" -> "create"
> read only if the event configuration can't be changed. A user can still
> chmod the file and so also return early with an error from
> event_filter_write().
I went back-and-forth a few times on whether we should add a permissions
check to rdtgroup_file_write() to not call rft->write() (and thus event_filter_write())
at all if the file is not writable. In the end I think this patch is ok since
the last_cmd_status message gives insight into why the user is unable to
write to a file that may be writable under other circumstances. Any opinions
welcome here.
>
> Introduce a new monitor property, mbm_cntr_configurable, to indicate
> whether or not assignable MBM counters are configurable. On x86, set this
> to true whenever mbm_cntr_assignable is true to keep existing behaviour.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Use property, mbm_cntr_configurable, rather than arch hook
> Change the event_filter mode to read only in res_common_files[]
> Add resctrl_file_mode_init() and use in resctrl_l3_mon_resource_init()
> set mbm_cntr_configurable for x86 ABMC and mention in commit message
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
> fs/resctrl/internal.h | 2 ++
> fs/resctrl/monitor.c | 7 +++++++
> fs/resctrl/rdtgroup.c | 11 ++++++++++-
> include/linux/resctrl.h | 16 +++++++++-------
> 5 files changed, 29 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 9bd87bae4983..794a6fb175e4 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -454,6 +454,7 @@ int __init rdt_get_l3_mon_config(struct rdt_resource *r)
> (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
> rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
> r->mon.mbm_cntr_assignable = true;
> + r->mon.mbm_cntr_configurable = true;
> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
> hw_res->mbm_cntr_assign_enabled = true;
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 1a9b29119f88..48af75b9dc85 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -408,6 +408,8 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);
>
> void resctrl_file_fflags_init(const char *config, unsigned long fflags);
>
> +void resctrl_file_mode_init(const char *config, umode_t mode);
> +
> void rdt_staged_configs_clear(void);
>
> bool closid_allocated(unsigned int closid);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 49f3f6b846b2..8fec3dea33c3 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1420,6 +1420,11 @@ ssize_t event_filter_write(struct kernfs_open_file *of, char *buf, size_t nbytes
> ret = -EINVAL;
> goto out_unlock;
> }
> + if (!r->mon.mbm_cntr_configurable) {
> + rdt_last_cmd_puts("event_filter is not configurable\n");
> + ret = -EPERM;
> + goto out_unlock;
> + }
>
> ret = resctrl_parse_mem_transactions(buf, &evt_cfg);
> if (!ret && mevt->evt_cfg != evt_cfg) {
> @@ -1884,6 +1889,8 @@ int resctrl_l3_mon_resource_init(void)
> resctrl_file_fflags_init("available_mbm_cntrs",
> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
> resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
> + if (r->mon.mbm_cntr_configurable)
> + resctrl_file_mode_init("event_filter", 0644);
> resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
> RFTYPE_RES_CACHE);
> resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 4753841c2ca3..fa5712db3778 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2020,7 +2020,7 @@ static struct rftype res_common_files[] = {
> },
> {
> .name = "event_filter",
> - .mode = 0644,
> + .mode = 0444,
> .kf_ops = &rdtgroup_kf_single_ops,
> .seq_show = event_filter_show,
> .write = event_filter_write,
> @@ -2213,6 +2213,15 @@ void resctrl_file_fflags_init(const char *config, unsigned long fflags)
> rft->fflags = fflags;
> }
>
> +void resctrl_file_mode_init(const char *config, umode_t mode)
> +{
> + struct rftype *rft;
> +
> + rft = rdtgroup_get_rftype_by_name(config);
> + if (rft)
> + rft->mode = mode;
> +}
> +
> /**
> * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
> * @r: The resource group with which the file is associated.
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 006e57fd7ca5..06e8c72e8660 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -286,13 +286,14 @@ enum resctrl_schema_fmt {
>
> /**
> * struct resctrl_mon - Monitoring related data of a resctrl resource.
> - * @num_rmid: Number of RMIDs available.
> - * @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
> - * monitoring events can be configured.
> - * @num_mbm_cntrs: Number of assignable counters.
> - * @mbm_cntr_assignable:Is system capable of supporting counter assignment?
> - * @mbm_assign_on_mkdir:True if counters should automatically be assigned to MBM
> - * events of monitor groups created via mkdir.
> + * @num_rmid: Number of RMIDs available.
> + * @mbm_cfg_mask: Memory transactions that can be tracked when
> + * bandwidth monitoring events can be configured.
> + * @num_mbm_cntrs: Number of assignable counters.
> + * @mbm_cntr_assignable: Is system capable of supporting counter assignment?
> + * @mbm_assign_on_mkdir: True if counters should automatically be assigned to MBM
> + * events of monitor groups created via mkdir.
> + * @mbm_cntr_configurable: True if assignable counters are configurable.
> */
> struct resctrl_mon {
> u32 num_rmid;
> @@ -300,6 +301,7 @@ struct resctrl_mon {
> int num_mbm_cntrs;
> bool mbm_cntr_assignable;
> bool mbm_assign_on_mkdir;
> + bool mbm_cntr_configurable;
> };
>
> /**
The above looks good to me. The documentation does still specifically state that "event_filter"
is read/write so it needs a change to match. For example, like something below but please feel
free to improve:
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 3ec6b3b1b603..70a4ae89d8e1 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -419,9 +419,9 @@ with the following files:
Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
Each MBM event's sub-directory contains a file named "event_filter" that is
- used to view and modify which memory transactions the MBM event is configured
- with. The file is accessible only when "mbm_event" counter assignment mode is
- enabled.
+ used to view and (if writable) modify which memory transactions the MBM
+ event is configured with. The file is accessible only when "mbm_event" counter
+ assignment mode is enabled.
List of memory transaction types supported:
@@ -446,9 +446,8 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads,local_non_temporal_writes,local_reads_slow_memory
- Modify the event configuration by writing to the "event_filter" file within
- the "event_configs" directory. The read/write "event_filter" file contains the
- configuration of the event that reflects which memory transactions are counted by it.
+ The memory transactions the MBM event is configured with can be changed
+ if "event_filter" is writable.
For example::
Reinette
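From user space, the "(if writable)" wording in the proposed documentation could be checked along the lines below. This is a rough sketch using POSIX stat(); `mode_is_writable()` and `path_is_writable()` are hypothetical helper names. It inspects the permission bits only, so it is a hint rather than a guarantee: with this patch a write can still fail with -EPERM if the file was made writable with chmod on a system where the event is not configurable.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* True if any write permission bit is set in @mode. */
static bool mode_is_writable(mode_t mode)
{
	return (mode & (S_IWUSR | S_IWGRP | S_IWOTH)) != 0;
}

/* True if @path exists and has at least one write permission bit set. */
static bool path_is_writable(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return false;

	return mode_is_writable(st.st_mode);
}
```

A caller would pass the full path of an "event_filter" file, for example the mbm_local_bytes one shown in the documentation excerpt above. The mode bits are checked instead of access(2) because access(2) typically reports success for root regardless of the file mode.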
* Re: [PATCH v4 2/7] x86,fs/resctrl: Make 'event_filter' files read only if they're not configurable
2026-03-27 16:13 ` Reinette Chatre
@ 2026-03-27 17:15 ` Ben Horgan
0 siblings, 0 replies; 25+ messages in thread
From: Ben Horgan @ 2026-03-27 17:15 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 3/27/26 16:13, Reinette Chatre wrote:
> Hi Ben,
>
> nit: Since there is some precedent for resctrl files changing permissions
> after creation I think it would support this work to make clear that the
> permissions are set before the files are created and as such they are
> created with correct permissions.
>
> This could just be a simple: "Make" -> "Create" in subject.
Makes sense. I'll update to use "Create".
>
> On 3/26/26 10:25 AM, Ben Horgan wrote:
>> When the counter assignment mode is mbm_event resctrl assumes the MBM
>> events are configurable and exposes the 'event_filter' files. These files
>> live at info/L3_MON/event_configs/<event>/event_filter and are used to
>> display and set the event configuration.
>
> <split here into separate paragraphs>
Ack
>
> The MPAM architecture has support
>> for configuring the memory bandwidth utilization (MBWU) counters to only
>> count reads or only count writes. However, in MPAM, this event filtering
>> support is optional in the hardware (and not yet implemented in the MPAM
>> driver) but MBM counter assignment is always possible for MPAM MBWU
>> counters.
>>
>> In order to support mbm_event mode with MPAM, make the 'event_filter' files
>
> "make" -> "create"
Ack
>
>> read only if the event configuration can't be changed. A user can still
>> chmod the file and so also return early with an error from
>> event_filter_write().
>
> I went back-and-forth a few times on whether we should add a permissions
> check to rdtgroup_file_write() to not call rft->write() (and thus event_filter_write())
> at all if the file is not writable. In the end I think this patch is ok since
> the last_cmd_status message gives insight into why the user is unable to
> write to a file that may be writable under other circumstances. Any opinions
> welcome here.
>
>>
>> Introduce a new monitor property, mbm_cntr_configurable, to indicate
>> whether or not assignable MBM counters are configurable. On x86, set this
>> to true whenever mbm_cntr_assignable is true to keep existing behaviour.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> Changes since v2:
>> Use property, mbm_cntr_configurable, rather than arch hook
>> Change the event_filter mode to read only in res_common_files[]
>> Add resctrl_file_mode_init() and use in resctrl_l3_mon_resource_init()
>> set mbm_cntr_configurable for x86 ABMC and mention in commit message
>> ---
>> arch/x86/kernel/cpu/resctrl/monitor.c | 1 +
>> fs/resctrl/internal.h | 2 ++
>> fs/resctrl/monitor.c | 7 +++++++
>> fs/resctrl/rdtgroup.c | 11 ++++++++++-
>> include/linux/resctrl.h | 16 +++++++++-------
>> 5 files changed, 29 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 9bd87bae4983..794a6fb175e4 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -454,6 +454,7 @@ int __init rdt_get_l3_mon_config(struct rdt_resource *r)
>> (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
>> rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
>> r->mon.mbm_cntr_assignable = true;
>> + r->mon.mbm_cntr_configurable = true;
>> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> hw_res->mbm_cntr_assign_enabled = true;
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 1a9b29119f88..48af75b9dc85 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -408,6 +408,8 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);
>>
>> void resctrl_file_fflags_init(const char *config, unsigned long fflags);
>>
>> +void resctrl_file_mode_init(const char *config, umode_t mode);
>> +
>> void rdt_staged_configs_clear(void);
>>
>> bool closid_allocated(unsigned int closid);
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 49f3f6b846b2..8fec3dea33c3 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1420,6 +1420,11 @@ ssize_t event_filter_write(struct kernfs_open_file *of, char *buf, size_t nbytes
>> ret = -EINVAL;
>> goto out_unlock;
>> }
>> + if (!r->mon.mbm_cntr_configurable) {
>> + rdt_last_cmd_puts("event_filter is not configurable\n");
>> + ret = -EPERM;
>> + goto out_unlock;
>> + }
>>
>> ret = resctrl_parse_mem_transactions(buf, &evt_cfg);
>> if (!ret && mevt->evt_cfg != evt_cfg) {
>> @@ -1884,6 +1889,8 @@ int resctrl_l3_mon_resource_init(void)
>> resctrl_file_fflags_init("available_mbm_cntrs",
>> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>> resctrl_file_fflags_init("event_filter", RFTYPE_ASSIGN_CONFIG);
>> + if (r->mon.mbm_cntr_configurable)
>> + resctrl_file_mode_init("event_filter", 0644);
>> resctrl_file_fflags_init("mbm_assign_on_mkdir", RFTYPE_MON_INFO |
>> RFTYPE_RES_CACHE);
>> resctrl_file_fflags_init("mbm_L3_assignments", RFTYPE_MON_BASE);
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 4753841c2ca3..fa5712db3778 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2020,7 +2020,7 @@ static struct rftype res_common_files[] = {
>> },
>> {
>> .name = "event_filter",
>> - .mode = 0644,
>> + .mode = 0444,
>> .kf_ops = &rdtgroup_kf_single_ops,
>> .seq_show = event_filter_show,
>> .write = event_filter_write,
>> @@ -2213,6 +2213,15 @@ void resctrl_file_fflags_init(const char *config, unsigned long fflags)
>> rft->fflags = fflags;
>> }
>>
>> +void resctrl_file_mode_init(const char *config, umode_t mode)
>> +{
>> + struct rftype *rft;
>> +
>> + rft = rdtgroup_get_rftype_by_name(config);
>> + if (rft)
>> + rft->mode = mode;
>> +}
>> +
>> /**
>> * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
>> * @r: The resource group with which the file is associated.
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 006e57fd7ca5..06e8c72e8660 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -286,13 +286,14 @@ enum resctrl_schema_fmt {
>>
>> /**
>> * struct resctrl_mon - Monitoring related data of a resctrl resource.
>> - * @num_rmid: Number of RMIDs available.
>> - * @mbm_cfg_mask: Memory transactions that can be tracked when bandwidth
>> - * monitoring events can be configured.
>> - * @num_mbm_cntrs: Number of assignable counters.
>> - * @mbm_cntr_assignable:Is system capable of supporting counter assignment?
>> - * @mbm_assign_on_mkdir:True if counters should automatically be assigned to MBM
>> - * events of monitor groups created via mkdir.
>> + * @num_rmid: Number of RMIDs available.
>> + * @mbm_cfg_mask: Memory transactions that can be tracked when
>> + * bandwidth monitoring events can be configured.
>> + * @num_mbm_cntrs: Number of assignable counters.
>> + * @mbm_cntr_assignable: Is system capable of supporting counter assignment?
>> + * @mbm_assign_on_mkdir: True if counters should automatically be assigned to MBM
>> + * events of monitor groups created via mkdir.
>> + * @mbm_cntr_configurable: True if assignable counters are configurable.
>> */
>> struct resctrl_mon {
>> u32 num_rmid;
>> @@ -300,6 +301,7 @@ struct resctrl_mon {
>> int num_mbm_cntrs;
>> bool mbm_cntr_assignable;
>> bool mbm_assign_on_mkdir;
>> + bool mbm_cntr_configurable;
>> };
>>
>> /**
>
> The above looks good to me. The documentation does still specifically state that "event_filter"
> is read/write so it needs a change to match. For example, like something below but please feel
> free to improve:
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 3ec6b3b1b603..70a4ae89d8e1 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -419,9 +419,9 @@ with the following files:
>
> Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
> Each MBM event's sub-directory contains a file named "event_filter" that is
> - used to view and modify which memory transactions the MBM event is configured
> - with. The file is accessible only when "mbm_event" counter assignment mode is
> - enabled.
> + used to view and (if writable) modify which memory transactions the MBM
> + event is configured with. The file is accessible only when "mbm_event" counter
> + assignment mode is enabled.
>
> List of memory transaction types supported:
>
> @@ -446,9 +446,8 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
> local_reads,local_non_temporal_writes,local_reads_slow_memory
>
> - Modify the event configuration by writing to the "event_filter" file within
> - the "event_configs" directory. The read/write "event_filter" file contains the
> - configuration of the event that reflects which memory transactions are counted by it.
> + The memory transactions the MBM event is configured with can be changed
> + if "event_filter" is writable.
Thanks for the wording. It reads well to me.
Ben
>
> For example::
>
> Reinette
>
>
>
>
* [PATCH v4 3/7] fs/resctrl: Disallow the software controller when MBM counters are assignable
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
2026-03-26 17:25 ` [PATCH v4 1/7] fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs() Ben Horgan
2026-03-26 17:25 ` [PATCH v4 2/7] x86,fs/resctrl: Make 'event_filter' files read only if they're not configurable Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:14 ` Reinette Chatre
2026-03-26 17:25 ` [PATCH v4 4/7] x86,fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed' Ben Horgan
` (3 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
The software controller requires that for each MBA control there is one MBM
counter per monitor group that is assigned to the event backing the
software controller, as per mba_MBps_event. When mbm_event mode is in use,
it is not guaranteed that any particular event will have an assigned
counter.
Currently, only AMD systems support counter assignment, but the MBA delay
is non-linear and so the software controller is never supported anyway. On
MPAM systems, the MBA delay is linear and so the software controller could
be supported. The MPAM driver, unless a need arises, will not support the
'default' mbm_assign_mode and will always use the 'mbm_event' mode for
memory bandwidth monitoring.
Rather than develop a way to guarantee the counter assignment requirements
needed by the software controller, take the pragmatic approach. Don't allow
the software controller to be used at the same time as 'mbm_event' mode. As
MPAM is the only relevant architecture and it will use 'mbm_event' mode
whenever there are assignable MBM counters, for simplicity's sake, don't
allow the software controller when the MBM counters are assignable.
Implement this by failing the mount if the user requests the software
controller, the mba_MBps option, and the MBM counters are assignable.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
fs/resctrl/rdtgroup.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index fa5712db3778..7ef316b24a41 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2528,7 +2528,8 @@ static bool supports_mba_mbps(void)
return (resctrl_is_mbm_enabled() &&
r->alloc_capable && is_mba_linear() &&
- r->ctrl_scope == rmbm->mon_scope);
+ r->ctrl_scope == rmbm->mon_scope &&
+ !rmbm->mon.mbm_cntr_assignable);
}
/*
@@ -2943,7 +2944,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
ctx->enable_cdpl2 = true;
return 0;
case Opt_mba_mbps:
- msg = "mba_MBps requires MBM and linear scale MBA at L3 scope";
+ msg = "mba_MBps requires dedicated MBM counters and linear scale MBA at L3 scope";
if (!supports_mba_mbps())
return invalfc(fc, msg);
ctx->enable_mba_mbps = true;
--
2.43.0
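The resulting mount-time condition can be modelled as a single predicate, sketched below. `struct mba_state` and `can_use_mba_mbps()` are hypothetical; the struct flattens state that supports_mba_mbps() reads from resctrl_is_mbm_enabled(), is_mba_linear() and the two resources, and only the final term is new in this patch.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the state consulted by supports_mba_mbps(). */
struct mba_state {
	bool mbm_enabled;		/* an MBM event is enabled */
	bool alloc_capable;		/* the MBA resource supports allocation */
	bool mba_linear;		/* the MBA delay is linear */
	bool scopes_match;		/* control scope equals monitor scope */
	bool mbm_cntr_assignable;	/* MBM counters are assignable */
};

/* The software controller is offered only when every condition holds. */
static bool can_use_mba_mbps(const struct mba_state *s)
{
	return s->mbm_enabled && s->alloc_capable && s->mba_linear &&
	       s->scopes_match && !s->mbm_cntr_assignable;
}
```

Under this model, a system with linear MBA and assignable counters, which is the MPAM case described in the commit message, fails the check, so a mount requesting mba_MBps is rejected.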
* Re: [PATCH v4 3/7] fs/resctrl: Disallow the software controller when MBM counters are assignable
2026-03-26 17:25 ` [PATCH v4 3/7] fs/resctrl: Disallow the software controller when MBM counters are assignable Ben Horgan
@ 2026-03-27 16:14 ` Reinette Chatre
2026-03-27 17:24 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:14 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 3/26/26 10:25 AM, Ben Horgan wrote:
> The software controller requires that for each MBA control there is one MBM
Could you please elaborate the "for each MBA control" distinction here?
> counter per monitor group that is assigned to the event backing the
> software controller, as per mba_MBps_event. When mbm_event mode is in use,
> it is not guaranteed that any particular event will have an assigned
> counter.
>
> Currently, only AMD systems support counter assignment, but the MBA delay
> is non-linear and so the software controller is never supported anyway. On
> MPAM systems, the MBA delay is linear and so the software controller could
> be supported. The MPAM driver, unless a need arises, will not support the
> 'default' mbm_assign_mode and will always use the 'mbm_event' mode for
> memory bandwidth monitoring.
>
> Rather than develop a way to guarantee the counter assignment requirements
> needed by the software controller, take the pragmatic approach. Don't allow
> the software controller to be used at the same time as 'mbm_event' mode. As
> MPAM is the only relevant architecture and it will use 'mbm_event' mode
> whenever there are assignable MBM counters, for simplicity's sake, don't
> allow the software controller when the MBM counters are assignable.
>
> Implement this by failing the mount if the user requests the software
> controller, the mba_MBps option, and the MBM counters are assignable.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> fs/resctrl/rdtgroup.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index fa5712db3778..7ef316b24a41 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2528,7 +2528,8 @@ static bool supports_mba_mbps(void)
>
> return (resctrl_is_mbm_enabled() &&
> r->alloc_capable && is_mba_linear() &&
> - r->ctrl_scope == rmbm->mon_scope);
> + r->ctrl_scope == rmbm->mon_scope &&
> + !rmbm->mon.mbm_cntr_assignable);
> }
>
> /*
> @@ -2943,7 +2944,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
> ctx->enable_cdpl2 = true;
> return 0;
> case Opt_mba_mbps:
> - msg = "mba_MBps requires MBM and linear scale MBA at L3 scope";
> + msg = "mba_MBps requires dedicated MBM counters and linear scale MBA at L3 scope";
This looks like the original. I was expecting:
https://lore.kernel.org/lkml/cac437e2-8139-4833-9cbd-55d626062730@arm.com/ ?
> if (!supports_mba_mbps())
> return invalfc(fc, msg);
> ctx->enable_mba_mbps = true;
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 3/7] fs/resctrl: Disallow the software controller when MBM counters are assignable
2026-03-27 16:14 ` Reinette Chatre
@ 2026-03-27 17:24 ` Ben Horgan
0 siblings, 0 replies; 25+ messages in thread
From: Ben Horgan @ 2026-03-27 17:24 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 3/27/26 16:14, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/26/26 10:25 AM, Ben Horgan wrote:
>> The software controller requires that for each MBA control there is one MBM
>
> Could you please elaborate the "for each MBA control" distinction here?
I was referring to the per-control per-domain configurable bandwidth but perhaps
it's clearer to just drop "for each MBA control" from the sentence.
>
>> counter per monitor group that is assigned to the event backing the
>> software controller, as per mba_MBps_event. When mbm_event mode is in use,
>> it is not guaranteed that any particular event will have an assigned
>> counter.
>>
>> Currently, only AMD systems support counter assignment, but the MBA delay
>> is non-linear and so the software controller is never supported anyway. On
>> MPAM systems, the MBA delay is linear and so the software controller could
>> be supported. The MPAM driver, unless a need arises, will not support the
>> 'default' mbm_assign_mode and will always use the 'mbm_event' mode for
>> memory bandwidth monitoring.
>>
>> Rather than develop a way to guarantee the counter assignment requirements
>> needed by the software controller, take the pragmatic approach. Don't allow
>> the software controller to be used at the same time as 'mbm_event' mode. As
>> MPAM is the only relevant architecture and it will use 'mbm_event' mode
>> whenever there are assignable MBM counters, for simplicity's sake, don't
>> allow the software controller when the MBM counters are assignable.
>>
>> Implement this by failing the mount if the user requests the software
>> controller, the mba_MBps option, and the MBM counters are assignable.
>>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>> fs/resctrl/rdtgroup.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index fa5712db3778..7ef316b24a41 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2528,7 +2528,8 @@ static bool supports_mba_mbps(void)
>>
>> return (resctrl_is_mbm_enabled() &&
>> r->alloc_capable && is_mba_linear() &&
>> - r->ctrl_scope == rmbm->mon_scope);
>> + r->ctrl_scope == rmbm->mon_scope &&
>> + !rmbm->mon.mbm_cntr_assignable);
>> }
>>
>> /*
>> @@ -2943,7 +2944,7 @@ static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
>> ctx->enable_cdpl2 = true;
>> return 0;
>> case Opt_mba_mbps:
>> - msg = "mba_MBps requires MBM and linear scale MBA at L3 scope";
>> + msg = "mba_MBps requires dedicated MBM counters and linear scale MBA at L3 scope";
>
> This looks like the original. I was expecting:
> https://lore.kernel.org/lkml/cac437e2-8139-4833-9cbd-55d626062730@arm.com/ ?
Ah, sorry! Updated locally now.
Thanks,
Ben
>
>> if (!supports_mba_mbps())
>> return invalfc(fc, msg);
>> ctx->enable_mba_mbps = true;
>
> Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 4/7] x86,fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed'
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
` (2 preceding siblings ...)
2026-03-26 17:25 ` [PATCH v4 3/7] fs/resctrl: Disallow the software controller when MBM counters are assignable Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:15 ` Reinette Chatre
2026-03-26 17:25 ` [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure Ben Horgan
` (2 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
Commit
3b497c3f4f04 ("fs/resctrl: Introduce the interface to display monitoring modes")
introduced CONFIG_RESCTRL_ASSIGN_FIXED but left adding the Kconfig
entry until it was necessary. The counter assignment mode is fixed in
MPAM, even when there are assignable counters, and so addressing this
is needed to support MPAM.
To avoid the burden of another Kconfig entry, replace
CONFIG_RESCTRL_ASSIGN_FIXED with a new property in 'struct resctrl_mon',
'mbm_cntr_assign_fixed'.
To enable better user reporting check the new property in
resctrl_mbm_assign_mode_write().
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Changes since v2:
Change to a resctrl_mon property rather than an arch hook
Update the commit message to mention the property
---
fs/resctrl/monitor.c | 8 +++++++-
include/linux/resctrl.h | 2 ++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 8fec3dea33c3..6afa2af26ff7 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1454,7 +1454,7 @@ int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
else
seq_puts(s, "[default]\n");
- if (!IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED)) {
+ if (!r->mon.mbm_cntr_assign_fixed) {
if (enabled)
seq_puts(s, "default\n");
else
@@ -1505,6 +1505,12 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of, char *buf,
}
if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ if (r->mon.mbm_cntr_assign_fixed) {
+ ret = -EINVAL;
+ rdt_last_cmd_puts("Counter assignment mode is not configurable\n");
+ goto out_unlock;
+ }
+
ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
if (ret)
goto out_unlock;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 06e8c72e8660..a986daf5f2ef 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -294,6 +294,7 @@ enum resctrl_schema_fmt {
* @mbm_assign_on_mkdir: True if counters should automatically be assigned to MBM
* events of monitor groups created via mkdir.
* @mbm_cntr_configurable: True if assignable counters are configurable.
+ * @mbm_cntr_assign_fixed: True if the counter assignment mode is fix.
*/
struct resctrl_mon {
u32 num_rmid;
@@ -302,6 +303,7 @@ struct resctrl_mon {
bool mbm_cntr_assignable;
bool mbm_assign_on_mkdir;
bool mbm_cntr_configurable;
+ bool mbm_cntr_assign_fixed;
};
/**
--
2.43.0
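
The write-side behaviour can be illustrated with a small userspace model. This is a sketch under stated assumptions: 'struct sketch_mon', its 'assign_enabled' field and sketch_assign_mode_write() are hypothetical, and the status string pointer stands in for rdt_last_cmd_puts(). Only the ordering of the checks is taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the mode write handler consults. */
struct sketch_mon {
	bool mbm_cntr_assign_fixed;	/* the new property from this patch */
	bool assign_enabled;		/* models resctrl_arch_mbm_cntr_assign_enabled() */
};

/*
 * Writing the mode already in effect is a no-op in this sketch. A request
 * to change the mode is refused with -EINVAL when the mode is fixed and the
 * reason is reported through @status.
 */
static int sketch_assign_mode_write(struct sketch_mon *mon, bool enable,
				    const char **status)
{
	*status = "ok";

	if (enable == mon->assign_enabled)
		return 0;

	if (mon->mbm_cntr_assign_fixed) {
		*status = "Counter assignment mode is not configurable";
		return -EINVAL;
	}

	/* Stand-in for resctrl_arch_mbm_cntr_assign_set(). */
	mon->assign_enabled = enable;
	return 0;
}
```

The point of checking the property before calling into the architecture is that the architecture is never asked to do something it cannot do, and the user gets a specific reason for the failure.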
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 4/7] x86,fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed'
2026-03-26 17:25 ` [PATCH v4 4/7] x86,fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed' Ben Horgan
@ 2026-03-27 16:15 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:15 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
Now that the patch no longer touches x86 code the subject prefix can just
be "fs/resctrl:".
On 3/26/26 10:25 AM, Ben Horgan wrote:
> Commit
>
> 3b497c3f4f04 ("fs/resctrl: Introduce the interface to display monitoring modes")
>
> introduced CONFIG_RESCTRL_ASSIGN_FIXED but left adding the Kconfig
> entry until it was necessary. The counter assignment mode is fixed in
> MPAM, even when there are assignable counters, and so addressing this
> is needed to support MPAM.
>
> To avoid the burden of another Kconfig entry, replace
> CONFIG_RESCTRL_ASSIGN_FIXED with a new property in 'struct resctrl_mon',
> 'mbm_cntr_assign_fixed'.
Can append "to be set by architecture."
>
> To enable better user reporting check the new property in
> resctrl_mbm_assign_mode_write().
How about:
Do not request the architecture to change the counter assignment
mode if it does not support doing so. Provide insight to user space
about why such a request fails.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> Changes since v2:
> Change to a resctrl_mon property rather than an arch hook
> Update the commit message to mention the property
> ---
> fs/resctrl/monitor.c | 8 +++++++-
> include/linux/resctrl.h | 2 ++
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 8fec3dea33c3..6afa2af26ff7 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1454,7 +1454,7 @@ int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
> else
> seq_puts(s, "[default]\n");
>
> - if (!IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED)) {
> + if (!r->mon.mbm_cntr_assign_fixed) {
> if (enabled)
> seq_puts(s, "default\n");
> else
> @@ -1505,6 +1505,12 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of, char *buf,
> }
>
> if (enable != resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + if (r->mon.mbm_cntr_assign_fixed) {
> + ret = -EINVAL;
> + rdt_last_cmd_puts("Counter assignment mode is not configurable\n");
> + goto out_unlock;
> + }
> +
> ret = resctrl_arch_mbm_cntr_assign_set(r, enable);
> if (ret)
> goto out_unlock;
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 06e8c72e8660..a986daf5f2ef 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -294,6 +294,7 @@ enum resctrl_schema_fmt {
> * @mbm_assign_on_mkdir: True if counters should automatically be assigned to MBM
> * events of monitor groups created via mkdir.
> * @mbm_cntr_configurable: True if assignable counters are configurable.
> + * @mbm_cntr_assign_fixed: True if the counter assignment mode is fix.
"is fix" -> "is fixed"?
> */
> struct resctrl_mon {
> u32 num_rmid;
> @@ -302,6 +303,7 @@ struct resctrl_mon {
> bool mbm_cntr_assignable;
> bool mbm_assign_on_mkdir;
> bool mbm_cntr_configurable;
> + bool mbm_cntr_assign_fixed;
> };
>
> /**
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
` (3 preceding siblings ...)
2026-03-26 17:25 ` [PATCH v4 4/7] x86,fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed' Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:21 ` Reinette Chatre
2026-03-26 17:25 ` [PATCH v4 6/7] fs/resctrl: Document that automatic counter assignment is best effort Ben Horgan
2026-03-26 17:25 ` [PATCH v4 7/7] fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks Ben Horgan
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user creates a
new CTRL_MON or MON group resctrl will attempt to allocate counters for
each of the supported mbm events on each resctrl domain. As counters are
limited, these allocations may fail. If an mbm_total_event counter
allocation fails then the mbm_total_event counter allocations for the
remaining domains are skipped and then the mbm_local_event counter
allocations are made. These failures don't cause the group allocation to
fail but the user should still be aware of them. A message for each
attempted allocation is reported in last_cmd_status but in order to fully
interpret that information the user needs to know what was skipped. This
is knowable as the domain list is sorted but it is undesirable to rely on
such implementation details.
Writes to mbm_L3_assignments using the wildcard format, <event>:*=e, will
also skip counter allocation after any counter allocation failure. Leading
once again to counters that are allocated but have no corresponding message
in last_cmd_status to indicate that.
When a new group is created always attempt to allocate all the counters
requested. Similarly, when a a wildcard assign operation is written to
mbm_L3_assignments, attempt to allocate all counters requested by that
particular operation.
For mbm_L3_assignments, continue to return an error on counter allocation
failure and for a write specifying multiple assign operations continue to
abort after the first failing assign operation.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
fs/resctrl/monitor.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 6afa2af26ff7..3f33fff8eb7f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1209,9 +1209,10 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
* NULL; otherwise, assign the counter to the specified domain @d.
*
* If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
- * will fail. The assignment process will abort at the first failure encountered
- * during domain traversal, which may result in the event being only partially
- * assigned.
+ * will fail. When attempting to assign counters to all domains, carry on trying
+ * to assign counters after a failure since only some domains may have counters
+ * and the goal is to assign counters where possible. If any counter assignment
+ * fails, return the error from the last failing assignment.
*
* Return:
* 0 on success, < 0 on failure.
@@ -1224,9 +1225,11 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
if (!d) {
list_for_each_entry(d, &r->mon_domains, hdr.list) {
- ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
- if (ret)
- return ret;
+ int err;
+
+ err = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
+ if (err)
+ ret = err;
}
} else {
ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
--
2.43.0
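
The loop change can be tried out in userspace with a small model. This is a minimal sketch, not the kernel code: 'struct sketch_domain', sketch_alloc() and sketch_assign_all() are hypothetical, and the -ENOSPC returned when a domain has no free counter is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-domain state: free counters and whether one was assigned. */
struct sketch_domain {
	int free_cntrs;
	int assigned;
};

/* Stand-in for rdtgroup_alloc_assign_cntr(): fails when no counter is free. */
static int sketch_alloc(struct sketch_domain *d)
{
	if (d->free_cntrs == 0)
		return -ENOSPC;	/* assumed error code */

	d->free_cntrs--;
	d->assigned = 1;
	return 0;
}

/*
 * Visit every domain even after a failure and return the error from the
 * last failing allocation, or 0 if every allocation succeeded.
 */
static int sketch_assign_all(struct sketch_domain *doms, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		int err = sketch_alloc(&doms[i]);

		if (err)
			ret = err;
	}

	return ret;
}
```

Given three domains where only the middle one has no free counter, the sketch assigns counters in the first and last domains and still reports the failure, whereas returning at the first error would have left the last domain without a counter.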
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-03-26 17:25 ` [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure Ben Horgan
@ 2026-03-27 16:21 ` Reinette Chatre
2026-04-14 14:42 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:21 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 3/26/26 10:25 AM, Ben Horgan wrote:
> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user creates a
> new CTRL_MON or MON group resctrl will attempt to allocate counters for
"will attempt" -> "attempts"
> each of the supported mbm events on each resctrl domain. As counters are
"mbm" -> "MBM"
> limited, these allocations may fail. If an mbm_total_event counter
This switches from high level description to example, back to high level description.
> allocation fails then the mbm_total_event counter allocations for the
> remaining domains are skipped and then the mbm_local_event counter
> allocations are made. These failures don't cause the group allocation to
"are made -> "are attempted"
> fail but the user should still be aware of them. A message for each
> attempted allocation is reported in last_cmd_status but in order to fully
> interpret that information the user needs to know what was skipped. This
> is knowable as the domain list is sorted but it is undesirable to rely on
> such implementation details.
User can always get most accurate counter assignment state from
mbm_L3_assignments. There is no need for any guessing or needing to know
implementation details.
>
> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e, will
> also skip counter allocation after any counter allocation failure. Leading
> once again to counters that are allocated but have no corresponding message
> in last_cmd_status to indicate that.
User should not rely on last_cmd_status for state and we should not move
towards making it part of ABI.
>
> When a new group is created always attempt to allocate all the counters
> requested. Similarly, when a a wildcard assign operation is written to
"a a wildcard"
> mbm_L3_assignments, attempt to allocate all counters requested by that
> particular operation.
>
> For mbm_L3_assignments, continue to return an error on counter allocation
> failure and for a write specifying multiple assign operations continue to
> abort after the first failing assign operation.
I support the change to attempt counter creation in all domains but I am
concerned about the motivation here - the goal should not be to document
all failed domains in last_cmd_status and pointing user to it to learn which
allocations failed. Instead user should use mbm_L3_assignments for most
accurate state.
Consider a changelog like below that just focuses on problem being solved
(but please correct me if you find I am missing the point):
In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
creates a new CTRL_MON or MON group resctrl attempts to allocate
counters for each of the supported MBM events on each resctrl
domain. As counters are limited, such allocation may fail and
when it does counter allocations for the remaining domains are
skipped even if the domains have available counters.
Since a counter allocation failure may result in counter allocation
skipped on other domains the user needs to view the resource group's
mbm_L3_assignments files to get an accurate view of counter assignment
in a new resource group and then manually create counters in the skipped
domains with available counters.
Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
also skip counter allocation in other domains after a counter allocation
failure.
When handling a request to create counters in all domains it is unnecessary
for a counter allocation in one domain to prevent counter allocation in
other domains. Always attempt to allocate all the counters requested.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> fs/resctrl/monitor.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 6afa2af26ff7..3f33fff8eb7f 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1209,9 +1209,10 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
> * NULL; otherwise, assign the counter to the specified domain @d.
> *
> * If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
> - * will fail. The assignment process will abort at the first failure encountered
> - * during domain traversal, which may result in the event being only partially
> - * assigned.
> + * will fail. When attempting to assign counters to all domains, carry on trying
> + * to assign counters after a failure since only some domains may have counters
> + * and the goal is to assign counters where possible. If any counter assignment
> + * fails, return the error from the last failing assignment.
> *
> * Return:
> * 0 on success, < 0 on failure.
> @@ -1224,9 +1225,11 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>
> if (!d) {
> list_for_each_entry(d, &r->mon_domains, hdr.list) {
> - ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> - if (ret)
> - return ret;
> + int err;
> +
> + err = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> + if (err)
> + ret = err;
> }
> } else {
> ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
The change looks good to me.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-03-27 16:21 ` Reinette Chatre
@ 2026-04-14 14:42 ` Ben Horgan
2026-04-15 14:27 ` Reinette Chatre
0 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-04-14 14:42 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 3/27/26 16:21, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/26/26 10:25 AM, Ben Horgan wrote:
>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user creates a
>> new CTRL_MON or MON group resctrl will attempt to allocate counters for
>
> "will attempt" -> "attempts"
>
>> each of the supported mbm events on each resctrl domain. As counters are
>
> "mbm" -> "MBM"
>
>> limited, these allocations may fail. If an mbm_total_event counter
>
> This switches from high level description to example, back to high level description.
>
>> allocation fails then the mbm_total_event counter allocations for the
>> remaining domains are skipped and then the mbm_local_event counter
>> allocations are made. These failures don't cause the group allocation to
>
> "are made -> "are attempted"
>
>> fail but the user should still be aware of them. A message for each
>> attempted allocation is reported in last_cmd_status but in order to fully
>> interpret that information the user needs to know what was skipped. This
>> is knowable as the domain list is sorted but it is undesirable to rely on
>> such implementation details.
>
> User can always get most accurate counter assignment state from
> mbm_L3_assignments. There is no need for any guessing or needing to know
> implementation details.
>
>>
>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e, will
>> also skip counter allocation after any counter allocation failure. Leading
>> once again to counters that are allocated but have no corresponding message
>> in last_cmd_status to indicate that.
>
> User should not rely on last_cmd_status for state and we should not move
> towards making it part of ABI.
>
>>
>> When a new group is created always attempt to allocate all the counters
>> requested. Similarly, when a a wildcard assign operation is written to
>
> "a a wildcard"
>
>> mbm_L3_assignments, attempt to allocate all counters requested by that
>> particular operation.
>>
>> For mbm_L3_assignments, continue to return an error on counter allocation
>> failure and for a write specifying multiple assign operations continue to
>> abort after the first failing assign operation.
>
> I support the change to attempt counter creation in all domains but I am
> concerned about the motivation here - the goal should not be to document
> all failed domains in last_cmd_status and pointing user to it to learn which
Seems sensible to me.
> allocations failed. Instead user should use mbm_L3_assignments for most
> accurate state.
>
> Consider a changelog like below that just focuses on problem being solved
> (but please correct me if you find I am missing the point):
>
> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
> creates a new CTRL_MON or MON group resctrl attempts to allocate
> counters for each of the supported MBM events on each resctrl
> domain. As counters are limited, such allocation may fail and
> when it does counter allocations for the remaining domains are
> skipped even if the domains have available counters.
>
> Since a counter allocation failure may result in counter allocation
> skipped on other domains the user needs to view the resource group's
skipped -> being skipped
> mbm_L3_assignments files to get an accurate view of counter assignment
> in a new resource group and then manually create counters in the skipped
> domains with available counters.
>
> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
> also skip counter allocation in other domains after a counter allocation
> failure.
>
> When handling a request to create counters in all domains it is unnecessary
> for a counter allocation in one domain to prevent counter allocation in
> other domains. Always attempt to allocate all the counters requested.
I can use this but how about if I add,
Skipping counter allocation in subsequent domains after failure makes predicting which
counters will be allocated harder for the user as they need to know the ordering of the
domains as well as the expected failures.
before the final sentence to make it clear that this change is an improvement not just
a change in policy.
Thanks,
Ben
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-04-14 14:42 ` Ben Horgan
@ 2026-04-15 14:27 ` Reinette Chatre
2026-04-15 14:46 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-04-15 14:27 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 4/14/26 7:42 AM, Ben Horgan wrote:
> On 3/27/26 16:21, Reinette Chatre wrote:
>> Consider a changelog like below that just focuses on problem being solved
>> (but please correct me if you find I am missing the point):
>>
>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
>> creates a new CTRL_MON or MON group resctrl attempts to allocate
>> counters for each of the supported MBM events on each resctrl
>> domain. As counters are limited, such allocation may fail and
>> when it does counter allocations for the remaining domains are
>> skipped even if the domains have available counters.
>>
>> Since a counter allocation failure may result in counter allocation
>> skipped on other domains the user needs to view the resource group's
>
> skipped -> being skipped
>
>> mbm_L3_assignments files to get an accurate view of counter assignment
>> in a new resource group and then manually create counters in the skipped
>> domains with available counters.
>>
>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
>> also skip counter allocation in other domains after a counter allocation
>> failure.
>>
>> When handling a request to create counters in all domains it is unnecessary
>> for a counter allocation in one domain to prevent counter allocation in
>> other domains. Always attempt to allocate all the counters requested.
>
> I can use this but how about if I add,
>
> Skipping counter allocation in subsequent domains after failure makes predicting which
> counters will be allocated harder for the user as they need to know the ordering of the
> domains as well as the expected failures.
I do not see why the user needs to make any predictions with the current implementation.
mbm_L3_assignments will always contain accurate information regarding counter assignment, no?
>
> before the final sentence to make it clear that this change is an improvement not just
> a change in policy.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-04-15 14:27 ` Reinette Chatre
@ 2026-04-15 14:46 ` Ben Horgan
2026-04-15 15:38 ` Reinette Chatre
0 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-04-15 14:46 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 4/15/26 15:27, Reinette Chatre wrote:
> Hi Ben,
>
> On 4/14/26 7:42 AM, Ben Horgan wrote:
>> On 3/27/26 16:21, Reinette Chatre wrote:
>
>>> Consider a changelog like below that just focuses on problem being solved
>>> (but please correct me if you find I am missing the point):
>>>
>>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
>>> creates a new CTRL_MON or MON group resctrl attempts to allocate
>>> counters for each of the supported MBM events on each resctrl
>>> domain. As counters are limited, such allocation may fail and
>>> when it does counter allocations for the remaining domains are
>>> skipped even if the domains have available counters.
>>>
>>> Since a counter allocation failure may result in counter allocation
>>> skipped on other domains the user needs to view the resource group's
>>
>> skipped -> being skipped
>>
>>> mbm_L3_assignments files to get an accurate view of counter assignment
>>> in a new resource group and then manually create counters in the skipped
>>> domains with available counters.
>>>
>>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
>>> also skip counter allocation in other domains after a counter allocation
>>> failure.
>>>
>>> When handling a request to create counters in all domains it is unnecessary
>>> for a counter allocation in one domain to prevent counter allocation in
>>> other domains. Always attempt to allocate all the counters requested.
>>
>> I can use this but how about if I add,
>>
>> Skipping counter allocation in subsequent domains after failure makes predicting which
>> counters will be allocated harder for the user as they need to know the ordering of the
>> domains as well as the expected failures.
>
> I do not see why the user needs to make any predictions with the current implementation.
> mbm_L3_assignments will always contain accurate information regarding counter assignment, no?
They can see the result with mbm_L3_assignments. In general, if the user is doing any operation it helps for them to
know what they can expect from that operation before doing it.
Happy to drop the extra sentence if you don't think it adds anything. Making all allocations of multiple counters best
effort is the main point.
Thanks,
Ben
>
>>
>> before the final sentence to make it clear that this change is an improvement not just
>> a change in policy.
>
> Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-04-15 14:46 ` Ben Horgan
@ 2026-04-15 15:38 ` Reinette Chatre
2026-04-15 16:31 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-04-15 15:38 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 4/15/26 7:46 AM, Ben Horgan wrote:
> Hi Reinette,
>
> On 4/15/26 15:27, Reinette Chatre wrote:
>> Hi Ben,
>>
>> On 4/14/26 7:42 AM, Ben Horgan wrote:
>>> On 3/27/26 16:21, Reinette Chatre wrote:
>>
>>>> Consider a changelog like below that just focuses on problem being solved
>>>> (but please correct me if you find I am missing the point):
>>>>
>>>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
>>>> creates a new CTRL_MON or MON group resctrl attempts to allocate
>>>> counters for each of the supported MBM events on each resctrl
>>>> domain. As counters are limited, such allocation may fail and
>>>> when it does counter allocations for the remaining domains are
>>>> skipped even if the domains have available counters.
>>>>
>>>> Since a counter allocation failure may result in counter allocation
>>>> skipped on other domains the user needs to view the resource group's
>>>
>>> skipped -> being skipped
>>>
>>>> mbm_L3_assignments files to get an accurate view of counter assignment
>>>> in a new resource group and then manually create counters in the skipped
>>>> domains with available counters.
>>>>
>>>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
>>>> also skip counter allocation in other domains after a counter allocation
>>>> failure.
>>>>
>>>> When handling a request to create counters in all domains it is unnecessary
>>>> for a counter allocation in one domain to prevent counter allocation in
>>>> other domains. Always attempt to allocate all the counters requested.
>>>
>>> I can use this but how about if I add,
>>>
>>> Skipping counter allocation in subsequent domains after failure makes predicting which
>>> counters will be allocated harder for the user as they need to know the ordering of the
>>> domains as well as the expected failures.
>>
>> I do not see why the user needs to make any predictions with the current implementation.
>> mbm_L3_assignments will always contain accurate information regarding counter assignment, no?
>
> They can see the result with mbm_L3_assignments. In general, if the user is doing any operation it helps for them to
> know what they can expect from that operation before doing it.
I totally agree. I seem to be missing the goal here. Are you saying that currently it is not clear
what the user can expect when running these commands? I believe that is clarified with the documentation
update in patch #6?
>
> Happy to drop the extra sentence if you don't think it adds anything. Making all allocations of multiple counters best
> effort is the main point.
I do not object to adding extra sentences but the proposal mentions how user space needs to
predict behaviors and know about kernel internals which I do not believe is required now nor
with planned changes. I really seem to be missing something here so would appreciate if you
could elaborate the goals here.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-04-15 15:38 ` Reinette Chatre
@ 2026-04-15 16:31 ` Ben Horgan
2026-04-15 17:28 ` Reinette Chatre
0 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-04-15 16:31 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 4/15/26 16:38, Reinette Chatre wrote:
> Hi Ben,
>
> On 4/15/26 7:46 AM, Ben Horgan wrote:
>> Hi Reinette,
>>
>> On 4/15/26 15:27, Reinette Chatre wrote:
>>> Hi Ben,
>>>
>>> On 4/14/26 7:42 AM, Ben Horgan wrote:
>>>> On 3/27/26 16:21, Reinette Chatre wrote:
>>>
>>>>> Consider a changelog like below that just focuses on problem being solved
>>>>> (but please correct me if you find I am missing the point):
>>>>>
>>>>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
>>>>> creates a new CTRL_MON or MON group resctrl attempts to allocate
>>>>> counters for each of the supported MBM events on each resctrl
>>>>> domain. As counters are limited, such allocation may fail and
>>>>> when it does counter allocations for the remaining domains are
>>>>> skipped even if the domains have available counters.
>>>>>
>>>>> Since a counter allocation failure may result in counter allocation
>>>>> skipped on other domains the user needs to view the resource group's
>>>>
>>>> skipped -> being skipped
>>>>
>>>>> mbm_L3_assignments files to get an accurate view of counter assignment
>>>>> in a new resource group and then manually create counters in the skipped
>>>>> domains with available counters.
>>>>>
>>>>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
>>>>> also skip counter allocation in other domains after a counter allocation
>>>>> failure.
>>>>>
>>>>> When handling a request to create counters in all domains it is unnecessary
>>>>> for a counter allocation in one domain to prevent counter allocation in
>>>>> other domains. Always attempt to allocate all the counters requested.
>>>>
>>>> I can use this but how about if I add,
>>>>
>>>> Skipping counter allocation in subsequent domains after failure makes predicting which
>>>> counters will be allocated harder for the user as they need to know the ordering of the
>>>> domains as well as the expected failures.
>>>
>>> I do not see why the user needs to make any predictions with the current implementation.
>>> mbm_L3_assignments will always contain accurate information regarding counter assignment, no?
>>
>> They can see the result with mbm_L3_assignments. In general, if the user is doing any operation it helps for them to
>> know what they can expect from that operation before doing it.
>
> I totally agree. I seem to be missing the goal here. Are you saying that currently it is not clear
> what the user can expect when running these commands? I believe that is clarified with the documentation
> update in patch #6?
With this patch and the documentation in patch #6 the user should be clear on what to expect. Without this patch the
expectation is to iterate through the domains allocating counters but not consider subsequent domains after a failure.
This means that the order in which the domains are considered affects the outcome of the command.
With this patch: each counter is attempted to be allocated
Without this patch: Domains are considered in order of ascending id and each counter in the domain is attempted to be
allocated but if a counter allocation fails no subsequent domains are considered.
I think the behaviour with this patch is easier to understand and what would be most useful to the user.
Suppose you have two domains, domain 0 and 1 and one counter remaining on one of the domains and create a new group
(with mbm_assign_on_mkdir set to 1). Without this patch the available counter is only allocated if that domain is
considered first (lower id) but with this patch it is allocated regardless of which domain it's in.
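For illustration only, the two policies can be modelled in a few lines of userspace C. This is a minimal sketch and not the kernel code: assign_all_domains(), free_counters and stop_on_failure are hypothetical names, and a single counter request per domain is assumed to keep the model small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: free_counters[i] is the number of free counters in
 * domain i. Try to take one counter in each domain, in ascending domain
 * order.
 *
 * stop_on_failure == true:  give up at the first domain without a free
 *                           counter (the "without this patch" behaviour).
 * stop_on_failure == false: skip that domain and keep going (the "with
 *                           this patch" behaviour).
 *
 * assigned[i] is set for each domain that got a counter. Returns the
 * number of domains that got a counter.
 */
static size_t assign_all_domains(unsigned int *free_counters, bool *assigned,
				 size_t ndomains, bool stop_on_failure)
{
	size_t i, n = 0;

	assert(free_counters && assigned);

	for (i = 0; i < ndomains; i++)
		assigned[i] = false;

	for (i = 0; i < ndomains; i++) {
		if (free_counters[i] == 0) {
			if (stop_on_failure)
				break;
			continue;
		}
		free_counters[i]--;
		assigned[i] = true;
		n++;
	}

	return n;
}
```

With free_counters = {0, 1}, the stop_on_failure variant assigns nothing because domain 0 fails first, while the other variant still assigns the counter in domain 1. With free_counters = {1, 0} both variants give the same result, which is the ordering dependence described above.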
>
>>
>> Happy to drop the extra sentence if you don't think it adds anything. Making all allocations of multiple counters best
>> effort is the main point.
>
> I do not object to adding extra sentences but the proposal mentions how user space needs to
> predict behaviors and know about kernel internals which I do not believe is required now nor
> with planned changes. I really seem to be missing something here so would appreciate if you
> could elaborate the goals here.
It seems that we agree on the patch but I've somehow confused you on the motivation. In essence, it's making a better
effort at best effort counter allocation. Does this help clarify things?
Thanks,
Ben
>
> Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-04-15 16:31 ` Ben Horgan
@ 2026-04-15 17:28 ` Reinette Chatre
2026-04-16 8:32 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-04-15 17:28 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 4/15/26 9:31 AM, Ben Horgan wrote:
> On 4/15/26 16:38, Reinette Chatre wrote:
>> On 4/15/26 7:46 AM, Ben Horgan wrote:
>>> On 4/15/26 15:27, Reinette Chatre wrote:
>>>> On 4/14/26 7:42 AM, Ben Horgan wrote:
>>>>> On 3/27/26 16:21, Reinette Chatre wrote:
>>>>
>>>>>> Consider a changelog like below that just focuses on problem being solved
>>>>>> (but please correct me if you find I am missing the point):
>>>>>>
>>>>>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
>>>>>> creates a new CTRL_MON or MON group resctrl attempts to allocate
>>>>>> counters for each of the supported MBM events on each resctrl
>>>>>> domain. As counters are limited, such allocation may fail and
>>>>>> when it does counter allocations for the remaining domains are
>>>>>> skipped even if the domains have available counters.
>>>>>>
>>>>>> Since a counter allocation failure may result in counter allocation
>>>>>> skipped on other domains the user needs to view the resource group's
>>>>>
>>>>> skipped -> being skipped
>>>>>
>>>>>> mbm_L3_assignments files to get an accurate view of counter assignment
>>>>>> in a new resource group and then manually create counters in the skipped
>>>>>> domains with available counters.
>>>>>>
>>>>>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
>>>>>> also skip counter allocation in other domains after a counter allocation
>>>>>> failure.
>>>>>>
>>>>>> When handling a request to create counters in all domains it is unnecessary
>>>>>> for a counter allocation in one domain to prevent counter allocation in
>>>>>> other domains. Always attempt to allocate all the counters requested.
>>>>>
>>>>> I can use this but how about if I add,
>>>>>
>>>>> Skipping counter allocation in subsequent domains after failure makes predicting which
>>>>> counters will be allocated harder for the user as they need to know the ordering of the
>>>>> domains as well as the expected failures.
>>>>
>>>> I do not see why the user needs to make any predictions with the current implementation.
>>>> mbm_L3_assignments will always contain accurate information regarding counter assignment, no?
>>>
>>> They can see the result with mbm_L3_assignments. In general, if the user is doing any operation it helps for them to
>>> know what they can expect from that operation before doing it.
>>
>> I totally agree. I seem to be missing the goal here. Are you saying that currently it is not clear
>> what the user can expect when running these commands? I believe that is clarified with the documentation
>> update in patch #6?
>
> With this patch and the documentation in patch #6 the user should be clear on what to expect. Without this patch the
Agree with "With this patch ...".
> expectation is to iterate through the domains allocating counters but not consider subsequent domains after a failure.
I hear what you are trying to say with the "Without this patch ..." part but I am hesitant to
equate current undocumented kernel behavior with established user expectations. Instead I think
when user requests counter allocation in all domains then expectation would/should be that
counters will be allocated in all domains (that have available counters). I thus see this work
more as a fix that brings behavior closer to match what user may expect when running
impacted commands.
One possible expectation that user space may develop based on experience with existing implementation
is that requests to allocate counters may fail even if counters are available in some domains. If user
space does have such expectation then they still need to consider mbm_L3_assignments to know
what was actually allocated and I still do not see how they need to make any predictions to
use resctrl.
We thus seem to disagree on user expectations? Since this is quite subjective that may be ok since
I believe this work addresses all aspects?
> This means that the order in which the domains are considered affects the outcome of the command.
Right.
>
> With this patch: each counter is attempted to be allocated
> Without this patch: Domains are considered in order of ascending id and each counter in the domain is attempted to be
> allocated but if a counter allocation fails no subsequent domains are considered.
Understood.
>
> I think the behaviour with this patch is easier to understand and what would be most useful to the user.
I agree.
>
> Suppose you have two domains, domain 0 and 1 and one counter remaining on one of the domains and create a new group
> (with mbm_assign_on_mkdir set to 1). Without this patch the available counter is only allocated if that domain is
> considered first (lower id) but with this patch it is allocated regardless of which domain it's in.
Understood.
>>> Happy to drop the extra sentence if you don't think it adds anything. Making all allocations of multiple counters best
>>> effort is the main point.
>>
>> I do not object to adding extra sentences but the proposal mentions how user space needs to
>> predict behaviors and know about kernel internals which I do not believe is required now nor
>> with planned changes. I really seem to be missing something here so would appreciate if you
>> could elaborate the goals here.
>
> It seems that we agree on the patch but I've somehow confused you on the motivation. In essence, it's making a better
> effort at best effort counter allocation. Does this help clarify things?
I understand motivation and I agree this patch improves current behavior. You need not
motivate this patch. My comment was about the additional information that you propose
adding to the changelog where I still do not see user needing to ever make any predictions to
use the existing interfaces.
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure
2026-04-15 17:28 ` Reinette Chatre
@ 2026-04-16 8:32 ` Ben Horgan
0 siblings, 0 replies; 25+ messages in thread
From: Ben Horgan @ 2026-04-16 8:32 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 4/15/26 18:28, Reinette Chatre wrote:
> Hi Ben,
>
> On 4/15/26 9:31 AM, Ben Horgan wrote:
>> On 4/15/26 16:38, Reinette Chatre wrote:
>>> On 4/15/26 7:46 AM, Ben Horgan wrote:
>>>> On 4/15/26 15:27, Reinette Chatre wrote:
>>>>> On 4/14/26 7:42 AM, Ben Horgan wrote:
>>>>>> On 3/27/26 16:21, Reinette Chatre wrote:
>>>>>
>>>>>>> Consider a changelog like below that just focuses on problem being solved
>>>>>>> (but please correct me if you find I am missing the point):
>>>>>>>
>>>>>>> In mbm_event mode, with mbm_assign_on_mkdir set to 1, when a user
>>>>>>> creates a new CTRL_MON or MON group resctrl attempts to allocate
>>>>>>> counters for each of the supported MBM events on each resctrl
>>>>>>> domain. As counters are limited, such allocation may fail and
>>>>>>> when it does counter allocations for the remaining domains are
>>>>>>> skipped even if the domains have available counters.
>>>>>>>
>>>>>>> Since a counter allocation failure may result in counter allocation
>>>>>>> skipped on other domains the user needs to view the resource group's
>>>>>>
>>>>>> skipped -> being skipped
>>>>>>
>>>>>>> mbm_L3_assignments files to get an accurate view of counter assignment
>>>>>>> in a new resource group and then manually create counters in the skipped
>>>>>>> domains with available counters.
>>>>>>>
>>>>>>> Writes to mbm_L3_assignments using the wildcard format, <event>:*=e,
>>>>>>> also skip counter allocation in other domains after a counter allocation
>>>>>>> failure.
>>>>>>>
>>>>>>> When handling a request to create counters in all domains it is unnecessary
>>>>>>> for a counter allocation in one domain to prevent counter allocation in
>>>>>>> other domains. Always attempt to allocate all the counters requested.
>>>>>>
>>>>>> I can use this but how about if I add,
>>>>>>
>>>>>> Skipping counter allocation in subsequent domains after failure makes predicting which
>>>>>> counters will be allocated harder for the user as they need to know the ordering of the
>>>>>> domains as well as the expected failures.
>>>>>
>>>>> I do not see why the user needs to make any predictions with the current implementation.
>>>>> mbm_L3_assignments will always contain accurate information regarding counter assignment, no?
>>>>
>>>> They can see the result with mbm_L3_assignments. In general, if the user is doing any operation it helps for them to
>>>> know what they can expect from that operation before doing it.
>>>
>>> I totally agree. I seem to be missing the goal here. Are you saying that currently it is not clear
>>> what the user can expect when running these commands? I believe that is clarified with the documentation
>>> update in patch #6?
>>
>> With this patch and the documentation in patch #6 the user should be clear on what to expect. Without this patch the
>
> Agree with "With this patch ...".
>
>> expectation is to iterate through the domains allocating counters but not consider subsequent domains after a failure.
>
> I hear what you are trying to say with the "Without this patch ..." part but I am hesitant to
> equate current undocumented kernel behavior with established user expectations. Instead I think
> when user requests counter allocation in all domains then expectation would/should be that
> counters will be allocated in all domains (that have available counters). I thus see this work
> more as a fix that brings behavior closer to match what user may expect when running
> impacted commands.
Yes, that's reasonable.
>
> One possible expectation that user space may develop based on experience with existing implementation
> is that requests to allocate counters may fail even if counters are available in some domains. If user
> space does have such expectation then they still need to consider mbm_L3_assignments to know
> what was actually allocated and I still do not see how they need to make any predictions to
> use resctrl.
>
> We thus seem to disagree on user expectations? Since this is quite subjective that may be ok since
> I believe this work addresses all aspects?
I think it's ok. We may disagree on how the user views the old behaviour but we still agree that the new behaviour is
the correct thing.
>
>> This means that the order in which the domains are considered affects the outcome of the command.
>
> Right.
>
>>
>> With this patch: each counter is attempted to be allocated
>> Without this patch: Domains are considered in order of ascending id and each counter in the domain is attempted to be
>> allocated but if a counter allocation fails no subsequent domains are considered.
>
> Understood.
>
>>
>> I think the behaviour with this patch is easier to understand and what would be most useful to the user.
>
> I agree.
>
>>
>> Suppose you have two domains, domain 0 and 1 and one counter remaining on one of the domains and create a new group
>> (with mbm_assign_on_mkdir set to 1). Without this patch the available counter is only allocated if that domain is
>> considered first (lower id) but with this patch it is allocated regardless of which domain it's in.
>
> Understood.
>
>>>> Happy to drop the extra sentence if you don't think it adds anything. Making all allocations of multiple counters best
>>>> effort is the main point.
>>>
>>> I do not object to adding extra sentences but the proposal mentions how user space needs to
>>> predict behaviors and know about kernel internals which I do not believe is required now nor
>>> with planned changes. I really seem to be missing something here so would appreciate if you
>>> could elaborate the goals here.
>>
>> It seems that we agree on the patch but I've somehow confused you on the motivation. In essence, it's making a better
>> effort at best effort counter allocation. Does this help clarify things?
>
> I understand motivation and I agree this patch improves current behavior. You need not
> motivate this patch. My comment was about the additional information that you propose
> adding to the changelog where I still do not see user needing to ever make any predictions to
> use the existing interfaces.
I think "predict" can be equated with "knows what to expect" which is useful for any interface. Anyhow, I already seem
to have caused more confusion than the sentence is worth and if the sentence confuses you it is likely to confuse
others. It seems sensible to just drop the sentence.
Thanks,
Ben
>
> Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 6/7] fs/resctrl: Document that automatic counter assignment is best effort
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
` (4 preceding siblings ...)
2026-03-26 17:25 ` [PATCH v4 5/7] fs/resctrl: Continue counter allocation after failure Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:24 ` Reinette Chatre
2026-03-26 17:25 ` [PATCH v4 7/7] fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks Ben Horgan
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
When using automatic counter assignment it's useful for a user to know
which counters they can expect to be assigned on group creation.
Document that automatic counter assignment is best effort and how to
discover any assignment failures.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
Hi Reinette,
I've just taken your suggested paragraph with a minor tweak (s/scenario/a
scenario/) and moved it above the examples as I think it makes sense to
keep examples last in each section.
---
Documentation/filesystems/resctrl.rst | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index ba609f8d4de5..68cada238844 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -472,6 +472,12 @@ with the following files:
"1":
Auto assignment is enabled.
+ Automatic counter assignment is done with best effort. If auto
+ assignment is enabled but there are not enough available counters then
+ monitor group creation could succeed while one or more events belonging
+ to the group may not have a counter assigned. Consult last_cmd_status
+ for details during such a scenario.
+
Example::
# echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
--
2.43.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 6/7] fs/resctrl: Document that automatic counter assignment is best effort
2026-03-26 17:25 ` [PATCH v4 6/7] fs/resctrl: Document that automatic counter assignment is best effort Ben Horgan
@ 2026-03-27 16:24 ` Reinette Chatre
2026-04-14 14:55 ` Ben Horgan
0 siblings, 1 reply; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:24 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 3/26/26 10:25 AM, Ben Horgan wrote:
> When using automatic counter assignment it's useful for a user to know
> which counters they can expect to be assigned on group creation.
>
> Document that automatic counter assignment is best effort and how to
> discover any assignment failures.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
>
> Hi Reinette,
> I've just taken your suggested paragraph with a minor tweak (s/scenario/a
> scenario/) and moved it above the examples as I think it makes sense to
> keep examples last in each section.
> ---
> Documentation/filesystems/resctrl.rst | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index ba609f8d4de5..68cada238844 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -472,6 +472,12 @@ with the following files:
> "1":
> Auto assignment is enabled.
>
> + Automatic counter assignment is done with best effort. If auto
> + assignment is enabled but there are not enough available counters then
> + monitor group creation could succeed while one or more events belonging
> + to the group may not have a counter assigned. Consult last_cmd_status
"not have a counter assigned" -> "not have a counter assigned in all domains"?
> + for details during such a scenario.
While last_cmd_status would indicate a failure it is best for user space to
consult mbm_L3_assignments for the accurate counter assignment state ("the
details"). It is starting to sound like the documentation is directing user space
to last_cmd_status for state and I would like to avoid this. How about replacing
"Consult last_cmd_status ..." sentence with "Consult mbm_L3_assignments for
counter assignment states of the new group."?
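As a rough illustration of what consulting mbm_L3_assignments could look like from user space, here is a minimal sketch that counts the domains without a counter in one line of that file. The line format "<event>:<domain id>=<state>;..." with '_' meaning no counter assigned is an assumption here and should be checked against resctrl.rst; count_unassigned() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the domains that have no counter assigned in a single line read
 * from mbm_L3_assignments, for example "mbm_total_bytes:0=e;1=_".
 *
 * Assumed format: <event>:<domain id>=<state>[;<domain id>=<state>...]
 * where the state '_' means no counter is assigned.
 *
 * Returns the number of unassigned domains, or -1 if the line does not
 * match the assumed format.
 */
static int count_unassigned(const char *line)
{
	const char *p;
	int unassigned = 0;

	assert(line);

	p = strchr(line, ':');
	if (!p)
		return -1;
	p++;

	while (*p && *p != '\n') {
		char *end;

		/* Skip over the domain id; only the state matters here. */
		(void)strtoul(p, &end, 10);
		if (end == p || *end != '=' || !end[1] || end[1] == '\n')
			return -1;

		if (end[1] == '_')
			unassigned++;

		/* Step past "=<state>" and an optional ';' separator. */
		p = end + 2;
		if (*p == ';')
			p++;
	}

	return unassigned;
}
```

A non-zero return for any event line of a newly created group would mean that automatic assignment did not cover every domain for that event.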
> +
> Example::
>
> # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v4 6/7] fs/resctrl: Document that automatic counter assignment is best effort
2026-03-27 16:24 ` Reinette Chatre
@ 2026-04-14 14:55 ` Ben Horgan
0 siblings, 0 replies; 25+ messages in thread
From: Ben Horgan @ 2026-04-14 14:55 UTC (permalink / raw)
To: Reinette Chatre, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Reinette,
On 3/27/26 16:24, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/26/26 10:25 AM, Ben Horgan wrote:
>> When using automatic counter assignment it's useful for a user to know
>> which counters they can expect to be assigned on group creation.
>>
>> Document that automatic counter assignment is best effort and how to
>> discover any assignment failures.
>>
>> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
>> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
>> ---
>>
>> Hi Reinette,
>> I've just taken your suggested paragraph with a minor tweak (s/scenario/a
>> scenario/) and moved it above the examples as I think it makes sense to
>> keep examples last in each section.
>> ---
>> Documentation/filesystems/resctrl.rst | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index ba609f8d4de5..68cada238844 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -472,6 +472,12 @@ with the following files:
>> "1":
>> Auto assignment is enabled.
>>
>> + Automatic counter assignment is done with best effort. If auto
>> + assignment is enabled but there are not enough available counters then
>> + monitor group creation could succeed while one or more events belonging
>> + to the group may not have a counter assigned. Consult last_cmd_status
>
> "not have a counter assigned" -> "not have a counter assigned in all domains"?
Ok, this is more correct.
>
>> + for details during such a scenario.
>
> While last_cmd_status would indicate a failure it is best for user space to
> consult mbm_L3_assignments for the accurate counter assignment state ("the
> details"). It is starting to sound like the documentation is directing user space
> to last_cmd_status for state and I would like to avoid this. How about replacing
> "Consult last_cmd_status ..." sentence with "Consult mbm_L3_assignments for
> counter assignment states of the new group."?
Makes sense. I agree that last_cmd_status should be kept as just a guide rather than
overcomplicating the reporting.
Thanks,
Ben
>
>> +
>> Example::
>>
>> # echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>
> Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v4 7/7] fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks
2026-03-26 17:25 [PATCH v4 0/7] x86,fs/resctrl: Pave the way for MPAM counter assignment Ben Horgan
` (5 preceding siblings ...)
2026-03-26 17:25 ` [PATCH v4 6/7] fs/resctrl: Document that automatic counter assignment is best effort Ben Horgan
@ 2026-03-26 17:25 ` Ben Horgan
2026-03-27 16:40 ` Reinette Chatre
6 siblings, 1 reply; 25+ messages in thread
From: Ben Horgan @ 2026-03-26 17:25 UTC (permalink / raw)
To: linux-kernel
Cc: tony.luck, reinette.chatre, Dave.Martin, james.morse, babu.moger,
tglx, mingo, bp, dave.hansen, x86, hpa, ben.horgan, fenghuay,
tan.shaopeng
When 0 is written to the tasks file it is interpreted as the current task
in rdtgroup_move_task(). The task_struct for each CPUs idle task has pid
set to 0 and, on x86, the closid to RESCTRL_RESERVED_CLOSID and rmid to
RESCTRL_RESERVED_RMID. Equivalently, on MPAM platforms,
thread_info->mpam_partid_pmg is encoded with PARTID and PMG set to
RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID, respectively. As there
is no interface to change these from the default, the resctrl configuration
for the idle tasks is fixed and they always behave equivalently to a task
in the default tasks file and so take their configuration from the cpus
files.
On read of the tasks file, show_rdt_tasks() filters out any 0 pid. Hence,
a task id of 0 is never shown in the tasks file and the idle tasks are
not represented either.
Document the user visible behaviour.
Signed-off-by: Ben Horgan <ben.horgan@arm.com>
---
I have confirmed this experimentally on an MPAM platform.
---
Documentation/filesystems/resctrl.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 68cada238844..3ec6b3b1b603 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -568,6 +568,11 @@ All groups contain the following files:
then the task must already belong to the CTRL_MON parent of this
group. The task is removed from any previous MON group.
+ When writing to this file, a task id of 0 is interpreted as the
+ task id of the currently running task. On reading the file, a task
+ id of 0 will never be shown and there is no representation of the
+ idle tasks. Instead, a CPU's idle task is always considered as a
+ member of the group owning the CPU.
"cpus":
Reading this file shows a bitmask of the logical CPUs owned by
--
2.43.0
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v4 7/7] fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks
2026-03-26 17:25 ` [PATCH v4 7/7] fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks Ben Horgan
@ 2026-03-27 16:40 ` Reinette Chatre
0 siblings, 0 replies; 25+ messages in thread
From: Reinette Chatre @ 2026-03-27 16:40 UTC (permalink / raw)
To: Ben Horgan, linux-kernel
Cc: tony.luck, Dave.Martin, james.morse, babu.moger, tglx, mingo, bp,
dave.hansen, x86, hpa, fenghuay, tan.shaopeng
Hi Ben,
On 3/26/26 10:25 AM, Ben Horgan wrote:
> When 0 is written to the tasks file it is interpreted as the current task
> in rdtgroup_move_task(). The task_struct for each CPUs idle task has pid
"each CPUs" -> "each CPU's"
"pid" could be PID or task_struct::pid to help make clear what it refers to.
> set to 0 and, on x86, the closid to RESCTRL_RESERVED_CLOSID and rmid to
It is not clear if you refer to the actual task_struct field, for example
task_struct::closid, or what it represents, for example CLOSID.
> RESCTRL_RESERVED_RMID. Equivalently, on MPAM platforms,
> thread_info->mpam_partid_pmg is encoded with PARTID and PMG set to
thread_info->mpam_partid_pmg to thread_info::mpam_partid_pmg to be consistent
if making an earlier change.
> RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID, respectively. As there
> is no interface to change these from the default, the resctrl configuration
> for the idle tasks is fixed and they always behave equivalently to a task
> in the default tasks file and so take their configuration from the cpus
"cpus" -> "cpus/cpus_list"
> files.
>
> On read of the tasks file, show_rdt_tasks() filters out any 0 pid. Hence,
"pid" -> "PID"
> a task id of 0 is never shown in the tasks file and the idle tasks are
> not represented either.
>
> Document the user visible behaviour.
>
> Signed-off-by: Ben Horgan <ben.horgan@arm.com>
> ---
> I have confirmed this experimentally on an MPAM platform.
> ---
> Documentation/filesystems/resctrl.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index 68cada238844..3ec6b3b1b603 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -568,6 +568,11 @@ All groups contain the following files:
> then the task must already belong to the CTRL_MON parent of this
> group. The task is removed from any previous MON group.
>
> + When writing to this file, a task id of 0 is interpreted as the
> + task id of the currently running task. On reading the file, a task
> + id of 0 will never be shown and there is no representation of the
> + idle tasks. Instead, a CPU's idle task is always considered as a
> + member of the group owning the CPU.
>
This is a valuable addition. Thank you very much for adding it.
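For readers who want to see the write side in practice, a minimal userspace sketch of relying on the documented meaning of task id 0 follows. move_self_to_group() is a hypothetical helper, the path passed to it is up to the caller, and the only behaviour assumed is the one documented above: writing 0 to a group's tasks file moves the currently running task.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Write task id 0 to the given "tasks" file. Per the documentation
 * above, resctrl interprets 0 as the currently running task, so the
 * caller moves itself without needing to look up its own task id.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
static int move_self_to_group(const char *tasks_path)
{
	FILE *f;
	int err = 0;

	assert(tasks_path);

	f = fopen(tasks_path, "w");
	if (!f)
		return -errno;

	if (fputs("0\n", f) == EOF)
		err = -errno;

	/* Buffered data may only be written out on fclose(), so check it too. */
	if (fclose(f) == EOF && !err)
		err = -errno;

	return err;
}
```

A caller would pass the tasks file of the target group, for example a path of the form /sys/fs/resctrl/<group>/tasks. Reading the file back afterwards would show the caller's real task id and never 0.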
> "cpus":
> Reading this file shows a bitmask of the logical CPUs owned by
Reinette
^ permalink raw reply [flat|nested] 25+ messages in thread