* [PATCH 01/11] fs/resctrl: Add missing return value descriptions
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 18:46 ` [PATCH 02/11] fs/resctrl: Avoid "may be used uninitialized" warning Reinette Chatre
` (11 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
Using the stricter "./tools/docs/kernel-doc -Wall -v" to verify proper
formatting of documentation comments includes warnings related to return
markup on functions that are omitted during the default verification
checks. This stricter verification reports a couple of missing return
descriptions in resctrl:
Warning: .../fs/resctrl/rdtgroup.c:1536 No description found for return value of 'rdtgroup_cbm_to_size'
Warning: .../fs/resctrl/rdtgroup.c:3131 No description found for return value of 'mon_get_kn_priv'
Warning: .../fs/resctrl/rdtgroup.c:3523 No description found for return value of 'cbm_ensure_valid'
Warning: .../fs/resctrl/monitor.c:238 No description found for return value of 'resctrl_find_cleanest_closid'
Add the missing return descriptions.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/monitor.c | 2 ++
fs/resctrl/rdtgroup.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 0cd5476a483a..93db7c1cd7be 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -234,6 +234,8 @@ static struct rmid_entry *resctrl_find_free_rmid(u32 closid)
*
* When the CLOSID and RMID are independent numbers, the first free CLOSID will
* be returned.
+ *
+ * Return: Free CLOSID on success, < 0 on failure.
*/
int resctrl_find_cleanest_closid(void)
{
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index ba8d503551cd..3c9508fc558d 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1519,6 +1519,8 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
*
* @cbm is unsigned long, even if only 32 bits are used to make the
* bitmap functions work correctly.
+ *
+ * Return: size in bytes, 0 on failure.
*/
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
struct rdt_ctrl_domain *d, unsigned long cbm)
@@ -3102,6 +3104,8 @@ static void rmdir_all_sub(void)
* @mevt: The type of event file being created.
* @do_sum: Whether SNC summing monitors are being created. Only set
* when @rid == RDT_RESOURCE_L3.
+ *
+ * Return: Pointer to mon_data private data of the event, NULL on failure.
*/
static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
struct mon_evt *mevt,
@@ -3496,6 +3500,8 @@ static int mkdir_mondata_all(struct kernfs_node *parent_kn,
* resource group is initialized. The user can follow this with a
* modification to the CBM if the default does not satisfy the
* requirements.
+ *
+ * Return: A CBM that is valid for resource @r.
*/
static u32 cbm_ensure_valid(u32 _val, struct rdt_resource *r)
{
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH 02/11] fs/resctrl: Avoid "may be used uninitialized" warning
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
2026-03-02 18:46 ` [PATCH 01/11] fs/resctrl: Add missing return value descriptions Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 18:46 ` [PATCH 03/11] fs/resctrl: Use correct format specifier for printing error pointers Reinette Chatre
` (10 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
Building resctrl with extra checks ("W=12") produces the following warning:
.../include/linux/ucopysize.h:22:17: warning: ‘buf’ may be used uninitialized [-Wmaybe-uninitialized]
22 | __check_object_size(ptr, n, to_user);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../include/linux/ucopysize.h: In function ‘pseudo_lock_measure_trigger’:
.../include/linux/ucopysize.h:10:13: note: by argument 1 of type ‘const void *’ to ‘__check_object_size’ declared here
10 | extern void __check_object_size(const void *ptr, unsigned long n,
| ^~~~~~~~~~~~~~~~~~~
.../fs/resctrl/pseudo_lock.c:754:14: note: ‘buf’ declared here
754 | char buf[32];
| ^~~
__check_object_size() ensures the provided buffer is within a valid location
but does not read from the uninitialized buffer. Even so, initialize the
buffer to silence the warning to help resctrl have a cleaner build.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/pseudo_lock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
index e81d71abfe54..a857e51580e2 100644
--- a/fs/resctrl/pseudo_lock.c
+++ b/fs/resctrl/pseudo_lock.c
@@ -750,8 +750,8 @@ static ssize_t pseudo_lock_measure_trigger(struct file *file,
size_t count, loff_t *ppos)
{
struct rdtgroup *rdtgrp = file->private_data;
+ char buf[32] = {};
size_t buf_size;
- char buf[32];
int ret;
int sel;
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH 03/11] fs/resctrl: Use correct format specifier for printing error pointers
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
2026-03-02 18:46 ` [PATCH 01/11] fs/resctrl: Add missing return value descriptions Reinette Chatre
2026-03-02 18:46 ` [PATCH 02/11] fs/resctrl: Avoid "may be used uninitialized" warning Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 18:46 ` [PATCH 04/11] x86/resctrl: Protect against bad shift Reinette Chatre
` (9 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
Use correct format specifier for error pointer as coccinelle suggests:
.../fs/resctrl/monitor.c:148:8-15: WARNING: Consider using %pe to print PTR_ERR()
.../fs/resctrl/monitor.c:760:9-16: WARNING: Consider using %pe to print PTR_ERR()
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/monitor.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 93db7c1cd7be..421c70f96426 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -144,8 +144,8 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
arch_priv = mon_event_all[QOS_L3_OCCUP_EVENT_ID].arch_priv;
arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, QOS_L3_OCCUP_EVENT_ID);
if (IS_ERR(arch_mon_ctx)) {
- pr_warn_ratelimited("Failed to allocate monitor context: %ld",
- PTR_ERR(arch_mon_ctx));
+ pr_warn_ratelimited("Failed to allocate monitor context: %pe",
+ arch_mon_ctx);
return;
}
@@ -752,8 +752,8 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domai
} else {
rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, evtid);
if (IS_ERR(rr.arch_mon_ctx)) {
- pr_warn_ratelimited("Failed to allocate monitor context: %ld",
- PTR_ERR(rr.arch_mon_ctx));
+ pr_warn_ratelimited("Failed to allocate monitor context: %pe",
+ rr.arch_mon_ctx);
return;
}
}
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH 04/11] x86/resctrl: Protect against bad shift
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (2 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 03/11] fs/resctrl: Use correct format specifier for printing error pointers Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 18:46 ` [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid Reinette Chatre
` (8 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
The size of the bandwidth specifier field is enumerated from AMD hardware.
resctrl uses this field width to determine the maximum bandwidth supported
that is stored in resctrl_membw::max_bw to which user space allocation
requests are compared for validity.
resctrl_membw::max_bw is of type u32 while the register containing the
bandwidth specifier field, L3QOS_BW_CONTROL_n, is 64 bits. While not an
issue with current hardware it is theoretically possible that enumeration
of maximum bandwidth may trigger invalid behavior if a future system can
use a bandwidth specifier field larger than 32 bits. Whether this could
ever represent a reasonable bandwidth value is unknown but addressing the
issue will appease static checkers.
Ensure resctrl can accommodate the hardware's bandwidth specifier field
width with an additional check. Switch to BIT() instead of open-coding the
bitshift to avoid signed integer overflow if the number of bits is a
valid 31.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7667cf7c4e94..db787c4dee61 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -246,7 +246,9 @@ static __init bool __rdt_get_mem_config_amd(struct rdt_resource *r)
cpuid_count(0x80000020, subleaf, &eax, &ebx, &ecx, &edx);
hw_res->num_closid = edx + 1;
- r->membw.max_bw = 1 << eax;
+ if (WARN_ON(BITS_PER_TYPE(r->membw.max_bw) <= eax))
+ return false;
+ r->membw.max_bw = BIT(eax);
/* AMD does not use delay */
r->membw.delay_linear = false;
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (3 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 04/11] x86/resctrl: Protect against bad shift Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-03 18:20 ` Luck, Tony
2026-03-17 11:23 ` Ben Horgan
2026-03-02 18:46 ` [PATCH 06/11] fs/resctrl: Pass error reading event through to user space Reinette Chatre
` (7 subsequent siblings)
12 siblings, 2 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
Every resctrl resource has a unique ID described by enum resctrl_res_level.
enum resctrl_res_level is used in all resource ID initializations and all
resource ID comparisons. All functions consuming the resource ID expects an
enum resctrl_res_level. Of the four structures that contain a resource ID
(struct mon_data, struct mon_evt, struct rdt_domain_hdr, and struct
rdt_resource) only struct rdt_resource does not use enum resctrl_res_level.
Switch the type of rdt_resource::rid to be enum resctrl_res_level to make
it obvious what values are valid, match the type everywhere this member is
used, and obtain benefits from tools that can flag any enum misuse.
Move define of RDT_NUM_RESOURCES outside the enum to enable tools to catch
when a switch() on the resource ID does not handle all the resources and
thus help flag which switch statements need an update when a new resource
is added.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
include/linux/resctrl.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..48e95f273fb3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -54,11 +54,11 @@ enum resctrl_res_level {
RDT_RESOURCE_MBA,
RDT_RESOURCE_SMBA,
RDT_RESOURCE_PERF_PKG,
-
- /* Must be the last */
- RDT_NUM_RESOURCES,
+ /* Additions to enum need to update RDT_NUM_RESOURCES. */
};
+#define RDT_NUM_RESOURCES (RDT_RESOURCE_PERF_PKG + 1)
+
/**
* enum resctrl_conf_type - The type of configuration.
* @CDP_NONE: No prioritisation, both code and data are controlled or monitored.
@@ -319,7 +319,7 @@ struct resctrl_mon {
* @cdp_capable: Is the CDP feature available on this resource
*/
struct rdt_resource {
- int rid;
+ enum resctrl_res_level rid;
bool alloc_capable;
bool mon_capable;
enum resctrl_scope ctrl_scope;
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-02 18:46 ` [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid Reinette Chatre
@ 2026-03-03 18:20 ` Luck, Tony
2026-03-03 19:06 ` Reinette Chatre
2026-03-17 11:23 ` Ben Horgan
1 sibling, 1 reply; 37+ messages in thread
From: Luck, Tony @ 2026-03-03 18:20 UTC (permalink / raw)
To: Reinette Chatre
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, ben.horgan, fustini, fenghuay, peternewman, linux-kernel,
patches
On Mon, Mar 02, 2026 at 10:46:11AM -0800, Reinette Chatre wrote:
> - /* Must be the last */
> - RDT_NUM_RESOURCES,
> + /* Additions to enum need to update RDT_NUM_RESOURCES. */
> };
>
> +#define RDT_NUM_RESOURCES (RDT_RESOURCE_PERF_PKG + 1)
Alternative approach that doesn't rely on developers reading
that comment and updating the define.
Replace the RDT_NUM_RESOURCES #define with a variable initialized
to ARRAY_SIZE(rdt_resources_all).
-Tony
From b31a874c7199347d875fd6e9505d41aa2639eafd Mon Sep 17 00:00:00 2001
From: Reinette Chatre <reinette.chatre@intel.com>
Date: Mon, 2 Mar 2026 10:46:11 -0800
Subject: [PATCH] fs/resctrl: Use accurate type for rdt_resource::rid
Every resctrl resource has a unique ID described by enum resctrl_res_level.
enum resctrl_res_level is used in all resource ID initializations and all
resource ID comparisons. All functions consuming the resource ID expects an
enum resctrl_res_level. Of the four structures that contain a resource ID
(struct mon_data, struct mon_evt, struct rdt_domain_hdr, and struct
rdt_resource) only struct rdt_resource does not use enum resctrl_res_level.
Switch the type of rdt_resource::rid to be enum resctrl_res_level to make
it obvious what values are valid, match the type everywhere this member is
used, and obtain benefits from tools that can flag any enum misuse.
Use a variable rdt_num_resources to replace the define of RDT_NUM_RESOURCES
in the enum to enable tools to catch when a switch() on the resource ID does
not handle all the resources and thus help flag which switch statements need
an update when a new resource is added.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
include/linux/resctrl.h | 11 +++++------
arch/x86/kernel/cpu/resctrl/core.c | 9 ++++++---
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..ef2efa2e4b39 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -27,13 +27,15 @@ int proc_resctrl_show(struct seq_file *m,
#endif
+extern int rdt_num_resources;
+
/* max value for struct rdt_domain's mbps_val */
#define MBA_MAX_MBPS U32_MAX
/* Walk all possible resources, with variants for only controls or monitors. */
#define for_each_rdt_resource(_r) \
for ((_r) = resctrl_arch_get_resource(0); \
- (_r) && (_r)->rid < RDT_NUM_RESOURCES; \
+ (_r) && (_r)->rid < rdt_num_resources; \
(_r) = resctrl_arch_get_resource((_r)->rid + 1))
#define for_each_capable_rdt_resource(r) \
@@ -54,9 +56,6 @@ enum resctrl_res_level {
RDT_RESOURCE_MBA,
RDT_RESOURCE_SMBA,
RDT_RESOURCE_PERF_PKG,
-
- /* Must be the last */
- RDT_NUM_RESOURCES,
};
/**
@@ -319,7 +318,7 @@ struct resctrl_mon {
* @cdp_capable: Is the CDP feature available on this resource
*/
struct rdt_resource {
- int rid;
+ enum resctrl_res_level rid;
bool alloc_capable;
bool mon_capable;
enum resctrl_scope ctrl_scope;
@@ -336,7 +335,7 @@ struct rdt_resource {
/*
* Get the resource that exists at this level. If the level is not supported
- * a dummy/not-capable resource can be returned. Levels >= RDT_NUM_RESOURCES
+ * a dummy/not-capable resource can be returned. Levels >= rdt_num_resources
* will return NULL.
*/
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index db787c4dee61..2557bb639555 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,6 +16,7 @@
#define pr_fmt(fmt) "resctrl: " fmt
+#include <linux/array_size.h>
#include <linux/cpu.h>
#include <linux/slab.h>
#include <linux/err.h>
@@ -57,7 +58,7 @@ static void mba_wrmsr_amd(struct msr_param *m);
#define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
#define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)
-struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
+struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
{
.r_resctrl = {
@@ -110,6 +111,8 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
},
};
+int rdt_num_resources = ARRAY_SIZE(rdt_resources_all);
+
/**
* resctrl_arch_system_num_rmid_idx - Compute number of supported RMIDs
* (minimum across all mon_capable resource)
@@ -131,7 +134,7 @@ u32 resctrl_arch_system_num_rmid_idx(void)
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
{
- if (l >= RDT_NUM_RESOURCES)
+ if (l >= rdt_num_resources)
return NULL;
return &rdt_resources_all[l].r_resctrl;
@@ -1121,7 +1124,7 @@ static int __init resctrl_arch_late_init(void)
int state, ret, i;
/* for_each_rdt_resource() requires all rid to be initialised. */
- for (i = 0; i < RDT_NUM_RESOURCES; i++)
+ for (i = 0; i < rdt_num_resources; i++)
rdt_resources_all[i].r_resctrl.rid = i;
/*
--
2.53.0
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-03 18:20 ` Luck, Tony
@ 2026-03-03 19:06 ` Reinette Chatre
2026-03-03 19:54 ` Luck, Tony
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-03 19:06 UTC (permalink / raw)
To: Luck, Tony
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, ben.horgan, fustini, fenghuay, peternewman, linux-kernel,
patches
Hi Tony,
On 3/3/26 10:20 AM, Luck, Tony wrote:
> On Mon, Mar 02, 2026 at 10:46:11AM -0800, Reinette Chatre wrote:
>
>> - /* Must be the last */
>> - RDT_NUM_RESOURCES,
>> + /* Additions to enum need to update RDT_NUM_RESOURCES. */
>> };
>>
>> +#define RDT_NUM_RESOURCES (RDT_RESOURCE_PERF_PKG + 1)
>
> Alternative approach that doesn't rely on developers reading
> that comment and updating the define.
>
> Replace the RDT_NUM_RESOURCES #define with a variable initialized
> to ARRAY_SIZE(rdt_resources_all).
....
> include/linux/resctrl.h | 11 +++++------
> arch/x86/kernel/cpu/resctrl/core.c | 9 ++++++---
> 2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 006e57fd7ca5..ef2efa2e4b39 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
The resources resctrl fs supports are defined within this global
resctrl fs header file as:
enum resctrl_res_level {
RDT_RESOURCE_L3,
RDT_RESOURCE_L2,
RDT_RESOURCE_MBA,
RDT_RESOURCE_SMBA,
RDT_RESOURCE_PERF_PKG,
RDT_NUM_RESOURCES,
};
> @@ -27,13 +27,15 @@ int proc_resctrl_show(struct seq_file *m,
>
> #endif
>
> +extern int rdt_num_resources;
> +
....
Architecture code below ...
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
...
> @@ -57,7 +58,7 @@ static void mba_wrmsr_amd(struct msr_param *m);
> #define ctrl_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.ctrl_domains)
> #define mon_domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.mon_domains)
>
> -struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
> +struct rdt_hw_resource rdt_resources_all[] = {
> [RDT_RESOURCE_L3] =
> {
> .r_resctrl = {
> @@ -110,6 +111,8 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
> },
> };
>
> +int rdt_num_resources = ARRAY_SIZE(rdt_resources_all);
> +
... and this proposes to let the *architecture* initialize how many
resources resctrl fs supports?
This implies that all architectures need to initialize this on behalf of
resctrl fs. resctrl fs does not force an architecture to use an array nor does
it require an architecture to support all resources. What if an architecture
decides to not use an array and does not support all the resources resctrl fs
supports? How should it initialize rdt_num_resources?
I see the number of resources supported by resctrl fs as a resctrl fs property,
not something it should depend on the architecture to initialize.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-03 19:06 ` Reinette Chatre
@ 2026-03-03 19:54 ` Luck, Tony
2026-03-03 22:29 ` Reinette Chatre
0 siblings, 1 reply; 37+ messages in thread
From: Luck, Tony @ 2026-03-03 19:54 UTC (permalink / raw)
To: Reinette Chatre
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, ben.horgan, fustini, fenghuay, peternewman, linux-kernel,
patches
On Tue, Mar 03, 2026 at 11:06:52AM -0800, Reinette Chatre wrote:
> Hi Tony,
> > +int rdt_num_resources = ARRAY_SIZE(rdt_resources_all);
> > +
>
> ... and this proposes to let the *architecture* initialize how many
> resources resctrl fs supports?
Not exactly. The file system is free to support as many resources as
it wants to.
The architecture just provides the largest value that will provide
any useful result from resctrl_arch_get_resource() so that this filesystem macro
works without redundant iterations for unimplemented resources at the high
end of the enum range:
/* Walk all possible resources, with variants for only controls or monitors. */
#define for_each_rdt_resource(_r) \
for ((_r) = resctrl_arch_get_resource(0); \
(_r) && (_r)->rid < rdt_num_resources; \
(_r) = resctrl_arch_get_resource((_r)->rid + 1))
>
> This implies that all architectures need to initialize this on behalf of
> resctrl fs. resctrl fs does not force an architecture to use an array nor does
> it require an architecture to support all resources. What if an architecture
> decides to not use an array and does not support all the resources resctrl fs
> supports? How should it initialize rdt_num_resources?
Yes. Each architecture would have to provide a value. Each architecture
must support the resctrl_arch_get_resource(enum resctrl_res_level l)
function. Regardless of whether this is backed up with an array of
resources, a list, or some other exotic structure this function must
return a valid "struct rdt_resource *" pointer that can be dereferenced
for all values [0 ... RDT_NUM_RESOURCES).
Allowing the architecture to define the upper limit of supported resource
numbers doesn't constrain the file system.
> I see the number of resources supported by resctrl fs as a resctrl fs property,
> not something it should depend on the architecture to initialize.
Having resctrl fs define the number means that an architecture that uses
an array must pad out that array.
-Tony
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-03 19:54 ` Luck, Tony
@ 2026-03-03 22:29 ` Reinette Chatre
2026-03-03 23:26 ` Luck, Tony
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-03 22:29 UTC (permalink / raw)
To: Luck, Tony
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, ben.horgan, fustini, fenghuay, peternewman, linux-kernel,
patches
Hi Tony,
On 3/3/26 11:54 AM, Luck, Tony wrote:
> On Tue, Mar 03, 2026 at 11:06:52AM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>> +int rdt_num_resources = ARRAY_SIZE(rdt_resources_all);
>>> +
>>
>> ... and this proposes to let the *architecture* initialize how many
>> resources resctrl fs supports?
>
> Not exactly. The file system is free to support as many resources as
> it wants to.
>
> The architecture just provides the largest value that will provide
> any useful result from resctrl_arch_get_resource() so that this filesystem macro
> works without redundant iterations for unimplemented resources at the high
> end of the enum range:
I am seeing three different values being considered:
- The number of resources supported by resctrl fs.
- The number of resources supported by architecture.
- The maximum resource id supported by architecture.
At this time RDT_NUM_RESOURCES is used for all three on x86 but these three
numbers can be different.
If I understand correctly you are not actually proposing to "Replace the
RDT_NUM_RESOURCES #define with a variable initialized to
ARRAY_SIZE(rdt_resources_all)." but instead you are proposing
to drop RDT_NUM_RESOURCES entirely and introduce a new variable that
captures the maximum resource ID supported by the architecture (which can
be different from the number of resources supported by the architecture).
> /* Walk all possible resources, with variants for only controls or monitors. */
> #define for_each_rdt_resource(_r) \
> for ((_r) = resctrl_arch_get_resource(0); \
> (_r) && (_r)->rid < rdt_num_resources; \
> (_r) = resctrl_arch_get_resource((_r)->rid + 1))
Letting rdt_num_resources be the maximum resource ID supported by the architecture
will not avoid that resctrl_arch_get_resource() be called for a resource that is
not supported by the architecture nor does it avoid resctrl fs calling
resctrl_arch_get_resource() for resource IDs larger than the maximum set by
the architecture.
Consider all the other places where resctrl fs call resctrl_arch_get_resource()
directly. There are several places where resctrl_arch_get_resource() is called
directly for a resource. Introducing a new max that architecture can set with
the expectation that resctrl_arch_get_resource() will not be called for an id
larger than that is not correct and a solution that implements such contract
between resctrl fs and the architecture needs to look further than the
for_each_rdt_resource() loop.
>> This implies that all architectures need to initialize this on behalf of
>> resctrl fs. resctrl fs does not force an architecture to use an array nor does
>> it require an architecture to support all resources. What if an architecture
>> decides to not use an array and does not support all the resources resctrl fs
>> supports? How should it initialize rdt_num_resources?
>
> Yes. Each architecture would have to provide a value. Each architecture
> must support the resctrl_arch_get_resource(enum resctrl_res_level l)
> function. Regardless of whether this is backed up with an array of
> resources, a list, or some other exotic structure this function must
> return a valid "struct rdt_resource *" pointer that can be dereferenced
> for all values [0 ... RDT_NUM_RESOURCES).
To support the loop the architecture only needs to provide valid pointers
(with an initialized rdt_resource::rid) up to the maximum resource ID supported
by it. This proposal does not change this requirement.
The expectation is that for any resource known to resctrl the architecture
will return a "not-capable" resource. For a comprehensive solution resctrl
fs should either never call resctrl_arch_get_resource() on a non-capable
resource or be able to handle NULL returned from any resctrl_arch_get_resource()
at all other call sites.
This may be where this proposal is headed but in its current form it
creates a fragmented contract between resctrl fs and the architecture.
> Allowing the architecture to define the upper limit of supported resource
> numbers doesn't constrain the file system.
What is the purpose of defining an upper limit when file system still
expects architecture to handle requests that exceed the limit?
>> I see the number of resources supported by resctrl fs as a resctrl fs property,
>> not something it should depend on the architecture to initialize.
>
> Having resctrl fs define the number means that an architecture that uses
> an array must pad out that array.
Not for the for_each_rdt_resource() loop but it does still require padding the
unsupported resources with IDs less than the max that this proposal does not address.
Will need to survey resctrl for the other call sites that are not covered by this
proposal anyway. If the goal is to avoid padding then the solution needs to be
more comprehensive.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-03 22:29 ` Reinette Chatre
@ 2026-03-03 23:26 ` Luck, Tony
0 siblings, 0 replies; 37+ messages in thread
From: Luck, Tony @ 2026-03-03 23:26 UTC (permalink / raw)
To: Chatre, Reinette
Cc: james.morse@arm.com, Dave.Martin@arm.com, babu.moger@amd.com,
bp@alien8.de, tglx@linutronix.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, ben.horgan@arm.com,
fustini@kernel.org, fenghuay@nvidia.com, peternewman@google.com,
linux-kernel@vger.kernel.org, patches@lists.linux.dev
> I am seeing three different values being considered:
> - The number of resources supported by resctrl fs.
> - The number of resources supported by architecture.
> - The maximum resource id supported by architecture.
> At this time RDT_NUM_RESOURCES is used for all three on x86 but these three
> numbers can be different.
Agreed.
> Letting rdt_num_resources be the maximum resource ID supported by the architecture
> will not avoid that resctrl_arch_get_resource() be called for a resource that is
> not supported by the architecture nor does it avoid resctrl fs calling
> resctrl_arch_get_resource() for resource IDs larger than the maximum set by
> the architecture.
I hadn't considered the other calls to resctrl_arch_get_resource()
Drop my idea and stick with your patch 5.
-Tony
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-02 18:46 ` [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid Reinette Chatre
2026-03-03 18:20 ` Luck, Tony
@ 2026-03-17 11:23 ` Ben Horgan
2026-03-17 17:34 ` Reinette Chatre
1 sibling, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-17 11:23 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/2/26 18:46, Reinette Chatre wrote:
> Every resctrl resource has a unique ID described by enum resctrl_res_level.
> enum resctrl_res_level is used in all resource ID initializations and all
> resource ID comparisons. All functions consuming the resource ID expects an
> enum resctrl_res_level. Of the four structures that contain a resource ID
> (struct mon_data, struct mon_evt, struct rdt_domain_hdr, and struct
> rdt_resource) only struct rdt_resource does not use enum resctrl_res_level.
>
> Switch the type of rdt_resource::rid to be enum resctrl_res_level to make
> it obvious what values are valid, match the type everywhere this member is
> used, and obtain benefits from tools that can flag any enum misuse.
>
> Move define of RDT_NUM_RESOURCES outside the enum to enable tools to catch
> when a switch() on the resource ID does not handle all the resources and
> thus help flag which switch statements need an update when a new resource
> is added.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> include/linux/resctrl.h | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 006e57fd7ca5..48e95f273fb3 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -54,11 +54,11 @@ enum resctrl_res_level {
> RDT_RESOURCE_MBA,
> RDT_RESOURCE_SMBA,
> RDT_RESOURCE_PERF_PKG,
> -
> - /* Must be the last */
> - RDT_NUM_RESOURCES,
> + /* Additions to enum need to update RDT_NUM_RESOURCES. */
> };>
> +#define RDT_NUM_RESOURCES (RDT_RESOURCE_PERF_PKG + 1)
Would it be clearer to have RDT_RESOURCE_LAST = RDT_RESOURCE_PERF_PKG in the enum
and use RDT_RESOURCE_LAST instead of RDT_RESOURCE_PERF_PKG here?
Thanks,
Ben
> +
> /**
> * enum resctrl_conf_type - The type of configuration.
> * @CDP_NONE: No prioritisation, both code and data are controlled or monitored.
> @@ -319,7 +319,7 @@ struct resctrl_mon {
> * @cdp_capable: Is the CDP feature available on this resource
> */
> struct rdt_resource {
> - int rid;
> + enum resctrl_res_level rid;
> bool alloc_capable;
> bool mon_capable;
> enum resctrl_scope ctrl_scope;
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid
2026-03-17 11:23 ` Ben Horgan
@ 2026-03-17 17:34 ` Reinette Chatre
0 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-17 17:34 UTC (permalink / raw)
To: Ben Horgan, tony.luck, james.morse, Dave.Martin, babu.moger, bp,
tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben,
On 3/17/26 4:23 AM, Ben Horgan wrote:
> Hi Reinette,
>
> On 3/2/26 18:46, Reinette Chatre wrote:
>> Every resctrl resource has a unique ID described by enum resctrl_res_level.
>> enum resctrl_res_level is used in all resource ID initializations and all
>> resource ID comparisons. All functions consuming the resource ID expects an
>> enum resctrl_res_level. Of the four structures that contain a resource ID
>> (struct mon_data, struct mon_evt, struct rdt_domain_hdr, and struct
>> rdt_resource) only struct rdt_resource does not use enum resctrl_res_level.
>>
>> Switch the type of rdt_resource::rid to be enum resctrl_res_level to make
>> it obvious what values are valid, match the type everywhere this member is
>> used, and obtain benefits from tools that can flag any enum misuse.
>>
>> Move define of RDT_NUM_RESOURCES outside the enum to enable tools to catch
>> when a switch() on the resource ID does not handle all the resources and
>> thus help flag which switch statements need an update when a new resource
>> is added.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> include/linux/resctrl.h | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index 006e57fd7ca5..48e95f273fb3 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -54,11 +54,11 @@ enum resctrl_res_level {
>> RDT_RESOURCE_MBA,
>> RDT_RESOURCE_SMBA,
>> RDT_RESOURCE_PERF_PKG,
>> -
>> - /* Must be the last */
>> - RDT_NUM_RESOURCES,
>> + /* Additions to enum need to update RDT_NUM_RESOURCES. */
>> };>
>> +#define RDT_NUM_RESOURCES (RDT_RESOURCE_PERF_PKG + 1)
>
> Would it be clearer to have RDT_RESOURCE_LAST = RDT_RESOURCE_PERF_PKG in the enum
> and use RDT_RESOURCE_LAST instead of RDT_RESOURCE_PERF_PKG here?
It would. Thank you very much for the suggestion. To have resctrl code be consistent
I'll add another patch that also switches enum resctrl_conf_type to use this pattern.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 06/11] fs/resctrl: Pass error reading event through to user space
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (4 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 05/11] fs/resctrl: Use accurate type for rdt_resource::rid Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-17 17:08 ` Ben Horgan
2026-03-02 18:46 ` [PATCH 07/11] fs/resctrl: Add last_cmd_status support for writes to max_threshold_occupancy Reinette Chatre
` (6 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
Reading of event data is managed through populating a struct rmid_read with
properties of event needing to be read. This data is dispatched to an
appropriate CPU and upon completion any error can be found in rmid_read::err,
or on success the event data will be in rmid_read::val.
rmid_read::err is not updated in the unlikely scenario that the reading
of the event was dispatched to a wrong CPU. If this ever occurs due to
a bug in resctrl the user space read will return "success" but the data
reported will be invalid.
Ensure accurate error reporting so that if there may be an issue with
how resctrl picks a CPU it could be learned with an error to user space
instead of silent failure.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/monitor.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 421c70f96426..f1a08fdbdd61 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -453,8 +453,10 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
}
/* Reading a single domain, must be on a CPU in that domain. */
- if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
+ if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask)) {
+ rr->err = -EIO;
return -EINVAL;
+ }
if (rr->is_mbm_cntr)
rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
rr->evt->evtid, &tval);
@@ -491,8 +493,10 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
}
/* Summing domains that share a cache, must be on a CPU for that cache. */
- if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
+ if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map)) {
+ rr->err = -EIO;
return -EINVAL;
+ }
/*
* Legacy files must report the sum of an event across all
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 06/11] fs/resctrl: Pass error reading event through to user space
2026-03-02 18:46 ` [PATCH 06/11] fs/resctrl: Pass error reading event through to user space Reinette Chatre
@ 2026-03-17 17:08 ` Ben Horgan
0 siblings, 0 replies; 37+ messages in thread
From: Ben Horgan @ 2026-03-17 17:08 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/2/26 18:46, Reinette Chatre wrote:
> Reading of event data is managed through populating a struct rmid_read with
> properties of event needing to be read. This data is dispatched to an
> appropriate CPU and upon completion any error can be found in rmid_read::err,
> or on success the event data will be in rmid_read::val.
>
> rmid_read::err is not updated in the unlikely scenario that the reading
> of the event was dispatched to a wrong CPU. If this ever occurs due to
> a bug in resctrl the user space read will return "success" but the data
> reported will be invalid.
>
> Ensure accurate error reporting so that if there may be an issue with
> how resctrl picks a CPU it could be learned with an error to user space
> instead of silent failure.
Looks good to me. mon_event_count() won't discard the error as the same
cpu and domain choice is used for each __mon_event_count() call.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> fs/resctrl/monitor.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 421c70f96426..f1a08fdbdd61 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -453,8 +453,10 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
> }
>
> /* Reading a single domain, must be on a CPU in that domain. */
> - if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
> + if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask)) {
> + rr->err = -EIO;
> return -EINVAL;
> + }
> if (rr->is_mbm_cntr)
> rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
> rr->evt->evtid, &tval);
> @@ -491,8 +493,10 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
> }
>
> /* Summing domains that share a cache, must be on a CPU for that cache. */
> - if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
> + if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map)) {
> + rr->err = -EIO;
> return -EINVAL;
> + }
>
> /*
> * Legacy files must report the sum of an event across all
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 07/11] fs/resctrl: Add last_cmd_status support for writes to max_threshold_occupancy
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (5 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 06/11] fs/resctrl: Pass error reading event through to user space Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-17 17:13 ` Ben Horgan
2026-03-02 18:46 ` [PATCH 08/11] fs/resctrl: Use accurate and symmetric exit flows Reinette Chatre
` (5 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
info/last_cmd_status is intended to contain more information if a write to
any resctrl file fails. Writes to max_threshold_occupancy did not receive
last_cmd_status support during initial last_cmd_status enabling. Add it now.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/rdtgroup.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 3c9508fc558d..933f6ae26d59 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1238,16 +1238,27 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
unsigned int bytes;
int ret;
+ mutex_lock(&rdtgroup_mutex);
+ rdt_last_cmd_clear();
+
ret = kstrtouint(buf, 0, &bytes);
- if (ret)
- return ret;
+ if (ret) {
+ rdt_last_cmd_puts("Invalid input\n");
+ goto out_unlock;
+ }
- if (bytes > resctrl_rmid_realloc_limit)
- return -EINVAL;
+ if (bytes > resctrl_rmid_realloc_limit) {
+ rdt_last_cmd_printf("Exceeds limit (before adjustment) of %u bytes\n",
+ resctrl_rmid_realloc_limit);
+ ret = -EINVAL;
+ goto out_unlock;
+ }
resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);
- return nbytes;
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
+ return ret ?: nbytes;
}
/*
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 07/11] fs/resctrl: Add last_cmd_status support for writes to max_threshold_occupancy
2026-03-02 18:46 ` [PATCH 07/11] fs/resctrl: Add last_cmd_status support for writes to max_threshold_occupancy Reinette Chatre
@ 2026-03-17 17:13 ` Ben Horgan
0 siblings, 0 replies; 37+ messages in thread
From: Ben Horgan @ 2026-03-17 17:13 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/2/26 18:46, Reinette Chatre wrote:
> info/last_cmd_status is intended to contain more information if a write to
> any resctrl file fails. Writes to max_threshold_occupancy did not receive
> last_cmd_status support during initial last_cmd_status enabling. Add it now.
All looks sensible to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> fs/resctrl/rdtgroup.c | 21 ++++++++++++++++-----
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 3c9508fc558d..933f6ae26d59 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -1238,16 +1238,27 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
> unsigned int bytes;
> int ret;
>
> + mutex_lock(&rdtgroup_mutex);
> + rdt_last_cmd_clear();
> +
> ret = kstrtouint(buf, 0, &bytes);
> - if (ret)
> - return ret;
> + if (ret) {
> + rdt_last_cmd_puts("Invalid input\n");
> + goto out_unlock;
> + }
>
> - if (bytes > resctrl_rmid_realloc_limit)
> - return -EINVAL;
> + if (bytes > resctrl_rmid_realloc_limit) {
> + rdt_last_cmd_printf("Exceeds limit (before adjustment) of %u bytes\n",
> + resctrl_rmid_realloc_limit);
> + ret = -EINVAL;
> + goto out_unlock;
> + }
>
> resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(bytes);
>
> - return nbytes;
> +out_unlock:
> + mutex_unlock(&rdtgroup_mutex);
> + return ret ?: nbytes;
> }
>
> /*
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 08/11] fs/resctrl: Use accurate and symmetric exit flows
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (6 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 07/11] fs/resctrl: Add last_cmd_status support for writes to max_threshold_occupancy Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 18:46 ` [PATCH 09/11] fs/resctrl: Use stricter checks on input to cpus/cpus_list file Reinette Chatre
` (4 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
During schemata file write handling there is one error exit path labeled
"out" that handles all cleanup and unlocking needed on exit. The staged
configs cleared during an error exit may not have been used at the time
of exit making the clearing of the staged configs unnecessary.
Access the exit code using two labels and only clear the staged
configuration if it was in use at the time of exit. Doing so makes the
code flow obvious and simplifies upcoming changes that improve the
handling of failures during early input parsing.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/ctrlmondata.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index cc4237c57cbe..a8e7cdcfec6e 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -331,7 +331,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
ret = -EINVAL;
rdt_last_cmd_puts("Resource group is pseudo-locked\n");
- goto out;
+ goto out_unlock;
}
rdt_staged_configs_clear();
@@ -341,16 +341,16 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
if (!tok) {
rdt_last_cmd_puts("Missing ':'\n");
ret = -EINVAL;
- goto out;
+ goto out_clear_staged;
}
if (tok[0] == '\0') {
rdt_last_cmd_printf("Missing '%s' value\n", resname);
ret = -EINVAL;
- goto out;
+ goto out_clear_staged;
}
ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
if (ret)
- goto out;
+ goto out_clear_staged;
}
list_for_each_entry(s, &resctrl_schema_all, list) {
@@ -365,7 +365,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret)
- goto out;
+ goto out_clear_staged;
}
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -378,8 +378,9 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
ret = rdtgroup_pseudo_lock_create(rdtgrp);
}
-out:
+out_clear_staged:
rdt_staged_configs_clear();
+out_unlock:
rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
}
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH 09/11] fs/resctrl: Use stricter checks on input to cpus/cpus_list file
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (7 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 08/11] fs/resctrl: Use accurate and symmetric exit flows Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 18:46 ` [PATCH 10/11] fs/resctrl: Change last_cmd_status custom during input parsing Reinette Chatre
` (3 subsequent siblings)
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
Callbacks handling writes to resctrl files are called via
kernfs_fop_write_iter() that ensures that the resctrl callback is never
called with a NULL buffer. Even if user space issues a write() with a NULL
buffer and zero count the resctrl callback receives a valid buffer
terminated with '\0' and the number of bytes parameter set to zero.
The NULL buffer check at the beginning of rdtgroup_cpus_write(), while
making no assumptions about caller behavior, is not useful. An empty
buffer is transformed into an empty cpumask that is passed through entire
flow that results in no changes.
Ensure that the user provided buffer contains some data before attempting
to parse it using the same check as other resctrl files (the familiar
"nbytes == 0"). The custom is for resctrl file callbacks to fail on an
empty buffer and this brings interactions with the cpus/cpus_list file in
line with this custom. The risk is that if there exists a user space that
uses empty writes to this specific file then those successful interactions
will start failing.
Exit right away if there was no failure yet no cpumask could be created
from the input. It is of no use to pass an empty cpumask through the
entire flow, just return with success to short-circuit the existing
behavior.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/rdtgroup.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 933f6ae26d59..f33a50c545bc 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -515,7 +515,7 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
struct rdtgroup *rdtgrp;
int ret;
- if (!buf)
+ if (!buf || nbytes == 0)
return -EINVAL;
if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
@@ -555,6 +555,9 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
goto unlock;
}
+ if (cpumask_empty(newmask))
+ goto unlock;
+
/* check that user didn't specify any offline cpus */
cpumask_andnot(tmpmask, newmask, cpu_online_mask);
if (!cpumask_empty(tmpmask)) {
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* [PATCH 10/11] fs/resctrl: Change last_cmd_status custom during input parsing
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (8 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 09/11] fs/resctrl: Use stricter checks on input to cpus/cpus_list file Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-17 17:20 ` Ben Horgan
2026-03-02 18:46 ` [PATCH 11/11] fs/resctrl: Communicate resource group deleted error via last_cmd_status Reinette Chatre
` (2 subsequent siblings)
12 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
A pattern of usage of last_cmd_status was introduced during its enabling in
commit
c377dcfbee80 ("x86/intel_rdt: Add diagnostics when writing the schemata file")
and since copied throughout resctrl to result in the following custom:
..._write()
{
/* Early parsing of input, exit on failure. */
/* Obtain rdtgroup_mutex */
rdt_last_cmd_clear(); /* Clear last_cmd_status buffer */
/*
* Act on user command, failures result in detail
* error message in last_cmd_status buffer via
* rdt_last_cmd_puts()/rdt_last_cmd_printf().
*/
/* Release rdtgroup_mutex */
}
If resctrl exits with failure during early parsing of input there are two
possible scenarios:
- The last_cmd_status buffer is empty and a user's read of
info/last_cmd_status returns "ok".
- The last_cmd_status buffer contains details from an earlier ...write()
failure and a user's read of info/last_cmd_status returns this outdated
error description.
Writing to a resctrl file is considered a "resctrl command" and the
resctrl documentation states the following about the last_cmd_status file:
"If the command failed, it will provide more information that can be
conveyed in the error returns from file operations."
Neither of the current scenarios is correct behavior.
Move early input parsing to be done with rdtgroup_mutex held after the
last_cmd_status buffer is cleared. Let info/last_cmd_status be accurate
when an error is encountered during parsing of user command.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/ctrlmondata.c | 54 ++++++++++++++++----------
fs/resctrl/monitor.c | 60 ++++++++++++++++------------
fs/resctrl/rdtgroup.c | 84 +++++++++++++++++++++++-----------------
3 files changed, 118 insertions(+), 80 deletions(-)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index a8e7cdcfec6e..7b90c36ff0a6 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -312,11 +312,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *tok, *resname;
int ret = 0;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
- buf[nbytes - 1] = '\0';
-
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
@@ -324,6 +319,15 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
}
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ buf[nbytes - 1] = '\0';
+
/*
* No changes to pseudo-locked region allowed. It has to be removed
* and re-created instead.
@@ -466,11 +470,6 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
struct rdtgroup *rdtgrp;
int ret = 0;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
- buf[nbytes - 1] = '\0';
-
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
@@ -478,6 +477,15 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
}
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ buf[nbytes - 1] = '\0';
+
if (!strcmp(buf, "mbm_local_bytes")) {
if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
@@ -495,6 +503,7 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
if (ret)
rdt_last_cmd_printf("Unsupported event id '%s'\n", buf);
+out_unlock:
rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
@@ -854,15 +863,17 @@ ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
bool enable;
int ret;
- ret = kstrtobool(buf, &enable);
- if (ret)
- return ret;
-
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ ret = kstrtobool(buf, &enable);
+ if (ret) {
+ rdt_last_cmd_puts("Invalid input\n");
+ goto out_unlock;
+ }
+
if (!r->cache.io_alloc_capable) {
rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
ret = -ENODEV;
@@ -1004,16 +1015,19 @@ ssize_t resctrl_io_alloc_cbm_write(struct kernfs_open_file *of, char *buf,
u32 io_alloc_closid;
int ret = 0;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
-
- buf[nbytes - 1] = '\0';
-
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ buf[nbytes - 1] = '\0';
+
if (!r->cache.io_alloc_capable) {
rdt_last_cmd_printf("io_alloc is not supported on %s\n", s->name);
ret = -ENODEV;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index f1a08fdbdd61..6c499a0bd435 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1112,13 +1112,15 @@ ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of, char *buf
bool value;
int ret;
- ret = kstrtobool(buf, &value);
- if (ret)
- return ret;
-
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ ret = kstrtobool(buf, &value);
+ if (ret) {
+ rdt_last_cmd_puts("Invalid input\n");
+ goto out_unlock;
+ }
+
if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
rdt_last_cmd_puts("mbm_event counter assignment mode is not enabled\n");
ret = -EINVAL;
@@ -1409,17 +1411,20 @@ ssize_t event_filter_write(struct kernfs_open_file *of, char *buf, size_t nbytes
u32 evt_cfg = 0;
int ret = 0;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
-
- buf[nbytes - 1] = '\0';
-
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ buf[nbytes - 1] = '\0';
+
r = resctrl_arch_get_resource(mevt->rid);
if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
rdt_last_cmd_puts("mbm_event counter assignment mode is not enabled\n");
@@ -1478,17 +1483,20 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of, char *buf,
int ret = 0;
bool enable;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
-
- buf[nbytes - 1] = '\0';
-
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ buf[nbytes - 1] = '\0';
+
if (!strcmp(buf, "default")) {
enable = 0;
} else if (!strcmp(buf, "mbm_event")) {
@@ -1759,12 +1767,6 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
char *token, *event;
int ret = 0;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
-
- buf[nbytes - 1] = '\0';
-
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
@@ -1772,10 +1774,19 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
}
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ buf[nbytes - 1] = '\0';
+
if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
rdt_last_cmd_puts("mbm_event mode is not enabled\n");
- rdtgroup_kn_unlock(of->kn);
- return -EINVAL;
+ ret = -EINVAL;
+ goto out_unlock;
}
while ((token = strsep(&buf, "\n")) != NULL) {
@@ -1791,6 +1802,7 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
break;
}
+out_unlock:
rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index f33a50c545bc..3b3acc748d03 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -511,38 +511,38 @@ static int cpus_ctrl_write(struct rdtgroup *rdtgrp, cpumask_var_t newmask,
static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
- cpumask_var_t tmpmask, newmask, tmpmask1;
+ cpumask_var_t tmpmask = CPUMASK_VAR_NULL, newmask = CPUMASK_VAR_NULL;
+ cpumask_var_t tmpmask1 = CPUMASK_VAR_NULL;
struct rdtgroup *rdtgrp;
int ret;
- if (!buf || nbytes == 0)
- return -EINVAL;
-
- if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
- return -ENOMEM;
- if (!zalloc_cpumask_var(&newmask, GFP_KERNEL)) {
- free_cpumask_var(tmpmask);
- return -ENOMEM;
- }
- if (!zalloc_cpumask_var(&tmpmask1, GFP_KERNEL)) {
- free_cpumask_var(tmpmask);
- free_cpumask_var(newmask);
- return -ENOMEM;
- }
-
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
ret = -ENOENT;
- goto unlock;
+ goto out_unlock;
}
rdt_last_cmd_clear();
+ if (!buf || nbytes == 0) {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL) ||
+ !zalloc_cpumask_var(&newmask, GFP_KERNEL) ||
+ !zalloc_cpumask_var(&tmpmask1, GFP_KERNEL)) {
+ rdt_last_cmd_puts("Kernel allocation failure\n");
+ ret = -ENOMEM;
+ goto out_free;
+ }
+
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
ret = -EINVAL;
rdt_last_cmd_puts("Pseudo-locking in progress\n");
- goto unlock;
+ goto out_free;
}
if (is_cpu_list(of))
@@ -552,18 +552,18 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
if (ret) {
rdt_last_cmd_puts("Bad CPU list/mask\n");
- goto unlock;
+ goto out_free;
}
if (cpumask_empty(newmask))
- goto unlock;
+ goto out_free;
/* check that user didn't specify any offline cpus */
cpumask_andnot(tmpmask, newmask, cpu_online_mask);
if (!cpumask_empty(tmpmask)) {
ret = -EINVAL;
rdt_last_cmd_puts("Can only assign online CPUs\n");
- goto unlock;
+ goto out_free;
}
if (rdtgrp->type == RDTCTRL_GROUP)
@@ -573,11 +573,12 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
else
ret = -EINVAL;
-unlock:
- rdtgroup_kn_unlock(of->kn);
+out_free:
free_cpumask_var(tmpmask);
free_cpumask_var(newmask);
free_cpumask_var(tmpmask1);
+out_unlock:
+ rdtgroup_kn_unlock(of->kn);
return ret ?: nbytes;
}
@@ -1457,11 +1458,6 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
enum rdtgrp_mode mode;
int ret = 0;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
- buf[nbytes - 1] = '\0';
-
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (!rdtgrp) {
rdtgroup_kn_unlock(of->kn);
@@ -1469,6 +1465,14 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
}
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ buf[nbytes - 1] = '\0';
mode = rdtgrp->mode;
@@ -1794,19 +1798,23 @@ static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
int ret;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
-
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
buf[nbytes - 1] = '\0';
ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
+out_unlock:
mutex_unlock(&rdtgroup_mutex);
cpus_read_unlock();
@@ -1820,19 +1828,23 @@ static ssize_t mbm_local_bytes_config_write(struct kernfs_open_file *of,
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
int ret;
- /* Valid input requires a trailing newline */
- if (nbytes == 0 || buf[nbytes - 1] != '\n')
- return -EINVAL;
-
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
rdt_last_cmd_clear();
+ /* Valid input requires a trailing newline */
+ if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+ rdt_last_cmd_puts("Invalid input\n");
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
buf[nbytes - 1] = '\0';
ret = mon_config_write(r, buf, QOS_L3_MBM_LOCAL_EVENT_ID);
+out_unlock:
mutex_unlock(&rdtgroup_mutex);
cpus_read_unlock();
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 10/11] fs/resctrl: Change last_cmd_status custom during input parsing
2026-03-02 18:46 ` [PATCH 10/11] fs/resctrl: Change last_cmd_status custom during input parsing Reinette Chatre
@ 2026-03-17 17:20 ` Ben Horgan
0 siblings, 0 replies; 37+ messages in thread
From: Ben Horgan @ 2026-03-17 17:20 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/2/26 18:46, Reinette Chatre wrote:
> A pattern of usage of last_cmd_status was introduced during its enabling in
> commit
>
> c377dcfbee80 ("x86/intel_rdt: Add diagnostics when writing the schemata file")
>
> and since copied throughout resctrl to result in the following custom:
>
> ..._write()
> {
> /* Early parsing of input, exit on failure. */
>
> /* Obtain rdtgroup_mutex */
> rdt_last_cmd_clear(); /* Clear last_cmd_status buffer */
>
> /*
> * Act on user command, failures result in detail
> * error message in last_cmd_status buffer via
> * rdt_last_cmd_puts()/rdt_last_cmd_printf().
> */
>
> /* Release rdtgroup_mutex */
> }
>
> If resctrl exits with failure during early parsing of input there are two
> possible scenarios:
> - The last_cmd_status buffer is empty and a user's read of
> info/last_cmd_status returns "ok".
> - The last_cmd_status buffer contains details from an earlier ...write()
> failure and a user's read of info/last_cmd_status returns this outdated
> error description.
>
> Writing to a resctrl file is considered a "resctrl command" and the
> resctrl documentation states the following about the last_cmd_status file:
> "If the command failed, it will provide more information that can be
> conveyed in the error returns from file operations."
> Neither of the current scenarios is correct behavior.
>
> Move early input parsing to be done with rdtgroup_mutex held after the
> last_cmd_status buffer is cleared. Let info/last_cmd_status be accurate
> when an error is encountered during parsing of user command.
This seems useful and the patch looks good to me.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Thanks,
Ben
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 11/11] fs/resctrl: Communicate resource group deleted error via last_cmd_status
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (9 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 10/11] fs/resctrl: Change last_cmd_status custom during input parsing Reinette Chatre
@ 2026-03-02 18:46 ` Reinette Chatre
2026-03-02 23:37 ` [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Luck, Tony
2026-03-16 17:44 ` Ben Horgan
12 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-02 18:46 UTC (permalink / raw)
To: tony.luck, james.morse, Dave.Martin, babu.moger, bp, tglx,
dave.hansen
Cc: x86, hpa, ben.horgan, fustini, fenghuay, peternewman,
linux-kernel, patches, reinette.chatre
User space expects last_cmd_status to contain additional information if
any resctrl command fails.
A resctrl command may be blocked waiting on the rdtgroup_mutex waiting for
another command to finish and find that once the mutex is available that
the resource group has since been deleted. In this scenario the command
will fail while last_cmd_status contains either "ok" if the last_cmd_status
buffer is empty or an outdated error from a previous command failure if
last_cmd_status buffer has content.
Include clearing of last_cmd_status buffer as part of
rdtgroup_kn_lock_live() that is used to obtain access and needed locking
to a resource group before attempting a command on the group. With the
last_cmd_status buffer ready, provide an appropriate message to user space
if the resource group has been deleted.
No last_cmd_status treatment is needed for the remaining failure of
rdtgroup_kn_lock_live() encountering a non-existent resource group since
that could only occur during an attempt to obtain a resource group lock
on a file in info/ which is an invalid usage.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
fs/resctrl/ctrlmondata.c | 3 ---
fs/resctrl/monitor.c | 2 --
fs/resctrl/rdtgroup.c | 14 +++++---------
3 files changed, 5 insertions(+), 14 deletions(-)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 7b90c36ff0a6..9915f714e26a 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -317,7 +317,6 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
rdtgroup_kn_unlock(of->kn);
return -ENOENT;
}
- rdt_last_cmd_clear();
/* Valid input requires a trailing newline */
if (nbytes == 0 || buf[nbytes - 1] != '\n') {
@@ -434,7 +433,6 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
- rdt_last_cmd_clear();
rdt_last_cmd_puts("Cache domain offline\n");
ret = -ENODEV;
} else {
@@ -475,7 +473,6 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
rdtgroup_kn_unlock(of->kn);
return -ENOENT;
}
- rdt_last_cmd_clear();
/* Valid input requires a trailing newline */
if (nbytes == 0 || buf[nbytes - 1] != '\n') {
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 6c499a0bd435..73893a7e14e0 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1632,7 +1632,6 @@ int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, voi
goto out_unlock;
}
- rdt_last_cmd_clear();
if (!resctrl_arch_mbm_cntr_assign_enabled(r)) {
rdt_last_cmd_puts("mbm_event counter assignment mode is not enabled\n");
ret = -EINVAL;
@@ -1772,7 +1771,6 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
rdtgroup_kn_unlock(of->kn);
return -ENOENT;
}
- rdt_last_cmd_clear();
/* Valid input requires a trailing newline */
if (nbytes == 0 || buf[nbytes - 1] != '\n') {
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 3b3acc748d03..09d688bee0a3 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -359,7 +359,6 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
if (rdtgrp) {
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
- rdt_last_cmd_clear();
rdt_last_cmd_puts("Cache domain offline\n");
ret = -ENODEV;
} else {
@@ -522,8 +521,6 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
goto out_unlock;
}
- rdt_last_cmd_clear();
-
if (!buf || nbytes == 0) {
rdt_last_cmd_puts("Invalid input\n");
ret = -EINVAL;
@@ -783,7 +780,6 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
rdtgroup_kn_unlock(of->kn);
return -ENOENT;
}
- rdt_last_cmd_clear();
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
@@ -1464,7 +1460,6 @@ static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
return -ENOENT;
}
- rdt_last_cmd_clear();
/* Valid input requires a trailing newline */
if (nbytes == 0 || buf[nbytes - 1] != '\n') {
rdt_last_cmd_puts("Invalid input\n");
@@ -1601,7 +1596,6 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
- rdt_last_cmd_clear();
rdt_last_cmd_puts("Cache domain offline\n");
ret = -ENODEV;
} else {
@@ -2639,10 +2633,14 @@ struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn)
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
+ rdt_last_cmd_clear();
/* Was this group deleted while we waited? */
- if (rdtgrp->flags & RDT_DELETED)
+ if (rdtgrp->flags & RDT_DELETED) {
+ rdt_last_cmd_printf("Resource group %s deleted. No commands possible.\n",
+ rdt_kn_name(rdtgrp->kn));
return NULL;
+ }
return rdtgrp;
}
@@ -3765,8 +3763,6 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
goto out_unlock;
}
- rdt_last_cmd_clear();
-
/*
* Check that the parent directory for a monitor group is a "mon_groups"
* directory.
--
2.50.1
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (10 preceding siblings ...)
2026-03-02 18:46 ` [PATCH 11/11] fs/resctrl: Communicate resource group deleted error via last_cmd_status Reinette Chatre
@ 2026-03-02 23:37 ` Luck, Tony
2026-03-03 2:18 ` Reinette Chatre
2026-03-04 11:48 ` Ben Horgan
2026-03-16 17:44 ` Ben Horgan
12 siblings, 2 replies; 37+ messages in thread
From: Luck, Tony @ 2026-03-02 23:37 UTC (permalink / raw)
To: Reinette Chatre
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, ben.horgan, fustini, fenghuay, peternewman, linux-kernel,
patches
On Mon, Mar 02, 2026 at 10:46:06AM -0800, Reinette Chatre wrote:
> Hi Everybody,
>
> This is a collection of resctrl cleanups assembled together for convenience
> and simpler tracking. I'd be happy to split them up if it makes review and/or
> handling easier.
If it is time for spring cleaning in the rescctrl code, maybe fix some
bad fir tree declarations too?
Note resctrl_arch_pseudo_lock_fn() needs help too, but complicated by
having #ifdef CONFIG_KASAN mixed in with declarations. It might need
to remain an exception.
-Tony
From dd9c2ad1a1361b34e25fc10d18d3ceb3ba57fb92 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@intel.com>
Date: Mon, 2 Mar 2026 15:28:36 -0800
Subject: [PATCH] fs/resctrl: Clean up some bad "fir tree" declarations
Sort local variables by length (longest first) per TIP tree.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
fs/resctrl/pseudo_lock.c | 2 +-
fs/resctrl/rdtgroup.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
index e1e9134474f4..cd57d862e0cf 100644
--- a/fs/resctrl/pseudo_lock.c
+++ b/fs/resctrl/pseudo_lock.c
@@ -797,10 +797,10 @@ static const struct file_operations pseudo_measure_fops = {
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
{
struct pseudo_lock_region *plr = rdtgrp->plr;
+ char *kn_name __free(kfree) = NULL;
struct task_struct *thread;
unsigned int new_minor;
struct device *dev;
- char *kn_name __free(kfree) = NULL;
int ret;
ret = pseudo_lock_region_alloc(plr);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 0e93cecf6f30..b2ca6394714a 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3423,8 +3423,8 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
struct rdt_domain_hdr *hdr)
{
- struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
+ struct kernfs_node *parent_kn;
struct list_head *head;
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
@@ -3559,9 +3559,9 @@ static int __init_one_rdt_domain(struct rdt_ctrl_domain *d, struct resctrl_schem
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
u32 used_b = 0, unused_b = 0;
+ u32 peer_ctl, ctrl_val;
unsigned long tmp_cbm;
enum rdtgrp_mode mode;
- u32 peer_ctl, ctrl_val;
int i;
cfg = &d->staged_config[t];
--
2.53.0
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-02 23:37 ` [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Luck, Tony
@ 2026-03-03 2:18 ` Reinette Chatre
2026-03-04 11:48 ` Ben Horgan
1 sibling, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-03 2:18 UTC (permalink / raw)
To: Luck, Tony
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, ben.horgan, fustini, fenghuay, peternewman, linux-kernel,
patches
Hi Tony,
On 3/2/26 3:37 PM, Luck, Tony wrote:
> On Mon, Mar 02, 2026 at 10:46:06AM -0800, Reinette Chatre wrote:
>> Hi Everybody,
>>
>> This is a collection of resctrl cleanups assembled together for convenience
>> and simpler tracking. I'd be happy to split them up if it makes review and/or
>> handling easier.
>
> If it is time for spring cleaning in the rescctrl code, maybe fix some
> bad fir tree declarations too?
I do not know if a cleanup like this will be appreciated but it looks good to me
and I can include it for consideration as part of the next version.
> Note resctrl_arch_pseudo_lock_fn() needs help too, but complicated by
> having #ifdef CONFIG_KASAN mixed in with declarations. It might need
> to remain an exception.
Agreed.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-02 23:37 ` [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Luck, Tony
2026-03-03 2:18 ` Reinette Chatre
@ 2026-03-04 11:48 ` Ben Horgan
2026-03-16 22:28 ` Reinette Chatre
1 sibling, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-04 11:48 UTC (permalink / raw)
To: Luck, Tony, Reinette Chatre
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Tony,
On 3/2/26 23:37, Luck, Tony wrote:
> On Mon, Mar 02, 2026 at 10:46:06AM -0800, Reinette Chatre wrote:
>> Hi Everybody,
>>
>> This is a collection of resctrl cleanups assembled together for convenience
>> and simpler tracking. I'd be happy to split them up if it makes review and/or
>> handling easier.
>
> If it is time for spring cleaning in the rescctrl code, maybe fix some
> bad fir tree declarations too?
>
> Note resctrl_arch_pseudo_lock_fn() needs help too, but complicated by
> having #ifdef CONFIG_KASAN mixed in with declarations. It might need
> to remain an exception.
>
> -Tony
>
>
> From dd9c2ad1a1361b34e25fc10d18d3ceb3ba57fb92 Mon Sep 17 00:00:00 2001
> From: Tony Luck <tony.luck@intel.com>
> Date: Mon, 2 Mar 2026 15:28:36 -0800
> Subject: [PATCH] fs/resctrl: Clean up some bad "fir tree" declarations
>
> Sort local variables by length (longest first) per TIP tree.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> fs/resctrl/pseudo_lock.c | 2 +-
> fs/resctrl/rdtgroup.c | 4 ++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
> index e1e9134474f4..cd57d862e0cf 100644
> --- a/fs/resctrl/pseudo_lock.c
> +++ b/fs/resctrl/pseudo_lock.c
> @@ -797,10 +797,10 @@ static const struct file_operations pseudo_measure_fops = {
> int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
> {
> struct pseudo_lock_region *plr = rdtgrp->plr;
> + char *kn_name __free(kfree) = NULL;
> struct task_struct *thread;
> unsigned int new_minor;
> struct device *dev;
> - char *kn_name __free(kfree) = NULL;
If you are changing this, I would expect either the declaration to come
lower at the allocation or alternatively drop the __free and do an
explict kfree. This is based on the guidance in include/linux/cleanup.h
""
* Given that the "__free(...) = NULL" pattern for variables defined at
* the top of the function poses this potential interdependency problem
* the recommendation is to always define and assign variables in one
* statement and not group variable definitions at the top of the
* function when __free() is used.
*
* Lastly, given that the benefit of cleanup helpers is removal of
* "goto", and that the "goto" statement can jump between scopes, the
* expectation is that usage of "goto" and cleanup helpers is never
* mixed in the same function. I.e. for a given routine, convert all
* resources that need a "goto" cleanup to scope-based cleanup, or
* convert none of them.
""
> int ret;
>
> ret = pseudo_lock_region_alloc(plr);
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 0e93cecf6f30..b2ca6394714a 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -3423,8 +3423,8 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
> static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> struct rdt_domain_hdr *hdr)
> {
> - struct kernfs_node *parent_kn;
> struct rdtgroup *prgrp, *crgrp;
> + struct kernfs_node *parent_kn;
> struct list_head *head;
>
> list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> @@ -3559,9 +3559,9 @@ static int __init_one_rdt_domain(struct rdt_ctrl_domain *d, struct resctrl_schem
> struct resctrl_staged_config *cfg;
> struct rdt_resource *r = s->res;
> u32 used_b = 0, unused_b = 0;
> + u32 peer_ctl, ctrl_val;
> unsigned long tmp_cbm;
> enum rdtgrp_mode mode;
> - u32 peer_ctl, ctrl_val;
> int i;
>
> cfg = &d->staged_config[t];
Thanks,
Ben
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-04 11:48 ` Ben Horgan
@ 2026-03-16 22:28 ` Reinette Chatre
0 siblings, 0 replies; 37+ messages in thread
From: Reinette Chatre @ 2026-03-16 22:28 UTC (permalink / raw)
To: Ben Horgan, Luck, Tony
Cc: james.morse, Dave.Martin, babu.moger, bp, tglx, dave.hansen, x86,
hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben and Tony,
On 3/4/26 3:48 AM, Ben Horgan wrote:
> Hi Tony,
>
> On 3/2/26 23:37, Luck, Tony wrote:
>> On Mon, Mar 02, 2026 at 10:46:06AM -0800, Reinette Chatre wrote:
>>> Hi Everybody,
>>>
>>> This is a collection of resctrl cleanups assembled together for convenience
>>> and simpler tracking. I'd be happy to split them up if it makes review and/or
>>> handling easier.
>>
>> If it is time for spring cleaning in the rescctrl code, maybe fix some
>> bad fir tree declarations too?
>>
>> Note resctrl_arch_pseudo_lock_fn() needs help too, but complicated by
>> having #ifdef CONFIG_KASAN mixed in with declarations. It might need
>> to remain an exception.
>>
>> -Tony
>>
>>
>> From dd9c2ad1a1361b34e25fc10d18d3ceb3ba57fb92 Mon Sep 17 00:00:00 2001
>> From: Tony Luck <tony.luck@intel.com>
>> Date: Mon, 2 Mar 2026 15:28:36 -0800
>> Subject: [PATCH] fs/resctrl: Clean up some bad "fir tree" declarations
>>
>> Sort local variables by length (longest first) per TIP tree.
>>
>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>> ---
>> fs/resctrl/pseudo_lock.c | 2 +-
>> fs/resctrl/rdtgroup.c | 4 ++--
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/resctrl/pseudo_lock.c b/fs/resctrl/pseudo_lock.c
>> index e1e9134474f4..cd57d862e0cf 100644
>> --- a/fs/resctrl/pseudo_lock.c
>> +++ b/fs/resctrl/pseudo_lock.c
>> @@ -797,10 +797,10 @@ static const struct file_operations pseudo_measure_fops = {
>> int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
>> {
>> struct pseudo_lock_region *plr = rdtgrp->plr;
>> + char *kn_name __free(kfree) = NULL;
>> struct task_struct *thread;
>> unsigned int new_minor;
>> struct device *dev;
>> - char *kn_name __free(kfree) = NULL;
>
> If you are changing this, I would expect either the declaration to come
> lower at the allocation or alternatively drop the __free and do an
> explict kfree. This is based on the guidance in include/linux/cleanup.h
>
> ""
> * Given that the "__free(...) = NULL" pattern for variables defined at
> * the top of the function poses this potential interdependency problem
> * the recommendation is to always define and assign variables in one
> * statement and not group variable definitions at the top of the
> * function when __free() is used.
> *
> * Lastly, given that the benefit of cleanup helpers is removal of
> * "goto", and that the "goto" statement can jump between scopes, the
> * expectation is that usage of "goto" and cleanup helpers is never
> * mixed in the same function. I.e. for a given routine, convert all
> * resources that need a "goto" cleanup to scope-based cleanup, or
> * convert none of them.
> ""
Since this is the only cleanup helper in this function it is not clear to
me what impact moving of the declaration would have. I fully agree with the latter
though, the usage of the single cleanup helper in this function that continues
to heavily use goto makes the flow through this function difficult to follow.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-02 18:46 [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Reinette Chatre
` (11 preceding siblings ...)
2026-03-02 23:37 ` [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency Luck, Tony
@ 2026-03-16 17:44 ` Ben Horgan
2026-03-16 18:18 ` Reinette Chatre
12 siblings, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-16 17:44 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/2/26 18:46, Reinette Chatre wrote:
> Hi Everybody,
>
> This is a collection of resctrl cleanups assembled together for convenience
> and simpler tracking. I'd be happy to split them up if it makes review and/or
> handling easier.
>
> Summary of changes:
>
> - Let resctrl pass stricter checks from various tools to provide a cleaner
> baseline with the goal to promote healthier contributions:
> - ./tools/docs/kernel-doc -Wall -v <files>
> - Build with W=12
> - ./scripts/coccicheck
> - Static checkers
>
> - Use accurate and consistent type for all uses of resource ID.
>
> - In the unlikely scenario that resctrl picked a wrong CPU to read an event
> from, pass the error through to user space instead of claiming to succeed
> and returning a (wrong) result.
>
> - Since inception of last_cmd_status feature there have been mismatches
> between resctrl file operation failures and the contents of
> info/last_cmd_status. This pattern keeps propagating with each new resctrl
> feature. Establish a new baseline with a new pattern that ensures
> info/last_cmd_status contains an accurate failure description that matches
> the most recent resctrl file operation failure.
>
One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
enabled the creation of MON/CTRL_MON directories may succeed but an error message
is written to last_cmd_status. E.g.
/sys/fs/resctrl# mkdir mon_groups/new5
/sys/fs/resctrl# cat info/last_cmd_status
Failed to allocate counter for mbm_total_bytes in domain 2
The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
is never cleared. I think this could be fixed by:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 62edb464410a..396f17ed72c6 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
rdtgroup_assign_cntr_event(NULL, rdtgrp,
&mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
+
+ rdt_last_cmd_clear();
}
Is this right thing to do? Let me know if you want a proper patch.
Thanks,
Ben
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-16 17:44 ` Ben Horgan
@ 2026-03-16 18:18 ` Reinette Chatre
2026-03-17 10:25 ` Ben Horgan
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-16 18:18 UTC (permalink / raw)
To: Ben Horgan, tony.luck, james.morse, Dave.Martin, babu.moger, bp,
tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben,
On 3/16/26 10:44 AM, Ben Horgan wrote:
> On 3/2/26 18:46, Reinette Chatre wrote:
...
> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
> enabled the creation of MON/CTRL_MON directories may succeed but an error message
> is written to last_cmd_status. E.g.
>
> /sys/fs/resctrl# mkdir mon_groups/new5
> /sys/fs/resctrl# cat info/last_cmd_status
> Failed to allocate counter for mbm_total_bytes in domain 2
>
> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
> is never cleared. I think this could be fixed by:
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 62edb464410a..396f17ed72c6 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> rdtgroup_assign_cntr_event(NULL, rdtgrp,
> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
> +
> + rdt_last_cmd_clear();
> }
>
> Is this right thing to do? Let me know if you want a proper patch.
Letting group be created without any counters assigned while writing the error
to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
at this point then user space will never have the opportunity to see the message that
contains the details.
It did not seem appropriate to let resource group creation fail when no counters
are available. I see that the documentation is not clear on this. What do you think
of an update to documentation instead? Would something like below help clarify behavior?
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index ba609f8d4de5..20dc58d281cf 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -478,6 +478,12 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
0
+ Automatic counter assignment is done with best effort. If auto assignment
+ is enabled but there are not enough available counters then monitor group
+ creation could succeed while one or more events belonging to the group may
+ not have a counter assigned. Consult last_cmd_status for details during
+ such scenario.
+
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
Reinette
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-16 18:18 ` Reinette Chatre
@ 2026-03-17 10:25 ` Ben Horgan
2026-03-17 18:09 ` Reinette Chatre
0 siblings, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-17 10:25 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/16/26 18:18, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/16/26 10:44 AM, Ben Horgan wrote:
>> On 3/2/26 18:46, Reinette Chatre wrote:
> ...
>> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
>> enabled the creation of MON/CTRL_MON directories may succeed but an error message
>> is written to last_cmd_status. E.g.
>>
>> /sys/fs/resctrl# mkdir mon_groups/new5
>> /sys/fs/resctrl# cat info/last_cmd_status
>> Failed to allocate counter for mbm_total_bytes in domain 2
>>
>> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
>> is never cleared. I think this could be fixed by:
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 62edb464410a..396f17ed72c6 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> rdtgroup_assign_cntr_event(NULL, rdtgrp,
>> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> +
>> + rdt_last_cmd_clear();
>> }
>>
>> Is this right thing to do? Let me know if you want a proper patch.
>
> Letting group be created without any counters assigned while writing the error
> to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
> at this point then user space will never have the opportunity to see the message that
> contains the details.
>
> It did not seem appropriate to let resource group creation fail when no counters
> are available. I see that the documentation is not clear on this. What do you think
> of an update to documentation instead? Would something like below help clarify behavior?
>
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index ba609f8d4de5..20dc58d281cf 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -478,6 +478,12 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
> 0
>
> + Automatic counter assignment is done with best effort. If auto assignment
> + is enabled but there are not enough available counters then monitor group
> + creation could succeed while one or more events belonging to the group may
> + not have a counter assigned. Consult last_cmd_status for details during
> + such scenario.
> +
This does the improve the situation but the multiple domain failure behaviour depends
on the order the domains are iterated through. This is stable as the list is sorted but
does seem a bit complicated.
I.e. if you have two domains, with ids 2 and 3, with no counters remaining on domain 2 but
some on domain 3 then rdtgroup_assign_cntr_event() will fail early and the counter won't
be assigned for domain 3 but the last_cmd_status will only be about domain 2. The user
either needs to know a failure at one domain means all higher domains will not be
considered for that counter or look at the new mbm_L3_assignments to understand what's happened.
In this case we have:
/sys/fs/resctrl# cat info/L3_MON/mbm_assign_on_mkdir
1
/sys/fs/resctrl# cat info/L3_MON/available_mbm_cntrs
2=0;3=1
/sys/fs/resctrl# mkdir mon_groups/new
/sys/fs/resctrl# cat info/last_cmd_status
Failed to allocate counter for mbm_total_bytes in domain 2
/sys/fs/resctrl# cat mon_groups/new/mbm_L3_assignments
mbm_total_bytes:2=_;3=_
Would it be better for each domain to be considered even if a previous failure occurred or
is this now a fixed behaviour? For illustration:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 62edb464410a..8e061bce9742 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1248,18 +1248,25 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
{
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+ struct rdt_l3_mon_domain *d;
if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
!r->mon.mbm_assign_on_mkdir)
return;
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
- rdtgroup_assign_cntr_event(NULL, rdtgrp,
- &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ rdtgroup_assign_cntr_event(d, rdtgrp,
+ &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
+ }
+ }
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
- rdtgroup_assign_cntr_event(NULL, rdtgrp,
- &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
+ list_for_each_entry(d, &r->mon_domains, hdr.list) {
+ rdtgroup_assign_cntr_event(d, rdtgrp,
+ &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
+ }
+ }
}
Thanks,
Ben
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
>
> Reinette
>
>
>
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-17 10:25 ` Ben Horgan
@ 2026-03-17 18:09 ` Reinette Chatre
2026-03-18 11:59 ` Ben Horgan
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-17 18:09 UTC (permalink / raw)
To: Ben Horgan, tony.luck, james.morse, Dave.Martin, babu.moger, bp,
tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben,
On 3/17/26 3:25 AM, Ben Horgan wrote:
> On 3/16/26 18:18, Reinette Chatre wrote:
>> On 3/16/26 10:44 AM, Ben Horgan wrote:
>>> On 3/2/26 18:46, Reinette Chatre wrote:
>> ...
>>> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
>>> enabled the creation of MON/CTRL_MON directories may succeed but an error message
>>> is written to last_cmd_status. E.g.
>>>
>>> /sys/fs/resctrl# mkdir mon_groups/new5
>>> /sys/fs/resctrl# cat info/last_cmd_status
>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>>
>>> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
>>> is never cleared. I think this could be fixed by:
>>>
>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>> index 62edb464410a..396f17ed72c6 100644
>>> --- a/fs/resctrl/monitor.c
>>> +++ b/fs/resctrl/monitor.c
>>> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>> rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>> +
>>> + rdt_last_cmd_clear();
>>> }
>>>
>>> Is this right thing to do? Let me know if you want a proper patch.
>>
>> Letting group be created without any counters assigned while writing the error
>> to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
>> at this point then user space will never have the opportunity to see the message that
>> contains the details.
>>
>> It did not seem appropriate to let resource group creation fail when no counters
>> are available. I see that the documentation is not clear on this. What do you think
>> of an update to documentation instead? Would something like below help clarify behavior?
>>
>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>> index ba609f8d4de5..20dc58d281cf 100644
>> --- a/Documentation/filesystems/resctrl.rst
>> +++ b/Documentation/filesystems/resctrl.rst
>> @@ -478,6 +478,12 @@ with the following files:
>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>> 0
>>
>> + Automatic counter assignment is done with best effort. If auto assignment
>> + is enabled but there are not enough available counters then monitor group
>> + creation could succeed while one or more events belonging to the group may
>> + not have a counter assigned. Consult last_cmd_status for details during
>> + such scenario.
>> +
>
> This does the improve the situation but the multiple domain failure behaviour depends
> on the order the domains are iterated through. This is stable as the list is sorted but
> does seem a bit complicated.
> I.e. if you have two domains, with ids 2 and 3, with no counters remaining on domain 2 but
> some on domain 3 then rdtgroup_assign_cntr_event() will fail early and the counter won't
> be assigned for domain 3 but the last_cmd_status will only be about domain 2. The user
> either needs to know a failure at one domain means all higher domains will not be
> considered for that counter or look at the new mbm_L3_assignments to understand what's happened.
> In this case we have:
>
> /sys/fs/resctrl# cat info/L3_MON/mbm_assign_on_mkdir
> 1
> /sys/fs/resctrl# cat info/L3_MON/available_mbm_cntrs
> 2=0;3=1
> /sys/fs/resctrl# mkdir mon_groups/new
> /sys/fs/resctrl# cat info/last_cmd_status
> Failed to allocate counter for mbm_total_bytes in domain 2
> /sys/fs/resctrl# cat mon_groups/new/mbm_L3_assignments
> mbm_total_bytes:2=_;3=_
Good point.
>
> Would it be better for each domain to be considered even if a previous failure occurred or
> is this now a fixed behaviour? For illustration:
I do not believe this needs to be fixed.
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 62edb464410a..8e061bce9742 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1248,18 +1248,25 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
> void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
> {
> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> + struct rdt_l3_mon_domain *d;
>
> if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
> !r->mon.mbm_assign_on_mkdir)
> return;
>
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
> - &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + rdtgroup_assign_cntr_event(d, rdtgrp,
> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
> + }
> + }
>
> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
> - &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
> + rdtgroup_assign_cntr_event(d, rdtgrp,
> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
> + }
> + }
> }
With a solution like this last_cmd_status could potentially contain multiple lines, one
line per domain that failed. last_cmd_status is 512 bytes so if this is a system with
many domains there is a risk of overflow and user space not seeing all failures.
That may be ok?
I think this can be simplified within rdt_assign_cntr_event() though. Consider:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 49f3f6b846b2..a6a791a15e30 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1209,12 +1209,13 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
* NULL; otherwise, assign the counter to the specified domain @d.
*
* If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
- * will fail. The assignment process will abort at the first failure encountered
- * during domain traversal, which may result in the event being only partially
- * assigned.
+ * will fail. Ignore errors when attempting to assign a counter to all domains
+ * since only some domains may have counters available and goal is to assign
+ * counters where possible. Only caller providing @d of NULL is
+ * rdtgroup_assign_cntrs() that ignores errors.
*
* Return:
- * 0 on success, < 0 on failure.
+ * 0 on success when @d is specified, 0 always when @d is NULL, < 0 on failure.
*/
static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
struct mon_evt *mevt)
@@ -1223,11 +1224,8 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
int ret = 0;
if (!d) {
- list_for_each_entry(d, &r->mon_domains, hdr.list) {
- ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
- if (ret)
- return ret;
- }
+ list_for_each_entry(d, &r->mon_domains, hdr.list)
+ rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
} else {
ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
}
Reinette
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-17 18:09 ` Reinette Chatre
@ 2026-03-18 11:59 ` Ben Horgan
2026-03-18 16:35 ` Reinette Chatre
0 siblings, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-18 11:59 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/17/26 18:09, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/17/26 3:25 AM, Ben Horgan wrote:
>> On 3/16/26 18:18, Reinette Chatre wrote:
>>> On 3/16/26 10:44 AM, Ben Horgan wrote:
>>>> On 3/2/26 18:46, Reinette Chatre wrote:
>>> ...
>>>> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
>>>> enabled the creation of MON/CTRL_MON directories may succeed but an error message
>>>> is written to last_cmd_status. E.g.
>>>>
>>>> /sys/fs/resctrl# mkdir mon_groups/new5
>>>> /sys/fs/resctrl# cat info/last_cmd_status
>>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>>>
>>>> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
>>>> is never cleared. I think this could be fixed by:
>>>>
>>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>>> index 62edb464410a..396f17ed72c6 100644
>>>> --- a/fs/resctrl/monitor.c
>>>> +++ b/fs/resctrl/monitor.c
>>>> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>>> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>> rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>>> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>>> +
>>>> + rdt_last_cmd_clear();
>>>> }
>>>>
>>>> Is this right thing to do? Let me know if you want a proper patch.
>>>
>>> Letting group be created without any counters assigned while writing the error
>>> to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
>>> at this point then user space will never have the opportunity to see the message that
>>> contains the details.
>>>
>>> It did not seem appropriate to let resource group creation fail when no counters
>>> are available. I see that the documentation is not clear on this. What do you think
>>> of an update to documentation instead? Would something like below help clarify behavior?
>>>
>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>> index ba609f8d4de5..20dc58d281cf 100644
>>> --- a/Documentation/filesystems/resctrl.rst
>>> +++ b/Documentation/filesystems/resctrl.rst
>>> @@ -478,6 +478,12 @@ with the following files:
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>> 0
>>>
>>> + Automatic counter assignment is done with best effort. If auto assignment
>>> + is enabled but there are not enough available counters then monitor group
>>> + creation could succeed while one or more events belonging to the group may
>>> + not have a counter assigned. Consult last_cmd_status for details during
>>> + such scenario.
>>> +
>>
>> This does the improve the situation but the multiple domain failure behaviour depends
>> on the order the domains are iterated through. This is stable as the list is sorted but
>> does seem a bit complicated.
>> I.e. if you have two domains, with ids 2 and 3, with no counters remaining on domain 2 but
>> some on domain 3 then rdtgroup_assign_cntr_event() will fail early and the counter won't
>> be assigned for domain 3 but the last_cmd_status will only be about domain 2. The user
>> either needs to know a failure at one domain means all higher domains will not be
>> considered for that counter or look at the new mbm_L3_assignments to understand what's happened.
>> In this case we have:
>>
>> /sys/fs/resctrl# cat info/L3_MON/mbm_assign_on_mkdir
>> 1
>> /sys/fs/resctrl# cat info/L3_MON/available_mbm_cntrs
>> 2=0;3=1
>> /sys/fs/resctrl# mkdir mon_groups/new
>> /sys/fs/resctrl# cat info/last_cmd_status
>> Failed to allocate counter for mbm_total_bytes in domain 2
>> /sys/fs/resctrl# cat mon_groups/new/mbm_L3_assignments
>> mbm_total_bytes:2=_;3=_
>
> Good point.
>
>>
>> Would it be better for each domain to be considered even if a previous failure occurred or
>> is this now a fixed behaviour? For illustration:
>
> I do not believe this needs to be fixed.
>
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 62edb464410a..8e061bce9742 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1248,18 +1248,25 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>> void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>> {
>> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>> + struct rdt_l3_mon_domain *d;
>>
>> if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
>> !r->mon.mbm_assign_on_mkdir)
>> return;
>>
>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>> - &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>> + }
>> + }
>>
>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>> - &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>> + }
>> + }
>> }
>
> With a solution like this last_cmd_status could potentially contain multiple lines, one
> line per domain that failed. last_cmd_status is 512 bytes so if this is a system with
> many domains there is a risk of overflow and user space not seeing all failures.
> That may be ok?
Probably but maybe we can do better. I wonder that given that we know we are trying to allocate
counters in all domains the information in last_cmd_status could summarize which
domains failed. 512 bytes isn't that large so a hex encoded bit map or something else information
dense would be needed. This does make last_cmd_status a bit less human readable so may not
be the way to go.
>
> I think this can be simplified within rdt_assign_cntr_event() though. Consider:
>
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 49f3f6b846b2..a6a791a15e30 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1209,12 +1209,13 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
> * NULL; otherwise, assign the counter to the specified domain @d.
> *
> * If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
> - * will fail. The assignment process will abort at the first failure encountered
> - * during domain traversal, which may result in the event being only partially
> - * assigned.
> + * will fail. Ignore errors when attempting to assign a counter to all domains
> + * since only some domains may have counters available and goal is to assign
> + * counters where possible. Only caller providing @d of NULL is
> + * rdtgroup_assign_cntrs() that ignores errors.
> *
> * Return:
> - * 0 on success, < 0 on failure.
> + * 0 on success when @d is specified, 0 always when @d is NULL, < 0 on failure.
> */
> static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
> struct mon_evt *mevt)
> @@ -1223,11 +1224,8 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
> int ret = 0;
>
> if (!d) {
> - list_for_each_entry(d, &r->mon_domains, hdr.list) {
> - ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> - if (ret)
> - return ret;
> - }
> + list_for_each_entry(d, &r->mon_domains, hdr.list)
> + rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> } else {
> ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
> }
That does the job but is it simpler to delete rdtgroup_assign_cntr_event()
and call rdtgroup_alloc_assign_cntr() directly from rdtgroup_assign_cntrs()
and rdtgroup_modify_assign_state() rather than special casing it further?
Thanks,
Ben
>
>
> Reinette
>
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-18 11:59 ` Ben Horgan
@ 2026-03-18 16:35 ` Reinette Chatre
2026-03-18 17:10 ` Ben Horgan
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-18 16:35 UTC (permalink / raw)
To: Ben Horgan, tony.luck, james.morse, Dave.Martin, babu.moger, bp,
tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben,
On 3/18/26 4:59 AM, Ben Horgan wrote:
> On 3/17/26 18:09, Reinette Chatre wrote:
>> On 3/17/26 3:25 AM, Ben Horgan wrote:
>>> On 3/16/26 18:18, Reinette Chatre wrote:
>>>> On 3/16/26 10:44 AM, Ben Horgan wrote:
>>>>> On 3/2/26 18:46, Reinette Chatre wrote:
>>>> ...
>>>>> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
>>>>> enabled the creation of MON/CTRL_MON directories may succeed but an error message
>>>>> is written to last_cmd_status. E.g.
>>>>>
>>>>> /sys/fs/resctrl# mkdir mon_groups/new5
>>>>> /sys/fs/resctrl# cat info/last_cmd_status
>>>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>>>>
>>>>> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
>>>>> is never cleared. I think this could be fixed by:
>>>>>
>>>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>>>> index 62edb464410a..396f17ed72c6 100644
>>>>> --- a/fs/resctrl/monitor.c
>>>>> +++ b/fs/resctrl/monitor.c
>>>>> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>>>> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>>> rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>>>> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>>>> +
>>>>> + rdt_last_cmd_clear();
>>>>> }
>>>>>
>>>>> Is this right thing to do? Let me know if you want a proper patch.
>>>>
>>>> Letting group be created without any counters assigned while writing the error
>>>> to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
>>>> at this point then user space will never have the opportunity to see the message that
>>>> contains the details.
>>>>
>>>> It did not seem appropriate to let resource group creation fail when no counters
>>>> are available. I see that the documentation is not clear on this. What do you think
>>>> of an update to documentation instead? Would something like below help clarify behavior?
>>>>
>>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>>> index ba609f8d4de5..20dc58d281cf 100644
>>>> --- a/Documentation/filesystems/resctrl.rst
>>>> +++ b/Documentation/filesystems/resctrl.rst
>>>> @@ -478,6 +478,12 @@ with the following files:
>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>>> 0
>>>>
>>>> + Automatic counter assignment is done with best effort. If auto assignment
>>>> + is enabled but there are not enough available counters then monitor group
>>>> + creation could succeed while one or more events belonging to the group may
>>>> + not have a counter assigned. Consult last_cmd_status for details during
>>>> + such scenario.
>>>> +
>>>
>>> This does the improve the situation but the multiple domain failure behaviour depends
>>> on the order the domains are iterated through. This is stable as the list is sorted but
>>> does seem a bit complicated.
>>> I.e. if you have two domains, with ids 2 and 3, with no counters remaining on domain 2 but
>>> some on domain 3 then rdtgroup_assign_cntr_event() will fail early and the counter won't
>>> be assigned for domain 3 but the last_cmd_status will only be about domain 2. The user
>>> either needs to know a failure at one domain means all higher domains will not be
>>> considered for that counter or look at the new mbm_L3_assignments to understand what's happened.
>>> In this case we have:
>>>
>>> /sys/fs/resctrl# cat info/L3_MON/mbm_assign_on_mkdir
>>> 1
>>> /sys/fs/resctrl# cat info/L3_MON/available_mbm_cntrs
>>> 2=0;3=1
>>> /sys/fs/resctrl# mkdir mon_groups/new
>>> /sys/fs/resctrl# cat info/last_cmd_status
>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>> /sys/fs/resctrl# cat mon_groups/new/mbm_L3_assignments
>>> mbm_total_bytes:2=_;3=_
>>
>> Good point.
>>
>>>
>>> Would it be better for each domain to be considered even if a previous failure occurred or
>>> is this now a fixed behaviour? For illustration:
>>
>> I do not believe this needs to be fixed.
>>
>>>
>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>> index 62edb464410a..8e061bce9742 100644
>>> --- a/fs/resctrl/monitor.c
>>> +++ b/fs/resctrl/monitor.c
>>> @@ -1248,18 +1248,25 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>>> void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>> {
>>> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>>> + struct rdt_l3_mon_domain *d;
>>>
>>> if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
>>> !r->mon.mbm_assign_on_mkdir)
>>> return;
>>>
>>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>> - &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>>> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>>> + }
>>> + }
>>>
>>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>> - &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>>> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>> + }
>>> + }
>>> }
>>
>> With a solution like this last_cmd_status could potentially contain multiple lines, one
>> line per domain that failed. last_cmd_status is 512 bytes so if this is a system with
>> many domains there is a risk of overflow and user space not seeing all failures.
>> That may be ok?
>
> Probably but maybe we can do better. I wonder that given that we know we are trying to allocate
> counters in all domains the information in last_cmd_status could summarize which
> domains failed. 512 bytes isn't that large so a hex encoded bit map or something else information
> dense would be needed. This does make last_cmd_status a bit less human readable so may not
> be the way to go.
Indeed, at this point it becomes difficult to convey all the failures. I expect a bit more
from user space during such scenarios though. Specifically, a user space needing to work in
constrained environment would be better off disabling mbm_assign_on_mkdir and manage counters
itself or ensure there are enough counters available before creating a new monitor group.
What resctrl could do in such scenario is to at least convey that some messages were
dropped. Consider, for example:
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5da305bd36c9..ea77fa6a38f7 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -973,10 +973,13 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
mutex_lock(&rdtgroup_mutex);
len = seq_buf_used(&last_cmd_status);
- if (len)
+ if (len) {
seq_printf(seq, "%.*s", len, last_cmd_status_buf);
- else
+ if (seq_buf_has_overflowed(&last_cmd_status))
+ seq_puts(seq, "[truncated]\n");
+ } else {
seq_puts(seq, "ok\n");
+ }
mutex_unlock(&rdtgroup_mutex);
return 0;
}
>
>>
>> I think this can be simplified within rdt_assign_cntr_event() though. Consider:
>>
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 49f3f6b846b2..a6a791a15e30 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1209,12 +1209,13 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
>> * NULL; otherwise, assign the counter to the specified domain @d.
>> *
>> * If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
>> - * will fail. The assignment process will abort at the first failure encountered
>> - * during domain traversal, which may result in the event being only partially
>> - * assigned.
>> + * will fail. Ignore errors when attempting to assign a counter to all domains
>> + * since only some domains may have counters available and goal is to assign
>> + * counters where possible. Only caller providing @d of NULL is
>> + * rdtgroup_assign_cntrs() that ignores errors.
>> *
>> * Return:
>> - * 0 on success, < 0 on failure.
>> + * 0 on success when @d is specified, 0 always when @d is NULL, < 0 on failure.
>> */
>> static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
>> struct mon_evt *mevt)
>> @@ -1223,11 +1224,8 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>> int ret = 0;
>>
>> if (!d) {
>> - list_for_each_entry(d, &r->mon_domains, hdr.list) {
>> - ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
>> - if (ret)
>> - return ret;
>> - }
>> + list_for_each_entry(d, &r->mon_domains, hdr.list)
>> + rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
>> } else {
>> ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
>> }
>
> That does the job but is it simpler to delete rdtgroup_assign_cntr_event()
> and call rdtgroup_alloc_assign_cntr() directly from rdtgroup_assign_cntrs()
> and rdtgroup_modify_assign_state() rather than special casing it further?
Yes, I agree, that would be much simpler.
Reinette
^ permalink raw reply related [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-18 16:35 ` Reinette Chatre
@ 2026-03-18 17:10 ` Ben Horgan
2026-03-18 20:12 ` Reinette Chatre
0 siblings, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-18 17:10 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/18/26 16:35, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/18/26 4:59 AM, Ben Horgan wrote:
>> On 3/17/26 18:09, Reinette Chatre wrote:
>>> On 3/17/26 3:25 AM, Ben Horgan wrote:
>>>> On 3/16/26 18:18, Reinette Chatre wrote:
>>>>> On 3/16/26 10:44 AM, Ben Horgan wrote:
>>>>>> On 3/2/26 18:46, Reinette Chatre wrote:
>>>>> ...
>>>>>> One related issue I've just noticed is that when ABMC and mbm_assign_on_mkdir are
>>>>>> enabled the creation of MON/CTRL_MON directories may succeed but an error message
>>>>>> is written to last_cmd_status. E.g.
>>>>>>
>>>>>> /sys/fs/resctrl# mkdir mon_groups/new5
>>>>>> /sys/fs/resctrl# cat info/last_cmd_status
>>>>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>>>>>
>>>>>> The failure is ignored, as expected, in rdt_assign_cntrs() but the last_cmd_status
>>>>>> is never cleared. I think this could be fixed by:
>>>>>>
>>>>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>>>>> index 62edb464410a..396f17ed72c6 100644
>>>>>> --- a/fs/resctrl/monitor.c
>>>>>> +++ b/fs/resctrl/monitor.c
>>>>>> @@ -1260,6 +1260,8 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>>>>> if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>>>> rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>>>>> &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>>>>> +
>>>>>> + rdt_last_cmd_clear();
>>>>>> }
>>>>>>
>>>>>> Is this right thing to do? Let me know if you want a proper patch.
>>>>>
>>>>> Letting group be created without any counters assigned while writing the error
>>>>> to last_cmd_status is the intended behavior. If the last_cmd_status buffer is cleared
>>>>> at this point then user space will never have the opportunity to see the message that
>>>>> contains the details.
>>>>>
>>>>> It did not seem appropriate to let resource group creation fail when no counters
>>>>> are available. I see that the documentation is not clear on this. What do you think
>>>>> of an update to documentation instead? Would something like below help clarify behavior?
>>>>>
>>>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>>>> index ba609f8d4de5..20dc58d281cf 100644
>>>>> --- a/Documentation/filesystems/resctrl.rst
>>>>> +++ b/Documentation/filesystems/resctrl.rst
>>>>> @@ -478,6 +478,12 @@ with the following files:
>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
>>>>> 0
>>>>>
>>>>> + Automatic counter assignment is done with best effort. If auto assignment
>>>>> + is enabled but there are not enough available counters then monitor group
>>>>> + creation could succeed while one or more events belonging to the group may
>>>>> + not have a counter assigned. Consult last_cmd_status for details during
>>>>> + such scenario.
>>>>> +
>>>>
>>>> This does the improve the situation but the multiple domain failure behaviour depends
>>>> on the order the domains are iterated through. This is stable as the list is sorted but
>>>> does seem a bit complicated.
>>>> I.e. if you have two domains, with ids 2 and 3, with no counters remaining on domain 2 but
>>>> some on domain 3 then rdtgroup_assign_cntr_event() will fail early and the counter won't
>>>> be assigned for domain 3 but the last_cmd_status will only be about domain 2. The user
>>>> either needs to know a failure at one domain means all higher domains will not be
>>>> considered for that counter or look at the new mbm_L3_assignments to understand what's happened.
>>>> In this case we have:
>>>>
>>>> /sys/fs/resctrl# cat info/L3_MON/mbm_assign_on_mkdir
>>>> 1
>>>> /sys/fs/resctrl# cat info/L3_MON/available_mbm_cntrs
>>>> 2=0;3=1
>>>> /sys/fs/resctrl# mkdir mon_groups/new
>>>> /sys/fs/resctrl# cat info/last_cmd_status
>>>> Failed to allocate counter for mbm_total_bytes in domain 2
>>>> /sys/fs/resctrl# cat mon_groups/new/mbm_L3_assignments
>>>> mbm_total_bytes:2=_;3=_
>>>
>>> Good point.
>>>
>>>>
>>>> Would it be better for each domain to be considered even if a previous failure occurred or
>>>> is this now a fixed behaviour? For illustration:
>>>
>>> I do not believe this needs to be fixed.
>>>
>>>>
>>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>>> index 62edb464410a..8e061bce9742 100644
>>>> --- a/fs/resctrl/monitor.c
>>>> +++ b/fs/resctrl/monitor.c
>>>> @@ -1248,18 +1248,25 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>>>> void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
>>>> {
>>>> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>>>> + struct rdt_l3_mon_domain *d;
>>>>
>>>> if (!r->mon_capable || !resctrl_arch_mbm_cntr_assign_enabled(r) ||
>>>> !r->mon.mbm_assign_on_mkdir)
>>>> return;
>>>>
>>>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>>>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>>> - &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>>>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
>>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>>>> + &mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID]);
>>>> + }
>>>> + }
>>>>
>>>> - if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>> - rdtgroup_assign_cntr_event(NULL, rdtgrp,
>>>> - &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
>>>> + list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>>> + rdtgroup_assign_cntr_event(d, rdtgrp,
>>>> + &mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID]);
>>>> + }
>>>> + }
>>>> }
>>>
>>> With a solution like this last_cmd_status could potentially contain multiple lines, one
>>> line per domain that failed. last_cmd_status is 512 bytes so if this is a system with
>>> many domains there is a risk of overflow and user space not seeing all failures.
>>> That may be ok?
>>
>> Probably but maybe we can do better. I wonder that given that we know we are trying to allocate
>> counters in all domains the information in last_cmd_status could summarize which
>> domains failed. 512 bytes isn't that large so a hex encoded bit map or something else information
>> dense would be needed. This does make last_cmd_status a bit less human readable so may not
>> be the way to go.
>
> Indeed, at this point it becomes difficult to convey all the failures. I expect a bit more
> from user space during such scenarios though. Specifically, a user space needing to work in
> constrained environment would be better off disabling mbm_assign_on_mkdir and manage counters
> itself or ensure there are enough counters available before creating a new monitor group.
>
> What resctrl could do in such scenario is to at least convey that some messages were
> dropped. Consider, for example:
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 5da305bd36c9..ea77fa6a38f7 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -973,10 +973,13 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
>
> mutex_lock(&rdtgroup_mutex);
> len = seq_buf_used(&last_cmd_status);
> - if (len)
> + if (len) {
> seq_printf(seq, "%.*s", len, last_cmd_status_buf);
> - else
> + if (seq_buf_has_overflowed(&last_cmd_status))
> + seq_puts(seq, "[truncated]\n");
> + } else {
> seq_puts(seq, "ok\n");
> + }
> mutex_unlock(&rdtgroup_mutex);
> return 0;
> }
Adding a truncation indication makes sense to me. Would it be good to reserve space in the
last_cmd_status_buf[] to ensure this can always be displayed? It looks like space could be
made by interacting with seq->size directly but I'm not sure if there is a cleaner way
to do it.
Thanks,
Ben
>
>
>>
>>>
>>> I think this can be simplified within rdt_assign_cntr_event() though. Consider:
>>>
>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>> index 49f3f6b846b2..a6a791a15e30 100644
>>> --- a/fs/resctrl/monitor.c
>>> +++ b/fs/resctrl/monitor.c
>>> @@ -1209,12 +1209,13 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_
>>> * NULL; otherwise, assign the counter to the specified domain @d.
>>> *
>>> * If all counters in a domain are already in use, rdtgroup_alloc_assign_cntr()
>>> - * will fail. The assignment process will abort at the first failure encountered
>>> - * during domain traversal, which may result in the event being only partially
>>> - * assigned.
>>> + * will fail. Ignore errors when attempting to assign a counter to all domains
>>> + * since only some domains may have counters available and goal is to assign
>>> + * counters where possible. Only caller providing @d of NULL is
>>> + * rdtgroup_assign_cntrs() that ignores errors.
>>> *
>>> * Return:
>>> - * 0 on success, < 0 on failure.
>>> + * 0 on success when @d is specified, 0 always when @d is NULL, < 0 on failure.
>>> */
>>> static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
>>> struct mon_evt *mevt)
>>> @@ -1223,11 +1224,8 @@ static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgro
>>> int ret = 0;
>>>
>>> if (!d) {
>>> - list_for_each_entry(d, &r->mon_domains, hdr.list) {
>>> - ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
>>> - if (ret)
>>> - return ret;
>>> - }
>>> + list_for_each_entry(d, &r->mon_domains, hdr.list)
>>> + rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
>>> } else {
>>> ret = rdtgroup_alloc_assign_cntr(r, d, rdtgrp, mevt);
>>> }
>>
>> That does the job but is it simpler to delete rdtgroup_assign_cntr_event()
>> and call rdtgroup_alloc_assign_cntr() directly from rdtgroup_assign_cntrs()
>> and rdtgroup_modify_assign_state() rather than special casing it further?
>
> Yes, I agree, that would be much simpler.
>
> Reinette
>
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-18 17:10 ` Ben Horgan
@ 2026-03-18 20:12 ` Reinette Chatre
2026-03-19 9:53 ` Ben Horgan
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-18 20:12 UTC (permalink / raw)
To: Ben Horgan, tony.luck, james.morse, Dave.Martin, babu.moger, bp,
tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben,
On 3/18/26 10:10 AM, Ben Horgan wrote:
> On 3/18/26 16:35, Reinette Chatre wrote:
>> What resctrl could do in such scenario is to at least convey that some messages were
>> dropped. Consider, for example:
>>
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 5da305bd36c9..ea77fa6a38f7 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -973,10 +973,13 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
>>
>> mutex_lock(&rdtgroup_mutex);
>> len = seq_buf_used(&last_cmd_status);
>> - if (len)
>> + if (len) {
>> seq_printf(seq, "%.*s", len, last_cmd_status_buf);
>> - else
>> + if (seq_buf_has_overflowed(&last_cmd_status))
>> + seq_puts(seq, "[truncated]\n");
>> + } else {
>> seq_puts(seq, "ok\n");
>> + }
>> mutex_unlock(&rdtgroup_mutex);
>> return 0;
>> }
>
> Adding a truncation indication makes sense to me. Would it be good to reserve space in the
> last_cmd_status_buf[] to ensure this can always be displayed? It looks like space could be
> made by interacting with seq->size directly but I'm not sure if there is a cleaner way
> to do it.
>
Please note the distinction between the struct seq_file instance pointed to by seq and the
struct seq_buf instance last_cmd_status. The last_cmd_status seq_buf instance is backed by
last_cmd_status_buf of 512 bytes which is printed to the seq_file instance seq that is
backed by another buffer that starts out with size PAGE_SIZE. So, it looks to me
like printing last_cmd_status_buf to the seq seq_file instance followed by "[truncated]\n"
should fit by default? This should keep working even if last_cmd_status_buf size is
increased since seq_read_iter() that calls this show() would just keep increasing the
buffer backing the seq_file (up to a very large limit of MAX_RW_COUNT which is INT_MAX & PAGE_MASK)
until it does fit.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-18 20:12 ` Reinette Chatre
@ 2026-03-19 9:53 ` Ben Horgan
2026-03-19 16:18 ` Reinette Chatre
0 siblings, 1 reply; 37+ messages in thread
From: Ben Horgan @ 2026-03-19 9:53 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/18/26 20:12, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/18/26 10:10 AM, Ben Horgan wrote:
>> On 3/18/26 16:35, Reinette Chatre wrote:
>
>
>>> What resctrl could do in such scenario is to at least convey that some messages were
>>> dropped. Consider, for example:
>>>
>>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>>> index 5da305bd36c9..ea77fa6a38f7 100644
>>> --- a/fs/resctrl/rdtgroup.c
>>> +++ b/fs/resctrl/rdtgroup.c
>>> @@ -973,10 +973,13 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
>>>
>>> mutex_lock(&rdtgroup_mutex);
>>> len = seq_buf_used(&last_cmd_status);
>>> - if (len)
>>> + if (len) {
>>> seq_printf(seq, "%.*s", len, last_cmd_status_buf);
>>> - else
>>> + if (seq_buf_has_overflowed(&last_cmd_status))
>>> + seq_puts(seq, "[truncated]\n");
>>> + } else {
>>> seq_puts(seq, "ok\n");
>>> + }
>>> mutex_unlock(&rdtgroup_mutex);
>>> return 0;
>>> }
>>
>> Adding a truncation indication makes sense to me. Would it be good to reserve space in the
>> last_cmd_status_buf[] to ensure this can always be displayed? It looks like space could be
>> made by interacting with seq->size directly but I'm not sure if there is a cleaner way
>> to do it.
>>
>
> Please note the distinction between the struct seq_file instance pointed to by seq and the
> struct seq_buf instance last_cmd_status. The last_cmd_status seq_buf instance is backed by
> last_cmd_status_buf of 512 bytes which is printed to the seq_file instance seq that is
> backed by another buffer that starts out with size PAGE_SIZE. So, it looks to me
> like printing last_cmd_status_buf to the seq seq_file instance followed by "[truncated]\n"
> should fit by default? This should keep working even if last_cmd_status_buf size is
Thanks for the explanation. I had missed that distinction. I've just given your code a go with
some hacked in rdt_last_cmd_puts() calls and it behaves as you say.
We've discussed two changes, one is adding a truncation message to last_cmd_status and the other
is carrying on after failure when allocating counters. Are you going to take these from here or would
you like patches from me?
Thanks,
Ben
> increased since seq_read_iter() that calls this show() would just keep increasing the
> buffer backing the seq_file (up to a very large limit of MAX_RW_COUNT which is INT_MAX & PAGE_MASK)
> until it does fit.
>
> Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-19 9:53 ` Ben Horgan
@ 2026-03-19 16:18 ` Reinette Chatre
2026-03-19 17:18 ` Ben Horgan
0 siblings, 1 reply; 37+ messages in thread
From: Reinette Chatre @ 2026-03-19 16:18 UTC (permalink / raw)
To: Ben Horgan, tony.luck, james.morse, Dave.Martin, babu.moger, bp,
tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Ben,
On 3/19/26 2:53 AM, Ben Horgan wrote:
> Hi Reinette,
>
> On 3/18/26 20:12, Reinette Chatre wrote:
>> Hi Ben,
>>
>> On 3/18/26 10:10 AM, Ben Horgan wrote:
>>> On 3/18/26 16:35, Reinette Chatre wrote:
>>
>>
>>>> What resctrl could do in such scenario is to at least convey that some messages were
>>>> dropped. Consider, for example:
>>>>
>>>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>>>> index 5da305bd36c9..ea77fa6a38f7 100644
>>>> --- a/fs/resctrl/rdtgroup.c
>>>> +++ b/fs/resctrl/rdtgroup.c
>>>> @@ -973,10 +973,13 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
>>>>
>>>> mutex_lock(&rdtgroup_mutex);
>>>> len = seq_buf_used(&last_cmd_status);
>>>> - if (len)
>>>> + if (len) {
>>>> seq_printf(seq, "%.*s", len, last_cmd_status_buf);
>>>> - else
>>>> + if (seq_buf_has_overflowed(&last_cmd_status))
>>>> + seq_puts(seq, "[truncated]\n");
>>>> + } else {
>>>> seq_puts(seq, "ok\n");
>>>> + }
>>>> mutex_unlock(&rdtgroup_mutex);
>>>> return 0;
>>>> }
>>>
>>> Adding a truncation indication makes sense to me. Would it be good to reserve space in the
>>> last_cmd_status_buf[] to ensure this can always be displayed? It looks like space could be
>>> made by interacting with seq->size directly but I'm not sure if there is a cleaner way
>>> to do it.
>>>
>>
>> Please note the distinction between the struct seq_file instance pointed to by seq and the
>> struct seq_buf instance last_cmd_status. The last_cmd_status seq_buf instance is backed by
>> last_cmd_status_buf of 512 bytes which is printed to the seq_file instance seq that is
>> backed by another buffer that starts out with size PAGE_SIZE. So, it looks to me
>> like printing last_cmd_status_buf to the seq seq_file instance followed by "[truncated]\n"
>> should fit by default? This should keep working even if last_cmd_status_buf size is
>
> Thanks for the explanation. I had missed that distinction. I've just given your code a go with
> some hacked in rdt_last_cmd_puts() calls and it behaves as you say.
Thank you very much for trying it out.
>
> We've discussed two changes, one is adding a truncation message to last_cmd_status and the other
> is carrying on after failure when allocating counters. Are you going to take these from here or would
> you like patches from me?
Adding the truncation message to last_cmd_status seems to complement the other last_cmd_status
improvements in this series. I'd be happy to create the patch and include it as part of this
series.
I would like to confirm behavior when encountering error when allocating counters though:
My previous assessment of rdtgroup_assign_cntr_event() callers was incomplete since it is also
called with NULL domain when the user uses the "*=<assignment state>" syntax to mbm_L3_assignments.
This is a different scenario from directory creation since it passes the error back to
user space. While here it may also be ok to carry on after a counter could not be allocated in
one domain I do think that user space should still learn via error return that not all allocations
succeeded.
Even so, to answer your original question, please create the patches that change the allocation
behavior. This seems to complement the work you are currently doing and would be easy for you to test
(I do not have access to assignable mode hardware).
Thank you.
Reinette
^ permalink raw reply [flat|nested] 37+ messages in thread* Re: [PATCH 00/11] x86,fs/resctrl: Improve resctrl quality and consistency
2026-03-19 16:18 ` Reinette Chatre
@ 2026-03-19 17:18 ` Ben Horgan
0 siblings, 0 replies; 37+ messages in thread
From: Ben Horgan @ 2026-03-19 17:18 UTC (permalink / raw)
To: Reinette Chatre, tony.luck, james.morse, Dave.Martin, babu.moger,
bp, tglx, dave.hansen
Cc: x86, hpa, fustini, fenghuay, peternewman, linux-kernel, patches
Hi Reinette,
On 3/19/26 16:18, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/19/26 2:53 AM, Ben Horgan wrote:
>> Hi Reinette,
>>
>> On 3/18/26 20:12, Reinette Chatre wrote:
>>> Hi Ben,
>>>
>>> On 3/18/26 10:10 AM, Ben Horgan wrote:
>>>> On 3/18/26 16:35, Reinette Chatre wrote:
[...]
>>
>> We've discussed two changes, one is adding a truncation message to last_cmd_status and the other
>> is carrying on after failure when allocating counters. Are you going to take these from here or would
>> you like patches from me?
>
> Adding the truncation message to last_cmd_status seems to complement the other last_cmd_status
> improvements in this series. I'd be happy to create the patch and include it as part of this
> series.
>
> I would like to confirm behavior when encountering error when allocating counters though:
> My previous assessment of rdtgroup_assign_cntr_event() callers was incomplete since it is also
> called with NULL domain when the user uses the "*=<assignment state>" syntax to mbm_L3_assignments.
> This is a different scenario from directory creation since it passes the error back to
> user space. While here it may also be ok to carry on after a counter could not be allocated in
> one domain I do think that user space should still learn via error return that not all allocations
> succeeded.
Thanks for pointing this out, I'd missed it too.
>
> Even so, to answer your original question, please create the patches that change the allocation
> behavior. This seems to complement the work you are currently doing and would be easy for you to test
> (I do not have access to assignable mode hardware).
Ok, will do.
Thanks,
Ben
>
> Thank you.
>
> Reinette
>
^ permalink raw reply [flat|nested] 37+ messages in thread