* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-05-29 18:06 [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept Reinette Chatre
@ 2026-06-02 20:23 ` Babu Moger
2026-06-02 22:56 ` Reinette Chatre
2026-06-02 23:32 ` Chen, Yu C
` (3 subsequent siblings)
4 siblings, 1 reply; 30+ messages in thread
From: Babu Moger @ 2026-06-02 20:23 UTC
To: Reinette Chatre, Tony Luck, Ben Horgan, James Morse, Dave Martin,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
For some reason, I couldn’t find your patch on lore.kernel.org:
https://lore.kernel.org/lkml/?q=Reinette+Chatre
I eventually located it here:
https://sashiko.dev/#/message/aab804b9-e8b5-40ad-a85b-af7033391243%40intel.com
Thanks for sharing the patches. I’m still reviewing them.
I was able to build and boot the kernel and can see the MIN and MAX
controls after moving your test code to __rdt_get_mem_config_amd().
On 5/29/26 13:06, Reinette Chatre wrote:
> Hi Everybody,
>
> It has been a while since we discussed the resctrl changes required to support
> hardware that has controls with fine granularity or hardware that has multiple
> controls per resource. For reference, the most recent email discussion can
> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>
> I created a PoC that I believe supports what folks have agreed to so far. I
> hope this can help us to restart the discussion with the goal that resctrl gains
> support for upcoming hardware that require these features.
>
> Request regarding this PoC
> ==========================
>
> Please consider this PoC as a "direction check" on the schema description and multiple
> control discussions held thus far.
>
> Could folks working on enabling new hardware requiring this capability please consider
> if this is something you can build on and how it should be improved to support these
> upcoming capabilities?
>
> Opens
> =====
>
> While the PoC aims to support what folks agreed on some opens remain:
> - I attempted to make some MPAM supporting changes but these are all just compile
> tested. While MPAM should benefit from the new control properties I did not
> initialize them on MPAM and did not attempt refactor to separate out
> the architecture specific control properties (more on what this means later).
> I did attempt some MPAM refactoring that duplicates the MPAM domain to the
> control domain and monitoring domain lists in support of there being multiple
> controls each with its own list of control domains but it is definitely not good
> design.
> - No support for emulated controls (yet). The PoC is quite large already
> but I think it can be used as a base for emulated controls for which the software
> controller could be a potential first customer. In this PoC mounting with
> software controller will still display the original controller's properties.
> - One open that needs to be addressed as part of support for emulated controls is
> how best to display emulation relationship via resctrl hierarchy.
> - No support for "read-modify-write" usage of schemata file. This is where we
> discussed (without agreement) on possibly introducing the "#" prefix to schemata
> file entries. This PoC does not support this prefix and the current assumption/expectation
> is that when user space changes a configuration only the new control values are
> written to schemata file. I thus do not have a plan to support this so please
> share opinions in this regard if you have some.
> - Controls are independent for now. This means that, for example, if a resource
> supports a "MIN" and "MAX" control then this implementation would allow user to
> set the "maximum" control values to be less than the "minimum" control values.
> - PoC supports the "bitmap" control but does not (yet) expose properties of a bitmap
> control to the new info/<resource>/resource_schemata directory.
>
> Accessing PoC
> =============
>
> Please consider the PoC as a rough draft. It has only been compile tested for Arm
> and known to be incomplete in Arm support. To help with experimenting I only
> fully adapted the Intel MBA resource to demo two dummy additional MBA controls.
> All architectures should immediately benefit from the new schema descriptions
> and new info/MB/resource_schemata hierarchy.
>
> I considered the patches themselves too many for email. Instead, the PoC can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/reinette/linux.git branch resctrl/controls_rfc_v1
>
> The work is based on v7.1-rc2 and also includes the following series (two of which have
> since been queued):
>
> "selftests/resctrl: Fixes and improvements focused on Intel platforms"
> https://lore.kernel.org/lkml/cover.1775266384.git.reinette.chatre@intel.com/
>
> "x86,fs/resctrl: Improve resctrl quality and consistency"
> https://lore.kernel.org/lkml/cover.1777419024.git.reinette.chatre@intel.com/
>
> "x86,fs/resctrl: Pave the way for MPAM counter assignment"
> https://lore.kernel.org/lkml/20260506082855.3694761-1-ben.horgan@arm.com/
>
>
> Primary resctrl fs data structure changes
> =========================================
>
> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
> the changes easier to follow I kept some of the original names to help communicate
> where familiar data structures land.
>
> What to notice about a control is that it has some common properties required
> from all controls (scope, type, etc.) and then depending on the type of control
> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>
> /**
> * struct resctrl_ctrl - A resource control
> * @entry: List entry of rdt_resource::controls
> * @scope: Scope of the resource that this control allocates
> * @domains: RCU list of all control domains
> * @type: The control type that determines the properties of the control,
> * format string for displaying control values to user space, and
> * parser of control values provided by user space.
> * @name: Name of the control. Appended to final resource name
> * (rdt_resource_final::name) to create final schema entry.
> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
> * For example, with resource name "MB" and control name "MAX" the
> * schema entry will be "MB_MAX".
> * @cache: Cache allocation control properties.
> * @membw: Bandwidth control properties.
> */
> struct resctrl_ctrl {
> struct list_head entry;
> enum resctrl_scope scope;
> struct list_head domains;
> enum resctrl_ctrl_type type;
> enum resctrl_ctrl_name name;
> union {
> struct resctrl_cache cache;
> struct resctrl_membw membw;
> };
> };
>
> Two members summarize how this new structure fits into the rest of resctrl:
> a) resctrl_ctrl::entry
> Since a resource can support multiple controls there is a new list
> in struct rdt_resource named "controls" that contains the list of all
> controls supported by the resource.
> b) resctrl_ctrl::domains
> Instead of the list of control domains belonging to a resource they
> now belong to the control itself. By doing so resctrl can support resource
> controls at different scope for the same resource. This is intended to
> support some upcoming MPAM and RISC-V usages.
>
I like the idea of supporting multiple controls for each resource.
With these patches we now have one list containing all the controls.
However, in the case of RDT_RESOURCE_L3 we have two lists, "mon_domains" and
"controls". The mon_domains list deals with monitoring and the controls list
deals with the (multiple) controls.
Have you thought about making the "controls" list generic so that an entry
can also describe monitoring? There would then be just one list containing
multiple controls or monitors.
> Example architectural data structure changes
> ============================================
>
> An architecture can use the new control by following a similar pattern to
> resource and domain use by architectures. Consider the following for x86
> where a new architecture specific struct resctrl_hw_ctrl includes
> struct resctrl_ctrl and any architecture private data needed to support
> the control:
>
> /*
> * struct resctrl_hw_ctrl - Arch private properties of a resource control
> * @r_ctrl: Control properties exposed to resctrl file system
> * @msr_base: Base MSR address where control values should be programmed
> * @msr_update: Function pointer to update control values
> */
> struct resctrl_hw_ctrl {
> struct resctrl_ctrl r_ctrl;
> unsigned int msr_base;
> void (*msr_update)(struct msr_param *m);
> };
>
> Structure of patch series
> =========================
>
> As a PoC the series is not perfectly structured but to help navigate this work
> on a high level the changes can be categorized as follows:
>
> Patch 1 to 11:
> With a vision of what a "control" is, remove unused/unnecessary
> members, make clear what is a *resource* property vs a *control*
> property, do some renaming to help with the PoC.
>
> Patch 12:
> Introduce struct resctrl_ctrl and re-arrange existing struct rdt_resource
> members to form part of new rdt_resource::ctrl
>
> Patch 13 to 44:
> A lot of wrangling to introduce struct resctrl_ctrl to all code that needs
> to work with a control and/or domain without assuming that the control is
> the one and only control embedded in the resource it belongs to. Essentially,
> a lot of changes passing the control around in addition to the resource/domain.
>
> Patch 45:
> Switch the single struct resctrl_ctrl member of struct rdt_resource to be
> a list of struct resctrl_ctrl.
>
> Patch 47 to 49:
> Introduce new info/<resource>/resource_schemata hierarchy to first only
> consist of properties already known to resctrl fs.
>
> Patch 50 to 52:
> Introduce the new control properties per [1], initialize them for x86,
> and expose them via info/<resource>/resource_schemata
>
> Patch 53:
> Let the new struct resctrl_hw_ctrl contain architecture's control properties.
>
> Patch 54:
> Teach resctrl fs about "MIN" and "MAX" controls.
>
> Patch 55:
> Sample of "MIN" and "MAX" memory bandwidth controls for x86.
My assumption is that the MIN and MAX controls here are just examples,
correct?
You only mentioned patch 55 as "NOT_FOR_INCLUSION". I assume patch 54
should also be marked as "NOT_FOR_INCLUSION"?
Thanks
Babu
>
> Example interactions
> ====================
>
> This series can be used on an x86 system where it will show two new dummy controls
> where it is possible to interact with the new controls.
> For example:
>
> # cat schemata
> MB_MAX:0=100;1=100
> MB_MIN:0=100;1=100
> MB:0=100;1=100
> L3:0=fff;1=fff
> # echo 'MB_MIN:0=50' > schemata
> # cat schemata
> MB_MAX:0=100;1=100
> MB_MIN:0=50;1=100
> MB:0=100;1=100
> L3:0=fff;1=fff
>
> Writing to the dummy control will call a dummy callback that just prints to the
> kernel log:
> "resctrl: Updata temporary MIN control on domain 0 with user value 50"
>
>
> Example output of info/MB/:
> /sys/fs/resctrl/info/MB/thread_throttle_mode:max
> /sys/fs/resctrl/info/MB/num_closids:15
> /sys/fs/resctrl/info/MB/delay_linear:1
> /sys/fs/resctrl/info/MB/min_bandwidth:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB/resolution:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB/tolerance:5
> /sys/fs/resctrl/info/MB/resource_schemata/MB/type:scalar
> /sys/fs/resctrl/info/MB/resource_schemata/MB/min:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB/scale:1
> /sys/fs/resctrl/info/MB/resource_schemata/MB/scope:L3
> /sys/fs/resctrl/info/MB/resource_schemata/MB/unit:all
> /sys/fs/resctrl/info/MB/resource_schemata/MB/max:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/resolution:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/tolerance:5
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/type:scalar
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/min:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/scale:1
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/scope:L3
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/unit:all
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/max:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/resolution:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/tolerance:5
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/type:scalar
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/min:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/scale:1
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/scope:L3
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/unit:all
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/max:100
> /sys/fs/resctrl/info/MB/bandwidth_gran:10
>
> Any feedback is appreciated.
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/aPtfMFfLV1l%2FRB0L@e133380.arm.com/
> [2] https://lpc.events/event/19/contributions/2093/attachments/1958/4172/resctrl%20Microconference%20LPC%202025%20Tokyo.pdf
>
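To make the naming rule and the schemata interaction quoted above concrete, here is a small userspace sketch in C. It is not code from the PoC: schema_name() and schema_value() are hypothetical helpers invented for illustration, and the parser only handles decimal (scalar) values such as the MB_MIN example, not bitmap values like "fff".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the schema entry name as
 * "<resource>_<control>", or plain "<resource>" when ctrl is NULL.
 */
static int schema_name(char *buf, size_t len, const char *res, const char *ctrl)
{
	if (ctrl)
		return snprintf(buf, len, "%s_%s", res, ctrl);
	return snprintf(buf, len, "%s", res);
}

/*
 * Hypothetical helper: look up the value for domain "dom" in a line
 * such as "MB_MIN:0=50;1=100". Returns -1 if the line does not belong
 * to schema "name", is malformed, or has no entry for that domain.
 */
static long schema_value(const char *line, const char *name, int dom)
{
	size_t n = strlen(name);
	const char *p;

	/* The schema name must match exactly and be followed by ':' */
	if (strncmp(line, name, n) != 0 || line[n] != ':')
		return -1;

	p = line + n + 1;
	while (*p) {
		char *end;
		long id, val;

		id = strtol(p, &end, 10);
		if (end == p || *end != '=')
			return -1;
		p = end + 1;

		val = strtol(p, &end, 10);
		if (end == p)
			return -1;
		if (id == dom)
			return val;

		/* Entries are separated by ';' */
		if (*end != ';')
			break;
		p = end + 1;
	}
	return -1;
}
```

With these helpers, schema_name(buf, sizeof(buf), "MB", "MAX") fills buf with "MB_MAX", and schema_value("MB_MIN:0=50;1=100", "MB_MIN", 0) gives 50. Matching on the full name followed by ':' keeps "MB" from matching an "MB_MIN" line.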
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-02 20:23 ` Babu Moger
@ 2026-06-02 22:56 ` Reinette Chatre
2026-06-03 1:14 ` Moger, Babu
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-02 22:56 UTC
To: Babu Moger, Tony Luck, Ben Horgan, James Morse, Dave Martin,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Babu,
On 6/2/26 1:23 PM, Babu Moger wrote:
> Hi Reinette,
>
> For some reason, I couldn’t find your patch on lore.kernel.org:
> https://lore.kernel.org/lkml/?q=Reinette+Chatre
How about:
https://lore.kernel.org/lkml/aab804b9-e8b5-40ad-a85b-af7033391243@intel.com/
>
> I eventually located it here:
> https://sashiko.dev/#/message/aab804b9-e8b5-40ad-a85b-af7033391243%40intel.com
>
> Thanks for sharing the patches. I’m still reviewing them.
>
> I was able to build and boot the kernel and can see the MIN and MAX controls after moving your test code to __rdt_get_mem_config_amd().
Thank you very much for trying it out.
> On 5/29/26 13:06, Reinette Chatre wrote:
...
>> Primary resctrl fs data structure changes
>> =========================================
>>
>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>> the changes easier to follow I kept some of the original names to help communicate
>> where familiar data structures land.
>>
>> What to notice about a control is that it has some common properties required
>> from all controls (scope, type, etc.) and then depending on the type of control
>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>
>> /**
>> * struct resctrl_ctrl - A resource control
>> * @entry: List entry of rdt_resource::controls
>> * @scope: Scope of the resource that this control allocates
>> * @domains: RCU list of all control domains
>> * @type: The control type that determines the properties of the control,
>> * format string for displaying control values to user space, and
>> * parser of control values provided by user space.
>> * @name: Name of the control. Appended to final resource name
>> * (rdt_resource_final::name) to create final schema entry.
>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>> * For example, with resource name "MB" and control name "MAX" the
>> * schema entry will be "MB_MAX".
>> * @cache: Cache allocation control properties.
>> * @membw: Bandwidth control properties.
>> */
>> struct resctrl_ctrl {
>> struct list_head entry;
>> enum resctrl_scope scope;
>> struct list_head domains;
>> enum resctrl_ctrl_type type;
>> enum resctrl_ctrl_name name;
>> union {
>> struct resctrl_cache cache;
>> struct resctrl_membw membw;
>> };
>> };
>>
>> Two members summarize how this new structure fits into the rest of resctrl:
>> a) resctrl_ctrl::entry
>> Since a resource can support multiple controls there is a new list
>> in struct rdt_resource named "controls" that contains the list of all
>> controls supported by the resource.
>> b) resctrl_ctrl::domains
>> Instead of the list of control domains belonging to a resource they
>> now belong to the control itself. By doing so resctrl can support resource
>> controls at different scope for the same resource. This is intended to
>> support some upcoming MPAM and RISC-V usages.
>>
>
> I like the idea of supporting multiple controls for each resource.
>
> With these patches we now have one list containing all the controls.
>
> However, in the case of RDT_RESOURCE_L3 we have two lists, "mon_domains" and "controls". The mon_domains list deals with monitoring and the controls list deals with the (multiple) controls.
>
> Have you thought about making the "controls" list generic so that an entry can also describe monitoring? There would then be just one list containing multiple controls or monitors.
The control list adds an additional layer of abstraction just for control management,
independent from monitoring. The mon_domains list is unchanged while each control now
has its own ctrl_domains list.
Here is an attempt to visualize how a resource with two monitoring domains and two controls,
each with two control domains, ends up being managed:
+-------------------------+
| struct rdt_resource     |
+-------------------------+
| ...                     |
| controls (list_head)    |---------+
| mon_domains (list_head) |---+     |
| ...                     |   |     |
+-------------------------+   |     |
                              |     |
  +---------------------------+     |
  |                                 |
  v                                 v
+-----------------------------+   +-------------------------+
| struct rdt_l3_mon_domain #1 |   | struct resctrl_ctrl #A  |
+-----------------------------+   +-------------------------+
| rdt_domain_hdr              |-+ | entry (list_head)       |       +----------------------------+
| ...                         | | | domains (list_head)     |------>| struct rdt_ctrl_domain #A1 |
+-----------------------------+ | | ...                     |       +----------------------------+
                                | +-------------------------+       | rdt_domain_hdr             |---+
  +-----------------------------+   |                               | ...                        |   |
  | (next)                          | (next)                        +----------------------------+   |
  v                                 v                                                                |
+-----------------------------+   +-------------------------+       +----------------------------+<--+
| struct rdt_l3_mon_domain #2 |   | struct resctrl_ctrl #B  |       | struct rdt_ctrl_domain #A2 |
+-----------------------------+   +-------------------------+       +----------------------------+
| rdt_domain_hdr              |   | entry (list_head)       |       | rdt_domain_hdr             |
| ...                         |   | domains (list_head)     |----+  | ...                        |
+-----------------------------+   | ...                     |    |  +----------------------------+
                                  +-------------------------+    |
                                                                 |
                                    +----------------------------+
                                    |
                                    v
                                  +----------------------------+
                                  | struct rdt_ctrl_domain #B1 |
                                  +----------------------------+
                                  | rdt_domain_hdr             |---+
                                  | ...                        |   |
                                  +----------------------------+   |
                                                                   |
                                  +----------------------------+<--+
                                  | struct rdt_ctrl_domain #B2 |
                                  +----------------------------+
                                  | rdt_domain_hdr             |
                                  | ...                        |
                                  +----------------------------+
resctrl used to manage single domains that were capable of both monitoring and control,
but this was split to support the scenario where monitoring and control of a resource are
done at different scopes. See
cd84f72b6a5c ("x86/resctrl: Prepare for different scope for control/monitor operations")
This feature further expands the difference between monitoring and control since now there
can be multiple instances of a control domain (one per control) associated with a resource
while monitoring still just supports one monitoring domain per resource.
I thus cannot see how this can be accomplished with a single list. Could you sketch out
what you have in mind?
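For reference, the layout in the diagram above can be approximated in plain C as below. This is only a userspace sketch under stated assumptions, not code from the PoC: struct list_head, list_init(), list_add_tail() and container_of() are minimal stand-ins for the kernel helpers, and struct resource, struct ctrl and struct ctrl_domain are simplified, hypothetical versions of rdt_resource, resctrl_ctrl and rdt_ctrl_domain.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical versions of the structures in the diagram. */
struct ctrl_domain {
	struct list_head hdr;		/* links domains of one control */
	int id;
};

struct ctrl {
	struct list_head entry;		/* links controls of one resource */
	struct list_head domains;	/* this control's own domain list */
	const char *name;
};

struct resource {
	struct list_head controls;	/* list of struct ctrl */
	struct list_head mon_domains;	/* unchanged: flat list of domains */
};

/* Walk every control of a resource, then every domain of that control. */
static int count_ctrl_domains(struct resource *r)
{
	struct list_head *c, *d;
	int n = 0;

	for (c = r->controls.next; c != &r->controls; c = c->next) {
		struct ctrl *ctrl = container_of(c, struct ctrl, entry);

		for (d = ctrl->domains.next; d != &ctrl->domains; d = d->next)
			n++;
	}
	return n;
}
```

The two nested loops show the extra level on the control side: reaching a control domain always goes through its control, while mon_domains stays a flat list hanging directly off the resource.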
...
>>
>> Patch 54:
>> Teach resctrl fs about "MIN" and "MAX" controls.
>>
>> Patch 55:
>> Sample of "MIN" and "MAX" memory bandwidth controls for x86.
>
> My assumption is that the MIN and MAX controls here are just examples, correct?
>
> You only mentioned patch 55 as "NOT_FOR_INCLUSION". I assume patch 54 should also be marked as "NOT_FOR_INCLUSION"?
While patch 55 is the first and only "user" of patch 54 I believe that patch 54 (or some variant
of it) will stay since we already know that both MPAM and Intel need to support min and max controls.
Reinette
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-02 22:56 ` Reinette Chatre
@ 2026-06-03 1:14 ` Moger, Babu
2026-06-03 3:55 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Moger, Babu @ 2026-06-03 1:14 UTC
To: Reinette Chatre, Babu Moger, Tony Luck, Ben Horgan, James Morse,
Dave Martin, Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/2/2026 5:56 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/2/26 1:23 PM, Babu Moger wrote:
>> Hi Reinette,
>>
>> For some reason, I couldn’t find your patch on lore.kernel.org:
>> https://lore.kernel.org/lkml/?q=Reinette+Chatre
>
> How about:
> https://lore.kernel.org/lkml/aab804b9-e8b5-40ad-a85b-af7033391243@intel.com/
Yes, that works. I usually find patches by searching on the author's name, but
that did not work this time.
>
>>
>> I eventually located it here:
>> https://sashiko.dev/#/message/aab804b9-e8b5-40ad-a85b-af7033391243%40intel.com
>>
>> Thanks for sharing the patches. I’m still reviewing them.
>>
>> I was able to build and boot the kernel and can see the MIN and MAX controls after moving your test code to __rdt_get_mem_config_amd().
>
> Thank you very much for trying it out.
>
>> On 5/29/26 13:06, Reinette Chatre wrote:
>
> ...
>
>>> Primary resctrl fs data structure changes
>>> =========================================
>>>
>>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>>> the changes easier to follow I kept some of the original names to help communicate
>>> where familiar data structures land.
>>>
>>> What to notice about a control is that it has some common properties required
>>> from all controls (scope, type, etc.) and then depending on the type of control
>>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>>
>>> /**
>>> * struct resctrl_ctrl - A resource control
>>> * @entry: List entry of rdt_resource::controls
>>> * @scope: Scope of the resource that this control allocates
>>> * @domains: RCU list of all control domains
>>> * @type: The control type that determines the properties of the control,
>>> * format string for displaying control values to user space, and
>>> * parser of control values provided by user space.
>>> * @name: Name of the control. Appended to final resource name
>>> * (rdt_resource_final::name) to create final schema entry.
>>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>>> * For example, with resource name "MB" and control name "MAX" the
>>> * schema entry will be "MB_MAX".
>>> * @cache: Cache allocation control properties.
>>> * @membw: Bandwidth control properties.
>>> */
>>> struct resctrl_ctrl {
>>> struct list_head entry;
>>> enum resctrl_scope scope;
>>> struct list_head domains;
>>> enum resctrl_ctrl_type type;
>>> enum resctrl_ctrl_name name;
>>> union {
>>> struct resctrl_cache cache;
>>> struct resctrl_membw membw;
>>> };
>>> };
>>>
>>> Two members summarize how this new structure fits into the rest of resctrl:
>>> a) resctrl_ctrl::entry
>>> Since a resource can support multiple controls there is a new list
>>> in struct rdt_resource named "controls" that contains the list of all
>>> controls supported by the resource.
>>> b) resctrl_ctrl::domains
>>> Instead of the list of control domains belonging to a resource they
>>> now belong to the control itself. By doing so resctrl can support resource
>>> controls at different scope for the same resource. This is intended to
>>> support some upcoming MPAM and RISC-V usages.
>>>
>>
>> I like the idea of supporting multiple controls for each resource.
>>
>> With these patches we now have one list containing all the controls.
>>
>> However, in the case of RDT_RESOURCE_L3 we have two lists, "mon_domains" and "controls". The mon_domains list deals with monitoring and the controls list deals with the (multiple) controls.
>>
>> Have you thought about making the "controls" list generic so that an entry can also describe monitoring? There would then be just one list containing multiple controls or monitors.
>
> The control list adds an additional layer of abstraction just for control management,
> independent from monitoring. The mon_domains list is unchanged while each control now
> has its own ctrl_domains list.
>
> Here is an attempt to visualize how a resource with two monitoring domains and two controls,
> each with two control domains, ends up being managed:
>
> +-------------------------+
> | struct rdt_resource     |
> +-------------------------+
> | ...                     |
> | controls (list_head)    |---------+
> | mon_domains (list_head) |---+     |
> | ...                     |   |     |
> +-------------------------+   |     |
>                               |     |
>   +---------------------------+     |
>   |                                 |
>   v                                 v
> +-----------------------------+   +-------------------------+
> | struct rdt_l3_mon_domain #1 |   | struct resctrl_ctrl #A  |
> +-----------------------------+   +-------------------------+
> | rdt_domain_hdr              |-+ | entry (list_head)       |       +----------------------------+
> | ...                         | | | domains (list_head)     |------>| struct rdt_ctrl_domain #A1 |
> +-----------------------------+ | | ...                     |       +----------------------------+
>                                 | +-------------------------+       | rdt_domain_hdr             |---+
>   +-----------------------------+   |                               | ...                        |   |
>   | (next)                          | (next)                        +----------------------------+   |
>   v                                 v                                                                |
> +-----------------------------+   +-------------------------+       +----------------------------+<--+
> | struct rdt_l3_mon_domain #2 |   | struct resctrl_ctrl #B  |       | struct rdt_ctrl_domain #A2 |
> +-----------------------------+   +-------------------------+       +----------------------------+
> | rdt_domain_hdr              |   | entry (list_head)       |       | rdt_domain_hdr             |
> | ...                         |   | domains (list_head)     |----+  | ...                        |
> +-----------------------------+   | ...                     |    |  +----------------------------+
>                                   +-------------------------+    |
>                                                                  |
>                                     +----------------------------+
>                                     |
>                                     v
>                                   +----------------------------+
>                                   | struct rdt_ctrl_domain #B1 |
>                                   +----------------------------+
>                                   | rdt_domain_hdr             |---+
>                                   | ...                        |   |
>                                   +----------------------------+   |
>                                                                    |
>                                   +----------------------------+<--+
>                                   | struct rdt_ctrl_domain #B2 |
>                                   +----------------------------+
>                                   | rdt_domain_hdr             |
>                                   | ...                        |
>                                   +----------------------------+
>
> resctrl used to manage single domains that were capable of both monitoring and control,
> but this was split to support the scenario where monitoring and control of a resource are
> done at different scopes. See
> cd84f72b6a5c ("x86/resctrl: Prepare for different scope for control/monitor operations")
>
> This feature further expands the difference between monitoring and control since now there
> can be multiple instances of a control domain (one per control) associated with a resource
> while monitoring still just supports one monitoring domain per resource.
>
> I thus cannot see how this can be accomplished with a single list. Could you sketch out
> what you have in mind?
>
Thanks for the sketch. It makes things much clearer.
mon_domains directly maintains a list of monitoring domains, whereas
controls maintains a list of control objects, each of which owns its own
domain list.
One possibility I was considering is to make mon_domains follow a
structure similar to controls, allowing both to be represented through a
common top-level list.
However, the resulting abstraction may become fairly large and complex,
with limited overlap between the monitoring and control data structures.
> ...
>
>>>
>>> Patch 54:
>>> Teach resctrl fs about "MIN" and "MAX" controls.
>>>
>>> Patch 55:
>>> Sample of "MIN" and "MAX" memory bandwidth controls for x86.
>>
>> My assumption is that the MIN and MAX controls here are just examples, correct?
>>
>> You only mentioned patch 55 as "NOT_FOR_INCLUSION". I assume patch 54 should also be marked as "NOT_FOR_INCLUSION"?
>
> While patch 55 is the first and only "user" of patch 54 I believe that patch 54 (or some variant
> of it) will stay since we already know that both MPAM and Intel need to support min and max controls.
That is good to know.
Thanks
Babu
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 1:14 ` Moger, Babu
@ 2026-06-03 3:55 ` Reinette Chatre
2026-06-03 14:40 ` Babu Moger
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-03 3:55 UTC
To: Moger, Babu, Babu Moger, Tony Luck, Ben Horgan, James Morse,
Dave Martin, Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Babu,
On 6/2/26 6:14 PM, Moger, Babu wrote:
> Hi Reinette,
>
> On 6/2/2026 5:56 PM, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 6/2/26 1:23 PM, Babu Moger wrote:
>>> Hi Reinette,
>>>
>>> For some reason, I couldn’t find your patch on lore.kernel.org:
>>> https://lore.kernel.org/lkml/?q=Reinette+Chatre
>>
>> How about:
>> https://lore.kernel.org/lkml/aab804b9-e8b5-40ad-a85b-af7033391243@intel.com/
>
> Yes, that works. I usually find patches by searching on the author's name, but that did not work this time.
Interesting. When I run your lore query the first email from this thread is among the
results.
>
>>
>>>
>>> I eventually located it here:
>>> https://sashiko.dev/#/message/aab804b9-e8b5-40ad-a85b-af7033391243%40intel.com
>>>
>>> Thanks for sharing the patches. I’m still reviewing them.
>>>
>>> I was able to build and boot the kernel and can see the MIN and MAX controls after moving your test code to __rdt_get_mem_config_amd().
>>
>> Thank you very much for trying it out.
>>
>>> On 5/29/26 13:06, Reinette Chatre wrote:
>>
>> ...
>>
>>>> Primary resctrl fs data structure changes
>>>> =========================================
>>>>
>>>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>>>> the changes easier to follow I kept some of the original names to help communicate
>>>> where familiar data structures land.
>>>>
>>>> What to notice about a control is that it has some common properties required
>>>> from all controls (scope, type, etc.) and then depending on the type of control
>>>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>>>
>>>> /**
>>>> * struct resctrl_ctrl - A resource control
>>>> * @entry: List entry of rdt_resource::controls
>>>> * @scope: Scope of the resource that this control allocates
>>>> * @domains: RCU list of all control domains
>>>> * @type: The control type that determines the properties of the control,
>>>> * format string for displaying control values to user space, and
>>>> * parser of control values provided by user space.
>>>> * @name: Name of the control. Appended to final resource name
>>>> * (rdt_resource_final::name) to create final schema entry.
>>>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>>>> * For example, with resource name "MB" and control name "MAX" the
>>>> * schema entry will be "MB_MAX".
>>>> * @cache: Cache allocation control properties.
>>>> * @membw: Bandwidth control properties.
>>>> */
>>>> struct resctrl_ctrl {
>>>> struct list_head entry;
>>>> enum resctrl_scope scope;
>>>> struct list_head domains;
>>>> enum resctrl_ctrl_type type;
>>>> enum resctrl_ctrl_name name;
>>>> union {
>>>> struct resctrl_cache cache;
>>>> struct resctrl_membw membw;
>>>> };
>>>> };
>>>>
>>>> Two members summarize how this new structure fits into the rest of resctrl:
>>>> a) resctrl_ctrl::entry
>>>> Since a resource can support multiple controls there is a new list
>>>> in struct rdt_resource named "controls" that contains the list of all
>>>> controls supported by the resource.
>>>> b) resctrl_ctrl::domains
>>>> Instead of the list of control domains belonging to a resource they
>>>> now belong to the control itself. By doing so resctrl can support resource
>>>> controls at different scope for the same resource. This is intended to
>>>> support some upcoming MPAM and RISC-V usages.
>>>>
>>>
>>> I like the idea of supporting multiple controls for each resource.
>>>
>>> With these patches, now we have one list containing all the controls.
>>>
>>> However, in case of RDT_RESOURCE_L3, we have two lists "mon_domains" and "controls". mon_domains list deals with monitoring and control deals with control parts(multiple).
>>>
>>> Have you thought about making the list("control") generic so that the control can be monitoring also. It will just one list containing multiple controls or monitor.
>>
>> The control list adds an additional layer of abstraction just for control management,
>> independent from monitoring. The mon_domains list is unchanged while each control now
>> has its own ctrl_domains list.
>>
>> Here is an attempt to visualize how a resource with two monitoring domains, and two controls,
>> each with two control domains end up being managed:
>>
>> +-------------------------+
>> | struct rdt_resource |
>> +-------------------------+
>> | ... |
>> | controls (list_head) |---------+
>> | mon_domains (list_head) |---+ |
>> | ... | | |
>> +-------------------------+ | |
>> | |
>> +---------------------------+ |
>> | |
>> v v
>> +-----------------------------+ +-------------------------+
>> | struct rdt_l3_mon_domain #1 | | struct resctrl_ctrl #A |
>> +-----------------------------+ +-------------------------+
>> | rdt_domain_hdr |-+ | entry (list_head) | +----------------------------+
>> | ... | | | domains (list_head) |------>| struct rdt_ctrl_domain #A1 |
>> +-----------------------------+ | | ... | +----------------------------+
>> | +-------------------------+ | rdt_domain_hdr |---+
>> +-----------------------------+ | | ... | |
>> | (next) | (next) +----------------------------+ |
>> v v |
>> +-----------------------------+ +-------------------------+ +----------------------------+<--+
>> | struct rdt_l3_mon_domain #2 | | struct resctrl_ctrl #B | | struct rdt_ctrl_domain #A2 |
>> +-----------------------------+ +-------------------------+ +----------------------------+
>> | rdt_domain_hdr | | entry (list_head) | | rdt_domain_hdr |
>> | ... | | domains (list_head) |----n | ... |
>> +-----------------------------+ | ... | | +----------------------------+
>> +-------------------------+ |
>> |
>> +----------------------------+
>> |
>> v
>> +----------------------------+
>> | struct rdt_ctrl_domain #B1 |
>> +----------------------------+
>> | rdt_domain_hdr |---+
>> | ... | |
>> +----------------------------+ |
>> |
>> +----------------------------+<--+
>> | struct rdt_ctrl_domain #B2 |
>> +----------------------------+
>> | rdt_domain_hdr |
>> | ... |
>> +----------------------------+
>>
>> resctrl used to manage single domains that are capable of both monitoring and control
>> but this was split to support scenario where monitoring and control of a resource are
>> done at different scope. See
>> cd84f72b6a5c ("x86/resctrl: Prepare for different scope for control/monitor operations")
>>
>> This feature further expands the difference between monitoring and control since now there
>> can be multiple instances of a control domain (one per control) associated with a resource
>> while monitoring still just supports one monitoring domain per resource.
>>
>> I thus cannot see how this can be accomplished with a single list. Could you sketch out
>> what you have in mind?
>>
>
> Thanks for the sketch. It makes it much more clear.
>
> mon_domains directly maintains a list of monitoring domains, whereas controls maintains a list of control objects, each of which owns its own domain list.
>
> I was thinking, one possibility would be to make mon_domains follow a structure similar to controls, allowing both to be represented through a common top-level list.
>
> However, the resulting abstraction may become fairly large and complex, with limited overlap between the monitoring and control data structures.
>
I am trying to envision this but it sounds to me as though each control would have its own
monitoring, which is not clear to me. resctrl ultimately needs to support hardware features.
Could you please share more about the hardware feature(s) this design intends to support?
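
For reference while envisioning alternatives, the PoC layout from the diagram above
can be modeled in plain user-space C. This is only a minimal sketch, not the PoC code:
the "next" pointers stand in for the kernel's list_head, and the *_model types, the id
fields and both counting helpers are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct rdt_ctrl_domain; "next" replaces the kernel list_head. */
struct ctrl_domain_model {
	int id;
	struct ctrl_domain_model *next;
};

/* Stand-in for struct resctrl_ctrl: every control owns its own domain list. */
struct ctrl_model {
	const char *name;
	struct ctrl_domain_model *domains;
	struct ctrl_model *next;
};

/* Stand-in for struct rdt_l3_mon_domain. */
struct mon_domain_model {
	int id;
	struct mon_domain_model *next;
};

/* Stand-in for struct rdt_resource with its two independent lists. */
struct resource_model {
	struct ctrl_model *controls;
	struct mon_domain_model *mon_domains;
};

/* Walk resource -> controls -> control domains; mon_domains is never touched. */
static size_t count_ctrl_domains(const struct resource_model *r)
{
	size_t n = 0;

	assert(r);
	for (const struct ctrl_model *c = r->controls; c; c = c->next)
		for (const struct ctrl_domain_model *d = c->domains; d; d = d->next)
			n++;
	return n;
}

/* Monitoring domains hang directly off the resource, one level shallower. */
static size_t count_mon_domains(const struct resource_model *r)
{
	size_t n = 0;

	assert(r);
	for (const struct mon_domain_model *m = r->mon_domains; m; m = m->next)
		n++;
	return n;
}
```

The two walks have different depths, which is where a single common top-level list
would need an extra layer on the monitoring side.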
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 3:55 ` Reinette Chatre
@ 2026-06-03 14:40 ` Babu Moger
0 siblings, 0 replies; 30+ messages in thread
From: Babu Moger @ 2026-06-03 14:40 UTC (permalink / raw)
To: Reinette Chatre, Moger, Babu, Tony Luck, Ben Horgan, James Morse,
Dave Martin, Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/2/26 22:55, Reinette Chatre wrote:
> Hi Babu,
>
> On 6/2/26 6:14 PM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 6/2/2026 5:56 PM, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 6/2/26 1:23 PM, Babu Moger wrote:
>>>> Hi Reinette,
>>>>
>>>> For some reason, I couldn’t find your patch on lore.kernel.org:
>>>> https://lore.kernel.org/lkml/?q=Reinette+Chatre
>>>
>>> How about:
>>> https://lore.kernel.org/lkml/aab804b9-e8b5-40ad-a85b-af7033391243@intel.com/
>>
>> Yes. It works. But, I used to find patches using the names. It did not work this time.
>
> Interesting. When I run your lore query the first email from this thread is among the
> results.
>
>>
>>>
>>>>
>>>> I eventually located it here:
>>>> https://sashiko.dev/#/message/aab804b9-e8b5-40ad-a85b-af7033391243%40intel.com
>>>>
>>>> Thanks for sharing the patches. I’m still reviewing them.
>>>>
>>>> I was able to build and boot the kernel and can see the MIN and MAX controls. After moving your test code to __rdt_get_mem_config_amd().
>>>
>>> Thank you very much for trying it out.
>>>
>>>> On 5/29/26 13:06, Reinette Chatre wrote:
>>>
>>> ...
>>>
>>>>> Primary resctrl fs data structure changes
>>>>> =========================================
>>>>>
>>>>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>>>>> the changes easier to follow I kept some of the original names to help communicate
>>>>> where familiar data structures land.
>>>>>
>>>>> What to notice about a control is that it has some common properties required
>>>>> from all controls (scope, type, etc.) and then depending on the type of control
>>>>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>>>>
>>>>> /**
>>>>> * struct resctrl_ctrl - A resource control
>>>>> * @entry: List entry of rdt_resource::controls
>>>>> * @scope: Scope of the resource that this control allocates
>>>>> * @domains: RCU list of all control domains
>>>>> * @type: The control type that determines the properties of the control,
>>>>> * format string for displaying control values to user space, and
>>>>> * parser of control values provided by user space.
>>>>> * @name: Name of the control. Appended to final resource name
>>>>> * (rdt_resource_final::name) to create final schema entry.
>>>>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>>>>> * For example, with resource name "MB" and control name "MAX" the
>>>>> * schema entry will be "MB_MAX".
>>>>> * @cache: Cache allocation control properties.
>>>>> * @membw: Bandwidth control properties.
>>>>> */
>>>>> struct resctrl_ctrl {
>>>>> struct list_head entry;
>>>>> enum resctrl_scope scope;
>>>>> struct list_head domains;
>>>>> enum resctrl_ctrl_type type;
>>>>> enum resctrl_ctrl_name name;
>>>>> union {
>>>>> struct resctrl_cache cache;
>>>>> struct resctrl_membw membw;
>>>>> };
>>>>> };
>>>>>
>>>>> Two members summarize how this new structure fits into the rest of resctrl:
>>>>> a) resctrl_ctrl::entry
>>>>> Since a resource can support multiple controls there is a new list
>>>>> in struct rdt_resource named "controls" that contains the list of all
>>>>> controls supported by the resource.
>>>>> b) resctrl_ctrl::domains
>>>>> Instead of the list of control domains belonging to a resource they
>>>>> now belong to the control self. By doing so resctrl can support resource
>>>>> controls at different scope for the same resource. This is intended to
>>>>> support some upcoming MPAM and RISC-V usages.
>>>>>
>>>>
>>>> I like the idea of supporting multiple controls for each resource.
>>>>
>>>> With these patches, now we have one list containing all the controls.
>>>>
>>>> However, in case of RDT_RESOURCE_L3, we have two lists "mon_domains" and "controls". mon_domains list deals with monitoring and control deals with control parts(multiple).
>>>>
>>>> Have you thought about making the list("control") generic so that the control can be monitoring also. It will just one list containing multiple controls or monitor.
>>>
>>> The control list adds an additional layer of abstraction just for control management,
>>> independent from monitoring. The mon_domains list is unchanged while each control now
>>> has its own ctrl_domains list.
>>>
>>> Here is an attempt to visualize how a resource with two monitoring domains, and two controls,
>>> each with two control domains end up being managed:
>>>
>>> +-------------------------+
>>> | struct rdt_resource |
>>> +-------------------------+
>>> | ... |
>>> | controls (list_head) |---------+
>>> | mon_domains (list_head) |---+ |
>>> | ... | | |
>>> +-------------------------+ | |
>>> | |
>>> +---------------------------+ |
>>> | |
>>> v v
>>> +-----------------------------+ +-------------------------+
>>> | struct rdt_l3_mon_domain #1 | | struct resctrl_ctrl #A |
>>> +-----------------------------+ +-------------------------+
>>> | rdt_domain_hdr |-+ | entry (list_head) | +----------------------------+
>>> | ... | | | domains (list_head) |------>| struct rdt_ctrl_domain #A1 |
>>> +-----------------------------+ | | ... | +----------------------------+
>>> | +-------------------------+ | rdt_domain_hdr |---+
>>> +-----------------------------+ | | ... | |
>>> | (next) | (next) +----------------------------+ |
>>> v v |
>>> +-----------------------------+ +-------------------------+ +----------------------------+<--+
>>> | struct rdt_l3_mon_domain #2 | | struct resctrl_ctrl #B | | struct rdt_ctrl_domain #A2 |
>>> +-----------------------------+ +-------------------------+ +----------------------------+
>>> | rdt_domain_hdr | | entry (list_head) | | rdt_domain_hdr |
>>> | ... | | domains (list_head) |----n | ... |
>>> +-----------------------------+ | ... | | +----------------------------+
>>> +-------------------------+ |
>>> |
>>> +----------------------------+
>>> |
>>> v
>>> +----------------------------+
>>> | struct rdt_ctrl_domain #B1 |
>>> +----------------------------+
>>> | rdt_domain_hdr |---+
>>> | ... | |
>>> +----------------------------+ |
>>> |
>>> +----------------------------+<--+
>>> | struct rdt_ctrl_domain #B2 |
>>> +----------------------------+
>>> | rdt_domain_hdr |
>>> | ... |
>>> +----------------------------+
>>>
>>> resctrl used to manage single domains that are capable of both monitoring and control
>>> but this was split to support scenario where monitoring and control of a resource are
>>> done at different scope. See
>>> cd84f72b6a5c ("x86/resctrl: Prepare for different scope for control/monitor operations")
>>>
>>> This feature further expands the difference between monitoring and control since now there
>>> can be multiple instances of a control domain (one per control) associated with a resource
>>> while monitoring still just supports one monitoring domain per resource.
>>>
>>> I thus cannot see how this can be accomplished with a single list. Could you sketch out
>>> what you have in mind?
>>>
>>
>> Thanks for the sketch. It makes it much more clear.
>>
>> mon_domains directly maintains a list of monitoring domains, whereas controls maintains a list of control objects, each of which owns its own domain list.
>>
>> I was thinking, one possibility would be to make mon_domains follow a structure similar to controls, allowing both to be represented through a common top-level list.
>>
>> However, the resulting abstraction may become fairly large and complex, with limited overlap between the monitoring and control data structures.
>>
>
> I am trying to envision this but it sounds to be as though each control has its own monitoring
> which is not clear to me. resctrl ultimately needs to support hardware features. Could you please
> share more about the hardware feature(s) this design intends to support?
>
This is mostly just my own thinking. There is no new hardware feature that
requires this change. We can safely ignore this idea.
Thanks
Babu
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-05-29 18:06 [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept Reinette Chatre
2026-06-02 20:23 ` Babu Moger
@ 2026-06-02 23:32 ` Chen, Yu C
2026-06-03 3:45 ` Reinette Chatre
2026-06-03 15:15 ` Ben Horgan
` (2 subsequent siblings)
4 siblings, 1 reply; 30+ messages in thread
From: Chen, Yu C @ 2026-06-02 23:32 UTC (permalink / raw)
To: Reinette Chatre
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck,
Dave Martin, James Morse, Ben Horgan, Babu Moger, Drew Fustini,
Fenghua Yu
Hi Reinette,
On 5/30/2026 2:06 AM, Reinette Chatre wrote:
[ ... ]
> /**
> * struct resctrl_ctrl - A resource control
> * @entry: List entry of rdt_resource::controls
> * @scope: Scope of the resource that this control allocates
> * @domains: RCU list of all control domains
> * @type: The control type that determines the properties of the control,
> * format string for displaying control values to user space, and
> * parser of control values provided by user space.
> * @name: Name of the control. Appended to final resource name
> * (rdt_resource_final::name) to create final schema entry.
> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
> * For example, with resource name "MB" and control name "MAX" the
> * schema entry will be "MB_MAX".
> * @cache: Cache allocation control properties.
> * @membw: Bandwidth control properties.
> */
> struct resctrl_ctrl {
> struct list_head entry;
> enum resctrl_scope scope;
> struct list_head domains;
> enum resctrl_ctrl_type type;
> enum resctrl_ctrl_name name;
> union {
> struct resctrl_cache cache;
> struct resctrl_membw membw;
> };
> };
>
Thanks for re-spinning this patch set.
Looking at commit (fs/resctrl: Introduce additional schema properties),
the newly added properties appear to be implemented for the
MBA resctrl_membw controller.
Would it make sense for these properties to be generic across all resctrl
controllers, CAT included? Should they consequently be relocated into
struct resctrl_ctrl if that approach is appropriate?
thanks,
Chenyu
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-02 23:32 ` Chen, Yu C
@ 2026-06-03 3:45 ` Reinette Chatre
2026-06-03 11:53 ` Chen, Yu C
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-03 3:45 UTC (permalink / raw)
To: Chen, Yu C
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck,
Dave Martin, James Morse, Ben Horgan, Babu Moger, Drew Fustini,
Fenghua Yu
Hi Chenyu,
On 6/2/26 4:32 PM, Chen, Yu C wrote:
> Hi Reinette,
> On 5/30/2026 2:06 AM, Reinette Chatre wrote:
>
> [ ... ]
>
>> /**
>> * struct resctrl_ctrl - A resource control
>> * @entry: List entry of rdt_resource::controls
>> * @scope: Scope of the resource that this control allocates
>> * @domains: RCU list of all control domains
>> * @type: The control type that determines the properties of the control,
>> * format string for displaying control values to user space, and
>> * parser of control values provided by user space.
>> * @name: Name of the control. Appended to final resource name
>> * (rdt_resource_final::name) to create final schema entry.
>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>> * For example, with resource name "MB" and control name "MAX" the
>> * schema entry will be "MB_MAX".
>> * @cache: Cache allocation control properties.
>> * @membw: Bandwidth control properties.
>> */
>> struct resctrl_ctrl {
>> struct list_head entry;
>> enum resctrl_scope scope;
>> struct list_head domains;
>> enum resctrl_ctrl_type type;
>> enum resctrl_ctrl_name name;
>> union {
>> struct resctrl_cache cache;
>> struct resctrl_membw membw;
>> };
>> };
>>
>
> Thanks for re-spinning this patch set.
> Looking at commit (fs/resctrl: Introduce additional schema properties),
> the newly added properties appear to be implemented for the
> MBA resctrl_membw controller.
Please do note that the "struct resctrl_membw" naming is temporary. I only kept
current naming to help folks find where existing known structures land in this new
design but the "struct resctrl_membw" name is not appropriate moving forward.
When user space interacts with the new controls the intention is that each control
has a "type" that the user can find in the new "type" file and depending on that value
the user will know what other files/properties can be expected.
In this PoC there are two types: "scalar" and "bitmap" and each is associated with
a struct that contains the properties associated with that type.
Considering this, these structures could, for example, be renamed as (very long):
struct resctrl_membw -> struct resctrl_ctrl_scalar
struct resctrl_cache -> struct resctrl_ctrl_bitmap
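
To illustrate the idea, here is a minimal user-space sketch of a control whose "type"
selects the valid union member. It is not the PoC code: the *_sketch names, the
mask_len/min/max members and all helpers are hypothetical, and only the "bitmap" and
"scalar" type names and the "MB_MAX" schema entry format come from the PoC description.

```c
#include <assert.h>
#include <stdio.h>

enum ctrl_type_sketch { CTRL_BITMAP, CTRL_SCALAR };

/* Hypothetical type-specific properties; the real members are not shown here. */
struct ctrl_bitmap_sketch { unsigned int mask_len; };
struct ctrl_scalar_sketch { unsigned int min, max; };

struct ctrl_sketch {
	enum ctrl_type_sketch type;	/* common: selects the valid union member */
	const char *name;		/* common: for example "MAX" */
	union {
		struct ctrl_bitmap_sketch bitmap;
		struct ctrl_scalar_sketch scalar;
	};
};

/* Build the schema entry "<resource>_<control>", for example "MB_MAX". */
static int schema_entry(char *buf, size_t len, const char *res,
			const struct ctrl_sketch *c)
{
	return snprintf(buf, len, "%s_%s", res, c->name);
}

/* Text that a per-control "type" file could show. */
static const char *ctrl_type_str(const struct ctrl_sketch *c)
{
	switch (c->type) {
	case CTRL_BITMAP:
		return "bitmap";
	case CTRL_SCALAR:
		return "scalar";
	}
	return "unknown";
}

/* Largest value user space could write, read from the member "type" selects. */
static unsigned long ctrl_max_value(const struct ctrl_sketch *c)
{
	if (c->type == CTRL_BITMAP) {
		assert(c->bitmap.mask_len > 0 &&
		       c->bitmap.mask_len < 8 * sizeof(unsigned long));
		return (1UL << c->bitmap.mask_len) - 1;
	}
	return c->scalar.max;
}
```

A consumer would read the type first and only then touch the matching member, which
mirrors how user space would read the "type" file before looking for other files.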
> Would it make sense for these properties to be generic across all resctrl
> controllers, CAT included? Should they consequently be relocated into
> struct resctrl_ctrl if that approach is appropriate?
If there are properties that are common across all controls of all types
then they should be moved to struct resctrl_ctrl. Current examples are "type"
and "scope" that user space can expect from every control. It is not obvious to
me at this time what other properties we can designate as common in this way.
For bitmaps there are already properties available in the top-level
directory. These can just be replicated as properties in the resource_schemata,
but perhaps using new names as suggested by Dave in
https://lore.kernel.org/lkml/aQOUAeVP9oc7RIn%2F@e133380.arm.com/
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 3:45 ` Reinette Chatre
@ 2026-06-03 11:53 ` Chen, Yu C
2026-06-04 16:37 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Chen, Yu C @ 2026-06-03 11:53 UTC (permalink / raw)
To: Reinette Chatre
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck,
Dave Martin, James Morse, Ben Horgan, Babu Moger, Drew Fustini,
Fenghua Yu
Hi Reinette,
On 6/3/2026 11:45 AM, Reinette Chatre wrote:
> Hi Chenyu,
>
[ ... ]
>>
>> Thanks for re-spinning this patch set.
>> Looking at commit (fs/resctrl: Introduce additional schema properties),
>> the newly added properties appear to be implemented for the
>> MBA resctrl_membw controller.
>
> Please do note that the "struct resctrl_membw" naming is temporary. I only kept
> current naming to help folks find where existing known structures land in this new
> design but the "struct resctrl_membw" name is not appropriate moving forward.
>
> When user space interacts with the new controls the intention is that each control
> has a "type" that the user can find in the new "type" file and depending on that value
> the user will know what other files/properties can be expected.
>
> In this PoC there are two types: "scalar" and "bitmap" and each is associated with
> a struct that contains the properties associated with that type.
>
> Considering this, these structures could, for example, be renamed as (very long):
> struct resctrl_membw -> struct resctrl_ctrl_scalar
> struct resctrl_cache -> struct resctrl_ctrl_bitmap
>
OK, the type field serves as the primary key for querying other properties.
Alongside the "type" entry, there also exists a "flag" property. I haven't
spotted this field within resctrl_membw, though it appears in resctrl_cache
via arch_has_sparse_bitmasks. Would it make sense to introduce a dedicated
enum for this flag, or alternatively reuse the existing bw_delay_linear
for MBA?
>> Would it make sense for these properties to be generic across all resctrl
>> controllers, CAT included? Should they consequently be relocated into
>> struct resctrl_ctrl if that approach is appropriate?
>
> If there are properties that are common across all controls of all types
> then they should be moved to struct resctrl_ctrl. Current examples are "type"
> and "scope" that user space can expect from every control. It is not obvious to
> me at this time what other properties we can designate as common in this way.
>
> For bitmaps there is already the properties that are available in the top-level
> directory. These can just be replicated as properties into the resource_schemata,
> but perhaps using new names as suggested by Dave in
> https://lore.kernel.org/lkml/aQOUAeVP9oc7RIn%2F@e133380.arm.com/
>
Got it. Will further look into the code.
thanks,
Chenyu
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 11:53 ` Chen, Yu C
@ 2026-06-04 16:37 ` Reinette Chatre
2026-06-05 15:43 ` Chen, Yu C
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-04 16:37 UTC (permalink / raw)
To: Chen, Yu C
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck,
Dave Martin, James Morse, Ben Horgan, Babu Moger, Drew Fustini,
Fenghua Yu
Hi Chenyu,
On 6/3/26 4:53 AM, Chen, Yu C wrote:
> Hi Reinette,
>
> On 6/3/2026 11:45 AM, Reinette Chatre wrote:
>> Hi Chenyu,
>>
>
> [ ... ]
>
>>>
>>> Thanks for re-spinning this patch set.
>>> Looking at commit (fs/resctrl: Introduce additional schema properties),
>>> the newly added properties appear to be implemented for the
>>> MBA resctrl_membw controller.
>>
>> Please do note that the "struct resctrl_membw" naming is temporary. I only kept
>> current naming to help folks find where existing known structures land in this new
>> design but the "struct resctrl_membw" name is not appropriate moving forward.
>>
>> When user space interacts with the new controls the intention is that each control
>> has a "type" that the user can find in the new "type" file and depending on that value
>> the user will know what other files/properties can be expected.
>>
>> In this PoC there are two types: "scalar" and "bitmap" and each is associated with
>> a struct that contains the properties associated with that type.
>>
>> Considering this, these structures could, for example, be renamed as (very long):
>> struct resctrl_membw -> struct resctrl_ctrl_scalar
>> struct resctrl_cache -> struct resctrl_ctrl_bitmap
>>
>
> OK, the type field serves as the primary key for querying other properties.
> Alongside the "type" entry, there also exists a "flag" property. I haven't
> spotted this field within resctrl_membw, though it appears in resctrl_cache
> via arch_has_sparse_bitmasks. Would it make sense to introduce a dedicated
> enum for this flag, or alternatively reuse the existing bw_delay_linear for MBA?
I realize now my previous answer to you was incomplete. You are right that there
is a plan to let the resctrl "type" file contain the schema type as well as
optional flags. This plan is unchanged but at this time it is not obvious to me
what flags this implementation should start with.
In the original proposal [1] "linear" was provided as an example of a flag for the MB
resource but as I worked through this PoC whether a control is linear or not seemed
to fit better as a resource property. This is how I ended up with commit
bdcd8ac6e946 ("mpam,x86,fs/resctrl: Make memory bandwidth delay a resource property")
For this specific property I expect that all controls would have the same value. This
worked out well since resctrl already has the per-resource top level "delay_linear".
We can surely move this back to be a control property and have it be the first
"flag" but at this time it seems to me that all MB controls would just have the same
flag value?
Perhaps the safest alternative would be to keep it a resource property and just duplicate
this as a flag value among all the controls? I think this is what you are suggesting by
reusing bw_delay_linear for MBA. This would not result in dedicated flags associated
with controls but it may set the user interface up for the most flexibility.
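
A minimal sketch of that alternative, assuming the flag is derived from the resource
when a control's "type" file is shown. The res_sketch type, show_ctrl_type() and the
"<type> <flag>" output format are all hypothetical; the actual file format has not
been decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Resource-level property, shared by every control of the resource. */
struct res_sketch {
	const char *name;
	bool delay_linear;
};

/*
 * Format what a control's "type" file could contain: the type followed by
 * optional flags. The " linear" spelling and the single-line layout are
 * assumptions made for this sketch only.
 */
static int show_ctrl_type(char *buf, size_t len, const struct res_sketch *r,
			  const char *type)
{
	assert(buf && r && type);
	return snprintf(buf, len, "%s%s\n", type,
			r->delay_linear ? " linear" : "");
}
```

Every control of the resource would report the same flag in this scheme, while user
space still only needs to look at the control's own "type" file.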
Reinette
[1] https://lore.kernel.org/lkml/aPtfMFfLV1l%2FRB0L@e133380.arm.com/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-04 16:37 ` Reinette Chatre
@ 2026-06-05 15:43 ` Chen, Yu C
2026-06-05 16:20 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Chen, Yu C @ 2026-06-05 15:43 UTC (permalink / raw)
To: Reinette Chatre
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck,
Dave Martin, James Morse, Ben Horgan, Babu Moger, Drew Fustini,
Fenghua Yu
Hi Reinette,
On 6/5/2026 12:37 AM, Reinette Chatre wrote:
[ ... ]
>>
>> OK, the type field serves as the primary key for querying other properties.
>> Alongside the "type" entry, there also exists a "flag" property. I haven't
>> spotted this field within resctrl_membw, though it appears in resctrl_cache
>> via arch_has_sparse_bitmasks. Would it make sense to introduce a dedicated
>> enum for this flag, or alternatively reuse the existing bw_delay_linear for MBA?
>
> I realize now my previous answer to you was incomplete. You are right that there
> is a plan to let the resctrl "type" file contain the schema type as well as
> optional flags. This plan is unchanged but at this time it is not obvious to me
> what flags this implementation should start with.
>
> In the original proposal [1] "linear" was provided as example of a flag for the MB
> resource but as I worked through this PoC whether a control is linear or not seemed
> to fit better as a resource property. This is how I ended up with commit
> bdcd8ac6e946 ("mpam,x86,fs/resctrl: Make memory bandwidth delay a resource property")
>
> For this specific property I expect that all controls would have the same value. This
> worked out well since resctrl already has the per-resource top level "delay_linear".
>
> We can surely move this back to be a control property and have it be the first
> "flag" but at this time it seems to me that all MB controls would just have the same
> flag value?
>
> Perhaps the safest alternative would be to keep it a resource property and just duplicate
> this as a flag value among all the controls? I think this is what you are suggesting to
> reuse the bw_delay_linear for MBA. This would not result in dedicated flags associated
> with controls but it may set the user interface up for most flexibility.
>
Yes I think we can simply reuse the value of bw_delay_linear when displaying the
"flag" field for each controller under the info directory.
BTW, would the L3 controller have similar case that, all L3 controllers of the same
resource share the same value of "flag"? If yes, should we also move arch_has_sparse_bitmasks
to rdt_resource? If yes, I'm not sure why we need "flag" in the controller.
thanks,
Chenyu
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-05 15:43 ` Chen, Yu C
@ 2026-06-05 16:20 ` Reinette Chatre
0 siblings, 0 replies; 30+ messages in thread
From: Reinette Chatre @ 2026-06-05 16:20 UTC (permalink / raw)
To: Chen, Yu C
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org, Tony Luck,
Dave Martin, James Morse, Ben Horgan, Babu Moger, Drew Fustini,
Fenghua Yu
Hi Chenyu,
On 6/5/26 8:43 AM, Chen, Yu C wrote:
> Hi Reinette,
>
> On 6/5/2026 12:37 AM, Reinette Chatre wrote:
>
> [ ... ]
>
>>>
>>> OK, the type field serves as the primary key for querying other properties.
>>> Alongside the "type" entry, there also exists a "flag" property. I haven't
>>> spotted this field within resctrl_membw, though it appears in resctrl_cache
>>> via arch_has_sparse_bitmasks. Would it make sense to introduce a dedicated
>>> enum for this flag, or alternatively reuse the existing bw_delay_linear for MBA?
>>
>> I realize now my previous answer to you was incomplete. You are right that there
>> is a plan to let the resctrl "type" file contain the schema type as well as
>> optional flags. This plan is unchanged but at this time it is not obvious to me
>> what flags this implementation should start with.
>>
>> In the original proposal [1] "linear" was provided as example of a flag for the MB
>> resource but as I worked through this PoC whether a control is linear or not seemed
>> to fit better as a resource property. This is how I ended up with commit
>> bdcd8ac6e946 ("mpam,x86,fs/resctrl: Make memory bandwidth delay a resource property")
>>
>> For this specific property I expect that all controls would have the same value. This
>> worked out well since resctrl already has the per-resource top level "delay_linear".
>>
>> We can surely move this back to be a control property and have it be the first
>> "flag" but at this time it seems to me that all MB controls would just have the same
>> flag value?
>>
>> Perhaps the safest alternative would be to keep it a resource property and just duplicate
>> this as a flag value among all the controls? I think this is what you are suggesting to
>> reuse the bw_delay_linear for MBA. This would not result in dedicated flags associated
>> with controls but it may set the user interface up for most flexibility.
>>
>
> Yes I think we can simply reuse the value of bw_delay_linear when displaying the
> "flag" field for each controller under the info directory.
> BTW, would the L3 controller have similar case that, all L3 controllers of the same
> resource share the same value of "flag"? If yes, should we also move arch_has_sparse_bitmasks
> to rdt_resource? If yes, I'm not sure why we need "flag" in the controller.
Whether a control of type bitmap allows sparse values or not does seem to be a valid
property of the bitmap control to me. It is difficult to predict how this will land since
I am not familiar with a resource that has multiple bitmap controllers.
I do think having a controller type support different flags in the interface exposed to the
user gives resctrl the most flexibility for the future. With the user interface having that
capability the internal implementation can always adapt.
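
As an illustration of why "sparse" reads like a property of the bitmap control itself,
here is a minimal user-space sketch of a per-control value check. The
bitmap_ctrl_sketch type and bitmap_value_valid() are hypothetical, and rejecting an
empty mask is an assumption of this sketch; the real resctrl checks cover more than
is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-control bitmap properties. */
struct bitmap_ctrl_sketch {
	unsigned int mask_len;	/* number of valid bits, 1..32 */
	bool sparse_ok;		/* would be exposed to user space as a flag */
};

static bool bitmap_value_valid(const struct bitmap_ctrl_sketch *c, uint32_t val)
{
	uint32_t limit, lowest;

	assert(c->mask_len >= 1 && c->mask_len <= 32);
	limit = c->mask_len == 32 ? UINT32_MAX
				  : (UINT32_C(1) << c->mask_len) - 1;

	/* Assumption: an empty mask and bits beyond mask_len are rejected. */
	if (val == 0 || (val & ~limit))
		return false;
	if (c->sparse_ok)
		return true;

	/*
	 * Contiguity test: adding the lowest set bit to a single run of ones
	 * carries out of the run, so the sum shares no bits with val.
	 */
	lowest = val & (~val + 1);
	return ((val + lowest) & val) == 0;
}
```

With the flag stored per control, two bitmap controls of one resource could differ in
what they accept without any change to the interface seen by user space.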
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-05-29 18:06 [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept Reinette Chatre
2026-06-02 20:23 ` Babu Moger
2026-06-02 23:32 ` Chen, Yu C
@ 2026-06-03 15:15 ` Ben Horgan
2026-06-03 19:34 ` Drew Fustini
2026-06-04 17:43 ` Reinette Chatre
2026-06-03 18:46 ` Luck, Tony
2026-06-03 22:14 ` Drew Fustini
4 siblings, 2 replies; 30+ messages in thread
From: Ben Horgan @ 2026-06-03 15:15 UTC (permalink / raw)
To: Reinette Chatre, Tony Luck, James Morse, Dave Martin, Babu Moger,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 5/29/26 19:06, Reinette Chatre wrote:
> Hi Everybody,
>
> It has been a while since we discussed the resctrl changes required to support
> hardware that has controls with fine granularity or hardware that has multiple
> controls per resource. For reference, the most recent email discussion can
> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>
> I created a PoC that I believe supports what folks have agreed to so far. I
> hope this can help us to restart the discussion with the goal that resctrl gains
> support for upcoming hardware that require these features.
Thank you very much for doing this work. I believe this will be very useful for
MPAM and other architectures.
>
> Request regarding this PoC
> ==========================
>
> Please consider this PoC as a "direction check" on the schema description and multiple
> control discussions held thus far.
>
> Could folks working on enabling new hardware requiring this capability please consider
> if this is something you can build on and how it should be improved to support these
> upcoming capabilities?
>
> Opens
> =====
>
> While the PoC aims to support what folks agreed on some opens remain:
> - I attempted to make some MPAM supporting changes but these are all just compile
> tested. While MPAM should benefit from the new control properties I did not
> initialize them on MPAM and did not attempt refactor to separate out
> the architecture specific control properties (more on what this means later).
> I did attempt some MPAM refactoring that duplicates the MPAM domain to the
> control domain and monitoring domain lists in support of there being multiple
> controls each with its own list of control domains but it is definitely not good
> design.
I appreciate you including MPAM in this PoC. With this one line change I was
able to boot an MPAM system and mount resctrl which appears to behave correctly.
We can consider how best to do the code design later.
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1697,6 +1697,8 @@ int mpam_resctrl_setup(void)
 	/* Initialise the resctrl structures from the classes */
 	for_each_mpam_resctrl_control(res, rid) {
+		INIT_LIST_HEAD(&res->resctrl_res.controls); // list_empty needs to work
+
 		if (!res->class)
 			continue; // dummy resource
I plumbed in support for the MB_MIN resource schema which also works under light
testing. The only fs resctrl code change I needed was:
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
 	case RESCTRL_CTRL_BITMAP:
 		return BIT_MASK(ctrl->cache.cbm_len) - 1;
 	case RESCTRL_CTRL_SCALAR:
+		if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
+			return ctrl->membw.min_bw;
+
 		return ctrl->membw.max_bw;
 	}
At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
as the maximum bandwidth controls only take effect if their value is higher than
the minimum bandwidth value. I have specialised this on the ctrl->name which
breaks your ctrl->type based classification but that's fixable by just adding a
default field to membw.
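To make the "default field" idea concrete, here is a rough standalone sketch. It is not the PoC's code: the structures are pared down to the members used here, and `default_bw` is a hypothetical member name chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the PoC types; only the members used here. */
enum resctrl_ctrl_type { RESCTRL_CTRL_BITMAP, RESCTRL_CTRL_SCALAR };

struct resctrl_cache {
	unsigned int cbm_len;
};

struct resctrl_membw {
	uint32_t min_bw;
	uint32_t max_bw;
	uint32_t default_bw;	/* hypothetical: value a new group starts with */
};

struct resctrl_ctrl {
	enum resctrl_ctrl_type type;
	union {
		struct resctrl_cache cache;
		struct resctrl_membw membw;
	};
};

/* The default is picked by control type only, with no check on the name. */
static uint32_t get_default_ctrlval(const struct resctrl_ctrl *ctrl)
{
	switch (ctrl->type) {
	case RESCTRL_CTRL_BITMAP:
		assert(ctrl->cache.cbm_len > 0 && ctrl->cache.cbm_len <= 32);
		return (uint32_t)((1ULL << ctrl->cache.cbm_len) - 1);
	case RESCTRL_CTRL_SCALAR:
		return ctrl->membw.default_bw;
	}
	return 0;
}
```

With something like this, the architecture would set `default_bw` to 0 for an MPAM minimum control and to `max_bw` for a maximum control when it initialises the control, and the fs code stays keyed on the type.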
> - No support for emulated controls (yet). The PoC is quite large already
> but I think it can be used as a base for emulated controls for which the software
> controller could be a potential first customer. In this PoC mounting with
> software controller will still display the original controller's properties.
> - One open that needs to be addressed as part of support for emulated controls is
> how best to display emulation relationship via resctrl hierarchy.
What does emulated controls mean here? Is there some previous discussion you
could point me at?
> - No support for "read-modify-write" usage of schemata file. This is where we
> discussed (without agreement) on possibly introducing the "#" prefix to schemata
> file entries. This PoC does not support this prefix and the current assumption/expectation
> is that when user space changes a configuration only the new control values are
> written to schemata file. I thus do not have a plan to support this so please
> share opinions in this regard if you have some.
There is now less motivation from the MPAM side for this than when this was
initially discussed. In pre-upstream versions of the MPAM patches a change in
the MB resource control value would change both the mpam h/w mbw_min and mbw_max
values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
However, it would be useful not to be limited by percentages. In my quick
experimentation with your patches I used a percentage value for MB_MIN but it
would be best to move away from this. For new controls I think we can mandate
that user space has to discover the resolution from the info directory, but how
can we retrofit this? For MPAM, MB and MB_MAX would control the same thing.
Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
MB_MAX in MB and vice versa, with MB taking precedence if both are set? Old
software can continue setting MB and new software can move to MB_MAX and take advantage of
the improved control. (I don't think we should expose the MPAM hardware value
directly as it has confusion over whether all 1s is 100% or not and we'd like to
have something generic and friendly to the user.)
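A minimal sketch of how MB and MB_MAX could mirror each other, assuming a single stored limit kept on the finer scale with MB as a rounded percentage view of it. The `mb_limit` structure, the `resolution` member and the function names are all hypothetical and not taken from the PoC.

```c
#include <assert.h>
#include <stdint.h>

/* One stored limit on the fine scale; MB is a percentage view of it. */
struct mb_limit {
	uint32_t resolution;	/* hypothetical: fine-scale value meaning 100% */
	uint32_t fine;		/* current limit, 0..resolution */
};

/* Write through MB_MAX: the value is already on the fine scale. */
static void write_mb_max(struct mb_limit *l, uint32_t fine)
{
	assert(fine <= l->resolution);
	l->fine = fine;
}

/* Write through legacy MB: convert the percentage, rounding to nearest. */
static void write_mb(struct mb_limit *l, uint32_t pct)
{
	assert(pct <= 100);
	l->fine = (uint32_t)(((uint64_t)pct * l->resolution + 50) / 100);
}

/* Read through legacy MB: round the stored value to the nearest percent. */
static uint32_t read_mb(const struct mb_limit *l)
{
	assert(l->resolution > 0);
	return (uint32_t)(((uint64_t)l->fine * 100 + l->resolution / 2) /
			  l->resolution);
}
```

If a single schemata write carried both entries, applying `write_mb_max()` first and `write_mb()` last would give MB precedence. Note that a read of MB after a fine-grained MB_MAX write is lossy, which old software would have to tolerate.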
> - Controls are independent for now. This means that, for example, if a resource
> supports a "MIN" and "MAX" control then this implementation would allow user to
> set the "maximum" control values to be less than the "minimum" control values.
I think this is ok as long as adding support for new controls in resctrl doesn't
change the existing behaviour. In MPAM we dodged this by introducing MB as only
affecting the h/w mbw_max and not mbw_min (as mentioned above).
> - PoC supports the "bitmap" control but does not (yet) expose properties of a bitmap
> control to the new info/<resource>/resource_schemata directory.
>
> Accessing PoC
> =============
>
> Please consider the PoC as a rough draft. It has only been compile tested for Arm
> and known to be incomplete in Arm support. To help with experimenting I only
> fully adapted the Intel MBA resource to demo two dummy additional MBA controls.
> All architectures should immediately benefit from the new schema descriptions
> and new info/MB/resource_schemata hierarchy.
>
> I considered the patches self too many for email. Instead, the PoC can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/reinette/linux.git branch resctrl/controls_rfc_v1
>
> The work is based on v7.1-rc2 that also includes the following series (two of which has
> since been queued) included:
>
> "selftests/resctrl: Fixes and improvements focused on Intel platforms"
> https://lore.kernel.org/lkml/cover.1775266384.git.reinette.chatre@intel.com/
>
> "x86,fs/resctrl: Improve resctrl quality and consistency"
> https://lore.kernel.org/lkml/cover.1777419024.git.reinette.chatre@intel.com/
>
> "x86,fs/resctrl: Pave the way for MPAM counter assignment"
> https://lore.kernel.org/lkml/20260506082855.3694761-1-ben.horgan@arm.com/
>
>
> Primary resctrl fs data structure changes
> =========================================
>
> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
> the changes easier to follow I kept some of the original names to help communicate
> where familiar data structures land.
>
> What to notice about a control is that it has some common properties required
> from all controls (scope, type, etc.) and then depending on the type of control
> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>
> /**
> * struct resctrl_ctrl - A resource control
> * @entry: List entry of rdt_resource::controls
> * @scope: Scope of the resource that this control allocates
> * @domains: RCU list of all control domains
> * @type: The control type that determines the properties of the control,
> * format string for displaying control values to user space, and
> * parser of control values provided by user space.
> * @name: Name of the control. Appended to final resource name
> * (rdt_resource_final::name) to create final schema entry.
> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
> * For example, with resource name "MB" and control name "MAX" the
> * schema entry will be "MB_MAX".
> * @cache: Cache allocation control properties.
> * @membw: Bandwidth control properties.
> */
> struct resctrl_ctrl {
> struct list_head entry;
> enum resctrl_scope scope;
> struct list_head domains;
> enum resctrl_ctrl_type type;
> enum resctrl_ctrl_name name;
> union {
> struct resctrl_cache cache;
> struct resctrl_membw membw;
> };
> };
>
> Two members summarize how this new structure fits into the rest of resctrl:
> a) resctrl_ctrl::entry
> Since a resource can support multiple controls there is a new list
> in struct rdt_resource named "controls" that contains the list of all
> controls supported by the resource.
> b) resctrl_ctrl::domains
> Instead of the list of control domains belonging to a resource they
> now belong to the control self. By doing so resctrl can support resource
> controls at different scope for the same resource. This is intended to
> support some upcoming MPAM and RISC-V usages.
Please can you expand a bit on part b).
In an MPAM system we consider 3 resctrl resources, RDT_RESOURCE_L3,
RDT_RESOURCE_L2 and RDT_RESOURCE_MBA which correspond to the L3 caches, L2
caches and memory bandwidth on egress from the L3 caches. The domain for each of
these corresponds to the instance of the resource. That is, for RDT_RESOURCE_L2
there is a domain for each L2 instance, similarly for L3, and for
RDT_RESOURCE_MBA there is a domain for each L3 cache. If we were to add support
for controls on a new cache level, say the L4, then I'd expect to add a new
resource. For memory bandwidth, we'd like to be able to control b/w on the L2
egress (e.g. in a DSU). Wouldn't this too be a separate resource or would this
be a new set of controls on the same resource?
New controls on the same resource
MB_MIN2
MB_MAX2
MB_PROP2
...
or
MB2_MIN
MB2_MAX
MB2_PROP
AFAIK, the DSU h/w just supports proportional bandwidth controls at the moment
but we should consider what to do about the potential naming.
In the MPAM driver, we collect MSC into components (based on instances) and
those into classes (components of the same type). Currently, a resource is
mapped to a single class. (Two resources may map to the same class.)
I expect it is useful in the memory region and sub numa cases but I'd still
expect the common case to be that the domains are the same within a control. Or
am I missing something?
>
> Example architectural data structure changes
> ============================================
>
> An architecture can use the new control by following a similar pattern to
> resource and domain use by architectures. Consider the following for x86
> where a new architecture specific struct resctrl_hw_ctrl includes
> struct resctrl_ctrl and any architecture private data needed to support
> the control:
>
> /*
> * struct resctrl_hw_ctrl - Arch private properties of a resource control
> * @r_ctrl: Control properties exposed to resctrl file system
> * @msr_base: Base MSR address where control values should be programmed
> * @msr_update: Function pointer to update control values
> */
> struct resctrl_hw_ctrl {
> struct resctrl_ctrl r_ctrl;
> unsigned int msr_base;
> void (*msr_update)(struct msr_param *m);
> };
>
> Structure of patch series
> =========================
>
> As a PoC the series is not perfectly structured but to help navigate this work
> on a high level the changes can be categorized as follows:
>
> Patch 1 to 11:
> With a vision of what a "control" is, remove unused/unnecessary
> members, make clear what is a *resource* property vs a *control*
> property, do some renaming to help with the PoC.
A few of the changes are generic cleanup and could hopefully be dealt with
before decisions on the larger PoC are made. I see:
fs/resctrl: Remove unused resctrl_membw::mb_map
x86,fs/resctrl: Remove "arch_needs_linear"
Perhaps a few more.
>
> Patch 12:
> Introduce struct resctrl_ctrl and re-arrange existing struct rdt_resource
> members to form part of new rdt_resource::ctrl
>
> Patch 13 to 44:
> A lot of wrangling to introduce struct resctrl_ctrl to all code that needs
> to work with a control and/or domain without assuming that the control is
> the one and only control embedded in the resource it belongs to. Essentially,
> a lot of changes passing the control around in addition to the resource/domain.
You mention a few times in the commit message that you expect the cache
resources to only have one control. On MPAM we have CMAX (and there looks to be
a RISC-V equivalent) where the total number of bytes in the cache for a given
closid is limited. The allocation must still respect the CPBM bitmap though.
Looking at the code though I don't see much problem in adding this as an
additional control. The assumption that these patches are making is not that
there is only one control for cache resources but rather that cache portions are
managed by the default cache resource control. Am I missing something or does
that assessment make sense to you?
I have been looking at adding CMAX control to resctrl and will have a go at
basing what I have so far on top of this series.
>
> Patch 45:
> Switch the single struct resctrl_ctrl member of struct rdt_resource to be
> a list of struct resctrl_ctrl.
>
> Patch 47 to 49:
> Introduce new info/<resource>/resource_schemata hierarchy to first only
> consist of properties already known to resctrl fs.
>
> Patch 50 to 52:
> Introduce the new control properties per [1], initialize them for x86,
> and expose them via info/<resource>/resource_schemata
>
> Patch 53:
> Let the new struct resctrl_hw_ctrl contain architecture's control properties.
>
> Patch 54:
> Teach resctrl fs about "MIN" and "MAX" controls.
>
> Patch 55:
> Sample of "MIN" and "MAX" memory bandwidth controls for x86.
>
> Example interactions
> ====================
>
> This series can be used on an x86 system where it will show two new dummy controls
> where it is possible to interact with the new controls.
> For example:
>
> # cat schemata
> MB_MAX:0=100;1=100
> MB_MIN:0=100;1=100
> MB:0=100;1=100
> L3:0=fff;1=fff
> # echo 'MB_MIN:0=50' > schemata
> # cat schemata
> MB_MAX:0=100;1=100
> MB_MIN:0=50;1=100
> MB:0=100;1=100
> L3:0=fff;1=fff
>
> Writing to the dummy control will call a dummy callback that just prints to the
> kernel log:
> "resctrl: Updata temporary MIN control on domain 0 with user value 50"
>
>
> Example output of info/MB/:
> /sys/fs/resctrl/info/MB/thread_throttle_mode:max
> /sys/fs/resctrl/info/MB/num_closids:15
> /sys/fs/resctrl/info/MB/delay_linear:1
> /sys/fs/resctrl/info/MB/min_bandwidth:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB/resolution:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB/tolerance:5
> /sys/fs/resctrl/info/MB/resource_schemata/MB/type:scalar
> /sys/fs/resctrl/info/MB/resource_schemata/MB/min:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB/scale:1
> /sys/fs/resctrl/info/MB/resource_schemata/MB/scope:L3
> /sys/fs/resctrl/info/MB/resource_schemata/MB/unit:all
> /sys/fs/resctrl/info/MB/resource_schemata/MB/max:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/resolution:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/tolerance:5
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/type:scalar
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/min:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/scale:1
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/scope:L3
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/unit:all
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MIN/max:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/resolution:100
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/tolerance:5
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/type:scalar
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/min:10
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/scale:1
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/scope:L3
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/unit:all
> /sys/fs/resctrl/info/MB/resource_schemata/MB_MAX/max:100
> /sys/fs/resctrl/info/MB/bandwidth_gran:10
>
> Any feedback is appreciated.
Overall, this looks to be a big step in the right direction.
Thanks,
Ben
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/aPtfMFfLV1l%2FRB0L@e133380.arm.com/
> [2] https://lpc.events/event/19/contributions/2093/attachments/1958/4172/resctrl%20Microconference%20LPC%202025%20Tokyo.pdf
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 15:15 ` Ben Horgan
@ 2026-06-03 19:34 ` Drew Fustini
2026-06-04 11:24 ` Ben Horgan
2026-06-04 21:05 ` Reinette Chatre
2026-06-04 17:43 ` Reinette Chatre
1 sibling, 2 replies; 30+ messages in thread
From: Drew Fustini @ 2026-06-03 19:34 UTC (permalink / raw)
To: Ben Horgan
Cc: Reinette Chatre, Tony Luck, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Wed, Jun 03, 2026 at 04:15:51PM +0100, Ben Horgan wrote:
> Hi Reinette,
>
> On 5/29/26 19:06, Reinette Chatre wrote:
> > Hi Everybody,
> >
> > It has been a while since we discussed the resctrl changes required to support
> > hardware that has controls with fine granularity or hardware that has multiple
> > controls per resource. For reference, the most recent email discussion can
> > be found at [1] with a summary of discussions in last year's plumbers slides [2].
> >
> > I created a PoC that I believe supports what folks have agreed to so far. I
> > hope this can help us to restart the discussion with the goal that resctrl gains
> > support for upcoming hardware that require these features.
>
> Thank you very much for doing this work. I believe this will be very useful for
> MPAM and other architectures.
Yes, thanks to Reinette for working on the generic schema proof of
concept. This will be helpful for supporting the RISC-V CBQRI (capacity
and bandwidth QoS) spec.
> I plumbed in support for the MB_MIN resource schema which also works under light
> testing. The only fs resctrl code change I needed was:
>
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct
> resctrl_ctrl *ctrl)
> case RESCTRL_CTRL_BITMAP:
> return BIT_MASK(ctrl->cache.cbm_len) - 1;
> case RESCTRL_CTRL_SCALAR:
> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
> + return ctrl->membw.min_bw;
> +
> return ctrl->membw.max_bw;
> }
>
>
> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
> as the maximum bandwidth controls only take effect if their value is higher than
> the minimum bandwidth value. I have specialised this on the ctrl->name which
> breaks your ctrl->type based classification but that's fixable by just adding a
> default field to membw.
This should be useful for RISC-V.
RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
blocks). The sum of Rbwb across all control groups must be less than
MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
needs to default to 1 so that the sum does not violate that rule. In my
RFC series, I added default_to_min to resctrl_membw [1] but this
solution looks cleaner.
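The Rbwb constraint described above could be checked along these lines before a new MB_MIN value is accepted. This is only a sketch: `rbwb_update_ok()` is a hypothetical helper, and the strict "less than" bound simply follows the wording above rather than a reading of the CBQRI spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Would setting group idx to new_val keep the total reservation valid?
 * rbwb[] holds the current reservation of every control group.
 */
static bool rbwb_update_ok(const uint32_t *rbwb, size_t ngroups, size_t idx,
			   uint32_t new_val, uint32_t mrbwb)
{
	uint64_t sum = 0;
	size_t i;

	assert(idx < ngroups);

	/* Each group must reserve at least one block. */
	if (new_val < 1)
		return false;

	for (i = 0; i < ngroups; i++)
		sum += (i == idx) ? new_val : rbwb[i];

	/* Assumption: bound is strict, per the description above. */
	return sum < mrbwb;
}
```

Defaulting every group to 1 keeps the starting sum at the number of groups, which leaves the most room for later reservations.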
> > - No support for "read-modify-write" usage of schemata file. This is where we
> > discussed (without agreement) on possibly introducing the "#" prefix to schemata
> > file entries. This PoC does not support this prefix and the current assumption/expectation
> > is that when user space changes a configuration only the new control values are
> > written to schemata file. I thus do not have a plan to support this so please
> > share opinions in this regard if you have some.
>
> There is now less motivation from the MPAM side for this than when this was
> initially discussed. In pre-upstream versions of the MPAM patches a change in
> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>
> However, it would be useful not to be limited by percentages. In my quick
> experimentation with your patches I used a percentage value for MB_MIN but it
> would be best to move away from this. For new controls I think we can mandate
> that user space has to discover the resolution from the info directory, but how
> can we retrofit this? For MPAM, MB and MB_MAX would control the same thing.
> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
> MB_MAX in MB and vice versa, with MB taking precedence if both are set? Old
> software can continue setting MB and new software can move to MB_MAX and take advantage of
> the improved control. (I don't think we should expose the MPAM hardware value
> directly as it has confusion over whether all 1s is 100% or not and we'd like to
> have something generic and friendly to the user.)
The facility for non-percentage values is important for RISC-V as CBQRI does
not include a percentage throttle. It has two controls for bandwidth:
- Rbwb: number of reserved bandwidth blocks [1, 2^13]
- Mweight: weighted share of the remaining bandwidth [0, 255]
  - 0: disables work-conserving sharing
  - 1..255: compete for the leftover pool
  - It makes sense for it to default to max (255) so that there won't be
    any unused bandwidth
I think Mweight could be aligned with MPAM's proportional stride.
Here is the patch I created to add Mweight support:
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index d95ab8ad36e2..3537071e3ab0 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -304,6 +304,7 @@ static const char * const resctrl_ctrl_name[] = {
 	[RESCTRL_CTRL_NAME_DEF] = "",
 	[RESCTRL_CTRL_NAME_MIN] = "MIN",
 	[RESCTRL_CTRL_NAME_MAX] = "MAX",
+	[RESCTRL_CTRL_NAME_WGHT] = "WGHT",
 };
 const char *resctrl_ctrl_name_str(enum resctrl_ctrl_name name)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 72fb7256270e..09efcef9ce66 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -348,12 +348,14 @@ struct resctrl_mon {
  * has the same name as the resource.
  * @RESCTRL_CTRL_NAME_MIN: "MIN"
  * @RESCTRL_CTRL_NAME_MAX: "MAX"
+ * @RESCTRL_CTRL_NAME_WGHT: "WGHT"
  */
 enum resctrl_ctrl_name {
 	RESCTRL_CTRL_NAME_DEF,
 	RESCTRL_CTRL_NAME_MIN,
 	RESCTRL_CTRL_NAME_MAX,
-	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_MAX
+	RESCTRL_CTRL_NAME_WGHT,
+	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
 };
> > - Controls are independent for now. This means that, for example, if a resource
> > supports a "MIN" and "MAX" control then this implementation would allow user to
> > set the "maximum" control values to be less than the "minimum" control values.
>
> I think this is ok as long as adding support for new controls in resctrl doesn't
> change the existing behaviour. In MPAM we dodged this by introducing MB as only
> affecting the h/w mbw_max and not mbw_min (as mentioned above).
There is no equivalent to MB (percentage throttle) in RISC-V so I would
want it to be valid to have MB_MIN (minimum reservation) without MB.
I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
was able to validate that it works okay in QEMU:
MB_WGHT:72=255
MB_MIN:72=756
L2:64=fff;65=fff
L3:75=ffff
Thanks,
Drew
[1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-6-baf00f50028a@kernel.org/
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 19:34 ` Drew Fustini
@ 2026-06-04 11:24 ` Ben Horgan
2026-06-04 17:38 ` Drew Fustini
2026-06-04 21:05 ` Reinette Chatre
1 sibling, 1 reply; 30+ messages in thread
From: Ben Horgan @ 2026-06-04 11:24 UTC (permalink / raw)
To: Drew Fustini
Cc: Reinette Chatre, Tony Luck, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
Hi Drew,
On 6/3/26 20:34, Drew Fustini wrote:
> On Wed, Jun 03, 2026 at 04:15:51PM +0100, Ben Horgan wrote:
>> Hi Reinette,
>>
>> On 5/29/26 19:06, Reinette Chatre wrote:
>>> Hi Everybody,
>>>
>>> It has been a while since we discussed the resctrl changes required to support
>>> hardware that has controls with fine granularity or hardware that has multiple
>>> controls per resource. For reference, the most recent email discussion can
>>> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>>>
>>> I created a PoC that I believe supports what folks have agreed to so far. I
>>> hope this can help us to restart the discussion with the goal that resctrl gains
>>> support for upcoming hardware that require these features.
>>
>> Thank you very much for doing this work. I believe this will be very useful for
>> MPAM and other architectures.
>
> Yes, thanks to Reinette for working on the generic schema proof of
> concept. This will be helpful for supporting the RISC-V CBQRI (capacity
> and bandwidth QoS) spec.
>
>> I plumbed in support for the MB_MIN resource schema which also works under light
>> testing. The only fs resctrl code change I needed was:
>>
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct
>> resctrl_ctrl *ctrl)
>> case RESCTRL_CTRL_BITMAP:
>> return BIT_MASK(ctrl->cache.cbm_len) - 1;
>> case RESCTRL_CTRL_SCALAR:
>> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>> + return ctrl->membw.min_bw;
>> +
>> return ctrl->membw.max_bw;
>> }
>>
>>
>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>> as the maximum bandwidth controls only take effect if their value is higher than
>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>> breaks your ctrl->type based classification but that's fixable by just adding a
>> default field to membw.
>
> This should be useful for RISC-V.
>
> RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
> blocks). The sum of Rbwb across all control groups must be less than
> MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
> needs to default to 1 so that the sum does not violate that rule. In my
> RFC series, I added default_to_min to resctrl_membw [1] but this
> solution looks cleaner.
>
>>> - No support for "read-modify-write" usage of schemata file. This is where we
>>> discussed (without agreement) on possibly introducing the "#" prefix to schemata
>>> file entries. This PoC does not support this prefix and the current assumption/expectation
>>> is that when user space changes a configuration only the new control values are
>>> written to schemata file. I thus do not have a plan to support this so please
>>> share opinions in this regard if you have some.
>>
>> There is now less motivation from the MPAM side for this than when this was
>> initially discussed. In pre-upstream versions of the MPAM patches a change in
>> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
>> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>>
>> However, it would be useful not to be limited by percentages. In my quick
>> experimentation with your patches I used a percentage value for MB_MIN but it
>> would be best to move away from this. For new controls I think we can mandate
>> that user space has to discover the resolution from the info directory, but how
>> can we retrofit this? For MPAM, MB and MB_MAX would control the same thing.
>> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
>> MB_MAX in MB and vice versa, with MB taking precedence if both are set? Old
>> software can continue setting MB and new software can move to MB_MAX and take advantage of
>> the improved control. (I don't think we should expose the MPAM hardware value
>> directly as it has confusion over whether all 1s is 100% or not and we'd like to
>> have something generic and friendly to the user.)
>
> The facility for non-percentage values is important for RISC-V as CBQRI does
> not include a percentage throttle. It has two controls for bandwidth:
>
> - Rbwb: number of reserved bandwidth blocks [1, 2^13]
> - Mweight: weighted share of the remaining bandwidth [0, 255]
> - 0: disables work-conserving sharing
> - 1..255: compete for the leftover pool
> - It makes sense for it to default to max (255) so that there won't be
> any unused bandwidth
>
> I think Mweight could be aligned with MPAM's proportional stride.
Yes, I hope so. There are a few differences which would have to be considered.
MPAM doesn't have a concept of only applying the weights once reserved min
bandwidth is consumed. The interaction with min bandwidth is currently
unspecified. I don't think there are any designs where proportional bandwidth
and min b/w are on the same component and so it's only a theoretical/future problem.
For MPAM proportional stride, the higher the stride the lower the weight. We'll
have to make sure that whatever user configuration scale we provide works well
for both. If two PARTIDs have stride 2x and a third x then the two PARTIDs with
stride 2x together get the same bandwidth as the third. Whereas, to get the same
in RISC-V the two PARTIDs would have weight y and the third 2y.
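The stride versus weight example can be worked through with a small sketch. It assumes, as the example above implies, that a group's share is proportional to 1/stride under stride control and proportional to its weight under weight control; the function names are made up for illustration and this is not a model of either spec.

```c
#include <assert.h>
#include <stddef.h>

/* Share of group idx when bandwidth is split in proportion to weight. */
static double share_from_weights(const unsigned int *weight, size_t n,
				 size_t idx)
{
	double total = 0.0;
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++)
		total += weight[i];
	assert(total > 0.0);

	return weight[idx] / total;
}

/* Share of group idx when a higher stride means a lower weight. */
static double share_from_strides(const unsigned int *stride, size_t n,
				 size_t idx)
{
	double total = 0.0;
	size_t i;

	assert(idx < n);
	for (i = 0; i < n; i++) {
		assert(stride[i] > 0);
		total += 1.0 / stride[i];
	}

	return (1.0 / stride[idx]) / total;
}
```

With strides {2, 2, 1} the shares come out as 1/4, 1/4 and 1/2, and weights {1, 1, 2} give the same split, so a common user-facing scale would need to invert one of the two.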
It's not specified for MPAM exactly what happens when you disable proportional
stride for a given PARTID.
The MPAM proportional control is work-conserving (the table in B.b RWKZBJ has
been confirmed as a spec mistake) and only corresponds to the current contenders
for bandwidth. From my reading of the CBQRI spec this is the same for RISC-V.
Thanks,
Ben
>
> Here is the patch I created to add Mweight support:
>
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index d95ab8ad36e2..3537071e3ab0 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -304,6 +304,7 @@ static const char * const resctrl_ctrl_name[] = {
> [RESCTRL_CTRL_NAME_DEF] = "",
> [RESCTRL_CTRL_NAME_MIN] = "MIN",
> [RESCTRL_CTRL_NAME_MAX] = "MAX",
> + [RESCTRL_CTRL_NAME_WGHT] = "WGHT",
> };
>
> const char *resctrl_ctrl_name_str(enum resctrl_ctrl_name name)
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 72fb7256270e..09efcef9ce66 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -348,12 +348,14 @@ struct resctrl_mon {
> * has the same name as the resource.
> * @RESCTRL_CTRL_NAME_MIN: "MIN"
> * @RESCTRL_CTRL_NAME_MAX: "MAX"
> + * @RESCTRL_CTRL_NAME_WGHT: "WGHT"
> */
> enum resctrl_ctrl_name {
> RESCTRL_CTRL_NAME_DEF,
> RESCTRL_CTRL_NAME_MIN,
> RESCTRL_CTRL_NAME_MAX,
> - RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_MAX
> + RESCTRL_CTRL_NAME_WGHT,
> + RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
> };
>
>>> - Controls are independent for now. This means that, for example, if a resource
>>> supports a "MIN" and "MAX" control then this implementation would allow user to
>>> set the "maximum" control values to be less than the "minimum" control values.
>>
>> I think this is ok as long as adding support for new controls in resctrl doesn't
>> change the existing behaviour. In MPAM we dodged this by introducing MB as only
>> affecting the h/w mbw_max and not mbw_min (as mentioned above).
>
> There is no equivalent to MB (percentage throttle) in RISC-V so I would
> want it to be valid to have MB_MIN (minimum reservation) without MB.
>
> I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
> was able to validate it works okay in Qemu:
>
> MB_WGHT:72=255
> MB_MIN:72=756
> L2:64=fff;65=fff
> L3:75=ffff
>
> Thanks,
> Drew
>
> [1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-6-baf00f50028a@kernel.org/
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-04 11:24 ` Ben Horgan
@ 2026-06-04 17:38 ` Drew Fustini
0 siblings, 0 replies; 30+ messages in thread
From: Drew Fustini @ 2026-06-04 17:38 UTC (permalink / raw)
To: Ben Horgan
Cc: Reinette Chatre, Tony Luck, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Thu, Jun 04, 2026 at 12:24:57PM +0100, Ben Horgan wrote:
> > The facility for non-percentage values is important for RISC-V as CBQRI does
> > not include percentage throttle. It has two controls for bandwidth:
> >
> > - Rbwb: number of reserved bandwidth blocks [1, 2^13]
> > - Mweight: weighted share of the remaining bandwidth [0, 255]
> > - 0: disables work-conserving sharing
> > - 1..255: compete for the leftover pool
> > - It makes sense for it to default to max (255) so that there won't be
> > any unused bandwidth
> >
> > I think Mweight could be aligned with MPAM's proportional stride.
>
> Yes, I hope so. There are a few differences which would have to be considered.
>
> MPAM doesn't have a concept of only applying the weights once reserved min
> bandwidth is consumed. The interaction with min bandwidth is currently
> unspecified. I don't think there are any designs where proportional bandwidth
> and min b/w are on the same component and so it's only a theoretical/future problem.
>
> For MPAM proportional stride, the higher the stride the lower the weight. We'll
> have to make sure that whatever user configuration scale we provide works well
> for both. If two PARTIDs have stride 2x and a third x then the two PARTIDs with
> stride 2x together get the same bandwidth as the third. Whereas, to get the same
> in RISC-V the two PARTIDs would have weight y and the third 2y.
Thanks for explaining. I had been thinking it would be best to try to
share a control for stride and weight, but now I am wondering if those
should just be considered separate controls.
It seems like they are similar enough to be able to convert to a common
scale but maybe that would be too confusing for the user. I guess it is
a matter of how much userspace needs or wants to be aware of the
difference between systems with MPAM and CBQRI.
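To make the inverse relationship concrete, here is a minimal user-space sketch in C. It is not taken from the PoC or from either spec. It assumes a simplified model in which every group contends at the same time, a CBQRI-style share is proportional to the weight, and an MPAM-style share is proportional to 1/stride, as in Ben's example above. The function names are hypothetical and shares are reported in parts per thousand.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: share of the contended bandwidth for group i when
 * the input is a weight (higher weight means a larger share).
 * Result is in parts per thousand, rounded down.
 */
unsigned int share_from_weights(const unsigned int *weight, size_t n, size_t i)
{
	unsigned long long total = 0;
	size_t k;

	assert(i < n);
	for (k = 0; k < n; k++)
		total += weight[k];
	if (total == 0)
		return 0;	/* nobody competes for the shared pool */
	return (unsigned int)(1000ULL * weight[i] / total);
}

/*
 * Hypothetical helper: same question when the input is a stride
 * (higher stride means a smaller share). Assumes share ~ 1/stride and
 * that every stride is non-zero. Result is rounded to the nearest unit.
 */
unsigned int share_from_strides(const unsigned int *stride, size_t n, size_t i)
{
	double total = 0.0;
	size_t k;

	assert(i < n);
	for (k = 0; k < n; k++) {
		assert(stride[k] > 0);
		total += 1.0 / stride[k];
	}
	return (unsigned int)(1000.0 * (1.0 / stride[i]) / total + 0.5);
}
```

With strides {2, 2, 1} and weights {1, 1, 2} both helpers give the first two groups a quarter each and the third group a half, which is the equivalence described in the quoted text. A common user-visible scale would have to pick one direction and invert the other.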
Alternatively, it just occurred to me that Mweight could be mapped to MB.
I think Mweight could be thought of in the context of throttling: all
groups start with the max of 255 which can be represented as 100%.
> It's not specified for MPAM exactly what happens when you disable proportional
> stride for a given PARTID.
>
> The MPAM proportional control is work-conserving (the table in B.b RWKZBJ has
> been confirmed as a spec mistake) and only corresponds to the current contenders
> for bandwidth. From my reading of the CBQRI spec this is the same for RISC-V.
Yes, I think they are the same. The only exception is that Mweight of 0
means no shared bandwidth and that the group is restricted to just its
reserved bandwidth blocks (e.g. MB_MIN).
Thanks,
Drew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 19:34 ` Drew Fustini
2026-06-04 11:24 ` Ben Horgan
@ 2026-06-04 21:05 ` Reinette Chatre
2026-06-05 19:35 ` Drew Fustini
1 sibling, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-04 21:05 UTC (permalink / raw)
To: Drew Fustini, Ben Horgan
Cc: Tony Luck, James Morse, Dave Martin, Babu Moger, Fenghua Yu,
Chen Yu, Borislav Petkov, Thomas Gleixner, Dave Hansen,
Peter Newman, x86@kernel.org, linux-kernel@vger.kernel.org
Hi Drew,
On 6/3/26 12:34 PM, Drew Fustini wrote:
> On Wed, Jun 03, 2026 at 04:15:51PM +0100, Ben Horgan wrote:
>> Hi Reinette,
>>
>> On 5/29/26 19:06, Reinette Chatre wrote:
>>> Hi Everybody,
>>>
>>> It has been a while since we discussed the resctrl changes required to support
>>> hardware that has controls with fine granularity or hardware that has multiple
>>> controls per resource. For reference, the most recent email discussion can
>>> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>>>
>>> I created a PoC that I believe supports what folks have agreed to so far. I
>>> hope this can help us to restart the discussion with the goal that resctrl gains
>>> support for upcoming hardware that require these features.
>>
>> Thank you very much for doing this work. I believe this will be very useful for
>> MPAM and other architectures.
>
> Yes, thanks to Reinette for working on the generic schema proof of
> concept. This will be helpful for supporting the RISC-V CBQRI (capacity
> and bandwidth QoS) spec.
Thank you very much for considering this work.
>
>> I plumbed in support for the MB_MIN resource schema which also works under light
>> testing. The only fs resctrl code change I needed was:
>>
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
>> case RESCTRL_CTRL_BITMAP:
>> return BIT_MASK(ctrl->cache.cbm_len) - 1;
>> case RESCTRL_CTRL_SCALAR:
>> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>> + return ctrl->membw.min_bw;
>> +
>> return ctrl->membw.max_bw;
>> }
>>
>>
>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>> as the maximum bandwidth controls only take effect if their value is higher than
>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>> breaks your ctrl->type based classification but that's fixable by just adding a
>> default field to membw.
>
> This should be useful for RISC-V.
>
> RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
> blocks). The sum of Rbwb across all control groups must be less than
> MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
> needs to default to 1 so that the sum does not violate that rule. In my
> RFC series, I added default_to_min to resctrl_membw [1] but this
> solution looks cleaner.
As I mentioned in response to Ben [2] there seems to be a mismatch between
architecture requirements here. resctrl uses the value returned by
resctrl_get_default_ctrlval() as the control value that means "no throttling".
For Intel this means min == max but this does not seem to be the case for MPAM
and CBQRI. I am not familiar enough with either to have an alternative proposal here
so I need to become familiar now. There is a bit of backlog on other resctrl
work right now so this will take me some time to sort out.
>
>>> - No support for "read-modify-write" usage of schemata file. This is where we
>>> discussed (without agreement) on possibly introducing the "#" prefix to schemata
>>> file entries. This PoC does not support this prefix and the current assumption/expectation
>>> is that when user space changes a configuration only the new control values are
>>> written to schemata file. I thus do not have a plan to support this so please
>>> share opinions in this regard if you have some.
>>
>> There is now less motivation from the MPAM side for this than when this was
>> initially discussed. In pre-upstream versions of the MPAM patches a change in
>> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
>> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>>
>> However, it would be useful not to be limited by percentages. In my quick
>> experimentation with your patches I used a percentage value for MB_MIN but it
>> would be best to move away from this. For new controls I think we can mandate
>> that user space has to discover the resolution from the info directly but how
>> can we retrofit this. For MPAM, MB and MB_MAX, would control the same things.
>> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
>> MB_MAX in MB and vice versa with MB taking precedence if both are set? Old
>> software can continue setting MB and new software can move to using MB_MAX and take advantage of
>> the improved control. (I don't think we should expose the MPAM hardware value
>> directly as it has confusion over whether all 1s is 100% or not and we'd like to
>> have something generic and friendly to the user.)
>
> The facility for non-percentage values is important for RISC-V as CBQRI does
> not include percentage throttle. It has two controls for bandwidth:
>
> - Rbwb: number of reserved bandwidth blocks [1, 2^13]
> - Mweight: weighted share of the remaining bandwidth [0, 255]
> - 0: disables work-conserving sharing
> - 1..255: compete for the leftover pool
> - It makes sense for it to default to max (255) so that there won't be
> any unused bandwidth
>
> I think Mweight could be aligned with MPAM's proportional stride.
>
> Here is the patch I created to add Mweight support:
>
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index d95ab8ad36e2..3537071e3ab0 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -304,6 +304,7 @@ static const char * const resctrl_ctrl_name[] = {
> [RESCTRL_CTRL_NAME_DEF] = "",
> [RESCTRL_CTRL_NAME_MIN] = "MIN",
> [RESCTRL_CTRL_NAME_MAX] = "MAX",
> + [RESCTRL_CTRL_NAME_WGHT] = "WGHT",
> };
>
> const char *resctrl_ctrl_name_str(enum resctrl_ctrl_name name)
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 72fb7256270e..09efcef9ce66 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -348,12 +348,14 @@ struct resctrl_mon {
> * has the same name as the resource.
> * @RESCTRL_CTRL_NAME_MIN: "MIN"
> * @RESCTRL_CTRL_NAME_MAX: "MAX"
> + * @RESCTRL_CTRL_NAME_WGHT: "WGHT"
> */
> enum resctrl_ctrl_name {
> RESCTRL_CTRL_NAME_DEF,
> RESCTRL_CTRL_NAME_MIN,
> RESCTRL_CTRL_NAME_MAX,
> - RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_MAX
> + RESCTRL_CTRL_NAME_WGHT,
> + RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
> };
>
>>> - Controls are independent for now. This means that, for example, if a resource
>>> supports a "MIN" and "MAX" control then this implementation would allow user to
>>> set the "maximum" control values to be less than the "minimum" control values.
>>
>> I think this is ok as long as adding support for new controls in resctrl doesn't
>> change the existing behaviour. In MPAM we dodged this by introducing MB as only
>> affecting the h/w mbw_max and not mbw_min (as mentioned above).
>
> There is no equivalent to MB (percentage throttle) in RISC-V so I would
> want it to be valid to have MB_MIN (minimum reservation) without MB.
>
> I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
> was able to validate it works okay in Qemu:
>
> MB_WGHT:72=255
> MB_MIN:72=756
> L2:64=fff;65=fff
> L3:75=ffff
Ideally any new support should not break existing user space and the existing
user interface expects a MB entry in the schemata file when the MB resource exists.
Is it possible to emulate the percentage based MB control with MB_WGHT or MB_MIN?
This sounds similar to what is/was planned for MPAM [2].
Something that may be of interest is a proposal that Chenyu is refining to address an
issue with the region-aware MBA support where there is no intuitive backward compatible
interface. This was highlighted in the plumbers slides (see slide titled "Open: maintaining
backward compatibility when region aware"). The current idea to deal with this is to
introduce a "mode" associated with the resource controls. For example,
# cat /sys/fs/resctrl/info/MB/resource_schemata/mode
[legacy] native
By default the "legacy" mode will be enabled and exposes the "MB" default control to user
space via the schemata file. In support of this each new control has a new property file
named "status" that can have value "enabled" or "disabled". Only "enabled" controls are
present in the schemata file but all controls are always present in the resource_schemata
directory. By writing to the "mode" file user space acknowledges familiarity with the new
"resource_schemata" based interface and can change the status of a control and
thus manage its visibility in the schemata file.
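The visibility rule can be sketched in a few lines of user-space C. This is only an illustration of the idea as described above, not code from the PoC or from Chenyu's proposal. The type and function names are hypothetical, and rejecting status changes while still in "legacy" mode is an assumption about how the opt-in could be enforced.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the proposed "mode" file values. */
enum schemata_mode { MODE_LEGACY, MODE_NATIVE };

/* Hypothetical per-control state backing the proposed "status" file. */
struct ctrl_vis {
	bool enabled;	/* "enabled" or "disabled" */
};

/* Only enabled controls appear in the schemata file. */
bool ctrl_in_schemata(const struct ctrl_vis *c)
{
	return c->enabled;
}

/*
 * Assumption: user space must switch away from "legacy" mode before it
 * may change the status of a control.
 */
int ctrl_set_status(enum schemata_mode mode, struct ctrl_vis *c, bool enable)
{
	if (mode == MODE_LEGACY)
		return -EPERM;
	c->enabled = enable;
	return 0;
}
```

In this sketch every control stays present in resource_schemata regardless of its status; only the schemata file listing depends on `enabled`.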
Could something like this work for CBQRI?
Reinette
> [1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-6-baf00f50028a@kernel.org/
[2] https://lore.kernel.org/lkml/c78169bc-e2d6-4583-96ec-09fa6dd6653a@intel.com/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-04 21:05 ` Reinette Chatre
@ 2026-06-05 19:35 ` Drew Fustini
2026-06-06 5:10 ` Drew Fustini
0 siblings, 1 reply; 30+ messages in thread
From: Drew Fustini @ 2026-06-05 19:35 UTC (permalink / raw)
To: Reinette Chatre
Cc: Ben Horgan, Tony Luck, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Thu, Jun 04, 2026 at 02:05:08PM -0700, Reinette Chatre wrote:
> >> I plumbed in support for the MB_MIN resource schema which also works under light
> >> testing. The only fs resctrl code change I needed was:
> >>
> >> --- a/include/linux/resctrl.h
> >> +++ b/include/linux/resctrl.h
> >> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
> >> case RESCTRL_CTRL_BITMAP:
> >> return BIT_MASK(ctrl->cache.cbm_len) - 1;
> >> case RESCTRL_CTRL_SCALAR:
> >> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
> >> + return ctrl->membw.min_bw;
> >> +
> >> return ctrl->membw.max_bw;
> >> }
> >>
> >>
> >> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
> >> as the maximum bandwidth controls only take effect if their value is higher than
> >> the minimum bandwidth value. I have specialised this on the ctrl->name which
> >> breaks your ctrl->type based classification but that's fixable by just adding a
> >> default field to membw.
> >
> > This should be useful for RISC-V.
> >
> > RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
> > blocks). The sum of Rbwb across all control groups must be less than
> > MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
> > needs to default to 1 so that the sum does not violate that rule. In my
> > RFC series, I added default_to_min to resctrl_membw [1] but this
> > solution looks cleaner.
>
> As I mentioned in response to Ben [2] there seems to be a mismatch between
> architecture requirements here. resctrl uses the value returned by
> resctrl_get_default_ctrlval() as the control value that means "no throttling".
> For Intel this means min == max but this does not seem to be the case for MPAM
> and CBQRI. I am not familiar enough with either to have an alternative proposal here
> so I need to become familiar now. There is a bit of backlog on other resctrl
> work right now so this will take me some time to sort out.
Thanks for pointing this out. In that case, it doesn't seem to match
what I was thinking of for MB_MIN. The CBQRI reserved bandwidth blocks
(Rbwb) control can be thought of as a minimum amount of guaranteed
bandwidth for a control group. Each RCID (e.g. CLOSID) must be assigned
at least 1 bandwidth block per the spec. Therefore, the membw.min_bw
would need to be 1.
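The two Rbwb constraints mentioned in this thread (at least one block per RCID, and a bounded sum across all RCIDs) can be written as a small check. This is a user-space sketch with hypothetical names, not code from the CBQRI series. The strict "less than MRBWB" comparison follows the wording used earlier in this thread and should be confirmed against the CBQRI spec.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per this thread: every RCID must hold at least one reserved block. */
#define RBWB_MIN 1u

/*
 * Hypothetical validator: returns true when every RCID has at least
 * RBWB_MIN blocks and the total stays below mrbwb.
 * Assumption: the bound is strict, as worded in the thread.
 */
bool rbwb_reservation_ok(const unsigned int *rbwb, size_t nr_rcid,
			 unsigned int mrbwb)
{
	unsigned long long sum = 0;
	size_t i;

	for (i = 0; i < nr_rcid; i++) {
		if (rbwb[i] < RBWB_MIN)
			return false;
		sum += rbwb[i];
	}
	return sum < mrbwb;
}
```

This also shows why a default equal to the maximum cannot work for this control: if every group started at the largest value, the sum check would fail as soon as a second group existed.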
There is also a max bandwidth reservation across all control groups
(RCIDs / CLOSIDs) so that there will be some amount of unreserved
bandwidth. Mweight (1-255) controls how much of that unreserved
bandwidth pool that a group can use. Mweight of 0 means no shared
bandwidth. I think the membw.min_bw would need to be 255 so that all groups
get equal share of the unreserved pool.
It seems like that would be incorrect use of membw.min_bw in both cases?
> > There is no equivalent to MB (percentage throttle) in RISC-V so I would
> > want it to be valid to have MB_MIN (minimum reservation) without MB.
> >
> > I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
> > was able to validate it works okay in Qemu:
> >
> > MB_WGHT:72=255
> > MB_MIN:72=756
> > L2:64=fff;65=fff
> > L3:75=ffff
>
> Ideally any new support should not break existing user space and the existing
> user interface expects a MB entry in the schemata file when the MB resource exists.
> Is it possible to emulate the percentage based MB control with MB_WGHT or MB_MIN?
> This sounds similar to what is/was planned for MPAM [2].
Yes, I think that Mweight could be mapped to the MB concept of
throttling. All groups could start with the max Mweight of 255 which
can be represented as 100%.
However, I'm not sure what to do about membw.min_bw. Mweight = 0 means
it can not use any of the shared unreserved bandwidth pool. If
resctrl_get_default_ctrlval() is designed to mean "no throttling", then
it seems like the membw.min_bw would need to be 255. But that feels
weird for the min_bw value to be equal to the max weight for unreserved
bandwidth.
> Something that may be of interest is a proposal that Chenyu is refining to address an
> issue with the region-aware MBA support where there is no intuitive backward compatible
> interface. This was highlighted in the plumbers slides (see slide titled "Open: maintaining
> backward compatibility when region aware"). The current idea to deal with this is to
> introduce a "mode" associated with the resource controls. For example,
>
> # cat /sys/fs/resctrl/info/MB/resource_schemata/mode
> [legacy] native
>
> By default the "legacy" mode will be enabled and exposes the "MB" default control to user
> space via the schemata file. In support of this each new control has a new property file
> named "status" that can have value "enabled" or "disabled". Only "enabled" controls are
> present in the schemata file but all controls are always present in the resource_schemata
> directory. By writing to the "mode" file user space acknowledges familiarity with the new
> "resource_schemata" based interface and can change the status of a control and
> thus manage its visibility in the schemata file.
> Could something like this work for CBQRI?
Yes, I think that would work. There are no existing users of resctrl on
RISC-V so I think having users opt into this resource_schemata interface
would work, especially if that allows a truer representation of the
controls in the CBQRI spec.
Thanks,
Drew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-05 19:35 ` Drew Fustini
@ 2026-06-06 5:10 ` Drew Fustini
2026-06-06 5:23 ` Drew Fustini
0 siblings, 1 reply; 30+ messages in thread
From: Drew Fustini @ 2026-06-06 5:10 UTC (permalink / raw)
To: Reinette Chatre
Cc: Ben Horgan, Tony Luck, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Fri, Jun 05, 2026 at 12:35:51PM -0700, Drew Fustini wrote:
> > As I mentioned in response to Ben [2] there seems to be a mismatch between
> > architecture requirements here. resctrl uses the value returned by
> > resctrl_get_default_ctrlval() as the control value that means "no throttling".
> > For Intel this means min == max but this does not seem to be the case for MPAM
> > and CBQRI. I am not familiar enough with either to have an alternative proposal here
> > so I need to become familiar now. There is a bit of backlog on other resctrl
> > work right now so this will take me some time to sort out.
>
> Thanks for pointing this out. In that case, it doesn't seem to match
> what I was thinking of for MB_MIN. The CBQRI reserved bandwidth blocks
> (Rbwb) control can be thought of as a minimum amount of guaranteed
> bandwidth for a control group. Each RCID (e.g. CLOSID) must be assigned
> at least 1 bandwidth block per the spec. Therefore, the membw.min_bw
> would need to be 1.
>
> There is also a max bandwidth reservation across all control groups
> (RCIDs / CLOSIDs) so that there will be some amount of unreserved
> bandwidth. Mweight (1-255) controls how much of that unreserved
> bandwidth pool that a group can use. Mweight of 0 means no shared
> bandwidth. I think the membw.min_bw would need to be 255 so that all groups
> get equal share of the unreserved pool.
Sorry, I wasn't thinking about this right. If Mweight is used for MB,
then membw.max_bw would be 100 (MAX_MBA_BW) and membw.min_bw would be 0
which means no shared bandwidth.
> It seems like that would be incorrect use of membw.min_bw in both cases?
The issue is really just for Rbwb (reserved bandwidth) as that needs to
default to the minimum of 1. What about introducing membw.reset_val
which would be returned by resctrl_get_default_ctrl()?
MB could set membw.reset_val to be the same value as membw.max_bw.
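A rough user-space sketch of that idea follows. The struct and function names are hypothetical stand-ins, trimmed down from what the PoC appears to use, and `reset_val` is the proposed field rather than something that exists today.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PoC's control types. */
enum ctrl_type_sketch { CTRL_BITMAP, CTRL_SCALAR };

struct membw_sketch {
	uint32_t min_bw;
	uint32_t max_bw;
	uint32_t reset_val;	/* proposed: value a new group starts with */
};

struct ctrl_sketch {
	enum ctrl_type_sketch type;
	uint32_t cbm_len;	/* only used for bitmap controls, assumed <= 32 */
	struct membw_sketch membw;
};

/*
 * Sketch of a default-value lookup that no longer needs to inspect the
 * control name: each scalar control states its own reset value.
 */
uint32_t default_ctrlval_sketch(const struct ctrl_sketch *ctrl)
{
	switch (ctrl->type) {
	case CTRL_BITMAP:
		return (uint32_t)((1ULL << ctrl->cbm_len) - 1);
	case CTRL_SCALAR:
		return ctrl->membw.reset_val;
	}
	return 0;
}
```

Under this sketch MB would set reset_val to max_bw, while a CBQRI MB_MIN control would set reset_val to 1, so the lookup stays keyed on the control type only.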
> > > There is no equivalent to MB (percentage throttle) in RISC-V so I would
> > > want it to be valid to have MB_MIN (minimum reservation) without MB.
> > >
> > > I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
> > > was able to validate it works okay in Qemu:
> > >
> > > MB_WGHT:72=255
> > > MB_MIN:72=756
> > > L2:64=fff;65=fff
> > > L3:75=ffff
> >
> > Ideally any new support should not break existing user space and the existing
> > user interface expects a MB entry in the schemata file when the MB resource exists.
> > Is it possible to emulate the percentage based MB control with MB_WGHT or MB_MIN?
> > This sounds similar to what is/was planned for MPAM [2].
>
> Yes, I think that Mweight could be mapped to the MB concept of
> throttling. All groups could start with the max Mweight of 255 which
> can be represented as 100%.
>
> However, I'm not sure what to do about membw.min_bw. Mweight = 0 means
> it can not use any of the shared unreserved bandwidth pool. If
> resctrl_get_default_ctrlval() is designed to mean "no throttling", then
> it seems like the membw.min_bw would need to be 255. But that feels
> weird for the min_bw value to be equal to the max weight for unreserved
> bandwidth.
MB would have the typical membw.min_bw = 100, and
resctrl_get_default_ctrl() would return 100. The controller would be
programmed with Mweight 255 for 100%.
Thanks,
Drew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-06 5:10 ` Drew Fustini
@ 2026-06-06 5:23 ` Drew Fustini
0 siblings, 0 replies; 30+ messages in thread
From: Drew Fustini @ 2026-06-06 5:23 UTC (permalink / raw)
To: Reinette Chatre
Cc: Ben Horgan, Tony Luck, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Fri, Jun 05, 2026 at 10:10:38PM -0700, Drew Fustini wrote:
> On Fri, Jun 05, 2026 at 12:35:51PM -0700, Drew Fustini wrote:
> > > As I mentioned in response to Ben [2] there seems to be a mismatch between
> > > architecture requirements here. resctrl uses the value returned by
> > > resctrl_get_default_ctrlval() as the control value that means "no throttling".
> > > For Intel this means min == max but this does not seem to be the case for MPAM
> > > and CBQRI. I am not familiar enough with either to have an alternative proposal here
> > > so I need to become familiar now. There is a bit of backlog on other resctrl
> > > work right now so this will take me some time to sort out.
> >
> > Thanks for pointing this out. In that case, it doesn't seem to match
> > what I was thinking of for MB_MIN. The CBQRI reserved bandwidth blocks
> > (Rbwb) control can be thought of as a minimum amount of guaranteed
> > bandwidth for a control group. Each RCID (e.g. CLOSID) must be assigned
> > at least 1 bandwidth block per the spec. Therefore, the membw.min_bw
> > would need to be 1.
> >
> > There is also a max bandwidth reservation across all control groups
> > (RCIDs / CLOSIDs) so that there will be some amount of unreserved
> > bandwidth. Mweight (1-255) controls how much of that unreserved
> > bandwidth pool that a group can use. Mweight of 0 means no shared
> > bandwidth. I think the membw.min_bw would need to be 255 so that all groups
> > get equal share of the unreserved pool.
>
> Sorry, I wasn't thinking about this right. If Mweight is used for MB,
> then membw.max_bw would be 100 (MAX_MBA_BW) and membw.min_bw would be 0
> which means no shared bandwidth.
>
> > It seems like that would be incorrect use of membw.min_bw in both cases?
>
> The issue is really just for Rbwb (reserved bandwidth) as that needs to
> default to the minimum of 1. What about introducing membw.reset_val
> which would be returned by resctrl_get_default_ctrl()?
>
> MB could set membw.reset_val to be the same value as membw.max_bw.
>
> > > > There is no equivalent to MB (percentage throttle) in RISC-V so I would
> > > > want it to be valid to have MB_MIN (minimum reservation) without MB.
> > > >
> > > > I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
> > > > was able to validate it works okay in Qemu:
> > > >
> > > > MB_WGHT:72=255
> > > > MB_MIN:72=756
> > > > L2:64=fff;65=fff
> > > > L3:75=ffff
> > >
> > > Ideally any new support should not break existing user space and the existing
> > > user interface expects a MB entry in the schemata file when the MB resource exists.
> > > Is it possible to emulate the percentage based MB control with MB_WGHT or MB_MIN?
> > > This sounds similar to what is/was planned for MPAM [2].
> >
> > Yes, I think that Mweight could be mapped to the MB concept of
> > throttling. All groups could start with the max Mweight of 255 which
> > can be represented as 100%.
> >
> > However, I'm not sure what to do about membw.min_bw. Mweight = 0 means
> > it can not use any of the shared unreserved bandwidth pool. If
> > resctrl_get_default_ctrlval() is designed to mean "no throttling", then
> > it seems like the membw.min_bw would need to be 255. But that feels
> > weird for the min_bw value to be equal to the max weight for unreserved
> > bandwidth.
>
> MB would have the typical membw.min_bw = 100, and
> resctrl_get_default_ctrl() would return 100. The controller would be
> programmed with Mweight 255 for 100%.
Sorry - I meant MB would have membw.max_bw = 100 which would cause CBQRI
bandwidth controller to be programmed with Mweight = 255.
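For illustration, one possible linear mapping between an MB percentage and Mweight is sketched below in user-space C. The function names and the round-to-nearest choice are assumptions, not part of the CBQRI series. The only points taken from this thread are that 100% corresponds to Mweight 255 and that 0 means no shared bandwidth.

```c
#include <assert.h>
#include <stdint.h>

#define MB_PERCENT_MAX	100u	/* MB is expressed as a percentage */
#define MWEIGHT_MAX	255u	/* Mweight range is [0, 255] */

/* Hypothetical conversion: percentage to Mweight, rounded to nearest. */
uint8_t mb_percent_to_mweight(unsigned int percent)
{
	assert(percent <= MB_PERCENT_MAX);
	return (uint8_t)((percent * MWEIGHT_MAX + MB_PERCENT_MAX / 2) /
			 MB_PERCENT_MAX);
}

/* Hypothetical inverse: Mweight back to a percentage for display. */
unsigned int mweight_to_mb_percent(uint8_t mweight)
{
	return (mweight * MB_PERCENT_MAX + MWEIGHT_MAX / 2) / MWEIGHT_MAX;
}
```

Because 101 percentage steps cover 256 hardware values, the round trip from percentage to Mweight and back is stable, but most Mweight values cannot be reached through MB alone. That loss of resolution is one argument for also exposing a native control.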
Drew
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 15:15 ` Ben Horgan
2026-06-03 19:34 ` Drew Fustini
@ 2026-06-04 17:43 ` Reinette Chatre
2026-06-05 14:53 ` Ben Horgan
1 sibling, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-04 17:43 UTC (permalink / raw)
To: Ben Horgan, Tony Luck, James Morse, Dave Martin, Babu Moger,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Ben,
On 6/3/26 8:15 AM, Ben Horgan wrote:
> Hi Reinette,
>
> On 5/29/26 19:06, Reinette Chatre wrote:
>> Hi Everybody,
>>
>> It has been a while since we discussed the resctrl changes required to support
>> hardware that has controls with fine granularity or hardware that has multiple
>> controls per resource. For reference, the most recent email discussion can
>> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>>
>> I created a PoC that I believe supports what folks have agreed to so far. I
>> hope this can help us to restart the discussion with the goal that resctrl gains
>> support for upcoming hardware that require these features.
>
> Thank you very much for doing this work. I believe this will be very useful for
> MPAM and other architectures.
Thank you very much for reviewing this.
>
>>
>> Request regarding this PoC
>> ==========================
>>
>> Please consider this PoC as a "direction check" on the schema description and multiple
>> control discussions held thus far.
>>
>> Could folks working on enabling new hardware requiring this capability please consider
>> if this is something you can build on and how it should be improved to support these
>> upcoming capabilities?
>>
>> Opens
>> =====
>>
>> While the PoC aims to support what folks agreed on some opens remain:
>> - I attempted to make some MPAM supporting changes but these are all just compile
>> tested. While MPAM should benefit from the new control properties I did not
>> initialize them on MPAM and did not attempt refactor to separate out
>> the architecture specific control properties (more on what this means later).
>> I did attempt some MPAM refactoring that duplicates the MPAM domain to the
>> control domain and monitoring domain lists in support of there being multiple
>> controls each with its own list of control domains but it is definitely not good
>> design.
>
> I appreciate you including MPAM in this PoC. With this one line change I was
> able to boot an MPAM system and mount resctrl which appears to behave correctly.
> We can consider how best to do the code design later.
>
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -1697,6 +1697,8 @@ int mpam_resctrl_setup(void)
>
> /* Initialise the resctrl structures from the classes */
> for_each_mpam_resctrl_control(res, rid) {
> + INIT_LIST_HEAD(&res->resctrl_res.controls); // list_empty needs to work
> +
> if (!res->class)
> continue; // dummy resource
>
Thank you very much. I picked this up but ended up moving it earlier to be next
to the mon_domains list initialization. Doing so made it easier for me to follow
the initialization. That ok?
> I plumbed in support for the MB_MIN resource schema which also works under light
> testing. The only fs resctrl code change I needed was:
>
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
> case RESCTRL_CTRL_BITMAP:
> return BIT_MASK(ctrl->cache.cbm_len) - 1;
> case RESCTRL_CTRL_SCALAR:
> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
> + return ctrl->membw.min_bw;
> +
> return ctrl->membw.max_bw;
> }
>
>
> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
> as the maximum bandwidth controls only take effect if their value is higher than
> the minimum bandwidth value. I have specialised this on the ctrl->name which
> breaks your ctrl->type based classification but that's fixable by just adding a
> default field to membw.
This I am not sure about. In my understanding a typical "default" value means
"no throttling" and, at least on Intel, this default hardware state has been
summarized as "min" == "max" == "optimal".
Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
do not take effect? Could you please elaborate what happens if "min" == "max"?
>> - No support for emulated controls (yet). The PoC is quite large already
>> but I think it can be used as a base for emulated controls for which the software
>> controller could be a potential first customer. In this PoC mounting with
>> software controller will still display the original controller's properties.
>> - One open that needs to be addressed as part of support for emulated controls is
>> how best to display emulation relationship via resctrl hierarchy.
>
> What does emulated controls mean here? Is there some previous discussion you
> could point me at?
For emulated controls in context of MPAM I think the best reference is
https://lore.kernel.org/lkml/aPJP52jXJvRYAjjV@e133380.arm.com/
Above is the email discussion that I attempted to visualize in the middle example in slide 6
("resctrl controls vs. hardware controls") of
https://lpc.events/event/19/contributions/2093/attachments/1958/4172/resctrl%20Microconference%20LPC%202025%20Tokyo.pdf
When comparing the slide to Dave's text, please replace "MB_HW" from Dave's example
with "MB_OPT" in the slide. I changed the name since I found the "HW" in an
emulated control to be potentially confusing.
>> - No support for "read-modify-write" usage of schemata file. This is where we
>> discussed (without agreement) on possibly introducing the "#" prefix to schemata
>> file entries. This PoC does not support this prefix and the current assumption/expectation
>> is that when user space changes a configuration only the new control values are
>> written to schemata file. I thus do not have a plan to support this so please
>> share opinions in this regard if you have some.
>
> There is now less motivation from the MPAM side for this than when this was
> initially discussed. In pre-upstream versions of the MPAM patches a change in
> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
Ah, thanks for the correction. The email I linked above indeed refers to changing
both min and max.
>
> However, it would be useful not to be limited by percentages. In my quick
Indeed. Not being limited by percentages while still needing to have a backward
compatible user interface is how we ended up with "emulated controls".
> experimentation with your patches I used a percentage value for MB_MIN but it
> would be best to move away from this. For new controls I think we can mandate
> that user space has to discover the resolution from the info directly but how
> can we retrofit this. For MPAM, MB and MB_MAX, would control the same things.
> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
> MB_MAX in MB and vice versa with MB taking precedence if both are set? Old
> software can continue setting MB can move to using MB_MAX and take advantage of
> the improved control. (I don't think we should expose the MPAM hardware value
> directly as it has confusion over whether all 1s is 100% or not and we'd like to
> have something generic and friendly to the user.)
Sounds to me as though you are describing emulated controls. Exposing two
controls in the schemata file that essentially control the same thing is what
emulated controls aim to solve. The resctrl hierarchies presented in slide #6
of that presentation (and discussed in the email thread) are how we contemplated
representing the relationship among these controls to user space. So, considering
your example, resctrl may display something like:
info//
└── MB/
    └── resource_schemata/
        └── MB/
            └── MB_MAX/
Above hierarchy describes the relationship to user space that if MB is changed it
will impact MB_MAX and vice-versa.
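To make the "if MB is changed it will impact MB_MAX and vice-versa" relationship concrete, below is a minimal user-space sketch, not code from the PoC. It assumes a single backing value in a hypothetical hardware-friendly scale (HW_SCALE, chosen arbitrarily as 4096) with both schema entries as views of it. All names (struct emu_mb, mb_write_pct() and so on) are made up for illustration, and it uses "last writer wins" rather than any precedence rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical full-scale value of the hardware-friendly MB_MAX control. */
#define HW_SCALE 4096u

/* One backing value; "MB" and "MB_MAX" are two views of it (assumption). */
struct emu_mb {
	uint32_t hw_max;	/* 0..HW_SCALE */
};

/* Legacy "MB" write: percentage, clamped to 100, converted to the hw scale. */
static void mb_write_pct(struct emu_mb *c, uint32_t pct)
{
	if (pct > 100)
		pct = 100;
	c->hw_max = pct * HW_SCALE / 100;
}

/* Legacy "MB" read: backing value rounded to the nearest percent. */
static uint32_t mb_read_pct(const struct emu_mb *c)
{
	return (c->hw_max * 100 + HW_SCALE / 2) / HW_SCALE;
}

/* New-style "MB_MAX" write: value already in the hw scale, clamped. */
static void mb_max_write(struct emu_mb *c, uint32_t val)
{
	c->hw_max = val > HW_SCALE ? HW_SCALE : val;
}

/* New-style "MB_MAX" read: the backing value itself. */
static uint32_t mb_max_read(const struct emu_mb *c)
{
	return c->hw_max;
}
```

In this sketch a write to either entry is immediately visible through the other, and the percentage view is lossy: several MB_MAX values read back as the same MB percentage.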
The one open I am aware of surrounding emulated controls is how to present some
semblance of consistency to user space when considering all the possibilities
the different architectures (and even within architectures) may have.
>> - Controls are independent for now. This means that, for example, if a resource
>> supports a "MIN" and "MAX" control then this implementation would allow user to
>> set the "maximum" control values to be less than the "minimum" control values.
>
> I think this is ok as long as adding support for new controls in resctrl doesn't
> change the existing behaviour. In MPAM we dodged this by introducing MB as only
> affecting the h/w mbw_max and not mbw_min (as mentioned above).
I understand this to be a requirement for Intel where the spec contains "The Maximum Cap
should be programmed to be greater than or equal to the Minimum and Optimal caps.
Undesirable and undefined performance effects may result if cap programming guidelines
are not followed."
I am currently thinking that resctrl should not try to be too smart here and if user
space wants to make dramatic changes to min and max values then it should just ensure
the ordering is appropriate. For example, attempting to set a new min to be larger than
the old max would fail and user space should first increase the old max and then set
a new min.
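The ordering rule described above could be sketched roughly as below. This is a user-space illustration only, assuming a hypothetical pair of setters that each reject a value violating "min <= max". The names (struct bw_limits, bw_set_min(), bw_set_max(), bw_raise_both()) do not exist in the PoC.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-domain min/max pair for one control group. */
struct bw_limits {
	uint32_t min;
	uint32_t max;
};

/* Reject a new min that would exceed the current max. */
static int bw_set_min(struct bw_limits *l, uint32_t new_min)
{
	if (new_min > l->max)
		return -EINVAL;
	l->min = new_min;
	return 0;
}

/* Reject a new max that would drop below the current min. */
static int bw_set_max(struct bw_limits *l, uint32_t new_max)
{
	if (new_max < l->min)
		return -EINVAL;
	l->max = new_max;
	return 0;
}

/* What user space would do when raising both: max first, then min. */
static int bw_raise_both(struct bw_limits *l, uint32_t new_min, uint32_t new_max)
{
	int ret = bw_set_max(l, new_max);

	return ret ? ret : bw_set_min(l, new_min);
}
```

With min=10 and max=50, setting min to 60 directly fails, while raising max to 80 first and then setting min to 60 succeeds. When lowering both values the order would be reversed (min first).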
>
>> - PoC supports the "bitmap" control but does not (yet) expose properties of a bitmap
>> control to the new info/<resource>/resource_schemata directory.
>>
>> Accessing PoC
>> =============
>>
>> Please consider the PoC as a rough draft. It has only been compile tested for Arm
>> and known to be incomplete in Arm support. To help with experimenting I only
>> fully adapted the Intel MBA resource to demo two dummy additional MBA controls.
>> All architectures should immediately benefit from the new schema descriptions
>> and new info/MB/resource_schemata hierarchy.
>>
>> I considered the patches self too many for email. Instead, the PoC can be found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/reinette/linux.git branch resctrl/controls_rfc_v1
>>
>> The work is based on v7.1-rc2 that also includes the following series (two of which has
>> since been queued) included:
>>
>> "selftests/resctrl: Fixes and improvements focused on Intel platforms"
>> https://lore.kernel.org/lkml/cover.1775266384.git.reinette.chatre@intel.com/
>>
>> "x86,fs/resctrl: Improve resctrl quality and consistency"
>> https://lore.kernel.org/lkml/cover.1777419024.git.reinette.chatre@intel.com/
>>
>> "x86,fs/resctrl: Pave the way for MPAM counter assignment"
>> https://lore.kernel.org/lkml/20260506082855.3694761-1-ben.horgan@arm.com/
>>
>>
>> Primary resctrl fs data structure changes
>> =========================================
>>
>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>> the changes easier to follow I kept some of the original names to help communicate
>> where familiar data structures land.
>>
>> What to notice about a control is that it has some common properties required
>> from all controls (scope, type, etc.) and then depending on the type of control
>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>
>> /**
>>  * struct resctrl_ctrl - A resource control
>>  * @entry:   List entry of rdt_resource::controls
>>  * @scope:   Scope of the resource that this control allocates
>>  * @domains: RCU list of all control domains
>>  * @type:    The control type that determines the properties of the control,
>>  *           format string for displaying control values to user space, and
>>  *           parser of control values provided by user space.
>>  * @name:    Name of the control. Appended to final resource name
>>  *           (rdt_resource_final::name) to create final schema entry.
>>  *           Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>>  *           For example, with resource name "MB" and control name "MAX" the
>>  *           schema entry will be "MB_MAX".
>>  * @cache:   Cache allocation control properties.
>>  * @membw:   Bandwidth control properties.
>>  */
>> struct resctrl_ctrl {
>> 	struct list_head entry;
>> 	enum resctrl_scope scope;
>> 	struct list_head domains;
>> 	enum resctrl_ctrl_type type;
>> 	enum resctrl_ctrl_name name;
>> 	union {
>> 		struct resctrl_cache cache;
>> 		struct resctrl_membw membw;
>> 	};
>> };
>>
>> Two members summarize how this new structure fits into the rest of resctrl:
>> a) resctrl_ctrl::entry
>> Since a resource can support multiple controls there is a new list
>> in struct rdt_resource named "controls" that contains the list of all
>> controls supported by the resource.
>> b) resctrl_ctrl::domains
>> Instead of the list of control domains belonging to a resource they
>> now belong to the control self. By doing so resctrl can support resource
>> controls at different scope for the same resource. This is intended to
>> support some upcoming MPAM and RISC-V usages.
>
> Please can you expand a bit on part b).
>
> In an MPAM system we consider 3 resctrl resources, RDT_RESOURCE_L3,
> RDT_RESOURCE_L2 and RDT_RESOURCE_MBA which correspond to the L3 caches, L2
> caches and memory bandwidth on egress from the L3 caches. The domain for each of
> these corresponds to the instance of the resource. That is, for RDT_RESOURCE_L2
> there is a resource for each L2 instance, similarly for L3, and for
(I'm assuming above is typo and it is "there is a domain for each L2 instance"?)
> RDT_RESOURCE_MBA there is a domain for each L3 cache. If we were to add support
> for controls on a new cache level, say the L4, then I'd expect to add a new
> resource. For memory bandwidth, we'd like to be able to control b/w on the L2
> egress (e.g. in a DSU). Wouldn't this too be a separate resource or would this
> be a new set of controls on the same resource?
>
> New controls on the same resource
> MB_MIN2
> MB_MAX2
> MB_PROP2
> ...
>
> or
> MB2_MIN
> MB2_MAX
> MB2_PROP
The way I currently see it is that controlling bandwidth at a different scope would
be a new set of controls associated with the MB resource. There are more scenarios
coming this way with AMD's "Global MBA" that is memory bandwidth allocation at
NUMA node scope. If I understand correctly the "CPU-less Memory Node" that Nvidia
shared at plumbers would need this also and control memory bandwidth allocation
at the NUMA node scope. A related technology is Intel's region-aware MBA, which is
still at L3 scope.
I fully agree that we need to figure out how to represent all of this to user space
without turning the interface into something unintelligible. In the end this is
required for user space to know what a domain ID represents.
Would it help to make the scope part of the control name? The ship has sailed for
MB being associated with L3 scope but this could mean the "default" scope of MB
resource is L3 (which user space can still confirm by looking at the control's
"scope" file) and the others include scope in the name? Consider for example:
https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@intel.com/
>
> AFAIK, the DSU h/w just supports proportional bandwidth controls at the moment
> but we should consider what to do about the potential naming.
ack.
>
> In the MPAM driver, we collect MSC into components (based on instances) and
> those into classes (components of the same type). Currently, a resource is
> mapped to a single class. (Two resources may map to the same class.)
>
> I expect it is useful in the memory region and sub numa cases but I'd still
> expect the common case to be that the domains are the same within a control. Or
> am I missing something?
Domains of a control should all be at the same scope. Since the schemata file
exposes the control with the different IDs representing the instances of the
resource needing to be controlled it has to be clear to user space what the
domain ID represents.
>
>>
>> Example architectural data structure changes
>> ============================================
>>
>> An architecture can use the new control by following a similar pattern to
>> resource and domain use by architectures. Consider the following for x86
>> where a new architecture specific struct resctrl_hw_ctrl includes
>> struct resctrl_ctrl and any architecture private data needed to support
>> the control:
>>
>> /*
>>  * struct resctrl_hw_ctrl - Arch private properties of a resource control
>>  * @r_ctrl:     Control properties exposed to resctrl file system
>>  * @msr_base:   Base MSR address where control values should be programmed
>>  * @msr_update: Function pointer to update control values
>>  */
>> struct resctrl_hw_ctrl {
>> 	struct resctrl_ctrl r_ctrl;
>> 	unsigned int msr_base;
>> 	void (*msr_update)(struct msr_param *m);
>> };
>>
>> Structure of patch series
>> =========================
>>
>> As a PoC the series is not perfectly structured but to help navigate this work
>> on a high level the changes can be categorized as follows:
>>
>> Patch 1 to 11:
>> With a vision of what a "control" is, remove unused/unnecessary
>> members, make clear what is a *resource* property vs a *control*
>> property, do some renaming to help with the PoC.
>
> A few of the changes are generic cleanup and could hopefully be dealt with
> before decisions on the larger PoC are made. I see:
> fs/resctrl: Remove unused resctrl_membw::mb_map
> x86,fs/resctrl: Remove "arch_needs_linear"
> Perhaps a few more.
ack.
>>
>> Patch 12:
>> Introduce struct resctrl_ctrl and re-arrange existing struct rdt_resource
>> members to form part of new rdt_resource::ctrl
>>
>> Patch 13 to 44:
>> A lot of wrangling to introduce struct resctrl_ctrl to all code that needs
>> to work with a control and/or domain without assuming that the control is
>> the one and only control embedded in the resource it belongs to. Essentially,
>> a lot of changes passing the control around in addition to the resource/domain.
>
> You mention a few times in the commit message that you expect the cache
> resources to only have one control. On MPAM we have CMAX (and there looks to be
> a RISC-V equivalent) where the total number of bytes in the cache for a given
> closid is limited. The allocation must still respect the CPBM bitmap though.
> Looking at the code though I don't see much problem in adding this as an
> additional control. The assumption that these patches is making is not that
> there is only one control for cache resources but rather that cache portions are
> managed by the default cache resource control. Am I missing something or does
> that assessment make sense to you?
Your assessment is correct. There are still a few assumptions built into resctrl
about there only being a single cache control and it being a bitmap control.
Since I am not familiar with the other possible cache controls I instead focused
on isolating the existing cache control. When I/we have better understanding about
how additional cache controls behave this implementation can be adapted to support
it.
>
> I have been looking at adding CMAX control to resctrl and will have a go at
> basing what I have so far on top of this series.
Thank you!
...
>>> Any feedback is appreciated.
>
> Overall, this looks to be a big step in the right direction.
Glad to hear this.
Thank you very much.
Reinette
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-04 17:43 ` Reinette Chatre
@ 2026-06-05 14:53 ` Ben Horgan
2026-06-05 15:39 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ben Horgan @ 2026-06-05 14:53 UTC (permalink / raw)
To: Reinette Chatre, Tony Luck, James Morse, Dave Martin, Babu Moger,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/4/26 18:43, Reinette Chatre wrote:
> Hi Ben,
>
> On 6/3/26 8:15 AM, Ben Horgan wrote:
>> Hi Reinette,
>>
>> On 5/29/26 19:06, Reinette Chatre wrote:
>>> Hi Everybody,
>>>
>>> It has been a while since we discussed the resctrl changes required to support
>>> hardware that has controls with fine granularity or hardware that has multiple
>>> controls per resource. For reference, the most recent email discussion can
>>> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>>>
>>> I created a PoC that I believe supports what folks have agreed to so far. I
>>> hope this can help us to restart the discussion with the goal that resctrl gains
>>> support for upcoming hardware that require these features.
>>
>> Thank you very much for doing this work. I believe this will be very useful for
>> MPAM and other architectures.
>
> Thank you very much for reviewing this.
>
>>
>>>
>>> Request regarding this PoC
>>> ==========================
>>>
>>> Please consider this PoC as a "direction check" on the schema description and multiple
>>> control discussions held thus far.
>>>
>>> Could folks working on enabling new hardware requiring this capability please consider
>>> if this is something you can build on and how it should be improved to support these
>>> upcoming capabilities?
>>>
>>> Opens
>>> =====
>>>
>>> While the PoC aims to support what folks agreed on some opens remain:
>>> - I attempted to make some MPAM supporting changes but these are all just compile
>>> tested. While MPAM should benefit from the new control properties I did not
>>> initialize them on MPAM and did not attempt refactor to separate out
>>> the architecture specific control properties (more on what this means later).
>>> I did attempt some MPAM refactoring that duplicates the MPAM domain to the
>>> control domain and monitoring domain lists in support of there being multiple
>>> controls each with its own list of control domains but it is definitely not good
>>> design.
>>
>> I appreciate you including MPAM in this PoC. With this one line change I was
>> able to boot an MPAM system and mount resctrl which appears to behave correctly.
>> We can consider how best to do the code design later.
>>
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -1697,6 +1697,8 @@ int mpam_resctrl_setup(void)
>>
>>  	/* Initialise the resctrl structures from the classes */
>>  	for_each_mpam_resctrl_control(res, rid) {
>> +		INIT_LIST_HEAD(&res->resctrl_res.controls); // list_empty needs to work
>> +
>>  		if (!res->class)
>>  			continue; // dummy resource
>>
>
> Thank you very much. I picked this up but ended up moving it earlier to be next
> to the mon_domains list initialization. Doing so made it easier for me to follow
> the initialization. That ok?
Yes, that makes sense.
>
>> I plumbed in support for the MB_MIN resource schema which also works under light
>> testing. The only fs resctrl code change I needed was:
>>
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
>>  	case RESCTRL_CTRL_BITMAP:
>>  		return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>  	case RESCTRL_CTRL_SCALAR:
>> +		if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>> +			return ctrl->membw.min_bw;
>> +
>>  		return ctrl->membw.max_bw;
>>  	}
>>
>>
>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>> as the maximum bandwidth controls only take effect if their value is higher than
>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>> breaks your ctrl->type based classification but that's fixable by just adding a
>> default field to membw.
>
> This I am not sure about. In my understanding a typical "default" value means
> "no throttling" and, at least on Intel, this default hardware state has been
> summarized as "min" == "max" == "optimal".
Ok, this sounds odd to me but that is probably because I don't know what Intel
systems do. On MPAM systems a MIN control is a boost rather than a throttling
control. Although, you can always think of that as throttling the traffic with
the other PARTIDs.
>
> Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
> do not take effect? Could you please elaborate what happens if "min" == "max"?
Table 5-4 from section 5.2.8 of the IHI0099B.b shows the interaction between the
min and maximum controls.
If used bandwidth is     The preference is   Description
-----------------------  ------------------  -------------------------------------
Below the minimum        High                Only high requests compete with this
                                             request.
Above the minimum,       Medium              High requests are serviced first then
below the maximum                            this request competes with other
                                             medium requests.
Above the maximum,       Low                 Requests are not serviced if any high
when HARDLIM is 0                            or medium requests are available.
Above the maximum,       None                Requests are not serviced
when HARDLIM is 1
So if we keep the minimum and maximum control values always the same then
all traffic will be given "high" preference until the target bandwidth is
reached. For some MPAM systems it is recommended to set the minimum value as 5%
less than the maximum value to get a reliable target bandwidth. As the 5% seems
implementation specific and some systems don't have min controls, it seemed
better to just match the MB control with a maximum bandwidth control and let the
user have freedom to choose the minimum bandwidth control when MB_MIN support is
added.
If the default for the minimum is the maximum possible bandwidth (100%),
then any change of the maximum won't have any effect: the maximum is always
less than the minimum (if that's unchanged) and so all traffic is high
preference. I now see from your reply below that you are planning on not
allowing this kind of configuration.
If the minimum always tracks the maximum then we lose the distinction between
medium and high preference traffic, and so to reserve some high preference
bandwidth for one control group we'd have to change the configuration in the
other control groups so that their bandwidth preference is medium (minimum
value at 0).
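To illustrate the table and the cases above, here is a small user-space sketch of the preference selection. It is only a model of the table, not MPAM driver code. The enum and function names are made up, and treating "equal to the limit" as "above" is an assumption, since the table does not say what happens at the exact boundaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Preference levels as named in the table (identifiers are illustrative). */
enum bw_pref {
	BW_PREF_NONE,
	BW_PREF_LOW,
	BW_PREF_MEDIUM,
	BW_PREF_HIGH,
};

/*
 * Map the bandwidth currently used by a PARTID to a request preference.
 * Assumption: usage equal to a limit counts as being above that limit.
 */
static enum bw_pref bw_preference(uint32_t used, uint32_t min, uint32_t max,
				  bool hardlim)
{
	if (used < min)
		return BW_PREF_HIGH;
	if (used < max)
		return BW_PREF_MEDIUM;
	return hardlim ? BW_PREF_NONE : BW_PREF_LOW;
}
```

With min == max the medium band disappears: usage below the shared value is high preference and anything else is low (or none with HARDLIM). With min left at 100 and max lowered to 50, any usage below 100 is still classified as high, which is the "change of the maximum won't have any effect" case.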
>
>>> - No support for emulated controls (yet). The PoC is quite large already
>>> but I think it can be used as a base for emulated controls for which the software
>>> controller could be a potential first customer. In this PoC mounting with
>>> software controller will still display the original controller's properties.
>>> - One open that needs to be addressed as part of support for emulated controls is
>>> how best to display emulation relationship via resctrl hierarchy.
>>
>> What does emulated controls mean here? Is there some previous discussion you
>> could point me at?
>
> For emulated controls in context of MPAM I think the best reference is
> https://lore.kernel.org/lkml/aPJP52jXJvRYAjjV@e133380.arm.com/
>
> Above is the email discussion that I attempted to visualize in the middle example in slide 6
> ("resctrl controls vs. hardware controls") of
> https://lpc.events/event/19/contributions/2093/attachments/1958/4172/resctrl%20Microconference%20LPC%202025%20Tokyo.pdf
>
> When comparing the slide to Dave's text, please replace "MB_HW" from Dave's example
> with "MB_OPT" in the slide. I changed the name since I found the "HW" in an
> emulated control to be potentially confusing.
Ah I see, thanks for the links and descriptions.
>
>
>>> - No support for "read-modify-write" usage of schemata file. This is where we
>>> discussed (without agreement) on possibly introducing the "#" prefix to schemata
>>> file entries. This PoC does not support this prefix and the current assumption/expectation
>>> is that when user space changes a configuration only the new control values are
>>> written to schemata file. I thus do not have a plan to support this so please
>>> share opinions in this regard if you have some.
>>
>> There is now less motivation from the MPAM side for this than when this was
>> initially discussed. In pre-upstream versions of the MPAM patches a change in
>> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
>> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>
> Ah, thanks for the correction. The email I linked above indeed refers to changing
> both min and max.
>
>>
>> However, it would be useful not to be limited by percentages. In my quick
>
> Indeed. Not being limited by percentages while still needing to have a backward
> compatible user interface is how we ended up with "emulated controls".
>
>> experimentation with your patches I used a percentage value for MB_MIN but it
>> would be best to move away from this. For new controls I think we can mandate
>> that user space has to discover the resolution from the info directly but how
>> can we retrofit this. For MPAM, MB and MB_MAX, would control the same things.
>> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
>> MB_MAX in MB and vice versa with MB taking precedence if both are set? Old
>> software can continue setting MB can move to using MB_MAX and take advantage of
>> the improved control. (I don't think we should expose the MPAM hardware value
>> directly as it has confusion over whether all 1s is 100% or not and we'd like to
>> have something generic and friendly to the user.)
>
> Sounds to me as though you are describing emulated controls. Exposing two
> controls in the schemata file that essentially control the same thing is what
> emulated controls aim to solve. The resctrl hierarchies presented in slide #6
> of that presentation (and discussed in the email thread) are how we contemplated
> representing the relationship among these controls to user space. So, considering
> your example, resctrl may display something like:
>
> info//
> └── MB/
>     └── resource_schemata/
>         └── MB/
>             └── MB_MAX/
>
> Above hierarchy describes the relationship to user space that if MB is changed it
> will impact MB_MAX and vice-versa.
>
> The one open I am aware of surrounding emulated controls is how to present some
> semblance of consistency to user space when considering all the possibilities
> the different architectures (and even within architectures) may have.
What other use cases do we have apart from MB and MB_MAX? I was wondering if
this could be limited to a default control (L2, L3, MB..) with a single new
style control (L2_*, L3_*, MB_ ...) under it.
>
>
>>> - Controls are independent for now. This means that, for example, if a resource
>>> supports a "MIN" and "MAX" control then this implementation would allow user to
>>> set the "maximum" control values to be less than the "minimum" control values.
>>
>> I think this is ok as long as adding support for new controls in resctrl doesn't
>> change the existing behaviour. In MPAM we dodged this by introducing MB as only
>> affecting the h/w mbw_max and not mbw_min (as mentioned above).
>
> I understand this to be a requirement for Intel where the spec contains "The Maximum Cap
> should be programmed to be greater than or equal to the Minimum and Optimal caps.
> Undesirable and undefined performance effects may result if cap programming guidelines
> are not followed."
>
> I am currently thinking that resctrl should not try to be too smart here and if user
> space wants to make dramatic changes to min and max values then it should just ensure
> the ordering is appropriate. For example, attempting to set a new min to be larger than
> the old max would fail and user space should first increase the old max and then set
> a new min.
Ok with me.
>
>>
>>> - PoC supports the "bitmap" control but does not (yet) expose properties of a bitmap
>>> control to the new info/<resource>/resource_schemata directory.
>>>
>>> Accessing PoC
>>> =============
>>>
>>> Please consider the PoC as a rough draft. It has only been compile tested for Arm
>>> and known to be incomplete in Arm support. To help with experimenting I only
>>> fully adapted the Intel MBA resource to demo two dummy additional MBA controls.
>>> All architectures should immediately benefit from the new schema descriptions
>>> and new info/MB/resource_schemata hierarchy.
>>>
>>> I considered the patches self too many for email. Instead, the PoC can be found at:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/reinette/linux.git branch resctrl/controls_rfc_v1
>>>
>>> The work is based on v7.1-rc2 that also includes the following series (two of which has
>>> since been queued) included:
>>>
>>> "selftests/resctrl: Fixes and improvements focused on Intel platforms"
>>> https://lore.kernel.org/lkml/cover.1775266384.git.reinette.chatre@intel.com/
>>>
>>> "x86,fs/resctrl: Improve resctrl quality and consistency"
>>> https://lore.kernel.org/lkml/cover.1777419024.git.reinette.chatre@intel.com/
>>>
>>> "x86,fs/resctrl: Pave the way for MPAM counter assignment"
>>> https://lore.kernel.org/lkml/20260506082855.3694761-1-ben.horgan@arm.com/
>>>
>>>
>>> Primary resctrl fs data structure changes
>>> =========================================
>>>
>>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>>> the changes easier to follow I kept some of the original names to help communicate
>>> where familiar data structures land.
>>>
>>> What to notice about a control is that it has some common properties required
>>> from all controls (scope, type, etc.) and then depending on the type of control
>>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>>
>>> /**
>>>  * struct resctrl_ctrl - A resource control
>>>  * @entry:   List entry of rdt_resource::controls
>>>  * @scope:   Scope of the resource that this control allocates
>>>  * @domains: RCU list of all control domains
>>>  * @type:    The control type that determines the properties of the control,
>>>  *           format string for displaying control values to user space, and
>>>  *           parser of control values provided by user space.
>>>  * @name:    Name of the control. Appended to final resource name
>>>  *           (rdt_resource_final::name) to create final schema entry.
>>>  *           Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>>>  *           For example, with resource name "MB" and control name "MAX" the
>>>  *           schema entry will be "MB_MAX".
>>>  * @cache:   Cache allocation control properties.
>>>  * @membw:   Bandwidth control properties.
>>>  */
>>> struct resctrl_ctrl {
>>> 	struct list_head entry;
>>> 	enum resctrl_scope scope;
>>> 	struct list_head domains;
>>> 	enum resctrl_ctrl_type type;
>>> 	enum resctrl_ctrl_name name;
>>> 	union {
>>> 		struct resctrl_cache cache;
>>> 		struct resctrl_membw membw;
>>> 	};
>>> };
>>>
>>> Two members summarize how this new structure fits into the rest of resctrl:
>>> a) resctrl_ctrl::entry
>>> Since a resource can support multiple controls there is a new list
>>> in struct rdt_resource named "controls" that contains the list of all
>>> controls supported by the resource.
>>> b) resctrl_ctrl::domains
>>> Instead of the list of control domains belonging to a resource they
>>> now belong to the control self. By doing so resctrl can support resource
>>> controls at different scope for the same resource. This is intended to
>>> support some upcoming MPAM and RISC-V usages.
>>
>> Please can you expand a bit on part b).
>>
>> In an MPAM system we consider 3 resctrl resources, RDT_RESOURCE_L3,
>> RDT_RESOURCE_L2 and RDT_RESOURCE_MBA which correspond to the L3 caches, L2
>> caches and memory bandwidth on egress from the L3 caches. The domain for each of
>> these corresponds to the instance of the resource. That is, for RDT_RESOURCE_L2
>> there is a resource for each L2 instance, similarly for L3, and for
>
> (I'm assuming above is typo and it is "there is a domain for each L2 instance"?)
yes, a mistake
>
>> RDT_RESOURCE_MBA there is a domain for each L3 cache. If we were to add support
>> for controls on a new cache level, say the L4, then I'd expect to add a new
>> resource. For memory bandwidth, we'd like to be able to control b/w on the L2
>> egress (e.g. in a DSU). Wouldn't this too be a separate resource or would this
>> be a new set of controls on the same resource?
>>
>> New controls on the same resource
>> MB_MIN2
>> MB_MAX2
>> MB_PROP2
>> ...
>>
>> or
>> MB2_MIN
>> MB2_MAX
>> MB2_PROP
>
>
> The way I currently see it is that controlling bandwidth at a different scope would
> be a new set of controls associated with the MB resource. There are more scenarios
> coming this way with AMD's "Global MBA" that is memory bandwidth allocation at
> NUMA node scope. If I understand correctly the "CPU-less Memory Node" that Nvidia
> shared at plumbers would need this also and control memory bandwidth allocation
> at the NUMA node scope.
Yes, in general for MSC at the memory controllers it would be good to scope these
by NUMA node, whether or not they are CPU-less.
> A related technology is Intel's region-aware MBA, which is
> still at L3 scope.
>
> I fully agree that we need to figure out how to represent all of this to user space
> without turning the interface into something unintelligible. In the end this is
> required for user space to know what a domain ID represents.
>
> Would it help to make the scope part of the control name? The ship has sailed for
> MB being associated with L3 scope but this could mean the "default" scope of MB
> resource is L3 (which user space can still confirm by looking at the control's
> "scope" file) and the others include scope in the name? Consider for example:
> https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@intel.com/
This certainly helps with the naming.
The scope does have an effect on what causes a domain to be present or not. For
existing scopes, such as L3 scope, whether a domain is online depends on
whether one of a set of CPUs is online, and the cpu_read_lock is taken.
However, for NUMA scope in MPAM (maybe not GMBA?) whether the domain is online
would need to depend on whether the memory is online, and the memory hotplug
lock would need to be taken. I am wondering if this sort of configuration means
it's better to have the NUMA scoped memory bandwidth on a different resource, or
we just say ok and always take the memory hotplug lock, get_online_mems(),
where we take the cpu_read_lock.
>
>
>>
>> AFAIK, the DSU h/w just supports proportional bandwidth controls at the moment
>> but we should consider what to do about the potential naming.
>
> ack.
>
>>
>> In the MPAM driver, we collect MSC into components (based on instances) and
>> those into classes (components of the same type). Currently, a resource is
>> mapped to a single class. (Two resources may map to the same class.)
>>
>> I expect it is useful in the memory region and sub numa cases but I'd still
>> expect the common case to be that the domains are the same within a control. Or
>> am I missing something?
>
> Domains of a control should all be at the same scope. Since the schemata file
> exposes the control with the different IDs representing the instances of the
> resource needing to be controlled it has to be clear to user space what the
> domain ID represents.
Agreed. (I meant to say the domains within a resource are likely to be the same
for each control within the same resource.)
Thanks,
Ben
>
>>
>>>
>>> Example architectural data structure changes
>>> ============================================
>>>
>>> An architecture can use the new control by following a similar pattern to
>>> resource and domain use by architectures. Consider the following for x86
>>> where a new architecture specific struct resctrl_hw_ctrl includes
>>> struct resctrl_ctrl and any architecture private data needed to support
>>> the control:
>>>
>>> /*
>>> * struct resctrl_hw_ctrl - Arch private properties of a resource control
>>> * @r_ctrl: Control properties exposed to resctrl file system
>>> * @msr_base: Base MSR address where control values should be programmed
>>> * @msr_update: Function pointer to update control values
>>> */
>>> struct resctrl_hw_ctrl {
>>> struct resctrl_ctrl r_ctrl;
>>> unsigned int msr_base;
>>> void (*msr_update)(struct msr_param *m);
>>> };
>>>
>>> Structure of patch series
>>> =========================
>>>
>>> As a PoC the series is not perfectly structured but to help navigate this work
>>> on a high level the changes can be categorized as follows:
>>>
>>> Patch 1 to 11:
>>> With a vision of what a "control" is, remove unused/unnecessary
>>> members, make clear what is a *resource* property vs a *control*
>>> property, do some renaming to help with the PoC.
>>
>> A few of the changes are generic cleanup and could hopefully be dealt with
>> before decisions on the larger PoC are made. I see:
>> fs/resctrl: Remove unused resctrl_membw::mb_map
>> x86,fs/resctrl: Remove "arch_needs_linear"
>> Perhaps a few more.
>
> ack.
>
>>>
>>> Patch 12:
>>> Introduce struct resctrl_ctrl and re-arrange existing struct rdt_resource
>>> members to form part of new rdt_resource::ctrl
>>>
>>> Patch 13 to 44:
>>> A lot of wrangling to introduce struct resctrl_ctrl to all code that needs
>>> to work with a control and/or domain without assuming that the control is
>>> the one and only control embedded in the resource it belongs to. Essentially,
>>> a lot of changes passing the control around in addition to the resource/domain.
>>
>> You mention a few times in the commit message that you expect the cache
>> resources to only have one control. On MPAM we have CMAX (and there looks to be
>> a RISC-V equivalent) where the total number of bytes in the cache for a given
>> closid is limited. The allocation must still respect the CPBM bitmap though.
>> Looking at the code though I don't see much problem in adding this as an
>> additional control. The assumption that these patches is making is not that
>> there is only one control for cache resources but rather that cache portions are
>> managed by the default cache resource control. Am I missing something or does
>> that assessment make sense to you?
>
> Your assessment is correct. There are still a few assumptions built into resctrl
> about there only being a single cache control and it being a bitmap control.
> Since I am not familiar with the other possible cache controls I instead focused
> on isolating the existing cache control. When I/we have better understanding about
> how additional cache controls behave this implementation can be adapted to support
> it.
>
>>
>> I have been looking at adding CMAX control to resctrl and will have a go at
>> basing what I have so far on top of this series.
>
> Thank you!
>
>
> ...
>
>>>> Any feedback is appreciated.
>>
>> Overall, this looks to be a big step in the right direction.
>
> Glad to hear this.
>
> Thank you very much.
>
> Reinette
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-05 14:53 ` Ben Horgan
@ 2026-06-05 15:39 ` Reinette Chatre
2026-06-05 16:37 ` Ben Horgan
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-05 15:39 UTC (permalink / raw)
To: Ben Horgan, Tony Luck, James Morse, Dave Martin, Babu Moger,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Ben,
On 6/5/26 7:53 AM, Ben Horgan wrote:
> On 6/4/26 18:43, Reinette Chatre wrote:
>> On 6/3/26 8:15 AM, Ben Horgan wrote:
>>> On 5/29/26 19:06, Reinette Chatre wrote:
...
>>
>>> I plumbed in support for the MB_MIN resource schema which also works under light
>>> testing. The only fs resctrl code change I needed was:
>>>
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct
>>> resctrl_ctrl *ctrl)
>>> case RESCTRL_CTRL_BITMAP:
>>> return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>> case RESCTRL_CTRL_SCALAR:
>>> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>>> + return ctrl->membw.min_bw;
>>> +
>>> return ctrl->membw.max_bw;
>>> }
>>>
>>>
>>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>>> as the maximum bandwidth controls only take effect if their value is higher than
>>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>>> breaks your ctrl->type based classification but that's fixable by just adding a
>>> default field to membw.
>>
>> This I am not sure about. In my understanding a typical "default" value means
>> "no throttling" and, at least on Intel, this default hardware state has been
>> summarized as "min" == "max" == "optimal".
>
> Ok, this sounds odd to me but that is probably because I don't know what Intel
> systems do. On MPAM systems a MIN control is a boost rather than a throttling
> control. Although, you can always think of that as throttling the traffic with
> the other PARTIDs.
>
>>
>> Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
>> do not take effect? Could you please elaborate what happens if "min" == "max"?
>
> Table 5-4 from section 5.2.8 of the IHI0099B.b shows the interaction between the
> min and maximum controls.
>
> If used bandwidth is   The preference is   Description
> Below the minimum      High                Only high requests compete with this
>                                            request.
> Above the minimum,     Medium              High requests are serviced first then
> below the maximum                          this request competes with other
>                                            medium requests.
> Above the maximum,     Low                 Requests are not serviced if any high
> when HARDLIM is 0                          or medium requests are available.
> Above the maximum,     None                Requests are not serviced
> when HARDLIM is 1
>
> So if we keep the minimum and the maximum control values always the same then
> all traffic will be given "high" preference until the target bandwidth is
> reached. For some MPAM systems it is recommended to set the minimum value as 5%
> less than the maximum value to get a reliable target bandwidth. As 5% seems
> implementation specific and some systems don't have min controls it seemed
> better to just match the MB control with a maximum bandwidth control and let the
> user have freedom to choose the minimum bandwidth control when MB_MIN support is
> added.
>
> If the maximum possible bandwidth (100%) is used as the default for the minimum,
> then any change of the maximum won't have any effect, as it's always less than
> the minimum (if that's unchanged), and so all traffic is high preference. I now
> see from your reply below that you are planning on not allowing this kind of
> configuration.
>
> If the minimum always tracks the maximum then we lose the distinction between
> medium and high preference traffic, and so to reserve some high preference
> bandwidth for one control group we'd have to change the configuration in the
> other control groups so that their bandwidth preference is medium (minimum
> value at 0).
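As an aside for readers following the table quoted above, here is a minimal
sketch of how its rows combine. It assumes (the table does not say) that the
below-minimum check is evaluated first and that a used bandwidth exactly equal
to a limit counts as having reached it. All names are hypothetical; this is not
MPAM driver or resctrl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical preference levels mirroring the rows of the quoted table. */
enum example_pref {
	EXAMPLE_PREF_NONE,
	EXAMPLE_PREF_LOW,
	EXAMPLE_PREF_MEDIUM,
	EXAMPLE_PREF_HIGH,
};

/*
 * Classify a request from the bandwidth already used by its partition.
 * Assumption: the minimum is checked before the maximum, so min > max
 * leaves everything below min at high preference.
 */
static enum example_pref example_bw_preference(uint32_t used, uint32_t min,
					       uint32_t max, bool hardlim)
{
	if (used < min)
		return EXAMPLE_PREF_HIGH;
	if (used < max)
		return EXAMPLE_PREF_MEDIUM;
	return hardlim ? EXAMPLE_PREF_NONE : EXAMPLE_PREF_LOW;
}

/* The three configurations discussed in the quoted text. */
static void example_selftest(void)
{
	/* min == max: high preference up to the target, never medium. */
	assert(example_bw_preference(30, 50, 50, false) == EXAMPLE_PREF_HIGH);
	assert(example_bw_preference(50, 50, 50, false) == EXAMPLE_PREF_LOW);
	/* min left at 100: lowering max to 40 has no visible effect. */
	assert(example_bw_preference(60, 100, 40, false) == EXAMPLE_PREF_HIGH);
	/* min at 0: max alone separates medium from low/none. */
	assert(example_bw_preference(30, 0, 40, true) == EXAMPLE_PREF_MEDIUM);
	assert(example_bw_preference(45, 0, 40, true) == EXAMPLE_PREF_NONE);
}
```

Under these assumptions the "min left at 100" case shows why changing only the
maximum would appear to do nothing.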
I do not think we are talking about the same thing here. I am *not* saying
that minimum and maximum controls should always be the same.
The discussion is about a proposed change to resctrl_get_default_ctrlval(). resctrl
uses this function in two places:
- When creating a new resource group:
  The intention here is that when user space creates a new resource group it should
  be created with maximum allocations possible. For MBA this means "unthrottled".
  After creating the resource group user space can adjust allocations to match
  workload requirements.
- When unmounting the resctrl fs:
  The intention here is that all controls are set to unthrottled to stop any possible
  impact to system when user space stops using resctrl.
resctrl_get_default_ctrlval() is thus intended to support an unthrottled baseline from
where user space can make configuration changes as supported by hardware and required
by workloads.
I see that the MPAM driver internally uses resctrl_get_default_ctrlval() in a couple
of places and I am not considering this usage here. If internally MPAM has other
usages for this function where it does not mean "unthrottled" then perhaps
it would be better to create a new function that matches the usage?
>>>> - No support for "read-modify-write" usage of schemata file. This is where we
>>>> discussed (without agreement) on possibly introducing the "#" prefix to schemata
>>>> file entries. This PoC does not support this prefix and the current assumption/expectation
>>>> is that when user space changes a configuration only the new control values are
>>>> written to schemata file. I thus do not have a plan to support this so please
>>>> share opinions in this regard if you have some.
>>>
>>> There is now less motivation from the MPAM side for this than when this was
>>> initially discussed. In pre-upstream versions of the MPAM patches a change in
>>> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
>>> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>>
>> Ah, thanks for the correction. The email I linked above indeed refers to changing
>> both min and max.
>>
>>>
>>> However, it would be useful not to be limited by percentages. In my quick
>>
>> Indeed. Not being limited by percentages while still needing to have a backward
>> compatible user interface is how we ended up with "emulated controls".
>>
>>> experimentation with your patches I used a percentage value for MB_MIN but it
>>> would be best to move away from this. For new controls I think we can mandate
>>> that user space has to discover the resolution from the info directory, but how
>>> can we retrofit this? For MPAM, MB and MB_MAX would control the same thing.
>>> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
>>> MB_MAX in MB and vice versa, with MB taking precedence if both are set? Old
>>> software can continue setting MB, and new software can move to using MB_MAX and
>>> take advantage of the improved control. (I don't think we should expose the MPAM
>>> hardware value directly as there is confusion over whether all 1s is 100% or
>>> not, and we'd like to have something generic and friendly to the user.)
>>
>> Sounds to me as though you are describing emulated controls. Exposing two
>> controls in the schemata file that essentially control the same thing is what the
>> emulated controls aim to solve, and the resctrl hierarchies presented in slide #6
>> of that presentation (and discussed in the email thread) are how we contemplated
>> representing the relationship among these controls to user space. So, considering
>> your example resctrl may display something like:
>>
>> info/
>> └── MB/
>>     └── resource_schemata/
>>         └── MB/
>>             └── MB_MAX/
>>
>> Above hierarchy describes the relationship to user space that if MB is changed it
>> will impact MB_MAX and vice-versa.
>>
>> The one open I am aware of surrounding emulated controls is how to present some
>> semblance of consistency to user space when considering all the possibilities
>> the different architectures (and even within architectures) may have.
>
> What other use cases do we have apart from MB and MB_MAX? I was wondering if
> this could be limited to a default control (L2, L3, MB..) with a single new
> style control (L2_*, L3_*, MB_ ...) under it.
The motivation for these emulated controls is to not break a user space that does
not understand the "info/<resource>/resource_schemata" interface. At this time
user space expects every resource (not control) to have an entry in the schemata file.
So yes, I also see this as limited to the default control.
Whether it implies that only a single (finer grained/hardware) control would be under
it is not obvious to me since we already had one scenario where a legacy control is
emulated by two hardware controls when considering the example on MPAM where the "MB"
legacy control can be emulated with MPAM's "min" and "max" controls. An additional
complication is that some of these architecture specs describe several controls but have
their implementation as "optional" which presents a challenge when trying to create a
sane and consistent hierarchy.
>>>> - Controls are independent for now. This means that, for example, if a resource
>>>> supports a "MIN" and "MAX" control then this implementation would allow user to
>>>> set the "maximum" control values to be less than the "minimum" control values.
>>>
>>> I think this is ok as long as adding support for new controls in resctrl doesn't
>>> change the existing behaviour. In MPAM we dodged this by introducing MB as only
>>> affecting the h/w mbw_max and not mbw_min (as mentioned above).
>>
>> I understand this to be a requirement for Intel where the spec contains "The Maximum Cap
>> should be programmed to be greater than or equal to the Minimum and Optimal caps.
>> Undesirable and undefined performance effects may result if cap programming guidelines
>> are not followed."
>>
>> I am currently thinking that resctrl should not try to be too smart here and if user
>> space wants to make dramatic changes to min and max values then it should just ensure
>> the ordering is appropriate. For example, attempting to set a new min to be larger than
>> the old max would fail and user space should first increase the old max and then set
>> a new min.
>
> Ok with me.
Thank you for considering this.
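To make the ordering rule above concrete, a minimal sketch follows. It assumes
each write is checked against the other control's current value and is simply
rejected when it would invert the pair; the struct, the function names, and the
choice of -EINVAL are all hypothetical and not taken from the PoC.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-domain pair of independent MIN and MAX control values. */
struct example_bw_cfg {
	uint32_t min;
	uint32_t max;
};

/* Reject a new minimum that would exceed the current maximum. */
static int example_set_min(struct example_bw_cfg *cfg, uint32_t new_min)
{
	assert(cfg);
	if (new_min > cfg->max)
		return -EINVAL;
	cfg->min = new_min;
	return 0;
}

/* Reject a new maximum that would drop below the current minimum. */
static int example_set_max(struct example_bw_cfg *cfg, uint32_t new_max)
{
	assert(cfg);
	if (new_max < cfg->min)
		return -EINVAL;
	cfg->max = new_max;
	return 0;
}
```

With min=10 and max=50, raising the minimum to 60 would fail in this sketch
until the maximum has first been raised to at least 60, which is the ordering
user space would be expected to follow.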
...
>>>> Primary resctrl fs data structure changes
>>>> =========================================
>>>>
>>>> Introduces a control represented by struct resctrl_ctrl that looks as below. To make
>>>> the changes easier to follow I kept some of the original names to help communicate
>>>> where familiar data structures land.
>>>>
>>>> What to notice about a control is that it has some common properties required
>>>> from all controls (scope, type, etc.) and then depending on the type of control
>>>> (RESCTRL_CTRL_BITMAP or RESCTRL_CTRL_SCALAR) there are type specific properties.
>>>>
>>>> /**
>>>> * struct resctrl_ctrl - A resource control
>>>> * @entry: List entry of rdt_resource::controls
>>>> * @scope: Scope of the resource that this control allocates
>>>> * @domains: RCU list of all control domains
>>>> * @type: The control type that determines the properties of the control,
>>>> * format string for displaying control values to user space, and
>>>> * parser of control values provided by user space.
>>>> * @name: Name of the control. Appended to final resource name
>>>> * (rdt_resource_final::name) to create final schema entry.
>>>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>>>> * For example, with resource name "MB" and control name "MAX" the
>>>> * schema entry will be "MB_MAX".
>>>> * @cache: Cache allocation control properties.
>>>> * @membw: Bandwidth control properties.
>>>> */
>>>> struct resctrl_ctrl {
>>>> struct list_head entry;
>>>> enum resctrl_scope scope;
>>>> struct list_head domains;
>>>> enum resctrl_ctrl_type type;
>>>> enum resctrl_ctrl_name name;
>>>> union {
>>>> struct resctrl_cache cache;
>>>> struct resctrl_membw membw;
>>>> };
>>>> };
>>>>
>>>> Two members summarize how this new structure fits into the rest of resctrl:
>>>> a) resctrl_ctrl::entry
>>>> Since a resource can support multiple controls there is a new list
>>>> in struct rdt_resource named "controls" that contains the list of all
>>>> controls supported by the resource.
>>>> b) resctrl_ctrl::domains
>>>> Instead of the list of control domains belonging to a resource they
>>>> now belong to the control self. By doing so resctrl can support resource
>>>> controls at different scope for the same resource. This is intended to
>>>> support some upcoming MPAM and RISC-V usages.
>>>
>>> Please can you expand a bit on part b).
>>>
>>> In an MPAM system we consider 3 resctrl resources, RDT_RESOURCE_L3,
>>> RDT_RESOURCE_L2 and RDT_RESOURCE_MBA which correspond to the L3 caches, L2
>>> caches and memory bandwidth on egress from the L3 caches. The domain for each of
>>> these corresponds to the instance of the resource. That is, for RDT_RESOURCE_L2
>>> there is a resource for each L2 instance, similarly for L3, and for
>>
>> (I'm assuming above is typo and it is "there is a domain for each L2 instance"?)
>
> yes, a mistake
>
>>
>>> RDT_RESOURCE_MBA there is a domain for each L3 cache. If we were to add support
>>> for controls on a new cache level, say the L4, then I'd expect to add a new
>>> resource. For memory bandwidth, we'd like to be able to control b/w on the L2
>>> egress (e.g. in a DSU). Wouldn't this too be a separate resource or would this
>>> be a new set of controls on the same resource?
>>>
>>> New controls on the same resource
>>> MB_MIN2
>>> MB_MAX2
>>> MB_PROP2
>>> ...
>>>
>>> or
>>> MB2_MIN
>>> MB2_MAX
>>> MB2_PROP
>>
>>
>> The way I currently see it is that controlling bandwidth at a different scope would
>> be a new set of controls associated with the MB resource. There are more scenarios
>> coming this way with AMD's "Global MBA" that is memory bandwidth allocation at
>> NUMA node scope. If I understand correctly the "CPU-less Memory Node" that Nvidia
>> shared at plumbers would need this also and control memory bandwidth allocation
>> at the NUMA node scope.
>
> Yes, in general for MSCs at the memory controllers it would be good to scope these
> by NUMA node, whether or not they are CPU-less.
>
>> A related technology is Intel's region-aware MBA, which is
>> still at L3 scope.
>>
>> I fully agree that we need to figure out how to represent all of this to user space
>> without turning the interface into something unintelligible. In the end this is
>> required for user space to know what a domain ID represents.
>>
>> Would it help to make the scope part of the control name? The ship has sailed for
>> MB being associated with L3 scope but this could mean the "default" scope of MB
>> resource is L3 (which user space can still confirm by looking at the control's
>> "scope" file) and the others include scope in the name? Consider for example:
>> https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@intel.com/
>
> This certainly helps with the naming.
>
> The scope does have an effect on what causes a domain to be present or not. For
> existing scopes, such as L3 scope, whether a domain is online depends on
> whether or not a set of CPUs is online, and cpus_read_lock() is taken.
> However, for NUMA scope in MPAM (maybe not GMBA?), whether or not the domain is
> online would need to depend on whether the memory is online, and the memory
> hotplug lock would need to be taken. I am wondering if this sort of
> configuration means it's better to have the NUMA-scoped memory bandwidth on a
> different resource, or whether we just say ok and always take the memory hotplug
> lock, get_online_mems(), wherever we take cpus_read_lock().
Oh, thank you for bringing this up. I have not considered how the memory hotplug lock
needs to be integrated. Taking cpus_read_lock() has permeated the entire subsystem.
My initial thought is that having unique per-resource locking sounds complicated while
always taking the memory hotplug lock sounds much simpler. I do not see many users of
get_online_mems() though.
>>> AFAIK, the DSU h/w just supports proportional bandwidth controls at the moment
>>> but we should consider what to do about the potential naming.
>>
>> ack.
>>
>>>
>>> In the MPAM driver, we collect MSC into components (based on instances) and
>>> those into classes (components of the same type). Currently, a resource is
>>> mapped to a single class. (Two resources may map to the same class.)
>>>
>>> I expect it is useful in the memory region and sub numa cases but I'd still
>>> expect the common case to be that the domains are the same within a control. Or
>>> am I missing something?
>>
>> Domains of a control should all be at the same scope. Since the schemata file
>> exposes the control with the different IDs representing the instances of the
>> resource needing to be controlled it has to be clear to user space what the
>> domain ID represents.
>
> Agreed. (I meant to say the domains within a resource are likely to be the same
> for each control within the same resource.)
This seems accurate for the resources that have implicit scope (the caches) but
memory bandwidth as a resource is looking more like it needs to support allocation
at different scopes.
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-05 15:39 ` Reinette Chatre
@ 2026-06-05 16:37 ` Ben Horgan
0 siblings, 0 replies; 30+ messages in thread
From: Ben Horgan @ 2026-06-05 16:37 UTC (permalink / raw)
To: Reinette Chatre, Tony Luck, James Morse, Dave Martin, Babu Moger,
Drew Fustini, Fenghua Yu, Chen Yu
Cc: Borislav Petkov, Thomas Gleixner, Dave Hansen, Peter Newman,
x86@kernel.org, linux-kernel@vger.kernel.org
Hi Reinette,
On 6/5/26 16:39, Reinette Chatre wrote:
> Hi Ben,
>
> On 6/5/26 7:53 AM, Ben Horgan wrote:
>> On 6/4/26 18:43, Reinette Chatre wrote:
>>> On 6/3/26 8:15 AM, Ben Horgan wrote:
>>>> On 5/29/26 19:06, Reinette Chatre wrote:
>
> ...
>
>>>
>>>> I plumbed in support for the MB_MIN resource schema which also works under light
>>>> testing. The only fs resctrl code change I needed was:
>>>>
>>>> --- a/include/linux/resctrl.h
>>>> +++ b/include/linux/resctrl.h
>>>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct
>>>> resctrl_ctrl *ctrl)
>>>> case RESCTRL_CTRL_BITMAP:
>>>> return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>>> case RESCTRL_CTRL_SCALAR:
>>>> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>>>> + return ctrl->membw.min_bw;
>>>> +
>>>> return ctrl->membw.max_bw;
>>>> }
>>>>
>>>>
>>>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>>>> as the maximum bandwidth controls only take effect if their value is higher than
>>>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>>>> breaks your ctrl->type based classification but that's fixable by just adding a
>>>> default field to membw.
>>>
>>> This I am not sure about. In my understanding a typical "default" value means
>>> "no throttling" and, at least on Intel, this default hardware state has been
>>> summarized as "min" == "max" == "optimal".
>>
>> Ok, this sounds odd to me but that is probably because I don't know what Intel
>> systems do. On MPAM systems a MIN control is a boost rather than a throttling
>> control. Although, you can always think of that as throttling the traffic with
>> the other PARTIDs.
>>
>>>
>>> Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
>>> do not take effect? Could you please elaborate what happens if "min" == "max"?
>>
>> Table 5-4 from section 5.2.8 of the IHI0099B.b shows the interaction between the
>> min and maximum controls.
>>
>> If used bandwidth is   The preference is   Description
>> Below the minimum      High                Only high requests compete with this
>>                                            request.
>> Above the minimum,     Medium              High requests are serviced first then
>> below the maximum                          this request competes with other
>>                                            medium requests.
>> Above the maximum,     Low                 Requests are not serviced if any high
>> when HARDLIM is 0                          or medium requests are available.
>> Above the maximum,     None                Requests are not serviced
>> when HARDLIM is 1
>>
>> So if we keep the minimum and the maximum control values always the same then
>> all traffic will be given "high" preference until the target bandwidth is
>> reached. For some MPAM systems it is recommended to set the minimum value as 5%
>> less than the maximum value to get a reliable target bandwidth. As 5% seems
>> implementation specific and some systems don't have min controls it seemed
>> better to just match the MB control with a maximum bandwidth control and let the
>> user have freedom to choose the minimum bandwidth control when MB_MIN support is
>> added.
>>
>> If the maximum possible bandwidth (100%) is used as the default for the minimum,
>> then any change of the maximum won't have any effect, as it's always less than
>> the minimum (if that's unchanged), and so all traffic is high preference. I now
>> see from your reply below that you are planning on not allowing this kind of
>> configuration.
>>
>> If the minimum always tracks the maximum then we lose the distinction between
>> medium and high preference traffic, and so to reserve some high preference
>> bandwidth for one control group we'd have to change the configuration in the
>> other control groups so that their bandwidth preference is medium (minimum
>> value at 0).
>
> I do not think we are talking about the same thing here. I am *not* saying
> that minimum and maximum controls should always be the same.
>
> The discussion is about a proposed change to resctrl_get_default_ctrlval(). resctrl
> uses this function in two places:
> - When creating a new resource group:
> The intention here is that when user space creates a new resource group it should
> be created with maximum allocations possible. For MBA this means "unthrottled".
I would contend that for minimum controls a policy of 'maximum allocation
possible' isn't a useful default. I try to explain a bit more below.
> After creating the resource group user space can adjust allocations to match
> workload requirements.
> - When unmounting the resctrl fs.
> The intention here is that all controls are set to unthrottled to stop any possible
> impact to system when user space stops using resctrl.
>
> resctrl_get_default_ctrlval() is thus intended to support an unthrottled baseline from
> where user space can make configuration changes as supported by hardware and required
> by workloads.
The baseline that I see makes most sense for a minimum control is to have the
default as 0. This just means that there is no "guaranteed"/high preference
bandwidth reserved for the control group. I would say this is still unthrottled,
just not giving a boost. With this default the user can use MB (backed by max
bandwidth) without having to know about MB_MIN (keeping it constant). If the
default is 100% for min bandwidth then the user needs to know to set MB_MIN to
be able to use MB. Correspondingly, having a default of 100% for max bandwidth
means a user can change MB_MIN and see guaranteed bandwidth effects without
having to know about MB/MB_MAX.
Does this make sense?
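A minimal sketch of the "default field in membw" idea mentioned earlier in the
thread, so that the default no longer depends on the control name. The types
below are simplified stand-ins for the PoC's structures, and the field name
default_bw is an assumption, not something the series defines.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the PoC's control types. */
enum example_ctrl_type { EXAMPLE_CTRL_BITMAP, EXAMPLE_CTRL_SCALAR };

struct example_cache {
	unsigned int cbm_len;
};

struct example_membw {
	uint32_t min_bw;
	uint32_t max_bw;
	uint32_t default_bw;	/* assumed new field: per-control baseline */
};

struct example_ctrl {
	enum example_ctrl_type type;
	union {
		struct example_cache cache;
		struct example_membw membw;
	};
};

/* Default value selected by control type only, never by control name. */
static uint32_t example_get_default_ctrlval(const struct example_ctrl *ctrl)
{
	assert(ctrl);
	switch (ctrl->type) {
	case EXAMPLE_CTRL_BITMAP:
		/* All cbm_len low bits set; 64-bit shift avoids overflow at 32. */
		assert(ctrl->cache.cbm_len >= 1 && ctrl->cache.cbm_len <= 32);
		return (uint32_t)((1ULL << ctrl->cache.cbm_len) - 1);
	case EXAMPLE_CTRL_SCALAR:
		return ctrl->membw.default_bw;
	}
	return 0;
}
```

In this sketch the architecture would set default_bw to max_bw for a MAX
control and to 0 for a MIN control when registering them, which keeps the
type-based classification intact.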
>
> I see that the MPAM driver internally uses resctrl_get_default_ctrlval() in a couple
> of places and I am not considering this usage here. If internally MPAM has other
> usages for this function where it does not mean "unthrottled" then perhaps
> it would be better to create a new function that matches the usage?
I don't think the internal usage makes a difference here.
One process question, so that I know how to structure my patches: in the series
you have a few patches which touch all architectures; these have the prefix
mpam,x86,fs/resctrl. Is this how you would like cross-architecture patches to
look, or is it just for convenience in the RFC and a patch per architecture is
preferable?
Thanks,
Ben
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-05-29 18:06 [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept Reinette Chatre
` (2 preceding siblings ...)
2026-06-03 15:15 ` Ben Horgan
@ 2026-06-03 18:46 ` Luck, Tony
2026-06-04 10:02 ` Ben Horgan
2026-06-04 21:42 ` Reinette Chatre
2026-06-03 22:14 ` Drew Fustini
4 siblings, 2 replies; 30+ messages in thread
From: Luck, Tony @ 2026-06-03 18:46 UTC (permalink / raw)
To: Reinette Chatre
Cc: Ben Horgan, James Morse, Dave Martin, Babu Moger, Drew Fustini,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
Reinette,
Tiny bug in "mpam,x86,fs/resctrl: Transition resource control to a list"
> /*
> * Return length needed to display longest control suffix.
> * Add 1 for the "_" character when control name exists.
> */
> size_t resctrl_resource_ctrl_max_len(struct rdt_resource *r)
> {
> struct resctrl_ctrl *ctrl;
> size_t total = 0;
> size_t len;
>
> for_each_resource_ctrl(ctrl,r) {
> len = strlen(resctrl_ctrl_name_str(ctrl->name));
> if (len)
> total += 1 + len;
Should be:
total = max(total, 1 + len);
> }
>
> return total;
> }
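To illustrate the difference, a stand-alone sketch with a made-up name table
(not the PoC's real one): accumulating the lengths returns the sum of all
suffixes, while the longest-suffix length is what the comment asks for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical control names; "" stands for the unnamed default control. */
static const char *const example_ctrl_names[] = { "", "MIN", "MAX", "PROP" };

/* Length of the longest "_<name>" suffix: keep the maximum seen so far. */
static size_t example_ctrl_max_len(const char *const *names, size_t n)
{
	size_t total = 0;
	size_t i;

	assert(names || !n);
	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len && 1 + len > total)
			total = 1 + len;
	}
	return total;
}

/* The accumulating variant from the quoted patch, kept for comparison. */
static size_t example_ctrl_sum_len(const char *const *names, size_t n)
{
	size_t total = 0;
	size_t i;

	assert(names || !n);
	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len)
			total += 1 + len;
	}
	return total;
}
```

For the four names above the first helper returns 5 (for "_PROP") and the
second returns 13, so the accumulating version would over-pad the schema names.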
Your sample code just converts "MB" over to using a list of controls.
Would it also work for code & data priority? E.g. by having controls
for "_CODE" and "_DATA" that are on the list when CDP is enabled, and
just a no name control for the default?
-Tony
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 18:46 ` Luck, Tony
@ 2026-06-04 10:02 ` Ben Horgan
2026-06-04 21:42 ` Reinette Chatre
1 sibling, 0 replies; 30+ messages in thread
From: Ben Horgan @ 2026-06-04 10:02 UTC (permalink / raw)
To: Luck, Tony, Reinette Chatre
Cc: James Morse, Dave Martin, Babu Moger, Drew Fustini, Fenghua Yu,
Chen Yu, Borislav Petkov, Thomas Gleixner, Dave Hansen,
Peter Newman, x86@kernel.org, linux-kernel@vger.kernel.org
Hi Tony,
On 6/3/26 19:46, Luck, Tony wrote:
> Reinette,
>
> Tiny bug in "mpam,x86,fs/resctrl: Transition resource control to a list"
>
>> /*
>> * Return length needed to display longest control suffix.
>> * Add 1 for the "_" character when control name exists.
>> */
>> size_t resctrl_resource_ctrl_max_len(struct rdt_resource *r)
>> {
>> struct resctrl_ctrl *ctrl;
>> size_t total = 0;
>> size_t len;
>>
>> for_each_resource_ctrl(ctrl,r) {
>> len = strlen(resctrl_ctrl_name_str(ctrl->name));
>> if (len)
>> total += 1 + len;
>
> Should be:
>
> total = max(total, 1 + len);
>
>> }
>>
>> return total;
>> }
>
> Your sample code just converts "MB" over to using a list of controls.
> Would it also work for code & data priority? E.g. by having controls
> for "_CODE" and "_DATA" that are on the list when CDP is enabled, and
> just a no name control for the default?
Is there a plan to expand to support CDP for things other than cache allocation?
This would help for MPAM as we currently don't support the MB schema when CDP is
enabled for L2 or L3 as we have no way of reliably supporting it for the caches
but not the memory allocation.
Thanks,
Ben
>
> -Tony
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 18:46 ` Luck, Tony
2026-06-04 10:02 ` Ben Horgan
@ 2026-06-04 21:42 ` Reinette Chatre
1 sibling, 0 replies; 30+ messages in thread
From: Reinette Chatre @ 2026-06-04 21:42 UTC (permalink / raw)
To: Luck, Tony
Cc: Ben Horgan, James Morse, Dave Martin, Babu Moger, Drew Fustini,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
Hi Tony,
On 6/3/26 11:46 AM, Luck, Tony wrote:
> Reinette,
>
> Tiny bug in "mpam,x86,fs/resctrl: Transition resource control to a list"
>
>> /*
>> * Return length needed to display longest control suffix.
>> * Add 1 for the "_" character when control name exists.
>> */
>> size_t resctrl_resource_ctrl_max_len(struct rdt_resource *r)
>> {
>> struct resctrl_ctrl *ctrl;
>> size_t total = 0;
>> size_t len;
>>
>> for_each_resource_ctrl(ctrl,r) {
>> len = strlen(resctrl_ctrl_name_str(ctrl->name));
>> if (len)
>> total += 1 + len;
>
> Should be:
>
> total = max(total, 1 + len);
Thank you very much. I squashed this locally.
>
>> }
>>
>> return total;
>> }
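For reference, a minimal user-space sketch of the helper with the max() fix
applied. It assumes a plain array of control name strings in place of the
kernel's control list and the for_each_resource_ctrl() iterator;
ctrl_suffix_max_len() and its parameters are hypothetical names, not part of
the PoC.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for resctrl_resource_ctrl_max_len(): the control
 * list is modelled as an array of name strings, where "" is an unnamed
 * control.
 */
static size_t ctrl_suffix_max_len(const char *const *names, size_t n)
{
	size_t max_len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		/* An unnamed control contributes no suffix. */
		if (!len)
			continue;

		/* Add 1 for the "_" separator and keep the longest suffix. */
		if (len + 1 > max_len)
			max_len = len + 1;
	}

	return max_len;
}
```

With the names "", "MIN" and "MAX" this returns 4, the length of "_MAX",
whereas accumulating with "+=" would return 8.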
>
> Your sample code just converts "MB" over to using a list of controls.
> Would it also work for code & data priority? E.g. by having controls
> for "_CODE" and "_DATA" that are on the list when CDP is enabled, and
> just a no name control for the default?
This PoC also converts existing cache allocation to use a (one entry) list
of controls. The current idea is that, whether CDP is enabled or not, there is
just one list of controls associated with the resource, but when CDP is enabled
every domain associated with every control can accommodate both a "CODE" and
a "DATA" control value.
For cache controls this PoC indeed still has many assumptions about there
being only one cache control. Even with the single control there are many
CDP special cases so there may be some unexplored corners when adding multiple
cache controls.
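As a rough illustration of the "one list of controls, two values per domain"
idea above, here is a small user-space model. All names (ctrl_domain_model,
VAL_CODE, VAL_DATA, domain_set(), domain_get()) are hypothetical and chosen for
this sketch only; they are not the PoC's types. The assumption is that with
CDP disabled only the first slot is used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value slots of one control domain. */
enum val_slot { VAL_CODE, VAL_DATA, VAL_NUM };

/* Hypothetical model of a control domain that can hold CODE and DATA values. */
struct ctrl_domain_model {
	int id;
	uint32_t val[VAL_NUM];
};

/* Store a value; without CDP every write lands in slot 0. */
static void domain_set(struct ctrl_domain_model *d, bool cdp,
		       enum val_slot slot, uint32_t v)
{
	d->val[cdp ? slot : VAL_CODE] = v;
}

/* Read a value; without CDP every read comes from slot 0. */
static uint32_t domain_get(const struct ctrl_domain_model *d, bool cdp,
			   enum val_slot slot)
{
	return d->val[cdp ? slot : VAL_CODE];
}
```

The control list itself does not change shape when CDP is toggled in this
model; only the number of slots in use per domain does.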
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-05-29 18:06 [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept Reinette Chatre
` (3 preceding siblings ...)
2026-06-03 18:46 ` Luck, Tony
@ 2026-06-03 22:14 ` Drew Fustini
2026-06-04 21:47 ` Reinette Chatre
4 siblings, 1 reply; 30+ messages in thread
From: Drew Fustini @ 2026-06-03 22:14 UTC (permalink / raw)
To: Reinette Chatre
Cc: Tony Luck, Ben Horgan, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Fri, May 29, 2026 at 11:06:07AM -0700, Reinette Chatre wrote:
> /**
> * struct resctrl_ctrl - A resource control
> * @entry: List entry of rdt_resource::controls
> * @scope: Scope of the resource that this control allocates
> * @domains: RCU list of all control domains
> * @type: The control type that determines the properties of the control,
> * format string for displaying control values to user space, and
> * parser of control values provided by user space.
> * @name: Name of the control. Appended to final resource name
> * (rdt_resource_final::name) to create final schema entry.
> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
> * For example, with resource name "MB" and control name "MAX" the
> * schema entry will be "MB_MAX".
> * @cache: Cache allocation control properties.
> * @membw: Bandwidth control properties.
> */
> struct resctrl_ctrl {
> struct list_head entry;
> enum resctrl_scope scope;
> struct list_head domains;
> enum resctrl_ctrl_type type;
> enum resctrl_ctrl_name name;
> union {
> struct resctrl_cache cache;
> struct resctrl_membw membw;
> };
> };
>
> Two members summarize how this new structure fits into the rest of resctrl:
> a) resctrl_ctrl::entry
> Since a resource can support multiple controls there is a new list
> in struct rdt_resource named "controls" that contains the list of all
> controls supported by the resource.
> b) resctrl_ctrl::domains
> Instead of the list of control domains belonging to a resource they
> now belong to the control self. By doing so resctrl can support resource
> controls at different scope for the same resource. This is intended to
> support some upcoming MPAM and RISC-V usages.
The ability to change scope is much needed for RISC-V. There are
compromises in my RFC [1] as a result of trying to map everything to
either L2 or L3 scope.
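To make the per-control scope and the schema naming from the quoted kernel-doc
concrete, here is a small user-space sketch. The types and the helper
(scope_model, ctrl_model, schema_entry_name()) are hypothetical and only
mirror the description above; the sketch also assumes that an unnamed control
yields the bare resource name.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical scopes; each control carries its own. */
enum scope_model { SCOPE_L2, SCOPE_L3, SCOPE_NODE };

/* Hypothetical model of one control of a resource. */
struct ctrl_model {
	const char *name;	/* "" for an unnamed control */
	enum scope_model scope;	/* may differ between controls */
};

/*
 * Build the schema entry name: "<resource>_<control>", or just
 * "<resource>" for an unnamed control. Returns snprintf()'s result.
 */
static int schema_entry_name(char *buf, size_t size, const char *res,
			     const struct ctrl_model *c)
{
	if (c->name[0] == '\0')
		return snprintf(buf, size, "%s", res);

	return snprintf(buf, size, "%s_%s", res, c->name);
}
```

For resource "MB" and a control named "MAX" this produces "MB_MAX", matching
the example in the kernel-doc; a second control of the same resource could
carry a different scope value without affecting the name.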
I would also like to see a non-CPU cache scope for monitoring too, but
would that be better discussed outside the context of this proof of
concept?
Thanks,
Drew
[1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-0-baf00f50028a@kernel.org/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-03 22:14 ` Drew Fustini
@ 2026-06-04 21:47 ` Reinette Chatre
2026-06-05 19:48 ` Drew Fustini
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2026-06-04 21:47 UTC (permalink / raw)
To: Drew Fustini
Cc: Tony Luck, Ben Horgan, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
Hi Drew,
On 6/3/26 3:14 PM, Drew Fustini wrote:
> On Fri, May 29, 2026 at 11:06:07AM -0700, Reinette Chatre wrote:
>> /**
>> * struct resctrl_ctrl - A resource control
>> * @entry: List entry of rdt_resource::controls
>> * @scope: Scope of the resource that this control allocates
>> * @domains: RCU list of all control domains
>> * @type: The control type that determines the properties of the control,
>> * format string for displaying control values to user space, and
>> * parser of control values provided by user space.
>> * @name: Name of the control. Appended to final resource name
>> * (rdt_resource_final::name) to create final schema entry.
>> * Specifically, "rdt_resource_final::name"_"resctrl_ctrl::name".
>> * For example, with resource name "MB" and control name "MAX" the
>> * schema entry will be "MB_MAX".
>> * @cache: Cache allocation control properties.
>> * @membw: Bandwidth control properties.
>> */
>> struct resctrl_ctrl {
>> struct list_head entry;
>> enum resctrl_scope scope;
>> struct list_head domains;
>> enum resctrl_ctrl_type type;
>> enum resctrl_ctrl_name name;
>> union {
>> struct resctrl_cache cache;
>> struct resctrl_membw membw;
>> };
>> };
>>
>> Two members summarize how this new structure fits into the rest of resctrl:
>> a) resctrl_ctrl::entry
>> Since a resource can support multiple controls there is a new list
>> in struct rdt_resource named "controls" that contains the list of all
>> controls supported by the resource.
>> b) resctrl_ctrl::domains
>> Instead of the list of control domains belonging to a resource they
>> now belong to the control self. By doing so resctrl can support resource
>> controls at different scope for the same resource. This is intended to
>> support some upcoming MPAM and RISC-V usages.
>
> The ability to change scope is much needed for RISC-V. There are
> compromises in my RFC [1] as a result of trying to map everything to
> either L2 or L3 scope.
>
> I would also like to see a non-cpu cache scope for monitoring too, but
> would that be better discussed outside the context of this proof of
> concept?
I also think it would be good for it to be clear that monitoring is based on
scope, not a resource. With the MB controls supporting different scope I do think
that this would be a good next step. A previous musing from me on this topic can
be found at the end of https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@intel.com/
I have not yet considered how this can be built on top of this PoC though.
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
2026-06-04 21:47 ` Reinette Chatre
@ 2026-06-05 19:48 ` Drew Fustini
0 siblings, 0 replies; 30+ messages in thread
From: Drew Fustini @ 2026-06-05 19:48 UTC (permalink / raw)
To: Reinette Chatre
Cc: Tony Luck, Ben Horgan, James Morse, Dave Martin, Babu Moger,
Fenghua Yu, Chen Yu, Borislav Petkov, Thomas Gleixner,
Dave Hansen, Peter Newman, x86@kernel.org,
linux-kernel@vger.kernel.org
On Thu, Jun 04, 2026 at 02:47:39PM -0700, Reinette Chatre wrote:
> > The ability to change scope is much needed for RISC-V. There are
> > compromises in my RFC [1] as a result of trying to map everything to
> > either L2 or L3 scope.
> >
> > I would also like to see a non-cpu cache scope for monitoring too, but
> > would that be better discussed outside the context of this proof of
> > concept?
>
> I also think it would be good for it to be clear that monitoring is based on
> scope, not a resource. With the MB controls supporting different scope I do think
> that this would be a good next step. A previous musing from me on this topic can
> be found (at the end of ) https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@intel.com/
>
> I have not yet considered how this can be built on top of this PoC though.
Thanks for explaining. I like how you show an example of
mon_data/mon_NODE_00/mbm_total_bytes in that thread. I believe that sort
of scheme would work well for RISC-V, as a bandwidth controller
implementing the CBQRI spec can be located anywhere within the system.
Drew
^ permalink raw reply [flat|nested] 30+ messages in thread