The Linux Kernel Mailing List
From: Drew Fustini <fustini@kernel.org>
To: Ben Horgan <ben.horgan@arm.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>,
	Tony Luck <tony.luck@intel.com>,
	James Morse <james.morse@arm.com>,
	Dave Martin <Dave.Martin@arm.com>,
	Babu Moger <babu.moger@amd.com>, Fenghua Yu <fenghuay@nvidia.com>,
	Chen Yu <yu.c.chen@intel.com>, Borislav Petkov <bp@alien8.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Newman <peternewman@google.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
Date: Wed, 3 Jun 2026 12:34:05 -0700
Message-ID: <aiCBratZchVFVhws@gen8>
In-Reply-To: <29c95b69-e1a4-46b1-ab8b-45c09308b924@arm.com>

On Wed, Jun 03, 2026 at 04:15:51PM +0100, Ben Horgan wrote:
> Hi Reinette,
> 
> On 5/29/26 19:06, Reinette Chatre wrote:
> > Hi Everybody,
> > 
> > It has been a while since we discussed the resctrl changes required to support
> > hardware that has controls with fine granularity or hardware that has multiple
> > controls per resource. For reference, the most recent email discussion can
> > be found at [1] with a summary of discussions in last year's plumbers slides [2].
> > 
> > I created a PoC that I believe supports what folks have agreed to so far. I
> > hope this can help us to restart the discussion with the goal that resctrl gains
> > support for upcoming hardware that require these features.
> 
> Thank you very much for doing this work. I believe this will be very useful for
> MPAM and other architectures.

Yes, thanks to Reinette for working on the generic schema proof of
concept. This will be helpful for supporting the RISC-V CBQRI (capacity
and bandwidth QoS) spec.

> I plumbed in support for the MB_MIN resource schema which also works under light
> testing. The only fs resctrl code change I needed was:
> 
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
>         case RESCTRL_CTRL_BITMAP:
>                 return BIT_MASK(ctrl->cache.cbm_len) - 1;
>         case RESCTRL_CTRL_SCALAR:
> +               if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
> +                       return ctrl->membw.min_bw;
> +
>                 return ctrl->membw.max_bw;
>         }
> 
> 
> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
> as the maximum bandwidth controls only take effect if their value is higher than
> the minimum bandwidth value. I have specialised this on the ctrl->name which
> breaks your ctrl->type based classification but that's fixable by just adding a
> default field to membw.

This should be useful for RISC-V.
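
A minimal userspace sketch of the default field idea quoted above. The
types are simplified stand-ins rather than the structures from the PoC,
and default_bw is a hypothetical member name:

```c
#include <stdint.h>

/* Simplified stand-ins for the PoC's control types; names are made up. */
enum ctrl_type { CTRL_BITMAP, CTRL_SCALAR };

struct ctrl_sketch {
	enum ctrl_type type;
	uint32_t cbm_len;	/* bitmap controls: number of valid bits (<= 32) */
	uint32_t min_bw;	/* scalar controls: smallest accepted value */
	uint32_t max_bw;	/* scalar controls: largest accepted value */
	uint32_t default_bw;	/* assumed new field, filled in by the arch */
};

/* Pick the default purely from the control type, never from its name. */
static uint32_t get_default_ctrlval(const struct ctrl_sketch *ctrl)
{
	switch (ctrl->type) {
	case CTRL_BITMAP:
		/* All valid bits set; 64-bit shift keeps cbm_len == 32 safe. */
		return (uint32_t)((1ull << ctrl->cbm_len) - 1);
	case CTRL_SCALAR:
		return ctrl->default_bw;
	}
	return 0;
}
```

With this shape an architecture could set default_bw to 0, 1 or max_bw
per control without the filesystem code looking at the control name.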

RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
blocks). The sum of Rbwb across all control groups must be less than
MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
needs to default to 1 so that the sum does not violate that rule. In my
RFC series, I added default_to_min to resctrl_membw [1] but this
solution looks cleaner.
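
To illustrate the constraint, here is a small userspace sketch (not
kernel code). The struct and function names are hypothetical, and the
strict "<" simply follows the wording above; the CBQRI spec is the
authority on the exact bound:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-control-group bandwidth state; not a kernel type. */
struct cbqri_bw_group {
	uint32_t rbwb;	/* reserved bandwidth blocks for this group */
};

/*
 * Return true if the reserved blocks of all groups sum to less than
 * mrbwb. A 64-bit accumulator avoids overflow while summing.
 */
static bool cbqri_rbwb_sum_ok(const struct cbqri_bw_group *groups,
			      size_t nr_groups, uint32_t mrbwb)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < nr_groups; i++)
		sum += groups[i].rbwb;

	return sum < mrbwb;
}
```

If every new group starts at 1, the sum grows by one per group, whereas
a large default would break the rule as soon as a few groups exist.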

> > - No support for "read-modify-write" usage of schemata file. This is where we
> >   discussed (without agreement) on possibly introducing the "#" prefix to schemata
> >   file entries. This PoC does not support this prefix and the current assumption/expectation
> >   is that when user space changes a configuration only the new control values are
> >   written to schemata file. I thus do not have a plan to support this so please
> >   share opinions in this regard if you have some.
> 
> There is now less motivation from the MPAM side for this than when this was
> initially discussed. In pre-upstream versions of the MPAM patches a change in
> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
> 
> However, it would be useful not to be limited by percentages. In my quick
> experimentation with your patches I used a percentage value for MB_MIN but it
> would be best to move away from this. For new controls I think we can mandate
> that user space has to discover the resolution from the info directly but how
> can we retrofit this. For MPAM, MB and MB_MAX, would control the same things.
> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
> MB_MAX in MB and vice versa with MB taking precedence if both are set? Old
> software can continue setting MB or can move to using MB_MAX and take advantage of
> the improved control. (I don't think we should expose the MPAM hardware value
> directly as it has confusion over whether all 1s is 100% or not and we'd like to
> have something generic and friendly to the user.)

The facility for non-percentage values is important for RISC-V as CBQRI
does not include a percentage throttle. It has two controls for bandwidth:

   - Rbwb: number of reserved bandwidth blocks [1, 2^13]
   - Mweight: weighted share of the remaining bandwidth [0, 255]
     - 0: disables work-conserving sharing
     - 1..255: compete for the leftover pool
     - It makes sense for it to default to max (255) so that there
       won't be any unused bandwidth
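
The ranges above could be checked along these lines. This is only a
sketch: the macro and function names are made up for illustration, and
the bounds are copied from the list above rather than from the spec:

```c
#include <stdbool.h>
#include <stdint.h>

/* Ranges as listed above; the names are hypothetical. */
#define CBQRI_RBWB_MIN		1u
#define CBQRI_RBWB_MAX		(1u << 13)
#define CBQRI_MWEIGHT_MAX	255u
#define CBQRI_MWEIGHT_DEFAULT	CBQRI_MWEIGHT_MAX

/* Rbwb must reserve at least one block and at most 2^13. */
static bool cbqri_rbwb_valid(uint32_t rbwb)
{
	return rbwb >= CBQRI_RBWB_MIN && rbwb <= CBQRI_RBWB_MAX;
}

/* Mweight accepts 0..255, where 0 has a special meaning. */
static bool cbqri_mweight_valid(uint32_t mweight)
{
	return mweight <= CBQRI_MWEIGHT_MAX;
}

/* Only weights 1..255 compete for the leftover bandwidth pool. */
static bool cbqri_mweight_shares_leftover(uint32_t mweight)
{
	return mweight >= 1 && mweight <= CBQRI_MWEIGHT_MAX;
}
```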

I think Mweight could be aligned with MPAM's proportional stride.

Here is the patch I created to add Mweight support:

diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index d95ab8ad36e2..3537071e3ab0 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -304,6 +304,7 @@ static const char * const resctrl_ctrl_name[] = {
 	[RESCTRL_CTRL_NAME_DEF]		= "",
 	[RESCTRL_CTRL_NAME_MIN]		= "MIN",
 	[RESCTRL_CTRL_NAME_MAX]		= "MAX",
+	[RESCTRL_CTRL_NAME_WGHT]	= "WGHT",
 };

 const char *resctrl_ctrl_name_str(enum resctrl_ctrl_name name)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 72fb7256270e..09efcef9ce66 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -348,12 +348,14 @@ struct resctrl_mon {
  *				has the same name as the resource.
  * @RESCTRL_CTRL_NAME_MIN:	"MIN"
  * @RESCTRL_CTRL_NAME_MAX:	"MAX"
+ * @RESCTRL_CTRL_NAME_WGHT:	"WGHT"
  */
 enum resctrl_ctrl_name {
 	RESCTRL_CTRL_NAME_DEF,
 	RESCTRL_CTRL_NAME_MIN,
 	RESCTRL_CTRL_NAME_MAX,
-	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_MAX
+	RESCTRL_CTRL_NAME_WGHT,
+	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
 };
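
As a rough illustration of how the new name might end up in a schema
label such as MB_WGHT, here is a standalone userspace sketch. The enum
and table mirror the patch above; schema_label() and its
"<resource>_<name>" rule are assumptions inferred from schema names like
MB_MIN, not the PoC's actual code:

```c
#include <stddef.h>
#include <stdio.h>

enum resctrl_ctrl_name {
	RESCTRL_CTRL_NAME_DEF,
	RESCTRL_CTRL_NAME_MIN,
	RESCTRL_CTRL_NAME_MAX,
	RESCTRL_CTRL_NAME_WGHT,
	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
};

static const char * const resctrl_ctrl_name[] = {
	[RESCTRL_CTRL_NAME_DEF]		= "",
	[RESCTRL_CTRL_NAME_MIN]		= "MIN",
	[RESCTRL_CTRL_NAME_MAX]		= "MAX",
	[RESCTRL_CTRL_NAME_WGHT]	= "WGHT",
};

/* Fails the build if the table's size no longer matches the enum. */
_Static_assert(sizeof(resctrl_ctrl_name) / sizeof(resctrl_ctrl_name[0]) ==
	       RESCTRL_CTRL_NAME_LAST + 1, "name table out of sync with enum");

/*
 * Hypothetical helper: write "<resource>" for the default control and
 * "<resource>_<name>" otherwise. Returns snprintf()'s result.
 */
static int schema_label(char *buf, size_t len, const char *resource,
			enum resctrl_ctrl_name name)
{
	if (name == RESCTRL_CTRL_NAME_DEF)
		return snprintf(buf, len, "%s", resource);

	return snprintf(buf, len, "%s_%s", resource, resctrl_ctrl_name[name]);
}
```

Moving RESCTRL_CTRL_NAME_LAST together with the new enumerator, as the
patch does, is what keeps a size check like this one meaningful.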

> > - Controls are independent for now. This means that, for example, if a resource
> >   supports a "MIN" and "MAX" control then this implementation would allow user to
> >   set the "maximum" control values to be less than the "minimum" control values.
> 
> I think this is ok as long as adding support for new controls in resctrl doesn't
> change the existing behaviour. In MPAM we dodged this by introducing MB as only
> affecting the h/w mbw_max and not mbw_min (as mentioned above).

There is no equivalent to MB (percentage throttle) in RISC-V so I would
want it to be valid to have MB_MIN (minimum reservation) without MB.

I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
was able to validate that it works okay in QEMU:

MB_WGHT:72=255
 MB_MIN:72=756
     L2:64=fff;65=fff
     L3:75=ffff

Thanks,
Drew

[1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-6-baf00f50028a@kernel.org/
