From: "Luck, Tony" <tony.luck@intel.com>
To: Dave Martin <Dave.Martin@arm.com>
Cc: <linux-kernel@vger.kernel.org>,
Reinette Chatre <reinette.chatre@intel.com>,
James Morse <james.morse@arm.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, "Borislav Petkov" <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Jonathan Corbet <corbet@lwn.net>, <x86@kernel.org>,
<linux-doc@vger.kernel.org>
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch
Date: Mon, 29 Sep 2025 09:37:41 -0700
Message-ID: <aNq11fmlac6dH4pH@agluck-desk3>
In-Reply-To: <aNqQAy8nOkLRYx4F@e133380.arm.com>
On Mon, Sep 29, 2025 at 02:56:19PM +0100, Dave Martin wrote:
> Hi Tony,
>
> Thanks for taking a look at this -- comments below.
>
> [...]
>
> On Thu, Sep 25, 2025 at 03:58:35PM -0700, Luck, Tony wrote:
> > On Mon, Sep 22, 2025 at 04:04:40PM +0100, Dave Martin wrote:
>
> [...]
>
> > > Would something like the following work? A read from schemata might
> > > produce something like this:
> > >
> > > MB: 0=50, 1=50
> > > # MB_HW: 0=32, 1=32
> > > # MB_MIN: 0=31, 1=31
> > > # MB_MAX: 0=32, 1=32
>
> [...]
>
> > > I'd be interested in people's thoughts on it, though.
> >
> > Applying this to Intel's upcoming region-aware memory bandwidth
> > control, which supports 255 steps and h/w min/max limits.
>
> Following the MPAM example, would you also expect:
>
> scale: 255
> unit: 100pc
>
> ...?
Yes. 255 (or whatever "Q" value is provided in the ACPI table)
corresponds to no throttling, so 100% bandwidth.
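As a rough formula, assuming the raw h/w value v runs from 1 to Q and that v = Q means "unthrottled" as described above, the percentage a legacy "MB" value stands for is approximately:

```latex
\mathrm{MB}\,(\%) \approx 100 \cdot \frac{v}{Q}, \qquad 1 \le v \le Q, \qquad v = Q \;\Rightarrow\; 100\%
```

For example, with Q = 255 a raw value of 128 is about 50.2% and a raw value of 10 is about 3.9%.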
>
> > We would have info files with "min = 1, max = 255" and a schemata
> > file that looks like this to legacy apps:
> >
> > MB: 0=50;1=75
> > #MB_HW: 0=128;1=191
> > #MB_MIN: 0=128;1=191
> > #MB_MAX: 0=128;1=191
> >
> > But a newer app that is aware of the extensions can write:
> >
> > # cat > schemata << 'EOF'
> > MB_HW: 0=10
> > MB_MIN: 0=10
> > MB_MAX: 0=64
> > EOF
> >
> > which then reads back as:
> > MB: 0=4;1=75
> > #MB_HW: 0=10;1=191
> > #MB_MIN: 0=10;1=191
> > #MB_MAX: 0=64;1=191
> >
> > with the legacy line updated with the rounded value of the MB_HW
> > supplied by the user. 10/255 is about 3.92%, so call it "4".
>
> I'm suggesting that this always be rounded up, so that you have a
> guarantee that the steps are no smaller than the reported value.
Round up, rather than round-to-nearest, makes sense. Though that is
perhaps only cosmetic, as I would be surprised if anyone used a mix of
tools that read the legacy schemata lines while programming via the
direct h/w controls.
>
> (In this case, round-up and round-to-nearest give the same answer
> anyway, though!)
>
> >
> > The region aware h/w supports separate bandwidth controls for each
> > region. We could hope (or perhaps update the spec to define) that
> > region0 is always node-local DDR memory and keep the "MB" tag for
> > that.
>
> Do you have concerns about existing software choking on the #-prefixed
> lines?
Do they even need a # prefix? We already mix lines for multiple
resources in the schemata file with a separate prefix for each resource.
The schemata file also allows writes that update just one resource (or
one domain in a single resource). The schemata file started with just
"L3". Then we added "L2", "MB", and "SMBA" with no concern that tools
written to manipulate only "L3" would be confused.
> > Then use some other tag naming for other regions. Remote DDR,
> > local CXL, remote CXL are the ones we think are next in the h/w
> > memory sequence. But the "region" concept would allow for other
> > options as other memory technologies come into use.
>
> Would it be reasonable just to have a set of these schema instances, per
> region, so:
>
> MB_HW: ... // implicitly region 0
> MB_HW_1: ...
> MB_HW_2: ...
Chen Yu is currently looking at putting the word "TIER" into the
name, since there's some precedent for describing memory in "tiers".
Whatever naming scheme is used, the important part is how users will
find out what each schemata line actually means/controls.
> etc.
>
> Or, did you have something else in mind?
>
> My thinking is that we avoid adding complexity in the schemata file if
> we treat mapping these schema instances onto the hardware topology as
> an orthogonal problem. So long as we have unique names in the schemata
> file, we can describe elsewhere what they relate to in the hardware.
Yes, exactly this.
>
> Cheers
> ---Dave
-Tony