From: Dave Martin <Dave.Martin@arm.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>,
linux-kernel@vger.kernel.org, James Morse <james.morse@arm.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Jonathan Corbet <corbet@lwn.net>,
x86@kernel.org, linux-doc@vger.kernel.org
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch
Date: Tue, 21 Oct 2025 15:37:35 +0100
Message-ID: <aPearyfcnpJJ/e06@e133380.arm.com>
In-Reply-To: <aPZj1nDVEYmYytY9@agluck-desk3>

Hi Tony,

On Mon, Oct 20, 2025 at 09:31:18AM -0700, Luck, Tony wrote:
> On Mon, Oct 20, 2025 at 04:50:38PM +0100, Dave Martin wrote:
> > Hi Reinette,
> >
> > On Fri, Oct 17, 2025 at 08:59:45AM -0700, Reinette Chatre wrote:
[...]
> > > By extension I assume that software that understands a schema that is introduced
> > > after the "relationship" format is established can be expected to understand the
> > > format and thus these new schemata do not require the '#' prefix. Even if
> > > a new schema is introduced with a single control it can be followed by a new child
> > > control without a '#' prefix a couple of kernel releases later. By this point it
> > > should hopefully be understood by user space that it should not write entries it does
> > > not understand.
> >
> > Generally, yes.
> >
> > I think that boils down to: "OK, previously you could just tweak bits
> > of the whole schemata file you read and write the whole thing back,
> > and the effect would be what you intuitively expected. But in future
> > different schemata in the file may not be independent of one another.
> > We'll warn you which things might not be independent, but we may not
> > describe exactly how they affect each other.
>
> Changes to the schemata file are currently "staged" and then applied.
> There's some filesystem level error/sanity checking during the parsing
> phase, but maybe for MB some parts can also be delayed, and re-ordered
> when architecture code applies the changes.
>
> E.g. filesystem code could check min <= opt <= max, while architecture
> code would be responsible for writing the values to h/w in a sane manner
> (assuming the architecture cares about transient effects when things don't
> conform to the ordering).
>
> E.g. the user requests moving from min,opt,max = 10,20,30 to 40,50,60.
> Regardless of the order in which those requests appeared in the write(2)
> syscall, the architecture code bumps max to 60, then opt to 50, and finally min to 40.

This could indeed be sorted out during staging, but I'm not
sure that we can/should rely on it.
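
For reference, the reordering Tony describes can be done without depending on the order of the input. A minimal sketch, assuming a hypothetical per-domain min/opt/max triple (`struct mb_ctrl`, `mb_plan()` and the other names are made up for illustration, not resctrl code): widen the range to cover both the old and new values, set opt, then narrow. For the 10,20,30 -> 40,50,60 example it yields max=60, opt=50, min=40, the same order as above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain MB control triple; not a real resctrl structure. */
struct mb_ctrl {
	unsigned int min, opt, max;
};

/* Hypothetical selector standing in for a per-field hardware write. */
enum mb_field { MB_MIN, MB_OPT, MB_MAX };

struct mb_step {
	enum mb_field field;
	unsigned int val;
};

static int mb_valid(const struct mb_ctrl *c)
{
	return c->min <= c->opt && c->opt <= c->max;
}

/*
 * Record one write unless it is a no-op, apply it to @cur, and check
 * that the min <= opt <= max invariant still holds afterwards.
 */
static size_t mb_emit(struct mb_ctrl *cur, struct mb_step *steps, size_t n,
		      enum mb_field f, unsigned int val)
{
	unsigned int *p = f == MB_MIN ? &cur->min :
			  f == MB_OPT ? &cur->opt : &cur->max;

	if (*p == val)
		return n;
	*p = val;
	assert(mb_valid(cur));
	steps[n].field = f;
	steps[n].val = val;
	return n + 1;
}

/*
 * Plan the writes that move @from to @to such that every intermediate
 * state satisfies min <= opt <= max.  Each of min, opt and max is written
 * at most once, so @steps needs room for 3 entries.  Returns the count.
 */
size_t mb_plan(struct mb_ctrl from, struct mb_ctrl to, struct mb_step *steps)
{
	struct mb_ctrl cur = from;
	size_t n = 0;

	assert(mb_valid(&from) && mb_valid(&to));

	/* Widen: the range now covers both the old and the new range. */
	n = mb_emit(&cur, steps, n, MB_MAX, from.max > to.max ? from.max : to.max);
	n = mb_emit(&cur, steps, n, MB_MIN, from.min < to.min ? from.min : to.min);
	/* The new opt lies inside the widened range. */
	n = mb_emit(&cur, steps, n, MB_OPT, to.opt);
	/* Narrow to the requested range. */
	n = mb_emit(&cur, steps, n, MB_MIN, to.min);
	n = mb_emit(&cur, steps, n, MB_MAX, to.max);
	return n;
}
```

Going the other way (40,50,60 -> 10,20,30) the same code emits min=10, opt=20, max=30. This only helps if the complete triple is staged before anything is written, which is exactly the guarantee in question.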

If we treat the data coming from a single write() as a transaction, and
stage the whole thing before executing it, that's fine. But I think
this has to be viewed as an optimisation rather than guaranteed
semantics.

We told userspace that schemata is an S_IFREG regular file, so we have
to accept a write() boundary anywhere in the stream.

(In fact, resctrl chokes if a write boundary occurs in the middle of a
line. In practice, stdio buffering and the like mean that this issue
turns out to be difficult to hit, except with shell scripts that try to
emit a line piecemeal -- I have a partial fix for that knocking around,
but this throws up other problems, so I gave up for the time being.)
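
To illustrate why a write() boundary in the middle of a line need not be fatal, here is a minimal userspace model of a parser that carries an incomplete line over to the next write() call. All names (`struct line_buf`, `line_buf_write()`, `LINE_MAX_LEN`, `collect_line()`) are hypothetical; this is neither the current resctrl code nor the partial fix mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_MAX_LEN	256	/* hypothetical per-line limit, including the NUL */

/* Hypothetical per-open-file state for the schemata parser. */
struct line_buf {
	char buf[LINE_MAX_LEN];		/* pending partial line */
	size_t len;			/* bytes currently pending */
	int (*parse_line)(const char *line, void *arg);
	void *arg;
};

/*
 * Feed one write() payload.  Bytes are appended to the pending partial
 * line; each '\n' completes a line, which is handed to parse_line().  A
 * trailing fragment without '\n' is kept for the next call, so the result
 * does not depend on where the write() boundaries fall.
 * Returns 0 on success, -1 if a line is too long or parse_line() fails.
 */
int line_buf_write(struct line_buf *lb, const char *data, size_t count)
{
	assert(lb->parse_line != NULL);

	for (size_t i = 0; i < count; i++) {
		if (data[i] != '\n') {
			if (lb->len >= LINE_MAX_LEN - 1)
				return -1;		/* line too long */
			lb->buf[lb->len++] = data[i];
			continue;
		}
		lb->buf[lb->len] = '\0';	/* line complete */
		lb->len = 0;
		if (lb->parse_line(lb->buf, lb->arg))
			return -1;
	}
	return 0;
}

/* Example callback (illustration only): collect up to 4 parsed lines. */
struct collected {
	char lines[4][LINE_MAX_LEN];
	int n;
};

int collect_line(const char *line, void *arg)
{
	struct collected *c = arg;

	if (c->n >= 4)
		return -1;
	strcpy(c->lines[c->n++], line);	/* line is shorter than LINE_MAX_LEN */
	return 0;
}
```

Feeding "MB:0=1" in one call and "00;1=100\n" in the next produces the single line "MB:0=100;1=100". The sketch simply leaves a fragment without a trailing newline pending; a real implementation would also have to decide what to do with it at close().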

We also cannot currently rely on userspace closing the fd between
"transactions". We never told userspace to do that, previously. We
could make a new requirement, but it feels unexpected/unreasonable (?)

> >
> > "So, from now on, only write the things that you actually want to set."
> >
> > Does that sound about right?
>
> Users might still use their favorite editor on the schemata file and
> so write everything, while only changing a subset. So if we don't go
> for the full two-phase update I describe above this would be:
>
> "only *change* the things that you actually want to set".
[...]
> -Tony

This works if the schemata file is output in the right order (and the
user doesn't change the order):

# cat schemata
MB:0=100;1=100
# MB_HW:0=1024;1=1024

->

# cat <<EOF >schemata
MB:0=100;1=100
MB_HW:0=512;1=512
EOF

... though it may still be inefficient, if the lines are not staged
together. The hardware memory bandwidth controls may get programmed
twice, here -- though the final result is probably what was intended.
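
A toy model of the double programming. It assumes (hypothetically) one MB register per domain, that MB=100 corresponds to MB_HW=1024 as in the listing above, and that without staging every parsed line is programmed immediately. It also does not skip writes of unchanged values, which a real driver might. `struct mb_hw`, `apply_staged()` and the other names are made up for illustration.

```c
#include <assert.h>

#define NR_DOMAINS	2
#define MB_HW_SCALE	1024	/* assumption: MB=100 (%) corresponds to MB_HW=1024 */

/* Hypothetical model: one MB register per domain, plus a write counter. */
struct mb_hw {
	unsigned int reg[NR_DOMAINS];	/* current per-domain register value */
	unsigned int nr_writes;		/* total number of register writes */
};

/* One parsed schemata line: MB (percent) or MB_HW (raw units). */
struct sline {
	int is_hw;			/* 1: MB_HW line, 0: MB line */
	unsigned int val[NR_DOMAINS];
};

static void hw_program(struct mb_hw *hw, int dom, unsigned int val)
{
	assert(dom >= 0 && dom < NR_DOMAINS);
	hw->reg[dom] = val;
	hw->nr_writes++;
}

/* Convert a line's value for @dom into raw register units. */
static unsigned int line_to_hw(const struct sline *l, int dom)
{
	return l->is_hw ? l->val[dom] : l->val[dom] * MB_HW_SCALE / 100;
}

/* Unstaged: program the hardware as soon as each line is parsed. */
void apply_unstaged(struct mb_hw *hw, const struct sline *lines, int n)
{
	for (int i = 0; i < n; i++)
		for (int d = 0; d < NR_DOMAINS; d++)
			hw_program(hw, d, line_to_hw(&lines[i], d));
}

/* Staged: later lines override earlier ones; each domain is written once. */
void apply_staged(struct mb_hw *hw, const struct sline *lines, int n)
{
	unsigned int staged[NR_DOMAINS] = { 0 };
	int dirty[NR_DOMAINS] = { 0 };

	for (int i = 0; i < n; i++)
		for (int d = 0; d < NR_DOMAINS; d++) {
			staged[d] = line_to_hw(&lines[i], d);
			dirty[d] = 1;
		}

	for (int d = 0; d < NR_DOMAINS; d++)
		if (dirty[d])
			hw_program(hw, d, staged[d]);
}
```

For the two lines above, both variants end at 512 in each domain, but the unstaged one performs four register writes instead of two. With the lines swapped, both end back at 1024, which is why the order in which the file is output matters.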

I'd still prefer that we tell people that they should be doing this:

# cat <<EOF >schemata
MB_HW:0=512;1=512
EOF

...if they are really trying to set MB_HW and don't care about the
effect on MB?
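
On the userspace side, writing just the line you care about as a single write() call sidesteps both the ordering question and the mid-line boundary problem. A minimal sketch; `write_schemata_line()` and `SCHEMATA_LINE_MAX` are hypothetical names, and a short write is treated as an error rather than retried, because retrying would split the line across two write() calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SCHEMATA_LINE_MAX	256	/* hypothetical buffer size for one line */

/*
 * Write exactly one schemata line, newline-terminated, in a single
 * write() call.  Returns 0 on success, -1 with errno set on failure.
 */
int write_schemata_line(int fd, const char *line)
{
	char buf[SCHEMATA_LINE_MAX];
	ssize_t ret;
	int len;

	assert(line != NULL);

	/* Exactly one line: reject an embedded newline from the caller. */
	if (strchr(line, '\n')) {
		errno = EINVAL;
		return -1;
	}

	len = snprintf(buf, sizeof(buf), "%s\n", line);
	if (len < 0 || (size_t)len >= sizeof(buf)) {
		errno = EINVAL;
		return -1;
	}

	ret = write(fd, buf, (size_t)len);
	if (ret < 0)
		return -1;
	if (ret != len) {
		/* Short write: do not retry, or the line would be split. */
		errno = EIO;
		return -1;
	}
	return 0;
}
```

Usage would be along the lines of opening /sys/fs/resctrl/schemata (the conventional mount point) with O_WRONLY and calling write_schemata_line(fd, "MB_HW:0=512;1=512").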

Cheers
---Dave