linux-doc.vger.kernel.org archive mirror
From: "Luck, Tony" <tony.luck@intel.com>
To: Dave Martin <Dave.Martin@arm.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>,
	<linux-kernel@vger.kernel.org>, James Morse <james.morse@arm.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "Borislav Petkov" <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Jonathan Corbet <corbet@lwn.net>, <x86@kernel.org>,
	<linux-doc@vger.kernel.org>
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch
Date: Tue, 21 Oct 2025 13:59:36 -0700
Message-ID: <aPf0OKwDZ4XbmVRB@agluck-desk3>
In-Reply-To: <aPearyfcnpJJ/e06@e133380.arm.com>

Hi Dave,

On Tue, Oct 21, 2025 at 03:37:35PM +0100, Dave Martin wrote:
> Hi Tony,
> 
> On Mon, Oct 20, 2025 at 09:31:18AM -0700, Luck, Tony wrote:
> > On Mon, Oct 20, 2025 at 04:50:38PM +0100, Dave Martin wrote:
> > > Hi Reinette,
> > > 
> > > On Fri, Oct 17, 2025 at 08:59:45AM -0700, Reinette Chatre wrote:
> 
> [...]
> 
> > > > By extension I assume that software that understands a schema that is introduced
> > > > after the "relationship" format is established can be expected to understand the
> > > > format and thus these new schemata do not require the '#' prefix. Even if
> > > > a new schema is introduced with a single control it can be followed by a new child
> > > > control without a '#' prefix a couple of kernel releases later. By this point it
> > > > should hopefully be understood by user space that it should not write entries it does
> > > > not understand.
> > > 
> > > Generally, yes.
> > > 
> > > I think that boils down to: "OK, previously you could just tweak bits
> > > of the whole schemata file you read and write the whole thing back,
> > > and the effect would be what you intuitively expected.  But in future
> > > different schemata in the file may not be independent of one another.
> > > We'll warn you which things might not be independent, but we may not
> > > describe exactly how they affect each other.
> > 
> > Changes to the schemata file are currently "staged" and then applied.
> > There's some filesystem level error/sanity checking during the parsing
> > phase, but maybe for MB some parts can also be delayed, and re-ordered
> > when architecture code applies the changes.
> > 
> > E.g. filesystem code could check min <= opt <= max, while architecture
> > code would be responsible for writing the values to h/w in a sane manner
> > (assuming architecture cares about transient effects when things don't
> > conform to the ordering).
> > 
> > E.g. User requests moving from min,opt,max = 10,20,30 to 40,50,60.
> > Regardless of the order those requests appeared in the write(2) syscall,
> > architecture bumps max to 60, then opt to 50, and finally min to 40.
> 
> This could indeed be sorted out during staging, but I'm not
> sure that we can/should rely on it.
> 
> If we treat the data coming from a single write() as a transaction, and
> stage the whole thing before executing it, that's fine.  But I think
> this has to be viewed as an optimisation rather than guaranteed
> semantics.
> 
> 
> We told userspace that schemata is an S_IFREG regular file, so we have
> to accept a write() boundary anywhere in the stream.
> 
> (In fact, resctrl chokes if a write boundary occurs in the middle of a
> line.  In practice, stdio buffering and similar means that this issue
> turns out to be difficult to hit, except with shell scripts that try to
> emit a line piecemeal -- I have a partial fix for that knocking around,
> but this throws up other problems, so I gave up for the time being.)

Is this worth the pain and complexity? Maybe just document the reality
of the implementation since day 1 of resctrl that each write(2) must
contain one or more lines, each terminated with "\n".

There are already so many ways that the schemata file does not behave
like a regular S_IFREG file. E.g. accepting a write to just update
one domain in a resource: # echo L3:2=0xff > schemata

So describe schemata in terms of writing "update commands" rather
than "Lines"?
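
As a minimal sketch of that "update command" view (Python purely for
illustration; write_updates() is a hypothetical helper, not an existing
resctrl tool), user space would assemble complete, newline-terminated
commands and hand them over in a single write(2):

```python
import os

def write_updates(path, updates):
    """Send one or more complete update commands in a single write(2).

    `updates` is a list of strings such as "L3:2=0xff".
    Hypothetical helper for illustration only.
    """
    # Terminate every command with "\n" so the write never ends mid-line.
    payload = "".join(u.rstrip("\n") + "\n" for u in updates).encode()
    fd = os.open(path, os.O_WRONLY)
    try:
        # A single os.write() call issues a single write(2) syscall.
        written = os.write(fd, payload)
    finally:
        os.close(fd)
    if written != len(payload):
        raise OSError("short write: commands may have been split")
    return written
```

Assuming resctrl is mounted at the usual /sys/fs/resctrl, updating one
domain would then be write_updates("/sys/fs/resctrl/schemata", ["L3:2=0xff"]).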

> 
> We also cannot currently rely on userspace closing the fd between
> "transactions".  We never told userspace to do that, previously.  We
> could make a new requirement, but it feels unexpected/unreasonable (?)
> 
> > > 
> > > "So, from now on, only write the things that you actually want to set."
> > > 
> > > Does that sound about right?
> > 
> > Users might still use their favorite editor on the schemata file and
> > so write everything, while only changing a subset. So if we don't go
> > for the full two-phase update I describe above this would be:
> > 
> >   "only *change* the things that you actually want to set".

I misremembered where the check for "did the user change the value"
happened. I thought it was during parsing, but it is actually in
resctrl_arch_update_domains() after all input parsing is complete
and resctrl is applying changes. So unless we change things to work
the way I hallucinated, then ordering does matter the way you
described.
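
To make the ordering concern concrete, here is a rough sketch (Python for
illustration only; order_updates() and the min/opt/max control names are
hypothetical, and nothing like this exists in resctrl today) of one way to
sequence the writes so that min <= opt <= max holds after every step. It
assumes both the current and the requested values already satisfy that
ordering:

```python
def order_updates(current, target):
    """Return (name, value) writes that keep min <= opt <= max throughout.

    Hypothetical illustration; assumes `current` and `target` are dicts
    with keys "min", "opt", "max" that each satisfy min <= opt <= max.
    """
    names = ["min", "opt", "max"]  # ascending order of the invariant
    # Apply decreases from the lowest control upward ...
    lowers = [(n, target[n]) for n in names if target[n] < current[n]]
    # ... then increases from the highest control downward.
    raises = [(n, target[n]) for n in reversed(names) if target[n] > current[n]]
    return lowers + raises
```

For the 10,20,30 -> 40,50,60 example this yields max, then opt, then min,
matching the order suggested earlier in the thread.
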
> 
> [...]
> 
> > -Tony
> 
> This works if the schemata file is output in the right order (and the
> user doesn't change the order):
> 
> # cat schemata
> MB:0=100;1=100
> # MB_HW:0=1024;1=1024
> 
> ->
> 
> # cat <<EOF >schemata
> MB:0=100;1=100
> MB_HW:0=512;1=512
> EOF
> 
> ... though it may still be inefficient, if the lines are not staged
> together.  The hardware memory bandwidth controls may get programmed
> twice, here -- though the final result is probably what was intended.
> 
> I'd still prefer that we tell people that they should be doing this:
> # cat <<EOF >schemata
> MB_HW:0=512;1=512
> EOF
> 
> ...if they are really trying to set MB_HW and don't care about the
> effect on MB?

I'm starting to worry about this co-existence of old/new syntax for
Intel region aware. Life seems simple if there is only one MB_HW
connected to the legacy "MB". Updates to either will make both
appear with new values when the schemata is read. E.g.

# cat schemata
MB:0=100
#MB_HW=255

# echo MB:0=50 > schemata

# cat schemata
MB:0=50
#MB_HW=127
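
The numbers above are consistent with a simple linear mapping. As a sketch
only (Python for illustration; mb_to_hw() and the full-scale value of 255
are assumptions read off this example, and the real conversion is
hardware-specific):

```python
MB_HW_MAX = 255  # assumed full-scale value, taken from the example above

def mb_to_hw(percent):
    """Map a legacy MB percentage to an MB_HW value.

    Assumes linear scaling with truncation; hypothetical helper.
    """
    return percent * MB_HW_MAX // 100
```

With that assumption, MB:0=100 maps to 255 and MB:0=50 maps to 127.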

But Intel will have several MB_HW controls, one for each region.
[Schemata names TBD, but I'll just call them 0, 1, 2, 3 here]

# cat schemata
MB:0=100
#MB_HW0=255
#MB_HW1=255
#MB_HW2=255
#MB_HW3=255

If the user sets just one of the HW controls:

# echo MB_HW1=64 > schemata

what should resctrl display for the legacy "MB:" line?

> 
> Cheers
> ---Dave

-Tony
