From: Dave Martin <Dave.Martin@arm.com>
To: "Moger, Babu" <bmoger@amd.com>
Cc: Babu Moger <babu.moger@amd.com>,
corbet@lwn.net, reinette.chatre@intel.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
tony.luck@intel.com, peternewman@google.com, x86@kernel.org,
hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, sandipan.das@amd.com, kai.huang@intel.com,
xiaoyao.li@intel.com, seanjc@google.com, xin3.li@intel.com,
andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, maciej.wieczor-retman@intel.com,
eranian@google.com
Subject: Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Date: Thu, 20 Feb 2025 15:21:33 +0000 [thread overview]
Message-ID: <Z7dIfWAk+f4Gc54X@e133380.arm.com> (raw)
In-Reply-To: <1ccb907b-e8c9-4997-bc45-4a457ee84494@amd.com>
Hi,
On Wed, Feb 19, 2025 at 06:34:42PM -0600, Moger, Babu wrote:
> Hi Dave,
>
> On 2/19/2025 10:07 AM, Dave Martin wrote:
> > Hi,
> >
> > On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:
> > [...]
> >
> > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > > index 6e29827239e0..299839bcf23f 100644
> > > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> > > @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
> >
> > [...]
> >
> > > +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
> > > + char *buf, size_t nbytes, loff_t off)
> > > +{
[...]
> > > + while ((token = strsep(&buf, "\n")) != NULL) {
> > > + /*
> > > + * The write command follows the following format:
> > > + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
> > > + * Extract the CTRL_MON group.
> > > + */
> > > + cmon_grp = strsep(&token, "/");
> > > +
> >
> > As when reading this file, I think that the data can grow larger than a
> > page and get split into multiple write() calls.
> >
> > I don't currently think the file needs to be redesigned, but there are
> > some concerns about how userspace will work with it that need to be
> > sorted out.
> >
> > Every monitoring group can contribute a line to this file:
> >
> > CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF
> >
> > so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1
> >
> > NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
> > 600 bytes per monitoring group in the worst case.
> >
> > We don't need to have many control and monitoring groups for this to
> > grow potentially over 4K.
> >
> >
> > We could simply place a limit on how much userspace is allowed to write
> > to this file in one go, although this restriction feels difficult for
> > userspace to follow -- but maybe this is workable in the short term, on
> > current systems (?)
> >
> > Otherwise, since we expect this interface to be written using scripting
> > languages, I think we need to be prepared to accept fully-buffered
> > I/O. That means that the data may be cut at random places, not
> > necessarily at newlines. (For smaller files such as schemata this is
> > not such an issue, since the whole file is likely to be small enough to
> > fit into the default stdio buffers -- this is how sysfs gets away with
> > it IIUC.)
> >
> > For fully-buffered I/O, we may have to cache an incomplete line in
> > between write() calls. If there is a dangling incomplete line when the
> > file is closed then it is hard to tell userspace, because people often
> > don't bother to check the return value of close(), fclose() etc.
> > However, since it's an ABI violation for userspace to end this file
> > with a partial line, I think it's sufficient to report that via
> > last_cmd_status. (Making close() return -EIO still seems a good idea
> > though, just in case userspace is listening.)
>
> Seems like we can add a check in resctrl_mbm_assign_control_write() to
> compare nbytes > PAGE_SIZE.
This might be a reasonable stopgap approach, if we are confident that the
number of RMIDs and monitoring domains is small enough on known
platforms that the problem is unlikely to be hit. I can't really judge
on this.
> But do we really need this? I have no way of testing this. Help me
> understand.
It's easy to demonatrate this using the schemata file (which works in a
similar way). Open f in /sys/fs/resctrl/schemata, then:
int n = 0;
for (n = 0; n < 1000; n++)
if (fputs("MB:0=100;1=100\n", f) == EOF)
fprintf(stderr, "Failed on interation %d\n", n);
This will succeed a certain number of times (272, for me) and then fail
when the stdio buffer for f overflows, triggering a write().
Putting an explicit fflush() after every fputs() call (or doing a
setlinebuf(f) before the loop) makes it work. But this is awkward and
unexpected for the user, and doing the right thing from a scripting
language may be tricky.
In this example I am doing something a bit artificial -- we don't
officially say what happens when a pre-opened schemata file handle is
reused in this way, AFAICT. But for mbm_assign_control it is
legitimate to write many lines, and we can hit this kind of problem.
I'll leave it to others to judge whether we _need_ to fix this, but it
feels like a problem waiting to happen.
> All these file operations go thru generic call kernfs_fop_write_iter().
> Doesn't it take care of buffer check and overflow?
No, this is called for each iovec segment (where userspace used one of
the iovec based I/O syscalls). But there is no buffering or
concatenation of the data read in: each segment gets passed down to the
individual kernfs_file_operations write method for the file:
len = ops->write(of, buf, len, iocb->ki_pos)
calls down to
resctrl_mbm_assign_control_write(of, buf, len, iocb->ki_pos).
I'll try to port my buffering hack on top of the series -- that should
help to illustrate what I mean.
Cheers
---Dave
next prev parent reply other threads:[~2025-02-20 15:21 UTC|newest]
Thread overview: 209+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-22 20:20 [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2025-01-22 20:20 ` [PATCH v11 01/23] x86/resctrl: Add __init attribute to functions called from resctrl_late_init() Babu Moger
2025-02-05 22:22 ` Reinette Chatre
2025-02-19 13:28 ` Dave Martin
2025-02-19 16:53 ` Moger, Babu
2025-02-20 13:29 ` Dave Martin
2025-01-22 20:20 ` [PATCH v11 02/23] x86/cpufeatures: Add support for Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2025-01-22 20:20 ` [PATCH v11 03/23] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2025-01-22 20:20 ` [PATCH v11 04/23] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2025-01-22 20:20 ` [PATCH v11 05/23] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2025-01-22 20:20 ` [PATCH v11 06/23] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2025-02-05 22:49 ` Reinette Chatre
2025-02-06 16:15 ` Moger, Babu
2025-02-06 18:42 ` Reinette Chatre
2025-02-06 22:57 ` Moger, Babu
2025-02-06 23:28 ` Reinette Chatre
2025-02-21 18:05 ` James Morse
2025-02-21 18:25 ` Reinette Chatre
2025-01-22 20:20 ` [PATCH v11 07/23] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2025-02-06 18:01 ` Reinette Chatre
2025-02-06 23:41 ` Moger, Babu
2025-02-21 18:06 ` James Morse
2025-02-21 19:44 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 08/23] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2025-02-05 23:17 ` Reinette Chatre
2025-02-07 17:18 ` Moger, Babu
2025-02-07 18:52 ` Moger, Babu
2025-02-10 18:08 ` Reinette Chatre
2025-02-10 20:26 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 09/23] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg in struct rdt_hw_mon_domain Babu Moger
2025-01-22 20:20 ` [PATCH v11 10/23] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
2025-02-05 23:58 ` Reinette Chatre
2025-02-06 0:51 ` Luck, Tony
2025-02-06 1:41 ` Reinette Chatre
2025-02-06 15:56 ` Luck, Tony
2025-02-21 18:08 ` James Morse
2025-02-19 13:28 ` Dave Martin
2025-02-21 18:08 ` James Morse
2025-02-07 17:30 ` Moger, Babu
2025-02-06 6:24 ` Xin Li
2025-02-06 16:17 ` Reinette Chatre
2025-02-07 10:07 ` Xin Li
2025-02-11 19:44 ` Moger, Babu
2025-02-12 8:33 ` Xin Li
2025-01-22 20:20 ` [PATCH v11 11/23] x86/resctrl: Introduce mbm_cntr_cfg to track assignable counters at domain Babu Moger
2025-02-05 23:57 ` Reinette Chatre
2025-02-07 18:23 ` Moger, Babu
2025-02-10 18:10 ` Reinette Chatre
2025-02-19 13:30 ` Dave Martin
2025-02-19 18:07 ` Moger, Babu
2025-02-20 13:33 ` Dave Martin
2025-02-21 18:07 ` James Morse
2025-02-21 18:35 ` Reinette Chatre
2025-02-21 20:10 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 12/23] x86/resctrl: Introduce interface to display number of free counters Babu Moger
2025-02-06 0:19 ` Reinette Chatre
2025-02-07 18:59 ` Moger, Babu
2025-02-19 13:31 ` Dave Martin
2025-01-22 20:20 ` [PATCH v11 13/23] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2025-01-22 20:20 ` [PATCH v11 14/23] x86/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC Babu Moger
2025-02-19 13:32 ` Dave Martin
2025-02-19 21:00 ` Moger, Babu
2025-02-21 18:06 ` James Morse
2025-02-21 22:24 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 15/23] x86/resctrl: Add the functionality to assigm MBM events Babu Moger
2025-02-06 1:05 ` Reinette Chatre
2025-02-07 21:10 ` Moger, Babu
2025-02-10 18:25 ` Reinette Chatre
2025-01-22 20:20 ` [PATCH v11 16/23] x86/resctrl: Add the functionality to unassigm " Babu Moger
2025-02-06 3:54 ` Reinette Chatre
2025-02-10 16:23 ` Moger, Babu
2025-02-10 18:30 ` Reinette Chatre
2025-02-22 0:36 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 17/23] x86/resctrl: Auto assign/unassign counters when mbm_cntr_assign is enabled Babu Moger
2025-02-06 18:03 ` Reinette Chatre
2025-02-10 17:27 ` Moger, Babu
2025-02-10 18:34 ` Reinette Chatre
2025-02-19 13:41 ` Dave Martin
2025-02-19 14:09 ` Peter Newman
2025-02-19 17:55 ` Reinette Chatre
2025-02-20 10:35 ` Peter Newman
2025-02-20 13:40 ` Dave Martin
2025-02-20 17:08 ` Reinette Chatre
2025-02-21 17:14 ` Dave Martin
2025-02-21 18:23 ` Moger, Babu
2025-02-21 22:48 ` Reinette Chatre
2025-02-21 23:42 ` Moger, Babu
2025-02-27 11:07 ` Peter Newman
2025-01-22 20:20 ` [PATCH v11 18/23] x86/resctrl: Report "Unassigned" for MBM events in mbm_cntr_assign mode Babu Moger
2025-02-06 18:04 ` Reinette Chatre
2025-02-10 17:39 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 19/23] x86/resctrl: Introduce the interface to switch between monitor modes Babu Moger
2025-02-06 18:05 ` Reinette Chatre
2025-02-10 18:54 ` Moger, Babu
2025-01-22 20:20 ` [PATCH v11 20/23] x86/resctrl: Configure mbm_cntr_assign mode if supported Babu Moger
2025-02-21 18:06 ` James Morse
2025-02-24 15:49 ` Moger, Babu
2025-02-24 17:01 ` Reinette Chatre
2025-02-24 21:18 ` Moger, Babu
2025-02-24 22:20 ` Reinette Chatre
2025-01-22 20:20 ` [PATCH v11 21/23] x86/resctrl: Update assignments on event configuration changes Babu Moger
2025-01-22 20:20 ` [PATCH v11 22/23] x86/resctrl: Introduce interface to list assignment states of all the groups Babu Moger
2025-02-19 13:53 ` Dave Martin
2025-02-19 21:09 ` Moger, Babu
2025-02-20 15:44 ` Dave Martin
2025-02-20 21:29 ` Moger, Babu
2025-02-21 16:00 ` Dave Martin
2025-02-21 20:10 ` Reinette Chatre
2025-02-24 17:17 ` Dave Martin
2025-02-24 17:23 ` Luck, Tony
2025-02-28 17:50 ` Dave Martin
2025-03-03 19:30 ` Luck, Tony
2025-03-05 18:06 ` Dave Martin
2025-01-22 20:20 ` [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2025-02-06 18:48 ` Reinette Chatre
2025-02-10 19:46 ` Moger, Babu
2025-02-19 16:07 ` Dave Martin
2025-02-19 17:43 ` Luck, Tony
2025-02-20 14:57 ` Dave Martin
2025-02-20 0:34 ` Moger, Babu
2025-02-20 15:21 ` Dave Martin [this message]
2025-02-20 20:57 ` Moger, Babu
2025-02-21 15:53 ` Dave Martin
2025-02-21 20:16 ` Reinette Chatre
2025-02-21 18:07 ` James Morse
2025-02-24 20:49 ` Moger, Babu
2025-02-03 14:54 ` [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Peter Newman
2025-02-03 20:49 ` Moger, Babu
2025-02-13 17:51 ` Dave Martin
2025-02-13 18:08 ` Luck, Tony
2025-02-12 17:46 ` Dave Martin
2025-02-12 23:33 ` Reinette Chatre
2025-02-12 23:40 ` Reinette Chatre
2025-02-13 0:11 ` Luck, Tony
2025-02-13 17:56 ` Dave Martin
2025-02-13 17:37 ` Dave Martin
2025-02-14 6:26 ` Reinette Chatre
2025-02-14 18:31 ` Moger, Babu
2025-02-14 19:18 ` Reinette Chatre
2025-02-14 19:51 ` Moger, Babu
2025-02-17 10:26 ` Peter Newman
2025-02-17 16:45 ` Moger, Babu
2025-02-18 12:30 ` Dave Martin
2025-02-18 15:39 ` Moger, Babu
2025-02-18 18:14 ` Reinette Chatre
2025-02-18 19:32 ` Moger, Babu
2025-02-18 21:29 ` Reinette Chatre
2025-02-19 12:26 ` Dave Martin
2025-02-19 12:24 ` Dave Martin
2025-02-18 16:51 ` Luck, Tony
2025-02-18 18:27 ` Reinette Chatre
2025-02-18 19:08 ` Luck, Tony
2025-02-18 21:32 ` Reinette Chatre
2025-02-18 17:49 ` Reinette Chatre
2025-02-19 11:28 ` Peter Newman
2025-02-19 12:26 ` Dave Martin
2025-02-19 17:56 ` Reinette Chatre
2025-02-20 14:53 ` Peter Newman
2025-02-20 18:36 ` Reinette Chatre
2025-02-21 13:12 ` Peter Newman
2025-02-21 22:43 ` Reinette Chatre
2025-02-25 17:11 ` Peter Newman
2025-02-25 21:31 ` Moger, Babu
2025-02-26 13:27 ` Peter Newman
2025-02-26 16:25 ` Reinette Chatre
2025-02-26 17:12 ` Moger, Babu
2025-03-03 19:16 ` Moger, Babu
2025-03-04 16:44 ` Peter Newman
2025-03-04 21:49 ` Moger, Babu
2025-03-05 10:40 ` Peter Newman
2025-03-05 19:34 ` Moger, Babu
2025-03-10 22:48 ` Moger, Babu
2025-03-10 23:22 ` Luck, Tony
2025-03-11 1:44 ` Moger, Babu
2025-03-11 3:51 ` Reinette Chatre
2025-03-11 20:35 ` Moger, Babu
2025-03-11 20:53 ` Luck, Tony
2025-03-12 15:14 ` Moger, Babu
2025-03-12 15:15 ` Reinette Chatre
2025-03-12 15:07 ` Reinette Chatre
2025-03-12 16:03 ` Moger, Babu
2025-03-12 17:14 ` Reinette Chatre
2025-03-12 18:14 ` Moger, Babu
2025-03-13 16:08 ` Reinette Chatre
2025-03-13 20:13 ` Moger, Babu
2025-03-13 20:36 ` Luck, Tony
2025-03-14 14:49 ` Moger, Babu
2025-03-13 21:21 ` Reinette Chatre
2025-03-14 16:18 ` Moger, Babu
2025-03-19 18:36 ` Reinette Chatre
2025-03-20 18:12 ` Moger, Babu
2025-03-20 22:35 ` Reinette Chatre
2025-03-21 0:35 ` Moger, Babu
2025-03-17 16:27 ` Peter Newman
2025-03-17 23:00 ` Moger, Babu
2025-03-19 20:53 ` Reinette Chatre
2025-03-20 20:29 ` Moger, Babu
2025-02-25 21:41 ` Reinette Chatre
2025-02-20 16:46 ` Dave Martin
2025-02-20 17:46 ` Dave Martin
2025-02-20 18:36 ` Reinette Chatre
2025-02-21 16:47 ` Dave Martin
2025-02-21 22:43 ` Reinette Chatre
2025-02-13 16:19 ` Moger, Babu
2025-02-13 18:18 ` Dave Martin
2025-02-13 18:39 ` Luck, Tony
2025-02-14 6:34 ` Reinette Chatre
2025-02-14 7:23 ` Reinette Chatre
2025-02-21 18:07 ` James Morse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z7dIfWAk+f4Gc54X@e133380.arm.com \
--to=dave.martin@arm.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.cooper3@citrix.com \
--cc=babu.moger@amd.com \
--cc=bmoger@amd.com \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=daniel.sneddon@linux.intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=ebiggers@google.com \
--cc=eranian@google.com \
--cc=hpa@zytor.com \
--cc=james.morse@arm.com \
--cc=jpoimboe@kernel.org \
--cc=kai.huang@intel.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.wieczor-retman@intel.com \
--cc=mario.limonciello@amd.com \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=perry.yuan@amd.com \
--cc=peternewman@google.com \
--cc=reinette.chatre@intel.com \
--cc=rostedt@goodmis.org \
--cc=sandipan.das@amd.com \
--cc=seanjc@google.com \
--cc=tan.shaopeng@fujitsu.com \
--cc=tglx@linutronix.de \
--cc=thuth@redhat.com \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
--cc=xiaoyao.li@intel.com \
--cc=xin3.li@intel.com \
--cc=xiongwei.song@windriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).