From: "Moger, Babu" <babu.moger@amd.com>
To: Dave Martin <Dave.Martin@arm.com>, "Moger, Babu" <bmoger@amd.com>
Cc: corbet@lwn.net, reinette.chatre@intel.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
tony.luck@intel.com, peternewman@google.com, x86@kernel.org,
hpa@zytor.com, paulmck@kernel.org, akpm@linux-foundation.org,
thuth@redhat.com, rostedt@goodmis.org,
xiongwei.song@windriver.com, pawan.kumar.gupta@linux.intel.com,
daniel.sneddon@linux.intel.com, jpoimboe@kernel.org,
perry.yuan@amd.com, sandipan.das@amd.com, kai.huang@intel.com,
xiaoyao.li@intel.com, seanjc@google.com, xin3.li@intel.com,
andrew.cooper3@citrix.com, ebiggers@google.com,
mario.limonciello@amd.com, james.morse@arm.com,
tan.shaopeng@fujitsu.com, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, maciej.wieczor-retman@intel.com,
eranian@google.com
Subject: Re: [PATCH v11 23/23] x86/resctrl: Introduce interface to modify assignment states of the groups
Date: Thu, 20 Feb 2025 14:57:31 -0600
Message-ID: <fdfe13ae-1fb1-417c-88f5-6b0973338c34@amd.com>
In-Reply-To: <Z7dIfWAk+f4Gc54X@e133380.arm.com>
Hi Dave,
On 2/20/25 09:21, Dave Martin wrote:
> Hi,
>
> On Wed, Feb 19, 2025 at 06:34:42PM -0600, Moger, Babu wrote:
>> Hi Dave,
>>
>> On 2/19/2025 10:07 AM, Dave Martin wrote:
>>> Hi,
>>>
>>> On Wed, Jan 22, 2025 at 02:20:31PM -0600, Babu Moger wrote:
>
>>> [...]
>>>
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> index 6e29827239e0..299839bcf23f 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>>> @@ -1050,6 +1050,244 @@ static int resctrl_mbm_assign_control_show(struct kernfs_open_file *of,
>>>
>>> [...]
>>>
>>>> +static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
>>>> + char *buf, size_t nbytes, loff_t off)
>>>> +{
>
> [...]
>
>>>> + while ((token = strsep(&buf, "\n")) != NULL) {
>>>> + /*
>>>> + * The write command follows the following format:
>>>> + * “<CTRL_MON group>/<MON group>/<domain_id><opcode><flags>”
>>>> + * Extract the CTRL_MON group.
>>>> + */
>>>> + cmon_grp = strsep(&token, "/");
>>>> +
>>>
>>> As when reading this file, I think that the data can grow larger than a
>>> page and get split into multiple write() calls.
>>>
>>> I don't currently think the file needs to be redesigned, but there are
>>> some concerns about how userspace will work with it that need to be
>>> sorted out.
>>>
>>> Every monitoring group can contribute a line to this file:
>>>
>>> CTRL_GROUP / MON_GROUP / DOMAIN = [t][l] [ ; DOMAIN = [t][l] ]* LF
>>>
>>> so, 2 * (NAME_MAX + 1) + NUM_DOMAINS * 5 - 1 + 1
>>>
>>> NAME_MAX on Linux is 255, so with, say, up to 16 domains, that's about
>>> 600 bytes per monitoring group in the worst case.
>>>
>>> We don't need to have many control and monitoring groups for this to
>>> grow potentially over 4K.
>>>
>>>
>>> We could simply place a limit on how much userspace is allowed to write
>>> to this file in one go, although this restriction feels difficult for
>>> userspace to follow -- but maybe this is workable in the short term, on
>>> current systems (?)
>>>
>>> Otherwise, since we expect this interface to be written using scripting
>>> languages, I think we need to be prepared to accept fully-buffered
>>> I/O. That means that the data may be cut at random places, not
>>> necessarily at newlines. (For smaller files such as schemata this is
>>> not such an issue, since the whole file is likely to be small enough to
>>> fit into the default stdio buffers -- this is how sysfs gets away with
>>> it IIUC.)
>>>
>>> For fully-buffered I/O, we may have to cache an incomplete line in
>>> between write() calls. If there is a dangling incomplete line when the
>>> file is closed then it is hard to tell userspace, because people often
>>> don't bother to check the return value of close(), fclose() etc.
>>> However, since it's an ABI violation for userspace to end this file
>>> with a partial line, I think it's sufficient to report that via
>>> last_cmd_status. (Making close() return -EIO still seems a good idea
>>> though, just in case userspace is listening.)
>>
>> It seems we could add a check in resctrl_mbm_assign_control_write() to
>> reject writes where nbytes > PAGE_SIZE.
>
> This might be a reasonable stopgap approach, if we are confident that the
> number of RMIDs and monitoring domains is small enough on known
> platforms that the problem is unlikely to be hit. I can't really judge
> on this.
>
>> But do we really need this? I have no way of testing this. Help me
>> understand.
>
> It's easy to demonstrate this using the schemata file (which works in a
> similar way). Open f in /sys/fs/resctrl/schemata, then:
>
> 	int n = 0;
>
> 	for (n = 0; n < 1000; n++)
> 		if (fputs("MB:0=100;1=100\n", f) == EOF)
> 			fprintf(stderr, "Failed on iteration %d\n", n);
>
> This will succeed a certain number of times (272, for me) and then fail
> when the stdio buffer for f overflows, triggering a write().
>
> Putting an explicit fflush() after every fputs() call (or doing a
> setlinebuf(f) before the loop) makes it work. But this is awkward and
> unexpected for the user, and doing the right thing from a scripting
> language may be tricky.
>
> In this example I am doing something a bit artificial -- we don't
> officially say what happens when a pre-opened schemata file handle is
> reused in this way, AFAICT. But for mbm_assign_control it is
> legitimate to write many lines, and we can hit this kind of problem.
>
>
> I'll leave it to others to judge whether we _need_ to fix this, but it
> feels like a problem waiting to happen.
I reproduced the problem with the following program, writing to a group named "test":
#include <stdio.h>
#include <errno.h>
#include <string.h>

int main(void)
{
	FILE *file;
	int n;

	file = fopen("/sys/fs/resctrl/info/L3_MON/mbm_assign_control", "w");
	if (file == NULL) {
		printf("Error opening file!\n");
		return 1;
	}
	printf("File opened successfully.\n");

	for (n = 0; n < 100; n++)
		if (fputs("test//0=tl;1=tl;2=tl;3=tl;4=tl;5=tl;9=tl;10=tl;11=tl\n", file) == EOF)
			fprintf(stderr, "Failed on iteration %d error %s\n", n, strerror(errno));

	if (fclose(file) == 0) {
		printf("File closed successfully.\n");
	} else {
		printf("Error closing file!\n");
	}
	return 0;
}
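When the stdio buffer fills and is flushed, the data handed to write() is cut at an arbitrary point, so it does not end with a newline and the write fails with -EINVAL. I have added an error message via rdt_last_cmd_puts() so that the user at least knows what went wrong: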
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 484d6009869f..70a96976e3ab 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1250,8 +1252,10 @@ static ssize_t resctrl_mbm_assign_control_write(struct kernfs_open_file *of,
 	int ret;
 	/* Valid input requires a trailing newline */
-	if (nbytes == 0 || buf[nbytes - 1] != '\n')
+	if (nbytes == 0 || buf[nbytes - 1] != '\n') {
+		rdt_last_cmd_puts("mbm_cntr_assign: buffer invalid\n");
 		return -EINVAL;
+	}
 	buf[nbytes - 1] = '\0';
I am open to other ideas to handle this case.
>
>
>> All these file operations go through the generic kernfs_fop_write_iter().
>> Doesn't it take care of buffer checks and overflow?
>
> No, this is called for each iovec segment (where userspace used one of
> the iovec based I/O syscalls). But there is no buffering or
> concatenation of the data read in: each segment gets passed down to the
> individual kernfs_file_operations write method for the file:
>
> len = ops->write(of, buf, len, iocb->ki_pos)
>
> calls down to
>
> resctrl_mbm_assign_control_write(of, buf, len, iocb->ki_pos).
>
>
> I'll try to port my buffering hack on top of the series -- that should
> help to illustrate what I mean.
>
> Cheers
> ---Dave
>
--
Thanks
Babu Moger