From: Reinette Chatre <reinette.chatre@intel.com>
To: Peter Newman <peternewman@google.com>
Cc: <babu.moger@amd.com>, <corbet@lwn.net>, <fenghua.yu@intel.com>,
<tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>,
<dave.hansen@linux.intel.com>, <x86@kernel.org>, <hpa@zytor.com>,
<paulmck@kernel.org>, <rdunlap@infradead.org>, <tj@kernel.org>,
<peterz@infradead.org>, <yanjiewtw@gmail.com>,
<kim.phillips@amd.com>, <lukas.bulwahn@gmail.com>,
<seanjc@google.com>, <jmattson@google.com>, <leitao@debian.org>,
<jpoimboe@kernel.org>, <rick.p.edgecombe@intel.com>,
<kirill.shutemov@linux.intel.com>, <jithu.joseph@intel.com>,
<kai.huang@intel.com>, <kan.liang@linux.intel.com>,
<daniel.sneddon@linux.intel.com>, <pbonzini@redhat.com>,
<sandipan.das@amd.com>, <ilpo.jarvinen@linux.intel.com>,
<maciej.wieczor-retman@intel.com>, <linux-doc@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <eranian@google.com>,
<james.morse@arm.com>
Subject: Re: [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
Date: Thu, 15 Aug 2024 18:45:51 -0700 [thread overview]
Message-ID: <37766706-b4ea-4aa3-9686-191411eb480c@intel.com> (raw)
In-Reply-To: <CALPaoCiV-xTKn9N+oP4mqgRwsWxPUfutKDF==1h-8K9b7YfE0g@mail.gmail.com>
Hi Peter,
On 8/15/24 4:06 PM, Peter Newman wrote:
> On Wed, Aug 14, 2024 at 10:37 AM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Peter,
>>
>> On 8/2/24 3:50 PM, Peter Newman wrote:
>>> On Fri, Aug 2, 2024 at 1:55 PM Reinette Chatre
>>> <reinette.chatre@intel.com> wrote:
>>>> On 8/2/24 11:49 AM, Peter Newman wrote:
>>>>> On Fri, Aug 2, 2024 at 9:14 AM Reinette Chatre
>>>>>> I am of course not familiar with details of the software implementation
>>>>>> - could there be benefits to using it even if hardware counters are
>>>>>> supported?
>>>>>
>>>>> I can't see any situation where the user would want to choose software
>>>>> over hardware counters. The number of groups which can be monitored by
>>>>> software assignable counters will always be less than with hardware,
>>>>> due to the need for consuming one RMID (and the counters automatically
>>>>> allocated to it by the AMD hardware) for all unassigned groups.
>>>>
>>>> Thank you for clarifying. This seems specific to this software implementation,
>>>> and I missed that there was a shift from soft-RMIDs to soft-ABMC. If I remember
>>>> correctly this depends on undocumented hardware specific knowledge.
>>>
>>> For the benefit of anyone else who needs to monitor bandwidth on a
>>> large number of monitoring groups on pre-ABMC AMD implementations,
>>> hopefully a future AMD publication will clarify, at least on some
>>> existing, pre-ABMC models, exactly when the QM_CTR.U bit is set.
>>>
>>>
>>>>>
>>>>> The behavior as I've implemented today is:
>>>>>
>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_events
>>>>> 0
>>>>>
>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>> test//0=_;1=_;
>>>>> //0=_;1=_;
>>>>>
>>>>> # echo "test//1+l" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>> test//0=_;1=tl;
>>>>> //0=_;1=_;
>>>>>
>>>>> # echo "test//1-t" > /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_control
>>>>> test//0=_;1=_;
>>>>> //0=_;1=_;
>>>>>
>>>>>
>>>>
>>>> This highlights how there cannot be a generic/consistent interface between hardware
>>>> and software implementation. If resctrl implements something like above without any
>>>> other hints to user space then it will push complexity to user space since user space
>>>> would not know if setting one flag results in setting more than that flag, which may
>>>> force a user space implementation to always follow a write with a read that
>>>> needs to confirm what actually resulted from the write. Similarly, that removing a
>>>> flag impacts other flags needs to be clear without user space needing to "try and
>>>> see what happens".
>>>
>>> I'll return to this topic in the context of MPAM below...
>>>
>>>> It is not clear to me how to interpret the above example when it comes to the
>>>> RMID management though. If the RMID assignment is per group then I expected all
>>>> the domains of a group to have the same flag(s)?
>>>
>>> The group RMIDs are never programmed into any MSRs and the RMID space
>>> is independent in each domain, so it is still possible to do
>>> per-domain assignment. (and like with soft RMIDs, this enables us to
>>> create unlimited groups, but we've never been limited by the size of
>>> the RMID space)
>>>
>>> However, in our use cases, jobs are not confined to any domain, so
>>> bandwidth measurements must be done simultaneously in all domains, so
>>> we have no current use for per-domain assignment. But if any Google
>>> users did begin to see value in confining jobs to domains, this could
>>> change.
>>>
>>>>
>>>>>>
>>>>>>> However, If we don't expect to see these semantics in any other
>>>>>>> implementation, these semantics could be implicit in the definition of
>>>>>>> a SW assignable counter.
>>>>>>
>>>>>> It is not clear to me how implementation differences between hardware
>>>>>> and software assignment can be hidden from user space. It is possible
>>>>>> to let user space enable individual events and then silently upgrade it
>>>>>> to all events. I see two options here, either "mbm_control" needs to
>>>>>> explicitly show this "silent upgrade" so that user space knows which
>>>>>> events are actually enabled, or "mbm_control" only shows flags/events enabled
>>>>>> from user space perspective. In the former scenario, this needs more
>>>>>> user space support since a generic user space cannot be confident which
>>>>>> flags are set after writing to "mbm_control". In the latter scenario,
>>>>>> meaning of "num_mbm_cntrs" becomes unclear since user space is expected
>>>>>> to rely on it to know which events can be enabled and if some are
>>>>>> actually "silently enabled" when user space still thinks it needs to be
>>>>>> enabled the number of available counters becomes vague.
>>>>>>
>>>>>> It is not clear to me how to present hardware and software assignable
>>>>>> counters with a single consistent interface. Actually, what if the
>>>>>> "mbm_mode" is what distinguishes how counters are assigned instead of how
>>>>>> it is backed (hw vs sw)? What if, instead of "mbm_cntr_assignable" and
>>>>>> "mbm_cntr_sw_assignable" MBM modes the terms "mbm_cntr_event_assignable"
>>>>>> and "mbm_cntr_group_assignable" is used? Could that replace a
>>>>>> potential "mbm_assign_events" while also supporting user space in
>>>>>> interactions with "mbm_control"?
>>>>>
>>>>> If I understand this correctly, is this a preference that the info
>>>>> node be named differently if its value will have different units,
>>>>> rather than a second node to indicate what the value of num_mbm_cntrs
>>>>> actually means? This sounds reasonable to me.
>>>>
>>>> Indeed. As you highlighted, user space may not need to know if
>>>> counters are backed by hardware or software, but user space needs to
>>>> know what to expect from (how to interact with) interface.
>>>>
>>>>> I think it's also important to note that in MPAM, the MBWU (memory
>>>>> bandwidth usage) monitors don't have a concept of local versus total
>>>>> bandwidth, so event assignment would likely not apply there either.
>>>>> What the counted bandwidth actually represents is more implicit in the
>>>>> monitor's position in the memory system in the particular
>>>>> implementation. On a theoretical multi-socket system, resctrl would
>>>>> require knowledge about the system's architecture to stitch together
>>>>> the counts from different types of monitors to produce a local and
>>>>> total value. I don't know if we'd program this SoC-specific knowledge
>>>>> into the kernel to produce a unified MBM resource like we're
>>>>> accustomed to now or if we'd present multiple MBM resources, each only
>>>>> providing an mbm_total_bytes event. In this case, the counters would
>>>>> have to be assigned separately in each MBM resource, especially if the
>>>>> different MBM resources support a different number of counters.
>>>>>
>>>>
>>>> "total" and "local" bandwidth is already in grey area after the
>>>> introduction of mbm_total_bytes_config/mbm_local_bytes_config where
>>>> user space could set values reported to not be constrained by the
>>>> "total" and "local" terms. We keep sticking with it though, even in
>>>> this implementation that uses the "t" and "l" flags, knowing that
>>>> what is actually monitored when "l" is set is just what the user
>>>> configured via mbm_local_bytes_config, which theoretically
>>>> can be "total" bandwidth.
>>>
>>> If it makes sense to support a separate, group-assignment interface at
>>> least for MPAM, this would be a better fit for soft-ABMC, even if it
>>> does have to stay downstream.
>>
>> (apologies for the delay)
>>
>> Could we please take a step back and confirm/agree what is meant with "group-
>> assignment"? In a previous message [1] I latched onto the statement
>> "the implementation is assigning RMIDs to groups, assignment results in all
>> events being counted.". In this I understood "groups" to be resctrl groups
>> and I understood this to mean that when a (soft-ABMC) counter is assigned
>> it applies to the entire resctrl group (all domains, all events). The
>> subsequent example in [2] was thus unexpected to me when the interface
>> was used to assign a (soft-ABMC) counter to the group but not all domains
>> were impacted.
>>
>> Considering this, could you please elaborate what is meant with
>> "group assignment"?
>
> By "group assignment", I just mean assigning counters to individual
> MBM events is not possible, or that assignment results in counters
> being assigned to all MBM events for a group in a domain.
Thank you for clarifying. I still think it is possible to use an entry
in "mbm_mode" to indicate to user space what to expect from the mbm_control
interface but I withdraw my original naming suggestions since it would create
confusion about what is meant by "group".
>
> I only omitted per-domain assignment in soft-ABMC before because
> Google doesn't have a use-case for it. I started the prototype before
> Babu's proposed interface required domain-scoped assignments[1]. Now
> that some sort of domain selector is required, I'm reconsidering.
Could you please elaborate what you mean with the required "domain selector"?
The latest ABMC version (v6) added support for assigning all domains using '*'.
Reinette
next prev parent reply other threads:[~2024-08-16 1:46 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-03 21:48 [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Babu Moger
2024-07-03 21:48 ` [PATCH v5 01/20] x86/cpufeatures: Add support for " Babu Moger
2024-07-12 21:55 ` Reinette Chatre
2024-07-15 18:36 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 02/20] x86/resctrl: Add ABMC feature in the command line options Babu Moger
2024-07-03 21:48 ` [PATCH v5 03/20] x86/resctrl: Consolidate monitoring related data from rdt_resource Babu Moger
2024-07-12 21:57 ` Reinette Chatre
2024-07-15 19:05 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 04/20] x86/resctrl: Detect Assignable Bandwidth Monitoring feature details Babu Moger
2024-07-12 22:04 ` Reinette Chatre
2024-07-15 20:04 ` Moger, Babu
2024-07-16 15:11 ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 05/20] x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags Babu Moger
2024-07-12 22:04 ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 06/20] x86/resctrl: Add support to enable/disable AMD ABMC feature Babu Moger
2024-07-12 22:05 ` Reinette Chatre
2024-07-16 15:13 ` Moger, Babu
2024-07-16 17:51 ` Reinette Chatre
2024-07-16 18:48 ` Moger, Babu
2024-07-16 20:41 ` Reinette Chatre
2024-07-18 21:11 ` Moger, Babu
2024-08-16 16:29 ` James Morse
2024-07-03 21:48 ` [PATCH v5 07/20] x86/resctrl: Introduce the interface to display monitor mode Babu Moger
2024-07-12 22:06 ` Reinette Chatre
2024-07-16 16:51 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 08/20] x86/resctrl: Introduce interface to display number of monitoring counters Babu Moger
2024-07-03 21:48 ` [PATCH v5 09/20] x86/resctrl: Initialize monitor counters bitmap Babu Moger
2024-07-12 22:07 ` Reinette Chatre
2024-07-16 17:59 ` Moger, Babu
2024-07-26 22:48 ` Peter Newman
2024-07-26 23:53 ` Moger, Babu
2024-08-01 21:05 ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 10/20] x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg Babu Moger
2024-07-12 22:08 ` Reinette Chatre
2024-07-16 19:21 ` Moger, Babu
2024-07-16 20:42 ` Reinette Chatre
2024-07-16 22:43 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 11/20] x86/resctrl: Remove MSR reading of event configuration value Babu Moger
2024-07-12 22:10 ` Reinette Chatre
2024-07-16 19:34 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 12/20] x86/resctrl: Add data structures and definitions for ABMC assignment Babu Moger
2024-07-12 22:13 ` Reinette Chatre
2024-07-16 20:24 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 13/20] x86/resctrl: Add the interface to assign hardware counter Babu Moger
2024-07-12 22:09 ` Reinette Chatre
2024-07-16 20:45 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 14/20] x86/resctrl: Add the interface to unassign " Babu Moger
2024-07-03 21:48 ` [PATCH v5 15/20] x86/resctrl: Assign/unassign counters by default when ABMC is enabled Babu Moger
2024-07-12 22:10 ` Reinette Chatre
2024-07-16 20:58 ` Moger, Babu
2024-07-26 23:22 ` Peter Newman
2024-07-26 23:57 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 16/20] x86/resctrl: Report "Unassigned" for MBM events in ABMC mode Babu Moger
2024-07-12 22:13 ` Reinette Chatre
2024-07-16 21:04 ` Moger, Babu
2024-07-13 20:26 ` Markus Elfring
2024-07-03 21:48 ` [PATCH v5 17/20] x86/resctrl: Introduce the interface switch between monitor modes Babu Moger
2024-07-12 22:14 ` Reinette Chatre
2024-07-16 22:46 ` Moger, Babu
2024-07-13 7:15 ` Markus Elfring
2024-07-03 21:48 ` [PATCH v5 18/20] x86/resctrl: Enable AMD ABMC feature by default when supported Babu Moger
2024-07-12 22:15 ` Reinette Chatre
2024-07-16 23:23 ` Moger, Babu
2024-07-26 0:16 ` Moger, Babu
2024-08-01 21:40 ` Reinette Chatre
2024-07-03 21:48 ` [PATCH v5 19/20] x86/resctrl: Introduce interface to list monitor states of all the groups Babu Moger
2024-07-12 22:16 ` Reinette Chatre
2024-07-17 15:22 ` Moger, Babu
2024-08-01 21:37 ` Reinette Chatre
2024-08-02 16:10 ` Moger, Babu
2024-07-03 21:48 ` [PATCH v5 20/20] x86/resctrl: Introduce interface to modify assignment states of " Babu Moger
2024-07-12 22:17 ` Reinette Chatre
2024-07-17 16:22 ` Moger, Babu
2024-07-25 0:03 ` Peter Newman
2024-07-25 1:22 ` Moger, Babu
2024-07-25 17:11 ` Peter Newman
2024-07-25 17:28 ` Moger, Babu
2024-08-01 18:56 ` Reinette Chatre
2024-08-01 19:40 ` Moger, Babu
2024-07-12 22:03 ` [PATCH v5 00/20] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC) Reinette Chatre
2024-07-17 17:19 ` Moger, Babu
2024-08-01 21:49 ` Reinette Chatre
2024-08-01 22:45 ` Peter Newman
2024-08-02 16:13 ` Reinette Chatre
2024-08-02 18:49 ` Moger, Babu
2024-08-02 19:13 ` Peter Newman
2024-08-02 20:23 ` Moger, Babu
2024-08-02 18:49 ` Peter Newman
2024-08-02 20:38 ` Moger, Babu
2024-08-02 20:55 ` Reinette Chatre
2024-08-02 22:50 ` Peter Newman
2024-08-14 17:37 ` Reinette Chatre
2024-08-15 23:06 ` Peter Newman
2024-08-16 1:45 ` Reinette Chatre [this message]
2024-08-03 0:49 ` Moger, Babu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=37766706-b4ea-4aa3-9686-191411eb480c@intel.com \
--to=reinette.chatre@intel.com \
--cc=babu.moger@amd.com \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=daniel.sneddon@linux.intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=eranian@google.com \
--cc=fenghua.yu@intel.com \
--cc=hpa@zytor.com \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=james.morse@arm.com \
--cc=jithu.joseph@intel.com \
--cc=jmattson@google.com \
--cc=jpoimboe@kernel.org \
--cc=kai.huang@intel.com \
--cc=kan.liang@linux.intel.com \
--cc=kim.phillips@amd.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=leitao@debian.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lukas.bulwahn@gmail.com \
--cc=maciej.wieczor-retman@intel.com \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=pbonzini@redhat.com \
--cc=peternewman@google.com \
--cc=peterz@infradead.org \
--cc=rdunlap@infradead.org \
--cc=rick.p.edgecombe@intel.com \
--cc=sandipan.das@amd.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=x86@kernel.org \
--cc=yanjiewtw@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox