public inbox for linux-kernel@vger.kernel.org
From: Reinette Chatre <reinette.chatre@intel.com>
To: James Morse <james.morse@arm.com>, Peter Newman <peternewman@google.com>
Cc: Tony Luck <tony.luck@intel.com>,
	"Yu, Fenghua" <fenghua.yu@intel.com>,
	"Eranian, Stephane" <eranian@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Babu Moger" <Babu.Moger@amd.com>,
	Gaurang Upasani <gupasani@google.com>
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group
Date: Wed, 9 Nov 2022 11:12:04 -0800
Message-ID: <eb1a0949-dbd0-482f-d19a-738cf8842b96@intel.com>
In-Reply-To: <8325a442-92c1-4170-1862-3bc891a8d6af@arm.com>

Hi James,

On 11/9/2022 9:59 AM, James Morse wrote:
> Hi Reinette,
> 
> On 08/11/2022 21:28, Reinette Chatre wrote:
>> On 11/3/2022 10:06 AM, James Morse wrote:
>>> (I've not got to the last message in this part of the thread yet - I'm out of time this
>>> week, back Monday!)
>>>
>>> On 21/10/2022 21:09, Reinette Chatre wrote:
>>>> On 10/19/2022 6:57 AM, James Morse wrote:
>>>>> On 17/10/2022 11:15, Peter Newman wrote:
>>>>>> On Wed, Oct 12, 2022 at 6:55 PM James Morse <james.morse@arm.com> wrote:
>>
>> ...
>>
>>>>>> If there are a lot more PARTIDs than PMGs, then it would fit well with a
>>>>>> user who never creates child MON groups. In case the number of MON
>>>>>> groups gets ahead of the number of CTRL_MON groups and you've run out of
>>>>>> PMGs, perhaps you would just try to allocate another PARTID and program
>>>>>> the same partitioning configuration before giving up.
>>>>>
>>>>> User-space can choose to do this.
>>>>> If the kernel tries to be clever and do this behind user-space's back, it needs to
>>>>> allocate two monitors for this secretly-two-control-groups, and always sum the counters
>>>>> before reporting them to user-space.
>>>
>>>> If I understand this scenario correctly, the kernel is already doing this.
>>>> As implemented in mon_event_count() the monitor data of a CTRL_MON group is
>>>> the sum of the parent CTRL_MON group and all its child MON groups.
>>>
>>> That is true. MPAM has an additional headache here as it needs to allocate a monitor in
>>> order to read the counters. If there are enough monitors for each CLOSID*RMID to have one,
>>> then MPAM can export the counter files in the same way RDT does.
>>>
>>> While there are systems that have enough monitors, I don't think this is going to be the
>>> norm. To allow systems that don't have a surfeit of monitors to use the counters, I plan
>>> to export the values from resctrl_arch_rmid_read() via perf. (but only for bandwidth counters)
> 
>> This sounds related to the way monitoring was done in earlier kernels. This was
>> long before I became involved with this work. Unfortunately I am not familiar with
>> all the history involved that ended in it being removed from the kernel.
> 
> Yup, I'm aware there is some history to this. It's not appropriate for the llc_occupancy
> counter as that reports state, instead of events.

Perf counts events while a process is running so memory bandwidth monitoring may
also be impacted by the caveats Peter mentioned for the upcoming AMD changes:

https://lore.kernel.org/lkml/CALPaoCidd+WwGTyE3D74LhoL13ce+EvdTmOnyPrQN62j+zZ1fg@mail.gmail.com/
("This has the caveats that evictions while one task is running could have
resulted from a previous task on the current CPU, but will be counted
against the new task's software-RMID, ...")

...
>> The new counters will also not reflect the task's history.
> 
> Indeed. I anticipate user-space is sampling this file periodically, otherwise it can't
> calculate a MB/s from the raw byte-count. I don't think losing the history is a problem.

Indeed. Cache occupancy may experience more corner cases depending on
the workloads. Your point that user space needs to know how/that counters
are impacted is important.
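
As an illustration of the periodic sampling James describes, here is a minimal
sketch of how user space might turn two raw byte-count samples into MB/s. The
function name and the "counter went backwards" handling are assumptions made
for this example; they are not part of resctrl or of any existing tool. The
samples would typically come from a file such as mbm_total_bytes under a
group's mon_data directory, but the sketch takes plain numbers so it stays
self-contained.

```python
from typing import Optional


def bandwidth_mb_per_s(prev_bytes: int, prev_time: float,
                       cur_bytes: int, cur_time: float) -> Optional[float]:
    """Return the average MB/s between two samples, or None if unusable.

    Hypothetical helper: timestamps are in seconds, counts are raw bytes.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        # Samples out of order or taken at the same instant: no rate.
        return None

    delta = cur_bytes - prev_bytes
    if delta < 0:
        # Assumption: a count that went backwards means the history was
        # lost (for example after a group move), so skip this interval
        # instead of reporting a bogus rate.
        return None

    return delta / elapsed / 1e6
```

A caller would keep the previous sample, call this once per sampling period,
and treat a None result as "no data for this interval".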

> 
> The state before the change being lost could be a problem, but this is a difference with
> the way MPAM works. I think it's best to just expose this property to user-space, as I
> don't think it's feasible to work around.
> 
> User-space would probably ignore the counter for a period of time after the move, as
> depending on where the regulation is happening, it may take a little while for the CLOSID
> change to take effect.

Agree.
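
A minimal sketch of what "ignore the counter for a period of time after the
move" could look like in a user-space sampler. The helper name and the
settle_seconds parameter are hypothetical; how long the CLOSID change takes to
become effective is platform dependent and is not specified anywhere in this
thread.

```python
from typing import Iterable, List, Tuple

# (timestamp in seconds, raw byte count)
Sample = Tuple[float, int]


def drop_settling_samples(samples: Iterable[Sample],
                          move_time: float,
                          settle_seconds: float) -> List[Sample]:
    """Discard samples taken in [move_time, move_time + settle_seconds).

    Hypothetical helper: samples from before the move and from after the
    settling window are kept in their original order.
    """
    window_end = move_time + settle_seconds
    return [(t, value) for (t, value) in samples
            if t < move_time or t >= window_end]
```

For example, with a move at t=1.0 and a one second window, samples taken at
t=1.0 and t=1.5 are dropped while those at t=0.0 and t=3.0 are kept.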


>> Moving an arm64 monitor group may thus have a few surprises for user
>> space while sounding complex to support. Would adding all this additional
>> support be worth it if the guidance to user space is to instead create many
>> control groups in such a control-group-rich environment?
> 
> I'd prefer it didn't exist at all, but if there are reasons to support it on x86, I'd like
> the MPAM support to be as similar as possible. I'm willing to accept (advertised!) noise
> in the counters, but a whole missing syscall is a harder sell.

ok.

> 
> 
>>> Whether the old counters keep counting needs exposing to user-space so that it is aware.
>>
>> Could you please elaborate? Do old counters not always keep counting?
> 
> It's not new - but the expectation is the mv/rename support does this atomically without
> glitching/resetting the counters. Because of that new expectation, I think it needs
> exposing to user-space.
> 
> Something should be indicated to user-space so it knows it can move monitor groups around,
> otherwise it's another 'try it and see'.

ok.
 
> 
>>> To solve Peter's use-case, we also need:
>>>  * to expose how many new groups can be created at each level.
>>>    This is because MPAM doesn't have a property like num_rmid.
> 
>> Unfortunately num_rmid is part of the user space interface. While MPAM
>> does not have "RMIDs" it seems that num_rmid can still be relevant
>> based on what it is described to represent in Documentation/x86/resctrl.rst:
>> "This is the upper bound for how many "CTRL_MON" + "MON" groups can
>> be created." 
> 
> I agree it can't be removed, and MPAM systems will need to put a value there.
> The problem is 'rmid' has a well known definition, even if the kernel documentation is
> nuanced.
> 
> This might be contentious, but ideally I'd 'deprecate' num_rmid, and split it into two
> properties that don't reference an architecture. (Obviously the files have to stay for at
> least the next 10 years!)

I think this may be difficult considering the various user space clients
already in use, but doing so is reasonable.

Reinette


