From: Reinette Chatre <reinette.chatre@intel.com>
To: Peter Newman <peternewman@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>,
Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
Drew Fustini <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>,
"Anil Keshavamurthy" <anil.s.keshavamurthy@intel.com>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH v3 10/26] fs/resctrl: Improve handling for events that can be read from any CPU
Date: Wed, 23 Apr 2025 08:47:34 -0700
Message-ID: <6863c369-706a-452d-a413-4d55a1c5861e@intel.com>
In-Reply-To: <CALPaoCimCmSyeejR9FCLcitquwenmOo0-0PVngUMtmSr_syy-A@mail.gmail.com>

Hi Peter,

On 4/23/25 6:27 AM, Peter Newman wrote:
> Hi Reinette,
>
> On Tue, Apr 22, 2025 at 8:20 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>>
>> Hi Tony,
>>
>> On 4/21/25 1:28 PM, Luck, Tony wrote:
>>> On Fri, Apr 18, 2025 at 03:54:02PM -0700, Reinette Chatre wrote:
>>>>> @@ -619,7 +622,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>>>>> goto out;
>>>>> }
>>>>> d = container_of(hdr, struct rdt_mon_domain, hdr);
>>>>> - mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
>>>>> + mask = md->any_cpu ? cpu_online_mask : &d->hdr.cpu_mask;
>>>>> + mon_event_read(&rr, r, d, rdtgrp, mask, evtid, false);
>>>>
>>>> I do not think this accomplishes the goal of this patch. Looking at mon_event_read() it calls
>>>> cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU) before any of the smp_*() calls.
>>>>
>>>> cpumask_any_housekeeping()
>>>> {
>>>> ...
>>>> if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
>>>> cpu = cpumask_any(mask);
>>>> ...
>>>> }
>>>>
>>>> cpumask_any() is just cpumask_first() so it will pick the first CPU in the
>>>> online mask that may not be the current CPU.
>>>>
>>>> fwiw ... there are some optimizations planned in this area that I have not yet studied:
>>>> https://lore.kernel.org/lkml/20250407153856.133093-1-yury.norov@gmail.com/
>>>
>>> I remember Peter complaining[1] about extra context switches when
>>> cpumask_any_housekeeping() was introduced, but it seems that the
>>> discussion died with no fix applied.
>>
>> The initial complaint was indeed that reading individual events is slower.
>>
>> The issue is that the intended use case read from many files at frequent
>> intervals and thus becomes vulnerable to any changes in this area that
>> really is already a slow path (reading from a file ... taking a mutex ...).
>>
>> Instead of working on shaving cycles off this path the discussion transitioned
>> to resctrl providing better support for the underlying use case. I
>> understood that this is being experimented with [2] and last I heard it
>> looks promising.
>>
>>>
>>> The blocking problem is that ARM may not be able to read a counter
>>> on a tick_nohz CPU because it may need to sleep.
>
> If I hadn't already turned my attention to optimizing bulk counter
> reads, I might have mentioned that the change Tony referred to is
> broken on MPAM implementations because the MPAM
> resctrl_arch_rmid_read() cannot wait for its internal mutex with
> preemption disabled.
>
>>>
>>> Do we need more options for events:
>>>
>>> 1) Must be read on a CPU in the right domain // Legacy
>>> 2) Can be read from any CPU // My addition
>>> 3) Must be read on a "housekeeping" CPU // James' code in upstream
>>> 4) Cannot be read on a tick_nohz CPU // Could be combined with 1 or 2?
>>
>> I do not see needing additional complexity here. I think it will be simpler
>> to just replace use of cpumask_any_housekeeping() in mon_event_read() with
>> open code that supports the particular usage. As I understand it is prohibited
>> for all CPUs to be in tick_nohz_full_mask so it looks to me as though the
>> existing "if (tick_nohz_full_cpu(cpu))" should never be true (since no CPU is being excluded).
>> Also, since mon_event_read() has no need to exclude CPUs, just a cpumask_andnot()
>> should suffice to determine what remains of given mask after accounting for all the
>> NO_HZ CPUs if tick_nohz_full_enabled().
>
> Can you clarify what you mean by "all CPUs"? It's not difficult for

I mentioned this in the context of this patch, which adds support for
events that can be read from *any* CPU. The CPU reading the event data
need not be in the domain for which data is being read, so all CPUs
on the system are available to the flow supporting these events. Since
not every CPU on the system can be in tick_nohz_full_mask, there will always
be a CPU available to read this type of event.

I made it way too complicated with this though. Tony proposed something
much better and simpler [1].

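For illustration only, the reasoning above can be modelled in user space. This is a rough sketch, not kernel code and not Tony's proposal in [1]: mask_t, mask_first() and pick_read_cpu() are hypothetical names, a 64-bit word stands in for a struct cpumask, and the caller's CPU is passed in rather than obtained from smp_processor_id(). It assumes the reading CPU is chosen from the online CPUs minus the nohz_full CPUs (the cpumask_andnot() idea), preferring the caller's own CPU so that no IPI would be needed.

```c
#include <stdint.h>

/* User-space stand-in for a cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t mask_t;

#define NO_CPU (-1)

/* Models cpumask_first(): lowest-numbered CPU in the mask, or NO_CPU if empty. */
static int mask_first(mask_t m)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (m & ((mask_t)1 << cpu))
			return cpu;
	return NO_CPU;
}

/*
 * Hypothetical helper: choose the CPU that reads an "any CPU" event.
 * online    - models cpu_online_mask
 * nohz_full - models tick_nohz_full_mask (zero when nohz_full is not enabled)
 * this_cpu  - the CPU the caller is running on
 */
static int pick_read_cpu(mask_t online, mask_t nohz_full, int this_cpu)
{
	/* Models cpumask_andnot(): drop every nohz_full CPU from the candidates. */
	mask_t candidates = online & ~nohz_full;

	/* Prefer the caller's own CPU when it is a valid candidate. */
	if (this_cpu >= 0 && this_cpu < 64 &&
	    (candidates & ((mask_t)1 << this_cpu)))
		return this_cpu;

	/* Otherwise fall back to the first remaining CPU. */
	return mask_first(candidates);
}
```

With CPUs 0-7 online (0xff) and CPUs 4-7 in the nohz_full set (0xf0), a caller on CPU 2 gets CPU 2 back, while a caller on CPU 5 is redirected to CPU 0. The candidate set is never empty in this model as long as at least one online CPU is outside the nohz_full set.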
> all CPUs in an L3 domain to be in tick_nohz_full_mask on AMD
> implementations, where there are many small L3 domains (~8 CPUs each)
> in a socket.
>
> Google makes use of isolation along this domain boundary on AMD
> platforms in some products and these users prefer to read counters
> using IPIs because they are concerned about introducing context
> switches to the isolated part of the system. In these configurations,
> there is typically only one RMID in that domain, so few of these IPIs
> are needed. (Note that these are different users from the ones I had
> described before who spawn large numbers of containers not limited to
> any domains and want to read the MBM counters for all the RMIDs on all
> the domains frequently.)
>

Thank you for this insight. There is no change planned for reading
event counters for those events that need to be read from their
domain. Tony's recent proposal [1] moves the handling of this new
style of events to a separate branch.

Reinette

[1] https://lore.kernel.org/lkml/DS7PR11MB607763D8B912A60A3574D2BAFCBA2@DS7PR11MB6077.namprd11.prod.outlook.com/