From: Reinette Chatre <reinette.chatre@intel.com>
To: Tony Luck <tony.luck@intel.com>, Fenghua Yu <fenghuay@nvidia.com>,
	"Maciej Wieczor-Retman" <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	Drew Fustini <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>
Cc: <x86@kernel.org>, <linux-kernel@vger.kernel.org>,
	<patches@lists.linux.dev>
Subject: Re: [PATCH v8 17/32] x86/resctrl: Discover hardware telemetry events
Date: Thu, 14 Aug 2025 14:39:26 -0700
Message-ID: <4e99b982-cb31-4166-8357-7994cb24cf10@intel.com>
In-Reply-To: <20250811181709.6241-18-tony.luck@intel.com>

Hi Tony,

On 8/11/25 11:16 AM, Tony Luck wrote:
> Data for telemetry events is collected by each CPU and sent

"Each CPU collects data for telemetry events that it sends ..."
(imperative)

> to a nearby telemetry event aggregator either when the value
> of IA32_PQR_ASSOC.RMID is changed, or when two milliseconds

"is changed" -> "changes"
"or when a two millisecond timer expires" (imperative, and it almost
matches the cover letter; what remains is "timer" vs "counter")

> have elapsed.

Please make use of allowed line length. These line wraps are getting very
short and quite noticeable with this growing changelog.

> 
> The telemetry event aggregators maintain per-RMID per-event
> counts of the total seen for all the CPUs. There may be more
> than one telemetry event aggregator per package.
> 
> Each telemetry event aggregator is responsible for a specific
> group of events. E.g. on the Intel Clearwater Forest CPU there
> are two types of aggregators. One type tracks a pair of energy
> related events. The other type tracks a subset of "perf" type
> events.
> 
> The event counts are made available to Linux in a region of
> MMIO space for each aggregator. All details about the layout
> of counters in each aggregator MMIO region are described in
> XML files published by Intel and mad available in a GitHub

"mad" -> "made"

> repository: https://github.com/intel/Intel-PMT.
> 
> The key to matching a specific telemetry aggregator to the
> XML file that describes the MMIO layout is a 32-bit value. The
> Linux telemetry subsystem refers to this as a "guid" while
> the XML files call it a "uniqueid".
> 
> Each XML file provides the following information:
> 1) Which telemetry events are included in the group.
> 2) The order in which the event counters appear for each RMID.
> 3) The value type of each event counter (integer or fixed-point).
> 4) The number of RMIDs supported.
> 5) Which additional aggregator status registers are included.
> 6) The total size of the MMIO region for this aggregator.
> 
> Enumeration of support for telemetry events is done by the
> INTEL_PMT_DISCOVERY driver (a subcomponent of the INTEL_PMT_TELEMETRY

"The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events."
(imperative)
Could mentions of INTEL_PMT_DISCOVERY be dropped? It is 
INTEL_PMT_TELEMETRY that provides intel_pmt_get_regions_by_feature()
and I do not see how describing INTEL_PMT_DISCOVERY helps here.

> driver). This driver provides intel_pmt_get_regions_by_feature()
> to list all available telemetry event aggregators. The list
> includes the "guid", the base address in MMIO space for the
> region where the event counters are exposed, and the package
> id where the CPUs that report to this aggregator are located.
> 
> Add a new Kconfig option CONFIG_X86_CPU_RESCTRL_INTEL_AET for the
> Intel specific parts of telemetry code. This depends on the
> INTEL_PMT_TELEMETRY and INTEL_TPMI drivers being built-in to the kernel
> for enumeration of telemetry features.
> 
> Call intel_pmt_get_regions_by_feature() for each pmt_feature_id
> that indicates per-RMID telemetry.
> 
> Save the returned pmt_feature_group pointers with guids that are known
> to resctrl for use at run time.
> 
> Those pointers will be returned to the INTEL_PMT_TELEMETRY subsystem at
> resctrl_arch_exit() time.

The last two paragraphs can be merged.

intel_aet_exit() is empty here. I think the code that "returns the pointers"
can be included in this patch to make clear how the "get" done here is paired
with a "put", so that nobody has to wonder about reference counting issues.
I believe that doing so cleanly here (without the active_event_groups that I
find unnecessary ... more later) will make the code more symmetrical and
easier to follow while also removing the "just trust me, I'll do this later"
uncertainty.
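
To illustrate the pairing, below is a minimal userspace sketch (not kernel
code). The mock_* types and functions and the refcount field are hypothetical
stand-ins for struct pmt_feature_group, intel_pmt_get_regions_by_feature()
and intel_pmt_put_feature_group(); only the get/put symmetry is the point:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pmt_feature_group. */
struct mock_feature_group {
	int refcount;
};

/* Mirrors struct event_group from the patch, using the mock type. */
struct mock_event_group {
	struct mock_feature_group *pfg;
	unsigned int guid;
};

struct mock_feature_group mock_energy_pfg;
struct mock_feature_group mock_perf_pfg;

struct mock_event_group energy_group = { .guid = 0x26696143 };
struct mock_event_group perf_group = { .guid = 0x26557651 };

struct mock_event_group *known_groups[] = { &energy_group, &perf_group };

#define NUM_KNOWN_GROUPS (sizeof(known_groups) / sizeof(known_groups[0]))

/* Mock "get": hand out the group and take a reference on it. */
struct mock_feature_group *mock_get_feature_group(struct mock_feature_group *p)
{
	p->refcount++;
	return p;
}

/* Mock "put": drop a reference previously taken by the "get". */
void mock_put_feature_group(struct mock_feature_group *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Init side: every successful "get" is recorded in the event group. */
bool mock_aet_get_events(void)
{
	energy_group.pfg = mock_get_feature_group(&mock_energy_pfg);
	perf_group.pfg = mock_get_feature_group(&mock_perf_pfg);

	return energy_group.pfg || perf_group.pfg;
}

/* Exit side: walk the known groups and "put" whatever was saved. */
void mock_aet_exit(void)
{
	size_t i;

	for (i = 0; i < NUM_KNOWN_GROUPS; i++) {
		struct mock_event_group *e = known_groups[i];

		if (e->pfg) {
			mock_put_feature_group(e->pfg);
			e->pfg = NULL;
		}
	}
}
```

Clearing the saved pointer after the "put" keeps the exit path safe to call
more than once, and no separate list of active groups is needed in this
sketch because the saved pointer itself records whether a "get" succeeded.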

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

(missing the maintainer notes about checkpatch.pl troubles)

...

> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> new file mode 100644
> index 000000000000..25075f369148
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -0,0 +1,120 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Resource Director Technology(RDT)
> + * - Intel Application Energy Telemetry
> + *
> + * Copyright (C) 2025 Intel Corporation
> + *
> + * Author:
> + *    Tony Luck <tony.luck@intel.com>
> + */
> +
> +#define pr_fmt(fmt)   "resctrl: " fmt
> +
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/intel_vsec.h>
> +#include <linux/resctrl.h>
> +
> +#include "internal.h"
> +
> +/**
> + * struct event_group - All information about a group of telemetry events.
> + * @pfg:		Points to the aggregated telemetry space information
> + *			within the OOBMSM driver that contains data for all

"OOBMSM" -> "INTEL_PMT_TELEMETRY"?

> + *			telemetry regions.
> + * @guid:		Unique number per XML description file.
> + */
> +struct event_group {
> +	/* Data fields for additional structures to manage this group. */
> +	struct pmt_feature_group	*pfg;
> +
> +	/* Remaining fields initialized from XML file. */
> +	u32				guid;
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
> + */
> +static struct event_group energy_0x26696143 = {
> +	.guid		= 0x26696143,
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
> + */
> +static struct event_group perf_0x26557651 = {
> +	.guid		= 0x26557651,
> +};
> +
> +static struct event_group *known_energy_event_groups[] = {
> +	&energy_0x26696143,
> +};
> +
> +static struct event_group *known_perf_event_groups[] = {
> +	&perf_0x26557651,
> +};
> +
> +/* Stub for now */
> +static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> +	return -EINVAL;
> +}
> +
> +DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
> +		if (!IS_ERR_OR_NULL(_T))
> +			intel_pmt_put_feature_group(_T))
> +
> +/*
> + * Make a request to the INTEL_PMT_DISCOVERY driver for the

"INTEL_PMT_DISCOVERY" -> "INTEL_PMT_TELEMETRY"

Please use the available line length so that lines are not wrapped so short.

> + * pmt_feature_group for a specific feature. If there is
> + * one the returned structure has an array of telemetry_region
> + * structures. Each describes one telemetry aggregator.
> + * Try to use every telemetry aggregator with a known guid.
> + */
> +static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **evgs,
> +			    unsigned int num_evg)
> +{
> +	struct pmt_feature_group *p __free(intel_pmt_put_feature_group) = NULL;
> +	struct event_group **peg;
> +	int ret;
> +
> +	p = intel_pmt_get_regions_by_feature(feature);
> +
> +	if (IS_ERR_OR_NULL(p))
> +		return false;
> +
> +	for (peg = evgs; peg < &evgs[num_evg]; peg++) {
> +		ret = discover_events(*peg, p);
> +		if (!ret) {
> +			(*peg)->pfg = no_free_ptr(p);
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Ask OOBMSM discovery driver for all the RMID based telemetry groups

"OOBMSM discovery" -> "INTEL_PMT_TELEMETRY"

> + * that it supports.
> + */
> +bool intel_aet_get_events(void)
> +{
> +	bool ret1, ret2;
> +
> +	ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM,
> +			       known_energy_event_groups,
> +			       ARRAY_SIZE(known_energy_event_groups));
> +	ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM,
> +			       known_perf_event_groups,
> +			       ARRAY_SIZE(known_perf_event_groups));
> +
> +	return ret1 || ret2;
> +}
> +
> +void __exit intel_aet_exit(void)
> +{
> +}
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 58d890fe2100..56f0ff94c430 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -525,6 +525,19 @@ config X86_CPU_RESCTRL
>  
>  	  Say N if unsure.
>  
> +config X86_CPU_RESCTRL_INTEL_AET
> +	bool "Intel Application Energy Telemetry" if INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
> +	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL
> +	help
> +	  Enable per-RMID telemetry events in resctrl

Missing period at end of sentence.

> +
> +	  Intel feature that collects per-RMID execution data
> +	  about energy consumption, measure of frequency independent
> +	  activity and other performance metrics. Data is aggregated
> +	  per package.
> +
> +	  Say N if unsure.
> +
>  config X86_FRED
>  	bool "Flexible Return and Event Delivery"
>  	depends on X86_64
> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
> index d8a04b195da2..273ddfa30836 100644
> --- a/arch/x86/kernel/cpu/resctrl/Makefile
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -1,6 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
>  obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
> +obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
>  obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
>  
>  # To allow define_trace.h's recursive include:

Reinette
