[PATCH v17 00/32] x86,fs/resctrl telemetry monitoring

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring
@ 2025-12-17 17:20 Tony Luck
  2025-12-17 17:20 ` [PATCH v17 01/32] x86,fs/resctrl: Improve domain type checking Tony Luck
                   ` (32 more replies)
  0 siblings, 33 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Patches based on Linus v6.19-rc1

Series available here:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git rdt-aet-v17

Changes since v16 was posted here:
https://lore.kernel.org/all/20251210231413.59102-1-tony.luck@intel.com/

Cover letter
Added some examples for Babu

part 11
Added Reinette RB tag

part 19
Update commit message to explain why it is safe to enable just some events
within an event group.
Added Reinette RB tag

part 24
Added Reinette RB tag

part 25
Drop unneeded local variable "ret" from all_regions_have_sufficient_rmid()
Added Reinette RB tag

part 32
Added Reinette RB tag

Background
----------
On Intel systems that support per-RMID telemetry monitoring each logical
processor keeps a local count for various events. When the
MSR_IA32_PQR_ASSOC.RMID value for the logical processor changes (or when a
two millisecond counter expires) these event counts are transmitted to
an event aggregator on the same package as the processor together with
the current RMID value. The event counters are reset to zero to begin
counting again.

Each aggregator takes the incoming event counts and adds them to
cumulative counts for each event for each RMID. Note that there can be
multiple aggregators on each package with no architectural association
between logical processors and an aggregator.

All of these aggregated counters can be read by an operating system from
the MMIO space of the Out Of Band Management Service Module (OOBMSM)
device(s) on a system. Any counter can be read from any logical processor.

Intel publishes details for each processor generation showing which
events are counted by each logical processor and the offsets for each
accumulated counter value within the MMIO space in XML files here:
https://github.com/intel/Intel-PMT.

For example there are two energy related telemetry events for the
Clearwater Forest family of processors and the MMIO space looks like this:

Offset  RMID    Event
------  ----    -----
0x0000  0       core_energy
0x0008  0       activity
0x0010  1       core_energy
0x0018  1       activity
...
0x23F0  575     core_energy
0x23F8  575     activity

In addition the XML file provides the units (Joules for core_energy,
Farads for activity) and the type of data (fixed-point binary with
bit 63 used to indicate the data is valid, and the low 18 bits as a
binary fraction).

Finally, each XML file provides a 32-bit unique id (or guid) that is
used as an index to find the correct XML description file for each
telemetry implementation.

The INTEL_PMT_TELEMETRY driver provides intel_pmt_get_regions_by_feature()
to enumerate the aggregator instances (also referred to as "telemetry
regions" in this series) on a platform. It provides:

1) guid  - so resctrl can determine which events are supported
2) MMIO base address of counters
3) package id

Resctrl accumulates counts from all aggregators on a package in order
to provide a consistent user interface across processor generations.

Directory structure for the telemetry events looks like this:

$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
mon_data
├── mon_PERF_PKG_00
│   ├── activity
│   └── core_energy
└── mon_PERF_PKG_01
    ├── activity
    └── core_energy

Reading the "core_energy" file from some resctrl mon_data directory shows
the cumulative energy (in Joules) used by all tasks that ran with the RMID
associated with that directory on a given package. Note that "core_energy"
reports only energy consumed by CPU cores (data processing units,
L1/L2 caches, etc.). It does not include energy used in the "uncore"
(L3 cache, on package devices, etc.), or used by memory or I/O devices.

Examples:
--------

As with other resctrl monitoring features first create CTRL_MON or MON
directories and assign the tasks of interest to the group.

# mkdir /sys/fs/resctrl/aet_example
# echo {list of PIDs} > /sys/fs/resctrl/aet_example/tasks

For simplicity in this example, assume that these tasks have their
affinity set to CPUs in the first socket. Set a shell variable to
point to the mon_data directory for socket 0:

$ dir=/sys/fs/resctrl/aet_example/mon_data/mon_PERF_PKG_00

Energy events:
-------------

There are two events associated with energy consumption in the core.
The "core_energy" event reports out directly in Joules. To compute
power just take the difference between two samples and divide by the
time between them. E.g.

$ cat $dir/core_energy; sleep 10; cat $dir/core_energy
94499439.510380
94499607.019680
$ bc -q
scale=3
(94499607.019680 - 94499439.510380) / 10
16.750

So 16.75 Watts in this example.

Note that different runs of the same workload may report different
energy consumption. This happens when cores shift to different
voltage/frequency profiles due to overall system load.

The "activity" event reports energy usage in a manner independent
of voltage and frequency. This may be useful for developers to
assess how modifications to a program (e.g. attaching to a library
optimized to use AVX instructions) affect energy consumption. So
read the "activity" at the start and end of program execution and
compute the difference.

Perf events:
-----------

The other telemetry events largely duplicate events available using
"perf", but avoid reading the perf counters on every context switch.
This may be a significant improvement when monitoring highly multi-threaded
applications. E.g. to find the ratio of core cycles to reference cycles:

$ cat $dir/unhalted_core_cycles $dir/unhalted_ref_cycles
1312249223146571
1660157011698276
$ { run application here }
$ cat $dir/unhalted_core_cycles $dir/unhalted_ref_cycles
1313573565617233
1661511224019444
$ bc -q
scale = 3
(1661511224019444 - 1660157011698276) / (1313573565617233 - 1312249223146571)
1.022

Signed-off-by: Tony Luck <tony.luck@intel.com>

Tony Luck (32):
  x86,fs/resctrl: Improve domain type checking
  x86/resctrl: Move L3 initialization into new helper function
  x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain
    types
  x86/resctrl: Clean up domain_remove_cpu_ctrl()
  x86,fs/resctrl: Refactor domain create/remove using struct
    rdt_domain_hdr
  fs/resctrl: Split L3 dependent parts out of __mon_event_count()
  x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters
  x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  x86,fs/resctrl: Rename some L3 specific functions
  fs/resctrl: Make event details accessible to functions when reading
    events
  x86,fs/resctrl: Handle events that can be read from any CPU
  x86,fs/resctrl: Support binary fixed point event counters
  x86,fs/resctrl: Add an architectural hook called for each mount
  x86,fs/resctrl: Add and initialize a resource for package scope
    monitoring
  fs/resctrl: Emphasize that L3 monitoring resource is required for
    summing domains
  x86/resctrl: Discover hardware telemetry events
  x86,fs/resctrl: Fill in details of events for guid 0x26696143 and
    0x26557651
  x86,fs/resctrl: Add architectural event pointer
  x86/resctrl: Find and enable usable telemetry events
  x86/resctrl: Read telemetry events
  fs/resctrl: Refactor mkdir_mondata_subdir()
  fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
  x86,fs/resctrl: Handle domain creation/deletion for
    RDT_RESOURCE_PERF_PKG
  x86/resctrl: Add energy/perf choices to rdt boot option
  x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
  fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
  x86,fs/resctrl: Compute number of RMIDs as minimum across resources
  fs/resctrl: Move RMID initialization to first mount
  x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
  fs/resctrl: Provide interface to create architecture specific debugfs
    area
  x86/resctrl: Add debugfs files to show telemetry aggregator status
  x86,fs/resctrl: Update documentation for telemetry events

 .../admin-guide/kernel-parameters.txt         |   7 +-
 Documentation/filesystems/resctrl.rst         | 101 +++-
 include/linux/resctrl.h                       |  67 ++-
 include/linux/resctrl_types.h                 |  11 +
 arch/x86/kernel/cpu/resctrl/internal.h        |  48 +-
 fs/resctrl/internal.h                         |  68 ++-
 arch/x86/kernel/cpu/resctrl/core.c            | 230 ++++++---
 arch/x86/kernel/cpu/resctrl/intel_aet.c       | 473 ++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c         |  50 +-
 fs/resctrl/ctrlmondata.c                      | 113 ++++-
 fs/resctrl/monitor.c                          | 364 +++++++++-----
 fs/resctrl/rdtgroup.c                         | 293 +++++++----
 arch/x86/Kconfig                              |  13 +
 arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
 14 files changed, 1440 insertions(+), 399 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c

base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
-- 
2.52.0

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v17 01/32] x86,fs/resctrl: Improve domain type checking
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 02/32] x86/resctrl: Move L3 initialization into new helper function Tony Luck
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Every resctrl resource has a list of domain structures. struct rdt_ctrl_domain
and struct rdt_mon_domain both begin with struct rdt_domain_hdr with
rdt_domain_hdr::type used in validity checks before accessing the domain of a
particular type.

Add the resource id to struct rdt_domain_hdr in preparation for a new monitoring
domain structure that will be associated with a new monitoring resource. Improve
existing domain validity checks with a new helper domain_header_is_valid()
that checks both domain type and resource id.  domain_header_is_valid() should
be used before every call to container_of() that accesses a domain structure.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h            |  9 +++++++++
 arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
 fs/resctrl/ctrlmondata.c           |  2 +-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 54701668b3df..e7c218f8d4f7 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -131,15 +131,24 @@ enum resctrl_domain_type {
  * @list:		all instances of this resource
  * @id:			unique id for this instance
  * @type:		type of this instance
+ * @rid:		resource id for this instance
  * @cpu_mask:		which CPUs share this resource
  */
 struct rdt_domain_hdr {
 	struct list_head		list;
 	int				id;
 	enum resctrl_domain_type	type;
+	enum resctrl_res_level		rid;
 	struct cpumask			cpu_mask;
 };
 
+static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
+					  enum resctrl_domain_type type,
+					  enum resctrl_res_level rid)
+{
+	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
+}
+
 /**
  * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
  * @hdr:		common header for different domain types
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 3792ab4819dc..0b8b7b8697a7 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -464,7 +464,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 
 	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_ctrl_domain, hdr);
 
@@ -481,6 +481,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_CTRL_DOMAIN;
+	d->hdr.rid = r->rid;
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
 
 	rdt_domain_reconfigure_cdp(r);
@@ -520,7 +521,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 
@@ -538,6 +539,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = r->rid;
 	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!ci) {
 		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
@@ -598,7 +600,7 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
@@ -644,7 +646,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_mon_domain, hdr);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index b2d178d3556e..905c310de573 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -653,7 +653,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		 * the resource to find the domain with "domid".
 		 */
 		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
-		if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, resid)) {
 			ret = -ENOENT;
 			goto out;
 		}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 02/32] x86/resctrl: Move L3 initialization into new helper function
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
  2025-12-17 17:20 ` [PATCH v17 01/32] x86,fs/resctrl: Improve domain type checking Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 03/32] x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Carve out the resource monitoring domain init code into a separate helper in
order to be able to initialize new types of monitoring domains besides the
usual L3 ones.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 64 ++++++++++++++++--------------
 1 file changed, 34 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 0b8b7b8697a7..2a568b316711 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -501,37 +501,13 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 	}
 }
 
-static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
 {
-	int id = get_domain_id_from_scope(cpu, r->mon_scope);
-	struct list_head *add_pos = NULL;
 	struct rdt_hw_mon_domain *hw_dom;
-	struct rdt_domain_hdr *hdr;
 	struct rdt_mon_domain *d;
 	struct cacheinfo *ci;
 	int err;
 
-	lockdep_assert_held(&domain_list_lock);
-
-	if (id < 0) {
-		pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
-			     cpu, r->mon_scope, r->name);
-		return;
-	}
-
-	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
-	if (hdr) {
-		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
-			return;
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
-
-		cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
-		/* Update the mbm_assign_mode state for the CPU if supported */
-		if (r->mon.mbm_cntr_assignable)
-			resctrl_arch_mbm_cntr_assign_set_one(r);
-		return;
-	}
-
 	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
 	if (!hw_dom)
 		return;
@@ -539,7 +515,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_MON_DOMAIN;
-	d->hdr.rid = r->rid;
+	d->hdr.rid = RDT_RESOURCE_L3;
 	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!ci) {
 		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
@@ -549,10 +525,6 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	d->ci_id = ci->id;
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
 
-	/* Update the mbm_assign_mode state for the CPU if supported */
-	if (r->mon.mbm_cntr_assignable)
-		resctrl_arch_mbm_cntr_assign_set_one(r);
-
 	arch_mon_domain_online(r, d);
 
 	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
@@ -570,6 +542,38 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	}
 }
 
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+	int id = get_domain_id_from_scope(cpu, r->mon_scope);
+	struct list_head *add_pos = NULL;
+	struct rdt_domain_hdr *hdr;
+
+	lockdep_assert_held(&domain_list_lock);
+
+	if (id < 0) {
+		pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+			     cpu, r->mon_scope, r->name);
+		return;
+	}
+
+	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
+	if (hdr)
+		cpumask_set_cpu(cpu, &hdr->cpu_mask);
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L3:
+		/* Update the mbm_assign_mode state for the CPU if supported */
+		if (r->mon.mbm_cntr_assignable)
+			resctrl_arch_mbm_cntr_assign_set_one(r);
+		if (!hdr)
+			l3_mon_domain_setup(cpu, id, r, add_pos);
+		break;
+	default:
+		pr_warn_once("Unknown resource rid=%d\n", r->rid);
+		break;
+	}
+}
+
 static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
 	if (r->alloc_capable)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 03/32] x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
  2025-12-17 17:20 ` [PATCH v17 01/32] x86,fs/resctrl: Improve domain type checking Tony Luck
  2025-12-17 17:20 ` [PATCH v17 02/32] x86/resctrl: Move L3 initialization into new helper function Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 04/32] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

New telemetry events will be associated with a new package scoped resource with
a new domain structure.

Refactor domain_remove_cpu_mon() so all the L3 domain processing is separate
the general domain action of clearing the CPU bit in the mask.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2a568b316711..49b133e847d4 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -631,9 +631,7 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_domain_id_from_scope(cpu, r->mon_scope);
-	struct rdt_hw_mon_domain *hw_dom;
 	struct rdt_domain_hdr *hdr;
-	struct rdt_mon_domain *d;
 
 	lockdep_assert_held(&domain_list_lock);
 
@@ -650,20 +648,29 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
 		return;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
-	hw_dom = resctrl_to_arch_mon_dom(d);
+	switch (r->rid) {
+	case RDT_RESOURCE_L3: {
+		struct rdt_hw_mon_domain *hw_dom;
+		struct rdt_mon_domain *d;
 
-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (cpumask_empty(&d->hdr.cpu_mask)) {
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+			return;
+
+		d = container_of(hdr, struct rdt_mon_domain, hdr);
+		hw_dom = resctrl_to_arch_mon_dom(d);
 		resctrl_offline_mon_domain(r, d);
-		list_del_rcu(&d->hdr.list);
+		list_del_rcu(&hdr->list);
 		synchronize_rcu();
 		mon_domain_free(hw_dom);
-
-		return;
+		break;
+	}
+	default:
+		pr_warn_once("Unknown resource rid=%d\n", r->rid);
+		break;
 	}
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 04/32] x86/resctrl: Clean up domain_remove_cpu_ctrl()
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (2 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 03/32] x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 05/32] x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdr Tony Luck
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

For symmetry with domain_remove_cpu_mon() refactor domain_remove_cpu_ctrl()
to take an early return when removing a CPU does not empty the domain.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 49b133e847d4..64ed81cbf8bf 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -604,28 +604,27 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 		return;
 	}
 
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
+		return;
+
 	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
 	hw_dom = resctrl_to_arch_ctrl_dom(d);
 
-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (cpumask_empty(&d->hdr.cpu_mask)) {
-		resctrl_offline_ctrl_domain(r, d);
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
-
-		/*
-		 * rdt_ctrl_domain "d" is going to be freed below, so clear
-		 * its pointer from pseudo_lock_region struct.
-		 */
-		if (d->plr)
-			d->plr->d = NULL;
-		ctrl_domain_free(hw_dom);
+	resctrl_offline_ctrl_domain(r, d);
+	list_del_rcu(&hdr->list);
+	synchronize_rcu();
 
-		return;
-	}
+	/*
+	 * rdt_ctrl_domain "d" is going to be freed below, so clear
+	 * its pointer from pseudo_lock_region struct.
+	 */
+	if (d->plr)
+		d->plr->d = NULL;
+	ctrl_domain_free(hw_dom);
 }
 
 static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 05/32] x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdr
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (3 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 04/32] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 06/32] fs/resctrl: Split L3 dependent parts out of __mon_event_count() Tony Luck
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Up until now, all monitoring events were associated with the L3 resource and it
made sense to use the L3 specific "struct rdt_mon_domain *" argument to functions
operating on domains.

Telemetry events will be tied to a new resource with its instances represented
by a new domain structure that, just like struct rdt_mon_domain, starts with
the generic struct rdt_domain_hdr.

Prepare to support domains belonging to different resources by changing the
calling convention of functions operating on domains.  Pass the generic header
and use that to find the domain specific structure where needed.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h            |  4 +-
 fs/resctrl/internal.h              |  2 +-
 arch/x86/kernel/cpu/resctrl/core.c |  4 +-
 fs/resctrl/ctrlmondata.c           | 14 ++++--
 fs/resctrl/rdtgroup.c              | 69 +++++++++++++++++++++---------
 5 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index e7c218f8d4f7..5db37c7e89c5 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -507,9 +507,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
 void resctrl_online_cpu(unsigned int cpu);
 void resctrl_offline_cpu(unsigned int cpu);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index bff4a54ae333..5e52269b391e 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -362,7 +362,7 @@ void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, int evtid, int first);
 
 int resctrl_mon_resource_init(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 64ed81cbf8bf..1fab4c67d273 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -534,7 +534,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 
 	list_add_tail_rcu(&d->hdr.list, add_pos);
 
-	err = resctrl_online_mon_domain(r, d);
+	err = resctrl_online_mon_domain(r, &d->hdr);
 	if (err) {
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
@@ -661,7 +661,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 		hw_dom = resctrl_to_arch_mon_dom(d);
-		resctrl_offline_mon_domain(r, d);
+		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&hdr->list);
 		synchronize_rcu();
 		mon_domain_free(hw_dom);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 905c310de573..3154cdc98a31 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -551,14 +551,21 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
 }
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, int evtid, int first)
 {
+	struct rdt_mon_domain *d = NULL;
 	int cpu;
 
 	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
 
+	if (hdr) {
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+			return;
+		d = container_of(hdr, struct rdt_mon_domain, hdr);
+	}
+
 	/*
 	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
@@ -653,12 +660,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		 * the resource to find the domain with "domid".
 		 */
 		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
-		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, resid)) {
+		if (!hdr) {
 			ret = -ENOENT;
 			goto out;
 		}
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
-		mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
+		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evtid, false);
 	}
 
 checkresult:
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 8e39dfda56bc..89ffe54fb0fc 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3229,17 +3229,22 @@ static void mon_rmdir_one_subdir(struct kernfs_node *pkn, char *name, char *subn
  * when last domain being summed is removed.
  */
 static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-					   struct rdt_mon_domain *d)
+					   struct rdt_domain_hdr *hdr)
 {
 	struct rdtgroup *prgrp, *crgrp;
+	struct rdt_mon_domain *d;
 	char subname[32];
 	bool snc_mode;
 	char name[32];
 
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
 	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
-	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
+	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : hdr->id);
 	if (snc_mode)
-		sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
+		sprintf(subname, "mon_sub_%s_%02d", r->name, hdr->id);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		mon_rmdir_one_subdir(prgrp->mon.mon_data_kn, name, subname);
@@ -3249,15 +3254,20 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	}
 }
 
-static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
+static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 			     struct rdt_resource *r, struct rdtgroup *prgrp,
 			     bool do_sum)
 {
 	struct rmid_read rr = {0};
+	struct rdt_mon_domain *d;
 	struct mon_data *priv;
 	struct mon_evt *mevt;
 	int ret, domid;
 
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return -EINVAL;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
 	for_each_mon_event(mevt) {
 		if (mevt->rid != r->rid || !mevt->enabled)
 			continue;
@@ -3271,23 +3281,28 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 			return ret;
 
 		if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
-			mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
+			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
 	}
 
 	return 0;
 }
 
 static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
-				struct rdt_mon_domain *d,
+				struct rdt_domain_hdr *hdr,
 				struct rdt_resource *r, struct rdtgroup *prgrp)
 {
 	struct kernfs_node *kn, *ckn;
+	struct rdt_mon_domain *d;
 	char name[32];
 	bool snc_mode;
 	int ret = 0;
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return -EINVAL;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
 	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
 	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
 	kn = kernfs_find_and_get(parent_kn, name);
@@ -3305,13 +3320,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 		ret = rdtgroup_kn_set_ugid(kn);
 		if (ret)
 			goto out_destroy;
-		ret = mon_add_all_files(kn, d, r, prgrp, snc_mode);
+		ret = mon_add_all_files(kn, hdr, r, prgrp, snc_mode);
 		if (ret)
 			goto out_destroy;
 	}
 
 	if (snc_mode) {
-		sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id);
+		sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id);
 		ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp);
 		if (IS_ERR(ckn)) {
 			ret = -EINVAL;
@@ -3322,7 +3337,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 		if (ret)
 			goto out_destroy;
 
-		ret = mon_add_all_files(ckn, d, r, prgrp, false);
+		ret = mon_add_all_files(ckn, hdr, r, prgrp, false);
 		if (ret)
 			goto out_destroy;
 	}
@@ -3340,7 +3355,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
  * and "monitor" groups with given domain id.
  */
 static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-					   struct rdt_mon_domain *d)
+					   struct rdt_domain_hdr *hdr)
 {
 	struct kernfs_node *parent_kn;
 	struct rdtgroup *prgrp, *crgrp;
@@ -3348,12 +3363,12 @@ static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		parent_kn = prgrp->mon.mon_data_kn;
-		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
+		mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
 
 		head = &prgrp->mon.crdtgrp_list;
 		list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
 			parent_kn = crgrp->mon.mon_data_kn;
-			mkdir_mondata_subdir(parent_kn, d, r, crgrp);
+			mkdir_mondata_subdir(parent_kn, hdr, r, crgrp);
 		}
 	}
 }
@@ -3362,14 +3377,14 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
 				       struct rdt_resource *r,
 				       struct rdtgroup *prgrp)
 {
-	struct rdt_mon_domain *dom;
+	struct rdt_domain_hdr *hdr;
 	int ret;
 
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
 
-	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
-		ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
+	list_for_each_entry(hdr, &r->mon_domains, list) {
+		ret = mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
 		if (ret)
 			return ret;
 	}
@@ -4253,16 +4268,23 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
+	struct rdt_mon_domain *d;
+
 	mutex_lock(&rdtgroup_mutex);
 
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		goto out_unlock;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
+
 	/*
 	 * If resctrl is mounted, remove all the
 	 * per domain monitor data directories.
 	 */
 	if (resctrl_mounted && resctrl_arch_mon_capable())
-		rmdir_mondata_subdir_allrdtgrp(r, d);
+		rmdir_mondata_subdir_allrdtgrp(r, hdr);
 
 	if (resctrl_is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
@@ -4280,7 +4302,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 	}
 
 	domain_destroy_mon_state(d);
-
+out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
@@ -4353,12 +4375,17 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
 	return err;
 }
 
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
-	int err;
+	struct rdt_mon_domain *d;
+	int err = -EINVAL;
 
 	mutex_lock(&rdtgroup_mutex);
 
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		goto out_unlock;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
 	err = domain_setup_mon_state(r, d);
 	if (err)
 		goto out_unlock;
@@ -4379,7 +4406,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
 	 * If resctrl is mounted, add per domain monitor data directories.
 	 */
 	if (resctrl_mounted && resctrl_arch_mon_capable())
-		mkdir_mondata_subdir_allrdtgrp(r, d);
+		mkdir_mondata_subdir_allrdtgrp(r, hdr);
 
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 06/32] fs/resctrl: Split L3 dependent parts out of __mon_event_count()
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (4 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 05/32] x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdr Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 07/32] x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters Tony Luck
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Carve out the L3 resource specific event reading code into a separate helper
to support reading event data from a new monitoring resource.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/monitor.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 572a9925bd6c..b5e0db38c8bf 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -413,7 +413,7 @@ static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
 	memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
 }
 
-static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
+static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
@@ -494,6 +494,17 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	return ret;
 }
 
+static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
+{
+	switch (rr->r->rid) {
+	case RDT_RESOURCE_L3:
+		return __l3_mon_event_count(rdtgrp, rr);
+	default:
+		rr->err = -EINVAL;
+		return -EINVAL;
+	}
+}
+
 /*
  * mbm_bw_count() - Update bw count from values previously read by
  *		    __mon_event_count().
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 07/32] x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (5 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 06/32] fs/resctrl: Split L3 dependent parts out of __mon_event_count() Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 08/32] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Convert the whole call sequence from mon_event_read() to resctrl_arch_rmid_read() to
pass resource independent struct rdt_domain_hdr instead of an L3 specific domain
structure to prepare for monitoring events in other resources.

This additional layer of indirection obscures which aspects of event counting depend
on a valid domain. Event initialization, support for assignable counters, and normal
event counting implicitly depend on a valid domain while summing of domains does not.
Split summing domains from the core event counting handling to make their respective
dependencies obvious.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h               |  4 +-
 fs/resctrl/internal.h                 | 18 +++---
 arch/x86/kernel/cpu/resctrl/monitor.c | 12 +++-
 fs/resctrl/ctrlmondata.c              |  9 +--
 fs/resctrl/monitor.c                  | 85 ++++++++++++++++++---------
 5 files changed, 78 insertions(+), 50 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5db37c7e89c5..9b9877fb3238 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -517,7 +517,7 @@ void resctrl_offline_cpu(unsigned int cpu);
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
  *			      for this resource and domain.
  * @r:			resource that the counter should be read from.
- * @d:			domain that the counter should be read from.
+ * @hdr:		Header of domain that the counter should be read from.
  * @closid:		closid that matches the rmid. Depending on the architecture, the
  *			counter may match traffic of both @closid and @rmid, or @rmid
  *			only.
@@ -538,7 +538,7 @@ void resctrl_offline_cpu(unsigned int cpu);
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val, void *arch_mon_ctx);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 5e52269b391e..9912b774a580 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -106,24 +106,26 @@ struct mon_data {
  *	   resource group then its event count is summed with the count from all
  *	   its child resource groups.
  * @r:	   Resource describing the properties of the event being read.
- * @d:	   Domain that the counter should be read from. If NULL then sum all
- *	   domains in @r sharing L3 @ci.id
+ * @hdr:   Header of domain that the counter should be read from. If NULL then
+ *	   sum all domains in @r sharing L3 @ci.id
  * @evtid: Which monitor event to read.
  * @first: Initialize MBM counter when true.
- * @ci:    Cacheinfo for L3. Only set when @d is NULL. Used when summing domains.
+ * @ci:    Cacheinfo for L3. Only set when @hdr is NULL. Used when summing
+ *	   domains.
  * @is_mbm_cntr: true if "mbm_event" counter assignment mode is enabled and it
  *	   is an MBM event.
  * @err:   Error encountered when reading counter.
- * @val:   Returned value of event counter. If @rgrp is a parent resource group,
- *	   @val includes the sum of event counts from its child resource groups.
- *	   If @d is NULL, @val includes the sum of all domains in @r sharing @ci.id,
- *	   (summed across child resource groups if @rgrp is a parent resource group).
+ * @val:   Returned value of event counter. If @rgrp is a parent resource
+ *	   group, @val includes the sum of event counts from its child
+ *	   resource groups.  If @hdr is NULL, @val includes the sum of all
+ *	   domains in @r sharing @ci.id, (summed across child resource groups
+ *	   if @rgrp is a parent resource group).
  * @arch_mon_ctx: Hardware monitor allocated for this read request (MPAM only).
  */
 struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
-	struct rdt_mon_domain	*d;
+	struct rdt_domain_hdr	*hdr;
 	enum resctrl_event_id	evtid;
 	bool			first;
 	struct cacheinfo	*ci;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index dffcc8307500..3da970ea1903 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -238,19 +238,25 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_mon_domain *d,
 	return chunks * hw_res->mon_scale;
 }
 
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val, void *ignored)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
-	int cpu = cpumask_any(&d->hdr.cpu_mask);
+	struct rdt_hw_mon_domain *hw_dom;
 	struct arch_mbm_state *am;
+	struct rdt_mon_domain *d;
 	u64 msr_val;
 	u32 prmid;
+	int cpu;
 	int ret;
 
 	resctrl_arch_rmid_read_context_check();
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return -EINVAL;
 
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	hw_dom = resctrl_to_arch_mon_dom(d);
+	cpu = cpumask_any(&hdr->cpu_mask);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
 	ret = __rmid_read_phys(prmid, eventid, &msr_val);
 
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 3154cdc98a31..9242a2982e77 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -554,25 +554,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, int evtid, int first)
 {
-	struct rdt_mon_domain *d = NULL;
 	int cpu;
 
 	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
 
-	if (hdr) {
-		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
-			return;
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
-	}
-
 	/*
 	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
 	rr->evtid = evtid;
 	rr->r = r;
-	rr->d = d;
+	rr->hdr = hdr;
 	rr->first = first;
 	if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
 	    resctrl_is_mbm_event(evtid)) {
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index b5e0db38c8bf..e1c12201388f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -159,7 +159,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
 			break;
 
 		entry = __rmid_entry(idx);
-		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
+		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
 					   QOS_L3_OCCUP_EVENT_ID, &val,
 					   arch_mon_ctx)) {
 			rmid_dirty = true;
@@ -421,11 +421,16 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	struct rdt_mon_domain *d;
 	int cntr_id = -ENOENT;
 	struct mbm_state *m;
-	int err, ret;
 	u64 tval = 0;
 
+	if (!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
+		rr->err = -EIO;
+		return -EINVAL;
+	}
+	d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+
 	if (rr->is_mbm_cntr) {
-		cntr_id = mbm_cntr_get(rr->r, rr->d, rdtgrp, rr->evtid);
+		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
 		if (cntr_id < 0) {
 			rr->err = -ENOENT;
 			return -EINVAL;
@@ -434,31 +439,50 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 
 	if (rr->first) {
 		if (rr->is_mbm_cntr)
-			resctrl_arch_reset_cntr(rr->r, rr->d, closid, rmid, cntr_id, rr->evtid);
+			resctrl_arch_reset_cntr(rr->r, d, closid, rmid, cntr_id, rr->evtid);
 		else
-			resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
-		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+			resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
+		m = get_mbm_state(d, closid, rmid, rr->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
 		return 0;
 	}
 
-	if (rr->d) {
-		/* Reading a single domain, must be on a CPU in that domain. */
-		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
-			return -EINVAL;
-		if (rr->is_mbm_cntr)
-			rr->err = resctrl_arch_cntr_read(rr->r, rr->d, closid, rmid, cntr_id,
-							 rr->evtid, &tval);
-		else
-			rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
-							 rr->evtid, &tval, rr->arch_mon_ctx);
-		if (rr->err)
-			return rr->err;
+	/* Reading a single domain, must be on a CPU in that domain. */
+	if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
+		return -EINVAL;
+	if (rr->is_mbm_cntr)
+		rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
+						 rr->evtid, &tval);
+	else
+		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
+						 rr->evtid, &tval, rr->arch_mon_ctx);
+	if (rr->err)
+		return rr->err;
 
-		rr->val += tval;
+	rr->val += tval;
 
-		return 0;
+	return 0;
+}
+
+static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
+{
+	int cpu = smp_processor_id();
+	u32 closid = rdtgrp->closid;
+	u32 rmid = rdtgrp->mon.rmid;
+	struct rdt_mon_domain *d;
+	u64 tval = 0;
+	int err, ret;
+
+	/*
+	 * Summing across domains is only done for systems that implement
+	 * Sub-NUMA Cluster. There is no overlap with systems that support
+	 * assignable counters.
+	 */
+	if (rr->is_mbm_cntr) {
+		pr_warn_once("Summing domains using assignable counters is not supported\n");
+		rr->err = -EINVAL;
+		return -EINVAL;
 	}
 
 	/* Summing domains that share a cache, must be on a CPU for that cache. */
@@ -476,12 +500,8 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
 		if (d->ci_id != rr->ci->id)
 			continue;
-		if (rr->is_mbm_cntr)
-			err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
-						     rr->evtid, &tval);
-		else
-			err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
-						     rr->evtid, &tval, rr->arch_mon_ctx);
+		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
+					     rr->evtid, &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
 			ret = 0;
@@ -498,7 +518,10 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	switch (rr->r->rid) {
 	case RDT_RESOURCE_L3:
-		return __l3_mon_event_count(rdtgrp, rr);
+		if (rr->hdr)
+			return __l3_mon_event_count(rdtgrp, rr);
+		else
+			return __l3_mon_event_count_sum(rdtgrp, rr);
 	default:
 		rr->err = -EINVAL;
 		return -EINVAL;
@@ -522,9 +545,13 @@ static void mbm_bw_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	u64 cur_bw, bytes, cur_bytes;
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
+	struct rdt_mon_domain *d;
 	struct mbm_state *m;
 
-	m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+	if (!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return;
+	d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+	m = get_mbm_state(d, closid, rmid, rr->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;
 
@@ -697,7 +724,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
 	struct rmid_read rr = {0};
 
 	rr.r = r;
-	rr.d = d;
+	rr.hdr = &d->hdr;
 	rr.evtid = evtid;
 	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
 		rr.is_mbm_cntr = true;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 08/32] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (6 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 07/32] x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 09/32] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The upcoming telemetry event monitoring is not tied to the L3 resource and
will have a new domain structure.

Rename the L3 resource specific domain data structures to include "l3_"
in their names to avoid confusion between the different resource specific
domain structures:
rdt_mon_domain		-> rdt_l3_mon_domain
rdt_hw_mon_domain	-> rdt_hw_l3_mon_domain

No functional change.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h                | 22 ++++----
 arch/x86/kernel/cpu/resctrl/internal.h | 16 +++---
 fs/resctrl/internal.h                  |  8 +--
 arch/x86/kernel/cpu/resctrl/core.c     | 14 +++---
 arch/x86/kernel/cpu/resctrl/monitor.c  | 36 ++++++-------
 fs/resctrl/ctrlmondata.c               |  2 +-
 fs/resctrl/monitor.c                   | 70 +++++++++++++-------------
 fs/resctrl/rdtgroup.c                  | 40 +++++++--------
 8 files changed, 104 insertions(+), 104 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9b9877fb3238..79aaaabcdd3f 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -178,7 +178,7 @@ struct mbm_cntr_cfg {
 };
 
 /**
- * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
+ * struct rdt_l3_mon_domain - group of CPUs sharing RDT_RESOURCE_L3 monitoring
  * @hdr:		common header for different domain types
  * @ci_id:		cache info id for this domain
  * @rmid_busy_llc:	bitmap of which limbo RMIDs are above threshold
@@ -192,7 +192,7 @@ struct mbm_cntr_cfg {
  * @cntr_cfg:		array of assignable counters' configuration (indexed
  *			by counter ID)
  */
-struct rdt_mon_domain {
+struct rdt_l3_mon_domain {
 	struct rdt_domain_hdr		hdr;
 	unsigned int			ci_id;
 	unsigned long			*rmid_busy_llc;
@@ -367,10 +367,10 @@ struct resctrl_cpu_defaults {
 };
 
 struct resctrl_mon_config_info {
-	struct rdt_resource	*r;
-	struct rdt_mon_domain	*d;
-	u32			evtid;
-	u32			mon_config;
+	struct rdt_resource		*r;
+	struct rdt_l3_mon_domain	*d;
+	u32				evtid;
+	u32				mon_config;
 };
 
 /**
@@ -585,7 +585,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 closid, u32 rmid,
 			     enum resctrl_event_id eventid);
 
@@ -598,7 +598,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
 
 /**
  * resctrl_arch_reset_all_ctrls() - Reset the control for each CLOSID to its
@@ -624,7 +624,7 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
 			      u32 cntr_id, bool assign);
 
@@ -647,7 +647,7 @@ void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
-int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			   u32 closid, u32 rmid, int cntr_id,
 			   enum resctrl_event_id eventid, u64 *val);
 
@@ -662,7 +662,7 @@ int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 closid, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid);
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 4a916c84a322..d73c0adf1026 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -63,17 +63,17 @@ struct rdt_hw_ctrl_domain {
 };
 
 /**
- * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
- *			      a resource for a monitor function
- * @d_resctrl:	Properties exposed to the resctrl file system
+ * struct rdt_hw_l3_mon_domain - Arch private attributes of a set of CPUs sharing
+ *				 RDT_RESOURCE_L3 monitoring
+ * @d_resctrl:		Properties exposed to the resctrl file system
  * @arch_mbm_states:	Per-event pointer to the MBM event's saved state.
  *			An MBM event's state is an array of struct arch_mbm_state
  *			indexed by RMID on x86.
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
-struct rdt_hw_mon_domain {
-	struct rdt_mon_domain		d_resctrl;
+struct rdt_hw_l3_mon_domain {
+	struct rdt_l3_mon_domain	d_resctrl;
 	struct arch_mbm_state		*arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
 };
 
@@ -82,9 +82,9 @@ static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctr
 	return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
 }
 
-static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
+static inline struct rdt_hw_l3_mon_domain *resctrl_to_arch_mon_dom(struct rdt_l3_mon_domain *r)
 {
-	return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
+	return container_of(r, struct rdt_hw_l3_mon_domain, d_resctrl);
 }
 
 /**
@@ -140,7 +140,7 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
 
 extern struct rdt_hw_resource rdt_resources_all[];
 
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
 
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 9912b774a580..af47b6ddef62 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -369,7 +369,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 
 int resctrl_mon_resource_init(void);
 
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
 				unsigned long delay_ms,
 				int exclude_cpu);
 
@@ -377,14 +377,14 @@ void mbm_handle_overflow(struct work_struct *work);
 
 bool is_mba_sc(struct rdt_resource *r);
 
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu);
 
 void cqm_handle_limbo(struct work_struct *work);
 
-bool has_busy_rmid(struct rdt_mon_domain *d);
+bool has_busy_rmid(struct rdt_l3_mon_domain *d);
 
-void __check_limbo(struct rdt_mon_domain *d, bool force_free);
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);
 
 void resctrl_file_fflags_init(const char *config, unsigned long fflags);
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1fab4c67d273..cc1b846f9645 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -368,7 +368,7 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
 	kfree(hw_dom);
 }
 
-static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
+static void mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	int idx;
 
@@ -405,7 +405,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  * @num_rmid:	The size of the MBM counter array
  * @hw_dom:	The domain that owns the allocated arrays
  */
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
 	enum resctrl_event_id eventid;
@@ -503,8 +503,8 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 
 static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
 {
-	struct rdt_hw_mon_domain *hw_dom;
-	struct rdt_mon_domain *d;
+	struct rdt_hw_l3_mon_domain *hw_dom;
+	struct rdt_l3_mon_domain *d;
 	struct cacheinfo *ci;
 	int err;
 
@@ -653,13 +653,13 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 
 	switch (r->rid) {
 	case RDT_RESOURCE_L3: {
-		struct rdt_hw_mon_domain *hw_dom;
-		struct rdt_mon_domain *d;
+		struct rdt_hw_l3_mon_domain *hw_dom;
+		struct rdt_l3_mon_domain *d;
 
 		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 			return;
 
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
+		d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 		hw_dom = resctrl_to_arch_mon_dom(d);
 		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&hdr->list);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3da970ea1903..04b8f1e1f314 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -109,7 +109,7 @@ static inline u64 get_corrected_mbm_count(u32 rmid, unsigned long val)
  *
  * In RMID sharing mode there are fewer "logical RMID" values available
  * to accumulate data ("physical RMIDs" are divided evenly between SNC
- * nodes that share an L3 cache). Linux creates an rdt_mon_domain for
+ * nodes that share an L3 cache). Linux creates an rdt_l3_mon_domain for
  * each SNC node.
  *
  * The value loaded into IA32_PQR_ASSOC is the "logical RMID".
@@ -157,7 +157,7 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
 	return 0;
 }
 
-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_l3_mon_domain *hw_dom,
 						 u32 rmid,
 						 enum resctrl_event_id eventid)
 {
@@ -171,11 +171,11 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
 	return state ? &state[rmid] : NULL;
 }
 
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 unused, u32 rmid,
 			     enum resctrl_event_id eventid)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	int cpu = cpumask_any(&d->hdr.cpu_mask);
 	struct arch_mbm_state *am;
 	u32 prmid;
@@ -194,9 +194,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  * Assumes that hardware counters are also reset and thus that there is
  * no need to record initial non-zero counts.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	enum resctrl_event_id eventid;
 	int idx;
 
@@ -217,10 +217,10 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 	return chunks >> shift;
 }
 
-static u64 get_corrected_val(struct rdt_resource *r, struct rdt_mon_domain *d,
+static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 rmid, enum resctrl_event_id eventid, u64 msr_val)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	struct arch_mbm_state *am;
 	u64 chunks;
@@ -242,9 +242,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val, void *ignored)
 {
-	struct rdt_hw_mon_domain *hw_dom;
+	struct rdt_hw_l3_mon_domain *hw_dom;
+	struct rdt_l3_mon_domain *d;
 	struct arch_mbm_state *am;
-	struct rdt_mon_domain *d;
 	u64 msr_val;
 	u32 prmid;
 	int cpu;
@@ -254,7 +254,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	hw_dom = resctrl_to_arch_mon_dom(d);
 	cpu = cpumask_any(&hdr->cpu_mask);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
@@ -308,11 +308,11 @@ static int __cntr_id_read(u32 cntr_id, u64 *val)
 	return 0;
 }
 
-void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 unused, u32 rmid, int cntr_id,
 			     enum resctrl_event_id eventid)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct arch_mbm_state *am;
 
 	am = get_arch_mbm_state(hw_dom, rmid, eventid);
@@ -324,7 +324,7 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
 	}
 }
 
-int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			   u32 unused, u32 rmid, int cntr_id,
 			   enum resctrl_event_id eventid, u64 *val)
 {
@@ -354,7 +354,7 @@ int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_mon_domain *d,
  * must adjust RMID counter numbers based on SNC node. See
  * logical_rmid_to_physical_rmid() for code that does this.
  */
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d)
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	if (snc_nodes_per_l3_cache > 1)
 		msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
@@ -516,7 +516,7 @@ static void resctrl_abmc_set_one_amd(void *arg)
  */
 static void _resctrl_abmc_enable(struct rdt_resource *r, bool enable)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	lockdep_assert_cpus_held();
 
@@ -555,11 +555,11 @@ static void resctrl_abmc_config_one_amd(void *info)
 /*
  * Send an IPI to the domain to assign the counter to RMID, event pair.
  */
-void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_config_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			      enum resctrl_event_id evtid, u32 rmid, u32 closid,
 			      u32 cntr_id, bool assign)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	union l3_qos_abmc_cfg abmc_cfg = { 0 };
 	struct arch_mbm_state *am;
 
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 9242a2982e77..a3c734fe656e 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -600,9 +600,9 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	struct kernfs_open_file *of = m->private;
 	enum resctrl_res_level resid;
 	enum resctrl_event_id evtid;
+	struct rdt_l3_mon_domain *d;
 	struct rdt_domain_hdr *hdr;
 	struct rmid_read rr = {0};
-	struct rdt_mon_domain *d;
 	struct rdtgroup *rdtgrp;
 	int domid, cpu, ret = 0;
 	struct rdt_resource *r;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index e1c12201388f..9edbe9805d33 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -130,7 +130,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
  * decrement the count. If the busy count gets to zero on an RMID, we
  * free the RMID
  */
-void __check_limbo(struct rdt_mon_domain *d, bool force_free)
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
@@ -188,7 +188,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
 	resctrl_arch_mon_ctx_free(r, QOS_L3_OCCUP_EVENT_ID, arch_mon_ctx);
 }
 
-bool has_busy_rmid(struct rdt_mon_domain *d)
+bool has_busy_rmid(struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 
@@ -289,7 +289,7 @@ int alloc_rmid(u32 closid)
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	u32 idx;
 
 	lockdep_assert_held(&rdtgroup_mutex);
@@ -342,7 +342,7 @@ void free_rmid(u32 closid, u32 rmid)
 		list_add_tail(&entry->list, &rmid_free_lru);
 }
 
-static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
+static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
 				       u32 rmid, enum resctrl_event_id evtid)
 {
 	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
@@ -362,7 +362,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
  * Return:
  * Valid counter ID on success, or -ENOENT on failure.
  */
-static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
+static int mbm_cntr_get(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	int cntr_id;
@@ -389,7 +389,7 @@ static int mbm_cntr_get(struct rdt_resource *r, struct rdt_mon_domain *d,
  * Return:
  * Valid counter ID on success, or -ENOSPC on failure.
  */
-static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
+static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			  struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	int cntr_id;
@@ -408,7 +408,7 @@ static int mbm_cntr_alloc(struct rdt_resource *r, struct rdt_mon_domain *d,
 /*
  * mbm_cntr_free() - Clear the counter ID configuration details in the domain @d.
  */
-static void mbm_cntr_free(struct rdt_mon_domain *d, int cntr_id)
+static void mbm_cntr_free(struct rdt_l3_mon_domain *d, int cntr_id)
 {
 	memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
 }
@@ -418,7 +418,7 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	int cntr_id = -ENOENT;
 	struct mbm_state *m;
 	u64 tval = 0;
@@ -427,7 +427,7 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 		rr->err = -EIO;
 		return -EINVAL;
 	}
-	d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
 
 	if (rr->is_mbm_cntr) {
 		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
@@ -470,7 +470,7 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
 	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	u64 tval = 0;
 	int err, ret;
 
@@ -545,12 +545,12 @@ static void mbm_bw_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	u64 cur_bw, bytes, cur_bytes;
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct mbm_state *m;
 
 	if (!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return;
-	d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
 	m = get_mbm_state(d, closid, rmid, rr->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;
@@ -650,7 +650,7 @@ static struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu,
  * throttle MSRs already have low percentage values.  To avoid
  * unnecessarily restricting such rdtgroups, we also increase the bandwidth.
  */
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_mbm)
 {
 	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
@@ -718,7 +718,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
 	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
 }
 
-static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 				 struct rdtgroup *rdtgrp, enum resctrl_event_id evtid)
 {
 	struct rmid_read rr = {0};
@@ -750,7 +750,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
 		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
 }
 
-static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 		       struct rdtgroup *rdtgrp)
 {
 	/*
@@ -771,12 +771,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
 void cqm_handle_limbo(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
-	d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);
+	d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
 
 	__check_limbo(d, false);
 
@@ -799,7 +799,7 @@ void cqm_handle_limbo(struct work_struct *work)
  * @exclude_cpu:   Which CPU the handler should not run on,
  *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
  */
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
@@ -816,7 +816,7 @@ void mbm_handle_overflow(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
 	struct rdtgroup *prgrp, *crgrp;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct list_head *head;
 	struct rdt_resource *r;
 
@@ -831,7 +831,7 @@ void mbm_handle_overflow(struct work_struct *work)
 		goto out_unlock;
 
 	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	d = container_of(work, struct rdt_mon_domain, mbm_over.work);
+	d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		mbm_update(r, d, prgrp);
@@ -865,7 +865,7 @@ void mbm_handle_overflow(struct work_struct *work)
  * @exclude_cpu:   Which CPU the handler should not run on,
  *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
  */
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 				int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
@@ -1120,7 +1120,7 @@ ssize_t resctrl_mbm_assign_on_mkdir_write(struct kernfs_open_file *of, char *buf
  * mbm_cntr_free_all() - Clear all the counter ID configuration details in the
  *			 domain @d. Called when mbm_assign_mode is changed.
  */
-static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	memset(d->cntr_cfg, 0, sizeof(*d->cntr_cfg) * r->mon.num_mbm_cntrs);
 }
@@ -1129,7 +1129,7 @@ static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_mon_domain *d)
  * resctrl_reset_rmid_all() - Reset all non-architecture states for all the
  *			      supported RMIDs.
  */
-static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	enum resctrl_event_id evt;
@@ -1150,7 +1150,7 @@ static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain
  * Assign the counter if @assign is true else unassign the counter. Reset the
  * associated non-architectural state.
  */
-static void rdtgroup_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void rdtgroup_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 				 enum resctrl_event_id evtid, u32 rmid, u32 closid,
 				 u32 cntr_id, bool assign)
 {
@@ -1170,7 +1170,7 @@ static void rdtgroup_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *
  * Return:
  * 0 on success, < 0 on failure.
  */
-static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 				      struct rdtgroup *rdtgrp, struct mon_evt *mevt)
 {
 	int cntr_id;
@@ -1205,7 +1205,7 @@ static int rdtgroup_alloc_assign_cntr(struct rdt_resource *r, struct rdt_mon_dom
  * Return:
  * 0 on success, < 0 on failure.
  */
-static int rdtgroup_assign_cntr_event(struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+static int rdtgroup_assign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
 				      struct mon_evt *mevt)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(mevt->rid);
@@ -1255,7 +1255,7 @@ void rdtgroup_assign_cntrs(struct rdtgroup *rdtgrp)
  * rdtgroup_free_unassign_cntr() - Unassign and reset the counter ID configuration
  * for the event pointed to by @mevt within the domain @d and resctrl group @rdtgrp.
  */
-static void rdtgroup_free_unassign_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void rdtgroup_free_unassign_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 					struct rdtgroup *rdtgrp, struct mon_evt *mevt)
 {
 	int cntr_id;
@@ -1276,7 +1276,7 @@ static void rdtgroup_free_unassign_cntr(struct rdt_resource *r, struct rdt_mon_d
  * the event structure @mevt from the domain @d and the group @rdtgrp. Unassign
  * the counters from all the domains if @d is NULL else unassign from @d.
  */
-static void rdtgroup_unassign_cntr_event(struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+static void rdtgroup_unassign_cntr_event(struct rdt_l3_mon_domain *d, struct rdtgroup *rdtgrp,
 					 struct mon_evt *mevt)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(mevt->rid);
@@ -1351,7 +1351,7 @@ static int resctrl_parse_mem_transactions(char *tok, u32 *val)
 static void rdtgroup_update_cntr_event(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 				       enum resctrl_event_id evtid)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	int cntr_id;
 
 	list_for_each_entry(d, &r->mon_domains, hdr.list) {
@@ -1457,7 +1457,7 @@ ssize_t resctrl_mbm_assign_mode_write(struct kernfs_open_file *of, char *buf,
 				      size_t nbytes, loff_t off)
 {
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	int ret = 0;
 	bool enable;
 
@@ -1530,7 +1530,7 @@ int resctrl_num_mbm_cntrs_show(struct kernfs_open_file *of,
 			       struct seq_file *s, void *v)
 {
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
-	struct rdt_mon_domain *dom;
+	struct rdt_l3_mon_domain *dom;
 	bool sep = false;
 
 	cpus_read_lock();
@@ -1554,7 +1554,7 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
 				     struct seq_file *s, void *v)
 {
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
-	struct rdt_mon_domain *dom;
+	struct rdt_l3_mon_domain *dom;
 	bool sep = false;
 	u32 cntrs, i;
 	int ret = 0;
@@ -1595,7 +1595,7 @@ int resctrl_available_mbm_cntrs_show(struct kernfs_open_file *of,
 int mbm_L3_assignments_show(struct kernfs_open_file *of, struct seq_file *s, void *v)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct rdtgroup *rdtgrp;
 	struct mon_evt *mevt;
 	int ret = 0;
@@ -1658,7 +1658,7 @@ static struct mon_evt *mbm_get_mon_event_by_name(struct rdt_resource *r, char *n
 	return NULL;
 }
 
-static int rdtgroup_modify_assign_state(char *assign, struct rdt_mon_domain *d,
+static int rdtgroup_modify_assign_state(char *assign, struct rdt_l3_mon_domain *d,
 					struct rdtgroup *rdtgrp, struct mon_evt *mevt)
 {
 	int ret = 0;
@@ -1684,7 +1684,7 @@ static int rdtgroup_modify_assign_state(char *assign, struct rdt_mon_domain *d,
 static int resctrl_parse_mbm_assignment(struct rdt_resource *r, struct rdtgroup *rdtgrp,
 					char *event, char *tok)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	unsigned long dom_id = 0;
 	char *dom_str, *id_str;
 	struct mon_evt *mevt;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 89ffe54fb0fc..2ed435db1923 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1640,7 +1640,7 @@ static void mondata_config_read(struct resctrl_mon_config_info *mon_info)
 static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
 {
 	struct resctrl_mon_config_info mon_info;
-	struct rdt_mon_domain *dom;
+	struct rdt_l3_mon_domain *dom;
 	bool sep = false;
 
 	cpus_read_lock();
@@ -1688,7 +1688,7 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 }
 
 static void mbm_config_write_domain(struct rdt_resource *r,
-				    struct rdt_mon_domain *d, u32 evtid, u32 val)
+				    struct rdt_l3_mon_domain *d, u32 evtid, u32 val)
 {
 	struct resctrl_mon_config_info mon_info = {0};
 
@@ -1729,8 +1729,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 {
 	char *dom_str = NULL, *id_str;
+	struct rdt_l3_mon_domain *d;
 	unsigned long dom_id, val;
-	struct rdt_mon_domain *d;
 
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
@@ -2781,7 +2781,7 @@ static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
 	unsigned long flags = RFTYPE_CTRL_BASE;
-	struct rdt_mon_domain *dom;
+	struct rdt_l3_mon_domain *dom;
 	struct rdt_resource *r;
 	int ret;
 
@@ -3232,7 +3232,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 					   struct rdt_domain_hdr *hdr)
 {
 	struct rdtgroup *prgrp, *crgrp;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	char subname[32];
 	bool snc_mode;
 	char name[32];
@@ -3240,7 +3240,7 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
 	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : hdr->id);
 	if (snc_mode)
@@ -3258,8 +3258,8 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 			     struct rdt_resource *r, struct rdtgroup *prgrp,
 			     bool do_sum)
 {
+	struct rdt_l3_mon_domain *d;
 	struct rmid_read rr = {0};
-	struct rdt_mon_domain *d;
 	struct mon_data *priv;
 	struct mon_evt *mevt;
 	int ret, domid;
@@ -3267,7 +3267,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	for_each_mon_event(mevt) {
 		if (mevt->rid != r->rid || !mevt->enabled)
 			continue;
@@ -3292,7 +3292,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 				struct rdt_resource *r, struct rdtgroup *prgrp)
 {
 	struct kernfs_node *kn, *ckn;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	char name[32];
 	bool snc_mode;
 	int ret = 0;
@@ -3302,7 +3302,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
 	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
 	kn = kernfs_find_and_get(parent_kn, name);
@@ -4246,7 +4246,7 @@ static void rdtgroup_setup_default(void)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-static void domain_destroy_mon_state(struct rdt_mon_domain *d)
+static void domain_destroy_mon_state(struct rdt_l3_mon_domain *d)
 {
 	int idx;
 
@@ -4270,14 +4270,14 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
 
 void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	mutex_lock(&rdtgroup_mutex);
 
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		goto out_unlock;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 
 	/*
 	 * If resctrl is mounted, remove all the
@@ -4319,7 +4319,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
  *
  * Returns 0 for success, or -ENOMEM.
  */
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	size_t tsize = sizeof(*d->mbm_states[0]);
@@ -4377,7 +4377,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
 
 int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	int err = -EINVAL;
 
 	mutex_lock(&rdtgroup_mutex);
@@ -4385,7 +4385,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		goto out_unlock;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	err = domain_setup_mon_state(r, d);
 	if (err)
 		goto out_unlock;
@@ -4432,10 +4432,10 @@ static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
 	}
 }
 
-static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu,
-						      struct rdt_resource *r)
+static struct rdt_l3_mon_domain *get_mon_domain_from_cpu(int cpu,
+							 struct rdt_resource *r)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	lockdep_assert_cpus_held();
 
@@ -4451,7 +4451,7 @@ static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu,
 void resctrl_offline_cpu(unsigned int cpu)
 {
 	struct rdt_resource *l3 = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct rdtgroup *rdtgrp;
 
 	mutex_lock(&rdtgroup_mutex);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 09/32] x86,fs/resctrl: Rename some L3 specific functions
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (7 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 08/32] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 10/32] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

With the arrival of monitor events tied to new domains associated with a
different resource it would be clearer if the L3 resource specific functions
are more accurately named.

Rename three groups of functions:

Functions that allocate/free architecture per-RMID MBM state information:
arch_domain_mbm_alloc()		-> l3_mon_domain_mbm_alloc()
mon_domain_free()		-> l3_mon_domain_free()

Functions that allocate/free filesystem per-RMID MBM state information:
domain_setup_mon_state()	-> domain_setup_l3_mon_state()
domain_destroy_mon_state()	-> domain_destroy_l3_mon_state()

Initialization/exit:
rdt_get_mon_l3_config()		-> rdt_get_l3_mon_config()
resctrl_mon_resource_init()	-> resctrl_l3_mon_resource_init()
resctrl_mon_resource_exit()	-> resctrl_l3_mon_resource_exit()

Ensure kernel-doc descriptions of these functions' return values are present
and correctly formatted.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 +-
 fs/resctrl/internal.h                  |  6 +++---
 arch/x86/kernel/cpu/resctrl/core.c     | 20 +++++++++++---------
 arch/x86/kernel/cpu/resctrl/monitor.c  |  2 +-
 fs/resctrl/monitor.c                   |  8 ++++----
 fs/resctrl/rdtgroup.c                  | 24 ++++++++++++------------
 6 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index d73c0adf1026..11d06995810e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -213,7 +213,7 @@ union l3_qos_abmc_cfg {
 
 void rdt_ctrl_update(void *arg);
 
-int rdt_get_mon_l3_config(struct rdt_resource *r);
+int rdt_get_l3_mon_config(struct rdt_resource *r);
 
 bool rdt_cpu_has(int flag);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index af47b6ddef62..9768341aa21c 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -357,7 +357,9 @@ int alloc_rmid(u32 closid);
 
 void free_rmid(u32 closid, u32 rmid);
 
-void resctrl_mon_resource_exit(void);
+int resctrl_l3_mon_resource_init(void);
+
+void resctrl_l3_mon_resource_exit(void);
 
 void mon_event_count(void *info);
 
@@ -367,8 +369,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, int evtid, int first);
 
-int resctrl_mon_resource_init(void);
-
 void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
 				unsigned long delay_ms,
 				int exclude_cpu);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index cc1b846f9645..b3a2dc56155d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -368,7 +368,7 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
 	kfree(hw_dom);
 }
 
-static void mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
+static void l3_mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	int idx;
 
@@ -401,11 +401,13 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
 }
 
 /**
- * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
+ * l3_mon_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
  * @num_rmid:	The size of the MBM counter array
  * @hw_dom:	The domain that owns the allocated arrays
+ *
+ * Return:	0 for success, or -ENOMEM.
  */
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
+static int l3_mon_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
 	enum resctrl_event_id eventid;
@@ -519,7 +521,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!ci) {
 		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
 		return;
 	}
 	d->ci_id = ci->id;
@@ -527,8 +529,8 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 
 	arch_mon_domain_online(r, d);
 
-	if (arch_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
-		mon_domain_free(hw_dom);
+	if (l3_mon_domain_mbm_alloc(r->mon.num_rmid, hw_dom)) {
+		l3_mon_domain_free(hw_dom);
 		return;
 	}
 
@@ -538,7 +540,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 	if (err) {
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
 	}
 }
 
@@ -664,7 +666,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&hdr->list);
 		synchronize_rcu();
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
 		break;
 	}
 	default:
@@ -917,7 +919,7 @@ static __init bool get_rdt_mon_resources(void)
 	if (!ret)
 		return false;
 
-	return !rdt_get_mon_l3_config(r);
+	return !rdt_get_l3_mon_config(r);
 }
 
 static __init void __check_quirks_intel(void)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 04b8f1e1f314..20605212656c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -424,7 +424,7 @@ static __init int snc_get_config(void)
 	return ret;
 }
 
-int __init rdt_get_mon_l3_config(struct rdt_resource *r)
+int __init rdt_get_l3_mon_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9edbe9805d33..d5ae0ef4c947 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1780,7 +1780,7 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
 }
 
 /**
- * resctrl_mon_resource_init() - Initialise global monitoring structures.
+ * resctrl_l3_mon_resource_init() - Initialise global monitoring structures.
  *
  * Allocate and initialise global monitor resources that do not belong to a
  * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
@@ -1789,9 +1789,9 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
  * Resctrl's cpuhp callbacks may be called before this point to bring a domain
  * online.
  *
- * Returns 0 for success, or -ENOMEM.
+ * Return: 0 for success, or -ENOMEM.
  */
-int resctrl_mon_resource_init(void)
+int resctrl_l3_mon_resource_init(void)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	int ret;
@@ -1841,7 +1841,7 @@ int resctrl_mon_resource_init(void)
 	return 0;
 }
 
-void resctrl_mon_resource_exit(void)
+void resctrl_l3_mon_resource_exit(void)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 2ed435db1923..b57e1e78bbc2 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4246,7 +4246,7 @@ static void rdtgroup_setup_default(void)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-static void domain_destroy_mon_state(struct rdt_l3_mon_domain *d)
+static void domain_destroy_l3_mon_state(struct rdt_l3_mon_domain *d)
 {
 	int idx;
 
@@ -4301,13 +4301,13 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 		cancel_delayed_work(&d->cqm_limbo);
 	}
 
-	domain_destroy_mon_state(d);
+	domain_destroy_l3_mon_state(d);
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
 /**
- * domain_setup_mon_state() -  Initialise domain monitoring structures.
+ * domain_setup_l3_mon_state() -  Initialise domain monitoring structures.
  * @r:	The resource for the newly online domain.
  * @d:	The newly online domain.
  *
@@ -4315,11 +4315,11 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
  * Called when the first CPU of a domain comes online, regardless of whether
  * the filesystem is mounted.
  * During boot this may be called before global allocations have been made by
- * resctrl_mon_resource_init().
+ * resctrl_l3_mon_resource_init().
  *
- * Returns 0 for success, or -ENOMEM.
+ * Return: 0 for success, or -ENOMEM.
  */
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	size_t tsize = sizeof(*d->mbm_states[0]);
@@ -4386,7 +4386,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
 		goto out_unlock;
 
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	err = domain_setup_mon_state(r, d);
+	err = domain_setup_l3_mon_state(r, d);
 	if (err)
 		goto out_unlock;
 
@@ -4503,13 +4503,13 @@ int resctrl_init(void)
 
 	io_alloc_init();
 
-	ret = resctrl_mon_resource_init();
+	ret = resctrl_l3_mon_resource_init();
 	if (ret)
 		return ret;
 
 	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
 	if (ret) {
-		resctrl_mon_resource_exit();
+		resctrl_l3_mon_resource_exit();
 		return ret;
 	}
 
@@ -4544,7 +4544,7 @@ int resctrl_init(void)
 
 cleanup_mountpoint:
 	sysfs_remove_mount_point(fs_kobj, "resctrl");
-	resctrl_mon_resource_exit();
+	resctrl_l3_mon_resource_exit();
 
 	return ret;
 }
@@ -4580,7 +4580,7 @@ static bool resctrl_online_domains_exist(void)
  * When called by the architecture code, all CPUs and resctrl domains must be
  * offline. This ensures the limbo and overflow handlers are not scheduled to
  * run, meaning the data structures they access can be freed by
- * resctrl_mon_resource_exit().
+ * resctrl_l3_mon_resource_exit().
  *
  * After resctrl_exit() returns, the architecture code should return an
  * error from all resctrl_arch_ functions that can do this.
@@ -4607,5 +4607,5 @@ void resctrl_exit(void)
 	 * it can be used to umount resctrl.
 	 */
 
-	resctrl_mon_resource_exit();
+	resctrl_l3_mon_resource_exit();
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 10/32] fs/resctrl: Make event details accessible to functions when reading events
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (8 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 09/32] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 11/32] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Reading monitoring event data from MMIO requires more context than the event id
to be able to read the correct memory location. struct mon_evt is the appropriate
place for this event specific context.

Prepare for addition of extra fields to struct mon_evt by changing the calling
conventions to pass a pointer to the mon_evt structure instead of just the
event id.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/internal.h    | 10 +++++-----
 fs/resctrl/ctrlmondata.c | 18 +++++++++---------
 fs/resctrl/monitor.c     | 22 +++++++++++-----------
 fs/resctrl/rdtgroup.c    |  6 +++---
 4 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 9768341aa21c..86cf38ab08a7 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -81,7 +81,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
  * struct mon_data - Monitoring details for each event file.
  * @list:            Member of the global @mon_data_kn_priv_list list.
  * @rid:             Resource id associated with the event file.
- * @evtid:           Event id associated with the event file.
+ * @evt:             Event structure associated with the event file.
  * @sum:             Set when event must be summed across multiple
  *                   domains.
  * @domid:           When @sum is zero this is the domain to which
@@ -95,7 +95,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
 struct mon_data {
 	struct list_head	list;
 	enum resctrl_res_level	rid;
-	enum resctrl_event_id	evtid;
+	struct mon_evt		*evt;
 	int			domid;
 	bool			sum;
 };
@@ -108,7 +108,7 @@ struct mon_data {
  * @r:	   Resource describing the properties of the event being read.
  * @hdr:   Header of domain that the counter should be read from. If NULL then
  *	   sum all domains in @r sharing L3 @ci.id
- * @evtid: Which monitor event to read.
+ * @evt:   Which monitor event to read.
  * @first: Initialize MBM counter when true.
  * @ci:    Cacheinfo for L3. Only set when @hdr is NULL. Used when summing
  *	   domains.
@@ -126,7 +126,7 @@ struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
 	struct rdt_domain_hdr	*hdr;
-	enum resctrl_event_id	evtid;
+	struct mon_evt		*evt;
 	bool			first;
 	struct cacheinfo	*ci;
 	bool			is_mbm_cntr;
@@ -367,7 +367,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
-		    cpumask_t *cpumask, int evtid, int first);
+		    cpumask_t *cpumask, struct mon_evt *evt, int first);
 
 void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
 				unsigned long delay_ms,
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index a3c734fe656e..7f9b2fed117a 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -552,7 +552,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
-		    cpumask_t *cpumask, int evtid, int first)
+		    cpumask_t *cpumask, struct mon_evt *evt, int first)
 {
 	int cpu;
 
@@ -563,15 +563,15 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
-	rr->evtid = evtid;
+	rr->evt = evt;
 	rr->r = r;
 	rr->hdr = hdr;
 	rr->first = first;
 	if (resctrl_arch_mbm_cntr_assign_enabled(r) &&
-	    resctrl_is_mbm_event(evtid)) {
+	    resctrl_is_mbm_event(evt->evtid)) {
 		rr->is_mbm_cntr = true;
 	} else {
-		rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
+		rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evt->evtid);
 		if (IS_ERR(rr->arch_mon_ctx)) {
 			rr->err = -EINVAL;
 			return;
@@ -592,14 +592,13 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
 
 	if (rr->arch_mon_ctx)
-		resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
+		resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
 	enum resctrl_res_level resid;
-	enum resctrl_event_id evtid;
 	struct rdt_l3_mon_domain *d;
 	struct rdt_domain_hdr *hdr;
 	struct rmid_read rr = {0};
@@ -607,6 +606,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	int domid, cpu, ret = 0;
 	struct rdt_resource *r;
 	struct cacheinfo *ci;
+	struct mon_evt *evt;
 	struct mon_data *md;
 
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
@@ -623,7 +623,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 
 	resid = md->rid;
 	domid = md->domid;
-	evtid = md->evtid;
+	evt = md->evt;
 	r = resctrl_arch_get_resource(resid);
 
 	if (md->sum) {
@@ -641,7 +641,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 					continue;
 				rr.ci = ci;
 				mon_event_read(&rr, r, NULL, rdtgrp,
-					       &ci->shared_cpu_map, evtid, false);
+					       &ci->shared_cpu_map, evt, false);
 				goto checkresult;
 			}
 		}
@@ -657,7 +657,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 			ret = -ENOENT;
 			goto out;
 		}
-		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evtid, false);
+		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evt, false);
 	}
 
 checkresult:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index d5ae0ef4c947..340b847ab397 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -430,7 +430,7 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
 
 	if (rr->is_mbm_cntr) {
-		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evtid);
+		cntr_id = mbm_cntr_get(rr->r, d, rdtgrp, rr->evt->evtid);
 		if (cntr_id < 0) {
 			rr->err = -ENOENT;
 			return -EINVAL;
@@ -439,10 +439,10 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 
 	if (rr->first) {
 		if (rr->is_mbm_cntr)
-			resctrl_arch_reset_cntr(rr->r, d, closid, rmid, cntr_id, rr->evtid);
+			resctrl_arch_reset_cntr(rr->r, d, closid, rmid, cntr_id, rr->evt->evtid);
 		else
-			resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
-		m = get_mbm_state(d, closid, rmid, rr->evtid);
+			resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evt->evtid);
+		m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
 		return 0;
@@ -453,10 +453,10 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 		return -EINVAL;
 	if (rr->is_mbm_cntr)
 		rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
-						 rr->evtid, &tval);
+						 rr->evt->evtid, &tval);
 	else
 		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
-						 rr->evtid, &tval, rr->arch_mon_ctx);
+						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
 	if (rr->err)
 		return rr->err;
 
@@ -501,7 +501,7 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
 		if (d->ci_id != rr->ci->id)
 			continue;
 		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
-					     rr->evtid, &tval, rr->arch_mon_ctx);
+					     rr->evt->evtid, &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
 			ret = 0;
@@ -551,7 +551,7 @@ static void mbm_bw_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 	if (!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return;
 	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
-	m = get_mbm_state(d, closid, rmid, rr->evtid);
+	m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;
 
@@ -725,11 +725,11 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domai
 
 	rr.r = r;
 	rr.hdr = &d->hdr;
-	rr.evtid = evtid;
+	rr.evt = &mon_event_all[evtid];
 	if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
 		rr.is_mbm_cntr = true;
 	} else {
-		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+		rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, evtid);
 		if (IS_ERR(rr.arch_mon_ctx)) {
 			pr_warn_ratelimited("Failed to allocate monitor context: %ld",
 					    PTR_ERR(rr.arch_mon_ctx));
@@ -747,7 +747,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domai
 		mbm_bw_count(rdtgrp, &rr);
 
 	if (rr.arch_mon_ctx)
-		resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
+		resctrl_arch_mon_ctx_free(rr.r, evtid, rr.arch_mon_ctx);
 }
 
 static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b57e1e78bbc2..771e40f02ba6 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3103,7 +3103,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 
 	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
 		if (priv->rid == rid && priv->domid == domid &&
-		    priv->sum == do_sum && priv->evtid == mevt->evtid)
+		    priv->sum == do_sum && priv->evt == mevt)
 			return priv;
 	}
 
@@ -3114,7 +3114,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 	priv->rid = rid;
 	priv->domid = domid;
 	priv->sum = do_sum;
-	priv->evtid = mevt->evtid;
+	priv->evt = mevt;
 	list_add_tail(&priv->list, &mon_data_kn_priv_list);
 
 	return priv;
@@ -3281,7 +3281,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 			return ret;
 
 		if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
-			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
+			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt, true);
 	}
 
 	return 0;
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 11/32] x86,fs/resctrl: Handle events that can be read from any CPU
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (9 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 10/32] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:20 ` [PATCH v17 12/32] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

resctrl assumes that monitor events can only be read from a CPU in the
cpumask_t set of each domain.  This is true for x86 events accessed with an
MSR interface, but may not be true for other access methods such as MMIO.

Introduce and use flag mon_evt::any_cpu, settable by architecture, that
indicates there are no restrictions on which CPU can read that event.
This flag is not supported by the L3 event reading that requires to be run
on a CPU that belongs to the L3 domain of the event being read.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h            | 2 +-
 fs/resctrl/internal.h              | 2 ++
 arch/x86/kernel/cpu/resctrl/core.c | 6 +++---
 fs/resctrl/ctrlmondata.c           | 6 ++++++
 fs/resctrl/monitor.c               | 4 +++-
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 79aaaabcdd3f..22c5d07fe9ff 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -412,7 +412,7 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 86cf38ab08a7..fb0b6e40d022 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -61,6 +61,7 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  *			READS_TO_REMOTE_MEM) being tracked by @evtid.
  *			Only valid if @evtid is an MBM event.
  * @configurable:	true if the event is configurable
+ * @any_cpu:		true if the event can be read from any CPU
  * @enabled:		true if the event is enabled
  */
 struct mon_evt {
@@ -69,6 +70,7 @@ struct mon_evt {
 	char			*name;
 	u32			evt_cfg;
 	bool			configurable;
+	bool			any_cpu;
 	bool			enabled;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b3a2dc56155d..bd4a98106153 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -902,15 +902,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_ABMC))
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 7f9b2fed117a..2c69fcd70eeb 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -578,6 +578,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		}
 	}
 
+	if (evt->any_cpu) {
+		mon_event_count(rr);
+		goto out_ctx_free;
+	}
+
 	cpu = cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU);
 
 	/*
@@ -591,6 +596,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	else
 		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
 
+out_ctx_free:
 	if (rr->arch_mon_ctx)
 		resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 340b847ab397..8c76ac133bca 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -518,6 +518,7 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	switch (rr->r->rid) {
 	case RDT_RESOURCE_L3:
+		WARN_ON_ONCE(rr->evt->any_cpu);
 		if (rr->hdr)
 			return __l3_mon_event_count(rdtgrp, rr);
 		else
@@ -987,7 +988,7 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	},
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
 {
 	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
 		return;
@@ -996,6 +997,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
 		return;
 	}
 
+	mon_event_all[eventid].any_cpu = any_cpu;
 	mon_event_all[eventid].enabled = true;
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 12/32] x86,fs/resctrl: Support binary fixed point event counters
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (10 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 11/32] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
@ 2025-12-17 17:20 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:20 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

resctrl assumes that all monitor events can be displayed as unsigned decimal
integers.

Hardware architecture counters may provide some telemetry events with greater
precision where the event is not a simple count, but is a measurement of
some sort (e.g. Joules for energy consumed).

Add a new argument to resctrl_enable_mon_event() for architecture code
to inform the file system that the value for a counter is a fixed-point
value with a specific number of binary places.
Only allow architecture to use floating point format on events that the
file system has marked with mon_evt::is_floating_point which reflects the
contract with user space on how the event values are displayed.

Display fixed point values with values rounded to ceil(binary_bits * log10(2))
decimal places. Special case for zero binary bits to print "{value}.0".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h            |  3 +-
 fs/resctrl/internal.h              |  8 ++++
 arch/x86/kernel/cpu/resctrl/core.c |  6 +--
 fs/resctrl/ctrlmondata.c           | 74 ++++++++++++++++++++++++++++++
 fs/resctrl/monitor.c               | 10 +++-
 5 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 22c5d07fe9ff..c43526cdf304 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -412,7 +412,8 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+			      unsigned int binary_bits);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index fb0b6e40d022..14e5a9ed1fbd 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -62,6 +62,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  *			Only valid if @evtid is an MBM event.
  * @configurable:	true if the event is configurable
  * @any_cpu:		true if the event can be read from any CPU
+ * @is_floating_point:	event values are displayed in floating point format
+ * @binary_bits:	number of fixed-point binary bits from architecture,
+ *			only valid if @is_floating_point is true
  * @enabled:		true if the event is enabled
  */
 struct mon_evt {
@@ -71,6 +74,8 @@ struct mon_evt {
 	u32			evt_cfg;
 	bool			configurable;
 	bool			any_cpu;
+	bool			is_floating_point;
+	unsigned int		binary_bits;
 	bool			enabled;
 };
 
@@ -79,6 +84,9 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
 #define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT];	\
 				      mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)
 
+/* Limit for mon_evt::binary_bits */
+#define MAX_BINARY_BITS	27
+
 /**
  * struct mon_data - Monitoring details for each event file.
  * @list:            Member of the global @mon_data_kn_priv_list list.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bd4a98106153..9222eee7ce07 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -902,15 +902,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_ABMC))
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 2c69fcd70eeb..f319fd1a6de3 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -17,6 +17,7 @@
 
 #include <linux/cpu.h>
 #include <linux/kernfs.h>
+#include <linux/math.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/tick.h>
@@ -601,6 +602,77 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
 
+/*
+ * Decimal place precision to use for each number of fixed-point
+ * binary bits computed from ceil(binary_bits * log10(2)) except
+ * binary_bits == 0 which will print "value.0"
+ */
+static const unsigned int decplaces[MAX_BINARY_BITS + 1] = {
+	[0]  =  1,
+	[1]  =  1,
+	[2]  =  1,
+	[3]  =  1,
+	[4]  =  2,
+	[5]  =  2,
+	[6]  =  2,
+	[7]  =  3,
+	[8]  =  3,
+	[9]  =  3,
+	[10] =  4,
+	[11] =  4,
+	[12] =  4,
+	[13] =  4,
+	[14] =  5,
+	[15] =  5,
+	[16] =  5,
+	[17] =  6,
+	[18] =  6,
+	[19] =  6,
+	[20] =  7,
+	[21] =  7,
+	[22] =  7,
+	[23] =  7,
+	[24] =  8,
+	[25] =  8,
+	[26] =  8,
+	[27] =  9
+};
+
+static void print_event_value(struct seq_file *m, unsigned int binary_bits, u64 val)
+{
+	unsigned long long frac = 0;
+
+	if (binary_bits) {
+		/* Mask off the integer part of the fixed-point value. */
+		frac = val & GENMASK_ULL(binary_bits - 1, 0);
+
+		/*
+		 * Multiply by 10^{desired decimal places}. The integer part of
+		 * the fixed point value is now almost what is needed.
+		 */
+		frac *= int_pow(10ull, decplaces[binary_bits]);
+
+		/*
+		 * Round to nearest by adding a value that would be a "1" in the
+		 * binary_bits + 1 place.  Integer part of fixed point value is
+		 * now the needed value.
+		 */
+		frac += 1ull << (binary_bits - 1);
+
+		/*
+		 * Extract the integer part of the value. This is the decimal
+		 * representation of the original fixed-point fractional value.
+		 */
+		frac >>= binary_bits;
+	}
+
+	/*
+	 * "frac" is now in the range [0 .. 10^decplaces).  I.e. string
+	 * representation will fit into chosen number of decimal places.
+	 */
+	seq_printf(m, "%llu.%0*llu\n", val >> binary_bits, decplaces[binary_bits], frac);
+}
+
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
@@ -678,6 +750,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Unavailable\n");
 	else if (rr.err == -ENOENT)
 		seq_puts(m, "Unassigned\n");
+	else if (evt->is_floating_point)
+		print_event_value(m, evt->binary_bits, rr.val);
 	else
 		seq_printf(m, "%llu\n", rr.val);
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 8c76ac133bca..844cf6875f60 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -988,16 +988,22 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	},
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits)
 {
-	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
+			 binary_bits > MAX_BINARY_BITS))
 		return;
 	if (mon_event_all[eventid].enabled) {
 		pr_warn("Duplicate enable for event %d\n", eventid);
 		return;
 	}
+	if (binary_bits && !mon_event_all[eventid].is_floating_point) {
+		pr_warn("Event %d may not be floating point\n", eventid);
+		return;
+	}
 
 	mon_event_all[eventid].any_cpu = any_cpu;
+	mon_event_all[eventid].binary_bits = binary_bits;
 	mon_event_all[eventid].enabled = true;
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (11 preceding siblings ...)
  2025-12-17 17:20 ` [PATCH v17 12/32] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2026-01-05 19:17   ` Borislav Petkov
  2025-12-17 17:21 ` [PATCH v17 14/32] x86,fs/resctrl: Add and initialize a resource for package scope monitoring Tony Luck
                   ` (19 subsequent siblings)
  32 siblings, 1 reply; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Enumeration of Intel telemetry events is an asynchronous process involving
several mutually dependent drivers added as auxiliary devices during
the device_initcall() phase of Linux boot. The process finishes after
the probe functions of these drivers completes. But this happens after
resctrl_arch_late_init() is executed.

Tracing the enumeration process shows that it does complete a full seven
seconds before the earliest possible mount of the resctrl file system (when
included in /etc/fstab for automatic mount by systemd).

Add a hook at the beginning of the mount code that will be used to check
for telemetry events and initialize if any are found.

Call the hook on every attempted mount. Expectations are that most actions
(like enumeration) will only need to be performed on the first call.

resctrl filesystem calls the hook with no locks held. Architecture code is
responsible for any required locking.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h            | 6 ++++++
 arch/x86/kernel/cpu/resctrl/core.c | 9 +++++++++
 fs/resctrl/rdtgroup.c              | 2 ++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index c43526cdf304..dc148b7feb71 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -514,6 +514,12 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 void resctrl_online_cpu(unsigned int cpu);
 void resctrl_offline_cpu(unsigned int cpu);
 
+/*
+ * Architecture hook called at beginning of each file system mount attempt.
+ * No locks are held.
+ */
+void resctrl_arch_pre_mount(void);
+
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
  *			      for this resource and domain.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9222eee7ce07..2dd48b59ba9b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -726,6 +726,15 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
+void resctrl_arch_pre_mount(void)
+{
+	static atomic_t only_once = ATOMIC_INIT(0);
+	int old = 0;
+
+	if (!atomic_try_cmpxchg(&only_once, &old, 1))
+		return;
+}
+
 enum {
 	RDT_FLAG_CMT,
 	RDT_FLAG_MBM_TOTAL,
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 771e40f02ba6..b20d104ea0c9 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2785,6 +2785,8 @@ static int rdt_get_tree(struct fs_context *fc)
 	struct rdt_resource *r;
 	int ret;
 
+	resctrl_arch_pre_mount();
+
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 	/*
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2025-12-17 17:21 ` [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
@ 2026-01-05 19:17   ` Borislav Petkov
  2026-01-05 19:39     ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2026-01-05 19:17 UTC (permalink / raw)
  To: Tony Luck
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Wed, Dec 17, 2025 at 09:21:00AM -0800, Tony Luck wrote:
> +void resctrl_arch_pre_mount(void)
> +{
> +	static atomic_t only_once = ATOMIC_INIT(0);
> +	int old = 0;
> +
> +	if (!atomic_try_cmpxchg(&only_once, &old, 1))
> +		return;
> +}

There's

#define DO_ONCE(func, ...)

Can't use that?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-05 19:17   ` Borislav Petkov
@ 2026-01-05 19:39     ` Luck, Tony
  2026-01-05 20:04       ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-05 19:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Mon, Jan 05, 2026 at 08:17:11PM +0100, Borislav Petkov wrote:
> On Wed, Dec 17, 2025 at 09:21:00AM -0800, Tony Luck wrote:
> > +void resctrl_arch_pre_mount(void)
> > +{
> > +	static atomic_t only_once = ATOMIC_INIT(0);
> > +	int old = 0;
> > +
> > +	if (!atomic_try_cmpxchg(&only_once, &old, 1))
> > +		return;
> > +}
> 
> There's
> 
> #define DO_ONCE(func, ...)
> 
> Can't use that?

Learn something new every day. Yes, <linux/once.h> looks possible here.

Though I believe I'll need the DO_ONCE_SLEEPABLE() version since 
resctrl_arch_pre_mount() grabs a mutex and various called functions
allocate memory using kmalloc().

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-05 19:39     ` Luck, Tony
@ 2026-01-05 20:04       ` Borislav Petkov
  2026-01-05 20:15         ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2026-01-05 20:04 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Mon, Jan 05, 2026 at 11:39:29AM -0800, Luck, Tony wrote:
> Learn something new every day. Yes, <linux/once.h> looks possible here.
> 
> Though I believe I'll need the DO_ONCE_SLEEPABLE() version since 
> resctrl_arch_pre_mount() grabs a mutex and various called functions
> allocate memory using kmalloc().

Aha.

Ok, if it works and passes testing, I could wait for you to send me an updated
patch and drop this one.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-05 20:04       ` Borislav Petkov
@ 2026-01-05 20:15         ` Luck, Tony
  2026-01-07 17:29           ` Reinette Chatre
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-05 20:15 UTC (permalink / raw)
  To: Borislav Petkov, Chatre, Reinette
  Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, x86@kernel.org,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev

> Ok, if it works and passes testing, I could wait for you to send me an updated
> patch and drop this one.

Building and testing now.

Reinette: When originally developing this you suggested that rdt_get_tree()
should call resctrl_arch_pre_mount() on *every* mount (to make it generally
useful should future changes need something to be done in architecture code
on each mount).

That flexibility isn't needed for enumerating telemetry events. Boris' suggestion
to use DO_ONCE_SLEEPABLE() would revert to what I had in some earlier
version where rdt_get_tree() only calls this hook on first mount.

Are you OK with this? Or do you still think that the hook should be called on
every mount?

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-05 20:15         ` Luck, Tony
@ 2026-01-07 17:29           ` Reinette Chatre
  2026-01-07 18:05             ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Reinette Chatre @ 2026-01-07 17:29 UTC (permalink / raw)
  To: Luck, Tony, Borislav Petkov
  Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, x86@kernel.org,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev

Hi Tony,

On 1/5/26 12:15 PM, Luck, Tony wrote:
>> Ok, if it works and passes testing, I could wait for you to send me an updated
>> patch and drop this one.
> 
> Building and testing now.
> 
> Reinette: When originally developing this you suggested that rdt_get_tree()
> should call resctrl_arch_pre_mount() on *every* mount (to make it generally
> useful should future changes need something to be done in architecture code
> on each mount).

I'm digging through the history just to refresh on why I made that comment. From what I can
tell this work always called the AET init on every mount attempt. One difference is that during
v2 it did so by taking some extra locks before doing so, but still did the AET init before
resctrl's "resctrl_mounted" check. The move to current spot (before extra locks) was made in v3,
and looking at v2 comments I could just find a request to use a generic resctrl_arch_* helper in
fs code instead of the arch specific rdt_get_intel_aet_mount() called from fs code.

> 
> That flexibility isn't needed for enumerating telemetry events. Boris' suggestion
> to use DO_ONCE_SLEEPABLE() would revert to what I had in some earlier
> version where rdt_get_tree() only calls this hook on first mount.

I think I am missing something here - even the original RFC calls the AET init on
every mount. Which version are you referring to? I am also missing why DO_ONCE_SLEEPABLE()
requires a flow change.

> 
> Are you OK with this? Or do you still think that the hook should be called on
> every mount?

To be specific, the current implementation calls the resctrl_arch_pre_mount() hook on
every mount *attempt*. For the hook to be called on every mount it should be after the
resctrl_mounted check. This would change resctrl_arch_pre_mount() to be called with
rdtgroup_mutex held though but that seems trouble since resctrl_arch_pre_mount() currently
follows lock ordering of domain_list_lock then rdtgroup_mutex to match lock ordering
during resctrl init.

Reinette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 17:29           ` Reinette Chatre
@ 2026-01-07 18:05             ` Luck, Tony
  2026-01-07 19:33               ` Reinette Chatre
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-07 18:05 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Wed, Jan 07, 2026 at 09:29:27AM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 1/5/26 12:15 PM, Luck, Tony wrote:
> >> Ok, if it works and passes testing, I could wait for you to send me an updated
> >> patch and drop this one.
> > 
> > Building and testing now.
> > 
> > Reinette: When originally developing this you suggested that rdt_get_tree()
> > should call resctrl_arch_pre_mount() on *every* mount (to make it generally
> > useful should future changes need something to be done in architecture code
> > on each mount).
> 
> I'm digging through the history just to refresh on why I made that comment. From what I can
> tell this work always called the AET init on every mount attempt. One difference is that during
> v2 it did so by taking some extra locks before doing so, but still did the AET init before
> resctrl's "resctrl_mounted" check. The move to current spot (before extra locks) was made in v3,
> and looking at v2 comments I could just find a request to use a generic resctrl_arch_* helper in
> fs code instead of the arch specific rdt_get_intel_aet_mount() called from fs code.

I should stop relying on my memory and check the actual history. You are
right. The call from rdt_get_tree() has moved, and changed name, but all
versions called it every time.

> > 
> > That flexibility isn't needed for enumerating telemetry events. Boris' suggestion
> > to use DO_ONCE_SLEEPABLE() would revert to what I had in some earlier
> > version where rdt_get_tree() only calls this hook on first mount.
> 
> I think I am missing something here - even the original RFC calls the AET init on
> every mount. Which version are you referring to? I am also missing why DO_ONCE_SLEEPABLE()
> requires a flow change.
> 
> > 
> > Are you OK with this? Or do you still think that the hook should be called on
> > every mount?
> 
> To be specific, the current implementation calls the resctrl_arch_pre_mount() hook on
> every mount *attempt*. For the hook to be called on every mount it should be after the
> resctrl_mounted check. This would change resctrl_arch_pre_mount() to be called with
> rdtgroup_mutex held though but that seems trouble since resctrl_arch_pre_mount() currently
> follows lock ordering of domain_list_lock then rdtgroup_mutex to match lock ordering
> during resctrl init.

Yes, the call was moved before any locks obtained because of lock
ordering issues with domain_list_lock.

A better summary of the change is that the "only once" logic is being
moved from open-coded using atomic operations in resctrl_arch_pre_mount()
to using DO_ONCE_SLEEPABLE() in rdt_get_tree().
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 18:05             ` Luck, Tony
@ 2026-01-07 19:33               ` Reinette Chatre
  2026-01-07 20:25                 ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Reinette Chatre @ 2026-01-07 19:33 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Tony,

On 1/7/26 10:05 AM, Luck, Tony wrote:
> On Wed, Jan 07, 2026 at 09:29:27AM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 1/5/26 12:15 PM, Luck, Tony wrote:
>>>> Ok, if it works and passes testing, I could wait for you to send me an updated
>>>> patch and drop this one.
>>>
>>> Building and testing now.
>>>
>>> Reinette: When originally developing this you suggested that rdt_get_tree()
>>> should call resctrl_arch_pre_mount() on *every* mount (to make it generally
>>> useful should future changes need something to be done in architecture code
>>> on each mount).
>>
>> I'm digging through the history just to refresh on why I made that comment. From what I can
>> tell this work always called the AET init on every mount attempt. One difference is that during
>> v2 it did so by taking some extra locks before doing so, but still did the AET init before
>> resctrl's "resctrl_mounted" check. The move to current spot (before extra locks) was made in v3,
>> and looking at v2 comments I could just find a request to use a generic resctrl_arch_* helper in
>> fs code instead of the arch specific rdt_get_intel_aet_mount() called from fs code.
> 
> I should stop relying on my memory and check the actual history. You are
> right. The call from rdt_get_tree() has moved, and changed name, but all
> versions called it every time.
> 
>>>
>>> That flexibility isn't needed for enumerating telemetry events. Boris' suggestion
>>> to use DO_ONCE_SLEEPABLE() would revert to what I had in some earlier
>>> version where rdt_get_tree() only calls this hook on first mount.
>>
>> I think I am missing something here - even the original RFC calls the AET init on
>> every mount. Which version are you referring to? I am also missing why DO_ONCE_SLEEPABLE()
>> requires a flow change.
>>
>>>
>>> Are you OK with this? Or do you still think that the hook should be called on
>>> every mount?
>>
>> To be specific, the current implementation calls the resctrl_arch_pre_mount() hook on
>> every mount *attempt*. For the hook to be called on every mount it should be after the
>> resctrl_mounted check. This would change resctrl_arch_pre_mount() to be called with
>> rdtgroup_mutex held though but that seems trouble since resctrl_arch_pre_mount() currently
>> follows lock ordering of domain_list_lock then rdtgroup_mutex to match lock ordering
>> during resctrl init.
> 
> Yes, the call was moved before any locks obtained because of lock
> ordering issues with domain_list_lock.
> 
> A better summary of the change is that the "only once" logic is being
> moved from open-coded using atomic operations in resctrl_arch_pre_mount()
> to using DO_ONCE_SLEEPABLE() in rdt_get_tree().

One thing about DO_ONCE_SLEEPABLE() that is unexpected to me is that it disables the static
key in a workqueue that seems unnecessary. Original motivation for the workqueue on which DO_ONCE()
is based (per commit a48e42920ff3 ("net: introduce new macro net_get_random_once")) was to support
calling code from atomic sections. Looks like DO_ONCE_SLEEPABLE() copied this implementation. It switched
the spinlock to mutex but the changelog (62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts")) does
not mention revisiting the workqueue. Looks like deferring it to workqueue does make it easier to not have
to worry whether helper is called with hotplug lock held or not though.

Do you see any issue with deferring the disable of the static key in the resctrl usage? Since
resctrl_arch_pre_mount() is called without any locks in resctrl control it now relies on fs code
to not have rdt_get_tree() called concurrently and thus risk resctrl_arch_pre_mount() called before
static key is disabled? I just want to make sure here since from what I can tell this makes resctrl the
first user of this helper apart from code for which this helper was created and there may be implicit
assumptions that resctrl does not adhere to.

Reinette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 19:33               ` Reinette Chatre
@ 2026-01-07 20:25                 ` Luck, Tony
  2026-01-07 22:09                   ` Reinette Chatre
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-07 20:25 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Wed, Jan 07, 2026 at 11:33:26AM -0800, Reinette Chatre wrote:
> > A better summary of the change is that the "only once" logic is being
> > moved from open-coded using atomic operations in resctrl_arch_pre_mount()
> > to using DO_ONCE_SLEEPABLE() in rdt_get_tree().
> 
> One thing about DO_ONCE_SLEEPABLE() that is unexpected to me is that it disables the static
> key in a workqueue that seems unnecessary. Original motivation for the workqueue on which DO_ONCE()
> is based (per commit a48e42920ff3 ("net: introduce new macro net_get_random_once")) was to support
> calling code from atomic sections. Looks like DO_ONCE_SLEEPABLE() copied this implementation. It switched
> the spinlock to mutex but the changelog (62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts")) does
> not mention revisiting the workqueue. Looks like deferring it to workqueue does make it easier to not have
> to worry whether helper is called with hotplug lock held or not though.
> 
> Do you see any issue with deferring the disable of the static key in the resctrl usage? Since
> resctrl_arch_pre_mount() is called without any locks in resctrl control it now relies on fs code
> to not have rdt_get_tree() called concurrently and thus risk resctrl_arch_pre_mount() called before
> static key is disabled? I just want to make sure here since from what I can tell this makes resctrl the
> first user of this helper apart from code for which this helper was created and there may be implicit
> assumptions that resctrl does not adhere to.

Reinette,

The deferred reset of the static key does seem unnecessary.

But it looks like DO_ONCE_SLEEPABLE() is still correct. If there are
multiple parallel calls before the static key is reset, then the 2nd and
subsequent instance will block on mutex_lock(&once_mutex) in
__do_once_sleepable_start(). When it is their turn, they will find that
"*done" is true, so resctrl_arch_pre_mount() will not be called again.

Side note: This global "once_mutex" means that any other subsystem using
DO_ONCE_SLEEPABLE() would be blocked waiting for resctrl_arch_pre_mount()
to complete. Same is true for DO_ONCE() where parallel calls from
different subsystems would be serialized by the "once_lock" spinlock.

If these DO_ONCE macros are ever used heavily in run-time code, it might
be better for once_lock and once_mutex to be statically defined in each
invocation of the DO_ONCE() and DO_ONCE_SLEEPABLE() macros. But the fact
that the static key protects the spinlock/mutex from being called may
mean that it is practically hard to hit problems.

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 20:25                 ` Luck, Tony
@ 2026-01-07 22:09                   ` Reinette Chatre
  2026-01-07 22:27                     ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Reinette Chatre @ 2026-01-07 22:09 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Tony,

On 1/7/26 12:25 PM, Luck, Tony wrote:
> On Wed, Jan 07, 2026 at 11:33:26AM -0800, Reinette Chatre wrote:
>>> A better summary of the change is that the "only once" logic is being
>>> moved from open-coded using atomic operations in resctrl_arch_pre_mount()
>>> to using DO_ONCE_SLEEPABLE() in rdt_get_tree().
>>
>> One thing about DO_ONCE_SLEEPABLE() that is unexpected to me is that it disables the static
>> key in a workqueue that seems unnecessary. Original motivation for the workqueue on which DO_ONCE()
>> is based (per commit a48e42920ff3 ("net: introduce new macro net_get_random_once")) was to support
>> calling code from atomic sections. Looks like DO_ONCE_SLEEPABLE() copied this implementation. It switched
>> the spinlock to mutex but the changelog (62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts")) does
>> not mention revisiting the workqueue. Looks like deferring it to workqueue does make it easier to not have
>> to worry whether helper is called with hotplug lock held or not though.
>>
>> Do you see any issue with deferring the disable of the static key in the resctrl usage? Since
>> resctrl_arch_pre_mount() is called without any locks in resctrl control it now relies on fs code
>> to not have rdt_get_tree() called concurrently and thus risk resctrl_arch_pre_mount() called before
>> static key is disabled? I just want to make sure here since from what I can tell this makes resctrl the
>> first user of this helper apart from code for which this helper was created and there may be implicit
>> assumptions that resctrl does not adhere to.
> 
> Reinette,
> 
> The deferred reset of the static key does seem unnecessary.
> 
> But it looks like DO_ONCE_SLEEPABLE() is still correct. If there are
> multiple parallel calls before the static key is reset, then the 2nd and
> subsequent instance will block on mutex_lock(&once_mutex) in
> __do_once_sleepable_start(). When it is their turn, they will find that
> "*done" is true, so resctrl_arch_pre_mount() will not be called again.

Ah - I see. Thank you. So in addition to the static key there is the "done"
variable that is protected with the mutex and protects against a second call
before static key can be disabled.

> 
> Side note: This global "once_mutex" means that any other subsystem using
> DO_ONCE_SLEEPABLE() would be blocked waiting for resctrl_arch_pre_mount()
> to complete. Same is true for DO_ONCE() where parallel calls from
> different subsystems would be serialized by the "once_lock" spinlock.

oh, good catch.

> 
> If these DO_ONCE macros are ever used heavily in run-time code, it might
> be better for once_lock and once_mutex to be statically defined in each
> invocation of the DO_ONCE() and DO_ONCE_SLEEPABLE() macros. But the fact
> that the static key protects the spinlock/mutex from being called may
> mean that it is practically hard to hit problems.

Which problems do you have in mind? One problem I see is that since these "once"
functions are globally forced to be serialized this may cause unnecessary delays,
for example during initialization. I do not think this impacts the resctrl intended
usage since resctrl_arch_pre_mount() is not called during initialization and is
already ok with delays (it is on a "slow" path).

Reinette



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 22:09                   ` Reinette Chatre
@ 2026-01-07 22:27                     ` Luck, Tony
  2026-01-07 23:09                       ` Reinette Chatre
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-07 22:27 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Wed, Jan 07, 2026 at 02:09:35PM -0800, Reinette Chatre wrote:
> Hi Tony,
> > If these DO_ONCE macros are ever used heavily in run-time code, it might
> > be better for once_lock and once_mutex to be statically defined in each
> > invocation of the DO_ONCE() and DO_ONCE_SLEEPABLE() macros. But the fact
> > that the static key protects the spinlock/mutex from being called may
> > mean that it is practically hard to hit problems.
> 
> Which problems do you have in mind? One problem I see is that since these "once"
> functions are globally forced to be serialized this may cause unnecessary delays,
> for example during initialization. I do not think this impacts the resctrl intended
> usage since resctrl_arch_pre_mount() is not called during initialization and is
> already ok with delays (it is on a "slow" path).

Reinette

Yes. Unnecessary delays due to serialization. But that only happens if
the first call to a DO_ONCE*() instance overlaps with another first
call. It might be quite hard to hit that during boot unless there are
many uses of DO_ONCE*()

Looking at this some more, DO_ONCE() is overkill for mounting resctrl. The
static key part is there so that DO_ONCE*() can be safely used in some
hot code path without adding overhead of checking some "bool done" type
variable and branching around it.  I don't see anyone except validation
executing resctrl mounts at multiple times per second.

But it does make the code easier to read with a single line with obvious
meaning instead of multiple lines with declarations, initializations,
and if () conditions.

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 22:27                     ` Luck, Tony
@ 2026-01-07 23:09                       ` Reinette Chatre
  2026-01-08  0:16                         ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Reinette Chatre @ 2026-01-07 23:09 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Tony,

On 1/7/26 2:27 PM, Luck, Tony wrote:
> On Wed, Jan 07, 2026 at 02:09:35PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>> If these DO_ONCE macros are ever used heavily in run-time code, it might
>>> be better for once_lock and once_mutex to be statically defined in each
>>> invocation of the DO_ONCE() and DO_ONCE_SLEEPABLE() macros. But the fact
>>> that the static key protects the spinlock/mutex from being called may
>>> mean that it is practically hard to hit problems.
>>
>> Which problems do you have in mind? One problem I see is that since these "once"
>> functions are globally forced to be serialized this may cause unnecessary delays,
>> for example during initialization. I do not think this impacts the resctrl intended
>> usage since resctrl_arch_pre_mount() is not called during initialization and is
>> already ok with delays (it is on a "slow" path).
> 
> Reinette
> 
> Yes. Unnecessary delays due to serialization. But that only happens if
> the first call to a DO_ONCE*() instance overlaps with another first
> call. It might be quite hard to hit that during boot unless there are
> many uses of DO_ONCE*()
> 
> Looking at this some more, DO_ONCE() is overkill for mounting resctrl. The
> static key part is there so that DO_ONCE*() can be safely used in some
> hot code path without adding overhead of checking some "bool done" type
> variable and branching around it.  I don't see anyone except validation
> executing resctrl mounts at multiple times per second.
> 
> But it does make the code easier to read with a single line with obvious
> meaning instead of multiple lines with declarations, initializations,
> and if () conditions.

I am ok with using DO_ONCE_SLEEPABLE(). The next question (perhaps nitpicking?) is
if it is resctrl fs or the arch's decision to use this. That is, whether the flow is
something like below where the arch decides:
	arch/x86/kernel/cpu/resctrl/core.c:
		void resctrl_arch_pre_mount(void)
		{
			DO_ONCE_SLEEPABLE(aet_specific_call);
		}

	fs/resctrl/rdtgroup.c:
		static int rdt_get_tree(struct fs_context *fc)
		{
			...
			resctrl_arch_pre_mount();
			...
		}

or something like below where resctrl fs dictates the function can only be called once:

	arch/x86/kernel/cpu/resctrl/core.c:
		void resctrl_arch_pre_mount(void)
		{
			/* AET specific code */
		}

	fs/resctrl/rdtgroup.c:
		static int rdt_get_tree(struct fs_context *fc)
		{
			...
			DO_ONCE_SLEEPABLE(resctrl_arch_pre_mount);
			...
		}

It looks to me as though the first option creates opportunity for better isolation
of AET code into arch/x86/kernel/cpu/resctrl/intel_aet.c, specifically, it needs fewer AET
stubs in arch/x86/kernel/cpu/resctrl/internal.h. I do not envision resctrl fs needing
to call resctrl_arch_pre_mount() multiple times but the safe pattern appears to be to
place DO_ONCE* in a helper function to ensure that only one static key is ever created.

While the first option allows more flexibility to the arch that should not be a reason though
since this is internal and we can always change to better accommodate arch requirements.
The question here is just what is best for AET support. What do you think?

Reinette


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-07 23:09                       ` Reinette Chatre
@ 2026-01-08  0:16                         ` Luck, Tony
  2026-01-08  2:42                           ` Reinette Chatre
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-08  0:16 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

On Wed, Jan 07, 2026 at 03:09:24PM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 1/7/26 2:27 PM, Luck, Tony wrote:
> > On Wed, Jan 07, 2026 at 02:09:35PM -0800, Reinette Chatre wrote:
> >> Hi Tony,
> >>> If these DO_ONCE macros are ever used heavily in run-time code, it might
> >>> be better for once_lock and once_mutex to be statically defined in each
> >>> invocation of the DO_ONCE() and DO_ONCE_SLEEPABLE() macros. But the fact
> >>> that the static key protects the spinlock/mutex from being called may
> >>> mean that it is practically hard to hit problems.
> >>
> >> Which problems do you have in mind? One problem I see is that since these "once"
> >> functions are globally forced to be serialized this may cause unnecessary delays,
> >> for example during initialization. I do not think this impacts the resctrl intended
> >> usage since resctrl_arch_pre_mount() is not called during initialization and is
> >> already ok with delays (it is on a "slow" path).
> > 
> > Reinette
> > 
> > Yes. Unnecessary delays due to serialization. But that only happens if
> > the first call to a DO_ONCE*() instance overlaps with another first
> > call. It might be quite hard to hit that during boot unless there are
> > many uses of DO_ONCE*()
> > 
> > Looking at this some more, DO_ONCE() is overkill for mounting resctrl. The
> > static key part is there so that DO_ONCE*() can be safely used in some
> > hot code path without adding overhead of checking some "bool done" type
> > variable and branching around it.  I don't see anyone except validation
> > executing resctrl mounts at multiple times per second.
> > 
> > But it does make the code easier to read with a single line with obvious
> > meaning instead of multiple lines with declarations, initializations,
> > and if () conditions.
> 
> I am ok with using DO_ONCE_SLEEPABLE(). The next question (perhaps nitpicking?) is
> if it is resctrl fs or the arch's decision to use this. That is, whether the flow is
> something like below where the arch decides:
> 	arch/x86/kernel/cpu/resctrl/core.c:
> 		void resctrl_arch_pre_mount(void)
> 		{
> 			DO_ONCE_SLEEPABLE(aet_specific_call);
> 		}

The AET code in resctrl_arch_pre_mount() includes building the domains.
That needs the domain_list_lock mutex and domain_add_cpu_mon() which are
both static in core.c. So either they need to be unstatic'd and added
to "internal.h", or that part of the code needs to stay in core.c

Opinion on making these available to intel_aet.c? I'm not a fan.

Keeping it in core.c means finding out if intel_aet_get_events()
succeeded or not. DO_ONCE_SLEEPABLE() doesn't return the return value
of the called function. It just returns true/false to say if it called
the function.

So with this approach I have:

void resctrl_arch_pre_mount(void)
{
	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
	int cpu;

	if (!DO_ONCE_SLEEPABLE(intel_aet_get_events))
		return;

	// intel_aet_get_events() sets mon_capable if it succeeds
	if (!r->mon_capable)
		return;

	/*
	 * Late discovery of telemetry events means the domains for the
	 * resource were not built. Do that now.
	 */
	cpus_read_lock();
	mutex_lock(&domain_list_lock);
	rdt_mon_capable = true;
	for_each_online_cpu(cpu)
		domain_add_cpu_mon(cpu, r);
	mutex_unlock(&domain_list_lock);
	cpus_read_unlock();
}

It does reduce by one the number of stubs. intel_aet_add_debugfs() can
be static in intel_aet.c

> 	fs/resctrl/rdtgroup.c:
> 		static int rdt_get_tree(struct fs_context *fc)
> 		{
> 			...
> 			resctrl_arch_pre_mount();
> 			...
> 		}
> 
> or something like below where resctrl fs dictates the function can only be called once:
> 
> 	arch/x86/kernel/cpu/resctrl/core.c:
> 		void resctrl_arch_pre_mount(void)
> 		{
> 			/* AET specific code */
This is the minimal change from my current series. So my laziness factor
leans toward it.

> 		}
> 
> 	fs/resctrl/rdtgroup.c:
> 		static int rdt_get_tree(struct fs_context *fc)
> 		{
> 			...
> 			DO_ONCE_SLEEPABLE(resctrl_arch_pre_mount);
> 			...
> 		}
> 
> It looks to me as though the first option creates opportunity for better isolation
> of AET code into arch/x86/kernel/cpu/resctrl/intel_aet.c, specifically, it needs fewer AET
> stubs in arch/x86/kernel/cpu/resctrl/internal.h. I do not envision resctrl fs needing
> to call resctrl_arch_pre_mount() multiple times but the safe pattern appears to be to
> place DO_ONCE* in a helper function to ensure that only one static key is ever created.
> 
> While the first option allows more flexibility to the arch that should not be a reason though
> since this is internal and we can always change to better accommodate arch requirements.
> The question here is just what is best for AET support. What do you think?

The current usage for resctrl_arch_pre_mount() is that it only needs to
be called once. As you say, that could be changed if a new requirement
appears. But the simpler approach today is to put the
DO_ONCE_SLEEPABLE() into rdt_get_tree()

> Reinette

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount
  2026-01-08  0:16                         ` Luck, Tony
@ 2026-01-08  2:42                           ` Reinette Chatre
  0 siblings, 0 replies; 58+ messages in thread
From: Reinette Chatre @ 2026-01-08  2:42 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Tony,

On 1/7/26 4:16 PM, Luck, Tony wrote:
> On Wed, Jan 07, 2026 at 03:09:24PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 1/7/26 2:27 PM, Luck, Tony wrote:
>>> On Wed, Jan 07, 2026 at 02:09:35PM -0800, Reinette Chatre wrote:
>>>> Hi Tony,
>>>>> If these DO_ONCE macros are ever used heavily in run-time code, it might
>>>>> be better for once_lock and once_mutex to be statically defined in each
>>>>> invocation of the DO_ONCE() and DO_ONCE_SLEEPABLE() macros. But the fact
>>>>> that the static key protects the spinlock/mutex from being called may
>>>>> mean that it is practically hard to hit problems.
>>>>
>>>> Which problems do you have in mind? One problem I see is that since these "once"
>>>> functions are globally forced to be serialized this may cause unnecessary delays,
>>>> for example during initialization. I do not think this impacts the resctrl intended
>>>> usage since resctrl_arch_pre_mount() is not called during initialization and is
>>>> already ok with delays (it is on a "slow" path).
>>>
>>> Reinette
>>>
>>> Yes. Unnecessary delays due to serialization. But that only happens if
>>> the first call to a DO_ONCE*() instance overlaps with another first
>>> call. It might be quite hard to hit that during boot unless there are
>>> many uses of DO_ONCE*()
>>>
>>> Looking at this some more, DO_ONCE() is overkill for mounting resctrl. The
>>> static key part is there so that DO_ONCE*() can be safely used in some
>>> hot code path without adding overhead of checking some "bool done" type
>>> variable and branching around it.  I don't see anyone except validation
>>> executing resctrl mounts at multiple times per second.
>>>
>>> But it does make the code easier to read with a single line with obvious
>>> meaning instead of multiple lines with declarations, initializations,
>>> and if () conditions.
>>
>> I am ok with using DO_ONCE_SLEEPABLE(). The next question (perhaps nitpicking?) is
>> if it is resctrl fs or the arch's decision to use this. That is, whether the flow is
>> something like below where the arch decides:
>> 	arch/x86/kernel/cpu/resctrl/core.c:
>> 		void resctrl_arch_pre_mount(void)
>> 		{
>> 			DO_ONCE_SLEEPABLE(aet_specific_call);
>> 		}
> 
> The AET code in resctrl_arch_pre_mount() includes building the domains.
> That needs the domain_list_lock mutex and domain_add_cpu_mon() which are
> both static in core.c. So either they need to be unstatic'd and added
> to "internal.h", or that part of the code needs to stay in core.c
> 
> Opinion on making these available to intel_aet.c? I'm not a fan.

ok, that is fair.

> 
> Keeping it in core.c means finding out if intel_aet_get_events()
> succeeded or not. DO_ONCE_SLEEPABLE() doesn't return the return value
> of the called function. It just returns true/false to say if it called
> the function.
> 
> So with this approach I have:
> 
> void resctrl_arch_pre_mount(void)
> {
> 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> 	int cpu;
> 
> 	if (!DO_ONCE_SLEEPABLE(intel_aet_get_events))
> 		return;
> 

Thank you for considering. This is getting difficult to read.

> 	// intel_aet_get_events() sets mon_capable if it succeeds
> 	if (!r->mon_capable)
> 		return;
> 
> 	/*
> 	 * Late discovery of telemetry events means the domains for the
> 	 * resource were not built. Do that now.
> 	 */
> 	cpus_read_lock();
> 	mutex_lock(&domain_list_lock);
> 	rdt_mon_capable = true;
> 	for_each_online_cpu(cpu)
> 		domain_add_cpu_mon(cpu, r);
> 	mutex_unlock(&domain_list_lock);
> 	cpus_read_unlock();
> }
> 
> It does reduce by one the number of stubs. intel_aet_add_debugfs() can
> be static in intel_aet.c
> 
>> 	fs/resctrl/rdtgroup.c:
>> 		static int rdt_get_tree(struct fs_context *fc)
>> 		{
>> 			...
>> 			resctrl_arch_pre_mount();
>> 			...
>> 		}
>>
>> or something like below where resctrl fs dictates the function can only be called once:
>>
>> 	arch/x86/kernel/cpu/resctrl/core.c:
>> 		void resctrl_arch_pre_mount(void)
>> 		{
>> 			/* AET specific code */
> This is the minimal change from my current series. So my laziness factor
> leans toward it.
> 
>> 		}
>>
>> 	fs/resctrl/rdtgroup.c:
>> 		static int rdt_get_tree(struct fs_context *fc)
>> 		{
>> 			...
>> 			DO_ONCE_SLEEPABLE(resctrl_arch_pre_mount);
>> 			...
>> 		}
>>
>> It looks to me as though the first option creates opportunity for better isolation
>> of AET code into arch/x86/kernel/cpu/resctrl/intel_aet.c, specifically, it needs fewer AET
>> stubs in arch/x86/kernel/cpu/resctrl/internal.h. I do not envision resctrl fs needing
>> to call resctrl_arch_pre_mount() multiple times but the safe pattern appears to be to
>> place DO_ONCE* in a helper function to ensure that only one static key is ever created.
>>
>> While the first option allows more flexibility to the arch that should not be a reason though
>> since this is internal and we can always change to better accommodate arch requirements.
>> The question here is just what is best for AET support. What do you think?
> 
> The current usage for resctrl_arch_pre_mount() is that it only needs to
> be called once. As you say, that could be changed if a new requirement
> appears. But the simpler approach today is to put the
> DO_ONCE_SLEEPABLE() into rdt_get_tree()

Thank you for considering the options. Placing DO_ONCE_SLEEPABLE() in
rdt_get_tree() is fine by me.

Reinette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v17 14/32] x86,fs/resctrl: Add and initialize a resource for package scope monitoring
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (12 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 15/32] fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains Tony Luck
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Add a new PERF_PKG resource and introduce package level scope for monitoring
telemetry events so that CPU hot plug notifiers can build domains at the
package granularity.

Use the physical package ID available via topology_physical_package_id()
to identify the monitoring domains with package level scope. This enables
user space to use:
 /sys/devices/system/cpu/cpuX/topology/physical_package_id
to identify the monitoring domain a CPU is associated with.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h            |  2 ++
 fs/resctrl/internal.h              |  2 ++
 arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++++++
 fs/resctrl/rdtgroup.c              |  2 ++
 4 files changed, 16 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index dc148b7feb71..2a3613f27274 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,6 +53,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
 	RDT_RESOURCE_SMBA,
+	RDT_RESOURCE_PERF_PKG,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,
@@ -270,6 +271,7 @@ enum resctrl_scope {
 	RESCTRL_L2_CACHE = 2,
 	RESCTRL_L3_CACHE = 3,
 	RESCTRL_L3_NODE,
+	RESCTRL_PACKAGE,
 };
 
 /**
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 14e5a9ed1fbd..0110d1175398 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -255,6 +255,8 @@ struct rdtgroup {
 
 #define RFTYPE_ASSIGN_CONFIG		BIT(11)
 
+#define RFTYPE_RES_PERF_PKG		BIT(12)
+
 #define RFTYPE_CTRL_INFO		(RFTYPE_INFO | RFTYPE_CTRL)
 
 #define RFTYPE_MON_INFO			(RFTYPE_INFO | RFTYPE_MON)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2dd48b59ba9b..986b1303efb9 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,14 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
 			.schema_fmt		= RESCTRL_SCHEMA_RANGE,
 		},
 	},
+	[RDT_RESOURCE_PERF_PKG] =
+	{
+		.r_resctrl = {
+			.name			= "PERF_PKG",
+			.mon_scope		= RESCTRL_PACKAGE,
+			.mon_domains		= mon_domain_init(RDT_RESOURCE_PERF_PKG),
+		},
+	},
 };
 
 u32 resctrl_arch_system_num_rmid_idx(void)
@@ -440,6 +448,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
 		return get_cpu_cacheinfo_id(cpu, scope);
 	case RESCTRL_L3_NODE:
 		return cpu_to_node(cpu);
+	case RESCTRL_PACKAGE:
+		return topology_physical_package_id(cpu);
 	default:
 		break;
 	}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b20d104ea0c9..4952ba6b8609 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2395,6 +2395,8 @@ static unsigned long fflags_from_resource(struct rdt_resource *r)
 	case RDT_RESOURCE_MBA:
 	case RDT_RESOURCE_SMBA:
 		return RFTYPE_RES_MB;
+	case RDT_RESOURCE_PERF_PKG:
+		return RFTYPE_RES_PERF_PKG;
 	}
 
 	return WARN_ON_ONCE(1);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 15/32] fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (13 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 14/32] x86,fs/resctrl: Add and initialize a resource for package scope monitoring Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 16/32] x86/resctrl: Discover hardware telemetry events Tony Luck
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The feature to sum event data across multiple domains supports systems with
Sub-NUMA Cluster (SNC) mode enabled. The top-level monitoring files in each
"mon_L3_XX" directory provide the sum of data across all SNC nodes sharing
an L3 cache instance while the "mon_sub_L3_YY" sub-directories provide the
event data of the individual nodes.

SNC is only associated with the L3 resource and domains and as a result the
flow handling the sum of event data implicitly assumes it is working with
the L3 resource and domains.

Reading of telemetry events do not require to sum event data so this feature
can remain dedicated to SNC and keep the implicit assumption of working with
the L3 resource and domains.

Add a WARN to where the implicit assumption of working with the L3 resource
is made and add comments on how the structure controlling the event sum
feature is used.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/internal.h    | 4 ++--
 fs/resctrl/ctrlmondata.c | 8 +++++++-
 fs/resctrl/rdtgroup.c    | 3 ++-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0110d1175398..50d88e91e0da 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -92,8 +92,8 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
  * @list:            Member of the global @mon_data_kn_priv_list list.
  * @rid:             Resource id associated with the event file.
  * @evt:             Event structure associated with the event file.
- * @sum:             Set when event must be summed across multiple
- *                   domains.
+ * @sum:             Set for RDT_RESOURCE_L3 when event must be summed
+ *                   across multiple domains.
  * @domid:           When @sum is zero this is the domain to which
  *                   the event file belongs. When @sum is one this
  *                   is the id of the L3 cache that all domains to be
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index f319fd1a6de3..cc4237c57cbe 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -677,7 +677,6 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
 	enum resctrl_res_level resid;
-	struct rdt_l3_mon_domain *d;
 	struct rdt_domain_hdr *hdr;
 	struct rmid_read rr = {0};
 	struct rdtgroup *rdtgrp;
@@ -705,6 +704,13 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	r = resctrl_arch_get_resource(resid);
 
 	if (md->sum) {
+		struct rdt_l3_mon_domain *d;
+
+		if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3)) {
+			ret = -EINVAL;
+			goto out;
+		}
+
 		/*
 		 * This file requires summing across all domains that share
 		 * the L3 cache id that was provided in the "domid" field of the
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 4952ba6b8609..dce1f0a6d40b 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3095,7 +3095,8 @@ static void rmdir_all_sub(void)
  * @rid:    The resource id for the event file being created.
  * @domid:  The domain id for the event file being created.
  * @mevt:   The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ *	    when @rid == RDT_RESOURCE_L3.
  */
 static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 					struct mon_evt *mevt,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 16/32] x86/resctrl: Discover hardware telemetry events
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (14 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 15/32] fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 17/32] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Each CPU collects data for telemetry events that it sends to the nearest
telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID
changes, or when a two millisecond timer expires.

There is a feature type ("energy" or "perf"), guid, and MMIO region associated
with each aggregator. This combination links to an XML description of the
set of telemetry events tracked by the aggregator. XML files are published
by Intel in a GitHub repository [1].

The telemetry event aggregators maintain per-RMID per-event counts of the
total seen for all the CPUs. There may be multiple telemetry event aggregators
per package.

There are separate sets of aggregators for each feature type. Aggregators
in a set may have different guids. All aggregators with the same feature
type and guid are symmetric keeping counts for the same set of events for
the CPUs that provide data to them.

The XML file for each aggregator provides the following information:
0) Feature type of the events ("perf" or "energy")
1) Which telemetry events are tracked by the aggregator.
2) The order in which the event counters appear for each RMID.
3) The value type of each event counter (integer or fixed-point).
4) The number of RMIDs supported.
5) Which additional aggregator status registers are included.
6) The total size of the MMIO region for an aggregator.

Introduce struct event_group that condenses the relevant information from
an XML file. Hereafter an "event group" refers to a group of events of a
particular feature type (event_group::pfname set to "energy" or "perf") with
a particular guid.

Use event_group::pfname to determine the feature id needed to obtain the
aggregator details. It will later be used in console messages and with the
rdt= boot parameter.

The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events.
This driver provides intel_pmt_get_regions_by_feature() to list all available
telemetry event aggregators of a given feature type. The list includes the
"guid", the base address in MMIO space for the region where the event counters
are exposed, and the package id where the all the CPUs that report to this
aggregator are located.

Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event
group to obtain a private copy of that event group's aggregator data. Duplicate
the aggregator data between event groups that have the same feature type
but different guid. Further processing on this private copy will be unique
to the event group.

Return the aggregator data to INTEL_PMT_TELEMETRY at resctrl exit time.

resctrl will silently ignore unknown guid values.

Add a new Kconfig option CONFIG_X86_CPU_RESCTRL_INTEL_AET for the Intel specific
parts of telemetry code. This depends on the INTEL_PMT_TELEMETRY and INTEL_TPMI
drivers being built-in to the kernel for enumeration of telemetry features.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://github.com/intel/Intel-PMT # [1]
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h  |   8 ++
 arch/x86/kernel/cpu/resctrl/core.c      |   5 ++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 109 ++++++++++++++++++++++++
 arch/x86/Kconfig                        |  13 +++
 arch/x86/kernel/cpu/resctrl/Makefile    |   1 +
 5 files changed, 136 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 11d06995810e..f2e6e3577df0 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -222,4 +222,12 @@ void __init intel_rdt_mbm_apply_quirk(void);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 
+#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
+bool intel_aet_get_events(void);
+void __exit intel_aet_exit(void);
+#else
+static inline bool intel_aet_get_events(void) { return false; }
+static inline void __exit intel_aet_exit(void) { }
+#endif
+
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 986b1303efb9..88be77d5d20d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -743,6 +743,9 @@ void resctrl_arch_pre_mount(void)
 
 	if (!atomic_try_cmpxchg(&only_once, &old, 1))
 		return;
+
+	if (!intel_aet_get_events())
+		return;
 }
 
 enum {
@@ -1104,6 +1107,8 @@ late_initcall(resctrl_arch_late_init);
 
 static void __exit resctrl_arch_exit(void)
 {
+	intel_aet_exit();
+
 	cpuhp_remove_state(rdt_online);
 
 	resctrl_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
new file mode 100644
index 000000000000..6e0d063cfc80
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -0,0 +1,109 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Resource Director Technology(RDT)
+ * - Intel Application Energy Telemetry
+ *
+ * Copyright (C) 2025 Intel Corporation
+ *
+ * Author:
+ *    Tony Luck <tony.luck@intel.com>
+ */
+
+#define pr_fmt(fmt)   "resctrl: " fmt
+
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/intel_pmt_features.h>
+#include <linux/intel_vsec.h>
+#include <linux/resctrl.h>
+#include <linux/stddef.h>
+
+#include "internal.h"
+
+/**
+ * struct event_group - Events with the same feature type ("energy" or "perf") and guid.
+ * @pfname:		PMT feature name ("energy" or "perf") of this event group.
+ * @pfg:		Points to the aggregated telemetry space information
+ *			returned by the intel_pmt_get_regions_by_feature()
+ *			call to the INTEL_PMT_TELEMETRY driver that contains
+ *			data for all telemetry regions of type @pfname.
+ *			Valid if the system supports the event group,
+ *			NULL otherwise.
+ */
+struct event_group {
+	/* Data fields for additional structures to manage this group. */
+	const char			*pfname;
+	struct pmt_feature_group	*pfg;
+};
+
+static struct event_group *known_event_groups[] = {
+};
+
+#define for_each_event_group(_peg)						\
+	for (_peg = known_event_groups;						\
+	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
+	     _peg++)
+
+/* Stub for now */
+static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
+{
+	return false;
+}
+
+static enum pmt_feature_id lookup_pfid(const char *pfname)
+{
+	if (!strcmp(pfname, "energy"))
+		return FEATURE_PER_RMID_ENERGY_TELEM;
+	else if (!strcmp(pfname, "perf"))
+		return FEATURE_PER_RMID_PERF_TELEM;
+
+	pr_warn("Unknown PMT feature name '%s'\n", pfname);
+
+	return FEATURE_INVALID;
+}
+
+/*
+ * Request a copy of struct pmt_feature_group for each event group. If there is
+ * one, the returned structure has an array of telemetry_region structures,
+ * each element of the array describes one telemetry aggregator. The
+ * telemetry aggregators may have different guids so obtain duplicate struct
+ * pmt_feature_group for event groups with same feature type but different
+ * guid. Post-processing ensures an event group can only use the telemetry
+ * aggregators that match its guid. An event group keeps a pointer to its
+ * struct pmt_feature_group to indicate that its events are successfully
+ * enabled.
+ */
+bool intel_aet_get_events(void)
+{
+	struct pmt_feature_group *p;
+	enum pmt_feature_id pfid;
+	struct event_group **peg;
+	bool ret = false;
+
+	for_each_event_group(peg) {
+		pfid = lookup_pfid((*peg)->pfname);
+		p = intel_pmt_get_regions_by_feature(pfid);
+		if (IS_ERR_OR_NULL(p))
+			continue;
+		if (enable_events(*peg, p)) {
+			(*peg)->pfg = p;
+			ret = true;
+		} else {
+			intel_pmt_put_feature_group(p);
+		}
+	}
+
+	return ret;
+}
+
+void __exit intel_aet_exit(void)
+{
+	struct event_group **peg;
+
+	for_each_event_group(peg) {
+		if ((*peg)->pfg) {
+			intel_pmt_put_feature_group((*peg)->pfg);
+			(*peg)->pfg = NULL;
+		}
+	}
+}
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 80527299f859..0d874a92daed 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -540,6 +540,19 @@ config X86_CPU_RESCTRL
 
 	  Say N if unsure.
 
+config X86_CPU_RESCTRL_INTEL_AET
+	bool "Intel Application Energy Telemetry"
+	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+	help
+	  Enable per-RMID telemetry events in resctrl.
+
+	  Intel feature that collects per-RMID execution data
+	  about energy consumption, measure of frequency independent
+	  activity and other performance metrics. Data is aggregated
+	  per package.
+
+	  Say N if unsure.
+
 config X86_FRED
 	bool "Flexible Return and Event Delivery"
 	depends on X86_64
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index d8a04b195da2..273ddfa30836 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
+obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
 
 # To allow define_trace.h's recursive include:
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 17/32] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (15 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 16/32] x86/resctrl: Discover hardware telemetry events Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 18/32] x86,fs/resctrl: Add architectural event pointer Tony Luck
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The telemetry event aggregators of the Intel Clearwater Forest CPU support
two RMID-based feature types: "energy" with guid 0x26696143 [1], and "perf"
with guid 0x26557651 [2].

The event counter offsets in an aggregator's MMIO space are arranged in
groups for each RMID.

E.g the "energy" counters for guid 0x26696143 are arranged like this:

        MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY
        MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY
        MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY
        MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY
        ...
        MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY
        MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY

After all counters there are three status registers that provide indications
of how many times an aggregator was unable to process event counts, the time
stamp for the most recent loss of data, and the time stamp of the most recent
successful update.

	MMIO offset:0x2400 AGG_DATA_LOSS_COUNT
	MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP
	MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP

Define event_group structures for both of these aggregator types and define the
events tracked by the aggregators in the file system code.

PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point
format. File system code must output as floating point values.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml # [1]
Link: https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml # [2]
---
 include/linux/resctrl_types.h           | 11 +++++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 66 +++++++++++++++++++++++++
 fs/resctrl/monitor.c                    | 35 +++++++------
 3 files changed, 97 insertions(+), 15 deletions(-)

diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index acfe07860b34..a5f56faa18d2 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -50,6 +50,17 @@ enum resctrl_event_id {
 	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
 	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
 
+	/* Intel Telemetry Events */
+	PMT_EVENT_ENERGY,
+	PMT_EVENT_ACTIVITY,
+	PMT_EVENT_STALLS_LLC_HIT,
+	PMT_EVENT_C1_RES,
+	PMT_EVENT_UNHALTED_CORE_CYCLES,
+	PMT_EVENT_STALLS_LLC_MISS,
+	PMT_EVENT_AUTO_C6_RES,
+	PMT_EVENT_UNHALTED_REF_CYCLES,
+	PMT_EVENT_UOPS_RETIRED,
+
 	/* Must be the last */
 	QOS_NUM_EVENTS,
 };
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 6e0d063cfc80..c7d08eb26395 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -11,15 +11,33 @@
 
 #define pr_fmt(fmt)   "resctrl: " fmt
 
+#include <linux/compiler_types.h>
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
 #include <linux/resctrl.h>
+#include <linux/resctrl_types.h>
 #include <linux/stddef.h>
+#include <linux/types.h>
 
 #include "internal.h"
 
+/**
+ * struct pmt_event - Telemetry event.
+ * @id:		Resctrl event id.
+ * @idx:	Counter index within each per-RMID block of counters.
+ * @bin_bits:	Zero for integer valued events, else number bits in fraction
+ *		part of fixed-point.
+ */
+struct pmt_event {
+	enum resctrl_event_id	id;
+	unsigned int		idx;
+	unsigned int		bin_bits;
+};
+
+#define EVT(_id, _idx, _bits) { .id = _id, .idx = _idx, .bin_bits = _bits }
+
 /**
  * struct event_group - Events with the same feature type ("energy" or "perf") and guid.
  * @pfname:		PMT feature name ("energy" or "perf") of this event group.
@@ -29,14 +47,62 @@
  *			data for all telemetry regions of type @pfname.
  *			Valid if the system supports the event group,
  *			NULL otherwise.
+ * @guid:		Unique number per XML description file.
+ * @mmio_size:		Number of bytes of MMIO registers for this group.
+ * @num_events:		Number of events in this group.
+ * @evts:		Array of event descriptors.
  */
 struct event_group {
 	/* Data fields for additional structures to manage this group. */
 	const char			*pfname;
 	struct pmt_feature_group	*pfg;
+
+	/* Remaining fields initialized from XML file. */
+	u32				guid;
+	size_t				mmio_size;
+	unsigned int			num_events;
+	struct pmt_event		evts[] __counted_by(num_events);
+};
+
+#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status) \
+		      (((num_rmids) * (num_events) + (num_extra_status)) * sizeof(u64))
+
+/*
+ * Link: https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
+ */
+static struct event_group energy_0x26696143 = {
+	.pfname		= "energy",
+	.guid		= 0x26696143,
+	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
+	.num_events	= 2,
+	.evts		= {
+		EVT(PMT_EVENT_ENERGY, 0, 18),
+		EVT(PMT_EVENT_ACTIVITY, 1, 18),
+	}
+};
+
+/*
+ * Link: https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
+ */
+static struct event_group perf_0x26557651 = {
+	.pfname		= "perf",
+	.guid		= 0x26557651,
+	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
+	.num_events	= 7,
+	.evts		= {
+		EVT(PMT_EVENT_STALLS_LLC_HIT, 0, 0),
+		EVT(PMT_EVENT_C1_RES, 1, 0),
+		EVT(PMT_EVENT_UNHALTED_CORE_CYCLES, 2, 0),
+		EVT(PMT_EVENT_STALLS_LLC_MISS, 3, 0),
+		EVT(PMT_EVENT_AUTO_C6_RES, 4, 0),
+		EVT(PMT_EVENT_UNHALTED_REF_CYCLES, 5, 0),
+		EVT(PMT_EVENT_UOPS_RETIRED, 6, 0),
+	}
 };
 
 static struct event_group *known_event_groups[] = {
+	&energy_0x26696143,
+	&perf_0x26557651,
 };
 
 #define for_each_event_group(_peg)						\
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 844cf6875f60..9729acacdc19 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -965,27 +965,32 @@ static void dom_data_exit(struct rdt_resource *r)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
+#define MON_EVENT(_eventid, _name, _res, _fp)	\
+	[_eventid] = {				\
+	.name			= _name,	\
+	.evtid			= _eventid,	\
+	.rid			= _res,		\
+	.is_floating_point	= _fp,		\
+}
+
 /*
  * All available events. Architecture code marks the ones that
  * are supported by a system using resctrl_enable_mon_event()
  * to set .enabled.
  */
 struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
-	[QOS_L3_OCCUP_EVENT_ID] = {
-		.name	= "llc_occupancy",
-		.evtid	= QOS_L3_OCCUP_EVENT_ID,
-		.rid	= RDT_RESOURCE_L3,
-	},
-	[QOS_L3_MBM_TOTAL_EVENT_ID] = {
-		.name	= "mbm_total_bytes",
-		.evtid	= QOS_L3_MBM_TOTAL_EVENT_ID,
-		.rid	= RDT_RESOURCE_L3,
-	},
-	[QOS_L3_MBM_LOCAL_EVENT_ID] = {
-		.name	= "mbm_local_bytes",
-		.evtid	= QOS_L3_MBM_LOCAL_EVENT_ID,
-		.rid	= RDT_RESOURCE_L3,
-	},
+	MON_EVENT(QOS_L3_OCCUP_EVENT_ID,		"llc_occupancy",	RDT_RESOURCE_L3,	false),
+	MON_EVENT(QOS_L3_MBM_TOTAL_EVENT_ID,		"mbm_total_bytes",	RDT_RESOURCE_L3,	false),
+	MON_EVENT(QOS_L3_MBM_LOCAL_EVENT_ID,		"mbm_local_bytes",	RDT_RESOURCE_L3,	false),
+	MON_EVENT(PMT_EVENT_ENERGY,			"core_energy",		RDT_RESOURCE_PERF_PKG,	true),
+	MON_EVENT(PMT_EVENT_ACTIVITY,			"activity",		RDT_RESOURCE_PERF_PKG,	true),
+	MON_EVENT(PMT_EVENT_STALLS_LLC_HIT,		"stalls_llc_hit",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_C1_RES,			"c1_res",		RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_UNHALTED_CORE_CYCLES,	"unhalted_core_cycles",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_STALLS_LLC_MISS,		"stalls_llc_miss",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_AUTO_C6_RES,		"c6_res",		RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_UNHALTED_REF_CYCLES,	"unhalted_ref_cycles",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_UOPS_RETIRED,		"uops_retired",		RDT_RESOURCE_PERF_PKG,	false),
 };
 
 void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 18/32] x86,fs/resctrl: Add architectural event pointer
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (16 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 17/32] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events Tony Luck
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The resctrl file system layer passes the domain, RMID, and event id to the
architecture to fetch an event counter.

Fetching a telemetry event counter requires additional information that is
private to the architecture, for example, the offset into MMIO space from
where the counter should be read.

Add mon_evt::arch_priv that architecture can use for any private data related
to the event. resctrl filesystem initializes mon_evt::arch_priv when the
architecture enables the event and passes it back to architecture when
needing to fetch an event counter.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h               |  7 +++++--
 fs/resctrl/internal.h                 |  4 ++++
 arch/x86/kernel/cpu/resctrl/core.c    |  6 +++---
 arch/x86/kernel/cpu/resctrl/monitor.c |  2 +-
 fs/resctrl/monitor.c                  | 14 ++++++++++----
 5 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 2a3613f27274..b30f99335bbe 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -415,7 +415,7 @@ u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
 void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
-			      unsigned int binary_bits);
+			      unsigned int binary_bits, void *arch_priv);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
@@ -532,6 +532,9 @@ void resctrl_arch_pre_mount(void);
  *			only.
  * @rmid:		rmid of the counter to read.
  * @eventid:		eventid to read, e.g. L3 occupancy.
+ * @arch_priv:		Architecture private data for this event.
+ *			The @arch_priv provided by the architecture via
+ *			resctrl_enable_mon_event().
  * @val:		result of the counter read in bytes.
  * @arch_mon_ctx:	An architecture specific value from
  *			resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
@@ -549,7 +552,7 @@ void resctrl_arch_pre_mount(void);
  */
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *arch_mon_ctx);
+			   void *arch_priv, u64 *val, void *arch_mon_ctx);
 
 /**
  * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 50d88e91e0da..399f625be67d 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -66,6 +66,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  * @binary_bits:	number of fixed-point binary bits from architecture,
  *			only valid if @is_floating_point is true
  * @enabled:		true if the event is enabled
+ * @arch_priv:		Architecture private data for this event.
+ *			The @arch_priv provided by the architecture via
+ *			resctrl_enable_mon_event().
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
@@ -77,6 +80,7 @@ struct mon_evt {
 	bool			is_floating_point;
 	unsigned int		binary_bits;
 	bool			enabled;
+	void			*arch_priv;
 };
 
 extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 88be77d5d20d..3c6946a5ff1b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -924,15 +924,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_ABMC))
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 20605212656c..6929614ba6e6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -240,7 +240,7 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *ignored)
+			   void *arch_priv, u64 *val, void *ignored)
 {
 	struct rdt_hw_l3_mon_domain *hw_dom;
 	struct rdt_l3_mon_domain *d;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9729acacdc19..af43a33ce4cb 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -137,9 +137,11 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 	struct rmid_entry *entry;
 	u32 idx, cur_idx = 1;
 	void *arch_mon_ctx;
+	void *arch_priv;
 	bool rmid_dirty;
 	u64 val = 0;
 
+	arch_priv = mon_event_all[QOS_L3_OCCUP_EVENT_ID].arch_priv;
 	arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, QOS_L3_OCCUP_EVENT_ID);
 	if (IS_ERR(arch_mon_ctx)) {
 		pr_warn_ratelimited("Failed to allocate monitor context: %ld",
@@ -160,7 +162,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 
 		entry = __rmid_entry(idx);
 		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, &val,
+					   QOS_L3_OCCUP_EVENT_ID, arch_priv, &val,
 					   arch_mon_ctx)) {
 			rmid_dirty = true;
 		} else {
@@ -456,7 +458,8 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 						 rr->evt->evtid, &tval);
 	else
 		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
-						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
+						 rr->evt->evtid, rr->evt->arch_priv,
+						 &tval, rr->arch_mon_ctx);
 	if (rr->err)
 		return rr->err;
 
@@ -501,7 +504,8 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
 		if (d->ci_id != rr->ci->id)
 			continue;
 		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
-					     rr->evt->evtid, &tval, rr->arch_mon_ctx);
+					     rr->evt->evtid, rr->evt->arch_priv,
+					     &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
 			ret = 0;
@@ -993,7 +997,8 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	MON_EVENT(PMT_EVENT_UOPS_RETIRED,		"uops_retired",		RDT_RESOURCE_PERF_PKG,	false),
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+			      unsigned int binary_bits, void *arch_priv)
 {
 	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
 			 binary_bits > MAX_BINARY_BITS))
@@ -1009,6 +1014,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsig
 
 	mon_event_all[eventid].any_cpu = any_cpu;
 	mon_event_all[eventid].binary_bits = binary_bits;
+	mon_event_all[eventid].arch_priv = arch_priv;
 	mon_event_all[eventid].enabled = true;
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (17 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 18/32] x86,fs/resctrl: Add architectural event pointer Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2026-01-09 12:16   ` Borislav Petkov
  2025-12-17 17:21 ` [PATCH v17 20/32] x86/resctrl: Read " Tony Luck
                   ` (13 subsequent siblings)
  32 siblings, 1 reply; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Every event group has a private copy of the data of all telemetry event
aggregators (aka "telemetry regions") tracking its feature type. Included
may be regions that have the same feature type but tracking different guid
from the event group's.

Traverse the event group's telemetry region data and mark all regions that
are not usable by the event group as unusable by clearing those regions'
MMIO addresses. A region is considered unusable if:
1) guid does not match the guid of the event group.
2) Package ID is invalid.
3) The enumerated size of the MMIO region does not match the expected
   value from the XML description file.

Hereafter any telemetry region with an MMIO address is considered valid for
the event group it is associated with.

Enable all the event group's events as long as there is at least one usable
region from where data for its events can be read. Enabling of an event can
fail if the same event has already been enabled as part of another event
group. It should never happen that the same event is described by different
guid supported by the same system so just WARN (via resctrl_enable_mon_event())
and skip the event.

Note that it is architecturally possible that some telemetry events are
only supported by a subset of the packages in the system. It is not expected
that systems will ever do this. If they do the user will see event files in
resctrl that always return "Unavailable".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h                 |  2 +-
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 67 ++++++++++++++++++++++++-
 fs/resctrl/monitor.c                    | 10 ++--
 3 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b30f99335bbe..14126d228e61 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -414,7 +414,7 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
 			      unsigned int binary_bits, void *arch_priv);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index c7d08eb26395..611c6b1fc08d 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -16,9 +16,11 @@
 #include <linux/init.h>
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
+#include <linux/printk.h>
 #include <linux/resctrl.h>
 #include <linux/resctrl_types.h>
 #include <linux/stddef.h>
+#include <linux/topology.h>
 #include <linux/types.h>
 
 #include "internal.h"
@@ -110,12 +112,73 @@ static struct event_group *known_event_groups[] = {
 	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
 	     _peg++)
 
-/* Stub for now */
-static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
+/*
+ * Clear the address field of regions that did not pass the checks in
+ * skip_telem_region() so they will not be used by intel_aet_read_event().
+ * This is safe to do because intel_pmt_get_regions_by_feature() allocates
+ * a new pmt_feature_group structure to return to each caller and only makes
+ * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
+ * returns the structure.
+ */
+static void mark_telem_region_unusable(struct telemetry_region *tr)
 {
+	tr->addr = NULL;
+}
+
+static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
+{
+	if (tr->guid != e->guid)
+		return true;
+	if (tr->plat_info.package_id >= topology_max_packages()) {
+		pr_warn("Bad package %u in guid 0x%x\n", tr->plat_info.package_id,
+			tr->guid);
+		return true;
+	}
+	if (tr->size != e->mmio_size) {
+		pr_warn("MMIO space wrong size (%zu bytes) for guid 0x%x. Expected %zu bytes.\n",
+			tr->size, e->guid, e->mmio_size);
+		return true;
+	}
+
 	return false;
 }
 
+static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_group *p)
+{
+	bool usable_regions = false;
+
+	for (int i = 0; i < p->count; i++) {
+		if (skip_telem_region(&p->regions[i], e)) {
+			mark_telem_region_unusable(&p->regions[i]);
+			continue;
+		}
+		usable_regions = true;
+	}
+
+	return usable_regions;
+}
+
+static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+	int skipped_events = 0;
+
+	if (!group_has_usable_regions(e, p))
+		return false;
+
+	for (int j = 0; j < e->num_events; j++) {
+		if (!resctrl_enable_mon_event(e->evts[j].id, true,
+					      e->evts[j].bin_bits, &e->evts[j]))
+			skipped_events++;
+	}
+	if (e->num_events == skipped_events) {
+		pr_info("No events enabled in %s %s:0x%x\n", r->name, e->pfname, e->guid);
+		return false;
+	}
+
+	return true;
+}
+
 static enum pmt_feature_id lookup_pfid(const char *pfname)
 {
 	if (!strcmp(pfname, "energy"))
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index af43a33ce4cb..9af08b673e39 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -997,25 +997,27 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	MON_EVENT(PMT_EVENT_UOPS_RETIRED,		"uops_retired",		RDT_RESOURCE_PERF_PKG,	false),
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
 			      unsigned int binary_bits, void *arch_priv)
 {
 	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
 			 binary_bits > MAX_BINARY_BITS))
-		return;
+		return false;
 	if (mon_event_all[eventid].enabled) {
 		pr_warn("Duplicate enable for event %d\n", eventid);
-		return;
+		return false;
 	}
 	if (binary_bits && !mon_event_all[eventid].is_floating_point) {
 		pr_warn("Event %d may not be floating point\n", eventid);
-		return;
+		return false;
 	}
 
 	mon_event_all[eventid].any_cpu = any_cpu;
 	mon_event_all[eventid].binary_bits = binary_bits;
 	mon_event_all[eventid].arch_priv = arch_priv;
 	mon_event_all[eventid].enabled = true;
+
+	return true;
 }
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events
  2025-12-17 17:21 ` [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events Tony Luck
@ 2026-01-09 12:16   ` Borislav Petkov
  2026-01-09 16:17     ` Reinette Chatre
  2026-01-09 16:53     ` Luck, Tony
  0 siblings, 2 replies; 58+ messages in thread
From: Borislav Petkov @ 2026-01-09 12:16 UTC (permalink / raw)
  To: Tony Luck
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Wed, Dec 17, 2025 at 09:21:06AM -0800, Tony Luck wrote:
> -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> +/*
> + * Clear the address field of regions that did not pass the checks in
> + * skip_telem_region() so they will not be used by intel_aet_read_event().
> + * This is safe to do because intel_pmt_get_regions_by_feature() allocates
> + * a new pmt_feature_group structure to return to each caller and only makes
> + * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
> + * returns the structure.
> + */
> +static void mark_telem_region_unusable(struct telemetry_region *tr)
>  {
> +	tr->addr = NULL;
> +}

We probably don't really need such a silly helper which is used only once and,
AFAICT, doesn't grow any other functionality by the end of the patchset:

diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 4074fd43830e..7d0bd7b070a7 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -112,19 +112,6 @@ static struct event_group *known_event_groups[] = {
 	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
 	     _peg++)
 
-/*
- * Clear the address field of regions that did not pass the checks in
- * skip_telem_region() so they will not be used by intel_aet_read_event().
- * This is safe to do because intel_pmt_get_regions_by_feature() allocates
- * a new pmt_feature_group structure to return to each caller and only makes
- * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
- * returns the structure.
- */
-static void mark_telem_region_unusable(struct telemetry_region *tr)
-{
-	tr->addr = NULL;
-}
-
 static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
 {
 	if (tr->guid != e->guid)
@@ -149,7 +136,16 @@ static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_g
 
 	for (int i = 0; i < p->count; i++) {
 		if (skip_telem_region(&p->regions[i], e)) {
-			mark_telem_region_unusable(&p->regions[i]);
+			/*
+			 * Clear the address field of regions that did not pass the checks in
+			 * skip_telem_region() so they will not be used by intel_aet_read_event().
+			 * This is safe to do because intel_pmt_get_regions_by_feature() allocates
+			 * a new pmt_feature_group structure to return to each caller and only makes
+			 * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
+			 * returns the structure.
+			 */
+			p->regions[i].addr = NULL;
+
 			continue;
 		}
 		usable_regions = true;


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events
  2026-01-09 12:16   ` Borislav Petkov
@ 2026-01-09 16:17     ` Reinette Chatre
  2026-01-09 16:53     ` Luck, Tony
  1 sibling, 0 replies; 58+ messages in thread
From: Reinette Chatre @ 2026-01-09 16:17 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86, linux-kernel,
	patches

Hi Boris,

On 1/9/26 4:16 AM, Borislav Petkov wrote:
> On Wed, Dec 17, 2025 at 09:21:06AM -0800, Tony Luck wrote:
>> -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
>> +/*
>> + * Clear the address field of regions that did not pass the checks in
>> + * skip_telem_region() so they will not be used by intel_aet_read_event().
>> + * This is safe to do because intel_pmt_get_regions_by_feature() allocates
>> + * a new pmt_feature_group structure to return to each caller and only makes
>> + * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
>> + * returns the structure.
>> + */
>> +static void mark_telem_region_unusable(struct telemetry_region *tr)
>>  {
>> +	tr->addr = NULL;
>> +}
> 
> We probably don't really need such a silly helper which is used only once and,
> AFAICT, doesn't grow any other functionality by the end of the patchset:
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 4074fd43830e..7d0bd7b070a7 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -112,19 +112,6 @@ static struct event_group *known_event_groups[] = {
>  	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
>  	     _peg++)
>  
> -/*
> - * Clear the address field of regions that did not pass the checks in
> - * skip_telem_region() so they will not be used by intel_aet_read_event().
> - * This is safe to do because intel_pmt_get_regions_by_feature() allocates
> - * a new pmt_feature_group structure to return to each caller and only makes
> - * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
> - * returns the structure.
> - */
> -static void mark_telem_region_unusable(struct telemetry_region *tr)
> -{
> -	tr->addr = NULL;
> -}
> -
>  static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e)
>  {
>  	if (tr->guid != e->guid)
> @@ -149,7 +136,16 @@ static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_g
>  
>  	for (int i = 0; i < p->count; i++) {
>  		if (skip_telem_region(&p->regions[i], e)) {
> -			mark_telem_region_unusable(&p->regions[i]);
> +			/*
> +			 * Clear the address field of regions that did not pass the checks in
> +			 * skip_telem_region() so they will not be used by intel_aet_read_event().
> +			 * This is safe to do because intel_pmt_get_regions_by_feature() allocates
> +			 * a new pmt_feature_group structure to return to each caller and only makes
> +			 * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
> +			 * returns the structure.
> +			 */
> +			p->regions[i].addr = NULL;
> +
>  			continue;
>  		}
>  		usable_regions = true;
> 
> 

Looks good to me. Thank you very much.

Reinette


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events
  2026-01-09 12:16   ` Borislav Petkov
  2026-01-09 16:17     ` Reinette Chatre
@ 2026-01-09 16:53     ` Luck, Tony
  2026-01-09 22:01       ` Borislav Petkov
  1 sibling, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-09 16:53 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches, Ilpo Järvinen

On Fri, Jan 09, 2026 at 01:16:16PM +0100, Borislav Petkov wrote:
> On Wed, Dec 17, 2025 at 09:21:06AM -0800, Tony Luck wrote:
> > -static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
> > +/*
> > + * Clear the address field of regions that did not pass the checks in
> > + * skip_telem_region() so they will not be used by intel_aet_read_event().
> > + * This is safe to do because intel_pmt_get_regions_by_feature() allocates
> > + * a new pmt_feature_group structure to return to each caller and only makes
> > + * use of the pmt_feature_group::kref field when intel_pmt_put_feature_group()
> > + * returns the structure.
> > + */
> > +static void mark_telem_region_unusable(struct telemetry_region *tr)
> >  {
> > +	tr->addr = NULL;
> > +}
> 
> We probably don't really need such a silly helper which is used only once and,
> AFAICT, doesn't grow any other functionality by the end of the patchset:

This was made a separate function in response to a comment against the
v9 series from Ilpo:

	As this is at least semi-hacky, I suggest you move it into own function 
	and add a bit longer comment to the function (along the lines what the 
	changelog also states why it works).

Link: https://lore.kernel.org/all/3b0546d4-d0bc-f76e-e1c2-eef2b4abf0f1@linux.intel.com/

But at that point the "p->regions[i].addr = NULL;" was part of a much
larger function. Since then refactoring into various helpers means that
it now looks OK to move it inline. The comment about why it is safe to
update a structure that was provided by Intel-PMT driver is the
important bit, and I see that you preserved that.

So LGTM

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events
  2026-01-09 16:53     ` Luck, Tony
@ 2026-01-09 22:01       ` Borislav Petkov
  0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2026-01-09 22:01 UTC (permalink / raw)
  To: Luck, Tony, Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86, linux-kernel,
	patches, Ilpo Järvinen

On Fri, Jan 09, 2026 at 08:53:09AM -0800, Luck, Tony wrote:
> This was made a separate function in response to a comment against the
> v9 series from Ilpo:
> 
> 	As this is at least semi-hacky, I suggest you move it into own function 
> 	and add a bit longer comment to the function (along the lines what the 
> 	changelog also states why it works).
> 
> Link: https://lore.kernel.org/all/3b0546d4-d0bc-f76e-e1c2-eef2b4abf0f1@linux.intel.com/
> 
> But at that point the "p->regions[i].addr = NULL;" was part of a much
> larger function. Since then refactoring into various helpers means that
> it now looks OK to move it inline. The comment about why it is safe to
> update a structure that was provided by Intel-PMT driver is the
> important bit, and I see that you preserved that.
> 
> So LGTM

Thank you both!

/me folds in.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v17 20/32] x86/resctrl: Read telemetry events
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (18 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 21/32] fs/resctrl: Refactor mkdir_mondata_subdir() Tony Luck
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Introduce intel_aet_read_event() to read telemetry events for resource
RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each package,
so scan all of them and add up all counters. Aggregators may return an invalid
data indication if they have received no records for a given RMID. User will
see "Unavailable" if none of the aggregators on a package provide valid counts.

Resctrl now uses readq() so depends on X86_64. Update Kconfig.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h  |  5 +++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 51 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c   |  4 ++
 fs/resctrl/monitor.c                    | 14 +++++++
 arch/x86/Kconfig                        |  2 +-
 5 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f2e6e3577df0..10743f5d5fd4 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -225,9 +225,14 @@ void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 #ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
+int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
 #else
 static inline bool intel_aet_get_events(void) { return false; }
 static inline void __exit intel_aet_exit(void) { }
+static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 611c6b1fc08d..38985613d4be 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -11,11 +11,15 @@
 
 #define pr_fmt(fmt)   "resctrl: " fmt
 
+#include <linux/bits.h>
 #include <linux/compiler_types.h>
+#include <linux/container_of.h>
 #include <linux/err.h>
+#include <linux/errno.h>
 #include <linux/init.h>
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
+#include <linux/io.h>
 #include <linux/printk.h>
 #include <linux/resctrl.h>
 #include <linux/resctrl_types.h>
@@ -236,3 +240,50 @@ void __exit intel_aet_exit(void)
 		}
 	}
 }
+
+#define DATA_VALID	BIT_ULL(63)
+#define DATA_BITS	GENMASK_ULL(62, 0)
+
+/*
+ * Read counter for an event on a domain (summing all aggregators on the
+ * domain). If an aggregator hasn't received any data for a specific RMID,
+ * the MMIO read indicates that data is not valid.  Return success if at
+ * least one aggregator has valid data.
+ */
+int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
+{
+	struct pmt_event *pevt = arch_priv;
+	struct event_group *e;
+	bool valid = false;
+	u64 total = 0;
+	u64 evtcount;
+	void *pevt0;
+	u32 idx;
+
+	pevt0 = pevt - pevt->idx;
+	e = container_of(pevt0, struct event_group, evts);
+	idx = rmid * e->num_events;
+	idx += pevt->idx;
+
+	if (idx * sizeof(u64) + sizeof(u64) > e->mmio_size) {
+		pr_warn_once("MMIO index %u out of range\n", idx);
+		return -EIO;
+	}
+
+	for (int i = 0; i < e->pfg->count; i++) {
+		if (!e->pfg->regions[i].addr)
+			continue;
+		if (e->pfg->regions[i].plat_info.package_id != domid)
+			continue;
+		evtcount = readq(e->pfg->regions[i].addr + idx * sizeof(u64));
+		if (!(evtcount & DATA_VALID))
+			continue;
+		total += evtcount & DATA_BITS;
+		valid = true;
+	}
+
+	if (valid)
+		*val = total;
+
+	return valid ? 0 : -EINVAL;
+}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6929614ba6e6..e6a154240b8d 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -251,6 +251,10 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	int ret;
 
 	resctrl_arch_rmid_read_context_check();
+
+	if (r->rid == RDT_RESOURCE_PERF_PKG)
+		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
+
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9af08b673e39..8a4c2ae72740 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -527,6 +527,20 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 			return __l3_mon_event_count(rdtgrp, rr);
 		else
 			return __l3_mon_event_count_sum(rdtgrp, rr);
+	case RDT_RESOURCE_PERF_PKG: {
+		u64 tval = 0;
+
+		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, rdtgrp->closid,
+						 rdtgrp->mon.rmid, rr->evt->evtid,
+						 rr->evt->arch_priv,
+						 &tval, rr->arch_mon_ctx);
+		if (rr->err)
+			return rr->err;
+
+		rr->val += tval;
+
+		return 0;
+	}
 	default:
 		rr->err = -EINVAL;
 		return -EINVAL;
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0d874a92daed..c25ea0f139ca 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -542,7 +542,7 @@ config X86_CPU_RESCTRL
 
 config X86_CPU_RESCTRL_INTEL_AET
 	bool "Intel Application Energy Telemetry"
-	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
 	help
 	  Enable per-RMID telemetry events in resctrl.
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 21/32] fs/resctrl: Refactor mkdir_mondata_subdir()
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (19 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 20/32] x86/resctrl: Read " Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 22/32] fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() Tony Luck
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Population of a monitor group's mon_data directory is unreasonably complicated
because of the support for Sub-NUMA Cluster (SNC) mode.

Split out the SNC code into a helper function to make it easier to add support
for a new telemetry resource.

Move all the duplicated code to make and set owner of domain directories
into the mon_add_all_files() helper and rename to _mkdir_mondata_subdir().

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/rdtgroup.c | 108 +++++++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 50 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index dce1f0a6d40b..c0db5d8999ee 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3259,57 +3259,65 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	}
 }
 
-static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
-			     struct rdt_resource *r, struct rdtgroup *prgrp,
-			     bool do_sum)
+/*
+ * Create a directory for a domain and populate it with monitor files. Create
+ * summing monitors when @hdr is NULL. No need to initialize summing monitors.
+ */
+static struct kernfs_node *_mkdir_mondata_subdir(struct kernfs_node *parent_kn, char *name,
+						 struct rdt_domain_hdr *hdr,
+						 struct rdt_resource *r,
+						 struct rdtgroup *prgrp, int domid)
 {
-	struct rdt_l3_mon_domain *d;
 	struct rmid_read rr = {0};
+	struct kernfs_node *kn;
 	struct mon_data *priv;
 	struct mon_evt *mevt;
-	int ret, domid;
+	int ret;
 
-	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
-		return -EINVAL;
+	kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp);
+	if (IS_ERR(kn))
+		return kn;
+
+	ret = rdtgroup_kn_set_ugid(kn);
+	if (ret)
+		goto out_destroy;
 
-	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	for_each_mon_event(mevt) {
 		if (mevt->rid != r->rid || !mevt->enabled)
 			continue;
-		domid = do_sum ? d->ci_id : d->hdr.id;
-		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
-		if (WARN_ON_ONCE(!priv))
-			return -EINVAL;
+		priv = mon_get_kn_priv(r->rid, domid, mevt, !hdr);
+		if (WARN_ON_ONCE(!priv)) {
+			ret = -EINVAL;
+			goto out_destroy;
+		}
 
 		ret = mon_addfile(kn, mevt->name, priv);
 		if (ret)
-			return ret;
+			goto out_destroy;
 
-		if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
+		if (hdr && resctrl_is_mbm_event(mevt->evtid))
 			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt, true);
 	}
 
-	return 0;
+	return kn;
+out_destroy:
+	kernfs_remove(kn);
+	return ERR_PTR(ret);
 }
 
-static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
-				struct rdt_domain_hdr *hdr,
-				struct rdt_resource *r, struct rdtgroup *prgrp)
+static int mkdir_mondata_subdir_snc(struct kernfs_node *parent_kn,
+				    struct rdt_domain_hdr *hdr,
+				    struct rdt_resource *r, struct rdtgroup *prgrp)
 {
-	struct kernfs_node *kn, *ckn;
+	struct kernfs_node *ckn, *kn;
 	struct rdt_l3_mon_domain *d;
 	char name[32];
-	bool snc_mode;
-	int ret = 0;
-
-	lockdep_assert_held(&rdtgroup_mutex);
 
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
-	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
+	sprintf(name, "mon_%s_%02d", r->name, d->ci_id);
 	kn = kernfs_find_and_get(parent_kn, name);
 	if (kn) {
 		/*
@@ -3318,41 +3326,41 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 		 */
 		kernfs_put(kn);
 	} else {
-		kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp);
+		kn = _mkdir_mondata_subdir(parent_kn, name, NULL, r, prgrp, d->ci_id);
 		if (IS_ERR(kn))
 			return PTR_ERR(kn);
+	}
 
-		ret = rdtgroup_kn_set_ugid(kn);
-		if (ret)
-			goto out_destroy;
-		ret = mon_add_all_files(kn, hdr, r, prgrp, snc_mode);
-		if (ret)
-			goto out_destroy;
+	sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id);
+	ckn = _mkdir_mondata_subdir(kn, name, hdr, r, prgrp, hdr->id);
+	if (IS_ERR(ckn)) {
+		kernfs_remove(kn);
+		return PTR_ERR(ckn);
 	}
 
-	if (snc_mode) {
-		sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id);
-		ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp);
-		if (IS_ERR(ckn)) {
-			ret = -EINVAL;
-			goto out_destroy;
-		}
+	kernfs_activate(kn);
+	return 0;
+}
 
-		ret = rdtgroup_kn_set_ugid(ckn);
-		if (ret)
-			goto out_destroy;
+static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
+				struct rdt_domain_hdr *hdr,
+				struct rdt_resource *r, struct rdtgroup *prgrp)
+{
+	struct kernfs_node *kn;
+	char name[32];
 
-		ret = mon_add_all_files(ckn, hdr, r, prgrp, false);
-		if (ret)
-			goto out_destroy;
-	}
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	if (r->rid == RDT_RESOURCE_L3 && r->mon_scope == RESCTRL_L3_NODE)
+		return mkdir_mondata_subdir_snc(parent_kn, hdr, r, prgrp);
+
+	sprintf(name, "mon_%s_%02d", r->name, hdr->id);
+	kn = _mkdir_mondata_subdir(parent_kn, name, hdr, r, prgrp, hdr->id);
+	if (IS_ERR(kn))
+		return PTR_ERR(kn);
 
 	kernfs_activate(kn);
 	return 0;
-
-out_destroy:
-	kernfs_remove(kn);
-	return ret;
 }
 
 /*
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 22/32] fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (20 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 21/32] fs/resctrl: Refactor mkdir_mondata_subdir() Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 23/32] x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Clearing a monitor group's mon_data directory is complicated because of the
support for Sub-NUMA Cluster (SNC) mode.

Refactor the SNC case into a helper function to make it easier to add support
for a new telemetry resource.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/rdtgroup.c | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index c0db5d8999ee..679247c43b1a 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3228,28 +3228,24 @@ static void mon_rmdir_one_subdir(struct kernfs_node *pkn, char *name, char *subn
 }
 
 /*
- * Remove all subdirectories of mon_data of ctrl_mon groups
- * and monitor groups for the given domain.
- * Remove files and directories containing "sum" of domain data
- * when last domain being summed is removed.
+ * Remove files and directories for one SNC node. If it is the last node
+ * sharing an L3 cache, then remove the upper level directory containing
+ * the "sum" files too.
  */
-static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-					   struct rdt_domain_hdr *hdr)
+static void rmdir_mondata_subdir_allrdtgrp_snc(struct rdt_resource *r,
+					       struct rdt_domain_hdr *hdr)
 {
 	struct rdtgroup *prgrp, *crgrp;
 	struct rdt_l3_mon_domain *d;
 	char subname[32];
-	bool snc_mode;
 	char name[32];
 
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return;
 
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
-	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : hdr->id);
-	if (snc_mode)
-		sprintf(subname, "mon_sub_%s_%02d", r->name, hdr->id);
+	sprintf(name, "mon_%s_%02d", r->name, d->ci_id);
+	sprintf(subname, "mon_sub_%s_%02d", r->name, hdr->id);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		mon_rmdir_one_subdir(prgrp->mon.mon_data_kn, name, subname);
@@ -3259,6 +3255,30 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	}
 }
 
+/*
+ * Remove all subdirectories of mon_data of ctrl_mon groups
+ * and monitor groups for the given domain.
+ */
+static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
+					   struct rdt_domain_hdr *hdr)
+{
+	struct rdtgroup *prgrp, *crgrp;
+	char name[32];
+
+	if (r->rid == RDT_RESOURCE_L3 && r->mon_scope == RESCTRL_L3_NODE) {
+		rmdir_mondata_subdir_allrdtgrp_snc(r, hdr);
+		return;
+	}
+
+	sprintf(name, "mon_%s_%02d", r->name, hdr->id);
+	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
+		kernfs_remove_by_name(prgrp->mon.mon_data_kn, name);
+
+		list_for_each_entry(crgrp, &prgrp->mon.crdtgrp_list, mon.crdtgrp_list)
+			kernfs_remove_by_name(crgrp->mon.mon_data_kn, name);
+	}
+}
+
 /*
  * Create a directory for a domain and populate it with monitor files. Create
  * summing monitors when @hdr is NULL. No need to initialize summing monitors.
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 23/32] x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (21 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 22/32] fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The L3 resource has several requirements for domains. There are per-domain
structures that hold the 64-bit values of counters, and elements to keep
track of the overflow and limbo threads.

None of these are needed for the PERF_PKG resource. The hardware counters
are wide enough that they do not wrap around for decades.

Define a new rdt_perf_pkg_mon_domain structure which just consists of the
standard rdt_domain_hdr to keep track of domain id and CPU mask.

Update resctrl_online_mon_domain() for RDT_RESOURCE_PERF_PKG. The only action
needed for this resource is to create and populate domain directories if a
domain is added while resctrl is mounted.

Similarly resctrl_offline_mon_domain() only needs to remove domain directories.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h  | 13 +++++++++++
 arch/x86/kernel/cpu/resctrl/core.c      | 17 +++++++++++++++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 29 +++++++++++++++++++++++++
 fs/resctrl/rdtgroup.c                   | 17 ++++++++++-----
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 10743f5d5fd4..3b228b241fb2 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -87,6 +87,14 @@ static inline struct rdt_hw_l3_mon_domain *resctrl_to_arch_mon_dom(struct rdt_l3
 	return container_of(r, struct rdt_hw_l3_mon_domain, d_resctrl);
 }
 
+/**
+ * struct rdt_perf_pkg_mon_domain - CPUs sharing an package scoped resctrl monitor resource
+ * @hdr:	common header for different domain types
+ */
+struct rdt_perf_pkg_mon_domain {
+	struct rdt_domain_hdr	hdr;
+};
+
 /**
  * struct msr_param - set a range of MSRs from a domain
  * @res:       The resource to use
@@ -226,6 +234,8 @@ void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
 int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
+void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
+				struct list_head *add_pos);
 #else
 static inline bool intel_aet_get_events(void) { return false; }
 static inline void __exit intel_aet_exit(void) { }
@@ -233,6 +243,9 @@ static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64
 {
 	return -EINVAL;
 }
+
+static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
+					      struct list_head *add_pos) { }
 #endif
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 3c6946a5ff1b..283d653002a2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -580,6 +580,10 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 		if (!hdr)
 			l3_mon_domain_setup(cpu, id, r, add_pos);
 		break;
+	case RDT_RESOURCE_PERF_PKG:
+		if (!hdr)
+			intel_aet_mon_domain_setup(cpu, id, r, add_pos);
+		break;
 	default:
 		pr_warn_once("Unknown resource rid=%d\n", r->rid);
 		break;
@@ -679,6 +683,19 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		l3_mon_domain_free(hw_dom);
 		break;
 	}
+	case RDT_RESOURCE_PERF_PKG: {
+		struct rdt_perf_pkg_mon_domain *pkgd;
+
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_PERF_PKG))
+			return;
+
+		pkgd = container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr);
+		resctrl_offline_mon_domain(r, hdr);
+		list_del_rcu(&hdr->list);
+		synchronize_rcu();
+		kfree(pkgd);
+		break;
+	}
 	default:
 		pr_warn_once("Unknown resource rid=%d\n", r->rid);
 		break;
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 38985613d4be..35761d6cd55b 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -14,15 +14,20 @@
 #include <linux/bits.h>
 #include <linux/compiler_types.h>
 #include <linux/container_of.h>
+#include <linux/cpumask.h>
 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/gfp_types.h>
 #include <linux/init.h>
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
 #include <linux/printk.h>
+#include <linux/rculist.h>
+#include <linux/rcupdate.h>
 #include <linux/resctrl.h>
 #include <linux/resctrl_types.h>
+#include <linux/slab.h>
 #include <linux/stddef.h>
 #include <linux/topology.h>
 #include <linux/types.h>
@@ -287,3 +292,27 @@ int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
 
 	return valid ? 0 : -EINVAL;
 }
+
+void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
+				struct list_head *add_pos)
+{
+	struct rdt_perf_pkg_mon_domain *d;
+	int err;
+
+	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
+	if (!d)
+		return;
+
+	d->hdr.id = id;
+	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = RDT_RESOURCE_PERF_PKG;
+	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+	list_add_tail_rcu(&d->hdr.list, add_pos);
+
+	err = resctrl_online_mon_domain(r, &d->hdr);
+	if (err) {
+		list_del_rcu(&d->hdr.list);
+		synchronize_rcu();
+		kfree(d);
+	}
+}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 679247c43b1a..ac3c6e44b7c5 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4307,11 +4307,6 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 
 	mutex_lock(&rdtgroup_mutex);
 
-	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
-		goto out_unlock;
-
-	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-
 	/*
 	 * If resctrl is mounted, remove all the
 	 * per domain monitor data directories.
@@ -4319,6 +4314,13 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 	if (resctrl_mounted && resctrl_arch_mon_capable())
 		rmdir_mondata_subdir_allrdtgrp(r, hdr);
 
+	if (r->rid != RDT_RESOURCE_L3)
+		goto out_unlock;
+
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		goto out_unlock;
+
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	if (resctrl_is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4415,6 +4417,9 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
 
 	mutex_lock(&rdtgroup_mutex);
 
+	if (r->rid != RDT_RESOURCE_L3)
+		goto mkdir;
+
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		goto out_unlock;
 
@@ -4432,6 +4437,8 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
 		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
 
+mkdir:
+	err = 0;
 	/*
 	 * If the filesystem is not mounted then only the default resource group
 	 * exists. Creation of its directories is deferred until mount time
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (22 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 23/32] x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2026-01-09 22:16   ` Borislav Petkov
  2025-12-17 17:21 ` [PATCH v17 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG Tony Luck
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Legacy resctrl features are enumerated by X86_FEATURE_* flags. These may be
overridden by quirks to disable features in the case of errata.  Users can
use kernel command line options to either disable a feature, or to force
enable a feature that was disabled by a quirk.

A different approach is needed for hardware features that do not have an
X86_FEATURE_* flag.

Update parsing of the "rdt=" boot parameter to call the telemetry driver
directly to handle new "perf" and "energy" options that controls activation
of telemetry monitoring of the named type. By itself a "perf" or "energy" option
controls the forced enabling or disabling (with ! prefix) of all event groups of
the named type. A ":guid" suffix allows for fine grain control per event group.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 +++-
 arch/x86/kernel/cpu/resctrl/internal.h        |  2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  2 +
 arch/x86/kernel/cpu/resctrl/intel_aet.c       | 38 +++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a8d0afde7f85..abd77f39c783 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6325,9 +6325,14 @@ Kernel parameters
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec, abmc, sdciae.
+			mba, smba, bmec, abmc, sdciae, energy[:guid],
+			perf[:guid].
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
+			To turn off all energy telemetry monitoring and ensure that
+			perf telemetry monitoring associated with guid 0x12345
+			is enabled use:
+				rdt=!energy,perf:0x12345
 
 	reboot=		[KNL]
 			Format (x86 or x86_64):
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3b228b241fb2..df09091f7c6c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -236,6 +236,7 @@ void __exit intel_aet_exit(void);
 int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
 void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 				struct list_head *add_pos);
+bool intel_aet_option(bool force_off, char *tok);
 #else
 static inline bool intel_aet_get_events(void) { return false; }
 static inline void __exit intel_aet_exit(void) { }
@@ -246,6 +247,7 @@ static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64
 
 static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 					      struct list_head *add_pos) { }
+static inline bool intel_aet_option(bool force_off, char *tok) { return false; }
 #endif
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 283d653002a2..960974ffa866 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -820,6 +820,8 @@ static int __init set_rdt_options(char *str)
 		force_off = *tok == '!';
 		if (force_off)
 			tok++;
+		if (intel_aet_option(force_off, tok))
+			continue;
 		for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
 			if (strcmp(tok, o->name) == 0) {
 				if (force_off)
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 35761d6cd55b..df91e1ea4a7b 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -52,12 +52,17 @@ struct pmt_event {
 /**
  * struct event_group - Events with the same feature type ("energy" or "perf") and guid.
  * @pfname:		PMT feature name ("energy" or "perf") of this event group.
+ *			Used by boot rdt= option.
  * @pfg:		Points to the aggregated telemetry space information
  *			returned by the intel_pmt_get_regions_by_feature()
  *			call to the INTEL_PMT_TELEMETRY driver that contains
  *			data for all telemetry regions of type @pfname.
  *			Valid if the system supports the event group,
  *			NULL otherwise.
+ * @force_off:		True when "rdt" command line or architecture code disables
+ *			this event group.
+ * @force_on:		True when "rdt" command line overrides disable of this
+ *			event group.
  * @guid:		Unique number per XML description file.
  * @mmio_size:		Number of bytes of MMIO registers for this group.
  * @num_events:		Number of events in this group.
@@ -67,6 +72,7 @@ struct event_group {
 	/* Data fields for additional structures to manage this group. */
 	const char			*pfname;
 	struct pmt_feature_group	*pfg;
+	bool				force_off, force_on;
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
@@ -121,6 +127,35 @@ static struct event_group *known_event_groups[] = {
 	     _peg < &known_event_groups[ARRAY_SIZE(known_event_groups)];	\
 	     _peg++)
 
+bool intel_aet_option(bool force_off, char *tok)
+{
+	struct event_group **peg;
+	bool ret = false;
+	u32 guid = 0;
+	char *name;
+
+	if (!tok)
+		return false;
+
+	name = strsep(&tok, ":");
+	if (tok && kstrtou32(tok, 16, &guid))
+		return false;
+
+	for_each_event_group(peg) {
+		if (strcmp(name, (*peg)->pfname))
+			continue;
+		if (guid && (*peg)->guid != guid)
+			continue;
+		if (force_off)
+			(*peg)->force_off = true;
+		else
+			(*peg)->force_on = true;
+		ret = true;
+	}
+
+	return ret;
+}
+
 /*
  * Clear the address field of regions that did not pass the checks in
  * skip_telem_region() so they will not be used by intel_aet_read_event().
@@ -172,6 +207,9 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
 	int skipped_events = 0;
 
+	if (e->force_off)
+		return false;
+
 	if (!group_has_usable_regions(e, p))
 		return false;
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option
  2025-12-17 17:21 ` [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
@ 2026-01-09 22:16   ` Borislav Petkov
  2026-01-09 22:20     ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2026-01-09 22:16 UTC (permalink / raw)
  To: Tony Luck
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Wed, Dec 17, 2025 at 09:21:11AM -0800, Tony Luck wrote:
> +bool intel_aet_option(bool force_off, char *tok)

What does that function do? Handle the AET options? So then

	intel_handle_aet_option()

?

IOW, you need a verb in its name so that it is clear what happens:

	if (intel_handle_aet_option())
		bla;

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option
  2026-01-09 22:16   ` Borislav Petkov
@ 2026-01-09 22:20     ` Luck, Tony
  0 siblings, 0 replies; 58+ messages in thread
From: Luck, Tony @ 2026-01-09 22:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Chatre, Reinette, Wieczor-Retman, Maciej,
	Peter Newman, James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Chen, Yu C, x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> IOW, you need a verb in its name so that it is clear what happens:
>
>	if (intel_handle_aet_option())
>		bla;

Yup. That reads better.

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v17 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (23 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 26/32] fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] Tony Luck
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

There are now three meanings for "number of RMIDs":

1) The number for legacy features enumerated by CPUID leaf 0xF. This is the
maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC.
Note that systems with Sub-NUMA Cluster mode enabled will force scaling down
the CPUID enumerated value by the number of SNC nodes per L3-cache.

2) The number of registers in MMIO space for each event. This is enumerated
in the XML files and is the value initialized into event_group::num_rmid.

3) The number of "hardware counters" (this isn't a strictly accurate
description of how things work, but serves as a useful analogy that does
describe the limitations) feeding to those MMIO registers. This is enumerated
in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature()

Event groups with insufficient "hardware counters" to track all RMIDs are
difficult for users to use, since the system may reassign "hardware counters"
at any time. This means that users cannot reliably collect two consecutive
event counts to compute the rate at which events are occurring. Disable
such event groups by default. The user may override this with a command line
"rdt=" option. In this case limit an under-resourced event group's number of
possible monitor resource groups to the lowest number of "hardware counters".

Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource
"num_rmid" value to the smallest of these values as this value will be used
later to compare against the number of RMIDs supported by other resources
to determine how many monitoring resource groups are supported.

N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and
the type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid)
won't complain about mixing signed and unsigned types.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h                 |  2 +-
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 53 ++++++++++++++++++++++++-
 fs/resctrl/rdtgroup.c                   |  2 +-
 3 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 14126d228e61..8623e450619a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -295,7 +295,7 @@ enum resctrl_schema_fmt {
  *			events of monitor groups created via mkdir.
  */
 struct resctrl_mon {
-	int			num_rmid;
+	u32			num_rmid;
 	unsigned int		mbm_cfg_mask;
 	int			num_mbm_cntrs;
 	bool			mbm_cntr_assignable;
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index df91e1ea4a7b..e50c04afc8eb 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -22,6 +22,7 @@
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
+#include <linux/minmax.h>
 #include <linux/printk.h>
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
@@ -60,10 +61,14 @@ struct pmt_event {
  *			Valid if the system supports the event group,
  *			NULL otherwise.
  * @force_off:		True when "rdt" command line or architecture code disables
- *			this event group.
+ *			this event group due to insufficient RMIDs.
  * @force_on:		True when "rdt" command line overrides disable of this
  *			event group.
  * @guid:		Unique number per XML description file.
+ * @num_rmid:		Number of RMIDs supported by this group. May be
+ *			adjusted downwards if enumeration from
+ *			intel_pmt_get_regions_by_feature() indicates fewer
+ *			RMIDs can be tracked simultaneously.
  * @mmio_size:		Number of bytes of MMIO registers for this group.
  * @num_events:		Number of events in this group.
  * @evts:		Array of event descriptors.
@@ -76,6 +81,7 @@ struct event_group {
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
+	u32				num_rmid;
 	size_t				mmio_size;
 	unsigned int			num_events;
 	struct pmt_event		evts[] __counted_by(num_events);
@@ -90,6 +96,7 @@ struct event_group {
 static struct event_group energy_0x26696143 = {
 	.pfname		= "energy",
 	.guid		= 0x26696143,
+	.num_rmid	= 576,
 	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
 	.num_events	= 2,
 	.evts		= {
@@ -104,6 +111,7 @@ static struct event_group energy_0x26696143 = {
 static struct event_group perf_0x26557651 = {
 	.pfname		= "perf",
 	.guid		= 0x26557651,
+	.num_rmid	= 576,
 	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
 	.num_events	= 7,
 	.evts		= {
@@ -202,6 +210,23 @@ static bool group_has_usable_regions(struct event_group *e, struct pmt_feature_g
 	return usable_regions;
 }
 
+static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
+{
+	struct telemetry_region *tr;
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		tr = &p->regions[i];
+		if (tr->num_rmids < e->num_rmid) {
+			e->force_off = true;
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
@@ -213,6 +238,27 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 	if (!group_has_usable_regions(e, p))
 		return false;
 
+	/*
+	 * Only enable event group with insufficient RMIDs if the user requested
+	 * it from the kernel command line.
+	 */
+	if (!all_regions_have_sufficient_rmid(e, p) && !e->force_on) {
+		pr_info("%s %s:0x%x monitoring not enabled due to insufficient RMIDs\n",
+			r->name, e->pfname, e->guid);
+		return false;
+	}
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		/*
+		 * e->num_rmid only adjusted lower if user (via rdt= kernel
+		 * parameter) forces an event group with insufficient RMID
+		 * to be enabled.
+		 */
+		e->num_rmid = min(e->num_rmid, p->regions[i].num_rmids);
+	}
+
 	for (int j = 0; j < e->num_events; j++) {
 		if (!resctrl_enable_mon_event(e->evts[j].id, true,
 					      e->evts[j].bin_bits, &e->evts[j]))
@@ -223,6 +269,11 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 		return false;
 	}
 
+	if (r->mon.num_rmid)
+		r->mon.num_rmid = min(r->mon.num_rmid, e->num_rmid);
+	else
+		r->mon.num_rmid = e->num_rmid;
+
 	return true;
 }
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index ac3c6e44b7c5..60ce2390723e 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1157,7 +1157,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
 {
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
 
-	seq_printf(seq, "%d\n", r->mon.num_rmid);
+	seq_printf(seq, "%u\n", r->mon.num_rmid);
 
 	return 0;
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 26/32] fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (24 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 27/32] x86,fs/resctrl: Compute number of RMIDs as minimum across resources Tony Luck
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

closid_num_dirty_rmid[] and rmid_ptrs[] are allocated together during resctrl
initialization and freed together during resctrl exit.

Telemetry events are enumerated on resctrl mount so only at resctrl mount
will the number of RMID supported by all monitoring resources and needed as
size for rmid_ptrs[] be known.

Separate closid_num_dirty_rmid[] and rmid_ptrs[] allocation and free in
preparation for rmid_ptrs[] to be allocated on resctrl mount.

Keep the rdtgroup_mutex protection around the allocation and free of
closid_num_dirty_rmid[] as ARM needs this to guarantee memory ordering.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/monitor.c | 79 ++++++++++++++++++++++++++++----------------
 1 file changed, 51 insertions(+), 28 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 8a4c2ae72740..bbf8c9037887 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -906,36 +906,14 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
 static int dom_data_init(struct rdt_resource *r)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-	u32 num_closid = resctrl_arch_get_num_closid(r);
 	struct rmid_entry *entry = NULL;
 	int err = 0, i;
 	u32 idx;
 
 	mutex_lock(&rdtgroup_mutex);
-	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
-		u32 *tmp;
-
-		/*
-		 * If the architecture hasn't provided a sanitised value here,
-		 * this may result in larger arrays than necessary. Resctrl will
-		 * use a smaller system wide value based on the resources in
-		 * use.
-		 */
-		tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
-		if (!tmp) {
-			err = -ENOMEM;
-			goto out_unlock;
-		}
-
-		closid_num_dirty_rmid = tmp;
-	}
 
 	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
 	if (!rmid_ptrs) {
-		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
-			kfree(closid_num_dirty_rmid);
-			closid_num_dirty_rmid = NULL;
-		}
 		err = -ENOMEM;
 		goto out_unlock;
 	}
@@ -971,11 +949,6 @@ static void dom_data_exit(struct rdt_resource *r)
 	if (!r->mon_capable)
 		goto out_unlock;
 
-	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
-		kfree(closid_num_dirty_rmid);
-		closid_num_dirty_rmid = NULL;
-	}
-
 	kfree(rmid_ptrs);
 	rmid_ptrs = NULL;
 
@@ -1814,6 +1787,45 @@ ssize_t mbm_L3_assignments_write(struct kernfs_open_file *of, char *buf,
 	return ret ?: nbytes;
 }
 
+static int closid_num_dirty_rmid_alloc(struct rdt_resource *r)
+{
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+		u32 num_closid = resctrl_arch_get_num_closid(r);
+		u32 *tmp;
+
+		/* For ARM memory ordering access to closid_num_dirty_rmid */
+		mutex_lock(&rdtgroup_mutex);
+
+		/*
+		 * If the architecture hasn't provided a sanitised value here,
+		 * this may result in larger arrays than necessary. Resctrl will
+		 * use a smaller system wide value based on the resources in
+		 * use.
+		 */
+		tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
+		if (!tmp) {
+			mutex_unlock(&rdtgroup_mutex);
+			return -ENOMEM;
+		}
+
+		closid_num_dirty_rmid = tmp;
+
+		mutex_unlock(&rdtgroup_mutex);
+	}
+
+	return 0;
+}
+
+static void closid_num_dirty_rmid_free(void)
+{
+	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+		mutex_lock(&rdtgroup_mutex);
+		kfree(closid_num_dirty_rmid);
+		closid_num_dirty_rmid = NULL;
+		mutex_unlock(&rdtgroup_mutex);
+	}
+}
+
 /**
  * resctrl_l3_mon_resource_init() - Initialise global monitoring structures.
  *
@@ -1834,10 +1846,16 @@ int resctrl_l3_mon_resource_init(void)
 	if (!r->mon_capable)
 		return 0;
 
-	ret = dom_data_init(r);
+	ret = closid_num_dirty_rmid_alloc(r);
 	if (ret)
 		return ret;
 
+	ret = dom_data_init(r);
+	if (ret) {
+		closid_num_dirty_rmid_free();
+		return ret;
+	}
+
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
 		mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
 		resctrl_file_fflags_init("mbm_total_bytes_config",
@@ -1880,5 +1898,10 @@ void resctrl_l3_mon_resource_exit(void)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 
+	if (!r->mon_capable)
+		return;
+
+	closid_num_dirty_rmid_free();
+
 	dom_data_exit(r);
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 27/32] x86,fs/resctrl: Compute number of RMIDs as minimum across resources
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (25 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 26/32] fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 28/32] fs/resctrl: Move RMID initialization to first mount Tony Luck
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

resctrl assumes that only the L3 resource supports monitor events, so it
simply takes the rdt_resource::num_rmid from RDT_RESOURCE_L3 as the system's
number of RMIDs.

The addition of telemetry events in a different resource breaks that
assumption.

Compute the number of available RMIDs as the minimum value across all
mon_capable resources (analogous to how the number of CLOSIDs is computed
across alloc_capable resources).

Note that mount time enumeration of the telemetry resource means that
this number can be reduced. If this happens, then some memory will
be wasted as the allocations for rdt_l3_mon_domain::mbm_states[] and
rdt_l3_mon_domain::rmid_busy_llc created during resctrl initialization will
be larger than needed.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 15 +++++++++++++--
 fs/resctrl/rdtgroup.c              |  6 ++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 960974ffa866..bca65851d592 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -110,12 +110,23 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
 	},
 };
 
+/**
+ * resctrl_arch_system_num_rmid_idx - Compute number of supported RMIDs
+ *				      (minimum across all mon_capable resource)
+ *
+ * Return: Number of supported RMIDs at time of call. Note that mount time
+ * enumeration of resources may reduce the number.
+ */
 u32 resctrl_arch_system_num_rmid_idx(void)
 {
-	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	u32 num_rmids = U32_MAX;
+	struct rdt_resource *r;
+
+	for_each_mon_capable_rdt_resource(r)
+		num_rmids = min(num_rmids, r->mon.num_rmid);
 
 	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return r->mon.num_rmid;
+	return num_rmids == U32_MAX ? 0 : num_rmids;
 }
 
 struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 60ce2390723e..e95d3d0dc515 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4352,6 +4352,12 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
  * During boot this may be called before global allocations have been made by
  * resctrl_l3_mon_resource_init().
  *
+ * Called during CPU online that may run as soon as CPU online callbacks
+ * are set up during resctrl initialization. The number of supported RMIDs
+ * may be reduced if additional mon_capable resources are enumerated
+ * at mount time. This means the rdt_l3_mon_domain::mbm_states[] and
+ * rdt_l3_mon_domain::rmid_busy_llc allocations may be larger than needed.
+ *
  * Return: 0 for success, or -ENOMEM.
  */
 static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 28/32] fs/resctrl: Move RMID initialization to first mount
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (26 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 27/32] x86,fs/resctrl: Compute number of RMIDs as minimum across resources Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 29/32] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

L3 monitor features are enumerated during resctrl initialization and
rmid_ptrs[] that tracks all RMIDs and depends on the number of supported
RMIDs is allocated during this time.

Telemetry monitor features are enumerated during first resctrl mount and
may support a different number of RMIDs compared to L3 monitor features.

Delay allocation and initialization of rmid_ptrs[] until first mount.
Since the number of RMIDs cannot change on later mounts, keep the same set of
rmid_ptrs[] until resctrl_exit(). This is required because the limbo handler
keeps running after resctrl is unmounted and needs to access rmid_ptrs[]
as it keeps tracking busy RMIDs after unmount.

Rename routines to match what they now do:
dom_data_init() -> setup_rmid_lru_list()
dom_data_exit() -> free_rmid_lru_list()

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 fs/resctrl/internal.h |  4 ++++
 fs/resctrl/monitor.c  | 54 ++++++++++++++++++++-----------------------
 fs/resctrl/rdtgroup.c |  5 ++++
 3 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 399f625be67d..1a9b29119f88 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -369,6 +369,10 @@ int closids_supported(void);
 
 void closid_free(int closid);
 
+int setup_rmid_lru_list(void);
+
+void free_rmid_lru_list(void);
+
 int alloc_rmid(u32 closid);
 
 void free_rmid(u32 closid, u32 rmid);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index bbf8c9037887..0cd5476a483a 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -903,20 +903,29 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
 		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
 
-static int dom_data_init(struct rdt_resource *r)
+int setup_rmid_lru_list(void)
 {
-	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	struct rmid_entry *entry = NULL;
-	int err = 0, i;
+	u32 idx_limit;
 	u32 idx;
+	int i;
 
-	mutex_lock(&rdtgroup_mutex);
+	if (!resctrl_arch_mon_capable())
+		return 0;
 
+	/*
+	 * Called on every mount, but the number of RMIDs cannot change
+	 * after the first mount, so keep using the same set of rmid_ptrs[]
+	 * until resctrl_exit(). Note that the limbo handler continues to
+	 * access rmid_ptrs[] after resctrl is unmounted.
+	 */
+	if (rmid_ptrs)
+		return 0;
+
+	idx_limit = resctrl_arch_system_num_rmid_idx();
 	rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
-	if (!rmid_ptrs) {
-		err = -ENOMEM;
-		goto out_unlock;
-	}
+	if (!rmid_ptrs)
+		return -ENOMEM;
 
 	for (i = 0; i < idx_limit; i++) {
 		entry = &rmid_ptrs[i];
@@ -929,30 +938,24 @@ static int dom_data_init(struct rdt_resource *r)
 	/*
 	 * RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID are special and
 	 * are always allocated. These are used for the rdtgroup_default
-	 * control group, which will be setup later in resctrl_init().
+	 * control group, which was setup earlier in rdtgroup_setup_default().
 	 */
 	idx = resctrl_arch_rmid_idx_encode(RESCTRL_RESERVED_CLOSID,
 					   RESCTRL_RESERVED_RMID);
 	entry = __rmid_entry(idx);
 	list_del(&entry->list);
 
-out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-
-	return err;
+	return 0;
 }
 
-static void dom_data_exit(struct rdt_resource *r)
+void free_rmid_lru_list(void)
 {
-	mutex_lock(&rdtgroup_mutex);
-
-	if (!r->mon_capable)
-		goto out_unlock;
+	if (!resctrl_arch_mon_capable())
+		return;
 
+	mutex_lock(&rdtgroup_mutex);
 	kfree(rmid_ptrs);
 	rmid_ptrs = NULL;
-
-out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
@@ -1830,7 +1833,8 @@ static void closid_num_dirty_rmid_free(void)
  * resctrl_l3_mon_resource_init() - Initialise global monitoring structures.
  *
  * Allocate and initialise global monitor resources that do not belong to a
- * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
+ * specific domain. i.e. the closid_num_dirty_rmid[] used to find the CLOSID
+ * with the cleanest set of RMIDs.
  * Called once during boot after the struct rdt_resource's have been configured
  * but before the filesystem is mounted.
  * Resctrl's cpuhp callbacks may be called before this point to bring a domain
@@ -1850,12 +1854,6 @@ int resctrl_l3_mon_resource_init(void)
 	if (ret)
 		return ret;
 
-	ret = dom_data_init(r);
-	if (ret) {
-		closid_num_dirty_rmid_free();
-		return ret;
-	}
-
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
 		mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
 		resctrl_file_fflags_init("mbm_total_bytes_config",
@@ -1902,6 +1900,4 @@ void resctrl_l3_mon_resource_exit(void)
 		return;
 
 	closid_num_dirty_rmid_free();
-
-	dom_data_exit(r);
 }
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index e95d3d0dc515..d49ffc56ea61 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2799,6 +2799,10 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out;
 	}
 
+	ret = setup_rmid_lru_list();
+	if (ret)
+		goto out;
+
 	ret = rdtgroup_setup_root(ctx);
 	if (ret)
 		goto out;
@@ -4654,4 +4658,5 @@ void resctrl_exit(void)
 	 */
 
 	resctrl_l3_mon_resource_exit();
+	free_rmid_lru_list();
 }
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 29/32] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (27 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 28/32] fs/resctrl: Move RMID initialization to first mount Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Since telemetry events are enumerated on resctrl mount the RDT_RESOURCE_PERF_PKG
resource is not considered "monitoring capable" during early resctrl initialization.
This means that the domain list for RDT_RESOURCE_PERF_PKG is not built when the CPU
hot plug notifiers are registered and run for the first time right after resctrl
initialization.

Mark the RDT_RESOURCE_PERF_PKG as "monitoring capable" upon successful telemetry
event enumeration to ensure future CPU hotplug events include this resource and
initialize its domain list for CPUs that are already online.

Print to console log announcing the name of the telemetry feature detected.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c      | 16 +++++++++++++++-
 arch/x86/kernel/cpu/resctrl/intel_aet.c |  6 ++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index bca65851d592..829633bc54e5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -766,14 +766,28 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
 
 void resctrl_arch_pre_mount(void)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
 	static atomic_t only_once = ATOMIC_INIT(0);
-	int old = 0;
+	int cpu, old = 0;
 
 	if (!atomic_try_cmpxchg(&only_once, &old, 1))
 		return;
 
 	if (!intel_aet_get_events())
 		return;
+
+	/*
+	 * Late discovery of telemetry events means the domains for the
+	 * resource were not built. Do that now.
+	 */
+	cpus_read_lock();
+	mutex_lock(&domain_list_lock);
+	r->mon_capable = true;
+	rdt_mon_capable = true;
+	for_each_online_cpu(cpu)
+		domain_add_cpu_mon(cpu, r);
+	mutex_unlock(&domain_list_lock);
+	cpus_read_unlock();
 }
 
 enum {
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index e50c04afc8eb..3ee51b05a297 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -274,6 +274,12 @@ static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 	else
 		r->mon.num_rmid = e->num_rmid;
 
+	if (skipped_events)
+		pr_info("%s %s:0x%x monitoring detected (skipped %d events)\n", r->name,
+			e->pfname, e->guid, skipped_events);
+	else
+		pr_info("%s %s:0x%x monitoring detected\n", r->name, e->pfname, e->guid);
+
 	return true;
 }
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (28 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 29/32] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2026-01-10 10:57   ` Borislav Petkov
  2025-12-17 17:21 ` [PATCH v17 31/32] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
                   ` (2 subsequent siblings)
  32 siblings, 1 reply; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

All files below /sys/fs/resctrl are considered user ABI.

This leaves no place for architectures to provide additional interfaces.

Add resctrl_debugfs_mon_info_arch_mkdir() which creates a directory in
the debugfs file system for a monitoring resource. Naming follows the
layout of the main resctrl hierarchy:

	/sys/kernel/debug/resctrl/info/{resource}_MON/{arch}

The {arch} last level directory name matches the output of the user level
"uname -m" command.

Architecture code may use this directory for debug information, or for minor
tuning of features. It must not be used for basic feature enabling as debugfs
may not be configured/mounted on production systems.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 include/linux/resctrl.h | 10 ++++++++++
 fs/resctrl/rdtgroup.c   | 29 +++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8623e450619a..b862a9dd785b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -702,6 +702,16 @@ bool resctrl_arch_get_io_alloc_enabled(struct rdt_resource *r);
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
+/**
+ * resctrl_debugfs_mon_info_arch_mkdir() - Create a debugfs info directory.
+ *					   Removed by resctrl_exit().
+ * @r:	Resource (must be mon_capable).
+ *
+ * Return: NULL if resource is not monitoring capable,
+ * dentry pointer on success, or ERR_PTR(-ERROR) on failure.
+ */
+struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r);
+
 int resctrl_init(void);
 void resctrl_exit(void);
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index d49ffc56ea61..d13291d71adf 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -24,6 +24,7 @@
 #include <linux/sched/task.h>
 #include <linux/slab.h>
 #include <linux/user_namespace.h>
+#include <linux/utsname.h>
 
 #include <uapi/linux/magic.h>
 
@@ -75,6 +76,8 @@ static void rdtgroup_destroy_root(void);
 
 struct dentry *debugfs_resctrl;
 
+static struct dentry *debugfs_resctrl_info;
+
 /*
  * Memory bandwidth monitoring event to use for the default CTRL_MON group
  * and each new CTRL_MON group created by the user.  Only relevant when
@@ -4599,6 +4602,31 @@ int resctrl_init(void)
 	return ret;
 }
 
+/*
+ * Create /sys/kernel/debug/resctrl/info/{r->name}_MON/{arch} directory
+ * by request for architecture to use for debugging or minor tuning.
+ * Basic functionality of features must not be controlled by files
+ * added to this directory as debugfs may not be configured/mounted
+ * on production systems.
+ */
+struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r)
+{
+	struct dentry *moninfodir;
+	char name[32];
+
+	if (!r->mon_capable)
+		return NULL;
+
+	if (!debugfs_resctrl_info)
+		debugfs_resctrl_info = debugfs_create_dir("info", debugfs_resctrl);
+
+	sprintf(name, "%s_MON", r->name);
+
+	moninfodir = debugfs_create_dir(name, debugfs_resctrl_info);
+
+	return debugfs_create_dir(utsname()->machine, moninfodir);
+}
+
 static bool resctrl_online_domains_exist(void)
 {
 	struct rdt_resource *r;
@@ -4650,6 +4678,7 @@ void resctrl_exit(void)
 
 	debugfs_remove_recursive(debugfs_resctrl);
 	debugfs_resctrl = NULL;
+	debugfs_resctrl_info = NULL;
 	unregister_filesystem(&rdt_fs_type);
 
 	/*
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area
  2025-12-17 17:21 ` [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
@ 2026-01-10 10:57   ` Borislav Petkov
  2026-01-10 19:13     ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2026-01-10 10:57 UTC (permalink / raw)
  To: Tony Luck
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Wed, Dec 17, 2025 at 09:21:17AM -0800, Tony Luck wrote:
> All files below /sys/fs/resctrl are considered user ABI.
> 
> This leaves no place for architectures to provide additional interfaces.
> 
> Add resctrl_debugfs_mon_info_arch_mkdir() which creates a directory in
> the debugfs file system for a monitoring resource. Naming follows the
> layout of the main resctrl hierarchy:
> 
> 	/sys/kernel/debug/resctrl/info/{resource}_MON/{arch}
> 
> The {arch} last level directory name matches the output of the user level
> "uname -m" command.
> 
> Architecture code may use this directory for debug information, or for minor
> tuning of features. It must not be used for basic feature enabling as debugfs
> may not be configured/mounted on production systems.

I, like you guys, thought that debugfs is "safe" in the sense, stuff there
can't really be an ABI but just recently at LPC I got schooled about it and,
basically, if anything in luserspace starts using debugfs, it will be
considered an ABI.

And this thinking has been there for a looong time now:

https://lwn.net/Articles/309298/

So, before you put anything there, think again because you might end up
supporting it just like an ABI.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area
  2026-01-10 10:57   ` Borislav Petkov
@ 2026-01-10 19:13     ` Luck, Tony
  2026-01-10 19:42       ` Borislav Petkov
  0 siblings, 1 reply; 58+ messages in thread
From: Luck, Tony @ 2026-01-10 19:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Sat, Jan 10, 2026 at 11:57:58AM +0100, Borislav Petkov wrote:
> On Wed, Dec 17, 2025 at 09:21:17AM -0800, Tony Luck wrote:
> > All files below /sys/fs/resctrl are considered user ABI.
> > 
> > This leaves no place for architectures to provide additional interfaces.
> > 
> > Add resctrl_debugfs_mon_info_arch_mkdir() which creates a directory in
> > the debugfs file system for a monitoring resource. Naming follows the
> > layout of the main resctrl hierarchy:
> > 
> > 	/sys/kernel/debug/resctrl/info/{resource}_MON/{arch}
> > 
> > The {arch} last level directory name matches the output of the user level
> > "uname -m" command.
> > 
> > Architecture code may use this directory for debug information, or for minor
> > tuning of features. It must not be used for basic feature enabling as debugfs
> > may not be configured/mounted on production systems.
> 
> I, like you guys, thought that debugfs is "safe" in the sense, stuff there
> can't really be an ABI but just recently at LPC I got schooled about it and,
> basically, if anything in luserspace starts using debugfs, it will be
> considered an ABI.
> 
> And this thinking has been there for a looong time now:
> 
> https://lwn.net/Articles/309298/
> 
> So, before you put anything there, think again because you might end up
> supporting it just like an ABI.

Please drop patches 30, 31, and the debugfs hunk to Documentation from patch 32
while I look at options. The debugfs bits aren't needed for normal
operation, just for debugging if things don't appear to be working as
expected. No need to hold up inclusion waiting for a solution.

Telemetry based events tied to RMIDs are new in Clearwater Forest. I
have line of sight to a second platorm. But extrapolating from two
data points to decide on a stable ABI seems very risky.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area
  2026-01-10 19:13     ` Luck, Tony
@ 2026-01-10 19:42       ` Borislav Petkov
  2026-01-10 23:29         ` Luck, Tony
  0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2026-01-10 19:42 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu, x86,
	linux-kernel, patches

On Sat, Jan 10, 2026 at 11:13:40AM -0800, Luck, Tony wrote:
> Please drop patches 30, 31, and the debugfs hunk to Documentation from patch 32
> while I look at options. The debugfs bits aren't needed for normal
> operation, just for debugging if things don't appear to be working as
> expected. No need to hold up inclusion waiting for a solution.
> 
> Telemetry based events tied to RMIDs are new in Clearwater Forest. I
> have line of sight to a second platorm. But extrapolating from two
> data points to decide on a stable ABI seems very risky.

Ok, makes sense.

I've pushed out the branch, you could have a look and lemme know whether
something's still amiss.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area
  2026-01-10 19:42       ` Borislav Petkov
@ 2026-01-10 23:29         ` Luck, Tony
  0 siblings, 0 replies; 58+ messages in thread
From: Luck, Tony @ 2026-01-10 23:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Chatre, Reinette, Wieczor-Retman, Maciej,
	Peter Newman, James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Chen, Yu C, x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> I've pushed out the branch, you could have a look and lemme know whether
> something's still amiss.

Basic build and simple tests pass. So looks good so far.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v17 31/32] x86/resctrl: Add debugfs files to show telemetry aggregator status
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (29 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 17:21 ` [PATCH v17 32/32] x86,fs/resctrl: Update documentation for telemetry events Tony Luck
  2025-12-17 22:16 ` [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Reinette Chatre
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Each telemetry aggregator provides three status registers at the top end of
MMIO space after all the per-RMID per-event counters:

  data_loss_count: This counts the number of times that this aggregator
  failed to accumulate a counter value supplied by a CPU core.

  data_loss_timestamp: This is a "timestamp" from a free running
  25MHz uncore timer indicating when the most recent data loss occurred.

  last_update_timestamp: Another 25MHz timestamp indicating when the
  most recent counter update was successfully applied.

Create files in /sys/kernel/debug/resctrl/info/PERF_PKG_MON/x86_64/ to display
the value of each of these status registers for each aggregator in each enabled
event group. The prefix for each file name describes the type of aggregator,
the guid, which package it is located on, and an opaque instance number to
provide a unique file name when there are multiple aggregators on a package.

The suffix is one of the three strings listed above. An example name is:

	energy_0x26696143_pkg0_agg2_data_loss_count

These files are removed along with all other debugfs entries by the call to
debugfs_remove_recursive() in resctrl_exit().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h  |  2 +
 arch/x86/kernel/cpu/resctrl/core.c      |  2 +
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 60 +++++++++++++++++++++++++
 3 files changed, 64 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index df09091f7c6c..e538174fe193 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -236,6 +236,7 @@ void __exit intel_aet_exit(void);
 int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
 void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 				struct list_head *add_pos);
+void intel_aet_add_debugfs(void);
 bool intel_aet_option(bool force_off, char *tok);
 #else
 static inline bool intel_aet_get_events(void) { return false; }
@@ -247,6 +248,7 @@ static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64
 
 static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 					      struct list_head *add_pos) { }
+static inline void intel_aet_add_debugfs(void) { }
 static inline bool intel_aet_option(bool force_off, char *tok) { return false; }
 #endif
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 829633bc54e5..62e96aad060d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -788,6 +788,8 @@ void resctrl_arch_pre_mount(void)
 		domain_add_cpu_mon(cpu, r);
 	mutex_unlock(&domain_list_lock);
 	cpus_read_unlock();
+
+	intel_aet_add_debugfs();
 }
 
 enum {
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 3ee51b05a297..ba5a64a74bd6 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -15,8 +15,11 @@
 #include <linux/compiler_types.h>
 #include <linux/container_of.h>
 #include <linux/cpumask.h>
+#include <linux/debugfs.h>
+#include <linux/dcache.h>
 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/fs.h>
 #include <linux/gfp_types.h>
 #include <linux/init.h>
 #include <linux/intel_pmt_features.h>
@@ -29,6 +32,7 @@
 #include <linux/resctrl.h>
 #include <linux/resctrl_types.h>
 #include <linux/slab.h>
+#include <linux/sprintf.h>
 #include <linux/stddef.h>
 #include <linux/topology.h>
 #include <linux/types.h>
@@ -227,6 +231,46 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
 	return true;
 }
 
+static int status_read(void *priv, u64 *val)
+{
+	void __iomem *info = (void __iomem *)priv;
+
+	*val = readq(info);
+
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(status_fops, status_read, NULL, "%llu\n");
+
+static void make_status_files(struct dentry *dir, struct event_group *e, u8 pkg,
+			      int instance, void *info_end)
+{
+	char name[80];
+
+	sprintf(name, "%s_0x%x_pkg%u_agg%d_data_loss_count", e->pfname, e->guid, pkg, instance);
+	debugfs_create_file(name, 0400, dir, info_end - 24, &status_fops);
+
+	sprintf(name, "%s_0x%x_pkg%u_agg%d_data_loss_timestamp", e->pfname, e->guid, pkg, instance);
+	debugfs_create_file(name, 0400, dir, info_end - 16, &status_fops);
+
+	sprintf(name, "%s_0x%x_pkg%u_agg%d_last_update_timestamp", e->pfname, e->guid, pkg, instance);
+	debugfs_create_file(name, 0400, dir, info_end - 8, &status_fops);
+}
+
+static void create_debug_event_status_files(struct dentry *dir, struct event_group *e)
+{
+	struct pmt_feature_group *p = e->pfg;
+	void *info_end;
+
+	for (int i = 0; i < p->count; i++) {
+		if (!p->regions[i].addr)
+			continue;
+		info_end = (void __force *)p->regions[i].addr + e->mmio_size;
+		make_status_files(dir, e, p->regions[i].plat_info.package_id,
+				  i, info_end);
+	}
+}
+
 static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
@@ -411,3 +455,19 @@ void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 		kfree(d);
 	}
 }
+
+void intel_aet_add_debugfs(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+	struct event_group **peg;
+	struct dentry *infodir;
+
+	infodir = resctrl_debugfs_mon_info_arch_mkdir(r);
+
+	if (IS_ERR_OR_NULL(infodir))
+		return;
+
+	for_each_event_group(peg)
+		if ((*peg)->pfg)
+			create_debug_event_status_files(infodir, *peg);
+}
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v17 32/32] x86,fs/resctrl: Update documentation for telemetry events
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (30 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 31/32] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
@ 2025-12-17 17:21 ` Tony Luck
  2025-12-17 22:16 ` [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Reinette Chatre
  32 siblings, 0 replies; 58+ messages in thread
From: Tony Luck @ 2025-12-17 17:21 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Update resctrl filesystem documentation with the details about the
resctrl files that support telemetry events.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
 Documentation/filesystems/resctrl.rst | 101 +++++++++++++++++++++++---
 1 file changed, 89 insertions(+), 12 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index 8c8ce678148a..513d04e0be29 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -252,13 +252,12 @@ with respect to allocation:
 			bandwidth percentages are directly applied to
 			the threads running on the core
 
-If RDT monitoring is available there will be an "L3_MON" directory
+If L3 monitoring is available there will be an "L3_MON" directory
 with the following files:
 
 "num_rmids":
-		The number of RMIDs available. This is the
-		upper bound for how many "CTRL_MON" + "MON"
-		groups can be created.
+		The number of RMIDs supported by hardware for
+		L3 monitoring events.
 
 "mon_features":
 		Lists the monitoring events if
@@ -484,6 +483,24 @@ with the following files:
 		bytes) at which a previously used LLC_occupancy
 		counter can be considered for re-use.
 
+If telemetry monitoring is available there will be a "PERF_PKG_MON" directory
+with the following files:
+
+"num_rmids":
+		The number of RMIDs for telemetry monitoring events.
+
+		On Intel resctrl will not enable telemetry events if the number of
+		RMIDs that can be tracked concurrently is lower than the total number
+		of RMIDs supported. Telemetry events can be force-enabled with the
+		"rdt=" kernel parameter, but this may reduce the number of
+		monitoring groups that can be created.
+
+"mon_features":
+		Lists the telemetry monitoring events that are enabled on this system.
+
+The upper bound for how many "CTRL_MON" + "MON" can be created
+is the smaller of the L3_MON and PERF_PKG_MON "num_rmids" values.
+
 Finally, in the top level of the "info" directory there is a file
 named "last_cmd_status". This is reset with every "command" issued
 via the file system (making new directories or writing to any of the
@@ -589,15 +606,40 @@ When control is enabled all CTRL_MON groups will also contain:
 When monitoring is enabled all MON groups will also contain:
 
 "mon_data":
-	This contains a set of files organized by L3 domain and by
-	RDT event. E.g. on a system with two L3 domains there will
-	be subdirectories "mon_L3_00" and "mon_L3_01".	Each of these
-	directories have one file per event (e.g. "llc_occupancy",
-	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
-	files provide a read out of the current value of the event for
-	all tasks in the group. In CTRL_MON groups these files provide
-	the sum for all tasks in the CTRL_MON group and all tasks in
+	This contains directories for each monitor domain.
+
+	If L3 monitoring is enabled, there will be a "mon_L3_XX" directory for
+	each instance of an L3 cache. Each directory contains files for the enabled
+	L3 events (e.g. "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes").
+
+	If telemetry monitoring is enabled, there will be a "mon_PERF_PKG_YY"
+	directory for each physical processor package. Each directory contains
+	files for the enabled telemetry events (e.g. "core_energy". "activity",
+	"uops_retired", etc.)
+
+	The info/`*`/mon_features files provide the full list of enabled
+	event/file names.
+
+	"core energy" reports a floating point number for the energy (in Joules)
+	consumed by cores (registers, arithmetic units, TLB and L1/L2 caches)
+	during execution of instructions summed across all logical CPUs on a
+	package for the current monitoring group.
+
+	"activity" also reports a floating point value (in Farads).  This provides
+	an estimate of work done independent of the frequency that the CPUs used
+	for execution.
+
+	Note that "core energy" and "activity" only measure energy/activity in the
+	"core" of the CPU (arithmetic units, TLB, L1 and L2 caches, etc.). They
+	do not include L3 cache, memory, I/O devices etc.
+
+	All other events report decimal integer values.
+
+	In a MON group these files provide a read out of the current value of
+	the event for all tasks in the group. In CTRL_MON groups these files
+	provide the sum for all tasks in the CTRL_MON group and all tasks in
 	MON groups. Please see example section for more details on usage.
+
 	On systems with Sub-NUMA Cluster (SNC) enabled there are extra
 	directories for each node (located within the "mon_L3_XX" directory
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
@@ -1590,6 +1632,41 @@ Example with C::
     resctrl_release_lock(fd);
   }
 
+Debugfs
+=======
+In addition to the use of debugfs for tracing of pseudo-locking performance,
+architecture code may create debugfs directories associated with monitoring
+features for a specific resource.
+
+The full pathname for these is in the form:
+
+    /sys/kernel/debug/resctrl/info/{resource_name}_MON/{arch}/
+
+The presence, names, and format of these files may vary between architectures
+even if the same resource is present.
+
+PERF_PKG_MON/x86_64
+-------------------
+Three files are present per telemetry aggregator instance that show status.
+The prefix of each file name describes the type ("energy" or "perf"), the
+guid, which processor package it belongs to, and the instance number of the
+aggregator.  For example: "energy_0x26696143_pkg1_agg2".
+
+The suffix describes which data is reported in the file and is one of:
+
+data_loss_count:
+	This counts the number of times that this aggregator
+	failed to accumulate a counter value supplied by a CPU.
+
+data_loss_timestamp:
+	This is a "timestamp" from a free running 25MHz uncore
+	timer indicating when the most recent data loss occurred.
+
+last_update_timestamp:
+	Another 25MHz timestamp indicating when the
+	most recent counter update was successfully applied.
+
+
 Examples for RDT Monitoring along with allocation usage
 =======================================================
 Reading monitored data
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring
  2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (31 preceding siblings ...)
  2025-12-17 17:21 ` [PATCH v17 32/32] x86,fs/resctrl: Update documentation for telemetry events Tony Luck
@ 2025-12-17 22:16 ` Reinette Chatre
  2026-01-04  6:14   ` Borislav Petkov
  32 siblings, 1 reply; 58+ messages in thread
From: Reinette Chatre @ 2025-12-17 22:16 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	Borislav Petkov, x86@kernel.org
  Cc: linux-kernel, patches

Dear x86 maintainers,

Could you please consider this series for inclusion when you find is most appropriate?

I understand the timing before holidays is not ideal. I myself will be offline for a while
over the holidays. We can surely continue discussions about this work after the holidays.
I just thought it best to make you aware right when series is ready instead of guessing
what an appropriate time may be.

Regards,

Reinette

On 12/17/25 9:20 AM, Tony Luck wrote:
> Patches based on Linus v6.19-rc1
> 
> Series available here:
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git rdt-aet-v17
> 
> Changes since v16 was posted here:
> https://lore.kernel.org/all/20251210231413.59102-1-tony.luck@intel.com/
> 
> Cover letter
> Added some examples for Babu
> 
> part 11
> Added Reinette RB tag
> 
> part 19
> Update commit message to explain why it is safe to enable just some events
> within an event group.
> Added Reinette RB tag
> 
> part 24
> Added Reinette RB tag
> 
> part 25
> Drop unneeded local variable "ret" from all_regions_have_sufficient_rmid()
> Added Reinette RB tag
> 
> part 32
> Added Reinette RB tag
> 
> Background
> ----------
> On Intel systems that support per-RMID telemetry monitoring each logical
> processor keeps a local count for various events. When the
> MSR_IA32_PQR_ASSOC.RMID value for the logical processor changes (or when a
> two millisecond counter expires) these event counts are transmitted to
> an event aggregator on the same package as the processor together with
> the current RMID value. The event counters are reset to zero to begin
> counting again.
> 
> Each aggregator takes the incoming event counts and adds them to
> cumulative counts for each event for each RMID. Note that there can be
> multiple aggregators on each package with no architectural association
> between logical processors and an aggregator.
> 
> All of these aggregated counters can be read by an operating system from
> the MMIO space of the Out Of Band Management Service Module (OOBMSM)
> device(s) on a system. Any counter can be read from any logical processor.
> 
> Intel publishes details for each processor generation showing which
> events are counted by each logical processor and the offsets for each
> accumulated counter value within the MMIO space in XML files here:
> https://github.com/intel/Intel-PMT.
> 
> For example there are two energy related telemetry events for the
> Clearwater Forest family of processors and the MMIO space looks like this:
> 
> Offset  RMID    Event
> ------  ----    -----
> 0x0000  0       core_energy
> 0x0008  0       activity
> 0x0010  1       core_energy
> 0x0018  1       activity
> ...
> 0x23F0  575     core_energy
> 0x23F8  575     activity
> 
> In addition the XML file provides the units (Joules for core_energy,
> Farads for activity) and the type of data (fixed-point binary with
> bit 63 used to indicate the data is valid, and the low 18 bits as a
> binary fraction).
> 
> Finally, each XML file provides a 32-bit unique id (or guid) that is
> used as an index to find the correct XML description file for each
> telemetry implementation.
> 
> The INTEL_PMT_TELEMETRY driver provides intel_pmt_get_regions_by_feature()
> to enumerate the aggregator instances (also referred to as "telemetry
> regions" in this series) on a platform. It provides:
> 
> 1) guid  - so resctrl can determine which events are supported
> 2) MMIO base address of counters
> 3) package id
> 
> Resctrl accumulates counts from all aggregators on a package in order
> to provide a consistent user interface across processor generations.
> 
> Directory structure for the telemetry events looks like this:
> 
> $ tree /sys/fs/resctrl/mon_data/
> /sys/fs/resctrl/mon_data/
> mon_data
> ├── mon_PERF_PKG_00
> │   ├── activity
> │   └── core_energy
> └── mon_PERF_PKG_01
>     ├── activity
>     └── core_energy
> 
> Reading the "core_energy" file from some resctrl mon_data directory shows
> the cumulative energy (in Joules) used by all tasks that ran with the RMID
> associated with that directory on a given package. Note that "core_energy"
> reports only energy consumed by CPU cores (data processing units,
> L1/L2 caches, etc.). It does not include energy used in the "uncore"
> (L3 cache, on package devices, etc.), or used by memory or I/O devices.
> 
> Examples:
> --------
> 
> As with other resctrl monitoring features first create CTRL_MON or MON
> directories and assign the tasks of interest to the group.
> 
> # mkdir /sys/fs/resctrl/aet_example
> # echo {list of PIDs} > /sys/fs/resctrl/aet_example/tasks
> 
> For simplicity in this example, assume that these tasks have their
> affinity set to CPUs in the first socket. Set a shell variable to
> point to the mon_data directory for socket 0:
> 
> $ dir=/sys/fs/resctrl/aet_example/mon_data/mon_PERF_PKG_00
> 
> Energy events:
> -------------
> 
> There are two events associated with energy consumption in the core.
> The "core_energy" event reports out directly in Joules. To compute
> power just take the difference between two samples and divide by the
> time between them. E.g.
> 
> $ cat $dir/core_energy; sleep 10; cat $dir/core_energy
> 94499439.510380
> 94499607.019680
> $ bc -q
> scale=3
> (94499607.019680 - 94499439.510380) / 10
> 16.750
> 
> So 16.75 Watts in this example.
> 
> Note that different runs of the same workload may report different
> energy consumption. This happens when cores shift to different
> voltage/frequency profiles due to overall system load.
> 
> The "activity" event reports energy usage in a manner independent
> of voltage and frequency. This may be useful for developers to
> assess how modifications to a program (e.g. attaching to a library
> optimized to use AVX instructions) affect energy consumption. So
> read the "activity" at the start and end of program execution and
> compute the difference.
> 
> Perf events:
> -----------
> 
> The other telemetry events largely duplicate events available using
> "perf", but avoid reading the perf counters on every context switch.
> This may be a significant improvement when monitoring highly multi-threaded
> applications. E.g. to find the ratio of core cycles to reference cycles:
> 
> $ cat $dir/unhalted_core_cycles $dir/unhalted_ref_cycles
> 1312249223146571
> 1660157011698276
> $ { run application here }
> $ cat $dir/unhalted_core_cycles $dir/unhalted_ref_cycles
> 1313573565617233
> 1661511224019444
> $ bc -q
> scale = 3
> (1661511224019444 - 1660157011698276) / (1313573565617233 - 1312249223146571)
> 1.022
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> 
> Tony Luck (32):
>   x86,fs/resctrl: Improve domain type checking
>   x86/resctrl: Move L3 initialization into new helper function
>   x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain
>     types
>   x86/resctrl: Clean up domain_remove_cpu_ctrl()
>   x86,fs/resctrl: Refactor domain create/remove using struct
>     rdt_domain_hdr
>   fs/resctrl: Split L3 dependent parts out of __mon_event_count()
>   x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters
>   x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
>   x86,fs/resctrl: Rename some L3 specific functions
>   fs/resctrl: Make event details accessible to functions when reading
>     events
>   x86,fs/resctrl: Handle events that can be read from any CPU
>   x86,fs/resctrl: Support binary fixed point event counters
>   x86,fs/resctrl: Add an architectural hook called for each mount
>   x86,fs/resctrl: Add and initialize a resource for package scope
>     monitoring
>   fs/resctrl: Emphasize that L3 monitoring resource is required for
>     summing domains
>   x86/resctrl: Discover hardware telemetry events
>   x86,fs/resctrl: Fill in details of events for guid 0x26696143 and
>     0x26557651
>   x86,fs/resctrl: Add architectural event pointer
>   x86/resctrl: Find and enable usable telemetry events
>   x86/resctrl: Read telemetry events
>   fs/resctrl: Refactor mkdir_mondata_subdir()
>   fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
>   x86,fs/resctrl: Handle domain creation/deletion for
>     RDT_RESOURCE_PERF_PKG
>   x86/resctrl: Add energy/perf choices to rdt boot option
>   x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
>   fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
>   x86,fs/resctrl: Compute number of RMIDs as minimum across resources
>   fs/resctrl: Move RMID initialization to first mount
>   x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
>   fs/resctrl: Provide interface to create architecture specific debugfs
>     area
>   x86/resctrl: Add debugfs files to show telemetry aggregator status
>   x86,fs/resctrl: Update documentation for telemetry events
> 
>  .../admin-guide/kernel-parameters.txt         |   7 +-
>  Documentation/filesystems/resctrl.rst         | 101 +++-
>  include/linux/resctrl.h                       |  67 ++-
>  include/linux/resctrl_types.h                 |  11 +
>  arch/x86/kernel/cpu/resctrl/internal.h        |  48 +-
>  fs/resctrl/internal.h                         |  68 ++-
>  arch/x86/kernel/cpu/resctrl/core.c            | 230 ++++++---
>  arch/x86/kernel/cpu/resctrl/intel_aet.c       | 473 ++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c         |  50 +-
>  fs/resctrl/ctrlmondata.c                      | 113 ++++-
>  fs/resctrl/monitor.c                          | 364 +++++++++-----
>  fs/resctrl/rdtgroup.c                         | 293 +++++++----
>  arch/x86/Kconfig                              |  13 +
>  arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
>  14 files changed, 1440 insertions(+), 399 deletions(-)
>  create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
> 
> 
> base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring
  2025-12-17 22:16 ` [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Reinette Chatre
@ 2026-01-04  6:14   ` Borislav Petkov
  0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2026-01-04  6:14 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	x86@kernel.org, linux-kernel, patches

On Wed, Dec 17, 2025 at 02:16:52PM -0800, Reinette Chatre wrote:
> Dear x86 maintainers,
> 
> Could you please consider this series for inclusion when you find is most appropriate?

Sure, lemme take a look.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2026-01-10 23:29 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-17 17:20 [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Tony Luck
2025-12-17 17:20 ` [PATCH v17 01/32] x86,fs/resctrl: Improve domain type checking Tony Luck
2025-12-17 17:20 ` [PATCH v17 02/32] x86/resctrl: Move L3 initialization into new helper function Tony Luck
2025-12-17 17:20 ` [PATCH v17 03/32] x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
2025-12-17 17:20 ` [PATCH v17 04/32] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
2025-12-17 17:20 ` [PATCH v17 05/32] x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdr Tony Luck
2025-12-17 17:20 ` [PATCH v17 06/32] fs/resctrl: Split L3 dependent parts out of __mon_event_count() Tony Luck
2025-12-17 17:20 ` [PATCH v17 07/32] x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters Tony Luck
2025-12-17 17:20 ` [PATCH v17 08/32] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
2025-12-17 17:20 ` [PATCH v17 09/32] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
2025-12-17 17:20 ` [PATCH v17 10/32] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
2025-12-17 17:20 ` [PATCH v17 11/32] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
2025-12-17 17:20 ` [PATCH v17 12/32] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
2025-12-17 17:21 ` [PATCH v17 13/32] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
2026-01-05 19:17   ` Borislav Petkov
2026-01-05 19:39     ` Luck, Tony
2026-01-05 20:04       ` Borislav Petkov
2026-01-05 20:15         ` Luck, Tony
2026-01-07 17:29           ` Reinette Chatre
2026-01-07 18:05             ` Luck, Tony
2026-01-07 19:33               ` Reinette Chatre
2026-01-07 20:25                 ` Luck, Tony
2026-01-07 22:09                   ` Reinette Chatre
2026-01-07 22:27                     ` Luck, Tony
2026-01-07 23:09                       ` Reinette Chatre
2026-01-08  0:16                         ` Luck, Tony
2026-01-08  2:42                           ` Reinette Chatre
2025-12-17 17:21 ` [PATCH v17 14/32] x86,fs/resctrl: Add and initialize a resource for package scope monitoring Tony Luck
2025-12-17 17:21 ` [PATCH v17 15/32] fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains Tony Luck
2025-12-17 17:21 ` [PATCH v17 16/32] x86/resctrl: Discover hardware telemetry events Tony Luck
2025-12-17 17:21 ` [PATCH v17 17/32] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
2025-12-17 17:21 ` [PATCH v17 18/32] x86,fs/resctrl: Add architectural event pointer Tony Luck
2025-12-17 17:21 ` [PATCH v17 19/32] x86/resctrl: Find and enable usable telemetry events Tony Luck
2026-01-09 12:16   ` Borislav Petkov
2026-01-09 16:17     ` Reinette Chatre
2026-01-09 16:53     ` Luck, Tony
2026-01-09 22:01       ` Borislav Petkov
2025-12-17 17:21 ` [PATCH v17 20/32] x86/resctrl: Read " Tony Luck
2025-12-17 17:21 ` [PATCH v17 21/32] fs/resctrl: Refactor mkdir_mondata_subdir() Tony Luck
2025-12-17 17:21 ` [PATCH v17 22/32] fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() Tony Luck
2025-12-17 17:21 ` [PATCH v17 23/32] x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
2025-12-17 17:21 ` [PATCH v17 24/32] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
2026-01-09 22:16   ` Borislav Petkov
2026-01-09 22:20     ` Luck, Tony
2025-12-17 17:21 ` [PATCH v17 25/32] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG Tony Luck
2025-12-17 17:21 ` [PATCH v17 26/32] fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] Tony Luck
2025-12-17 17:21 ` [PATCH v17 27/32] x86,fs/resctrl: Compute number of RMIDs as minimum across resources Tony Luck
2025-12-17 17:21 ` [PATCH v17 28/32] fs/resctrl: Move RMID initialization to first mount Tony Luck
2025-12-17 17:21 ` [PATCH v17 29/32] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
2025-12-17 17:21 ` [PATCH v17 30/32] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
2026-01-10 10:57   ` Borislav Petkov
2026-01-10 19:13     ` Luck, Tony
2026-01-10 19:42       ` Borislav Petkov
2026-01-10 23:29         ` Luck, Tony
2025-12-17 17:21 ` [PATCH v17 31/32] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
2025-12-17 17:21 ` [PATCH v17 32/32] x86,fs/resctrl: Update documentation for telemetry events Tony Luck
2025-12-17 22:16 ` [PATCH v17 00/32] x86,fs/resctrl telemetry monitoring Reinette Chatre
2026-01-04  6:14   ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox