All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT
@ 2026-06-13  7:56 Chen Yu
  2026-06-13  7:56 ` [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:56 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Intel Enhanced Resource Director Technology (ERDT) extends the existing
RDT framework with two major capabilities:

  1. MMIO-based access to monitoring and allocation registers, replacing
     the legacy MSR-based interface.
  2. Region-aware RDT for fine-grained control over different tiers of
     memory (e.g., CXL.mem, DDR).

This is described in the Intel RDT Architecture Specification:
https://cdrdv2-public.intel.com/789566/356688-intel-rdt-arch-spec.pdf

This patch set focuses on the first part: enabling MMIO-based access for
Cache Monitoring Technology (CMT), while CAT/MBM/MBA are still using MSR.
The platform advertises the MMIO register layout through the ACPI ERDT
(Enhanced Resource Director Technology) table, which contains sub-tables
describing per-domain register regions for monitoring and allocation.

With ERDT, L3 cache occupancy counters are read via MMIO rather than
MSR, allowing the reads to be performed from any CPU without requiring
cross-CPU IPIs. This series parses the relevant ACPI sub-tables (RMDD,
CMRC), prepares the resctrl monitor infrastructure for MMIO-based reads,
and adds initial support for reading L3 occupancy via the CMRC interface.

kselftest of CMT and L3_CAT has passed with minor adjustment at
https://lore.kernel.org/lkml/20260523101715.3964456-1-yu.c.chen@intel.com/.

Changes from V3 to V4:
- Remove the redundant table length check in subtbl_valid() (Thomas Gleixner)
- Reuse subtbl_valid() for all the table iteration (Thomas Gleixner)
- Refine the commit log of [PATCH 5/6] to state that this change is a
  preparation for [PATCH 6/6] rather than fixing an existing issue
  (Thomas Gleixner, Reinette Chatre, Tony Luck)
- Fix if CACD lists all CPUs in the LLC domain (sashiko)
- Deal with a corner case that if there is no valid RMDD tables,
  the erdt_enabled_flag should remain false.(sashiko)
- Add Thomas's Reviewed-by and Hongyu's Tested-by.

Changes from V2 to V3:
- Wrap __resctrl_arch_late_init() to avoid the goto logic. (Thomas Gleixner)
- Make the variables in struct erdt_domain_info tabular format (Thomas Gleixner)
- Remove tail comments (Thomas Gleixner)
- Make the name of erdt_enabled() and variable in it consistent and
  comprehensible. (Thomas Gleixner)
- Use topo_lookup_cpuid() to search the CPU id according to the x2apic id
  (Thomas Gleixner)
- Fix kernel doc comment format (Thomas Gleixner)
- Use brackets for multiple lines "if" case. (Thomas Gleixner)
- Let the parameter for cacd_init() to fully utilize 100 characters.
  (Thomas Gleixner)
- Variables are reordered in reverse fir-tree.(Thomas Gleixner)
- Added a named constant and use it in the rmdd->flags check.
  (Thomas Gleixner)
- Introduce helper functions to make the code readable when iterating
  the RMDD tables. (Thomas Gleixner)
- Make the macros tabular format. (Thomas Gleixner)

Changes from V1 to V2:
- Add #include <linux/cleanup.h> to follow the "include-what-you-use" best
  practice (Tony Luck)
- Fix 3 issues reported by:
  https://sashiko.dev/#/patchset/cover.1779872016.git.yu.c.chen%40intel.com
  Remove the variable of cacd in struct erdt_domain_info as it will
  never be used after initialization.
  Invoke erdt_exit() to avoid resource leak if rdt_alloc_capable and
  rdt_mon_capable are both false.
  Adjust the comments suggested by sashiko.

Anil S Keshavamurthy (1):
  x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID

Chen Yu (4):
  x86/resctrl: Parse ACPI CMRC table
  x86/resctrl: Rename prev_msr to prev_mon_val
  x86/resctrl: Refactor the monitor read function
  x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO
    read

Tony Luck (1):
  fs/resctrl: Do not invoke smp_processor_id() in preemptible context

 arch/x86/Kconfig                       |   4 +-
 arch/x86/include/asm/apic.h            |   1 +
 arch/x86/include/asm/resctrl.h         |   4 +
 arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
 arch/x86/kernel/cpu/resctrl/core.c     |  16 +-
 arch/x86/kernel/cpu/resctrl/erdt.c     | 425 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |  11 +-
 arch/x86/kernel/cpu/resctrl/monitor.c  |  64 ++--
 arch/x86/kernel/cpu/topology.c         |   2 +-
 fs/resctrl/monitor.c                   |  42 ++-
 10 files changed, 528 insertions(+), 42 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
@ 2026-06-13  7:56 ` Chen Yu
  2026-06-18 23:37   ` Reinette Chatre
  2026-06-13  7:57 ` [PATCH v4 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:56 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

From: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>

ERDT(Enhanced RDT) introduces a new top-level ACPI structure
(the ERDT) that the kernel must parse before any enhanced
RDT feature can be used. The ERDT improves the existing RDT
by switching low-level register access from MSR-based to
MMIO-based, which is more efficient.

The ERDT structure may include several sub ACPI tables:

  - Resource Management Domain Description Structure (RMDD)
  - CPU Agent Collection Description Structure (CACD)
  - Cache Monitoring Registers for CPU Agents Description Structure
    (CMRC)

There is one ERDT table per platform.
Each RMDD substructure in ERDT represents one resource management
domain (RMD), also known as an L3 domain. Thus, the total number
of RMDDs equals the number of L3 domains on the platform.
Each RMDD contains information such as MMIO addresses. This address
is used to retrieve RDT metrics like L3 occupancy.

Add basic ERDT ACPI table and sub-table parsing, and store the
relevant tables for later processing.

Among these sub-tables, RMDD requires special handling. There is one
RMDD per domain, and the domain ID reuses the L3 cache ID. Many code
paths need to retrieve an RMDD efficiently by domain ID (L3 cache ID).
Because L3 cache IDs are derived from x2APIC IDs and are not
contiguous, using a plain array indexed by domain ID would waste
memory. As a trade-off, an xarray is used to store these tables, with
the L3 cache ID as the key.

Suggested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
---
v3->v4:
   Remove the redundant table length check in subtbl_valid() (Thomas Gleixner)
   Reuse subtbl_valid() for all the table iteration (Thomas Gleixner)
   Fix if CACD lists all CPUs in the LLC domain (sashiko)
   Deal with a corner case that if there is no valid RMDD table,
   the erdt_enabled_flag should remain false (sashiko)
   Add Hongyu's tag.

v2->v3:
   Wrap __resctrl_arch_late_init() to avoid the goto logic. (Thomas Gleixner)
   Make the variables in struct erdt_domain_info tabular format (Thomas Gleixner)
   Remove tail comments (Thomas Gleixner)
   Make the name of erdt_enabled() and variable in it consistent and
   comprehensible. (Thomas Gleixner)
   Use topo_lookup_cpuid() to search the CPU id according to the x2apic id
   (Thomas Gleixner)
   Fix kernel doc comment format (Thomas Gleixner)
   Use brackets for multiple lines "if" case. (Thomas Gleixner)
   Let the parameter for cacd_init() to fully utilize 100 characters.
   (Thomas Gleixner)
   Variables are reordered in reverse fir-tree.(Thomas Gleixner)
   Added a named constant and use it in the rmdd->flags check.
   (Thomas Gleixner)
   Introduce helper functions to make the code readable when iterating
   the RMDD tables. (Thomas Gleixner)

v1->v2:
  Add #include <linux/cleanup.h> to follow the "include-what-you-use" best.
  (Tony Luck)
  Remove the variable of cacd in struct erdt_domain_info as it will
  never be used after initialization.(sashiko)
  Invoke erdt_exit() to avoid resource leak if rdt_alloc_capable and
  rdt_mon_capable are both false.(sashiko)
  Refine comments.(sashiko)
---
 arch/x86/Kconfig                       |   4 +-
 arch/x86/include/asm/apic.h            |   1 +
 arch/x86/include/asm/resctrl.h         |   2 +
 arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
 arch/x86/kernel/cpu/resctrl/core.c     |  14 +-
 arch/x86/kernel/cpu/resctrl/erdt.c     | 296 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |   3 +
 arch/x86/kernel/cpu/topology.c         |   2 +-
 8 files changed, 319 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..97d210bd9bb5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -515,7 +515,7 @@ config X86_MPPARSE
 
 config X86_CPU_RESCTRL
 	bool "x86 CPU resource control support"
-	depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
+	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
 	depends on MISC_FILESYSTEMS
 	select ARCH_HAS_CPU_RESCTRL
 	select RESCTRL_FS
@@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
 
 config X86_CPU_RESCTRL_INTEL_AET
 	bool "Intel Application Energy Telemetry"
-	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
 	help
 	  Enable per-RMID telemetry events in resctrl.
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9cd493d467d4..bb84651b14bd 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -54,6 +54,7 @@ static inline void x86_32_probe_apic(void) { }
 #endif
 
 extern u32 cpuid_to_apicid[];
+int topo_lookup_cpuid(u32 apic_id);
 
 #define CPU_ACPIID_INVALID	U32_MAX
 
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 575f8408a9e7..97c2f6bc7a5f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -40,6 +40,8 @@ struct resctrl_pqr_state {
 	u32			default_closid;
 };
 
+bool erdt_enabled(void);
+
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
 extern bool rdt_alloc_capable;
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index 273ddfa30836..2216ee084832 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
 obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
+obj-$(CONFIG_X86_CPU_RESCTRL)		+= erdt.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
 
 # To allow define_trace.h's recursive include:
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7667cf7c4e94..90730f0851fa 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1012,6 +1012,7 @@ static __init void check_quirks(void)
 
 static __init bool get_rdt_resources(void)
 {
+	erdt_init();
 	rdt_alloc_capable = get_rdt_alloc_resources();
 	rdt_mon_capable = get_rdt_mon_resources();
 
@@ -1113,7 +1114,7 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
 	}
 }
 
-static int __init resctrl_arch_late_init(void)
+static int __init __resctrl_arch_late_init(void)
 {
 	struct rdt_resource *r;
 	int state, ret, i;
@@ -1156,6 +1157,15 @@ static int __init resctrl_arch_late_init(void)
 	return 0;
 }
 
+static int __init resctrl_arch_late_init(void)
+{
+	int ret = __resctrl_arch_late_init();
+
+	if (ret)
+		erdt_exit();
+	return ret;
+}
+
 late_initcall(resctrl_arch_late_init);
 
 static void __exit resctrl_arch_exit(void)
@@ -1165,6 +1175,8 @@ static void __exit resctrl_arch_exit(void)
 	cpuhp_remove_state(rdt_online);
 
 	resctrl_exit();
+
+	erdt_exit();
 }
 
 __exitcall(resctrl_arch_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
new file mode 100644
index 000000000000..b43dcc06f009
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Enhanced Resource Director Technology(ERDT)
+ *
+ * Copyright (C) 2026 Intel Corporation
+ *
+ */
+
+#define pr_fmt(fmt)     "resctrl: " fmt
+
+#include <linux/cleanup.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/xarray.h>
+#include <linux/resctrl.h>
+#include <linux/acpi.h>
+#include <asm/apic.h>
+#include <asm/cpu_device_id.h>
+#include "internal.h"
+
+enum erdt_mmio_type {
+	ERDT_MMIO_RMDD_CREG,
+	ERDT_MMIO_CMRC_BASE,
+	ERDT_MMIO_MAX
+};
+
+struct erdt_domain_info {
+	void __iomem		*base[ERDT_MMIO_MAX];
+	struct acpi_erdt_cmrc	*cmrc;
+};
+
+static bool erdt_enabled_flag;
+
+static DEFINE_XARRAY(erdt_domain_xa);
+
+#define ERDT_VALID_VERSION		1
+#define RMDD_FLAG_CPU_DOMAIN		BIT(0)
+
+static u32 valid_subtbl_mask;
+
+bool erdt_enabled(void)
+{
+	return erdt_enabled_flag;
+}
+
+/**
+ * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
+ * @cacd: Pointer to the ACPI ERDT CACD structure
+ *
+ * Parses the X2APIC ID list in the given CACD subtable to
+ * identify an online logical CPU and uses it to query the associated
+ * L3 cache ID. The first valid CPU found is used for this lookup.
+ *
+ * The L3 cache ID is used as a unique domain key for ERDT domain
+ * registration and lookup.
+ *
+ * Return: L3 cache ID for the first matching CPU, or -1 on failure.
+ */
+static __init int get_l3_cache_id_from_cacd(struct acpi_erdt_cacd *cacd)
+{
+	int num_ids, cpu, online_cpu = -1, cache_id = -1, tmp;
+	struct cacheinfo *ci;
+
+	if (cacd->header.length < sizeof(*cacd) + sizeof(cacd->X2APICIDS[0])) {
+		pr_warn(FW_BUG "Invalid x2apicid CACD table\n");
+		return -1;
+	}
+
+	num_ids = (cacd->header.length - sizeof(*cacd)) / sizeof(cacd->X2APICIDS[0]);
+
+	guard(cpus_read_lock)();
+
+	for (int i = 0; i < num_ids; i++) {
+		cpu = topo_lookup_cpuid(cacd->X2APICIDS[i]);
+		if (cpu < 0) {
+			pr_warn(FW_BUG "Unknown x2apicid 0x%x\n", cacd->X2APICIDS[i]);
+
+			return -1;
+		}
+
+		if (!cpu_online(cpu))
+			continue;
+
+		tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
+		if (tmp == -1) {
+			pr_warn(FW_BUG "Can not find L3 cache id for CPU%d\n", cpu);
+			return -1;
+		}
+
+		if (cache_id == -1)
+			cache_id = tmp;
+
+		if (tmp != cache_id) {
+			pr_warn(FW_BUG "CACD references multiple L3 cache instances\n");
+			return -1;
+		}
+		online_cpu = cpu;
+	}
+
+	if (online_cpu == -1)
+		return -1;
+
+	/*
+	 * Check if CACD lists all CPUs in the LLC domain.
+	 */
+	ci = get_cpu_cacheinfo_level(online_cpu, RESCTRL_L3_CACHE);
+	if (!ci || num_ids < cpumask_weight(&ci->shared_cpu_map)) {
+		pr_warn(FW_BUG "CACD does not list all the CPUs in L3 domain\n");
+		return -1;
+	}
+
+	return cache_id;
+}
+
+static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size, const char *desc)
+{
+	void __iomem *addr = ioremap(base, size << 12);
+
+	if (!addr) {
+		pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",
+		       desc, (unsigned long long)base, size);
+	}
+	return addr;
+}
+
+static void erdt_iounmap_domain(struct erdt_domain_info *domain)
+{
+	for (int i = 0; i < ERDT_MMIO_MAX; i++) {
+		if (domain->base[i]) {
+			iounmap(domain->base[i]);
+			domain->base[i] = NULL;
+		}
+	}
+}
+
+static void cleanup_one_domain(struct erdt_domain_info *d)
+{
+	erdt_iounmap_domain(d);
+	kfree(d);
+}
+
+static __init bool cacd_init(struct acpi_subtbl_hdr_16 *subtbl, int *l3_cache_id)
+{
+	*l3_cache_id = get_l3_cache_id_from_cacd((struct acpi_erdt_cacd *)subtbl);
+
+	return *l3_cache_id != -1;
+}
+
+static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
+{
+	return (void *)rmdd + sizeof(*rmdd);
+}
+
+static inline struct acpi_subtbl_hdr_16 *next_subtbl(struct acpi_subtbl_hdr_16 *subtbl)
+{
+	return (void *)subtbl + subtbl->length;
+}
+
+static inline bool subtbl_valid(void *end, struct acpi_subtbl_hdr_16 *subtbl)
+{
+	if (subtbl->length < sizeof(*subtbl))
+		return false;
+
+	if ((void *)subtbl + subtbl->length > end)
+		return false;
+
+	return true;
+}
+
+static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
+{
+	struct erdt_domain_info *domain_info;
+	struct acpi_subtbl_hdr_16 *subtbl;
+	struct acpi_erdt_rmdd *rmdd;
+	int l3_cache_id = -1;
+	u32 subtbl_mask = 0;
+
+	if (rmdd_hdr->length < sizeof(*rmdd)) {
+		pr_info(FW_BUG "Invalid RMDD length %u\n", rmdd_hdr->length);
+		return false;
+	}
+
+	rmdd = (struct acpi_erdt_rmdd *)rmdd_hdr;
+
+	/* Quietly ignore non-CPU-based L3 domains */
+	if (!(rmdd->flags & RMDD_FLAG_CPU_DOMAIN))
+		return true;
+
+	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+	if (!domain_info)
+		return false;
+
+	domain_info->base[ERDT_MMIO_RMDD_CREG] =
+		erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size, "RMDD ctrl base");
+	if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
+		goto cleanup;
+
+	for (subtbl = rmdd_subtbl(rmdd);
+	     subtbl_valid((void *)rmdd + rmdd->header.length, subtbl);
+	     subtbl = next_subtbl(subtbl)) {
+		switch (subtbl->type) {
+		case ACPI_ERDT_TYPE_CACD:
+			if (cacd_init(subtbl, &l3_cache_id))
+				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
+			break;
+		default:
+			break;
+		}
+	}
+
+	if (l3_cache_id == -1) {
+		pr_info("ERDT: Failed to resolve L3 cache ID for RMDD domain %d\n",
+			rmdd->domain_id);
+
+		goto cleanup;
+	}
+
+	/* Require all RMDDs to support same set of sub-tables */
+	if (!valid_subtbl_mask) {
+		valid_subtbl_mask = subtbl_mask;
+	} else if (subtbl_mask != valid_subtbl_mask) {
+		pr_info(FW_BUG "Mismatch domain\n");
+		goto cleanup;
+	}
+
+	if (xa_insert(&erdt_domain_xa, l3_cache_id, domain_info, GFP_KERNEL)) {
+		pr_info("ERDT: Failed to store domain info for RMDD domain %d\n",
+			rmdd->domain_id);
+		goto cleanup;
+	}
+
+	return true;
+
+cleanup:
+	cleanup_one_domain(domain_info);
+	return false;
+}
+
+static void erdt_cleanup(void)
+{
+	struct erdt_domain_info *d;
+	unsigned long index;
+
+	xa_for_each(&erdt_domain_xa, index, d)
+		cleanup_one_domain(d);
+	xa_destroy(&erdt_domain_xa);
+}
+
+static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
+{
+	struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
+	struct acpi_subtbl_hdr_16 *subtbl;
+	void *table_end;
+
+	if (erdt->header.revision != ERDT_VALID_VERSION) {
+		pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
+		return -EINVAL;
+	}
+
+	if (erdt->header.length < sizeof(*erdt)) {
+		pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);
+		return -EINVAL;
+	}
+
+	subtbl = (void *)erdt + sizeof(struct acpi_table_erdt);
+	table_end = (void *)erdt + erdt->header.length;
+
+	while (subtbl_valid(table_end, subtbl)) {
+		if (subtbl->type == ACPI_ERDT_TYPE_RMDD)
+			if (!parse_rmdd_entry(subtbl))
+				goto cleanup;
+
+		subtbl = next_subtbl(subtbl);
+	}
+
+	if (xa_empty(&erdt_domain_xa))
+		goto cleanup;
+
+	erdt_enabled_flag = true;
+
+	return 0;
+
+cleanup:
+	erdt_cleanup();
+	return -EINVAL;
+}
+
+int __init erdt_init(void)
+{
+	return acpi_table_parse(ACPI_SIG_ERDT, enumerate_erdt_table);
+}
+
+void erdt_exit(void)
+{
+	erdt_cleanup();
+}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3cfa0c10e92..9c59bd5e028e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -253,4 +253,7 @@ static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resour
 static inline bool intel_handle_aet_option(bool force_off, char *tok) { return false; }
 #endif
 
+int erdt_init(void);
+void erdt_exit(void);
+
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 4913b64ec592..bcee70fb9277 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -92,7 +92,7 @@ static inline u32 topo_apicid(u32 apicid, enum x86_topology_domains dom)
 	return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
 }
 
-static int topo_lookup_cpuid(u32 apic_id)
+int topo_lookup_cpuid(u32 apic_id)
 {
 	int i;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 2/6] x86/resctrl: Parse ACPI CMRC table
  2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
  2026-06-13  7:56 ` [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
@ 2026-06-13  7:57 ` Chen Yu
  2026-06-13  7:57 ` [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:57 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

The CMRC (Cache Monitoring Registers for CPU Agents Description)
sub-table of ERDT describes the MMIO registers used to read
cache monitoring counters (e.g. LLC occupancy) for an RMD.

Parse each CMRC sub-table, ioremap its register window, and save
the CMRC pointer in the corresponding ERDT domain entry so that
later monitoring code can read the counters via MMIO.

Suggested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
  Add Thomas and Hongyu's tags.
v2->v3:
  Make the macros tabular format. (Thomas Gleixner)
---
 arch/x86/kernel/cpu/resctrl/erdt.c | 44 ++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
index b43dcc06f009..a1d5de82d5c9 100644
--- a/arch/x86/kernel/cpu/resctrl/erdt.c
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -33,8 +33,9 @@ static bool erdt_enabled_flag;
 
 static DEFINE_XARRAY(erdt_domain_xa);
 
-#define ERDT_VALID_VERSION		1
-#define RMDD_FLAG_CPU_DOMAIN		BIT(0)
+#define ERDT_VALID_VERSION			1
+#define CMRC_VALID_INDEX_FUNC_VERSION		1
+#define RMDD_FLAG_CPU_DOMAIN			BIT(0)
 
 static u32 valid_subtbl_mask;
 
@@ -136,6 +137,7 @@ static void erdt_iounmap_domain(struct erdt_domain_info *domain)
 static void cleanup_one_domain(struct erdt_domain_info *d)
 {
 	erdt_iounmap_domain(d);
+	kfree(d->cmrc);
 	kfree(d);
 }
 
@@ -146,6 +148,40 @@ static __init bool cacd_init(struct acpi_subtbl_hdr_16 *subtbl, int *l3_cache_id
 	return *l3_cache_id != -1;
 }
 
+static __init bool cmrc_init(struct erdt_domain_info *d, struct acpi_subtbl_hdr_16 *subtbl)
+{
+	struct acpi_erdt_cmrc *cmrc = (struct acpi_erdt_cmrc *)subtbl;
+
+	if (subtbl->length < sizeof(*cmrc)) {
+		pr_warn(FW_BUG "Truncated CMRC subtable\n");
+		return false;
+	}
+
+	if (cmrc->index_fn != CMRC_VALID_INDEX_FUNC_VERSION) {
+		pr_info("Unknown CMRC index function %d\n", cmrc->index_fn);
+		return false;
+	}
+
+	if (!cmrc->clump_size) {
+		pr_warn(FW_BUG "CMRC clump_size is zero\n");
+		return false;
+	}
+
+	d->base[ERDT_MMIO_CMRC_BASE] = erdt_ioremap_checked(cmrc->cmt_reg_base,
+							    cmrc->cmt_reg_size, "CMRC base");
+	if (!d->base[ERDT_MMIO_CMRC_BASE])
+		return false;
+
+	d->cmrc = kmemdup(cmrc, subtbl->length, GFP_KERNEL);
+	if (!d->cmrc) {
+		iounmap(d->base[ERDT_MMIO_CMRC_BASE]);
+		d->base[ERDT_MMIO_CMRC_BASE] = NULL;
+		return false;
+	}
+
+	return true;
+}
+
 static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
 {
 	return (void *)rmdd + sizeof(*rmdd);
@@ -203,6 +239,10 @@ static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
 			if (cacd_init(subtbl, &l3_cache_id))
 				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
 			break;
+		case ACPI_ERDT_TYPE_CMRC:
+			if (cmrc_init(domain_info, subtbl))
+				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CMRC);
+			break;
 		default:
 			break;
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
  2026-06-13  7:56 ` [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
  2026-06-13  7:57 ` [PATCH v4 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
@ 2026-06-13  7:57 ` Chen Yu
  2026-06-18 23:39   ` Reinette Chatre
  2026-06-13  7:57 ` [PATCH v4 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:57 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
With ERDT, the previous monitor value may come from an MMIO
register rather than from an MSR, so the "msr" suffix is no longer
accurate. The new name describes the field by its meaning (the
previous monitor value) instead of by the access method.

This is preparation for ERDT support, which reads monitoring
counters via MMIO.

No functional change.

Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
  Add Thomas and Hongyu's tags.
---
 arch/x86/kernel/cpu/resctrl/internal.h |  8 +++----
 arch/x86/kernel/cpu/resctrl/monitor.c  | 30 +++++++++++++-------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9c59bd5e028e..97065dc6e14f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -31,13 +31,13 @@
 /**
  * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
  *			   return value.
- * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr:	Value of IA32_QM_CTR last time it was read for the RMID used to
- *		find this struct.
+ * @chunks:		Total data moved (multiply by rdt_group.mon_scale to get bytes)
+ * @prev_mon_val:	Previous monitor counter value for the RMID used to
+ *			find this struct.
  */
 struct arch_mbm_state {
 	u64	chunks;
-	u64	prev_msr;
+	u64	prev_mon_val;
 };
 
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 59215fef3924..93c7fc936861 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -186,7 +186,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 
 		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
 		/* Record any initial, non-zero count value. */
-		__rmid_read_phys(prmid, eventid, &am->prev_msr);
+		__rmid_read_phys(prmid, eventid, &am->prev_mon_val);
 	}
 }
 
@@ -209,16 +209,16 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domai
 	}
 }
 
-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+static u64 mbm_overflow_count(u64 prev_val, u64 cur_val, unsigned int width)
 {
 	u64 shift = 64 - width, chunks;
 
-	chunks = (cur_msr << shift) - (prev_msr << shift);
+	chunks = (cur_val << shift) - (prev_val << shift);
 	return chunks >> shift;
 }
 
 static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
-			     u32 rmid, enum resctrl_event_id eventid, u64 msr_val)
+			     u32 rmid, enum resctrl_event_id eventid, u64 mon_val)
 {
 	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -227,12 +227,12 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 
 	am = get_arch_mbm_state(hw_dom, rmid, eventid);
 	if (am) {
-		am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
+		am->chunks += mbm_overflow_count(am->prev_mon_val, mon_val,
 						 hw_res->mbm_width);
 		chunks = get_corrected_mbm_count(rmid, am->chunks);
-		am->prev_msr = msr_val;
+		am->prev_mon_val = mon_val;
 	} else {
-		chunks = msr_val;
+		chunks = mon_val;
 	}
 
 	return chunks * hw_res->mon_scale;
@@ -245,7 +245,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	struct rdt_hw_l3_mon_domain *hw_dom;
 	struct rdt_l3_mon_domain *d;
 	struct arch_mbm_state *am;
-	u64 msr_val;
+	u64 mon_val;
 	u32 prmid;
 	int cpu;
 	int ret;
@@ -262,14 +262,14 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	hw_dom = resctrl_to_arch_mon_dom(d);
 	cpu = cpumask_any(&hdr->cpu_mask);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
-	ret = __rmid_read_phys(prmid, eventid, &msr_val);
+	ret = __rmid_read_phys(prmid, eventid, &mon_val);
 
 	if (!ret) {
-		*val = get_corrected_val(r, d, rmid, eventid, msr_val);
+		*val = get_corrected_val(r, d, rmid, eventid, mon_val);
 	} else if (ret == -EINVAL) {
 		am = get_arch_mbm_state(hw_dom, rmid, eventid);
 		if (am)
-			am->prev_msr = 0;
+			am->prev_mon_val = 0;
 	}
 
 	return ret;
@@ -324,7 +324,7 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 		memset(am, 0, sizeof(*am));
 
 		/* Record any initial, non-zero count value. */
-		__cntr_id_read(cntr_id, &am->prev_msr);
+		__cntr_id_read(cntr_id, &am->prev_mon_val);
 	}
 }
 
@@ -332,14 +332,14 @@ int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			   u32 unused, u32 rmid, int cntr_id,
 			   enum resctrl_event_id eventid, u64 *val)
 {
-	u64 msr_val;
+	u64 mon_val;
 	int ret;
 
-	ret = __cntr_id_read(cntr_id, &msr_val);
+	ret = __cntr_id_read(cntr_id, &mon_val);
 	if (ret)
 		return ret;
 
-	*val = get_corrected_val(r, d, rmid, eventid, msr_val);
+	*val = get_corrected_val(r, d, rmid, eventid, mon_val);
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 4/6] x86/resctrl: Refactor the monitor read function
  2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
                   ` (2 preceding siblings ...)
  2026-06-13  7:57 ` [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
@ 2026-06-13  7:57 ` Chen Yu
  2026-06-13  7:57 ` [PATCH v4 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
  2026-06-13  7:57 ` [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
  5 siblings, 0 replies; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:57 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Split the monitor read helper into an L3 read path and an AET
(Intel Application Energy Telemetry) read path. This makes the
two distinct monitoring sources easier to extend independently
and prepares the L3 path for ERDT-based MMIO reads added in a
later patch.

No functional change.

Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
   Add Hongyu's tag.
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 93c7fc936861..9209927f88a2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -238,9 +238,9 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 	return chunks * hw_res->mon_scale;
 }
 
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
-			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   void *arch_priv, u64 *val, void *ignored)
+static int arch_l3_read_event(struct rdt_domain_hdr *hdr, u32 rmid,
+			      enum resctrl_event_id eventid, u64 *val,
+			      struct rdt_resource *r)
 {
 	struct rdt_hw_l3_mon_domain *hw_dom;
 	struct rdt_l3_mon_domain *d;
@@ -250,11 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	int cpu;
 	int ret;
 
-	resctrl_arch_rmid_read_context_check();
-
-	if (r->rid == RDT_RESOURCE_PERF_PKG)
-		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
-
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
@@ -275,6 +270,22 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	return ret;
 }
 
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
+			   void *arch_priv, u64 *val, void *ignored)
+{
+	resctrl_arch_rmid_read_context_check();
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L3:
+		return arch_l3_read_event(hdr, rmid, eventid, val, r);
+	case RDT_RESOURCE_PERF_PKG:
+		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
+	default:
+		return -EINVAL;
+	}
+}
+
 static int __cntr_id_read(u32 cntr_id, u64 *val)
 {
 	u64 msr_val;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
                   ` (3 preceding siblings ...)
  2026-06-13  7:57 ` [PATCH v4 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
@ 2026-06-13  7:57 ` Chen Yu
  2026-06-13  7:57 ` [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
  5 siblings, 0 replies; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:57 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

From: Tony Luck <tony.luck@intel.com>

Currently, mon_evt::any_cpu is false for all events associated with
RDT_RESOURCE_L3. The MMIO-based CMT will set any_cpu to true in a
follow-up patch. This change will trigger a warning when calling
smp_processor_id() inside __l3_mon_event_count(), because
__l3_mon_event_count() may be invoked from arbitrary CPUs in task
context. Since the calling context is preemptible here,
smp_processor_id() will emit a debug warning.

To prepare for MMIO-based CMT reads, skip the current CPU lookup
when an event's any_cpu flag is set. Events with this flag do not
require execution on a specific CPU.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
   Revise the commit log to explain the background of this patch.
   Add Hongyu's tag.
---
 fs/resctrl/monitor.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..593974f64f40 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -417,9 +417,37 @@ static void mbm_cntr_free(struct rdt_l3_mon_domain *d, int cntr_id)
 	memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
 }
 
+/**
+ * cpu_on_correct_domain() - Check if current CPU is in the correct
+ *			      domain for the event.
+ * @rr: The rmid_read structure containing event and domain information.
+ *
+ * Context: Preemptible process context when @rr->evt->any_cpu is set.
+ *          Non-migratable process context (via smp_call_on_cpu()) or
+ *          non-preemptible context (via smp_call_function_any()) when
+ *          the event must be read on a specific CPU.
+ * Return: true if the current CPU can read this event, false otherwise.
+ */
+static bool cpu_on_correct_domain(struct rmid_read *rr)
+{
+	int cpu;
+
+	/* Any CPU is OK for this event */
+	if (rr->evt->any_cpu)
+		return true;
+
+	cpu = smp_processor_id();
+
+	/* Single domain. Must be on a CPU in that domain. */
+	if (rr->hdr)
+		return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
+
+	/* Summing domains that share a cache, must be on a CPU for that cache. */
+	return cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map);
+}
+
 static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
-	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
 	struct rdt_l3_mon_domain *d;
@@ -452,9 +480,6 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 		return 0;
 	}
 
-	/* Reading a single domain, must be on a CPU in that domain. */
-	if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
-		return -EINVAL;
 	if (rr->is_mbm_cntr)
 		rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
 						 rr->evt->evtid, &tval);
@@ -472,7 +497,6 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 
 static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
-	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
 	struct rdt_l3_mon_domain *d;
@@ -490,10 +514,6 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
 		return -EINVAL;
 	}
 
-	/* Summing domains that share a cache, must be on a CPU for that cache. */
-	if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
-		return -EINVAL;
-
 	/*
 	 * Legacy files must report the sum of an event across all
 	 * domains that share the same L3 cache instance.
@@ -524,7 +544,9 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	switch (rr->r->rid) {
 	case RDT_RESOURCE_L3:
-		WARN_ON_ONCE(rr->evt->any_cpu);
+		if (!cpu_on_correct_domain(rr))
+			return -EINVAL;
+
 		if (rr->hdr)
 			return __l3_mon_event_count(rdtgrp, rr);
 		else
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
                   ` (4 preceding siblings ...)
  2026-06-13  7:57 ` [PATCH v4 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
@ 2026-06-13  7:57 ` Chen Yu
  2026-06-18 23:40   ` Reinette Chatre
  5 siblings, 1 reply; 23+ messages in thread
From: Chen Yu @ 2026-06-13  7:57 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

The CMRC (Cache Monitoring Registers for CPU Agents Description)
ACPI sub-table provides the MMIO address used to read the LLC
occupancy counter for each RMID. When ERDT is enabled on the
platform, use this MMIO interface instead of the legacy MSR read
to obtain the L3 occupancy value.

Introduce erdt_mon_read(), a helper that retrieves monitoring
data for a given RMID and event ID from an ERDT domain. Initial
support is added for the L3 occupancy monitoring event
(QOS_L3_OCCUP_EVENT_ID).

If the platform supports ERDT, CMRC-based MMIO access is used by
default. If ERDT is unavailable, the implementation is to use
MSR-based operations. 

Suggested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v3->v4:
  Add Thomas and Hongyu's tags.
---
 arch/x86/include/asm/resctrl.h        |  2 +
 arch/x86/kernel/cpu/resctrl/core.c    |  2 +-
 arch/x86/kernel/cpu/resctrl/erdt.c    | 89 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c |  7 +++
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 97c2f6bc7a5f..9b3b03279dd8 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -41,6 +41,8 @@ struct resctrl_pqr_state {
 };
 
 bool erdt_enabled(void);
+struct rdt_domain_hdr;
+int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
 
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 90730f0851fa..fe812f7190fc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
index a1d5de82d5c9..2a1f638285b2 100644
--- a/arch/x86/kernel/cpu/resctrl/erdt.c
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -35,6 +35,7 @@ static DEFINE_XARRAY(erdt_domain_xa);
 
 #define ERDT_VALID_VERSION			1
 #define CMRC_VALID_INDEX_FUNC_VERSION		1
+#define UNAVAILABLE_COUNTER			BIT_ULL(63)
 #define RMDD_FLAG_CPU_DOMAIN			BIT(0)
 
 static u32 valid_subtbl_mask;
@@ -44,6 +45,94 @@ bool erdt_enabled(void)
 	return erdt_enabled_flag;
 }
 
+static void __iomem *cmrc_index_function_1(struct erdt_domain_info *d,
+					   struct acpi_erdt_cmrc *cmrc, int rmid)
+{
+	u16 clump_size, stride_size;
+	void __iomem *vaddr;
+
+	clump_size = cmrc->clump_size;
+	stride_size = cmrc->clump_stride;
+
+	/*
+	 * MMIO_ADDRESS_for_RMID# = CMRC Base +
+	 *   (RMID / ClumpSize) * Stride +
+	 *   (RMID % ClumpSize) * 8
+	 */
+	vaddr = d->base[ERDT_MMIO_CMRC_BASE] +
+		(rmid / clump_size) * stride_size +
+		(rmid % clump_size) * 8;
+
+	return vaddr;
+}
+
+/**
+ * erdt_read_l3_occupancy - Read L3 occupancy count for a given RMID
+ * @d:    Pointer to the ERDT domain info
+ * @rmid: Resource Monitoring ID to read occupancy for
+ * @val:  Output pointer to store the scaled occupancy count
+ *
+ * Calculates the MMIO address using clump and stride information
+ * from the CMRC ACPI structure and reads the L3 cache occupancy
+ * count for the given RMID. The raw value is scaled using the
+ * up_scale factor provided by firmware.
+ *
+ * Return: 0 for success, error code for other cases.
+ */
+static int erdt_read_l3_occupancy(struct erdt_domain_info *d, int rmid, u64 *val)
+{
+	struct acpi_erdt_cmrc *cmrc;
+	void __iomem *vaddr;
+	u64 l3_cmt_count;
+	u32 offset;
+
+	cmrc = d->cmrc;
+	if (!cmrc)
+		return -EIO;
+
+	offset = (rmid / cmrc->clump_size) * cmrc->clump_stride +
+		 (rmid % cmrc->clump_size) * 8;
+	if (offset + sizeof(u64) > (u32)cmrc->cmt_reg_size << 12)
+		return -EINVAL;
+
+	vaddr = cmrc_index_function_1(d, cmrc, rmid);
+
+	l3_cmt_count = readq(vaddr);
+	if (l3_cmt_count & UNAVAILABLE_COUNTER)
+		return -EINVAL;
+
+	*val = l3_cmt_count * cmrc->up_scale;
+
+	return 0;
+}
+
+/**
+ * erdt_mon_read - Read monitoring data for a given domain and RMID
+ * @hdr:    Domain header
+ * @ev_id:  Monitoring event ID (e.g. QOS_L3_OCCUP_EVENT_ID)
+ * @rmid:   Resource Monitoring ID for which to read the data
+ * @val:    Store the read data
+ *
+ * Looks up the domain by domid and dispatches the read request
+ * to the appropriate helper based on the event type.
+ * Currently supports only L3 occupancy monitoring.
+ *
+ * Return: 0 on success, error code otherwise.
+ */
+int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val)
+{
+	struct erdt_domain_info *d;
+
+	d = xa_load(&erdt_domain_xa, hdr->id);
+	if (!d)
+		return -EIO;
+
+	if (ev_id == QOS_L3_OCCUP_EVENT_ID)
+		return erdt_read_l3_occupancy(d, rmid, val);
+
+	return -EINVAL;
+}
+
 /**
  * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
  * @cacd: Pointer to the ACPI ERDT CACD structure
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9209927f88a2..1491f96b57c3 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -278,6 +278,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 
 	switch (r->rid) {
 	case RDT_RESOURCE_L3:
+		/*
+		 * No SNC for mmio based L3 occupancy, so there is no need
+		 * to convert logical RMID to a physical RMID via
+		 * logical_rmid_to_physical_rmid().
+		 */
+		if (erdt_enabled() && eventid == QOS_L3_OCCUP_EVENT_ID)
+			return erdt_mon_read(hdr, eventid, rmid, val);
 		return arch_l3_read_event(hdr, rmid, eventid, val, r);
 	case RDT_RESOURCE_PERF_PKG:
 		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-13  7:56 ` [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
@ 2026-06-18 23:37   ` Reinette Chatre
  2026-06-22  9:07     ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Reinette Chatre @ 2026-06-18 23:37 UTC (permalink / raw)
  To: Chen Yu, tony.luck
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Hi Chenyu,

On 6/13/26 12:56 AM, Chen Yu wrote:
> From: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> 
> ERDT(Enhanced RDT) introduces a new top-level ACPI structure

(nit: please add space between ERDT and the "(")

> (the ERDT) that the kernel must parse before any enhanced
> RDT feature can be used. The ERDT improves the existing RDT
> by switching low-level register access from MSR-based to
> MMIO-based, which is more efficient.

(nit: please use full line length available for changelog)

Above states clearly "switching low-level register access from MSR-based to
MMIO-based". This implementation clearly does so ... but the new ACPI tables
supporting the MMIO-based access also include, for example, the
maximum RMIDs that determines how many MMIO registers are supported.
Even so, from what I can tell, the "maximum RMID" that the ACPI tables
report is ignored and this implementation instead continues to rely on
the maximum RMID reported via CPUID that indicates the maximum RMIDs supported
by the *MSR* to also guide the access of the MMIO registers.

This implementation thus looks to straddle some middle ground between
ACPI and CPUID enumeration, using return of one to guide access to the
other that does not seem robust and requires platforms to forever support both.

 
> The ERDT structure may include several sub ACPI tables:
> 
>   - Resource Management Domain Description Structure (RMDD)
>   - CPU Agent Collection Description Structure (CACD)
>   - Cache Monitoring Registers for CPU Agents Description Structure
>     (CMRC)
> 
> There is one ERDT table per platform.
> Each RMDD substructure in ERDT represents one resource management
> domain (RMD), also known as an L3 domain. Thus, the total number
> of RMDDs equals the number of L3 domains on the platform.
> Each RMDD contains information such as MMIO addresses. This address
> is used to retrieve RDT metrics like L3 occupancy.
> 
> Add basic ERDT ACPI table and sub-table parsing, and store the
> relevant tables for later processing.
> 
> Among these sub-tables, RMDD requires special handling. There is one
> RMDD per domain, and the domain ID reuses the L3 cache ID. Many code
> paths need to retrieve an RMDD efficiently by domain ID (L3 cache ID).
> Because L3 cache IDs are derived from x2APIC IDs and are not
> contiguous, using a plain array indexed by domain ID would waste
> memory. As a trade-off, an xarray is used to store these tables, with

(nit: needs imperative eg. "use an xarray ...")

> the L3 cache ID as the key.
> 
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> ---
...

> ---
>  arch/x86/Kconfig                       |   4 +-
>  arch/x86/include/asm/apic.h            |   1 +
>  arch/x86/include/asm/resctrl.h         |   2 +
>  arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
>  arch/x86/kernel/cpu/resctrl/core.c     |  14 +-
>  arch/x86/kernel/cpu/resctrl/erdt.c     | 296 +++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/internal.h |   3 +
>  arch/x86/kernel/cpu/topology.c         |   2 +-

Please split out the non-resctrl changes into separate patch with sybsystem-specific
prefix to highlight these changes to not appear to sneak changes into other
subsystems via resctrl. They can stil form part of this series as preparatory patches.

>  8 files changed, 319 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index f3f7cb01d69d..97d210bd9bb5 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -515,7 +515,7 @@ config X86_MPPARSE
>  
>  config X86_CPU_RESCTRL
>  	bool "x86 CPU resource control support"
> -	depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
> +	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>  	depends on MISC_FILESYSTEMS
>  	select ARCH_HAS_CPU_RESCTRL
>  	select RESCTRL_FS
> @@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
>  
>  config X86_CPU_RESCTRL_INTEL_AET
>  	bool "Intel Application Energy Telemetry"
> -	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
> +	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>  	help
>  	  Enable per-RMID telemetry events in resctrl.
>  

Removing 32bit support is a significant change worth a mention and motivation in the changelog.

> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index 9cd493d467d4..bb84651b14bd 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -54,6 +54,7 @@ static inline void x86_32_probe_apic(void) { }
>  #endif
>  
>  extern u32 cpuid_to_apicid[];
> +int topo_lookup_cpuid(u32 apic_id);
>  
>  #define CPU_ACPIID_INVALID	U32_MAX
>  
> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
> index 575f8408a9e7..97c2f6bc7a5f 100644
> --- a/arch/x86/include/asm/resctrl.h
> +++ b/arch/x86/include/asm/resctrl.h
> @@ -40,6 +40,8 @@ struct resctrl_pqr_state {
>  	u32			default_closid;
>  };
>  
> +bool erdt_enabled(void);
> +

I cannot see why this needs to be in asm header ... why not arch/x86/kernel/cpu/resctrl/internal.h?

>  DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>  
>  extern bool rdt_alloc_capable;
> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
> index 273ddfa30836..2216ee084832 100644
> --- a/arch/x86/kernel/cpu/resctrl/Makefile
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -2,6 +2,7 @@
>  obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
>  obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
>  obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
> +obj-$(CONFIG_X86_CPU_RESCTRL)		+= erdt.o
>  obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
>  
>  # To allow define_trace.h's recursive include:
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 7667cf7c4e94..90730f0851fa 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -1012,6 +1012,7 @@ static __init void check_quirks(void)
>  
>  static __init bool get_rdt_resources(void)
>  {
> +	erdt_init();
>  	rdt_alloc_capable = get_rdt_alloc_resources();
>  	rdt_mon_capable = get_rdt_mon_resources();
>  
> @@ -1113,7 +1114,7 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
>  	}
>  }
>  
> -static int __init resctrl_arch_late_init(void)
> +static int __init __resctrl_arch_late_init(void)
>  {
>  	struct rdt_resource *r;
>  	int state, ret, i;
> @@ -1156,6 +1157,15 @@ static int __init resctrl_arch_late_init(void)
>  	return 0;
>  }
>  
> +static int __init resctrl_arch_late_init(void)
> +{
> +	int ret = __resctrl_arch_late_init();
> +
> +	if (ret)
> +		erdt_exit();
> +	return ret;
> +}
> +
>  late_initcall(resctrl_arch_late_init);
>  
>  static void __exit resctrl_arch_exit(void)
> @@ -1165,6 +1175,8 @@ static void __exit resctrl_arch_exit(void)
>  	cpuhp_remove_state(rdt_online);
>  
>  	resctrl_exit();
> +
> +	erdt_exit();
>  }
>  
>  __exitcall(resctrl_arch_exit);
> diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
> new file mode 100644
> index 000000000000..b43dcc06f009
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/erdt.c
> @@ -0,0 +1,296 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Enhanced Resource Director Technology(ERDT)

(nit: please add space)

> + *
> + * Copyright (C) 2026 Intel Corporation
> + *
> + */
> +
> +#define pr_fmt(fmt)     "resctrl: " fmt
> +
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/err.h>
> +#include <linux/xarray.h>
> +#include <linux/resctrl.h>
> +#include <linux/acpi.h>

Please sort the includes alphabetically and add empty lines between the groups of headers.

> +#include <asm/apic.h>
> +#include <asm/cpu_device_id.h>

Why is above #include needed?

> +#include "internal.h"
> +


Please add a comment for the enum below to describe what the different members represent.

> +enum erdt_mmio_type {
> +	ERDT_MMIO_RMDD_CREG,
> +	ERDT_MMIO_CMRC_BASE,
> +	ERDT_MMIO_MAX
> +};

Depending on how this enum ends up being used, could it benefit from a pattern like
https://lore.kernel.org/lkml/4a304e6c9ed3f5d92156b8045f5dd4ddba359dfc.1777419024.git.reinette.chatre@intel ?

> +
> +struct erdt_domain_info {
> +	void __iomem		*base[ERDT_MMIO_MAX];
> +	struct acpi_erdt_cmrc	*cmrc;
> +};
> +
> +static bool erdt_enabled_flag;

What is significance of "flag"? Can it just be "erdt_enabled"? Since it is a bool this seems 
more intuitive?

> +
> +static DEFINE_XARRAY(erdt_domain_xa);

Please add comment to describe what is stored in erdt_domain_xa and what the
index represents.

> +
> +#define ERDT_VALID_VERSION		1
> +#define RMDD_FLAG_CPU_DOMAIN		BIT(0)

I think this name can be more accurate, how about, for example "RMDD_FLAG_L3_DOMAIN"?

> +
> +static u32 valid_subtbl_mask;

Please add comment to describe what above global represents.

> +
> +bool erdt_enabled(void)
> +{
> +	return erdt_enabled_flag;
> +}
> +
> +/**
> + * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable

As I interpret it ... above does not describe to be what this function does.

> + * @cacd: Pointer to the ACPI ERDT CACD structure
> + *
> + * Parses the X2APIC ID list in the given CACD subtable to
> + * identify an online logical CPU and uses it to query the associated
> + * L3 cache ID. The first valid CPU found is used for this lookup.

This function is __init and requires at least one CPU per domain to be online.
What will happen if a system is booted with all CPUs of a domain offline and
then later a CPU is online'd?

> + *
> + * The L3 cache ID is used as a unique domain key for ERDT domain
> + * registration and lookup.
> + *
> + * Return: L3 cache ID for the first matching CPU, or -1 on failure.

Could you please use consistent terminology? Above comments switches between
"online logical CPU" and "valid CPU" and "matching CPU" that all seem to refer
to the same idea.

> + */
> +static __init int get_l3_cache_id_from_cacd(struct acpi_erdt_cacd *cacd)
> +{
> +	int num_ids, cpu, online_cpu = -1, cache_id = -1, tmp;
> +	struct cacheinfo *ci;
> +
> +	if (cacd->header.length < sizeof(*cacd) + sizeof(cacd->X2APICIDS[0])) {

Is above an open code of struct_size()?

> +		pr_warn(FW_BUG "Invalid x2apicid CACD table\n");

This patch displays several "FW_BUG" messages but they are sometimes printed with
pr_info() and sometimes with pr_warn(). Is this intentional?

> +		return -1;
> +	}
> +
> +	num_ids = (cacd->header.length - sizeof(*cacd)) / sizeof(cacd->X2APICIDS[0]);
> +
> +	guard(cpus_read_lock)();
> +
> +	for (int i = 0; i < num_ids; i++) {
> +		cpu = topo_lookup_cpuid(cacd->X2APICIDS[i]);
> +		if (cpu < 0) {
> +			pr_warn(FW_BUG "Unknown x2apicid 0x%x\n", cacd->X2APICIDS[i]);
> +
(nit: unnecessary empty line)


> +			return -1;
> +		}
> +
> +		if (!cpu_online(cpu))
> +			continue;
> +
> +		tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);

From what I can tell it is tmp that is essentially returned by this function and it
is the result of get_cpu_cacheinfo_id() that returns the cache ID initialized from 
CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") and function comment
"Resolve L3 cache ID from CACD subtable" implies differently and the way this is done
makes me wonder why CACD needs to be parsed at all.

resctrl already has a hotplug handler and it dynamically determines the domain ID
based on the cache ID returned from get_cpu_cacheinfo_id(). If the domain ID used by the
new ERDT tables can be, as it appears is done here, trusted to what Linux already
treating as domain ID then why is it needed to parse CACD at all? Why not just
pick the domain ID from RMDD directly and use it directly as domain ID? 

Later the domain ID from RMDD is just used in error messages ... never actually used
as a domain ID to guide the implementation so the correlation appears to be implicit with
this CACD parsing not intuitive.

> +		if (tmp == -1) {
> +			pr_warn(FW_BUG "Can not find L3 cache id for CPU%d\n", cpu);
> +			return -1;
> +		}
> +
> +		if (cache_id == -1)
> +			cache_id = tmp;
> +
> +		if (tmp != cache_id) {
> +			pr_warn(FW_BUG "CACD references multiple L3 cache instances\n");
> +			return -1;
> +		}
> +		online_cpu = cpu;
> +	}
> +
> +	if (online_cpu == -1)
> +		return -1;
> +
> +	/*
> +	 * Check if CACD lists all CPUs in the LLC domain.
> +	 */
> +	ci = get_cpu_cacheinfo_level(online_cpu, RESCTRL_L3_CACHE);
> +	if (!ci || num_ids < cpumask_weight(&ci->shared_cpu_map)) {
> +		pr_warn(FW_BUG "CACD does not list all the CPUs in L3 domain\n");
> +		return -1;
> +	}
> +
> +	return cache_id;
> +}
> +
> +static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size, const char *desc)

"size" -> "num_pages"?

> +{
> +	void __iomem *addr = ioremap(base, size << 12);

The 12 magic constant is not ideal. Seems like PAGE_SHIFT is avoided since this *has* to be
4K? Would "num_pages * SZ_4K" be more accurate? To be most robust please consider check_mul_overflow()
or check_shl_overflow().

It is not clear to me what the "_checked" of the function name implies but I do not see
the parameters checked. For example, base could be checked for 4KB alignment?

> +
> +	if (!addr) {
> +		pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",

phys_addt_t is printed with %pa

> +		       desc, (unsigned long long)base, size);
> +	}
> +	return addr;
> +}
> +
> +static void erdt_iounmap_domain(struct erdt_domain_info *domain)
> +{
> +	for (int i = 0; i < ERDT_MMIO_MAX; i++) {
> +		if (domain->base[i]) {
> +			iounmap(domain->base[i]);
> +			domain->base[i] = NULL;
> +		}
> +	}
> +}
> +
> +static void cleanup_one_domain(struct erdt_domain_info *d)
> +{
> +	erdt_iounmap_domain(d);
> +	kfree(d);
> +}
> +
> +static __init bool cacd_init(struct acpi_subtbl_hdr_16 *subtbl, int *l3_cache_id)
> +{
> +	*l3_cache_id = get_l3_cache_id_from_cacd((struct acpi_erdt_cacd *)subtbl);
> +
> +	return *l3_cache_id != -1;
> +}
> +
> +static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
> +{
> +	return (void *)rmdd + sizeof(*rmdd);
> +}
> +
> +static inline struct acpi_subtbl_hdr_16 *next_subtbl(struct acpi_subtbl_hdr_16 *subtbl)
> +{
> +	return (void *)subtbl + subtbl->length;
> +}
> +
> +static inline bool subtbl_valid(void *end, struct acpi_subtbl_hdr_16 *subtbl)
> +{

(please see later comment where it seems that it may be possible that this function is
called with end == subtbl and the reference below would thus be past end of table)

> +	if (subtbl->length < sizeof(*subtbl))
> +		return false;
> +
> +	if ((void *)subtbl + subtbl->length > end)
> +		return false;
> +
> +	return true;
> +}
> +
> +static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
> +{
> +	struct erdt_domain_info *domain_info;
> +	struct acpi_subtbl_hdr_16 *subtbl;
> +	struct acpi_erdt_rmdd *rmdd;
> +	int l3_cache_id = -1;
> +	u32 subtbl_mask = 0;
> +
> +	if (rmdd_hdr->length < sizeof(*rmdd)) {
> +		pr_info(FW_BUG "Invalid RMDD length %u\n", rmdd_hdr->length);
> +		return false;
> +	}
> +
> +	rmdd = (struct acpi_erdt_rmdd *)rmdd_hdr;
> +
> +	/* Quietly ignore non-CPU-based L3 domains */
> +	if (!(rmdd->flags & RMDD_FLAG_CPU_DOMAIN))
> +		return true;
> +
> +	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);

Please use kzalloc_obj().
commit 2932ba8d9c99 ("slab: Introduce kmalloc_obj() and family") contains details about benefits.

> +	if (!domain_info)
> +		return false;
> +
> +	domain_info->base[ERDT_MMIO_RMDD_CREG] =
> +		erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size, "RMDD ctrl base");
> +	if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
> +		goto cleanup;
> +
> +	for (subtbl = rmdd_subtbl(rmdd);
> +	     subtbl_valid((void *)rmdd + rmdd->header.length, subtbl);

This does not look safe to me. What will happen if RMDD is empty? Would subtbl not
point to end of the table and then subtbl_valid() would dereference pointer beyond the
end of the table to do its checks?

> +	     subtbl = next_subtbl(subtbl)) {
> +		switch (subtbl->type) {
> +		case ACPI_ERDT_TYPE_CACD:
> +			if (cacd_init(subtbl, &l3_cache_id))
> +				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +
> +	if (l3_cache_id == -1) {
> +		pr_info("ERDT: Failed to resolve L3 cache ID for RMDD domain %d\n",
> +			rmdd->domain_id);
> +
> +		goto cleanup;
> +	}
> +
> +	/* Require all RMDDs to support same set of sub-tables */
> +	if (!valid_subtbl_mask) {
> +		valid_subtbl_mask = subtbl_mask;
> +	} else if (subtbl_mask != valid_subtbl_mask) {
> +		pr_info(FW_BUG "Mismatch domain\n");
> +		goto cleanup;
> +	}
> +
> +	if (xa_insert(&erdt_domain_xa, l3_cache_id, domain_info, GFP_KERNEL)) {
> +		pr_info("ERDT: Failed to store domain info for RMDD domain %d\n",
> +			rmdd->domain_id);
> +		goto cleanup;
> +	}
> +
> +	return true;
> +
> +cleanup:
> +	cleanup_one_domain(domain_info);
> +	return false;
> +}
> +
> +static void erdt_cleanup(void)
> +{
> +	struct erdt_domain_info *d;
> +	unsigned long index;
> +

Looks like the initialization via get_rdt_resources() does not do any error
checking so there may be nothing to clean up and certainly there will be nothing
to clean up if ERDT is not supported. Can the cleanup have a short circuit here to
only clean up if the array is not empty or perhaps skip cleanup if !erdt_enabled?

> +	xa_for_each(&erdt_domain_xa, index, d)
> +		cleanup_one_domain(d);
> +	xa_destroy(&erdt_domain_xa);
> +}
> +
> +static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
> +{
> +	struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
> +	struct acpi_subtbl_hdr_16 *subtbl;
> +	void *table_end;
> +
> +	if (erdt->header.revision != ERDT_VALID_VERSION) {
> +		pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
(nit: "Unknown" -> "Unsupported"?)

> +		return -EINVAL;
> +	}
> +
> +	if (erdt->header.length < sizeof(*erdt)) {
> +		pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);

Here is an example of a "pr_info()" FW_BUG. It may help to add a unit
and not just report "length" 

> +		return -EINVAL;
> +	}
> +
> +	subtbl = (void *)erdt + sizeof(struct acpi_table_erdt);
> +	table_end = (void *)erdt + erdt->header.length;
> +
> +	while (subtbl_valid(table_end, subtbl)) {
> +		if (subtbl->type == ACPI_ERDT_TYPE_RMDD)
> +			if (!parse_rmdd_entry(subtbl))
> +				goto cleanup;
> +
> +		subtbl = next_subtbl(subtbl);
> +	}
> +
> +	if (xa_empty(&erdt_domain_xa))
> +		goto cleanup;
> +
> +	erdt_enabled_flag = true;
> +
> +	return 0;
> +
> +cleanup:
> +	erdt_cleanup();
> +	return -EINVAL;
> +}
> +
> +int __init erdt_init(void)
> +{
> +	return acpi_table_parse(ACPI_SIG_ERDT, enumerate_erdt_table);
> +}
> +
> +void erdt_exit(void)
> +{
> +	erdt_cleanup();
> +}

Why is erdt_exit() and erdt_cleanup() both needed?

Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-13  7:57 ` [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
@ 2026-06-18 23:39   ` Reinette Chatre
  2026-06-22 12:42     ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Reinette Chatre @ 2026-06-18 23:39 UTC (permalink / raw)
  To: Chen Yu, tony.luck
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Hi Chenyu,

On 6/13/26 12:57 AM, Chen Yu wrote:
> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.

This change as described seems out of place when considering the following
from cover letter:
 This patch set focuses on the first part: enabling MMIO-based access for        
 Cache Monitoring Technology (CMT), *while CAT/MBM/MBA are still using MSR*.

It is thus confusing to find a change related to switching MBM to use MSR
in a series that claims that it does not do this.

The patch looks fine but reading the changelog and the first few hunks makes
this patch look out of place. Could you please update the changelog to be 
accurate about what this patch does?

> With ERDT, the previous monitor value may come from an MMIO
> register rather than from an MSR, so the "msr" suffix is no longer
> accurate. The new name describes the field by its meaning (the
> previous monitor value) instead of by the access method.
> 
> This is preparation for ERDT support, which reads monitoring
> counters via MMIO.
> 
> No functional change.
> 
Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-13  7:57 ` [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
@ 2026-06-18 23:40   ` Reinette Chatre
  2026-06-22 14:09     ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Reinette Chatre @ 2026-06-18 23:40 UTC (permalink / raw)
  To: Chen Yu, tony.luck
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Hi Chenyu,

On 6/13/26 12:57 AM, Chen Yu wrote:
> The CMRC (Cache Monitoring Registers for CPU Agents Description)
> ACPI sub-table provides the MMIO address used to read the LLC
> occupancy counter for each RMID. When ERDT is enabled on the
> platform, use this MMIO interface instead of the legacy MSR read
> to obtain the L3 occupancy value.
> 
> Introduce erdt_mon_read(), a helper that retrieves monitoring
> data for a given RMID and event ID from an ERDT domain. Initial
> support is added for the L3 occupancy monitoring event
> (QOS_L3_OCCUP_EVENT_ID).
> 
> If the platform supports ERDT, CMRC-based MMIO access is used by
> default. If ERDT is unavailable, the implementation is to use
> MSR-based operations. 

(nit: please write in imperative tone and use entire line length available)

... 

> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
> index 97c2f6bc7a5f..9b3b03279dd8 100644
> --- a/arch/x86/include/asm/resctrl.h
> +++ b/arch/x86/include/asm/resctrl.h
> @@ -41,6 +41,8 @@ struct resctrl_pqr_state {
>  };
>  
>  bool erdt_enabled(void);
> +struct rdt_domain_hdr;
> +int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
>  
>  DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 90730f0851fa..fe812f7190fc 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
>  	bool ret = false;
>  
>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
> -		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
> +		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {

As mentioned in patch #1, when erdt_enabled() is true the enumeration still proceeds to
enumerate the monitoring properties via CPUID to discover the number of RMIDs that the
*MSR* supports and use it as the maximum RMID (and thus the maximum number of registers)
that MMIO supports?

...
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 9209927f88a2..1491f96b57c3 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -278,6 +278,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
>  
>  	switch (r->rid) {
>  	case RDT_RESOURCE_L3:
> +		/*
> +		 * No SNC for mmio based L3 occupancy, so there is no need

"No SNC for mmio based L3 occupancy" is too significant to be buried like this. Could you
please elaborate on this claim and enforce it with the implementation? For example,
could rdt_get_l3_mon_config() WARN and *not* set r->mon_capable if erdt_enable() and
snc_nodes_per_l3_cache > 1?

> +		 * to convert logical RMID to a physical RMID via
> +		 * logical_rmid_to_physical_rmid().
> +		 */
> +		if (erdt_enabled() && eventid == QOS_L3_OCCUP_EVENT_ID)
> +			return erdt_mon_read(hdr, eventid, rmid, val);
>  		return arch_l3_read_event(hdr, rmid, eventid, val, r);
>  	case RDT_RESOURCE_PERF_PKG:
>  		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);

Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-18 23:37   ` Reinette Chatre
@ 2026-06-22  9:07     ` Chen, Yu C
  2026-06-22 21:28       ` Reinette Chatre
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-22  9:07 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

Hi Reinette,
Thanks very much for the detailed review.

On 6/19/2026 7:37 AM, Reinette Chatre wrote:
> Hi Chenyu,
> 
> On 6/13/26 12:56 AM, Chen Yu wrote:
>> From: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
>>
>> ERDT(Enhanced RDT) introduces a new top-level ACPI structure
> 
> (nit: please add space between ERDT and the "(")
> 

OK, will do.

>> (the ERDT) that the kernel must parse before any enhanced
>> RDT feature can be used. The ERDT improves the existing RDT
>> by switching low-level register access from MSR-based to
>> MMIO-based, which is more efficient.
> 
> (nit: please use full line length available for changelog)
> 

OK, will do.

> Above states clearly "switching low-level register access from MSR-based to
> MMIO-based". This implementation clearly does so ... but the new ACPI tables
> supporting the MMIO-based access also include, for example, the
> maximum RMIDs that determines how many MMIO registers are supported.
> Even so, from what I can tell, the "maximum RMID" that the ACPI tables
> report is ignored and this implementation instead continues to rely on
> the maximum RMID reported via CPUID that indicates the maximum RMIDs supported
> by the *MSR* to also guide the access of the MMIO registers.
> 
> This implementation thus looks to straddle some middle ground between
> ACPI and CPUID enumeration, using return of one to guide access to the
> other that does not seem robust and requires platforms to forever support both.
> 
> 

I see. If we rely on ERDT information, we should use the "maximum RMID"
exposed via ACPI tables instead of CPUID. I will adjust this accordingly.

>> memory. As a trade-off, an xarray is used to store these tables, with
> 
> (nit: needs imperative eg. "use an xarray ...")
> 

OK, will change it accordingly.

>> the L3 cache ID as the key.
>>   arch/x86/Kconfig                       |   4 +-
>>   arch/x86/include/asm/apic.h            |   1 +
>>   arch/x86/include/asm/resctrl.h         |   2 +
>>   arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
>>   arch/x86/kernel/cpu/resctrl/core.c     |  14 +-
>>   arch/x86/kernel/cpu/resctrl/erdt.c     | 296 +++++++++++++++++++++++++
>>   arch/x86/kernel/cpu/resctrl/internal.h |   3 +
>>   arch/x86/kernel/cpu/topology.c         |   2 +-
> 
> Please split out the non-resctrl changes into separate patch with sybsystem-specific
> prefix to highlight these changes to not appear to sneak changes into other
> subsystems via resctrl. They can stil form part of this series as preparatory patches.
> 

I see, let me split two of them out:
x86/topology: Export topo_lookup_cpuid for module use
x86/apic: Add declaration for topo_lookup_cpuid


>>   config X86_CPU_RESCTRL
>>   	bool "x86 CPU resource control support"
>> -	depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>> +	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>>   	depends on MISC_FILESYSTEMS
>>   	select ARCH_HAS_CPU_RESCTRL
>>   	select RESCTRL_FS
>> @@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
>>   
>>   config X86_CPU_RESCTRL_INTEL_AET
>>   	bool "Intel Application Energy Telemetry"
>> -	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>> +	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>>   	help
>>   	  Enable per-RMID telemetry events in resctrl.
>>   
> 
> Removing 32bit support is a significant change worth a mention and motivation in the changelog.
> 

The original idea was that MMIO-based access (including CMT/MBM/MBA) 
uses readq/writeq,
which depend on 64-bit. Since MMIO CMT relies on CONFIG_X86_CPU_RESCTRL, 
making
CONFIG_X86_CPU_RESCTRL depend on X86_64 would simplify the code.
If I understand correctly, 32-bit x86 lacks practical hardware support 
for RDT/QoS.
The Kconfig entry currently depends on X86, which technically permits 
32-bit builds.
However, I am unsure whether there exists any real-world use case where 
a user compiles
a 32-bit kernel on a server and enables resctrl?
If we wish to retain 32-bit RDT support, I can introduce a new config 
option to gate ERDT:
config X86_CPU_RESCTRL_ERDT
     bool
     depends on X86_CPU_RESCTRL && X86_64 && ACPI
     default y
Could you advise which approach is preferable?


>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>> index 575f8408a9e7..97c2f6bc7a5f 100644
>> --- a/arch/x86/include/asm/resctrl.h
>> +++ b/arch/x86/include/asm/resctrl.h
>> @@ -40,6 +40,8 @@ struct resctrl_pqr_state {
>>   	u32			default_closid;
>>   };
>>   
>> +bool erdt_enabled(void);
>> +
> 
> I cannot see why this needs to be in asm header ... why not arch/x86/kernel/cpu/resctrl/internal.h?
> 

OK, let me move it into internal.h. By the way, I believe fs/resctrl
also needs to call erdt_enabled() for region-aware MBA/MBM. Would it
be acceptable to expose erdt_enabled() in include/linux/resctrl.h?

>> + * Enhanced Resource Director Technology(ERDT)
> 
> (nit: please add space)
> 

OK, will do.


>> +#include <linux/cleanup.h>
>> +#include <linux/cpu.h>
>> +#include <linux/err.h>
>> +#include <linux/xarray.h>
>> +#include <linux/resctrl.h>
>> +#include <linux/acpi.h>
> 
> Please sort the includes alphabetically and add empty lines between the groups of headers.
> 

OK, will do.

>> +#include <asm/apic.h>
>> +#include <asm/cpu_device_id.h>
> 
> Why is above #include needed?
> 

It was included in the previous version for CPU count queries,
but it is no longer needed; I will remove it.

>> +#include "internal.h"
>> +
> 
> 
> Please add a comment for the enum below to describe what the different members represent.
> 

OK, will do.

>> +enum erdt_mmio_type {
>> +	ERDT_MMIO_RMDD_CREG,
>> +	ERDT_MMIO_CMRC_BASE,
>> +	ERDT_MMIO_MAX
>> +};
> 
> Depending on how this enum ends up being used, could it benefit from a pattern like
> https://lore.kernel.org/lkml/4a304e6c9ed3f5d92156b8045f5dd4ddba359dfc.1777419024.git.reinette.chatre@intel ?
> 

Got it, thanks for sharing this, will do.

>> +
>> +struct erdt_domain_info {
>> +	void __iomem		*base[ERDT_MMIO_MAX];
>> +	struct acpi_erdt_cmrc	*cmrc;
>> +};
>> +
>> +static bool erdt_enabled_flag;
> 
> What is significance of "flag"? Can it just be "erdt_enabled"? Since it is a bool this seems
> more intuitive?
> 

erdt_enabled() was already used as a function name. Let me change it to 
static bool is_enabled

>> +
>> +static DEFINE_XARRAY(erdt_domain_xa);
> 
> Please add comment to describe what is stored in erdt_domain_xa and what the
> index represents.
> 

OK, will do.

>> +
>> +#define ERDT_VALID_VERSION		1
>> +#define RMDD_FLAG_CPU_DOMAIN		BIT(0)
> 
> I think this name can be more accurate, how about, for example "RMDD_FLAG_L3_DOMAIN"?
> 

According to the spec, this flag indicates: CPU L3 cache and I/O L3 
Domain, maybe using
RMDD_FLAG_CPU_L3_DOMAIN?

>> +
>> +static u32 valid_subtbl_mask;
> 
> Please add comment to describe what above global represents.
> 

OK, will do.

>> +
>> +bool erdt_enabled(void)
>> +{
>> +	return erdt_enabled_flag;
>> +}
>> +
>> +/**
>> + * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
> 
> As I interpret it ... above does not describe to be what this function does.
> 
>> + * @cacd: Pointer to the ACPI ERDT CACD structure
>> + *
>> + * Parses the X2APIC ID list in the given CACD subtable to
>> + * identify an online logical CPU and uses it to query the associated
>> + * L3 cache ID. The first valid CPU found is used for this lookup.
> 
> This function is __init and requires at least one CPU per domain to be online.
> What will happen if a system is booted with all CPUs of a domain offline and
> then later a CPU is online'd?
> 

Got it, this appears to be a corner case bug: offlined CPUs within a 
domain never
get an opportunity to retrieve their L3 ID. I will figure out how to 
handle this
case, potentially by leveraging resctrl_arch_online_cpu() (or maybe use 
other
method)

>> + *
>> + * The L3 cache ID is used as a unique domain key for ERDT domain
>> + * registration and lookup.
>> + *
>> + * Return: L3 cache ID for the first matching CPU, or -1 on failure.
> 
> Could you please use consistent terminology? Above comments switches between
> "online logical CPU" and "valid CPU" and "matching CPU" that all seem to refer
> to the same idea.
> 

OK, let me switch to "CPU" in all cases.

>> + */
>> +static __init int get_l3_cache_id_from_cacd(struct acpi_erdt_cacd *cacd)
>> +{
>> +	int num_ids, cpu, online_cpu = -1, cache_id = -1, tmp;
>> +	struct cacheinfo *ci;
>> +
>> +	if (cacd->header.length < sizeof(*cacd) + sizeof(cacd->X2APICIDS[0])) {
> 
> Is above an open code of struct_size()?
> 

OK, will change it to
if (cacd->header.length < struct_size(cacd, X2APICIDS, 1))

>> +		pr_warn(FW_BUG "Invalid x2apicid CACD table\n");
> 
> This patch displays several "FW_BUG" messages but they are sometimes printed with
> pr_info() and sometimes with pr_warn(). Is this intentional?
> 

Let me switch all the print to pr_warn(FW_BUG).

>> +		return -1;
>> +	}
>> +
>> +	num_ids = (cacd->header.length - sizeof(*cacd)) / sizeof(cacd->X2APICIDS[0]);
>> +
>> +	guard(cpus_read_lock)();
>> +
>> +	for (int i = 0; i < num_ids; i++) {
>> +		cpu = topo_lookup_cpuid(cacd->X2APICIDS[i]);
>> +		if (cpu < 0) {
>> +			pr_warn(FW_BUG "Unknown x2apicid 0x%x\n", cacd->X2APICIDS[i]);
>> +
> (nit: unnecessary empty line)
> 
> 

OK, will remove it.

>> +			return -1;
>> +		}
>> +
>> +		if (!cpu_online(cpu))
>> +			continue;
>> +
>> +		tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
> 
>  From what I can tell it is tmp that is essentially returned by this function and it
> is the result of get_cpu_cacheinfo_id() that returns the cache ID initialized from
> CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") and function comment
> "Resolve L3 cache ID from CACD subtable" implies differently and the way this is done
> makes me wonder why CACD needs to be parsed at all.
> 
> resctrl already has a hotplug handler and it dynamically determines the domain ID
> based on the cache ID returned from get_cpu_cacheinfo_id(). If the domain ID used by the
> new ERDT tables can be, as it appears is done here, trusted to what Linux already
> treating as domain ID then why is it needed to parse CACD at all? Why not just
> pick the domain ID from RMDD directly and use it directly as domain ID?
> 
> Later the domain ID from RMDD is just used in error messages ... never actually used
> as a domain ID to guide the implementation so the correlation appears to be implicit with
> this CACD parsing not intuitive.
> 

You are right that the cache ID ultimately comes from 
get_cpu_cacheinfo_id(),
not from CACD directly - the function name is misleading and should be 
improved.

However, my understanding is that CACD parsing acts as the bridge 
between ACPI's
RMDD domain namespace and the kernel's L3 domain ID: CACD enumerates 
which CPUs
belong to each RMDD, and from those CPUs we derive the kernel-side L3 
cache ID.
The specification defines an RMDD DomainID as an identifier unique 
within the
ERDT table, yet it does not guarantee that this ID matches the kernel’s 
cache ID
obtained via CPUID leaf 0x4. For this reason, we omitted the RMDD domain 
ID in
the current release.


>> +static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size, const char *desc)
> 
> "size" -> "num_pages"?
> 

OK, will change the name.

>> +{
>> +	void __iomem *addr = ioremap(base, size << 12);
> 
> The 12 magic constant is not ideal. Seems like PAGE_SHIFT is avoided since this *has* to be
> 4K? Would "num_pages * SZ_4K" be more accurate? To be most robust please consider check_mul_overflow()
> or check_shl_overflow().
> 

OK, will change it to use SZ_4K and check the overflow.

> It is not clear to me what the "_checked" of the function name implies but I do not see
> the parameters checked. For example, base could be checked for 4KB alignment?
> 

The _checked suffix was intended to perform error checking for ioremap, 
not the 4KB
alignment check. I will remove this suffix.

>> +
>> +	if (!addr) {
>> +		pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",
> 
> phys_addt_t is printed with %pa
>

OK, will do.

>> +	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
> 
> Please use kzalloc_obj().
> commit 2932ba8d9c99 ("slab: Introduce kmalloc_obj() and family") contains details about benefits.
> 

OK, thanks for sharing, will do.

>> +	if (!domain_info)
>> +		return false;
>> +
>> +	domain_info->base[ERDT_MMIO_RMDD_CREG] =
>> +		erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size, "RMDD ctrl base");
>> +	if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
>> +		goto cleanup;
>> +
>> +	for (subtbl = rmdd_subtbl(rmdd);
>> +	     subtbl_valid((void *)rmdd + rmdd->header.length, subtbl);
> 
> This does not look safe to me. What will happen if RMDD is empty? Would subtbl not
> point to end of the table and then subtbl_valid() would dereference pointer beyond the
> end of the table to do its checks?
> 

I see. I will switch to the following version and add comments to 
clarify this
check:
static inline bool subtbl_valid(void *end, struct acpi_subtbl_hdr_16 
*subtbl)
{
         /* Ensure the header is within bounds before dereferencing it. */
         if ((void *)subtbl + sizeof(*subtbl) > end)
                 return false;

         /* A sub-table must be at least as large as its header. */
         if (subtbl->length < sizeof(*subtbl))
                 return false;

         /* The entire sub-table (including body) must fit within the 
parent. */
         if ((void *)subtbl + subtbl->length > end)
                 return false;

         return true;
}

>> +static void erdt_cleanup(void)
>> +{
>> +	struct erdt_domain_info *d;
>> +	unsigned long index;
>> +
> 
> Looks like the initialization via get_rdt_resources() does not do any error
> checking so there may be nothing to clean up and certainly there will be nothing
> to clean up if ERDT is not supported. Can the cleanup have a short circuit here to
> only clean up if the array is not empty or perhaps skip cleanup if !erdt_enabled?
> 

OK, let me add !erdt_enabled in erdt_cleanup().

>> +	xa_for_each(&erdt_domain_xa, index, d)
>> +		cleanup_one_domain(d);
>> +	xa_destroy(&erdt_domain_xa);
>> +}
>> +
>> +static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
>> +{
>> +	struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
>> +	struct acpi_subtbl_hdr_16 *subtbl;
>> +	void *table_end;
>> +
>> +	if (erdt->header.revision != ERDT_VALID_VERSION) {
>> +		pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
> (nit: "Unknown" -> "Unsupported"?)
> 

OK, will change it.

>> +		return -EINVAL;
>> +	}
>> +
>> +	if (erdt->header.length < sizeof(*erdt)) {
>> +		pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);
> 
> Here is an example of a "pr_info()" FW_BUG. It may help to add a unit
> and not just report "length"
> 

OK, will use pr_warn() and add a "bytes".

>> +void erdt_exit(void)
>> +{
>> +	erdt_cleanup();
>> +}
> 
> Why is erdt_exit() and erdt_cleanup() both needed?
> 

Previously erdt_cleanup() was supposed to be used internally by erdt.c,
and erdt_exit() was exposed for non-arch code. But yes, we can use only
one, let me change it.

thanks,
Chenyu



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-18 23:39   ` Reinette Chatre
@ 2026-06-22 12:42     ` Chen, Yu C
  2026-06-22 21:29       ` Reinette Chatre
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-22 12:42 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

Hi Reinette,

On 6/19/2026 7:39 AM, Reinette Chatre wrote:
> Hi Chenyu,
> 
> On 6/13/26 12:57 AM, Chen Yu wrote:
>> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
> 
> This change as described seems out of place when considering the following
> from cover letter:
>   This patch set focuses on the first part: enabling MMIO-based access for
>   Cache Monitoring Technology (CMT), *while CAT/MBM/MBA are still using MSR*.
> 
> It is thus confusing to find a change related to switching MBM to use MSR
> in a series that claims that it does not do this.
> 
> The patch looks fine but reading the changelog and the first few hunks makes
> this patch look out of place. Could you please update the changelog to be
> accurate about what this patch does?
> 

OK, let me change it to:

x86/resctrl: Rename prev_msr to prev_mon_val

Rename the prev_msr field in struct arch_mbm_state to prev_mon_val
to decouple the field name from the specific access method.

Use a generic name for the stored previous monitoring counter value
so that the field remains accurate regardless of whether the value
is read through an MSR or another interface in the future.

No functional change.

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-18 23:40   ` Reinette Chatre
@ 2026-06-22 14:09     ` Chen, Yu C
  2026-06-22 21:30       ` Reinette Chatre
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-22 14:09 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

Hi Reinette,

On 6/19/2026 7:40 AM, Reinette Chatre wrote:
> Hi Chenyu,
> 
> On 6/13/26 12:57 AM, Chen Yu wrote:
>> The CMRC (Cache Monitoring Registers for CPU Agents Description)
>> ACPI sub-table provides the MMIO address used to read the LLC
>> occupancy counter for each RMID. When ERDT is enabled on the
>> platform, use this MMIO interface instead of the legacy MSR read
>> to obtain the L3 occupancy value.
>>
>> Introduce erdt_mon_read(), a helper that retrieves monitoring
>> data for a given RMID and event ID from an ERDT domain. Initial
>> support is added for the L3 occupancy monitoring event
>> (QOS_L3_OCCUP_EVENT_ID).
>>
>> If the platform supports ERDT, CMRC-based MMIO access is used by
>> default. If ERDT is unavailable, the implementation is to use
>> MSR-based operations.
> 
> (nit: please write in imperative tone and use entire line length available)
> 
> ...
> 

OK, will do in next version.

>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>> index 97c2f6bc7a5f..9b3b03279dd8 100644
>> --- a/arch/x86/include/asm/resctrl.h
>> +++ b/arch/x86/include/asm/resctrl.h
>> @@ -41,6 +41,8 @@ struct resctrl_pqr_state {
>>   };
>>   
>>   bool erdt_enabled(void);
>> +struct rdt_domain_hdr;
>> +int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
>>   
>>   DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>>   
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 90730f0851fa..fe812f7190fc 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
>>   	bool ret = false;
>>   
>>   	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
>> -		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
>> +		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
>>   		ret = true;
>>   	}
>>   	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> 
> As mentioned in patch #1, when erdt_enabled() is true the enumeration still proceeds to
> enumerate the monitoring properties via CPUID to discover the number of RMIDs that the
> *MSR* supports and use it as the maximum RMID (and thus the maximum number of registers)
> that MMIO supports?
> 

OK, will switch to the maximum RMID exposed by ACPI table, if 
erdt_enabled() is true.

>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index 9209927f88a2..1491f96b57c3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -278,6 +278,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
>>   
>>   	switch (r->rid) {
>>   	case RDT_RESOURCE_L3:
>> +		/*
>> +		 * No SNC for mmio based L3 occupancy, so there is no need
> 
> "No SNC for mmio based L3 occupancy" is too significant to be buried like this. Could you
> please elaborate on this claim and enforce it with the implementation? For example,
> could rdt_get_l3_mon_config() WARN and *not* set r->mon_capable if erdt_enable() and
> snc_nodes_per_l3_cache > 1?
> 

IIUC, SNC happens to be unsupported on platforms that currently support 
MMIO-based
access. I’m not sure if there will be a platform that enables both MMIO 
access and
SNC. I’ll add a WARN here as suggested (and handle the case if such 
platform is
introduced later).

thanks,
Chenyu

>> +		 * to convert logical RMID to a physical RMID via
>> +		 * logical_rmid_to_physical_rmid().
>> +		 */
>> +		if (erdt_enabled() && eventid == QOS_L3_OCCUP_EVENT_ID)
>> +			return erdt_mon_read(hdr, eventid, rmid, val);
>>   		return arch_l3_read_event(hdr, rmid, eventid, val, r);
>>   	case RDT_RESOURCE_PERF_PKG:
>>   		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
> 
> Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-22  9:07     ` Chen, Yu C
@ 2026-06-22 21:28       ` Reinette Chatre
  2026-06-23  6:29         ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Reinette Chatre @ 2026-06-22 21:28 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/22/26 2:07 AM, Chen, Yu C wrote:
> On 6/19/2026 7:37 AM, Reinette Chatre wrote:
>> On 6/13/26 12:56 AM, Chen Yu wrote:

...

>>>   arch/x86/Kconfig                       |   4 +-
>>>   arch/x86/include/asm/apic.h            |   1 +
>>>   arch/x86/include/asm/resctrl.h         |   2 +
>>>   arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
>>>   arch/x86/kernel/cpu/resctrl/core.c     |  14 +-
>>>   arch/x86/kernel/cpu/resctrl/erdt.c     | 296 +++++++++++++++++++++++++
>>>   arch/x86/kernel/cpu/resctrl/internal.h |   3 +
>>>   arch/x86/kernel/cpu/topology.c         |   2 +-
>>
>> Please split out the non-resctrl changes into separate patch with sybsystem-specific
>> prefix to highlight these changes to not appear to sneak changes into other
>> subsystems via resctrl. They can stil form part of this series as preparatory patches.
>>
> 
> I see, let me split two of them out:
> x86/topology: Export topo_lookup_cpuid for module use
> x86/apic: Add declaration for topo_lookup_cpuid

I think a single patch would be ok. Interestingly the subject prefix to
this area is not consistent ... looks like either "x86/topology" or
"x86/cpu/topology" would be ok.

> 
> 
>>>   config X86_CPU_RESCTRL
>>>       bool "x86 CPU resource control support"
>>> -    depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>>> +    depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>>>       depends on MISC_FILESYSTEMS
>>>       select ARCH_HAS_CPU_RESCTRL
>>>       select RESCTRL_FS
>>> @@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
>>>     config X86_CPU_RESCTRL_INTEL_AET
>>>       bool "Intel Application Energy Telemetry"
>>> -    depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>>> +    depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>>>       help
>>>         Enable per-RMID telemetry events in resctrl.
>>>   
>>
>> Removing 32bit support is a significant change worth a mention and motivation in the changelog.
>>
> 
> The original idea was that MMIO-based access (including CMT/MBM/MBA) uses readq/writeq,
> which depend on 64-bit. Since MMIO CMT relies on CONFIG_X86_CPU_RESCTRL, making
> CONFIG_X86_CPU_RESCTRL depend on X86_64 would simplify the code.
> If I understand correctly, 32-bit x86 lacks practical hardware support for RDT/QoS.
> The Kconfig entry currently depends on X86, which technically permits 32-bit builds.
> However, I am unsure whether there exists any real-world use case where a user compiles
> a 32-bit kernel on a server and enables resctrl?
> If we wish to retain 32-bit RDT support, I can introduce a new config option to gate ERDT:
> config X86_CPU_RESCTRL_ERDT
>     bool
>     depends on X86_CPU_RESCTRL && X86_64 && ACPI
>     default y
> Could you advise which approach is preferable?

x86 resctrl will soon depend on 64bit, this dependency will arrive via this series
or the AET reworks. Whether x86 resctrl depends on 64bit is not the question here.

My question is more specific though:
	What in *this* patch requires dropping 32bit support?
You highlight above that 64bit support is needed for readq/writeq. That is understood.
Even so, this patch does not have any instances of readq/writeq. This patch thus turns
off 32bit support without any mention and without any reason to do so. This is a very
significant change to just sneak into a patch that does not appear to require such
change.

>>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>>> index 575f8408a9e7..97c2f6bc7a5f 100644
>>> --- a/arch/x86/include/asm/resctrl.h
>>> +++ b/arch/x86/include/asm/resctrl.h
>>> @@ -40,6 +40,8 @@ struct resctrl_pqr_state {
>>>       u32            default_closid;
>>>   };
>>>   +bool erdt_enabled(void);
>>> +
>>
>> I cannot see why this needs to be in asm header ... why not arch/x86/kernel/cpu/resctrl/internal.h?
>>
> 
> OK, let me move it into internal.h. By the way, I believe fs/resctrl
> also needs to call erdt_enabled() for region-aware MBA/MBM. Would it
> be acceptable to expose erdt_enabled() in include/linux/resctrl.h?

I do not see any calls to erdt_enabled() from resctrl fs in this series.
I also do not think such calls would be appropriate. Whether x86's RDT is
backed with CPUID or ACPI is an architectural detail of no concern to resctrl
fs.


....

>>> +
>>> +struct erdt_domain_info {
>>> +    void __iomem        *base[ERDT_MMIO_MAX];
>>> +    struct acpi_erdt_cmrc    *cmrc;
>>> +};
>>> +
>>> +static bool erdt_enabled_flag;
>>
>> What is significance of "flag"? Can it just be "erdt_enabled"? Since it is a bool this seems
>> more intuitive?
>>
> 
> erdt_enabled() was already used as a function name. Let me change it to static bool is_enabled

This could also be named "__erdt_enabled" to highlight that it is internal.

>>> +
>>> +#define ERDT_VALID_VERSION        1
>>> +#define RMDD_FLAG_CPU_DOMAIN        BIT(0)
>>
>> I think this name can be more accurate, how about, for example "RMDD_FLAG_L3_DOMAIN"?
>>
> 
> According to the spec, this flag indicates: CPU L3 cache and I/O L3 Domain, maybe using
> RMDD_FLAG_CPU_L3_DOMAIN?

Looks good to me.

...

>>> +            return -1;
>>> +        }
>>> +
>>> +        if (!cpu_online(cpu))
>>> +            continue;
>>> +
>>> +        tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
>>
>>  From what I can tell it is tmp that is essentially returned by this function and it
>> is the result of get_cpu_cacheinfo_id() that returns the cache ID initialized from
>> CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") and function comment
>> "Resolve L3 cache ID from CACD subtable" implies differently and the way this is done
>> makes me wonder why CACD needs to be parsed at all.
>>
>> resctrl already has a hotplug handler and it dynamically determines the domain ID
>> based on the cache ID returned from get_cpu_cacheinfo_id(). If the domain ID used by the
>> new ERDT tables can be, as it appears is done here, trusted to what Linux already
>> treating as domain ID then why is it needed to parse CACD at all? Why not just
>> pick the domain ID from RMDD directly and use it directly as domain ID?
>>
>> Later the domain ID from RMDD is just used in error messages ... never actually used
>> as a domain ID to guide the implementation so the correlation appears to be implicit with
>> this CACD parsing not intuitive.
>>
> 
> You are right that the cache ID ultimately comes from get_cpu_cacheinfo_id(),
> not from CACD directly - the function name is misleading and should be improved.
> 
> However, my understanding is that CACD parsing acts as the bridge between ACPI's
> RMDD domain namespace and the kernel's L3 domain ID: CACD enumerates which CPUs
> belong to each RMDD, and from those CPUs we derive the kernel-side L3 cache ID.
> The specification defines an RMDD DomainID as an identifier unique within the
> ERDT table, yet it does not guarantee that this ID matches the kernel’s cache ID
> obtained via CPUID leaf 0x4. For this reason, we omitted the RMDD domain ID in
> the current release.

get_l3_cache_id_from_cacd() implies a straightforward query and the function
comments supports this. At the same time the function includes what appears to be
sanity checks but from what you say turns out to be essential enforcement of the
RMDD to CPUID leaf 0x4 mapping. Is this correct? As sanity checks these additional
checks are ok but as RMDD to CPUID leaf 0x4 mapping enforcement I find them inadequate
because its correctness requires all CPUs of all domains to be online which cannot be
guaranteed when this initialization runs.

Consider for example a scenario where RMDD and CPUID do not agree on which CPUs form
part of a domain. If only one CPU of this RMDD domain is online at the time this
initialization is done then the mismatch will never be noticed and resctrl will just
silently use the CPUID mapping, no?

Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-22 12:42     ` Chen, Yu C
@ 2026-06-22 21:29       ` Reinette Chatre
  2026-06-23  7:48         ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Reinette Chatre @ 2026-06-22 21:29 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/22/26 5:42 AM, Chen, Yu C wrote:
> Hi Reinette,
> 
> On 6/19/2026 7:39 AM, Reinette Chatre wrote:
>> Hi Chenyu,
>>
>> On 6/13/26 12:57 AM, Chen Yu wrote:
>>> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
>>
>> This change as described seems out of place when considering the following
>> from cover letter:
>>   This patch set focuses on the first part: enabling MMIO-based access for
>>   Cache Monitoring Technology (CMT), *while CAT/MBM/MBA are still using MSR*.
>>
>> It is thus confusing to find a change related to switching MBM to use MSR
>> in a series that claims that it does not do this.
>>
>> The patch looks fine but reading the changelog and the first few hunks makes
>> this patch look out of place. Could you please update the changelog to be
>> accurate about what this patch does?
>>
> 
> OK, let me change it to:
> 
> x86/resctrl: Rename prev_msr to prev_mon_val
> 
> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val
> to decouple the field name from the specific access method.

This patch changes many more names to support this switch to MMIO yet the changelog
only highlights one of the changes which is the struct member specific to MBM while
the work explicitly states that MBM monitoring is not impacted by this series.

Consider for comparison a more generic (just a draft to present an idea, please do not just copy&paste):
	x86/resctrl: Replace "msr" in monitoring data identifiers

	Monitoring data is only consumed via MSR and many identifiers handling
	the monitoring data contains "msr" as part of their names, for example
	"msr_val". 

	Replace "msr" in monitoring data identifiers as appropriate to support
	their use for monitoring data accessed via MMIO.

> 
> Use a generic name for the stored previous monitoring counter value
> so that the field remains accurate regardless of whether the value
> is read through an MSR or another interface in the future.
> 
> No functional change.
> 
> thanks,
> Chenyu


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-22 14:09     ` Chen, Yu C
@ 2026-06-22 21:30       ` Reinette Chatre
  2026-06-23  5:00         ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Reinette Chatre @ 2026-06-22 21:30 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/22/26 7:09 AM, Chen, Yu C wrote:
> On 6/19/2026 7:40 AM, Reinette Chatre wrote:
>> On 6/13/26 12:57 AM, Chen Yu wrote:

...

>>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>>> index 97c2f6bc7a5f..9b3b03279dd8 100644
>>> --- a/arch/x86/include/asm/resctrl.h
>>> +++ b/arch/x86/include/asm/resctrl.h
>>> @@ -41,6 +41,8 @@ struct resctrl_pqr_state {
>>>   };
>>>     bool erdt_enabled(void);
>>> +struct rdt_domain_hdr;
>>> +int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
>>>     DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>>>   diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>> index 90730f0851fa..fe812f7190fc 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>> @@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
>>>       bool ret = false;
>>>         if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
>>> -        resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
>>> +        resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
>>>           ret = true;
>>>       }
>>>       if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
>>
>> As mentioned in patch #1, when erdt_enabled() is true the enumeration still proceeds to
>> enumerate the monitoring properties via CPUID to discover the number of RMIDs that the
>> *MSR* supports and use it as the maximum RMID (and thus the maximum number of registers)
>> that MMIO supports?
>>
> 
> OK, will switch to the maximum RMID exposed by ACPI table, if erdt_enabled() is true.

I believe the issue is larger than just the RMID enumeration. The CPUID and ACPI enumeration
appears to be fully intertwined. Taking a closer look at what above code does:
it checks *CPUID* whether CMT is enabled and then enables the LLC occupancy event to blindly use
MMIO if ERDT is enabled, irrespective of whether the ERDT tables include a cache monitoring table
or not. How is it guaranteed that if ERDT is enabled that there is a cache monitoring table?
Should it not be the existence of the ACPI cache monitoring table and its properties that
determines whether the LLC occupancy counter using MMIO registers should be enabled?

Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-22 21:30       ` Reinette Chatre
@ 2026-06-23  5:00         ` Chen, Yu C
  2026-06-23 16:48           ` Reinette Chatre
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-23  5:00 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

Hi Reinette,

On 6/23/2026 5:30 AM, Reinette Chatre wrote:
> Hi Chenyu,
>>>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>>>> index 97c2f6bc7a5f..9b3b03279dd8 100644
>>>> --- a/arch/x86/include/asm/resctrl.h
>>>> +++ b/arch/x86/include/asm/resctrl.h
>>>> @@ -41,6 +41,8 @@ struct resctrl_pqr_state {
>>>>    };
>>>>      bool erdt_enabled(void);
>>>> +struct rdt_domain_hdr;
>>>> +int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
>>>>      DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>>>>    diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>>> index 90730f0851fa..fe812f7190fc 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>>> @@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
>>>>        bool ret = false;
>>>>          if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
>>>> -        resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
>>>> +        resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
>>>>            ret = true;
>>>>        }
>>>>        if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
>>>
>>> As mentioned in patch #1, when erdt_enabled() is true the enumeration still proceeds to
>>> enumerate the monitoring properties via CPUID to discover the number of RMIDs that the
>>> *MSR* supports and use it as the maximum RMID (and thus the maximum number of registers)
>>> that MMIO supports?
>>>
>>
>> OK, will switch to the maximum RMID exposed by ACPI table, if erdt_enabled() is true.
> 
> I believe the issue is larger than just the RMID enumeration. The CPUID and ACPI enumeration
> appears to be fully intertwined. Taking a closer look at what above code does:
> it checks *CPUID* whether CMT is enabled and then enables the LLC occupancy event to blindly use
> MMIO if ERDT is enabled, irrespective of whether the ERDT tables include a cache monitoring table
> or not. How is it guaranteed that if ERDT is enabled that there is a cache monitoring table?
> Should it not be the existence of the ACPI cache monitoring table and its properties that
> determines whether the LLC occupancy counter using MMIO registers should be enabled?
>

I see. How about replacing erdt_enabled() with fine-grained helper 
functions such as
erdt_has_cmrc(), erdt_has_mmrc(), and erdt_has_marc()? The latter two 
will be added
later for region-aware MBM/MBA. CMRC, MMRC and MARC are not guaranteed 
to coexist,
so splitting them into separate helpers would offer finer control.

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-22 21:28       ` Reinette Chatre
@ 2026-06-23  6:29         ` Chen, Yu C
  2026-06-23 11:43           ` Chen, Yu C
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-23  6:29 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

Hi Reinette,

On 6/23/2026 5:28 AM, Reinette Chatre wrote:
> Hi Chenyu,
> 
> On 6/22/26 2:07 AM, Chen, Yu C wrote:
>> On 6/19/2026 7:37 AM, Reinette Chatre wrote:
>>> On 6/13/26 12:56 AM, Chen Yu wrote:
> 
> ...
> 
>>>>    arch/x86/Kconfig                       |   4 +-
>>>>    arch/x86/include/asm/apic.h            |   1 +
>>>>    arch/x86/include/asm/resctrl.h         |   2 +
>>>>    arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
>>>>    arch/x86/kernel/cpu/resctrl/core.c     |  14 +-
>>>>    arch/x86/kernel/cpu/resctrl/erdt.c     | 296 +++++++++++++++++++++++++
>>>>    arch/x86/kernel/cpu/resctrl/internal.h |   3 +
>>>>    arch/x86/kernel/cpu/topology.c         |   2 +-
>>>
>>> Please split out the non-resctrl changes into separate patch with sybsystem-specific
>>> prefix to highlight these changes to not appear to sneak changes into other
>>> subsystems via resctrl. They can stil form part of this series as preparatory patches.
>>>
>>
>> I see, let me split two of them out:
>> x86/topology: Export topo_lookup_cpuid for module use
>> x86/apic: Add declaration for topo_lookup_cpuid
> 
> I think a single patch would be ok. Interestingly the subject prefix to
> this area is not consistent ... looks like either "x86/topology" or
> "x86/cpu/topology" would be ok.
> 

OK, will do.

>>
>>
>>>>    config X86_CPU_RESCTRL
>>>>        bool "x86 CPU resource control support"
>>>> -    depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>>>> +    depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>>>>        depends on MISC_FILESYSTEMS
>>>>        select ARCH_HAS_CPU_RESCTRL
>>>>        select RESCTRL_FS
>>>> @@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
>>>>      config X86_CPU_RESCTRL_INTEL_AET
>>>>        bool "Intel Application Energy Telemetry"
>>>> -    depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>>>> +    depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
>>>>        help
>>>>          Enable per-RMID telemetry events in resctrl.
>>>>    
>>>
>>> Removing 32bit support is a significant change worth a mention and motivation in the changelog.
>>>
>>
>> The original idea was that MMIO-based access (including CMT/MBM/MBA) uses readq/writeq,
>> which depend on 64-bit. Since MMIO CMT relies on CONFIG_X86_CPU_RESCTRL, making
>> CONFIG_X86_CPU_RESCTRL depend on X86_64 would simplify the code.
>> If I understand correctly, 32-bit x86 lacks practical hardware support for RDT/QoS.
>> The Kconfig entry currently depends on X86, which technically permits 32-bit builds.
>> However, I am unsure whether there exists any real-world use case where a user compiles
>> a 32-bit kernel on a server and enables resctrl?
>> If we wish to retain 32-bit RDT support, I can introduce a new config option to gate ERDT:
>> config X86_CPU_RESCTRL_ERDT
>>      bool
>>      depends on X86_CPU_RESCTRL && X86_64 && ACPI
>>      default y
>> Could you advise which approach is preferable?
> 
> x86 resctrl will soon depend on 64bit, this dependency will arrive via this series
> or the AET reworks. Whether x86 resctrl depends on 64bit is not the question here.
> 
> My question is more specific though:
> 	What in *this* patch requires dropping 32bit support?
> You highlight above that 64bit support is needed for readq/writeq. That is understood.
> Even so, this patch does not have any instances of readq/writeq. This patch thus turns
> off 32bit support without any mention and without any reason to do so. This is a very
> significant change to just sneak into a patch that does not appear to require such
> change.
> 

Got it. In theory, the 32-bit removal could be integrated into [PATCH 6/6],
where readq/writeq are introduced. We can add corresponding comments and 
update
the commit log for [PATCH 6/6] to explain why 32-bit support is being 
dropped.
However, Handling it this way would sneak this trivial change into the 
patch series.
Would the following two alternatives be applicable?
1. Add explanatory text in the current [PATCH 1/6] to state that 
removing 32-bit
    is preparatory work for CMT, alongside background context on the 
motivation
    for this removal. Or
2. Create a standalone preparatory patch solely responsible for dropping 
32-bit support.

>>>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>>>> index 575f8408a9e7..97c2f6bc7a5f 100644
>>>> --- a/arch/x86/include/asm/resctrl.h
>>>> +++ b/arch/x86/include/asm/resctrl.h
>>>> @@ -40,6 +40,8 @@ struct resctrl_pqr_state {
>>>>        u32            default_closid;
>>>>    };
>>>>    +bool erdt_enabled(void);
>>>> +
>>>
>>> I cannot see why this needs to be in asm header ... why not arch/x86/kernel/cpu/resctrl/internal.h?
>>>
>>
>> OK, let me move it into internal.h. By the way, I believe fs/resctrl
>> also needs to call erdt_enabled() for region-aware MBA/MBM. Would it
>> be acceptable to expose erdt_enabled() in include/linux/resctrl.h?
> 
> I do not see any calls to erdt_enabled() from resctrl fs in this series.
> I also do not think such calls would be appropriate. Whether x86's RDT is
> backed with CPUID or ACPI is an architectural detail of no concern to resctrl
> fs.

OK, so I suppose ERDT should only be visible within arch/x86.

>>>> +
>>>> +struct erdt_domain_info {
>>>> +    void __iomem        *base[ERDT_MMIO_MAX];
>>>> +    struct acpi_erdt_cmrc    *cmrc;
>>>> +};
>>>> +
>>>> +static bool erdt_enabled_flag;
>>>
>>> What is significance of "flag"? Can it just be "erdt_enabled"? Since it is a bool this seems
>>> more intuitive?
>>>
>>
>> erdt_enabled() was already used as a function name. Let me change it to static bool is_enabled
> 
> This could also be named "__erdt_enabled" to highlight that it is internal.
> 

OK.

[ ... ]

>>>   From what I can tell it is tmp that is essentially returned by this function and it
>>> is the result of get_cpu_cacheinfo_id() that returns the cache ID initialized from
>>> CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") and function comment
>>> "Resolve L3 cache ID from CACD subtable" implies differently and the way this is done
>>> makes me wonder why CACD needs to be parsed at all.
>>>
>>> resctrl already has a hotplug handler and it dynamically determines the domain ID
>>> based on the cache ID returned from get_cpu_cacheinfo_id(). If the domain ID used by the
>>> new ERDT tables can be, as it appears is done here, trusted to what Linux already
>>> treating as domain ID then why is it needed to parse CACD at all? Why not just
>>> pick the domain ID from RMDD directly and use it directly as domain ID?
>>>
>>> Later the domain ID from RMDD is just used in error messages ... never actually used
>>> as a domain ID to guide the implementation so the correlation appears to be implicit with
>>> this CACD parsing not intuitive.
>>>
>>
>> You are right that the cache ID ultimately comes from get_cpu_cacheinfo_id(),
>> not from CACD directly - the function name is misleading and should be improved.
>>
>> However, my understanding is that CACD parsing acts as the bridge between ACPI's
>> RMDD domain namespace and the kernel's L3 domain ID: CACD enumerates which CPUs
>> belong to each RMDD, and from those CPUs we derive the kernel-side L3 cache ID.
>> The specification defines an RMDD DomainID as an identifier unique within the
>> ERDT table, yet it does not guarantee that this ID matches the kernel’s cache ID
>> obtained via CPUID leaf 0x4. For this reason, we omitted the RMDD domain ID in
>> the current release.
> 
> get_l3_cache_id_from_cacd() implies a straightforward query and the function
> comments supports this. At the same time the function includes what appears to be
> sanity checks but from what you say turns out to be essential enforcement of the
> RMDD to CPUID leaf 0x4 mapping. Is this correct? As sanity checks these additional
> checks are ok but as RMDD to CPUID leaf 0x4 mapping enforcement I find them inadequate
> because its correctness requires all CPUs of all domains to be online which cannot be
> guaranteed when this initialization runs.
> 
> Consider for example a scenario where RMDD and CPUID do not agree on which CPUs form
> part of a domain. If only one CPU of this RMDD domain is online at the time this
> initialization is done then the mismatch will never be noticed and resctrl will just
> silently use the CPUID mapping, no?
> 

I suppose you are referring to the following scenario:
CACD says CPUs {0,1,2,3} belong to RMDD-A, and CPUID says
CPUs {0,1} share L3-X while CPUs {2,3} share L3-Y (a mismatch).
If only CPU 0 is brought online:

get_l3_cache_id_from_cacd() iterates CACD's X2APIC list: {0,1,2,3}
CPU 1, 2, 3 are offline -> skipped
CPU 0 is online -> cache_id = get_cpu_cacheinfo_id(0) = L3-X,
all passed.

Then later CPU2 is online, resctrl's hotplug creates domain L3-Y.
But the xarray has no entry for L3-Y, so erdt_mon_read() will return
-EIO for that domain.

If this is the case, I wonder if we could run the sanity check during
CPU hotplug to reparse the CACD table whenever a CPU comes online. My
thought would be to call get_l3_cache_id_from_cacd() within
resctrl_arch_online_cpu().

thanks,
Chenyu


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-22 21:29       ` Reinette Chatre
@ 2026-06-23  7:48         ` Chen, Yu C
  2026-06-23 16:47           ` Reinette Chatre
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-23  7:48 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

Hi Reinette,

On 6/23/2026 5:29 AM, Reinette Chatre wrote:
> Hi Chenyu,
> 
>>>
>>> On 6/13/26 12:57 AM, Chen Yu wrote:
>>>> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
>>>
>>> This change as described seems out of place when considering the following
>>> from cover letter:
>>>    This patch set focuses on the first part: enabling MMIO-based access for
>>>    Cache Monitoring Technology (CMT), *while CAT/MBM/MBA are still using MSR*.
>>>
>>> It is thus confusing to find a change related to switching MBM to use MSR
>>> in a series that claims that it does not do this.
>>>
>>> The patch looks fine but reading the changelog and the first few hunks makes
>>> this patch look out of place. Could you please update the changelog to be
>>> accurate about what this patch does?
>>>
>>
>> OK, let me change it to:
>>
>> x86/resctrl: Rename prev_msr to prev_mon_val
>>
>> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val
>> to decouple the field name from the specific access method.
> 
> This patch changes many more names to support this switch to MMIO yet the changelog
> only highlights one of the changes which is the struct member specific to MBM while
> the work explicitly states that MBM monitoring is not impacted by this series.
> 
> Consider for comparison a more generic (just a draft to present an idea, please do not just copy&paste):
> 	x86/resctrl: Replace "msr" in monitoring data identifiers
> 
> 	Monitoring data is only consumed via MSR and many identifiers handling
> 	the monitoring data contains "msr" as part of their names, for example
> 	"msr_val".
> 
> 	Replace "msr" in monitoring data identifiers as appropriate to support
> 	their use for monitoring data accessed via MMIO.
> 
>>
>> Use a generic name for the stored previous monitoring counter value
>> so that the field remains accurate regardless of whether the value
>> is read through an MSR or another interface in the future.
>>
>> No functional change.
>>

Would the following be appropriate?

x86/resctrl: Replace "msr" in monitoring data identifiers

Monitoring counter values are currently obtained via MSR and many
identifiers involved in processing these values contain "msr"
as part of their names, for example "prev_msr" and "msr_val".

The ERDT ACPI table describes MMIO registers for monitoring access.
A subsequent patch adds MMIO-based L3 occupancy reads. Rename "msr"
based identifiers to interface-agnostic names so they remain accurate
as monitoring values may no longer originate exclusively from MSR.

No functional change.

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-23  6:29         ` Chen, Yu C
@ 2026-06-23 11:43           ` Chen, Yu C
  2026-06-23 16:46             ` Reinette Chatre
  0 siblings, 1 reply; 23+ messages in thread
From: Chen, Yu C @ 2026-06-23 11:43 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Chen Yu

On 6/23/2026 2:29 PM, Chen, Yu C wrote:
>>>>   From what I can tell it is tmp that is essentially returned by 
>>>> this function and it
>>>> is the result of get_cpu_cacheinfo_id() that returns the cache ID 
>>>> initialized from
>>>> CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") 
>>>> and function comment
>>>> "Resolve L3 cache ID from CACD subtable" implies differently and the 
>>>> way this is done
>>>> makes me wonder why CACD needs to be parsed at all.
>>>>
>>>> resctrl already has a hotplug handler and it dynamically determines 
>>>> the domain ID
>>>> based on the cache ID returned from get_cpu_cacheinfo_id(). If the 
>>>> domain ID used by the
>>>> new ERDT tables can be, as it appears is done here, trusted to what 
>>>> Linux already
>>>> treating as domain ID then why is it needed to parse CACD at all? 
>>>> Why not just
>>>> pick the domain ID from RMDD directly and use it directly as domain ID?
>>>>
>>>> Later the domain ID from RMDD is just used in error messages ... 
>>>> never actually used
>>>> as a domain ID to guide the implementation so the correlation 
>>>> appears to be implicit with
>>>> this CACD parsing not intuitive.
>>>>
>>>
>>> You are right that the cache ID ultimately comes from 
>>> get_cpu_cacheinfo_id(),
>>> not from CACD directly - the function name is misleading and should 
>>> be improved.
>>>
>>> However, my understanding is that CACD parsing acts as the bridge 
>>> between ACPI's
>>> RMDD domain namespace and the kernel's L3 domain ID: CACD enumerates 
>>> which CPUs
>>> belong to each RMDD, and from those CPUs we derive the kernel-side L3 
>>> cache ID.
>>> The specification defines an RMDD DomainID as an identifier unique 
>>> within the
>>> ERDT table, yet it does not guarantee that this ID matches the 
>>> kernel’s cache ID
>>> obtained via CPUID leaf 0x4. For this reason, we omitted the RMDD 
>>> domain ID in
>>> the current release.
>>
>> get_l3_cache_id_from_cacd() implies a straightforward query and the 
>> function
>> comments supports this. At the same time the function includes what 
>> appears to be
>> sanity checks but from what you say turns out to be essential 
>> enforcement of the
>> RMDD to CPUID leaf 0x4 mapping. Is this correct? As sanity checks 
>> these additional
>> checks are ok but as RMDD to CPUID leaf 0x4 mapping enforcement I find 
>> them inadequate
>> because its correctness requires all CPUs of all domains to be online 
>> which cannot be
>> guaranteed when this initialization runs.
>>
>> Consider for example a scenario where RMDD and CPUID do not agree on 
>> which CPUs form
>> part of a domain. If only one CPU of this RMDD domain is online at the 
>> time this
>> initialization is done then the mismatch will never be noticed and 
>> resctrl will just
>> silently use the CPUID mapping, no?
>>
> 
> I suppose you are referring to the following scenario:
> CACD says CPUs {0,1,2,3} belong to RMDD-A, and CPUID says
> CPUs {0,1} share L3-X while CPUs {2,3} share L3-Y (a mismatch).
> If only CPU 0 is brought online:
> 
> get_l3_cache_id_from_cacd() iterates CACD's X2APIC list: {0,1,2,3}
> CPU 1, 2, 3 are offline -> skipped
> CPU 0 is online -> cache_id = get_cpu_cacheinfo_id(0) = L3-X,
> all passed.
> 
> Then later CPU2 is online, resctrl's hotplug creates domain L3-Y.
> But the xarray has no entry for L3-Y, so erdt_mon_read() will return
> -EIO for that domain.
> 
> If this is the case, I wonder if we could run the sanity check during
> CPU hotplug to reparse the CACD table whenever a CPU comes online. My
> thought would be to call get_l3_cache_id_from_cacd() within
> resctrl_arch_online_cpu().
> 

In fact, parsing the CACD table during CPU hotplug may not be feasible,
as the ACPI tables could have already been released. We could instead
perform the check directly against the erdt_domain_xa xarray. If a
mismatch is detected, disable ERDT and fall back to legacy MSR-based
access:

/* Called from resctrl_arch_online_cpu() */
void erdt_verify_cpu_domain(int cpu)
{
     int cache_id;

     if (!__erdt_enabled)
         return;

     cache_id = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
     if (cache_id < 0)
         return;

     if (!xa_load(&erdt_domain_xa, cache_id)) {
         pr_warn(FW_BUG "CPU%d L3 domain %d not described by ERDT, 
disabling ERDT\n",
                cpu, cache_id);
         __erdt_enabled = false;
     }
}

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-23 11:43           ` Chen, Yu C
@ 2026-06-23 16:46             ` Reinette Chatre
  0 siblings, 0 replies; 23+ messages in thread
From: Reinette Chatre @ 2026-06-23 16:46 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/23/26 4:43 AM, Chen, Yu C wrote:
> On 6/23/2026 2:29 PM, Chen, Yu C wrote:
>>>>>   From what I can tell it is tmp that is essentially returned by this function and it
>>>>> is the result of get_cpu_cacheinfo_id() that returns the cache ID initialized from
>>>>> CPUID leaf 0x4 ... the function name ("get_l3_cache_id_from_cacd()") and function comment
>>>>> "Resolve L3 cache ID from CACD subtable" implies differently and the way this is done
>>>>> makes me wonder why CACD needs to be parsed at all.
>>>>>
>>>>> resctrl already has a hotplug handler and it dynamically determines the domain ID
>>>>> based on the cache ID returned from get_cpu_cacheinfo_id(). If the domain ID used by the
>>>>> new ERDT tables can be, as it appears is done here, trusted to what Linux already
>>>>> treating as domain ID then why is it needed to parse CACD at all? Why not just
>>>>> pick the domain ID from RMDD directly and use it directly as domain ID?
>>>>>
>>>>> Later the domain ID from RMDD is just used in error messages ... never actually used
>>>>> as a domain ID to guide the implementation so the correlation appears to be implicit with
>>>>> this CACD parsing not intuitive.
>>>>>
>>>>
>>>> You are right that the cache ID ultimately comes from get_cpu_cacheinfo_id(),
>>>> not from CACD directly - the function name is misleading and should be improved.
>>>>
>>>> However, my understanding is that CACD parsing acts as the bridge between ACPI's
>>>> RMDD domain namespace and the kernel's L3 domain ID: CACD enumerates which CPUs
>>>> belong to each RMDD, and from those CPUs we derive the kernel-side L3 cache ID.
>>>> The specification defines an RMDD DomainID as an identifier unique within the
>>>> ERDT table, yet it does not guarantee that this ID matches the kernel’s cache ID
>>>> obtained via CPUID leaf 0x4. For this reason, we omitted the RMDD domain ID in
>>>> the current release.
>>>
>>> get_l3_cache_id_from_cacd() implies a straightforward query and the function
>>> comments supports this. At the same time the function includes what appears to be
>>> sanity checks but from what you say turns out to be essential enforcement of the
>>> RMDD to CPUID leaf 0x4 mapping. Is this correct? As sanity checks these additional
>>> checks are ok but as RMDD to CPUID leaf 0x4 mapping enforcement I find them inadequate
>>> because its correctness requires all CPUs of all domains to be online which cannot be
>>> guaranteed when this initialization runs.
>>>
>>> Consider for example a scenario where RMDD and CPUID do not agree on which CPUs form
>>> part of a domain. If only one CPU of this RMDD domain is online at the time this
>>> initialization is done then the mismatch will never be noticed and resctrl will just
>>> silently use the CPUID mapping, no?
>>>
>>
>> I suppose you are referring to the following scenario:
>> CACD says CPUs {0,1,2,3} belong to RMDD-A, and CPUID says
>> CPUs {0,1} share L3-X while CPUs {2,3} share L3-Y (a mismatch).
>> If only CPU 0 is brought online:
>>
>> get_l3_cache_id_from_cacd() iterates CACD's X2APIC list: {0,1,2,3}
>> CPU 1, 2, 3 are offline -> skipped
>> CPU 0 is online -> cache_id = get_cpu_cacheinfo_id(0) = L3-X,
>> all passed.
>>
>> Then later CPU2 is online, resctrl's hotplug creates domain L3-Y.
>> But the xarray has no entry for L3-Y, so erdt_mon_read() will return
>> -EIO for that domain.

An error would at least give indication of a problem but I also see a possible
scenario where the data will happily be read from the wrong register. Consider the
following change to the scenario you present: 
CACD: RMDD-A = {0, 2}; RMDD-B = {1, 3}
CPUID: L3-X = {0, 1}; L3-Y = {2, 3}

If ACPI enumeration is done with CPUs 1, 2, and 3 offline then they are skipped.
CPU 0 is online -> cache_id = get_cpu_cacheinfo_id(0) = L3-X

Later when CPU 1 comes online resctrl will consider it to be in L3-X and
also find an xarray entry for L3-X which will result in erdt_mon_read() to
read the RMID data from RMDD-A instead of RMDD-B?

>>
>> If this is the case, I wonder if we could run the sanity check during
>> CPU hotplug to reparse the CACD table whenever a CPU comes online. My
>> thought would be to call get_l3_cache_id_from_cacd() within
>> resctrl_arch_online_cpu().
>>
> 
> In fact, parsing the CACD table during CPU hotplug may not be feasible,
> as the ACPI tables could have already been released. We could instead
> perform the check directly against the erdt_domain_xa xarray. If a
> mismatch is detected, disable ERDT and fall back to legacy MSR-based
> access:
> 
> /* Called from resctrl_arch_online_cpu() */
> void erdt_verify_cpu_domain(int cpu)
> {
>     int cache_id;
> 
>     if (!__erdt_enabled)
>         return;
> 
>     cache_id = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
>     if (cache_id < 0)
>         return;
> 
>     if (!xa_load(&erdt_domain_xa, cache_id)) {
>         pr_warn(FW_BUG "CPU%d L3 domain %d not described by ERDT, disabling ERDT\n",
>                cpu, cache_id);
>         __erdt_enabled = false;
>     }
> }
When considering the example scenario above I do not believe that this implements a
robust transition from ACPI to CPUID.

Additionally, disabling ERDT during a CPU online has broader implications: if ERDT was
enabled earlier then the associated events will be enabled at this point and have
mov_evt::any_cpu set. From what I can tell this would just implicitly expect that the
events could still be read via MSR but not actually route the read to the correct CPU?

I think it would be simpler to separate the ACPI enumeration from the CPUID enumeration
instead of wedging in these snippets at various touch points. The ACPI enumeration could,
for example, just record the ERDT RMDD domain <-> cpu_mask mapping without taking online CPUs
into account. Depending on which tables are found in ACPI, if ERDT is the intended mechanism then
any duplicate CPUID enumeration should be avoided. This means that there will be two unique
flows to enabling resctrl events, one for ACPI and one for CPUID. The x86 online handler can
perform the sanity check when a CPU comes online and since RMDD and CPUID are expected to
correlate the CPU could not be added to resctrl if it is not in the expected domain. To me
this seems simpler and more robust.

Reinette


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-23  7:48         ` Chen, Yu C
@ 2026-06-23 16:47           ` Reinette Chatre
  0 siblings, 0 replies; 23+ messages in thread
From: Reinette Chatre @ 2026-06-23 16:47 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/23/26 12:48 AM, Chen, Yu C wrote:
> On 6/23/2026 5:29 AM, Reinette Chatre wrote:
> Would the following be appropriate?
> 
> x86/resctrl: Replace "msr" in monitoring data identifiers
> 
> Monitoring counter values are currently obtained via MSR and many
> identifiers involved in processing these values contain "msr"
> as part of their names, for example "prev_msr" and "msr_val".
> 
> The ERDT ACPI table describes MMIO registers for monitoring access.
> A subsequent patch adds MMIO-based L3 occupancy reads. Rename "msr"

Please drop the "A subsequent patch ...". Apart from it resembling the
dreaded "this patch", once it is merged the intended meaning is lost. As an
alternative the second paragraph can be:
	The ERDT ACPI table describes MMIO registers for monitoring data
	access. Rename "msr"-based identifiers to be interface-agnostic
	to support their upcoming use for MMIO-read values.

	No functional change.

> based identifiers to interface-agnostic names so they remain accurate
> as monitoring values may no longer originate exclusively from MSR.
> 
> No functional change.
Reinette

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-23  5:00         ` Chen, Yu C
@ 2026-06-23 16:48           ` Reinette Chatre
  0 siblings, 0 replies; 23+ messages in thread
From: Reinette Chatre @ 2026-06-23 16:48 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: x86, linux-kernel, tglx, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/22/26 10:00 PM, Chen, Yu C wrote:
> Hi Reinette,
> 
> On 6/23/2026 5:30 AM, Reinette Chatre wrote:
>> Hi Chenyu,
>>>>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>>>>> index 97c2f6bc7a5f..9b3b03279dd8 100644
>>>>> --- a/arch/x86/include/asm/resctrl.h
>>>>> +++ b/arch/x86/include/asm/resctrl.h
>>>>> @@ -41,6 +41,8 @@ struct resctrl_pqr_state {
>>>>>    };
>>>>>      bool erdt_enabled(void);
>>>>> +struct rdt_domain_hdr;
>>>>> +int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
>>>>>      DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>>>>>    diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>>>>> index 90730f0851fa..fe812f7190fc 100644
>>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>>>>> @@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
>>>>>        bool ret = false;
>>>>>          if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
>>>>> -        resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
>>>>> +        resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
>>>>>            ret = true;
>>>>>        }
>>>>>        if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
>>>>
>>>> As mentioned in patch #1, when erdt_enabled() is true the enumeration still proceeds to
>>>> enumerate the monitoring properties via CPUID to discover the number of RMIDs that the
>>>> *MSR* supports and use it as the maximum RMID (and thus the maximum number of registers)
>>>> that MMIO supports?
>>>>
>>>
>>> OK, will switch to the maximum RMID exposed by ACPI table, if erdt_enabled() is true.
>>
>> I believe the issue is larger than just the RMID enumeration. The CPUID and ACPI enumeration
>> appears to be fully intertwined. Taking a closer look at what above code does:
>> it checks *CPUID* whether CMT is enabled and then enables the LLC occupancy event to blindly use
>> MMIO if ERDT is enabled, irrespective of whether the ERDT tables include a cache monitoring table
>> or not. How is it guaranteed that if ERDT is enabled that there is a cache monitoring table?
>> Should it not be the existence of the ACPI cache monitoring table and its properties that
>> determines whether the LLC occupancy counter using MMIO registers should be enabled?
>>
> 
> I see. How about replacing erdt_enabled() with fine-grained helper functions such as
> erdt_has_cmrc(), erdt_has_mmrc(), and erdt_has_marc()? The latter two will be added
> later for region-aware MBM/MBA. CMRC, MMRC and MARC are not guaranteed to coexist,
> so splitting them into separate helpers would offer finer control.
I am concerned where this is headed since it looks to me as though the plan is to
sprinkle these finer grained checks throughout resctrl. To me this sounds complicated
and error prone. Consider that resctrl_enable_mon_event() has an arch_priv parameter. To me
this seems to be the appropriate place for the architecture to give itself the needed
information about how to read the event.

Reinette


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2026-06-23 16:48 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-13  7:56 [PATCH v4 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
2026-06-13  7:56 ` [PATCH v4 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
2026-06-18 23:37   ` Reinette Chatre
2026-06-22  9:07     ` Chen, Yu C
2026-06-22 21:28       ` Reinette Chatre
2026-06-23  6:29         ` Chen, Yu C
2026-06-23 11:43           ` Chen, Yu C
2026-06-23 16:46             ` Reinette Chatre
2026-06-13  7:57 ` [PATCH v4 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
2026-06-13  7:57 ` [PATCH v4 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
2026-06-18 23:39   ` Reinette Chatre
2026-06-22 12:42     ` Chen, Yu C
2026-06-22 21:29       ` Reinette Chatre
2026-06-23  7:48         ` Chen, Yu C
2026-06-23 16:47           ` Reinette Chatre
2026-06-13  7:57 ` [PATCH v4 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
2026-06-13  7:57 ` [PATCH v4 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
2026-06-13  7:57 ` [PATCH v4 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
2026-06-18 23:40   ` Reinette Chatre
2026-06-22 14:09     ` Chen, Yu C
2026-06-22 21:30       ` Reinette Chatre
2026-06-23  5:00         ` Chen, Yu C
2026-06-23 16:48           ` Reinette Chatre

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.