* [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT
@ 2026-06-06  2:31 Chen Yu
  2026-06-06  2:32 ` [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:31 UTC
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Intel Enhanced Resource Director Technology (ERDT) extends the existing
RDT framework with two major capabilities:

  1. MMIO-based access to monitoring and allocation registers, replacing
     the legacy MSR-based interface.
  2. Region-aware RDT for fine-grained control over different tiers of
     memory (e.g., CXL.mem, DDR).

This is described in the Intel RDT Architecture Specification:
https://cdrdv2-public.intel.com/789566/356688-intel-rdt-arch-spec.pdf

This patch set focuses on the first part: enabling MMIO-based access for
Cache Monitoring Technology (CMT). CAT, MBM, and MBA continue to use MSRs.
The platform advertises the MMIO register layout through the ACPI ERDT
(Enhanced Resource Director Technology) table, which contains sub-tables
describing per-domain register regions for monitoring and allocation.

With ERDT, L3 cache occupancy counters are read via MMIO rather than
MSR, allowing the reads to be performed from any CPU without requiring
cross-CPU IPIs. This series parses the relevant ACPI sub-tables (RMDD,
CMRC), prepares the resctrl monitor infrastructure for MMIO-based reads,
and adds initial support for reading L3 occupancy via the CMRC interface.

The CMT and L3_CAT kselftests pass with the minor adjustment posted at:
https://lore.kernel.org/lkml/20260523101715.3964456-1-yu.c.chen@intel.com/

Changes from V2 to V3:
- Wrap __resctrl_arch_late_init() to avoid the goto logic. (Thomas Gleixner)
- Format the members of struct erdt_domain_info in tabular style.
  (Thomas Gleixner)
- Remove tail comments. (Thomas Gleixner)
- Make the names of erdt_enabled() and the variable it uses consistent and
  comprehensible. (Thomas Gleixner)
- Use topo_lookup_cpuid() to look up the CPU id from the x2APIC id.
  (Thomas Gleixner)
- Fix the kernel-doc comment format. (Thomas Gleixner)
- Use braces for multi-line "if" bodies. (Thomas Gleixner)
- Let the cacd_init() parameter list use the full 100-character line width.
  (Thomas Gleixner)
- Reorder variable declarations in reverse fir-tree order. (Thomas Gleixner)
- Add a named constant and use it in the rmdd->flags check.
  (Thomas Gleixner)
- Introduce helper functions to make the code that iterates the RMDD
  tables more readable. (Thomas Gleixner)
- Format the macros in tabular style. (Thomas Gleixner)

Changes from V1 to V2:
- Add #include <linux/cleanup.h> to follow the "include-what-you-use" best
  practice. (Tony Luck)
- Fix 3 issues reported by:
  https://sashiko.dev/#/patchset/cover.1779872016.git.yu.c.chen%40intel.com
  - Remove the cacd member from struct erdt_domain_info, as it is never
    used after initialization.
  - Invoke erdt_exit() to avoid a resource leak if rdt_alloc_capable and
    rdt_mon_capable are both false.
  - Adjust the comments as suggested by sashiko.

Anil S Keshavamurthy (1):
  x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID

Chen Yu (4):
  x86/resctrl: Parse ACPI CMRC table
  x86/resctrl: Rename prev_msr to prev_mon_val
  x86/resctrl: Refactor the monitor read function
  x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO
    read

Tony Luck (1):
  fs/resctrl: Do not invoke smp_processor_id() in preemptible context

 arch/x86/Kconfig                       |   4 +-
 arch/x86/include/asm/apic.h            |   1 +
 arch/x86/include/asm/resctrl.h         |   4 +
 arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
 arch/x86/kernel/cpu/resctrl/core.c     |  16 +-
 arch/x86/kernel/cpu/resctrl/erdt.c     | 433 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |  11 +-
 arch/x86/kernel/cpu/resctrl/monitor.c  |  64 ++--
 arch/x86/kernel/cpu/topology.c         |   2 +-
 fs/resctrl/monitor.c                   |  41 ++-
 10 files changed, 535 insertions(+), 42 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c

-- 
2.25.1



* [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
@ 2026-06-06  2:32 ` Chen Yu
  2026-06-08  8:59   ` Thomas Gleixner
  2026-06-06  2:33 ` [PATCH v3 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:32 UTC
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

From: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>

ERDT (Enhanced RDT) introduces a new top-level ACPI structure
(the ERDT) that the kernel must parse before any enhanced
RDT feature can be used. ERDT improves on the existing RDT
by switching low-level register access from MSRs to MMIO.
MMIO registers can be read from any CPU, which avoids the
cross-CPU IPIs that MSR reads require.

The ERDT structure may include several ACPI sub-tables:

  - Resource Management Domain Description Structure (RMDD)
  - CPU Agent Collection Description Structure (CACD)
  - Cache Monitoring Registers for CPU Agents Description Structure
    (CMRC)

There is one ERDT table per platform. Each RMDD substructure in the
ERDT represents one resource management domain (RMD), also known as
an L3 domain. Thus, the total number of RMDDs equals the number of
L3 domains on the platform. Each RMDD contains information such as
MMIO base addresses, which are used to retrieve RDT metrics such as
L3 occupancy.

Add basic ERDT ACPI table and sub-table parsing, and store the
relevant tables for later processing.
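
For readers unfamiliar with this kind of table layout, the walk over
length-prefixed sub-tables can be sketched in user space as below. This is
only an illustration, not the code in erdt.c: struct subtbl_hdr and
count_subtables() are hypothetical names, and the header is assumed to be a
16-bit type followed by a 16-bit total length. The point it shows is that
every entry's length is checked against both the header size and the bytes
remaining before the cursor advances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-bit type/length sub-table header. */
struct subtbl_hdr {
	uint16_t type;
	uint16_t length;	/* total length of the entry, header included */
};

/*
 * Count the entries of type @want in buf[0..len). Returns -1 if an entry
 * claims to be shorter than its own header or runs past the end of the
 * buffer. Trailing bytes too short to hold a header are ignored.
 */
int count_subtables(const uint8_t *buf, size_t len, uint16_t want)
{
	size_t off = 0;
	int n = 0;

	while (len - off >= sizeof(struct subtbl_hdr)) {
		struct subtbl_hdr hdr;

		/* Copy the header out to avoid unaligned access. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* A zero or undersized length would loop forever or overlap. */
		if (hdr.length < sizeof(hdr) || hdr.length > len - off)
			return -1;

		if (hdr.type == want)
			n++;

		off += hdr.length;
	}

	return n;
}
```

Rejecting an undersized length before advancing is what keeps a malformed
table from stalling the loop on a zero-length entry.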

Among these sub-tables, RMDD requires special handling. There is one
RMDD per domain, and the domain ID reuses the L3 cache ID. Many code
paths need to retrieve an RMDD efficiently by domain ID (L3 cache ID).
Because L3 cache IDs are derived from x2APIC IDs and are not
contiguous, using a plain array indexed by domain ID would waste
memory. As a trade-off, an xarray is used to store these tables, with
the L3 cache ID as the key.

Suggested-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
---
v2->v3:
   Wrap __resctrl_arch_late_init() to avoid the goto logic. (Thomas Gleixner)
   Format the members of struct erdt_domain_info in tabular style.
   (Thomas Gleixner)
   Remove tail comments. (Thomas Gleixner)
   Make the names of erdt_enabled() and the variable it uses consistent and
   comprehensible. (Thomas Gleixner)
   Use topo_lookup_cpuid() to look up the CPU id from the x2APIC id.
   (Thomas Gleixner)
   Fix the kernel-doc comment format. (Thomas Gleixner)
   Use braces for multi-line "if" bodies. (Thomas Gleixner)
   Let the cacd_init() parameter list use the full 100-character line width.
   (Thomas Gleixner)
   Reorder variable declarations in reverse fir-tree order. (Thomas Gleixner)
   Add a named constant and use it in the rmdd->flags check.
   (Thomas Gleixner)
   Introduce helper functions to make the code that iterates the RMDD
   tables more readable. (Thomas Gleixner)

v1->v2:
  Add #include <linux/cleanup.h> to follow the "include-what-you-use" best
  practice. (Tony Luck)
  Remove the cacd member from struct erdt_domain_info, as it is never
  used after initialization. (sashiko)
  Invoke erdt_exit() to avoid a resource leak if rdt_alloc_capable and
  rdt_mon_capable are both false. (sashiko)
  Refine comments. (sashiko)
---
 arch/x86/Kconfig                       |   4 +-
 arch/x86/include/asm/apic.h            |   1 +
 arch/x86/include/asm/resctrl.h         |   2 +
 arch/x86/kernel/cpu/resctrl/Makefile   |   1 +
 arch/x86/kernel/cpu/resctrl/core.c     |  14 +-
 arch/x86/kernel/cpu/resctrl/erdt.c     | 304 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/internal.h |   3 +
 arch/x86/kernel/cpu/topology.c         |   2 +-
 8 files changed, 327 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..97d210bd9bb5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -515,7 +515,7 @@ config X86_MPPARSE
 
 config X86_CPU_RESCTRL
 	bool "x86 CPU resource control support"
-	depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
+	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
 	depends on MISC_FILESYSTEMS
 	select ARCH_HAS_CPU_RESCTRL
 	select RESCTRL_FS
@@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
 
 config X86_CPU_RESCTRL_INTEL_AET
 	bool "Intel Application Energy Telemetry"
-	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
 	help
 	  Enable per-RMID telemetry events in resctrl.
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9cd493d467d4..bb84651b14bd 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -54,6 +54,7 @@ static inline void x86_32_probe_apic(void) { }
 #endif
 
 extern u32 cpuid_to_apicid[];
+int topo_lookup_cpuid(u32 apic_id);
 
 #define CPU_ACPIID_INVALID	U32_MAX
 
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 575f8408a9e7..97c2f6bc7a5f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -40,6 +40,8 @@ struct resctrl_pqr_state {
 	u32			default_closid;
 };
 
+bool erdt_enabled(void);
+
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
 extern bool rdt_alloc_capable;
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index 273ddfa30836..2216ee084832 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -2,6 +2,7 @@
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
 obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
+obj-$(CONFIG_X86_CPU_RESCTRL)		+= erdt.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
 
 # To allow define_trace.h's recursive include:
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7667cf7c4e94..90730f0851fa 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1012,6 +1012,7 @@ static __init void check_quirks(void)
 
 static __init bool get_rdt_resources(void)
 {
+	erdt_init();
 	rdt_alloc_capable = get_rdt_alloc_resources();
 	rdt_mon_capable = get_rdt_mon_resources();
 
@@ -1113,7 +1114,7 @@ void resctrl_cpu_detect(struct cpuinfo_x86 *c)
 	}
 }
 
-static int __init resctrl_arch_late_init(void)
+static int __init __resctrl_arch_late_init(void)
 {
 	struct rdt_resource *r;
 	int state, ret, i;
@@ -1156,6 +1157,15 @@ static int __init resctrl_arch_late_init(void)
 	return 0;
 }
 
+static int __init resctrl_arch_late_init(void)
+{
+	int ret = __resctrl_arch_late_init();
+
+	if (ret)
+		erdt_exit();
+	return ret;
+}
+
 late_initcall(resctrl_arch_late_init);
 
 static void __exit resctrl_arch_exit(void)
@@ -1165,6 +1175,8 @@ static void __exit resctrl_arch_exit(void)
 	cpuhp_remove_state(rdt_online);
 
 	resctrl_exit();
+
+	erdt_exit();
 }
 
 __exitcall(resctrl_arch_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
new file mode 100644
index 000000000000..51597a6e0058
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -0,0 +1,304 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Enhanced Resource Director Technology(ERDT)
+ *
+ * Copyright (C) 2026 Intel Corporation
+ *
+ */
+
+#define pr_fmt(fmt)     "resctrl: " fmt
+
+#include <linux/cleanup.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/xarray.h>
+#include <linux/resctrl.h>
+#include <linux/acpi.h>
+#include <asm/apic.h>
+#include <asm/cpu_device_id.h>
+#include "internal.h"
+
+enum erdt_mmio_type {
+	ERDT_MMIO_RMDD_CREG,
+	ERDT_MMIO_CMRC_BASE,
+	ERDT_MMIO_MAX
+};
+
+struct erdt_domain_info {
+	void __iomem		*base[ERDT_MMIO_MAX];
+	struct acpi_erdt_cmrc	*cmrc;
+};
+
+static bool erdt_enabled_flag;
+
+static DEFINE_XARRAY(erdt_domain_xa);
+
+#define ERDT_VALID_VERSION		1
+#define RMDD_FLAG_CPU_DOMAIN		BIT(0)
+
+static u32 valid_subtbl_mask;
+
+bool erdt_enabled(void)
+{
+	return erdt_enabled_flag;
+}
+
+/**
+ * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
+ * @cacd: Pointer to the ACPI ERDT CACD structure
+ *
+ * Parses the X2APIC ID list in the given CACD subtable to
+ * identify an online logical CPU and uses it to query the associated
+ * L3 cache ID. The first valid CPU found is used for this lookup.
+ *
+ * The L3 cache ID is used as a unique domain key for ERDT domain
+ * registration and lookup.
+ *
+ * Return: L3 cache ID for the first matching CPU, or -1 on failure.
+ */
+static __init int get_l3_cache_id_from_cacd(struct acpi_erdt_cacd *cacd)
+{
+	int num_ids, cpu, online_cpu = -1, cache_id = -1, tmp;
+	struct cacheinfo *ci;
+
+	if (cacd->header.length < sizeof(*cacd) + sizeof(cacd->X2APICIDS[0])) {
+		pr_warn(FW_BUG "Invalid x2apicid CACD table\n");
+		return -1;
+	}
+
+	num_ids = (cacd->header.length - sizeof(*cacd)) / sizeof(cacd->X2APICIDS[0]);
+
+	guard(cpus_read_lock)();
+
+	for (int i = 0; i < num_ids; i++) {
+		cpu = topo_lookup_cpuid(cacd->X2APICIDS[i]);
+		if (cpu < 0) {
+			pr_warn(FW_BUG "Unknown x2apicid 0x%x\n", cacd->X2APICIDS[i]);
+
+			return -1;
+		}
+
+		if (!cpu_online(cpu))
+			continue;
+
+		tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
+		if (tmp == -1) {
+			pr_warn(FW_BUG "Can not find L3 cache id for CPU%d\n", cpu);
+			return -1;
+		}
+
+		if (cache_id == -1)
+			cache_id = tmp;
+
+		if (tmp != cache_id) {
+			pr_warn(FW_BUG "CACD references multiple L3 cache instances\n");
+			return -1;
+		}
+		online_cpu = cpu;
+	}
+
+	if (online_cpu == -1)
+		return -1;
+
+	/*
+	 * Check if CACD lists all CPUs in the LLC domain.
+	 */
+	ci = get_cpu_cacheinfo_level(online_cpu, RESCTRL_L3_CACHE);
+	if (!ci || num_ids != cpumask_weight(&ci->shared_cpu_map)) {
+		pr_warn(FW_BUG "CACD does not list all the CPUs in L3 domain\n");
+		return -1;
+	}
+
+	return cache_id;
+}
+
+static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size, const char *desc)
+{
+	void __iomem *addr = ioremap(base, size << 12);
+
+	if (!addr) {
+		pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",
+		       desc, (unsigned long long)base, size);
+	}
+	return addr;
+}
+
+static void erdt_iounmap_domain(struct erdt_domain_info *domain)
+{
+	for (int i = 0; i < ERDT_MMIO_MAX; i++) {
+		if (domain->base[i]) {
+			iounmap(domain->base[i]);
+			domain->base[i] = NULL;
+		}
+	}
+}
+
+static void cleanup_one_domain(struct erdt_domain_info *d)
+{
+	erdt_iounmap_domain(d);
+	kfree(d);
+}
+
+static __init bool cacd_init(struct erdt_domain_info *d, struct acpi_subtbl_hdr_16 *subtbl,
+			     int *l3_cache_id)
+{
+	*l3_cache_id = get_l3_cache_id_from_cacd((struct acpi_erdt_cacd *)subtbl);
+
+	return *l3_cache_id != -1;
+}
+
+static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
+{
+	return (void *)rmdd + sizeof(*rmdd);
+}
+
+static inline struct acpi_subtbl_hdr_16 *next_subtbl(struct acpi_subtbl_hdr_16 *subtbl)
+{
+	return (void *)subtbl + subtbl->length;
+}
+
+static inline bool subtbl_valid(struct acpi_erdt_rmdd *rmdd, struct acpi_subtbl_hdr_16 *subtbl)
+{
+	void *rmdd_end = (void *)rmdd + rmdd->header.length;
+
+	if (subtbl->length < sizeof(*subtbl))
+		return false;
+
+	if ((void *)subtbl + sizeof(*subtbl) > rmdd_end)
+		return false;
+
+	if ((void *)subtbl + subtbl->length > rmdd_end)
+		return false;
+
+	return true;
+}
+
+static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
+{
+	struct erdt_domain_info *domain_info;
+	struct acpi_subtbl_hdr_16 *subtbl;
+	struct acpi_erdt_rmdd *rmdd;
+	int l3_cache_id = -1;
+	u32 subtbl_mask = 0;
+
+	if (rmdd_hdr->length < sizeof(*rmdd)) {
+		pr_info(FW_BUG "Invalid RMDD length %u\n", rmdd_hdr->length);
+		return false;
+	}
+
+	rmdd = (struct acpi_erdt_rmdd *)rmdd_hdr;
+
+	/* Quietly ignore non-CPU-based L3 domains */
+	if (!(rmdd->flags & RMDD_FLAG_CPU_DOMAIN))
+		return true;
+
+	domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+	if (!domain_info)
+		return false;
+
+	domain_info->base[ERDT_MMIO_RMDD_CREG] =
+		erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size, "RMDD ctrl base");
+	if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
+		goto cleanup;
+
+	for (subtbl = rmdd_subtbl(rmdd); subtbl_valid(rmdd, subtbl);
+	     subtbl = next_subtbl(subtbl)) {
+		switch (subtbl->type) {
+		case ACPI_ERDT_TYPE_CACD:
+			if (cacd_init(domain_info, subtbl, &l3_cache_id))
+				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
+			break;
+		default:
+			break;
+		}
+	}
+
+	if (l3_cache_id == -1) {
+		pr_info("ERDT: Failed to resolve L3 cache ID for RMDD domain %d\n",
+			rmdd->domain_id);
+
+		goto cleanup;
+	}
+
+	/* Require all RMDDs to support same set of sub-tables */
+	if (!valid_subtbl_mask) {
+		valid_subtbl_mask = subtbl_mask;
+	} else if (subtbl_mask != valid_subtbl_mask) {
+		pr_info(FW_BUG "Mismatch domain\n");
+		goto cleanup;
+	}
+
+	if (xa_insert(&erdt_domain_xa, l3_cache_id, domain_info, GFP_KERNEL)) {
+		pr_info("ERDT: Failed to store domain info for RMDD domain %d\n",
+			rmdd->domain_id);
+		goto cleanup;
+	}
+
+	return true;
+
+cleanup:
+	cleanup_one_domain(domain_info);
+	return false;
+}
+
+static void erdt_cleanup(void)
+{
+	struct erdt_domain_info *d;
+	unsigned long index;
+
+	xa_for_each(&erdt_domain_xa, index, d)
+		cleanup_one_domain(d);
+	xa_destroy(&erdt_domain_xa);
+}
+
+static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
+{
+	struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
+	struct acpi_subtbl_hdr_16 *subtbl;
+	void *table_end;
+
+	if (erdt->header.revision != ERDT_VALID_VERSION) {
+		pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
+		return -EINVAL;
+	}
+
+	if (erdt->header.length < sizeof(*erdt)) {
+		pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);
+		return -EINVAL;
+	}
+
+	subtbl = (void *)erdt + sizeof(struct acpi_table_erdt);
+	table_end = (void *)erdt + erdt->header.length;
+
+	while ((void *)subtbl + sizeof(*subtbl) <= table_end) {
+		if (subtbl->length < sizeof(*subtbl) ||
+		    (void *)subtbl + subtbl->length > table_end) {
+			pr_info("ERDT: Invalid subtable length\n");
+			goto cleanup;
+		}
+
+		if (subtbl->type == ACPI_ERDT_TYPE_RMDD)
+			if (!parse_rmdd_entry(subtbl))
+				goto cleanup;
+
+		subtbl = (void *)subtbl + subtbl->length;
+	}
+
+	erdt_enabled_flag = true;
+
+	return 0;
+
+cleanup:
+	erdt_cleanup();
+	return -EINVAL;
+}
+
+int __init erdt_init(void)
+{
+	return acpi_table_parse(ACPI_SIG_ERDT, enumerate_erdt_table);
+}
+
+void erdt_exit(void)
+{
+	erdt_cleanup();
+}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3cfa0c10e92..9c59bd5e028e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -253,4 +253,7 @@ static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resour
 static inline bool intel_handle_aet_option(bool force_off, char *tok) { return false; }
 #endif
 
+int erdt_init(void);
+void erdt_exit(void);
+
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 4913b64ec592..bcee70fb9277 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -92,7 +92,7 @@ static inline u32 topo_apicid(u32 apicid, enum x86_topology_domains dom)
 	return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
 }
 
-static int topo_lookup_cpuid(u32 apic_id)
+int topo_lookup_cpuid(u32 apic_id)
 {
 	int i;
 
-- 
2.25.1



* [PATCH v3 2/6] x86/resctrl: Parse ACPI CMRC table
  2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
  2026-06-06  2:32 ` [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
@ 2026-06-06  2:33 ` Chen Yu
  2026-06-08  8:30   ` Thomas Gleixner
  2026-06-06  2:35 ` [PATCH v3 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:33 UTC
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

The CMRC (Cache Monitoring Registers for CPU Agents Description)
sub-table of ERDT describes the MMIO registers used to read
cache monitoring counters (e.g. LLC occupancy) for an RMD.

Parse each CMRC sub-table, ioremap its register window, and save a
copy of the CMRC in the corresponding ERDT domain entry so that
later monitoring code can read the counters via MMIO.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v2->v3:
  Format the macros in tabular style. (Thomas Gleixner)
---
 arch/x86/kernel/cpu/resctrl/erdt.c | 44 ++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
index 51597a6e0058..8d0b9f4ad6d8 100644
--- a/arch/x86/kernel/cpu/resctrl/erdt.c
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -33,8 +33,9 @@ static bool erdt_enabled_flag;
 
 static DEFINE_XARRAY(erdt_domain_xa);
 
-#define ERDT_VALID_VERSION		1
-#define RMDD_FLAG_CPU_DOMAIN		BIT(0)
+#define ERDT_VALID_VERSION			1
+#define CMRC_VALID_INDEX_FUNC_VERSION		1
+#define RMDD_FLAG_CPU_DOMAIN			BIT(0)
 
 static u32 valid_subtbl_mask;
 
@@ -136,6 +137,7 @@ static void erdt_iounmap_domain(struct erdt_domain_info *domain)
 static void cleanup_one_domain(struct erdt_domain_info *d)
 {
 	erdt_iounmap_domain(d);
+	kfree(d->cmrc);
 	kfree(d);
 }
 
@@ -147,6 +149,40 @@ static __init bool cacd_init(struct erdt_domain_info *d, struct acpi_subtbl_hdr_
 	return *l3_cache_id != -1;
 }
 
+static __init bool cmrc_init(struct erdt_domain_info *d, struct acpi_subtbl_hdr_16 *subtbl)
+{
+	struct acpi_erdt_cmrc *cmrc = (struct acpi_erdt_cmrc *)subtbl;
+
+	if (subtbl->length < sizeof(*cmrc)) {
+		pr_warn(FW_BUG "Truncated CMRC subtable\n");
+		return false;
+	}
+
+	if (cmrc->index_fn != CMRC_VALID_INDEX_FUNC_VERSION) {
+		pr_info("Unknown CMRC index function %d\n", cmrc->index_fn);
+		return false;
+	}
+
+	if (!cmrc->clump_size) {
+		pr_warn(FW_BUG "CMRC clump_size is zero\n");
+		return false;
+	}
+
+	d->base[ERDT_MMIO_CMRC_BASE] = erdt_ioremap_checked(cmrc->cmt_reg_base,
+							    cmrc->cmt_reg_size, "CMRC base");
+	if (!d->base[ERDT_MMIO_CMRC_BASE])
+		return false;
+
+	d->cmrc = kmemdup(cmrc, subtbl->length, GFP_KERNEL);
+	if (!d->cmrc) {
+		iounmap(d->base[ERDT_MMIO_CMRC_BASE]);
+		d->base[ERDT_MMIO_CMRC_BASE] = NULL;
+		return false;
+	}
+
+	return true;
+}
+
 static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
 {
 	return (void *)rmdd + sizeof(*rmdd);
@@ -208,6 +244,10 @@ static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
 			if (cacd_init(domain_info, subtbl, &l3_cache_id))
 				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
 			break;
+		case ACPI_ERDT_TYPE_CMRC:
+			if (cmrc_init(domain_info, subtbl))
+				subtbl_mask |= BIT(ACPI_ERDT_TYPE_CMRC);
+			break;
 		default:
 			break;
 		}
-- 
2.25.1



* [PATCH v3 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
  2026-06-06  2:32 ` [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
  2026-06-06  2:33 ` [PATCH v3 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
@ 2026-06-06  2:35 ` Chen Yu
  2026-06-08  8:32   ` Thomas Gleixner
  2026-06-06  2:38 ` [PATCH v3 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:35 UTC
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
With ERDT, the previous monitor value may come from an MMIO
register rather than from an MSR, so the "msr" suffix is no longer
accurate. The new name describes the field by its meaning (the
previous monitor value) instead of by the access method.
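
As background on why the previous value is kept at all: the field feeds
the wraparound-safe delta that mbm_overflow_count() computes in this
patch. A minimal user-space sketch of the same arithmetic follows;
counter_delta() is a hypothetical name used only for this illustration,
and the counter width is assumed to be between 1 and 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta between two readings of a free-running counter that is @width
 * bits wide (1 <= width <= 64), tolerating a single wraparound between
 * the two readings.
 */
uint64_t counter_delta(uint64_t prev_val, uint64_t cur_val, unsigned int width)
{
	unsigned int shift = 64 - width;

	/*
	 * Shifting left discards the bits above @width, so the 64-bit
	 * subtraction wraps exactly where a @width-bit counter would.
	 */
	return ((cur_val << shift) - (prev_val << shift)) >> shift;
}
```

With a 24-bit counter, a previous reading of 0xFFFFF0 and a current
reading of 0x10 give a delta of 0x20, regardless of whether the raw value
came from an MSR or from MMIO.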

This is preparation for ERDT support, which reads monitoring
counters via MMIO.

No functional change.

Co-developed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  8 +++----
 arch/x86/kernel/cpu/resctrl/monitor.c  | 30 +++++++++++++-------------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9c59bd5e028e..97065dc6e14f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -31,13 +31,13 @@
 /**
  * struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
  *			   return value.
- * @chunks:	Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr:	Value of IA32_QM_CTR last time it was read for the RMID used to
- *		find this struct.
+ * @chunks:		Total data moved (multiply by rdt_group.mon_scale to get bytes)
+ * @prev_mon_val:	Previous monitor counter value for the RMID used to
+ *			find this struct.
  */
 struct arch_mbm_state {
 	u64	chunks;
-	u64	prev_msr;
+	u64	prev_mon_val;
 };
 
 /* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9bd87bae4983..991f0a796551 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -186,7 +186,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 
 		prmid = logical_rmid_to_physical_rmid(cpu, rmid);
 		/* Record any initial, non-zero count value. */
-		__rmid_read_phys(prmid, eventid, &am->prev_msr);
+		__rmid_read_phys(prmid, eventid, &am->prev_mon_val);
 	}
 }
 
@@ -209,16 +209,16 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domai
 	}
 }
 
-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+static u64 mbm_overflow_count(u64 prev_val, u64 cur_val, unsigned int width)
 {
 	u64 shift = 64 - width, chunks;
 
-	chunks = (cur_msr << shift) - (prev_msr << shift);
+	chunks = (cur_val << shift) - (prev_val << shift);
 	return chunks >> shift;
 }
 
 static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
-			     u32 rmid, enum resctrl_event_id eventid, u64 msr_val)
+			     u32 rmid, enum resctrl_event_id eventid, u64 mon_val)
 {
 	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -227,12 +227,12 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 
 	am = get_arch_mbm_state(hw_dom, rmid, eventid);
 	if (am) {
-		am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
+		am->chunks += mbm_overflow_count(am->prev_mon_val, mon_val,
 						 hw_res->mbm_width);
 		chunks = get_corrected_mbm_count(rmid, am->chunks);
-		am->prev_msr = msr_val;
+		am->prev_mon_val = mon_val;
 	} else {
-		chunks = msr_val;
+		chunks = mon_val;
 	}
 
 	return chunks * hw_res->mon_scale;
@@ -245,7 +245,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	struct rdt_hw_l3_mon_domain *hw_dom;
 	struct rdt_l3_mon_domain *d;
 	struct arch_mbm_state *am;
-	u64 msr_val;
+	u64 mon_val;
 	u32 prmid;
 	int cpu;
 	int ret;
@@ -262,14 +262,14 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	hw_dom = resctrl_to_arch_mon_dom(d);
 	cpu = cpumask_any(&hdr->cpu_mask);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
-	ret = __rmid_read_phys(prmid, eventid, &msr_val);
+	ret = __rmid_read_phys(prmid, eventid, &mon_val);
 
 	if (!ret) {
-		*val = get_corrected_val(r, d, rmid, eventid, msr_val);
+		*val = get_corrected_val(r, d, rmid, eventid, mon_val);
 	} else if (ret == -EINVAL) {
 		am = get_arch_mbm_state(hw_dom, rmid, eventid);
 		if (am)
-			am->prev_msr = 0;
+			am->prev_mon_val = 0;
 	}
 
 	return ret;
@@ -324,7 +324,7 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 		memset(am, 0, sizeof(*am));
 
 		/* Record any initial, non-zero count value. */
-		__cntr_id_read(cntr_id, &am->prev_msr);
+		__cntr_id_read(cntr_id, &am->prev_mon_val);
 	}
 }
 
@@ -332,14 +332,14 @@ int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			   u32 unused, u32 rmid, int cntr_id,
 			   enum resctrl_event_id eventid, u64 *val)
 {
-	u64 msr_val;
+	u64 mon_val;
 	int ret;
 
-	ret = __cntr_id_read(cntr_id, &msr_val);
+	ret = __cntr_id_read(cntr_id, &mon_val);
 	if (ret)
 		return ret;
 
-	*val = get_corrected_val(r, d, rmid, eventid, msr_val);
+	*val = get_corrected_val(r, d, rmid, eventid, mon_val);
 
 	return 0;
 }
-- 
2.25.1



* [PATCH v3 4/6] x86/resctrl: Refactor the monitor read function
  2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
                   ` (2 preceding siblings ...)
  2026-06-06  2:35 ` [PATCH v3 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
@ 2026-06-06  2:38 ` Chen Yu
  2026-06-06  2:38 ` [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
  2026-06-06  2:38 ` [PATCH v3 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
  5 siblings, 0 replies; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:38 UTC
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

Split the monitor read helper into an L3 read path and an AET
(Intel Application Energy Telemetry) read path. This makes the
two distinct monitoring sources easier to extend independently
and prepares the L3 path for ERDT-based MMIO reads added in a
later patch.
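
The resulting shape, one thin entry point that dispatches on the resource
ID to a per-source reader, can be sketched in user space as below. All
names here (enum res_id, read_l3(), read_aet(), read_event()) are
hypothetical stand-ins for illustration, and the readers return dummy
values rather than touching hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical resource IDs standing in for the L3 and PERF_PKG resources. */
enum res_id { RES_L3, RES_PERF_PKG, RES_OTHER };

/* Dummy per-source readers; real ones would read MSRs, MMIO or telemetry. */
int read_l3(uint32_t rmid, uint64_t *val)
{
	*val = 100 + rmid;
	return 0;
}

int read_aet(uint32_t rmid, uint64_t *val)
{
	*val = 200 + rmid;
	return 0;
}

/* Entry point: pick the reader by resource ID, reject anything else. */
int read_event(enum res_id rid, uint32_t rmid, uint64_t *val)
{
	switch (rid) {
	case RES_L3:
		return read_l3(rmid, val);
	case RES_PERF_PKG:
		return read_aet(rmid, val);
	default:
		return -EINVAL;
	}
}
```

Keeping each source behind its own function means a later change to one
read path does not have to touch the other.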

No functional change.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 991f0a796551..1e81b3c33843 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -238,9 +238,9 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
 	return chunks * hw_res->mon_scale;
 }
 
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
-			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   void *arch_priv, u64 *val, void *ignored)
+static int arch_l3_read_event(struct rdt_domain_hdr *hdr, u32 rmid,
+			      enum resctrl_event_id eventid, u64 *val,
+			      struct rdt_resource *r)
 {
 	struct rdt_hw_l3_mon_domain *hw_dom;
 	struct rdt_l3_mon_domain *d;
@@ -250,11 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	int cpu;
 	int ret;
 
-	resctrl_arch_rmid_read_context_check();
-
-	if (r->rid == RDT_RESOURCE_PERF_PKG)
-		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
-
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
@@ -275,6 +270,22 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	return ret;
 }
 
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
+			   void *arch_priv, u64 *val, void *ignored)
+{
+	resctrl_arch_rmid_read_context_check();
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L3:
+		return arch_l3_read_event(hdr, rmid, eventid, val, r);
+	case RDT_RESOURCE_PERF_PKG:
+		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
+	default:
+		return -EINVAL;
+	}
+}
+
 static int __cntr_id_read(u32 cntr_id, u64 *val)
 {
 	u64 msr_val;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
                   ` (3 preceding siblings ...)
  2026-06-06  2:38 ` [PATCH v3 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
@ 2026-06-06  2:38 ` Chen Yu
  2026-06-08  8:36   ` Thomas Gleixner
  2026-06-06  2:38 ` [PATCH v3 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
  5 siblings, 1 reply; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:38 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

From: Tony Luck <tony.luck@intel.com>

__l3_mon_event_count() and __l3_mon_event_count_sum() call
smp_processor_id() to obtain the current CPU. However, some
monitor events can be read from any CPU in task context via
mon_event_count(); in that case the calling context is
preemptible and smp_processor_id() triggers a debug warning.

Fix this by skipping the current-CPU lookup when the event's
any_cpu flag is set, since such events do not need to run on a
specific CPU.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 fs/resctrl/monitor.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..3e8995e3380e 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -417,9 +417,36 @@ static void mbm_cntr_free(struct rdt_l3_mon_domain *d, int cntr_id)
 	memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
 }
 
+/**
+ * cpu_on_correct_domain() - Check if current CPU is in the correct domain for the event.
+ * @rr: The rmid_read structure containing event and domain information.
+ *
+ * Context: Preemptible process context when @rr->evt->any_cpu is set.
+ *          Non-migratable process context (via smp_call_on_cpu()) or
+ *          non-preemptible context (via smp_call_function_any()) when
+ *          the event must be read on a specific CPU.
+ * Return: true if the current CPU can read this event, false otherwise.
+ */
+static bool cpu_on_correct_domain(struct rmid_read *rr)
+{
+	int cpu;
+
+	/* Any CPU is OK for this event */
+	if (rr->evt->any_cpu)
+		return true;
+
+	cpu = smp_processor_id();
+
+	/* Single domain. Must be on a CPU in that domain. */
+	if (rr->hdr)
+		return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
+
+	/* Summing domains that share a cache, must be on a CPU for that cache. */
+	return cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map);
+}
+
 static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
-	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
 	struct rdt_l3_mon_domain *d;
@@ -452,9 +479,6 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 		return 0;
 	}
 
-	/* Reading a single domain, must be on a CPU in that domain. */
-	if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
-		return -EINVAL;
 	if (rr->is_mbm_cntr)
 		rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
 						 rr->evt->evtid, &tval);
@@ -472,7 +496,6 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 
 static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
-	int cpu = smp_processor_id();
 	u32 closid = rdtgrp->closid;
 	u32 rmid = rdtgrp->mon.rmid;
 	struct rdt_l3_mon_domain *d;
@@ -490,10 +513,6 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
 		return -EINVAL;
 	}
 
-	/* Summing domains that share a cache, must be on a CPU for that cache. */
-	if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
-		return -EINVAL;
-
 	/*
 	 * Legacy files must report the sum of an event across all
 	 * domains that share the same L3 cache instance.
@@ -524,7 +543,9 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
 {
 	switch (rr->r->rid) {
 	case RDT_RESOURCE_L3:
-		WARN_ON_ONCE(rr->evt->any_cpu);
+		if (!cpu_on_correct_domain(rr))
+			return -EINVAL;
+
 		if (rr->hdr)
 			return __l3_mon_event_count(rdtgrp, rr);
 		else
-- 
2.25.1
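
The decision made by cpu_on_correct_domain() can be modeled in user
space. What follows is a minimal sketch under stated assumptions, not
the kernel implementation: struct read_req, current_cpu(), fake_cpu and
cpu_lookups are hypothetical stand-ins, a 64-bit word replaces
struct cpumask, and a counter makes the current-CPU lookup observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct rmid_read used by the check. */
struct read_req {
	bool any_cpu;		/* event may be read from any CPU */
	bool single_domain;	/* models rr->hdr != NULL */
	uint64_t domain_mask;	/* CPUs of the single domain, one bit per CPU */
	uint64_t shared_mask;	/* CPUs sharing the cache when summing domains */
};

static unsigned int fake_cpu;		/* CPU the caller pretends to run on */
static unsigned int cpu_lookups;	/* how often the CPU was looked up */

/* Hypothetical stand-in for smp_processor_id(). */
static unsigned int current_cpu(void)
{
	cpu_lookups++;
	return fake_cpu;
}

static bool cpu_on_correct_domain_sketch(const struct read_req *rr)
{
	unsigned int cpu;

	assert(rr);

	/* Any CPU is fine: return before the current CPU is looked up. */
	if (rr->any_cpu)
		return true;

	cpu = current_cpu();
	if (cpu >= 64)
		return false;

	/* Single domain: must be on a CPU in that domain. */
	if (rr->single_domain)
		return (rr->domain_mask & (1ULL << cpu)) != 0;

	/* Summing domains: must be on a CPU that shares the cache. */
	return (rr->shared_mask & (1ULL << cpu)) != 0;
}
```

The early return is the point of the sketch: with any_cpu set, the
lookup that would warn in preemptible context is never reached.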


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
                   ` (4 preceding siblings ...)
  2026-06-06  2:38 ` [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
@ 2026-06-06  2:38 ` Chen Yu
  2026-06-08  8:33   ` Thomas Gleixner
  5 siblings, 1 reply; 18+ messages in thread
From: Chen Yu @ 2026-06-06  2:38 UTC (permalink / raw)
  To: tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

The CMRC (Cache Monitoring Registers for CPU Agents Description)
ACPI sub-table provides the MMIO address used to read the LLC
occupancy counter for each RMID. When ERDT is enabled on the
platform, use this MMIO interface instead of the legacy MSR read
to obtain the L3 occupancy value.

Introduce erdt_mon_read(), a helper that retrieves monitoring
data for a given RMID and event ID from an ERDT domain. Initial
support is added for the L3 occupancy monitoring event
(QOS_L3_OCCUP_EVENT_ID).

If the platform supports ERDT, CMRC-based MMIO access is used by
default. If ERDT is unavailable, the implementation falls back to
MSR-based operations.

Suggested-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 arch/x86/include/asm/resctrl.h        |  2 +
 arch/x86/kernel/cpu/resctrl/core.c    |  2 +-
 arch/x86/kernel/cpu/resctrl/erdt.c    | 89 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c |  7 +++
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 97c2f6bc7a5f..9b3b03279dd8 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -41,6 +41,8 @@ struct resctrl_pqr_state {
 };
 
 bool erdt_enabled(void);
+struct rdt_domain_hdr;
+int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
 
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 90730f0851fa..fe812f7190fc 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
index 8d0b9f4ad6d8..b000eaccc02e 100644
--- a/arch/x86/kernel/cpu/resctrl/erdt.c
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -35,6 +35,7 @@ static DEFINE_XARRAY(erdt_domain_xa);
 
 #define ERDT_VALID_VERSION			1
 #define CMRC_VALID_INDEX_FUNC_VERSION		1
+#define UNAVAILABLE_COUNTER			BIT_ULL(63)
 #define RMDD_FLAG_CPU_DOMAIN			BIT(0)
 
 static u32 valid_subtbl_mask;
@@ -44,6 +45,94 @@ bool erdt_enabled(void)
 	return erdt_enabled_flag;
 }
 
+static void __iomem *cmrc_index_function_1(struct erdt_domain_info *d,
+					   struct acpi_erdt_cmrc *cmrc, int rmid)
+{
+	u16 clump_size, stride_size;
+	void __iomem *vaddr;
+
+	clump_size = cmrc->clump_size;
+	stride_size = cmrc->clump_stride;
+
+	/*
+	 * MMIO_ADDRESS_for_RMID# = CMRC Base +
+	 *   (RMID / ClumpSize) * Stride +
+	 *   (RMID % ClumpSize) * 8
+	 */
+	vaddr = d->base[ERDT_MMIO_CMRC_BASE] +
+		(rmid / clump_size) * stride_size +
+		(rmid % clump_size) * 8;
+
+	return vaddr;
+}
+
+/**
+ * erdt_read_l3_occupancy - Read L3 occupancy count for a given RMID
+ * @d:    Pointer to the ERDT domain info
+ * @rmid: Resource Monitoring ID to read occupancy for
+ * @val:  Output pointer to store the scaled occupancy count
+ *
+ * Calculates the MMIO address using clump and stride information
+ * from the CMRC ACPI structure and reads the L3 cache occupancy
+ * count for the given RMID. The raw value is scaled using the
+ * up_scale factor provided by firmware.
+ *
+ * Return: 0 for success, error code for other cases.
+ */
+static int erdt_read_l3_occupancy(struct erdt_domain_info *d, int rmid, u64 *val)
+{
+	struct acpi_erdt_cmrc *cmrc;
+	void __iomem *vaddr;
+	u64 l3_cmt_count;
+	u32 offset;
+
+	cmrc = d->cmrc;
+	if (!cmrc)
+		return -EIO;
+
+	offset = (rmid / cmrc->clump_size) * cmrc->clump_stride +
+		 (rmid % cmrc->clump_size) * 8;
+	if (offset + sizeof(u64) > (u32)cmrc->cmt_reg_size << 12)
+		return -EINVAL;
+
+	vaddr = cmrc_index_function_1(d, cmrc, rmid);
+
+	l3_cmt_count = readq(vaddr);
+	if (l3_cmt_count & UNAVAILABLE_COUNTER)
+		return -EINVAL;
+
+	*val = l3_cmt_count * cmrc->up_scale;
+
+	return 0;
+}
+
+/**
+ * erdt_mon_read - Read monitoring data for a given domain and RMID
+ * @hdr:    Domain header
+ * @ev_id:  Monitoring event ID (e.g. QOS_L3_OCCUP_EVENT_ID)
+ * @rmid:   Resource Monitoring ID for which to read the data
+ * @val:    Store the read data
+ *
+ * Looks up the domain by domid and dispatches the read request
+ * to the appropriate helper based on the event type.
+ * Currently supports only L3 occupancy monitoring.
+ *
+ * Return: 0 on success, error code otherwise.
+ */
+int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val)
+{
+	struct erdt_domain_info *d;
+
+	d = xa_load(&erdt_domain_xa, hdr->id);
+	if (!d)
+		return -EIO;
+
+	if (ev_id == QOS_L3_OCCUP_EVENT_ID)
+		return erdt_read_l3_occupancy(d, rmid, val);
+
+	return -EINVAL;
+}
+
 /**
  * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
  * @cacd: Pointer to the ACPI ERDT CACD structure
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 1e81b3c33843..12b4014e47f3 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -278,6 +278,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 
 	switch (r->rid) {
 	case RDT_RESOURCE_L3:
+		/*
+		 * No SNC for mmio based L3 occupancy, so there is no need
+		 * to convert logical RMID to a physical RMID via
+		 * logical_rmid_to_physical_rmid().
+		 */
+		if (erdt_enabled() && eventid == QOS_L3_OCCUP_EVENT_ID)
+			return erdt_mon_read(hdr, eventid, rmid, val);
 		return arch_l3_read_event(hdr, rmid, eventid, val, r);
 	case RDT_RESOURCE_PERF_PKG:
 		return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
-- 
2.25.1
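
For illustration, the index function and the counter decoding in this
patch can be modeled in user space. This is a minimal sketch, not the
posted kernel code: struct cmrc_layout, window_bytes, cmrc_rmid_offset()
and cmrc_decode() are hypothetical names, the width of up_scale is
assumed, and the zero clump_size check is an addition of the sketch (the
series may already validate that when the CMRC sub-table is parsed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 set means the counter is unavailable, as in the patch. */
#define UNAVAILABLE_COUNTER	(1ULL << 63)

/* Hypothetical stand-in for the CMRC fields used by the read path. */
struct cmrc_layout {
	uint16_t clump_size;	/* RMIDs per clump */
	uint16_t clump_stride;	/* bytes between consecutive clumps */
	uint32_t window_bytes;	/* size of the mapped register window */
	uint32_t up_scale;	/* multiplier for the raw count (width assumed) */
};

/*
 * offset = (rmid / clump_size) * clump_stride + (rmid % clump_size) * 8
 * Returns false if the 8-byte counter would not fit inside the window.
 */
static bool cmrc_rmid_offset(const struct cmrc_layout *l, uint32_t rmid,
			     uint64_t *off)
{
	uint64_t o;

	assert(l && off);

	/* Not in the posted patch: avoid dividing by zero. */
	if (!l->clump_size)
		return false;

	o = (uint64_t)(rmid / l->clump_size) * l->clump_stride +
	    (uint64_t)(rmid % l->clump_size) * 8;

	if (o + sizeof(uint64_t) > l->window_bytes)
		return false;

	*off = o;
	return true;
}

/* Reject unavailable counters, otherwise scale the raw count. */
static bool cmrc_decode(const struct cmrc_layout *l, uint64_t raw,
			uint64_t *val)
{
	assert(l && val);

	if (raw & UNAVAILABLE_COUNTER)
		return false;

	*val = raw * l->up_scale;
	return true;
}
```

With clump_size 8 and clump_stride 4096, RMID 9 lands in the second
clump at offset 4096 + 8 = 4104.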


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/6] x86/resctrl: Parse ACPI CMRC table
  2026-06-06  2:33 ` [PATCH v3 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
@ 2026-06-08  8:30   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-08  8:30 UTC (permalink / raw)
  To: Chen Yu, tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

On Sat, Jun 06 2026 at 10:33, Chen Yu wrote:
> The CMRC (Cache Monitoring Registers for CPU Agents Description)
> sub-table of ERDT describes the MMIO registers used to read
> cache monitoring counters (e.g. LLC occupancy) for an RMD.
>
> Parse each CMRC sub-table, ioremap its register window, and save
> the CMRC pointer in the corresponding ERDT domain entry so that
> later monitoring code can read the counters via MMIO.
>
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>

Reviewed-by: Thomas Gleixner <tglx@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
  2026-06-06  2:35 ` [PATCH v3 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
@ 2026-06-08  8:32   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-08  8:32 UTC (permalink / raw)
  To: Chen Yu, tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

On Sat, Jun 06 2026 at 10:35, Chen Yu wrote:

> Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
> With ERDT, the previous monitor value may come from an MMIO
> register rather than from an MSR, so the "msr" suffix is no longer
> accurate. The new name describes the field by its meaning (the
> previous monitor value) instead of by the access method.
>
> This is preparation for ERDT support, which reads monitoring
> counters via MMIO.
>
> No functional change.
>
> Co-developed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>

Reviewed-by: Thomas Gleixner <tglx@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
  2026-06-06  2:38 ` [PATCH v3 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
@ 2026-06-08  8:33   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-08  8:33 UTC (permalink / raw)
  To: Chen Yu, tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:

> The CMRC (Cache Monitoring Registers for CPU Agents Description)
> ACPI sub-table provides the MMIO address used to read the LLC
> occupancy counter for each RMID. When ERDT is enabled on the
> platform, use this MMIO interface instead of the legacy MSR read
> to obtain the L3 occupancy value.
>
> Introduce erdt_mon_read(), a helper that retrieves monitoring
> data for a given RMID and event ID from an ERDT domain. Initial
> support is added for the L3 occupancy monitoring event
> (QOS_L3_OCCUP_EVENT_ID).
>
> If the platform supports ERDT, CMRC-based MMIO access is used by
> default. If ERDT is unavailable, the implementation falls back to
> MSR-based operations.
>
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Co-developed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>

Reviewed-by: Thomas Gleixner <tglx@kernel.org>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-06  2:38 ` [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
@ 2026-06-08  8:36   ` Thomas Gleixner
  2026-06-08 11:26     ` Chen, Yu C
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-08  8:36 UTC (permalink / raw)
  To: Chen Yu, tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:
> From: Tony Luck <tony.luck@intel.com>
>
> __l3_mon_event_count() and __l3_mon_event_count_sum() call
> smp_processor_id() to obtain the current CPU. However, some
> monitor events can be read from any CPU in task context via
> mon_event_count(); in that case the calling context is
> preemptible and smp_processor_id() triggers a debug warning.

Is this new with this MMIO stuff or is this an existing issue? If the
latter then this patch should be in front of the series and get a fixes
tag. If not the change log should explain it.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-06  2:32 ` [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
@ 2026-06-08  8:59   ` Thomas Gleixner
  2026-06-08 11:20     ` Chen, Yu C
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-08  8:59 UTC (permalink / raw)
  To: Chen Yu, tony.luck, reinette.chatre
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy

On Sat, Jun 06 2026 at 10:32, Chen Yu wrote:
> +
> +static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
> +{
> +	return (void *)rmdd + sizeof(*rmdd);
> +}
> +
> +static inline struct acpi_subtbl_hdr_16 *next_subtbl(struct acpi_subtbl_hdr_16 *subtbl)
> +{
> +	return (void *)subtbl + subtbl->length;
> +}

Second thoughts on this. Hit send too fast.

> +static inline bool subtbl_valid(struct acpi_erdt_rmdd *rmdd, struct acpi_subtbl_hdr_16 *subtbl)
> +{
> +	void *rmdd_end = (void *)rmdd + rmdd->header.length;
> +
> +	if (subtbl->length < sizeof(*subtbl))
> +		return false;
> +
> +	if ((void *)subtbl + sizeof(*subtbl) > rmdd_end)
> +		return false;
> +
> +	if ((void *)subtbl + subtbl->length > rmdd_end)
> +		return false;

These conditions are confusing. This basically allows subtbl->length to
be larger than sizeof(tbl) as long as it's within the limits.

If that's intentional then the second condition is pointless because
according to the first condition

          length >= sizeof(tbl)

so

          tbl + length <= end

is catching both. No?

> +	return true;

> +static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
> +{
> +	struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
> +	struct acpi_subtbl_hdr_16 *subtbl;
> +	void *table_end;
> +
> +	if (erdt->header.revision != ERDT_VALID_VERSION) {
> +		pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
> +		return -EINVAL;
> +	}
> +
> +	if (erdt->header.length < sizeof(*erdt)) {
> +		pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);
> +		return -EINVAL;
> +	}
> +
> +	subtbl = (void *)erdt + sizeof(struct acpi_table_erdt);
> +	table_end = (void *)erdt + erdt->header.length;
> +
> +	while ((void *)subtbl + sizeof(*subtbl) <= table_end) {
> +		if (subtbl->length < sizeof(*subtbl) ||
> +		    (void *)subtbl + subtbl->length > table_end) {
> +			pr_info("ERDT: Invalid subtable length\n");
> +			goto cleanup;
> +		}

So this is yet another version of the above, just slightly different with
the same strange conditions.

If you make subtbl_valid():

static inline bool subtbl_valid(void *end, struct acpi_subtbl_hdr_16 *subtbl)
   
and calculate the end at the call sites you can reuse that function.

> +
> +		if (subtbl->type == ACPI_ERDT_TYPE_RMDD)
> +			if (!parse_rmdd_entry(subtbl))
> +				goto cleanup;
> +
> +		subtbl = (void *)subtbl + subtbl->length;

 open coded variant of next_subtbl()


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
  2026-06-08  8:59   ` Thomas Gleixner
@ 2026-06-08 11:20     ` Chen, Yu C
  0 siblings, 0 replies; 18+ messages in thread
From: Chen, Yu C @ 2026-06-08 11:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	reinette.chatre

Hi Thomas,

On 6/8/2026 4:59 PM, Thomas Gleixner wrote:
> On Sat, Jun 06 2026 at 10:32, Chen Yu wrote:
>> +
>> +static inline struct acpi_subtbl_hdr_16 *rmdd_subtbl(struct acpi_erdt_rmdd *rmdd)
>> +{
>> +	return (void *)rmdd + sizeof(*rmdd);
>> +}
>> +
>> +static inline struct acpi_subtbl_hdr_16 *next_subtbl(struct acpi_subtbl_hdr_16 *subtbl)
>> +{
>> +	return (void *)subtbl + subtbl->length;
>> +}
> 
> Second thoughts on this. Hit send too fast.
> 
>> +static inline bool subtbl_valid(struct acpi_erdt_rmdd *rmdd, struct acpi_subtbl_hdr_16 *subtbl)
>> +{
>> +	void *rmdd_end = (void *)rmdd + rmdd->header.length;
>> +
>> +	if (subtbl->length < sizeof(*subtbl))
>> +		return false;
>> +
>> +	if ((void *)subtbl + sizeof(*subtbl) > rmdd_end)
>> +		return false;
>> +
>> +	if ((void *)subtbl + subtbl->length > rmdd_end)
>> +		return false;
> 
> These conditions are confusing. This basically allows subtbl->length to
> be larger than sizeof(tbl) as long as it's within the limits.
> 
> If that's intentional then the second condition is pointless because
> according to the first condition
> 
>            length >= sizeof(tbl)
> 
> so
> 
>            tbl + length <= end
> 
> is catching both. No?
> 

You are right. The check subtbl + sizeof(*subtbl) > rmdd_end
is redundant because condition 3 already guards against
any out-of-bounds access. Will remove condition 2.

>> +	return true;
> 
>> +static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
>> +{
>> +	struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
>> +	struct acpi_subtbl_hdr_16 *subtbl;
>> +	void *table_end;
>> +
>> +	if (erdt->header.revision != ERDT_VALID_VERSION) {
>> +		pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (erdt->header.length < sizeof(*erdt)) {
>> +		pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);
>> +		return -EINVAL;
>> +	}
>> +
>> +	subtbl = (void *)erdt + sizeof(struct acpi_table_erdt);
>> +	table_end = (void *)erdt + erdt->header.length;
>> +
>> +	while ((void *)subtbl + sizeof(*subtbl) <= table_end) {
>> +		if (subtbl->length < sizeof(*subtbl) ||
>> +		    (void *)subtbl + subtbl->length > table_end) {
>> +			pr_info("ERDT: Invalid subtable length\n");
>> +			goto cleanup;
>> +		}
> 
> So this is yet another version of the above, just slightly different with
> the same strange conditions.
> 
> If you make subtbl_valid():
> 
> static inline bool subtbl_valid(void *end, struct acpi_subtbl_hdr_16 *subtbl)
>     
> and calculate the end at the call sites you can reuse that function.
> 
>> +
>> +		if (subtbl->type == ACPI_ERDT_TYPE_RMDD)
>> +			if (!parse_rmdd_entry(subtbl))
>> +				goto cleanup;
>> +
>> +		subtbl = (void *)subtbl + subtbl->length;
> 
>   open coded variant of next_subtbl()
> 

OK, will reuse next_subtbl() and subtbl_valid(). Thanks!


thanks,
Chenyu
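
The direction agreed on here, a single subtbl_valid() that takes the end
pointer plus a walker that advances the way next_subtbl() does, might
look roughly like the user-space sketch below. It is an illustration
under assumptions, not the code of the next revision: struct subtbl_hdr
is a stand-in for struct acpi_subtbl_hdr_16, count_subtbls() is a
hypothetical walker, and subtbl_valid() takes an extra hdr argument
because the sketch copies the header out of a byte buffer with memcpy().
The loop condition still makes sure a whole header fits before its
length field is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct acpi_subtbl_hdr_16. */
struct subtbl_hdr {
	uint16_t type;
	uint16_t length;	/* total sub-table length, header included */
};

/* One check for both call sites: the caller supplies the end pointer. */
static bool subtbl_valid(const uint8_t *end, const uint8_t *p,
			 const struct subtbl_hdr *hdr)
{
	/* Must at least cover its own header. */
	if (hdr->length < sizeof(*hdr))
		return false;

	/* Must not run past the enclosing table. */
	return hdr->length <= (size_t)(end - p);
}

/* Count the sub-tables in [buf, buf + len); -1 if one is malformed. */
static int count_subtbls(const uint8_t *buf, size_t len)
{
	const uint8_t *end = buf + len;
	const uint8_t *p = buf;
	int n = 0;

	assert(buf);

	/* Only read a header that fits completely in the remaining bytes. */
	while ((size_t)(end - p) >= sizeof(struct subtbl_hdr)) {
		struct subtbl_hdr hdr;

		memcpy(&hdr, p, sizeof(hdr));
		if (!subtbl_valid(end, p, &hdr))
			return -1;

		n++;
		p += hdr.length;	/* what next_subtbl() does */
	}

	return n;
}
```

The same subtbl_valid() serves the top-level ERDT walk and the walk
inside an RMDD entry; only the end pointer passed in differs.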

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-08  8:36   ` Thomas Gleixner
@ 2026-06-08 11:26     ` Chen, Yu C
  2026-06-08 15:10       ` Reinette Chatre
  2026-06-08 15:32       ` Luck, Tony
  0 siblings, 2 replies; 18+ messages in thread
From: Chen, Yu C @ 2026-06-08 11:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	reinette.chatre

On 6/8/2026 4:36 PM, Thomas Gleixner wrote:
> On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:
>> From: Tony Luck <tony.luck@intel.com>
>>
>> __l3_mon_event_count() and __l3_mon_event_count_sum() call
>> smp_processor_id() to obtain the current CPU. However, some
>> monitor events can be read from any CPU in task context via
>> mon_event_count(); in that case the calling context is
>> preemptible and smp_processor_id() triggers a debug warning.
> 
> Is this new with this MMIO stuff or is this an existing issue? If the
> latter then this patch should be in front of the series and get a fixes
> tag. If not the change log should explain it.
> 

It is an existing issue. I'll adjust the sequence and provide a fixes tag.
Thanks!

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-08 11:26     ` Chen, Yu C
@ 2026-06-08 15:10       ` Reinette Chatre
  2026-06-08 16:45         ` Chen, Yu C
  2026-06-08 15:32       ` Luck, Tony
  1 sibling, 1 reply; 18+ messages in thread
From: Reinette Chatre @ 2026-06-08 15:10 UTC (permalink / raw)
  To: Chen, Yu C, Thomas Gleixner
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck

Hi Chenyu,

On 6/8/26 4:26 AM, Chen, Yu C wrote:
> On 6/8/2026 4:36 PM, Thomas Gleixner wrote:
>> On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:
>>> From: Tony Luck <tony.luck@intel.com>
>>>
>>> __l3_mon_event_count() and __l3_mon_event_count_sum() call
>>> smp_processor_id() to obtain the current CPU. However, some
>>> monitor events can be read from any CPU in task context via
>>> mon_event_count(); in that case the calling context is
>>> preemptible and smp_processor_id() triggers a debug warning.
>>
>> Is this new with this MMIO stuff or is this an existing issue? If the
>> latter then this patch should be in front of the series and get a fixes
>> tag. If not the change log should explain it.
>>
> 
> It is an existing issue. I'll adjust the sequence and provide a fixes tag.
> Thanks!


Could you please elaborate how this is an existing issue?

At this point in implementation mon_evt::any_cpu is and can only be false for
all events associated with resource RDT_RESOURCE_L3. This is highlighted by
the existing splat that will be triggered by the
		WARN_ON_ONCE(rr->evt->any_cpu);
that is removed by this patch.

It is this series that makes it possible to read L3 cache occupancy from any CPU.
This is only done in the following patch (6/6) that makes mon_evt::any_cpu for
L3 cache occupancy be conditional on MMIO support enabled by this series.

This patch (5/6) replaces the above mentioned splat with a subtler check since it
is about to become possible for L3 cache occupancy events to be read from any CPU.

I thus do not see this as an existing issue needing fixing but instead a
necessary part of this series that is specifically a preparatory patch for the
final MMIO based event enabling done in patch 6/6.

Reinette


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-08 11:26     ` Chen, Yu C
  2026-06-08 15:10       ` Reinette Chatre
@ 2026-06-08 15:32       ` Luck, Tony
  2026-06-08 16:54         ` Chen, Yu C
  1 sibling, 1 reply; 18+ messages in thread
From: Luck, Tony @ 2026-06-08 15:32 UTC (permalink / raw)
  To: Chen, Yu C
  Cc: Thomas Gleixner, x86, linux-kernel, bp, mingo, dave.hansen, hpa,
	dave.martin, james.morse, fenghuay, babu.moger,
	anil.keshavamurthy, reinette.chatre

On Mon, Jun 08, 2026 at 07:26:30PM +0800, Chen, Yu C wrote:
> On 6/8/2026 4:36 PM, Thomas Gleixner wrote:
> > On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:
> > > From: Tony Luck <tony.luck@intel.com>
> > > 
> > > __l3_mon_event_count() and __l3_mon_event_count_sum() call
> > > smp_processor_id() to obtain the current CPU. However, some
> > > monitor events can be read from any CPU in task context via
> > > mon_event_count(); in that case the calling context is
> > > preemptible and smp_processor_id() triggers a debug warning.
> > 
> > Is this new with this MMIO stuff or is this an existing issue? If the
> > latter then this patch should be in front of the series and get a fixes
> > tag. If not the change log should explain it.
> > 
> 
> It is an existing issue. I'll adjust the sequence and provide a fixes tag.
> Thanks!

I don't think this is an existing issue. Before the addition of MMIO
registers, access is via the IA32_QM_EVTSEL and IA32_QM_CTR MSRs,
which require access from a CPU in the same domain.

-Tony

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-08 15:10       ` Reinette Chatre
@ 2026-06-08 16:45         ` Chen, Yu C
  0 siblings, 0 replies; 18+ messages in thread
From: Chen, Yu C @ 2026-06-08 16:45 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
	james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
	Thomas Gleixner

Hi Reinette,

On 6/8/2026 11:10 PM, Reinette Chatre wrote:
> Hi Chenyu,
> 
> On 6/8/26 4:26 AM, Chen, Yu C wrote:
>> On 6/8/2026 4:36 PM, Thomas Gleixner wrote:
>>> On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:
>>>> From: Tony Luck <tony.luck@intel.com>
>>>>
>>>> __l3_mon_event_count() and __l3_mon_event_count_sum() call
>>>> smp_processor_id() to obtain the current CPU. However, some
>>>> monitor events can be read from any CPU in task context via
>>>> mon_event_count(); in that case the calling context is
>>>> preemptible and smp_processor_id() triggers a debug warning.
>>>
>>> Is this new with this MMIO stuff or is this an existing issue? If the
>>> latter then this patch should be in front of the series and get a fixes
>>> tag. If not the change log should explain it.
>>>
>>
>> It is an existing issue. I'll adjust the sequence and provide a fixes tag.
>> Thanks!
> 
> 
> Could you please elaborate how this is an existing issue?
> 
> At this point in implementation mon_evt::any_cpu is and can only be false for
> all events associated with resource RDT_RESOURCE_L3. This is highlighted by
> the existing splat that will be triggered by the
> 		WARN_ON_ONCE(rr->evt->any_cpu);
> that is removed by this patch.
> 
> It is this series that makes it possible to read L3 cache occupancy from any CPU.
> This is only done in the following patch (6/6) that makes mon_evt::any_cpu for
> L3 cache occupancy be conditional on MMIO support enabled by this series.
> 
> This patch (5/6) replaces the above mentioned splat with a subtler check since it
> is about to become possible for L3 cache occupancy events to be read from any CPU.
> 
> I thus do not see this as an existing issue needing fixing but instead a
> necessary part of this series that is specifically a preparatory patch for the
> final MMIO based event enabling done in patch 6/6.
> 

You are right. For RDT_RESOURCE_L3 monitor data reading, before MMIO
monitor read is introduced, any_cpu is false, and MSRs are always read
on the CPU of that domain in the IPI context. Let me refine the commit
log for [PATCH 5/6].

thanks,
Chenyu

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
  2026-06-08 15:32       ` Luck, Tony
@ 2026-06-08 16:54         ` Chen, Yu C
  0 siblings, 0 replies; 18+ messages in thread
From: Chen, Yu C @ 2026-06-08 16:54 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, x86, linux-kernel, bp, mingo, dave.hansen, hpa,
	dave.martin, james.morse, fenghuay, babu.moger,
	anil.keshavamurthy, reinette.chatre

Hi Tony,

On 6/8/2026 11:32 PM, Luck, Tony wrote:
> On Mon, Jun 08, 2026 at 07:26:30PM +0800, Chen, Yu C wrote:
>> On 6/8/2026 4:36 PM, Thomas Gleixner wrote:
>>> On Sat, Jun 06 2026 at 10:38, Chen Yu wrote:
>>>> From: Tony Luck <tony.luck@intel.com>
>>>>
>>>> __l3_mon_event_count() and __l3_mon_event_count_sum() call
>>>> smp_processor_id() to obtain the current CPU. However, some
>>>> monitor events can be read from any CPU in task context via
>>>> mon_event_count(); in that case the calling context is
>>>> preemptible and smp_processor_id() triggers a debug warning.
>>>
>>> Is this new with this MMIO stuff or is this an existing issue? If the
>>> latter then this patch should be in front of the series and get a fixes
>>> tag. If not the change log should explain it.
>>>
>>
>> It is an existing issue. I'll adjust the sequence and provide a fixes tag.
>> Thanks!
> 
> I don't think this is an existing issue. Before addition of MMIO
> registers access is via the IA32_QM_EVTSEL and IA32_QM_CTR MSRs
> which require access from a CPU on the same domain.
> 

Thanks for pointing this out. Right, [PATCH 5/6] serves as a preparatory
patch to enable MMIO reads for RDT_RESOURCE_L3 within a preemptible
context. I had overlooked that any_cpu was set to false for
RDT_RESOURCE_L3. any_cpu will only be enabled for QOS_L3_OCCUP_EVENT_ID
in [PATCH 6/6].

thanks,
Chenyu

end of thread, other threads:[~2026-06-08 16:55 UTC | newest]

Thread overview: 18+ messages
2026-06-06  2:31 [PATCH v3 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
2026-06-06  2:32 ` [PATCH v3 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
2026-06-08  8:59   ` Thomas Gleixner
2026-06-08 11:20     ` Chen, Yu C
2026-06-06  2:33 ` [PATCH v3 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
2026-06-08  8:30   ` Thomas Gleixner
2026-06-06  2:35 ` [PATCH v3 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
2026-06-08  8:32   ` Thomas Gleixner
2026-06-06  2:38 ` [PATCH v3 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
2026-06-06  2:38 ` [PATCH v3 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
2026-06-08  8:36   ` Thomas Gleixner
2026-06-08 11:26     ` Chen, Yu C
2026-06-08 15:10       ` Reinette Chatre
2026-06-08 16:45         ` Chen, Yu C
2026-06-08 15:32       ` Luck, Tony
2026-06-08 16:54         ` Chen, Yu C
2026-06-06  2:38 ` [PATCH v3 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
2026-06-08  8:33   ` Thomas Gleixner
