* [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT
@ 2026-06-04 16:07 Chen Yu
2026-06-04 16:08 ` [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:07 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
Intel Enhanced Resource Director Technology (ERDT) extends the existing
RDT framework with two major capabilities:
1. MMIO-based access to monitoring and allocation registers, replacing
the legacy MSR-based interface.
2. Region-aware RDT for fine-grained control over different tiers of
memory (e.g., CXL.mem, DDR).
This is described in the Intel RDT Architecture Specification:
https://cdrdv2-public.intel.com/789566/356688-intel-rdt-arch-spec.pdf
This patch set focuses on the first part: enabling MMIO-based access for
Cache Monitoring Technology (CMT), while CAT/MBM/MBA are still using MSR.
The platform advertises the MMIO register layout through the ACPI ERDT
(Enhanced Resource Director Technology) table, which contains sub-tables
describing per-domain register regions for monitoring and allocation.
With ERDT, L3 cache occupancy counters are read via MMIO rather than
MSR, allowing the reads to be performed from any CPU without requiring
cross-CPU IPIs. This series parses the relevant ACPI sub-tables (RMDD,
CMRC), prepares the resctrl monitor infrastructure for MMIO-based reads,
and adds initial support for reading L3 occupancy via the CMRC interface.
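As a rough user-space illustration of why an MMIO read removes the
cross-CPU dependency, the sketch below models a per-domain counter window
as plain memory. The `fake_domain` type, `read_occupancy()`, and the
one-64-bit-counter-per-RMID layout are hypothetical and are not taken from
the specification or from this series; the real CMRC layout is governed by
the index function and clump fields in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ioremap()ed per-domain counter window. */
struct fake_domain {
	volatile uint64_t *base;	/* start of the mapped window */
	size_t nr_rmids;		/* number of counters in the window */
};

/*
 * Read the occupancy counter for @rmid. The window is ordinary
 * memory-mapped space, so any CPU can issue this load; no IPI to a CPU
 * inside the domain is needed, unlike an MSR read.
 * Assumed layout: one 64-bit counter per RMID, indexed linearly.
 */
static int read_occupancy(const struct fake_domain *d, uint32_t rmid,
			  uint64_t *val)
{
	assert(d && val);
	if (rmid >= d->nr_rmids)
		return -1;	/* RMID outside the mapped window */
	*val = d->base[rmid];
	return 0;
}
```

The bounds check stands in for whatever validation the real index
function needs before touching the mapped region.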
The CMT and L3_CAT kselftests pass with the minor adjustment posted at
https://lore.kernel.org/lkml/20260523101715.3964456-1-yu.c.chen@intel.com/.
Changes since V1:
- Add #include <linux/cleanup.h> to follow the "include-what-you-use" best
practice (Tony Luck)
- Fix 3 issues reported by:
https://sashiko.dev/#/patchset/cover.1779872016.git.yu.c.chen%40intel.com
Remove the cacd member from struct erdt_domain_info, as it is never
used after initialization.
Invoke erdt_exit() to avoid a resource leak if rdt_alloc_capable and
rdt_mon_capable are both false.
Adjust the comments as suggested by sashiko.
Anil S Keshavamurthy (1):
x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
Chen Yu (4):
x86/resctrl: Parse ACPI CMRC table
x86/resctrl: Rename prev_msr to prev_mon_val
x86/resctrl: Refactor the monitor read function
x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO
read
Tony Luck (1):
fs/resctrl: Do not invoke smp_processor_id() in preemptible context
arch/x86/Kconfig | 4 +-
arch/x86/include/asm/resctrl.h | 4 +
arch/x86/kernel/cpu/resctrl/Makefile | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 23 +-
arch/x86/kernel/cpu/resctrl/erdt.c | 451 +++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 11 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 64 ++--
fs/resctrl/monitor.c | 38 ++-
8 files changed, 551 insertions(+), 45 deletions(-)
create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c
--
2.25.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
@ 2026-06-04 16:08 ` Chen Yu
2026-06-04 16:56 ` Thomas Gleixner
2026-06-04 16:11 ` [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
` (4 subsequent siblings)
5 siblings, 1 reply; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:08 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
From: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
ERDT (Enhanced RDT) introduces a new top-level ACPI structure
(the ERDT) that the kernel must parse before any enhanced
RDT feature can be used. The ERDT improves the existing RDT
by switching low-level register access from MSR-based to
MMIO-based, which avoids the cross-CPU IPIs that MSR reads require.
The ERDT structure may include several ACPI sub-tables:
- Resource Management Domain Description Structure (RMDD)
- CPU Agent Collection Description Structure (CACD)
- Cache Monitoring Registers for CPU Agents Description Structure
(CMRC)
There is one ERDT table per platform.
Each RMDD substructure in ERDT represents one resource management
domain (RMD), also known as an L3 domain. Thus, the total number
of RMDDs equals the number of L3 domains on the platform.
Each RMDD contains information such as MMIO addresses, which are
used to retrieve RDT metrics such as L3 occupancy.
Add basic ERDT ACPI table and sub-table parsing, and store the
relevant tables for later processing.
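The sub-table walk can be pictured with the user-space sketch below. It
is only an approximation of the parsing loop: `subtbl_hdr` is a
hypothetical mirror of a 16-bit type/length header, and
`count_subtables()` is an illustrative helper, not a function from this
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of a 16-bit type/length sub-table header. */
struct subtbl_hdr {
	uint16_t type;
	uint16_t length;	/* total length, header included */
};

/*
 * Count sub-tables of @type in [buf, buf + len). Returns -1 if any
 * entry is shorter than its own header or overruns the buffer.
 */
static int count_subtables(const uint8_t *buf, size_t len, uint16_t type)
{
	size_t off = 0;
	int count = 0;

	assert(buf);
	while (off + sizeof(struct subtbl_hdr) <= len) {
		struct subtbl_hdr hdr;

		/* Copy the header out to avoid unaligned access. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.length < sizeof(hdr) || off + hdr.length > len)
			return -1;
		if (hdr.type == type)
			count++;
		off += hdr.length;
	}
	return count;
}
```

Rejecting a length smaller than the header is what keeps the loop from
spinning forever on a zero-length entry.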
Among these sub-tables, RMDD requires special handling. There is one
RMDD per domain, and the domain ID reuses the L3 cache ID. Many code
paths need to retrieve an RMDD efficiently by domain ID (L3 cache ID).
Because L3 cache IDs are derived from x2APIC IDs and are not
contiguous, using a plain array indexed by domain ID would waste
memory. As a trade-off, an xarray is used to store these tables, with
the L3 cache ID as the key.
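The trade-off can be illustrated with a user-space stand-in for the
xarray. The sketch below is not the kernel data structure:
`domain_map`, `MAX_DOMAINS`, and both helpers are hypothetical names,
and the linear scan is chosen only for brevity. The point is that
storage grows with the number of domains rather than with the largest
cache ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DOMAINS 8	/* hypothetical cap for this sketch */

struct domain_entry {
	uint32_t cache_id;	/* sparse key, e.g. 0, 64, 128 */
	void *info;		/* per-domain data */
};

struct domain_map {
	struct domain_entry e[MAX_DOMAINS];
	size_t used;
};

/* Insert @info under @cache_id; fails on a duplicate key or a full map. */
static int domain_map_insert(struct domain_map *m, uint32_t cache_id,
			     void *info)
{
	assert(m && info);
	for (size_t i = 0; i < m->used; i++)
		if (m->e[i].cache_id == cache_id)
			return -1;
	if (m->used == MAX_DOMAINS)
		return -1;
	m->e[m->used].cache_id = cache_id;
	m->e[m->used].info = info;
	m->used++;
	return 0;
}

/* Return the entry stored under @cache_id, or NULL if absent. */
static void *domain_map_load(const struct domain_map *m, uint32_t cache_id)
{
	for (size_t i = 0; i < m->used; i++)
		if (m->e[i].cache_id == cache_id)
			return m->e[i].info;
	return NULL;
}
```

Rejecting a duplicate key loosely mirrors xa_insert(), which refuses to
overwrite an existing entry, so two RMDDs resolving to the same L3 cache
ID would be caught at insertion time.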
Suggested-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
v1->v2:
Added #include <linux/cleanup.h> (Tony Luck)
Remove the cacd member from struct erdt_domain_info, as it is never
used after initialization (sashiko)
Invoke erdt_exit() to avoid leaking resources if rdt_alloc_capable and
rdt_mon_capable are both false (sashiko)
Adjust the comments (sashiko)
---
arch/x86/Kconfig | 4 +-
arch/x86/include/asm/resctrl.h | 2 +
arch/x86/kernel/cpu/resctrl/Makefile | 1 +
arch/x86/kernel/cpu/resctrl/core.c | 21 +-
arch/x86/kernel/cpu/resctrl/erdt.c | 321 +++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/internal.h | 3 +
6 files changed, 345 insertions(+), 7 deletions(-)
create mode 100644 arch/x86/kernel/cpu/resctrl/erdt.c
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..97d210bd9bb5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -515,7 +515,7 @@ config X86_MPPARSE
config X86_CPU_RESCTRL
bool "x86 CPU resource control support"
- depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
+ depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
depends on MISC_FILESYSTEMS
select ARCH_HAS_CPU_RESCTRL
select RESCTRL_FS
@@ -538,7 +538,7 @@ config X86_CPU_RESCTRL
config X86_CPU_RESCTRL_INTEL_AET
bool "Intel Application Energy Telemetry"
- depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
+ depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
help
Enable per-RMID telemetry events in resctrl.
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 575f8408a9e7..97c2f6bc7a5f 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -40,6 +40,8 @@ struct resctrl_pqr_state {
u32 default_closid;
};
+bool erdt_enabled(void);
+
DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
extern bool rdt_alloc_capable;
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index 273ddfa30836..2216ee084832 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -2,6 +2,7 @@
obj-$(CONFIG_X86_CPU_RESCTRL) += core.o rdtgroup.o monitor.o
obj-$(CONFIG_X86_CPU_RESCTRL) += ctrlmondata.o
obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET) += intel_aet.o
+obj-$(CONFIG_X86_CPU_RESCTRL) += erdt.o
obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK) += pseudo_lock.o
# To allow define_trace.h's recursive include:
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7667cf7c4e94..88966aa5e050 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -1012,6 +1012,7 @@ static __init void check_quirks(void)
static __init bool get_rdt_resources(void)
{
+ erdt_init();
rdt_alloc_capable = get_rdt_alloc_resources();
rdt_mon_capable = get_rdt_mon_resources();
@@ -1130,20 +1131,24 @@ static int __init resctrl_arch_late_init(void)
check_quirks();
- if (!get_rdt_resources())
- return -ENODEV;
+ if (!get_rdt_resources()) {
+ ret = -ENODEV;
+ goto out;
+ }
state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
"x86/resctrl/cat:online:",
resctrl_arch_online_cpu,
resctrl_arch_offline_cpu);
- if (state < 0)
- return state;
+ if (state < 0) {
+ ret = state;
+ goto out;
+ }
ret = resctrl_init();
if (ret) {
cpuhp_remove_state(state);
- return ret;
+ goto out;
}
rdt_online = state;
@@ -1154,6 +1159,10 @@ static int __init resctrl_arch_late_init(void)
pr_info("%s monitoring detected\n", r->name);
return 0;
+
+out:
+ erdt_exit();
+ return ret;
}
late_initcall(resctrl_arch_late_init);
@@ -1165,6 +1174,8 @@ static void __exit resctrl_arch_exit(void)
cpuhp_remove_state(rdt_online);
resctrl_exit();
+
+ erdt_exit();
}
__exitcall(resctrl_arch_exit);
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
new file mode 100644
index 000000000000..3f309a4b15c8
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -0,0 +1,321 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Enhanced Resource Director Technology (ERDT)
+ *
+ * Copyright (C) 2026 Intel Corporation
+ *
+ */
+
+#define pr_fmt(fmt) "resctrl: " fmt
+
+#include <linux/cleanup.h>
+#include <linux/cpu.h>
+#include <linux/err.h>
+#include <linux/xarray.h>
+#include <linux/resctrl.h>
+#include <linux/acpi.h>
+#include <asm/cpu.h>
+#include <asm/apic.h>
+#include <asm/cpu_device_id.h>
+#include "internal.h"
+
+enum erdt_mmio_type {
+ ERDT_MMIO_RMDD_CREG,
+ ERDT_MMIO_CMRC_BASE,
+ ERDT_MMIO_MAX
+};
+
+struct erdt_domain_info {
+ struct acpi_erdt_cmrc *cmrc;
+ /* MMIO address */
+ void __iomem *base[ERDT_MMIO_MAX];
+};
+
+/* true if ERDT table is present and valid */
+static bool erdt_available;
+
+/* Global variable to hold ERDT ACPI table information for later processing */
+static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
+
+#define ERDT_VALID_VERSION 1
+
+static u32 valid_subtbl_mask;
+
+/*
+ * erdt_enabled - Check if the ERDT table is present and enabled
+ */
+bool erdt_enabled(void)
+{
+ return erdt_available;
+}
+
+/*
+ * lookup_logical_cpu_by_x2apicid - Map x2APIC ID to logical CPU number
+ */
+static __init int lookup_logical_cpu_by_x2apicid(u32 x2apicid)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (cpu_physical_id(cpu) == x2apicid)
+ return cpu;
+ }
+
+ return -1;
+}
+
+/*
+ * get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
+ * @cacd: Pointer to the ACPI ERDT CACD structure
+ *
+ * Parses the X2APIC ID list in the given CACD subtable to
+ * identify an online logical CPU and uses it to query the associated
+ * L3 cache ID. The first valid CPU found is used for this lookup.
+ *
+ * The L3 cache ID is used as a unique domain key for ERDT domain
+ * registration and lookup.
+ *
+ * Return:
+ * L3 cache ID for the first matching CPU, or
+ * -1 if no valid CPU or L3 cache ID could be determined.
+ */
+static __init int get_l3_cache_id_from_cacd(struct acpi_erdt_cacd *cacd)
+{
+ int num_ids, cpu, online_cpu = -1, cache_id = -1, tmp;
+ struct cacheinfo *ci;
+
+ if (cacd->header.length < sizeof(*cacd) + sizeof(cacd->X2APICIDS[0])) {
+ pr_warn(FW_BUG "Invalid x2apicid CACD table\n");
+ return -1;
+ }
+
+ num_ids = (cacd->header.length - sizeof(*cacd)) / sizeof(cacd->X2APICIDS[0]);
+
+ guard(cpus_read_lock)();
+
+ for (int i = 0; i < num_ids; i++) {
+ cpu = lookup_logical_cpu_by_x2apicid(cacd->X2APICIDS[i]);
+ if (cpu == -1) {
+ pr_warn(FW_BUG "Unknown x2apicid 0x%x\n", cacd->X2APICIDS[i]);
+
+ return -1;
+ }
+
+ if (!cpu_online(cpu))
+ continue;
+
+ tmp = get_cpu_cacheinfo_id(cpu, RESCTRL_L3_CACHE);
+ if (tmp == -1) {
+ pr_warn(FW_BUG "Can not find L3 cache id for CPU%d\n", cpu);
+ return -1;
+ }
+
+ if (cache_id == -1)
+ cache_id = tmp;
+
+ if (tmp != cache_id) {
+ pr_warn(FW_BUG "CACD references multiple L3 cache instances\n");
+ return -1;
+ }
+ online_cpu = cpu;
+ }
+
+ if (online_cpu == -1)
+ return -1;
+
+ /*
+ * Check if CACD lists all CPUs in the LLC domain.
+ */
+ ci = get_cpu_cacheinfo_level(online_cpu, RESCTRL_L3_CACHE);
+ if (!ci || num_ids != cpumask_weight(&ci->shared_cpu_map)) {
+ pr_warn(FW_BUG "CACD does not list all the CPUs in L3 domain\n");
+ return -1;
+ }
+
+ return cache_id;
+}
+
+static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size,
+ const char *desc)
+{
+ void __iomem *addr = ioremap(base, size << 12);
+
+ if (!addr)
+ pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",
+ desc, (unsigned long long)base, size);
+ return addr;
+}
+
+static void erdt_iounmap_domain(struct erdt_domain_info *domain)
+{
+ for (int i = 0; i < ERDT_MMIO_MAX; i++) {
+ if (domain->base[i]) {
+ iounmap(domain->base[i]);
+ domain->base[i] = NULL;
+ }
+ }
+}
+
+static void cleanup_one_domain(struct erdt_domain_info *d)
+{
+ erdt_iounmap_domain(d);
+ kfree(d);
+}
+
+static __init bool cacd_init(struct erdt_domain_info *d,
+ struct acpi_subtbl_hdr_16 *subtbl,
+ int *l3_cache_id)
+{
+ *l3_cache_id = get_l3_cache_id_from_cacd((struct acpi_erdt_cacd *)subtbl);
+
+ return *l3_cache_id != -1;
+}
+
+static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
+{
+ struct acpi_erdt_rmdd *rmdd;
+ struct erdt_domain_info *domain_info;
+ struct acpi_subtbl_hdr_16 *subtbl;
+ int l3_cache_id = -1;
+ u32 subtbl_mask = 0;
+ void *rmdd_end;
+
+ if (rmdd_hdr->length < sizeof(*rmdd)) {
+ pr_info(FW_BUG "Invalid RMDD length %u\n", rmdd_hdr->length);
+ return false;
+ }
+
+ rmdd = (struct acpi_erdt_rmdd *)rmdd_hdr;
+
+ /* Quietly ignore non-CPU-based L3 domains */
+ if (!(rmdd->flags & 0x1))
+ return true;
+
+ domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
+ if (!domain_info)
+ return false;
+
+ domain_info->base[ERDT_MMIO_RMDD_CREG] = erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size,
+ "RMDD ctrl base");
+ if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
+ goto cleanup;
+
+ rmdd_end = (void *)rmdd + rmdd->header.length;
+
+ /* Iterate through all sub-structures inside this RMDD block */
+ for (subtbl = (void *)rmdd + sizeof(*rmdd);
+ (void *)subtbl + sizeof(*subtbl) <= rmdd_end;
+ subtbl = (void *)subtbl + subtbl->length) {
+ if (subtbl->length < sizeof(*subtbl) ||
+ (void *)subtbl + subtbl->length > rmdd_end) {
+ pr_info("ERDT: Invalid subtable length in RMDD domain %d\n",
+ rmdd->domain_id);
+
+ goto cleanup;
+ }
+
+ switch (subtbl->type) {
+ case ACPI_ERDT_TYPE_CACD:
+ if (cacd_init(domain_info, subtbl, &l3_cache_id))
+ subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
+ break;
+ default:
+ break;
+ }
+ }
+
+ if (l3_cache_id == -1) {
+ pr_info("ERDT: Failed to resolve L3 cache ID for RMDD domain %d\n",
+ rmdd->domain_id);
+
+ goto cleanup;
+ }
+
+ /* Require all RMDDs to support same set of sub-tables */
+ if (!valid_subtbl_mask) {
+ valid_subtbl_mask = subtbl_mask;
+ } else if (subtbl_mask != valid_subtbl_mask) {
+ pr_info(FW_BUG "Mismatch domain\n");
+ goto cleanup;
+ }
+
+ if (xa_insert(&erdt_domain_xa, l3_cache_id, domain_info, GFP_KERNEL)) {
+ pr_info("ERDT: Failed to store domain info for RMDD domain %d\n",
+ rmdd->domain_id);
+ goto cleanup;
+ }
+
+ return true;
+
+cleanup:
+ cleanup_one_domain(domain_info);
+ return false;
+}
+
+static void erdt_cleanup(void)
+{
+ struct erdt_domain_info *d;
+ unsigned long index;
+
+ xa_for_each(&erdt_domain_xa, index, d)
+ cleanup_one_domain(d);
+ xa_destroy(&erdt_domain_xa);
+}
+
+/*
+ * enumerate_erdt_table - Store pointer to ERDT and begin domain parsing
+ */
+static __init int enumerate_erdt_table(struct acpi_table_header *table_hdr)
+{
+ struct acpi_table_erdt *erdt = (struct acpi_table_erdt *)table_hdr;
+ struct acpi_subtbl_hdr_16 *subtbl;
+ void *table_end;
+
+ if (erdt->header.revision != ERDT_VALID_VERSION) {
+ pr_info("Unknown ERDT table revision %d\n", erdt->header.revision);
+ return -EINVAL;
+ }
+
+ if (erdt->header.length < sizeof(*erdt)) {
+ pr_info(FW_BUG "ERDT: Invalid table length %u\n", erdt->header.length);
+ return -EINVAL;
+ }
+
+ subtbl = (void *)erdt + sizeof(struct acpi_table_erdt);
+ table_end = (void *)erdt + erdt->header.length;
+
+ while ((void *)subtbl + sizeof(*subtbl) <= table_end) {
+ if (subtbl->length < sizeof(*subtbl) ||
+ (void *)subtbl + subtbl->length > table_end) {
+ pr_info("ERDT: Invalid subtable length\n");
+ goto cleanup;
+ }
+
+ if (subtbl->type == ACPI_ERDT_TYPE_RMDD)
+ if (!parse_rmdd_entry(subtbl))
+ goto cleanup;
+
+ subtbl = (void *)subtbl + subtbl->length;
+ }
+
+ erdt_available = true;
+
+ return 0;
+
+cleanup:
+ erdt_cleanup();
+ return -EINVAL;
+}
+
+/*
+ * erdt_init - ACPI ERDT table initialization entry point
+ */
+int __init erdt_init(void)
+{
+ return acpi_table_parse(ACPI_SIG_ERDT, enumerate_erdt_table);
+}
+
+void erdt_exit(void)
+{
+ erdt_cleanup();
+}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3cfa0c10e92..9c59bd5e028e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -253,4 +253,7 @@ static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resour
static inline bool intel_handle_aet_option(bool force_off, char *tok) { return false; }
#endif
+int erdt_init(void);
+void erdt_exit(void);
+
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
2026-06-04 16:08 ` [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
@ 2026-06-04 16:11 ` Chen Yu
2026-06-04 16:57 ` Thomas Gleixner
2026-06-04 16:11 ` [PATCH v2 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
` (3 subsequent siblings)
5 siblings, 1 reply; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:11 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
The CMRC (Cache Monitoring Registers for CPU Agents Description)
sub-table of ERDT describes the MMIO registers used to read
cache monitoring counters (e.g. LLC occupancy) for an RMD.
Parse each CMRC sub-table, ioremap its register window, and save
a copy of the CMRC sub-table in the corresponding ERDT domain entry
so that later monitoring code can read the counters via MMIO.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
arch/x86/kernel/cpu/resctrl/erdt.c | 40 ++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
index 3f309a4b15c8..d1eca4594f09 100644
--- a/arch/x86/kernel/cpu/resctrl/erdt.c
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -38,6 +38,7 @@ static bool erdt_available;
static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
#define ERDT_VALID_VERSION 1
+#define CMRC_VALID_INDEX_FUNC_VERSION 1
static u32 valid_subtbl_mask;
@@ -159,6 +160,7 @@ static void erdt_iounmap_domain(struct erdt_domain_info *domain)
static void cleanup_one_domain(struct erdt_domain_info *d)
{
erdt_iounmap_domain(d);
+ kfree(d->cmrc);
kfree(d);
}
@@ -171,6 +173,40 @@ static __init bool cacd_init(struct erdt_domain_info *d,
return *l3_cache_id != -1;
}
+static __init bool cmrc_init(struct erdt_domain_info *d, struct acpi_subtbl_hdr_16 *subtbl)
+{
+ struct acpi_erdt_cmrc *cmrc = (struct acpi_erdt_cmrc *)subtbl;
+
+ if (subtbl->length < sizeof(*cmrc)) {
+ pr_warn(FW_BUG "Truncated CMRC subtable\n");
+ return false;
+ }
+
+ if (cmrc->index_fn != CMRC_VALID_INDEX_FUNC_VERSION) {
+ pr_info("Unknown CMRC index function %d\n", cmrc->index_fn);
+ return false;
+ }
+
+ if (!cmrc->clump_size) {
+ pr_warn(FW_BUG "CMRC clump_size is zero\n");
+ return false;
+ }
+
+ d->base[ERDT_MMIO_CMRC_BASE] = erdt_ioremap_checked(cmrc->cmt_reg_base,
+ cmrc->cmt_reg_size, "CMRC base");
+ if (!d->base[ERDT_MMIO_CMRC_BASE])
+ return false;
+
+ d->cmrc = kmemdup(cmrc, subtbl->length, GFP_KERNEL);
+ if (!d->cmrc) {
+ iounmap(d->base[ERDT_MMIO_CMRC_BASE]);
+ d->base[ERDT_MMIO_CMRC_BASE] = NULL;
+ return false;
+ }
+
+ return true;
+}
+
static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
{
struct acpi_erdt_rmdd *rmdd;
@@ -219,6 +255,10 @@ static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
if (cacd_init(domain_info, subtbl, &l3_cache_id))
subtbl_mask |= BIT(ACPI_ERDT_TYPE_CACD);
break;
+ case ACPI_ERDT_TYPE_CMRC:
+ if (cmrc_init(domain_info, subtbl))
+ subtbl_mask |= BIT(ACPI_ERDT_TYPE_CMRC);
+ break;
default:
break;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/6] x86/resctrl: Rename prev_msr to prev_mon_val
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
2026-06-04 16:08 ` [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
2026-06-04 16:11 ` [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
@ 2026-06-04 16:11 ` Chen Yu
2026-06-04 16:11 ` [PATCH v2 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:11 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
Rename the prev_msr field in struct arch_mbm_state to prev_mon_val.
With ERDT, the previous monitor value may come from an MMIO
register rather than from an MSR, so the "msr" suffix is no longer
accurate. The new name describes the field by its meaning (the
previous monitor value) instead of by the access method.
This is preparation for ERDT support, which reads monitoring
counters via MMIO.
No functional change.
Co-developed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
arch/x86/kernel/cpu/resctrl/internal.h | 8 +++----
arch/x86/kernel/cpu/resctrl/monitor.c | 30 +++++++++++++-------------
2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 9c59bd5e028e..97065dc6e14f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -31,13 +31,13 @@
/**
* struct arch_mbm_state - values used to compute resctrl_arch_rmid_read()s
* return value.
- * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
- * @prev_msr: Value of IA32_QM_CTR last time it was read for the RMID used to
- * find this struct.
+ * @chunks: Total data moved (multiply by rdt_group.mon_scale to get bytes)
+ * @prev_mon_val: Previous monitor counter value for the RMID used to
+ * find this struct.
*/
struct arch_mbm_state {
u64 chunks;
- u64 prev_msr;
+ u64 prev_mon_val;
};
/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature. */
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9bd87bae4983..991f0a796551 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -186,7 +186,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d
prmid = logical_rmid_to_physical_rmid(cpu, rmid);
/* Record any initial, non-zero count value. */
- __rmid_read_phys(prmid, eventid, &am->prev_msr);
+ __rmid_read_phys(prmid, eventid, &am->prev_mon_val);
}
}
@@ -209,16 +209,16 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domai
}
}
-static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
+static u64 mbm_overflow_count(u64 prev_val, u64 cur_val, unsigned int width)
{
u64 shift = 64 - width, chunks;
- chunks = (cur_msr << shift) - (prev_msr << shift);
+ chunks = (cur_val << shift) - (prev_val << shift);
return chunks >> shift;
}
static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
- u32 rmid, enum resctrl_event_id eventid, u64 msr_val)
+ u32 rmid, enum resctrl_event_id eventid, u64 mon_val)
{
struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -227,12 +227,12 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
am = get_arch_mbm_state(hw_dom, rmid, eventid);
if (am) {
- am->chunks += mbm_overflow_count(am->prev_msr, msr_val,
+ am->chunks += mbm_overflow_count(am->prev_mon_val, mon_val,
hw_res->mbm_width);
chunks = get_corrected_mbm_count(rmid, am->chunks);
- am->prev_msr = msr_val;
+ am->prev_mon_val = mon_val;
} else {
- chunks = msr_val;
+ chunks = mon_val;
}
return chunks * hw_res->mon_scale;
@@ -245,7 +245,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
struct rdt_hw_l3_mon_domain *hw_dom;
struct rdt_l3_mon_domain *d;
struct arch_mbm_state *am;
- u64 msr_val;
+ u64 mon_val;
u32 prmid;
int cpu;
int ret;
@@ -262,14 +262,14 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
hw_dom = resctrl_to_arch_mon_dom(d);
cpu = cpumask_any(&hdr->cpu_mask);
prmid = logical_rmid_to_physical_rmid(cpu, rmid);
- ret = __rmid_read_phys(prmid, eventid, &msr_val);
+ ret = __rmid_read_phys(prmid, eventid, &mon_val);
if (!ret) {
- *val = get_corrected_val(r, d, rmid, eventid, msr_val);
+ *val = get_corrected_val(r, d, rmid, eventid, mon_val);
} else if (ret == -EINVAL) {
am = get_arch_mbm_state(hw_dom, rmid, eventid);
if (am)
- am->prev_msr = 0;
+ am->prev_mon_val = 0;
}
return ret;
@@ -324,7 +324,7 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_l3_mon_domain *d
memset(am, 0, sizeof(*am));
/* Record any initial, non-zero count value. */
- __cntr_id_read(cntr_id, &am->prev_msr);
+ __cntr_id_read(cntr_id, &am->prev_mon_val);
}
}
@@ -332,14 +332,14 @@ int resctrl_arch_cntr_read(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
u32 unused, u32 rmid, int cntr_id,
enum resctrl_event_id eventid, u64 *val)
{
- u64 msr_val;
+ u64 mon_val;
int ret;
- ret = __cntr_id_read(cntr_id, &msr_val);
+ ret = __cntr_id_read(cntr_id, &mon_val);
if (ret)
return ret;
- *val = get_corrected_val(r, d, rmid, eventid, msr_val);
+ *val = get_corrected_val(r, d, rmid, eventid, mon_val);
return 0;
}
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 4/6] x86/resctrl: Refactor the monitor read function
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
` (2 preceding siblings ...)
2026-06-04 16:11 ` [PATCH v2 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
@ 2026-06-04 16:11 ` Chen Yu
2026-06-04 16:11 ` [PATCH v2 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
2026-06-04 16:11 ` [PATCH v2 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
5 siblings, 0 replies; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:11 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
Split the monitor read helper into an L3 read path and an AET
(Intel Application Energy Telemetry) read path. This makes the
two distinct monitoring sources easier to extend independently
and prepares the L3 path for ERDT-based MMIO reads added in a
later patch.
No functional change.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 991f0a796551..1e81b3c33843 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -238,9 +238,9 @@ static u64 get_corrected_val(struct rdt_resource *r, struct rdt_l3_mon_domain *d
return chunks * hw_res->mon_scale;
}
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
- u32 unused, u32 rmid, enum resctrl_event_id eventid,
- void *arch_priv, u64 *val, void *ignored)
+static int arch_l3_read_event(struct rdt_domain_hdr *hdr, u32 rmid,
+ enum resctrl_event_id eventid, u64 *val,
+ struct rdt_resource *r)
{
struct rdt_hw_l3_mon_domain *hw_dom;
struct rdt_l3_mon_domain *d;
@@ -250,11 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
int cpu;
int ret;
- resctrl_arch_rmid_read_context_check();
-
- if (r->rid == RDT_RESOURCE_PERF_PKG)
- return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
-
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
return -EINVAL;
@@ -275,6 +270,22 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
return ret;
}
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+ u32 unused, u32 rmid, enum resctrl_event_id eventid,
+ void *arch_priv, u64 *val, void *ignored)
+{
+ resctrl_arch_rmid_read_context_check();
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L3:
+ return arch_l3_read_event(hdr, rmid, eventid, val, r);
+ case RDT_RESOURCE_PERF_PKG:
+ return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
+ default:
+ return -EINVAL;
+ }
+}
+
static int __cntr_id_read(u32 cntr_id, u64 *val)
{
u64 msr_val;
--
2.25.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
` (3 preceding siblings ...)
2026-06-04 16:11 ` [PATCH v2 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
@ 2026-06-04 16:11 ` Chen Yu
2026-06-04 16:11 ` [PATCH v2 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu
5 siblings, 0 replies; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:11 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
From: Tony Luck <tony.luck@intel.com>
__l3_mon_event_count() and __l3_mon_event_count_sum() call
smp_processor_id() to obtain the current CPU. However, some
monitor events can be read from any CPU in task context via
mon_event_count(); in that case the calling context is
preemptible and smp_processor_id() triggers a debug warning.
Fix this by skipping the current-CPU lookup when the event's
any_cpu flag is set, since such events do not need to run on a
specific CPU.
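The ordering of the checks can be sketched in user space as follows.
All names here (`fake_event`, `fake_read`, `cpu_ok_for_read()`,
`cpu_three()`) are hypothetical simplifications, and the current-CPU
lookup is passed in as a callback purely so the sketch can show that it
is never invoked for any_cpu events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the resctrl structures. */
struct fake_event {
	bool any_cpu;		/* event may be read from any CPU */
};

struct fake_read {
	const struct fake_event *evt;
	uint64_t domain_cpus;	/* bit n set: CPU n belongs to the domain */
};

static int lookups;	/* counts how often the CPU lookup ran */

/* Demo callback standing in for the current-CPU lookup. */
static int cpu_three(void)
{
	lookups++;
	return 3;
}

/* Decide whether the read may proceed on the calling CPU. */
static bool cpu_ok_for_read(const struct fake_read *rr,
			    int (*current_cpu)(void))
{
	assert(rr && rr->evt && current_cpu);
	if (rr->evt->any_cpu)
		return true;	/* skip the current-CPU lookup entirely */
	return (rr->domain_cpus >> current_cpu()) & 1;
}
```

Testing the flag first is what matters: the lookup that is unsafe in
preemptible context is only reached for events that are already pinned
to a CPU in the domain.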
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
fs/resctrl/monitor.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..371ccae04892 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -417,9 +417,33 @@ static void mbm_cntr_free(struct rdt_l3_mon_domain *d, int cntr_id)
memset(&d->cntr_cfg[cntr_id], 0, sizeof(*d->cntr_cfg));
}
+/*
+ * Called from preemptible context via a direct call of mon_event_count() for
+ * events that can be read on any CPU.
+ * Called from preemptible but non-migratable process context (mon_event_count()
+ * via smp_call_on_cpu()) OR non-preemptible context (mon_event_count() via
+ * smp_call_function_any()) for events that need to be read on a specific CPU.
+ */
+static bool cpu_on_correct_domain(struct rmid_read *rr)
+{
+ int cpu;
+
+ /* Any CPU is OK for this event */
+ if (rr->evt->any_cpu)
+ return true;
+
+ cpu = smp_processor_id();
+
+ /* Single domain. Must be on a CPU in that domain. */
+ if (rr->hdr)
+ return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
+
+ /* Summing domains that share a cache, must be on a CPU for that cache. */
+ return cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map);
+}
+
static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
- int cpu = smp_processor_id();
u32 closid = rdtgrp->closid;
u32 rmid = rdtgrp->mon.rmid;
struct rdt_l3_mon_domain *d;
@@ -452,9 +476,6 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
return 0;
}
- /* Reading a single domain, must be on a CPU in that domain. */
- if (!cpumask_test_cpu(cpu, &d->hdr.cpu_mask))
- return -EINVAL;
if (rr->is_mbm_cntr)
rr->err = resctrl_arch_cntr_read(rr->r, d, closid, rmid, cntr_id,
rr->evt->evtid, &tval);
@@ -472,7 +493,6 @@ static int __l3_mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
- int cpu = smp_processor_id();
u32 closid = rdtgrp->closid;
u32 rmid = rdtgrp->mon.rmid;
struct rdt_l3_mon_domain *d;
@@ -490,10 +510,6 @@ static int __l3_mon_event_count_sum(struct rdtgroup *rdtgrp, struct rmid_read *r
return -EINVAL;
}
- /* Summing domains that share a cache, must be on a CPU for that cache. */
- if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
- return -EINVAL;
-
/*
* Legacy files must report the sum of an event across all
* domains that share the same L3 cache instance.
@@ -524,7 +540,9 @@ static int __mon_event_count(struct rdtgroup *rdtgrp, struct rmid_read *rr)
{
switch (rr->r->rid) {
case RDT_RESOURCE_L3:
- WARN_ON_ONCE(rr->evt->any_cpu);
+ if (!cpu_on_correct_domain(rr))
+ return -EINVAL;
+
if (rr->hdr)
return __l3_mon_event_count(rdtgrp, rr);
else
--
2.25.1
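
The decision made by cpu_on_correct_domain() in the patch above can be modelled in userspace. This is a minimal sketch, not the kernel code: struct read_req, mask_test_cpu(), cpu_on_correct_domain_model() and fake_current_cpu() are hypothetical names, a CPU mask is reduced to a 64-bit bitmap, and the current-CPU lookup is passed in as a callback so that it is visible whether the lookup happened at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the fields the check depends on. */
struct read_req {
	bool any_cpu;			/* event can be read from any CPU */
	const uint64_t *domain_mask;	/* non-NULL: reading a single domain */
	uint64_t cache_mask;		/* CPUs sharing the cache when summing domains */
};

/* Stand-in for smp_processor_id(); counts how often it is consulted. */
static unsigned int lookups;
static unsigned int fake_cpu = 2;

static unsigned int fake_current_cpu(void)
{
	lookups++;
	return fake_cpu;
}

static bool mask_test_cpu(unsigned int cpu, uint64_t mask)
{
	return cpu < 64 && (mask & (1ULL << cpu));
}

static bool cpu_on_correct_domain_model(const struct read_req *rr,
					unsigned int (*current_cpu)(void))
{
	unsigned int cpu;

	/* Any CPU is fine: return before the current CPU is ever looked up. */
	if (rr->any_cpu)
		return true;

	cpu = current_cpu();

	/* Single domain: the caller must run on a CPU in that domain. */
	if (rr->domain_mask)
		return mask_test_cpu(cpu, *rr->domain_mask);

	/* Summing domains: the caller must run on a CPU sharing the cache. */
	return mask_test_cpu(cpu, rr->cache_mask);
}
```

The point of the ordering is that the any_cpu test comes first, so the lookup that is only safe in a non-migratable context is never reached for events that do not care where they run.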
* [PATCH v2 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
` (4 preceding siblings ...)
2026-06-04 16:11 ` [PATCH v2 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
@ 2026-06-04 16:11 ` Chen Yu
5 siblings, 0 replies; 11+ messages in thread
From: Chen Yu @ 2026-06-04 16:11 UTC (permalink / raw)
To: tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, tglx, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
The CMRC (Cache Monitoring Registers for CPU Agents Description)
ACPI sub-table provides the MMIO address used to read the LLC
occupancy counter for each RMID. When ERDT is enabled on the
platform, use this MMIO interface instead of the legacy MSR read
to obtain the L3 occupancy value.
Introduce erdt_mon_read(), a helper that retrieves monitoring
data for a given RMID and event ID from an ERDT domain. Initial
support is added for the L3 occupancy monitoring event
(QOS_L3_OCCUP_EVENT_ID).
If the platform supports ERDT, CMRC-based MMIO access is used by
default. If ERDT is unavailable, the implementation falls back to
MSR-based operations.
Suggested-by: Tony Luck <tony.luck@intel.com>
Co-developed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
arch/x86/include/asm/resctrl.h | 2 +
arch/x86/kernel/cpu/resctrl/core.c | 2 +-
arch/x86/kernel/cpu/resctrl/erdt.c | 90 +++++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 7 +++
4 files changed, 100 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 97c2f6bc7a5f..9b3b03279dd8 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -41,6 +41,8 @@ struct resctrl_pqr_state {
};
bool erdt_enabled(void);
+struct rdt_domain_hdr;
+int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid, u64 *val);
DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 88966aa5e050..2dda8011c2ce 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -965,7 +965,7 @@ static __init bool get_rdt_mon_resources(void)
bool ret = false;
if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
- resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, erdt_enabled(), 0, NULL);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
index d1eca4594f09..905d7fb53bd2 100644
--- a/arch/x86/kernel/cpu/resctrl/erdt.c
+++ b/arch/x86/kernel/cpu/resctrl/erdt.c
@@ -39,6 +39,7 @@ static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
#define ERDT_VALID_VERSION 1
#define CMRC_VALID_INDEX_FUNC_VERSION 1
+#define UNAVAILABLE_COUNTER BIT_ULL(63)
static u32 valid_subtbl_mask;
@@ -65,6 +66,95 @@ static __init int lookup_logical_cpu_by_x2apicid(u32 x2apicid)
return -1;
}
+static void __iomem *cmrc_index_function_1(struct erdt_domain_info *d,
+ struct acpi_erdt_cmrc *cmrc, int rmid)
+{
+ u16 clump_size, stride_size;
+ void __iomem *vaddr;
+
+ clump_size = cmrc->clump_size;
+ stride_size = cmrc->clump_stride;
+
+ /*
+ * MMIO_ADDRESS_for_RMID# = CMRC Base +
+ * (RMID / ClumpSize) * Stride +
+ * (RMID % ClumpSize) * 8
+ */
+ vaddr = d->base[ERDT_MMIO_CMRC_BASE] +
+ (rmid / clump_size) * stride_size +
+ (rmid % clump_size) * 8;
+
+ return vaddr;
+}
+
+/*
+ * erdt_read_l3_occupancy - Read L3 occupancy count for a given RMID
+ * @d: Pointer to the ERDT domain info
+ * @rmid: Resource Monitoring ID to read occupancy for
+ *
+ * Calculates the MMIO address using clump and stride information
+ * from the CMRC ACPI structure and reads the L3 cache occupancy
+ * count for the given RMID. The raw value is scaled using the
+ * up_scale factor provided by firmware.
+ *
+ * Return: 0 for success, error code for other cases.
+ */
+static int erdt_read_l3_occupancy(struct erdt_domain_info *d, int rmid,
+ u64 *val)
+{
+ struct acpi_erdt_cmrc *cmrc;
+ void __iomem *vaddr;
+ u64 l3_cmt_count;
+ u32 offset;
+
+ cmrc = d->cmrc;
+ if (!cmrc)
+ return -EIO;
+
+ offset = (rmid / cmrc->clump_size) * cmrc->clump_stride +
+ (rmid % cmrc->clump_size) * 8;
+ if (offset + sizeof(u64) > (u32)cmrc->cmt_reg_size << 12)
+ return -EINVAL;
+
+ vaddr = cmrc_index_function_1(d, cmrc, rmid);
+
+ l3_cmt_count = readq(vaddr);
+ if (l3_cmt_count & UNAVAILABLE_COUNTER)
+ return -EINVAL;
+
+ *val = l3_cmt_count * cmrc->up_scale;
+
+ return 0;
+}
+
+/*
+ * erdt_mon_read - Read monitoring data for a given domain and RMID
+ * @hdr: Domain header
+ * @ev_id: Monitoring event ID (e.g. QOS_L3_OCCUP_EVENT_ID)
+ * @rmid: Resource Monitoring ID for which to read the data
+ * @val: Store the read data
+ *
+ * Looks up the domain by domid and dispatches the read request
+ * to the appropriate helper based on the event type.
+ * Currently supports only L3 occupancy monitoring.
+ *
+ * Return: 0 on success, error code otherwise.
+ */
+int erdt_mon_read(struct rdt_domain_hdr *hdr, int ev_id, int rmid,
+ u64 *val)
+{
+ struct erdt_domain_info *d;
+
+ d = xa_load(&erdt_domain_xa, hdr->id);
+ if (!d)
+ return -EIO;
+
+ if (ev_id == QOS_L3_OCCUP_EVENT_ID)
+ return erdt_read_l3_occupancy(d, rmid, val);
+
+ return -EINVAL;
+}
+
/*
* get_l3_cache_id_from_cacd - Resolve L3 cache ID from CACD subtable
* @cacd: Pointer to the ACPI ERDT CACD structure
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 1e81b3c33843..12b4014e47f3 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -278,6 +278,13 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
switch (r->rid) {
case RDT_RESOURCE_L3:
+ /*
+ * No SNC for mmio based L3 occupancy, so there is no need
+ * to convert logical RMID to a physical RMID via
+ * logical_rmid_to_physical_rmid().
+ */
+ if (erdt_enabled() && eventid == QOS_L3_OCCUP_EVENT_ID)
+ return erdt_mon_read(hdr, eventid, rmid, val);
return arch_l3_read_event(hdr, rmid, eventid, val, r);
case RDT_RESOURCE_PERF_PKG:
return intel_aet_read_event(hdr->id, rmid, arch_priv, val);
--
2.25.1
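
For reference, the address computation and counter handling described in this patch can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct cmrc_layout and both helper names are hypothetical stand-ins, not the kernel's struct acpi_erdt_cmrc or the functions in erdt.c. The guard against a zero clump size and the 64-bit arithmetic are additions of the sketch, not statements about the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CMRC fields the read path uses. */
struct cmrc_layout {
	uint16_t clump_size;	/* RMIDs per clump */
	uint16_t clump_stride;	/* bytes between the starts of two clumps */
	uint32_t reg_pages;	/* register window size in 4 KiB pages */
	uint64_t up_scale;	/* multiplier applied to the raw count */
};

/* Bit 63 set in the raw counter means no data is available. */
#define COUNTER_UNAVAILABLE	(1ULL << 63)

/*
 * offset = (RMID / ClumpSize) * Stride + (RMID % ClumpSize) * 8
 *
 * Returns false when the 8-byte counter would not fit inside the
 * register window, or when clump_size is zero.
 */
static bool cmrc_rmid_offset(const struct cmrc_layout *c, uint32_t rmid,
			     uint64_t *offset)
{
	uint64_t off, window;

	if (!c->clump_size)
		return false;

	off = (uint64_t)(rmid / c->clump_size) * c->clump_stride +
	      (uint64_t)(rmid % c->clump_size) * 8;
	window = (uint64_t)c->reg_pages << 12;

	if (off + sizeof(uint64_t) > window)
		return false;

	*offset = off;
	return true;
}

/* Reject unavailable counters, otherwise scale the raw count. */
static bool cmrc_scale_count(const struct cmrc_layout *c, uint64_t raw,
			     uint64_t *val)
{
	if (raw & COUNTER_UNAVAILABLE)
		return false;

	*val = raw * c->up_scale;
	return true;
}
```

With clump_size = 4, clump_stride = 4096 and a two-page window, RMID 5 maps to offset 4104 (one stride plus one 8-byte slot), while RMID 8 would start at offset 8192 and is rejected as outside the window.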
* Re: [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
2026-06-04 16:08 ` [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
@ 2026-06-04 16:56 ` Thomas Gleixner
2026-06-05 11:29 ` Chen, Yu C
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2026-06-04 16:56 UTC (permalink / raw)
To: Chen Yu, tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
On Fri, Jun 05 2026 at 00:08, Chen Yu wrote:
> @@ -1130,20 +1131,24 @@ static int __init resctrl_arch_late_init(void)
>
> check_quirks();
>
> - if (!get_rdt_resources())
> - return -ENODEV;
> + if (!get_rdt_resources()) {
> + ret = -ENODEV;
> + goto out;
You can spare all that goto mess by renaming this function to
__resctrl_arch_late_init() and have a new
static int __init resctrl_arch_late_init(void)
{
        int ret = __resctrl_arch_late_init();

        if (ret)
                erdt_exit();
        return ret;
}
No?
> +struct erdt_domain_info {
> + struct acpi_erdt_cmrc *cmrc;
> + /* MMIO address */
> + void __iomem *base[ERDT_MMIO_MAX];
> +};
https://docs.kernel.org/process/maintainer-tip.html#struct-declarations-and-initializers
and the rest of that document.
> +
> +/* true if ERDT table is present and valid */
> +static bool erdt_available;
> +
> +/* Global variable to hold ERDT ACPI table information for later processing */
> +static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
No tail comments please
> +#define ERDT_VALID_VERSION 1
> +
> +static u32 valid_subtbl_mask;
> +
> +/*
> + * erdt_enabled - Check if the ERDT table is present and enabled
> + */
> +bool erdt_enabled(void)
> +{
> + return erdt_available;
This naming convention is confusing at best. First you declare
> +/* true if ERDT table is present and valid */
> +static bool erdt_available;
Which says nothing about enabled and then you claim that reading the
available variable tells you whether it is enabled. Can you please make
your mind up and make this consistent and comprehensible?
> +/*
> + * lookup_logical_cpu_by_x2apicid - Map x2APIC ID to logical CPU number
> + */
> +static __init int lookup_logical_cpu_by_x2apicid(u32 x2apicid)
> +{
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + if (cpu_physical_id(cpu) == x2apicid)
> + return cpu;
> + }
> +
> + return -1;
No. The topology code has topo_lookup_cpuid(). Please expose that
instead of hacking up your own version of it.
> +/*
Not a valid kernel doc comment. Those start with /**
> +
> +static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size,
> + const char *desc)
> +{
> + void __iomem *addr = ioremap(base, size << 12);
> +
> + if (!addr)
> + pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",
> + desc, (unsigned long long)base, size);
Lacks brackets. See bracket rules.
> + return addr;
> +}
> +
> +static __init bool cacd_init(struct erdt_domain_info *d,
> + struct acpi_subtbl_hdr_16 *subtbl,
> + int *l3_cache_id)
You have 100 characters. Please use them.
> +{
> + *l3_cache_id = get_l3_cache_id_from_cacd((struct acpi_erdt_cacd *)subtbl);
> +
> + return *l3_cache_id != -1;
> +}
> +
> +static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
> +{
> + struct acpi_erdt_rmdd *rmdd;
> + struct erdt_domain_info *domain_info;
> + struct acpi_subtbl_hdr_16 *subtbl;
> + int l3_cache_id = -1;
> + u32 subtbl_mask = 0;
> + void *rmdd_end;
See variable declaration doc chapter
> +
> + if (rmdd_hdr->length < sizeof(*rmdd)) {
> + pr_info(FW_BUG "Invalid RMDD length %u\n", rmdd_hdr->length);
> + return false;
> + }
> +
> + rmdd = (struct acpi_erdt_rmdd *)rmdd_hdr;
> +
> + /* Quietly ignore non-CPU-based L3 domains */
> + if (!(rmdd->flags & 0x1))
0x1 is really a useful and intuitive constant. Use a proper define for it.
> + return true;
> +
> + domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
> + if (!domain_info)
> + return false;
> +
> + domain_info->base[ERDT_MMIO_RMDD_CREG] = erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size,
> + "RMDD ctrl base");
> + if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
> + goto cleanup;
> +
> + rmdd_end = (void *)rmdd + rmdd->header.length;
> +
> + /* Iterate through all sub-structures inside this RMDD block */
> + for (subtbl = (void *)rmdd + sizeof(*rmdd);
> + (void *)subtbl + sizeof(*subtbl) <= rmdd_end;
> + subtbl = (void *)subtbl + subtbl->length) {
> + if (subtbl->length < sizeof(*subtbl) ||
> + (void *)subtbl + subtbl->length > rmdd_end) {
This unreadable type cast orgy makes my eyes bleed.
for (subtbl = rmdd_subtbl(rmdd); subtbl_valid(rmdd, subtbl); subtbl = next_subtbl(subtbl))
or something comprehensible like that.
Also this gem is made up nonsense:
> + if (subtbl->length < sizeof(*subtbl) ||
> + (void *)subtbl + subtbl->length > rmdd_end) {
Because of the loop condition above it is equivalent to:
        if (subtbl->length != sizeof(*subtbl))
                fail();
But that's not convoluted enough it seems.
Thanks,
tglx
* Re: [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table
2026-06-04 16:11 ` [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
@ 2026-06-04 16:57 ` Thomas Gleixner
2026-06-05 12:14 ` Chen, Yu C
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2026-06-04 16:57 UTC (permalink / raw)
To: Chen Yu, tony.luck, reinette.chatre
Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy
On Fri, Jun 05 2026 at 00:11, Chen Yu wrote:
> The CMRC (Cache Monitoring Registers for CPU Agents Description)
> sub-table of ERDT describes the MMIO registers used to read
> cache monitoring counters (e.g. LLC occupancy) for an RMD.
>
> Parse each CMRC sub-table, ioremap its register window, and save
> the CMRC pointer in the corresponding ERDT domain entry so that
> later monitoring code can read the counters via MMIO.
>
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/erdt.c | 40 ++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
> index 3f309a4b15c8..d1eca4594f09 100644
> --- a/arch/x86/kernel/cpu/resctrl/erdt.c
> +++ b/arch/x86/kernel/cpu/resctrl/erdt.c
> @@ -38,6 +38,7 @@ static bool erdt_available;
> static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
>
> #define ERDT_VALID_VERSION 1
> +#define CMRC_VALID_INDEX_FUNC_VERSION 1
Please make this tabular so it's easy to parse
#define ERDT_VALID_VERSION               1
#define CMRC_VALID_INDEX_FUNC_VERSION    1
* Re: [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID
2026-06-04 16:56 ` Thomas Gleixner
@ 2026-06-05 11:29 ` Chen, Yu C
0 siblings, 0 replies; 11+ messages in thread
From: Chen, Yu C @ 2026-06-05 11:29 UTC (permalink / raw)
To: Thomas Gleixner
Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy, tony.luck,
reinette.chatre
Hi Thomas,
Thanks for the thorough review.
On 6/5/2026 12:56 AM, Thomas Gleixner wrote:
> On Fri, Jun 05 2026 at 00:08, Chen Yu wrote:
>> @@ -1130,20 +1131,24 @@ static int __init resctrl_arch_late_init(void)
>>
>> check_quirks();
>>
>> - if (!get_rdt_resources())
>> - return -ENODEV;
>> + if (!get_rdt_resources()) {
>> + ret = -ENODEV;
>> + goto out;
>
> You can spare all that goto mess by renaming this function to
> __resctrl_arch_late_init() and have a new
>
> static int __init resctrl_arch_late_init(void)
> {
>         int ret = __resctrl_arch_late_init();
>
>         if (ret)
>                 erdt_exit();
>         return ret;
> }
>
> No?
>
OK, renamed the original to __resctrl_arch_late_init() and added
the wrapper.
>> +struct erdt_domain_info {
>> + struct acpi_erdt_cmrc *cmrc;
>> + /* MMIO address */
>> + void __iomem *base[ERDT_MMIO_MAX];
>> +};
>
> https://docs.kernel.org/process/maintainer-tip.html#struct-declarations-and-initializers
>
> and the rest of that document.
>
OK, reformatted to use tabular member alignment.
>> +
>> +/* true if ERDT table is present and valid */
>> +static bool erdt_available;
>> +
>> +/* Global variable to hold ERDT ACPI table information for later processing */
>> +static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
>
> No tail comments please
>
OK.
>> +#define ERDT_VALID_VERSION 1
>> +
>> +static u32 valid_subtbl_mask;
>> +
>> +/*
>> + * erdt_enabled - Check if the ERDT table is present and enabled
>> + */
>> +bool erdt_enabled(void)
>> +{
>> + return erdt_available;
>
> This naming convention is confusing at best. First you declare
>
>> +/* true if ERDT table is present and valid */
>> +static bool erdt_available;
>
> Which says nothing about enabled and then you claim that reading the
> available variable tells you whether it is enabled. Can you please make
> your mind up and make this consistent and comprehensible?
>
OK, renamed the static variable to 'erdt_enabled_flag' so it
matches the accessor erdt_enabled().
>> +/*
>> + * lookup_logical_cpu_by_x2apicid - Map x2APIC ID to logical CPU number
>> + */
>> +static __init int lookup_logical_cpu_by_x2apicid(u32 x2apicid)
>> +{
>> + int cpu;
>> +
>> + for_each_possible_cpu(cpu) {
>> + if (cpu_physical_id(cpu) == x2apicid)
>> + return cpu;
>> + }
>> +
>> + return -1;
>
> No. The topology code has topo_lookup_cpuid(). Please expose that
> instead of hacking up your own version of it.
>
OK, dropped lookup_logical_cpu_by_x2apicid(), made
topo_lookup_cpuid() non-static, and used that helper instead.
>> +/*
>
> Not a valid kernel doc comment. Those start with /**
>
OK, fixed.
>> +
>> +static void __iomem *erdt_ioremap_checked(phys_addr_t base, u32 size,
>> + const char *desc)
>> +{
>> + void __iomem *addr = ioremap(base, size << 12);
>> +
>> + if (!addr)
>> + pr_err("ERDT: Failed to map %s at phys addr %#llx (size: %u pages)\n",
>> + desc, (unsigned long long)base, size);
>
> Lacks brackets. See bracket rules.
>
OK, added brackets since the pr_err() spans multiple lines.
>> + return addr;
>> +}
>> +
>
>> +static __init bool cacd_init(struct erdt_domain_info *d,
>> + struct acpi_subtbl_hdr_16 *subtbl,
>> + int *l3_cache_id)
>
> You have 100 characters. Please use them.
>
OK, function signatures that fit within 100 columns are now on a
single line (in this and subsequent patches).
>> +{
>> + *l3_cache_id = get_l3_cache_id_from_cacd((struct acpi_erdt_cacd *)subtbl);
>> +
>> + return *l3_cache_id != -1;
>> +}
>> +
>> +static __init bool parse_rmdd_entry(struct acpi_subtbl_hdr_16 *rmdd_hdr)
>> +{
>> + struct acpi_erdt_rmdd *rmdd;
>> + struct erdt_domain_info *domain_info;
>> + struct acpi_subtbl_hdr_16 *subtbl;
>> + int l3_cache_id = -1;
>> + u32 subtbl_mask = 0;
>> + void *rmdd_end;
>
> See variable declaration doc chapter
>
OK, reordered the declarations in reverse fir tree order.
>> +
>> + if (rmdd_hdr->length < sizeof(*rmdd)) {
>> + pr_info(FW_BUG "Invalid RMDD length %u\n", rmdd_hdr->length);
>> + return false;
>> + }
>> +
>> + rmdd = (struct acpi_erdt_rmdd *)rmdd_hdr;
>> +
>> + /* Quietly ignore non-CPU-based L3 domains */
>> + if (!(rmdd->flags & 0x1))
>
> 0x1 is really a useful and intuitive constant. Use a proper define for it.
>
OK, added a named constant and used it in the check.
>> + return true;
>> +
>> + domain_info = kzalloc(sizeof(*domain_info), GFP_KERNEL);
>> + if (!domain_info)
>> + return false;
>> +
>> + domain_info->base[ERDT_MMIO_RMDD_CREG] = erdt_ioremap_checked(rmdd->creg_base, rmdd->creg_size,
>> + "RMDD ctrl base");
>> + if (!domain_info->base[ERDT_MMIO_RMDD_CREG])
>> + goto cleanup;
>> +
>> + rmdd_end = (void *)rmdd + rmdd->header.length;
>> +
>> + /* Iterate through all sub-structures inside this RMDD block */
>> + for (subtbl = (void *)rmdd + sizeof(*rmdd);
>> + (void *)subtbl + sizeof(*subtbl) <= rmdd_end;
>> + subtbl = (void *)subtbl + subtbl->length) {
>> + if (subtbl->length < sizeof(*subtbl) ||
>> + (void *)subtbl + subtbl->length > rmdd_end) {
>
> This unreadable type cast orgy makes my eyes bleed.
>
> for (subtbl = rmdd_subtbl(rmdd); subtbl_valid(rmdd, subtbl); subtbl = next_subtbl(subtbl))
>
> or something comprehensible like that.
>
OK, will do.
> Also this gem is made up nonsense:
>
>> + if (subtbl->length < sizeof(*subtbl) ||
>> + (void *)subtbl + subtbl->length > rmdd_end) {
>
> Because of the loop condition above it is equivalent to:
>
>         if (subtbl->length != sizeof(*subtbl))
>                 fail();
>
sizeof(*subtbl) is the 8-byte header. CACD's length is variable
(8 + Enumeration-IDs array), so a valid CACD always has
length > sizeof(*subtbl). Checking subtbl->length != sizeof(*subtbl)
would therefore reject valid entries.
Moved the validation into subtbl_valid().
thanks,
Chenyu
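
The helper-based walk discussed above, with validation folded into the loop condition and variable-length entries still accepted, could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: struct subtbl_hdr is a simplified stand-in whose layout may differ from the real ACPI header, walk_subtbls() is a hypothetical name, and this subtbl_valid() takes byte offsets instead of the (rmdd, subtbl) pointers suggested in the review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a sub-table header: type plus total length. */
struct subtbl_hdr {
	uint16_t type;
	uint16_t length;	/* header plus payload, in bytes */
};

/*
 * An entry at byte offset @off is valid when its header fits in the
 * parent, its length covers at least the header, and the whole entry
 * ends inside the parent. The header is copied out so callers never
 * dereference a possibly unaligned pointer.
 */
static bool subtbl_valid(const uint8_t *buf, size_t len, size_t off,
			 struct subtbl_hdr *hdr)
{
	if (off > len || len - off < sizeof(*hdr))
		return false;

	memcpy(hdr, buf + off, sizeof(*hdr));

	return hdr->length >= sizeof(*hdr) && hdr->length <= len - off;
}

/*
 * Count the entries starting at @first_off. Returns true only when the
 * walk consumed the parent exactly, so a malformed entry is reported
 * instead of being silently treated as the end of the list.
 */
static bool walk_subtbls(const uint8_t *buf, size_t len, size_t first_off,
			 size_t *count)
{
	struct subtbl_hdr hdr;
	size_t off, n = 0;

	for (off = first_off; subtbl_valid(buf, len, off, &hdr); off += hdr.length)
		n++;

	*count = n;
	return off == len;
}
```

Because the check is `length >= sizeof(header)` rather than `length == sizeof(header)`, an entry that carries a payload after its header passes validation, which is the variable-length CACD case raised in the reply. Requiring the walk to end exactly at the parent's end is a design choice of this sketch, not something stated in the thread.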
* Re: [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table
2026-06-04 16:57 ` Thomas Gleixner
@ 2026-06-05 12:14 ` Chen, Yu C
0 siblings, 0 replies; 11+ messages in thread
From: Chen, Yu C @ 2026-06-05 12:14 UTC (permalink / raw)
To: Thomas Gleixner
Cc: x86, linux-kernel, bp, mingo, dave.hansen, hpa, dave.martin,
james.morse, fenghuay, babu.moger, anil.keshavamurthy,
reinette.chatre, tony.luck
On 6/5/2026 12:57 AM, Thomas Gleixner wrote:
> On Fri, Jun 05 2026 at 00:11, Chen Yu wrote:
>
>> The CMRC (Cache Monitoring Registers for CPU Agents Description)
>> sub-table of ERDT describes the MMIO registers used to read
>> cache monitoring counters (e.g. LLC occupancy) for an RMD.
>>
>> Parse each CMRC sub-table, ioremap its register window, and save
>> the CMRC pointer in the corresponding ERDT domain entry so that
>> later monitoring code can read the counters via MMIO.
>>
>> Suggested-by: Tony Luck <tony.luck@intel.com>
>> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>> ---
>> arch/x86/kernel/cpu/resctrl/erdt.c | 40 ++++++++++++++++++++++++++++++
>> 1 file changed, 40 insertions(+)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/erdt.c b/arch/x86/kernel/cpu/resctrl/erdt.c
>> index 3f309a4b15c8..d1eca4594f09 100644
>> --- a/arch/x86/kernel/cpu/resctrl/erdt.c
>> +++ b/arch/x86/kernel/cpu/resctrl/erdt.c
>> @@ -38,6 +38,7 @@ static bool erdt_available;
>> static DEFINE_XARRAY(erdt_domain_xa); /* Indexed by L3 cache ID */
>>
>> #define ERDT_VALID_VERSION 1
>> +#define CMRC_VALID_INDEX_FUNC_VERSION 1
>
> Please make this tabular so it's easy to parse
>
> #define ERDT_VALID_VERSION               1
> #define CMRC_VALID_INDEX_FUNC_VERSION    1
>
OK, fixed.
thanks,
Chenyu
Thread overview: 11+ messages
2026-06-04 16:07 [PATCH v2 0/6] Introduce MMIO-based CMT access for Enhanced RDT Chen Yu
2026-06-04 16:08 ` [PATCH v2 1/6] x86/resctrl: Parse ACPI ERDT table and map RMDD domains by L3 cache ID Chen Yu
2026-06-04 16:56 ` Thomas Gleixner
2026-06-05 11:29 ` Chen, Yu C
2026-06-04 16:11 ` [PATCH v2 2/6] x86/resctrl: Parse ACPI CMRC table Chen Yu
2026-06-04 16:57 ` Thomas Gleixner
2026-06-05 12:14 ` Chen, Yu C
2026-06-04 16:11 ` [PATCH v2 3/6] x86/resctrl: Rename prev_msr to prev_mon_val Chen Yu
2026-06-04 16:11 ` [PATCH v2 4/6] x86/resctrl: Refactor the monitor read function Chen Yu
2026-06-04 16:11 ` [PATCH v2 5/6] fs/resctrl: Do not invoke smp_processor_id() in preemptible context Chen Yu
2026-06-04 16:11 ` [PATCH v2 6/6] x86/resctrl: Add support for L3 occupancy monitoring via RMID MMIO read Chen Yu