From: Tony Luck <tony.luck@intel.com>
To: Fenghua Yu <fenghuay@nvidia.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
Drew Fustini <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org,
patches@lists.linux.dev, Tony Luck <tony.luck@intel.com>
Subject: [PATCH v12 24/31] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
Date: Mon, 13 Oct 2025 15:33:38 -0700 [thread overview]
Message-ID: <20251013223348.103390-25-tony.luck@intel.com> (raw)
In-Reply-To: <20251013223348.103390-1-tony.luck@intel.com>
There are now three meanings for "number of RMIDs":
1) The number for legacy features enumerated by CPUID leaf 0xF. This
is the maximum number of distinct values that can be loaded into
MSR_IA32_PQR_ASSOC. Note that systems with Sub-NUMA Cluster mode enabled
will force scaling down the CPUID enumerated value by the number of SNC
nodes per L3-cache.
2) The number of registers in MMIO space for each event. This
is enumerated in the XML files and is the value initialized into
event_group::num_rmids.
3) The number of "hardware counters" (this isn't a strictly accurate
description of how things work, but serves as a useful analogy that
does describe the limitations) feeding to those MMIO registers. This
is enumerated in telemetry_region::num_rmids returned from the call to
intel_pmt_get_regions_by_feature()
Event groups with insufficient "hardware counters" to track all RMIDs
are difficult for users to use, since the system may reassign "hardware
counters" at any time. This means that users cannot reliably collect
two consecutive event counts to compute the rate at which events are
occurring.
Introduce rdt_set_feature_disabled() to mark any under-resourced event groups
(those with telemetry_region::num_rmids < event_group::num_rmids for any of
the event group's telemetry regions) as unusable. Note that the rdt_options[]
structure must now be writable at run-time.
Limit an event group's number of possible monitor resource groups
to the lowest number of "hardware counters" if the user explicitly
requests to enable an under-resourced event group.
Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG
resource "num_rmids" value to the smallest of these values as this value
will be used later to compare against the number of RMIDs supported
by other resources to determine how many monitoring resource groups
are supported.
N.B. Change type of rdt_resource::num_rmid to u32 to match type of
event_group::num_rmids so that min(r->num_rmid, e->num_rmids) won't
complain about mixing signed and unsigned types. Print r->num_rmid as
unsigned value in rdt_num_rmids_show().
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 2 +-
arch/x86/kernel/cpu/resctrl/internal.h | 2 +
arch/x86/kernel/cpu/resctrl/core.c | 18 +++++++-
arch/x86/kernel/cpu/resctrl/intel_aet.c | 56 +++++++++++++++++++++++--
fs/resctrl/rdtgroup.c | 2 +-
5 files changed, 74 insertions(+), 6 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 111c8f1dc77e..c7b5e56d25bb 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -292,7 +292,7 @@ enum resctrl_schema_fmt {
* events of monitor groups created via mkdir.
*/
struct resctrl_mon {
- int num_rmid;
+ u32 num_rmid;
unsigned int mbm_cfg_mask;
int num_mbm_cntrs;
bool mbm_cntr_assignable;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3710b9f993e..cea76f88422c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -227,6 +227,8 @@ void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
bool rdt_is_feature_enabled(char *name);
+void rdt_set_feature_disabled(char *name);
+
#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
bool intel_aet_get_events(void);
void __exit intel_aet_exit(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9d14ebde9d89..cc3479429044 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -788,7 +788,7 @@ struct rdt_options {
bool force_off, force_on;
};
-static struct rdt_options rdt_options[] __ro_after_init = {
+static struct rdt_options rdt_options[] = {
RDT_OPT(RDT_FLAG_CMT, "cmt", X86_FEATURE_CQM_OCCUP_LLC),
RDT_OPT(RDT_FLAG_MBM_TOTAL, "mbmtotal", X86_FEATURE_CQM_MBM_TOTAL),
RDT_OPT(RDT_FLAG_MBM_LOCAL, "mbmlocal", X86_FEATURE_CQM_MBM_LOCAL),
@@ -851,6 +851,22 @@ bool rdt_cpu_has(int flag)
return ret;
}
+/*
+ * Can be called during feature enumeration if sanity check of
+ * a feature's parameters indicates problems with the feature.
+ */
+void rdt_set_feature_disabled(char *name)
+{
+ struct rdt_options *o;
+
+ for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
+ if (!strcmp(name, o->name)) {
+ o->force_off = true;
+ return;
+ }
+ }
+}
+
/*
* Hardware features that do not have X86_FEATURE_* bits. There is no
* "hardware does not support this at all" case. Assume that the caller
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index fa6d3456fb7a..03d159afcfe1 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -25,6 +25,7 @@
#include <linux/intel_pmt_features.h>
#include <linux/intel_vsec.h>
#include <linux/io.h>
+#include <linux/minmax.h>
#include <linux/overflow.h>
#include <linux/printk.h>
#include <linux/rculist.h>
@@ -55,6 +56,7 @@ struct pmt_event {
/**
* struct event_group - All information about a group of telemetry events.
+ * @name: Name for this group (used by boot rdt= option)
* @pfg: Points to the aggregated telemetry space information
* returned by the intel_pmt_get_regions_by_feature()
* call to the INTEL_PMT_TELEMETRY driver that contains
@@ -62,16 +64,22 @@ struct pmt_event {
* Valid if the system supports the event group.
* NULL otherwise.
* @guid: Unique number per XML description file.
+ * @num_rmids: Number of RMIDs supported by this group. May be
+ * adjusted downwards if enumeration from
+ * intel_pmt_get_regions_by_feature() indicates fewer
+ * RMIDs can be tracked simultaneously.
* @mmio_size: Number of bytes of MMIO registers for this group.
* @num_events: Number of events in this group.
* @evts: Array of event descriptors.
*/
struct event_group {
/* Data fields for additional structures to manage this group. */
+ char *name;
struct pmt_feature_group *pfg;
/* Remaining fields initialized from XML file. */
u32 guid;
+ u32 num_rmids;
size_t mmio_size;
unsigned int num_events;
struct pmt_event evts[] __counted_by(num_events);
@@ -85,7 +93,9 @@ struct event_group {
* File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
*/
static struct event_group energy_0x26696143 = {
+ .name = "energy",
.guid = 0x26696143,
+ .num_rmids = 576,
.mmio_size = XML_MMIO_SIZE(576, 2, 3),
.num_events = 2,
.evts = {
@@ -99,7 +109,9 @@ static struct event_group energy_0x26696143 = {
* File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
*/
static struct event_group perf_0x26557651 = {
+ .name = "perf",
.guid = 0x26557651,
+ .num_rmids = 576,
.mmio_size = XML_MMIO_SIZE(576, 7, 3),
.num_events = 7,
.evts = {
@@ -156,21 +168,59 @@ static bool skip_telem_region(struct telemetry_region *tr, struct event_group *e
return false;
}
+/* Side effect: Detects unusable regions and marks them as unusable */
+static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_feature_group *p)
+{
+ struct telemetry_region *tr;
+ bool ret = true;
+
+ for (int i = 0; i < p->count; i++) {
+ tr = &p->regions[i];
+ if (skip_telem_region(tr, e)) {
+ mark_telem_region_unusable(tr);
+ continue;
+ }
+
+ if (tr->num_rmids < e->num_rmids)
+ ret = false;
+ }
+
+ return ret;
+}
+
static bool enable_events(struct event_group *e, struct pmt_feature_group *p)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
bool usable_events = false;
+ /* Disable feature if insufficient RMIDs */
+ if (!all_regions_have_sufficient_rmid(e, p))
+ rdt_set_feature_disabled(e->name);
+
+ /* User can override above disable from kernel command line */
+ if (!rdt_is_feature_enabled(e->name))
+ return false;
+
for (int i = 0; i < p->count; i++) {
- if (skip_telem_region(&p->regions[i], e)) {
- mark_telem_region_unusable(&p->regions[i]);
+ if (!p->regions[i].addr)
continue;
- }
+ /*
+ * e->num_rmids only adjusted lower if user (via rdt= kernel
+ * parameter) forces an event group with insufficient RMID
+ * to be enabled.
+ */
+ e->num_rmids = min(e->num_rmids, p->regions[i].num_rmids);
usable_events = true;
}
if (!usable_events)
return false;
+ if (r->mon.num_rmid)
+ r->mon.num_rmid = min(r->mon.num_rmid, e->num_rmids);
+ else
+ r->mon.num_rmid = e->num_rmids;
+
for (int j = 0; j < e->num_events; j++)
resctrl_enable_mon_event(e->evts[j].id, true,
e->evts[j].bin_bits, &e->evts[j]);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 2238a5536f4b..f18cc5b38315 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1135,7 +1135,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
{
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
- seq_printf(seq, "%d\n", r->mon.num_rmid);
+ seq_printf(seq, "%u\n", r->mon.num_rmid);
return 0;
}
--
2.51.0
next prev parent reply other threads:[~2025-10-13 22:34 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-13 22:33 [PATCH v12 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
2025-10-13 22:33 ` [PATCH v12 01/31] x86,fs/resctrl: Improve domain type checking Tony Luck
2025-10-23 4:07 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 02/31] x86/resctrl: Move L3 initialization into new helper function Tony Luck
2025-10-13 22:33 ` [PATCH v12 03/31] x86/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
2025-10-23 4:08 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 04/31] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
2025-10-13 22:33 ` [PATCH v12 05/31] x86,fs/resctrl: Refactor domain create/remove using struct rdt_domain_hdr Tony Luck
2025-10-23 4:15 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 06/31] x86,fs/resctrl: Use struct rdt_domain_hdr when reading counters Tony Luck
2025-10-23 4:17 ` Reinette Chatre
2025-10-23 20:27 ` Luck, Tony
2025-10-13 22:33 ` [PATCH v12 07/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
2025-10-23 4:18 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 08/31] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
2025-10-23 4:21 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 09/31] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
2025-10-13 22:33 ` [PATCH v12 10/31] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
2025-10-23 4:22 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 11/31] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
2025-10-13 22:33 ` [PATCH v12 12/31] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
2025-10-13 22:33 ` [PATCH v12 13/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope monitor Tony Luck
2025-10-23 4:33 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 14/31] x86/resctrl: Discover hardware telemetry events Tony Luck
2025-10-23 4:28 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 15/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
2025-10-23 4:28 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 16/31] x86,fs/resctrl: Add architectural event pointer Tony Luck
2025-10-23 4:34 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 17/31] x86/resctrl: Find and enable usable telemetry events Tony Luck
2025-10-23 4:35 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 18/31] fs/resctrl: Split L3 dependent parts out of __mon_event_count() Tony Luck
2025-10-23 4:37 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 19/31] x86/resctrl: Read telemetry events Tony Luck
2025-10-23 4:47 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 20/31] fs/resctrl: Refactor mkdir_mondata_subdir() Tony Luck
2025-10-23 17:45 ` Reinette Chatre
2025-10-27 23:00 ` Luck, Tony
2025-10-28 16:00 ` Reinette Chatre
2025-10-28 17:14 ` Luck, Tony
2025-10-28 17:40 ` Reinette Chatre
2025-10-28 18:40 ` Luck, Tony
2025-10-28 23:55 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 21/31] fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() Tony Luck
2025-10-23 17:45 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 22/31] x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
2025-10-23 17:46 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 23/31] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
2025-10-23 17:45 ` Reinette Chatre
2025-10-13 22:33 ` Tony Luck [this message]
2025-10-23 17:48 ` [PATCH v12 24/31] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 25/31] fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] Tony Luck
2025-10-23 17:49 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 26/31] x86,fs/resctrl: Compute number of RMIDs as minimum across resources Tony Luck
2025-10-13 22:33 ` [PATCH v12 27/31] fs/resctrl: Move RMID initialization to first mount Tony Luck
2025-10-23 17:49 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
2025-10-23 17:50 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
2025-10-23 17:50 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 30/31] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
2025-10-23 17:50 ` Reinette Chatre
2025-10-13 22:33 ` [PATCH v12 31/31] x86,fs/resctrl: Update documentation for telemetry events Tony Luck
2025-10-23 17:52 ` Reinette Chatre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251013223348.103390-25-tony.luck@intel.com \
--to=tony.luck@intel.com \
--cc=Dave.Martin@arm.com \
--cc=babu.moger@amd.com \
--cc=dfustini@baylibre.com \
--cc=fenghuay@nvidia.com \
--cc=james.morse@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.wieczor-retman@intel.com \
--cc=patches@lists.linux.dev \
--cc=peternewman@google.com \
--cc=reinette.chatre@intel.com \
--cc=x86@kernel.org \
--cc=yu.c.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).