* [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring
@ 2025-07-11 23:53 Tony Luck
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The prerequisite patch series to the Intel Telemetry code is
now in the Linux x86 platform drivers tree:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/
It is queued for the v6.17 merge window.
That series is based on v6.16-rc1. One resctrl bugfix went into
Linus' tree after -rc1: commit 594902c986e2 ("x86,fs/resctrl: Remove
inappropriate references to cacheinfo in the resctrl subsystem").
These patches are based on the x86 platform drivers tree plus a
cherry-pick of that patch. For convenience, I've pushed that base and this
series to the rdt-aet-v7-base and rdt-aet-v7 branches of:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
Changes since v6, which was posted here:
Link: https://lore.kernel.org/all/20250626164941.106341-1-tony.luck@intel.com/
--- cover-letter ---
Rewritten - and then updated with comments from:
Link: https://lore.kernel.org/all/f3ba783a-6387-4997-9e8c-897109ee3559@intel.com/
--- 1 ---
Added review tag from Fenghua
Change kerneldoc for mon_evt::rid to "resource id for this event"
--- 2 ---
Added review tags from Fenghua & Reinette
--- 3 ---
Added review tag from Reinette
--- 4 ---
Added review tag from Reinette
--- 5 ---
Change kerneldoc for rdt_domain_hdr::rid to "resource id for this instance"
Fix domain_header_is_valid() check in rdtgroup_mondata_show() to use mon_data::rid
--- 6 ---
Fix Subject and commit comment to match the code changed.
--- 7 ---
Commit comment s/will still need to/needs to/
Fix for "d used before set" applied.
--- 8 ---
No comments.
Fix for "d used before set" applied.
--- 9 ---
Commit comment: s/it made sense to use/it made sense to use the L3 specific/
and s/to not/to note/.
Added details of why the change to pass the rdt_domain_hdr is needed and
what that entails.
rmdir_mondata_subdir_allrdtgrp() s/d->hdr.id/hdr->id/
mkdir_mondata_subdir() s/bool snc_mode = 0/bool snc_mode = false/
resctrl_online_mon_domain() s/r->rid/RDT_RESOURCE_L3/
--- 10 ---
Commit comment: Add "of the L3 resource".
Fix alignment in kerneldoc for struct rdt_hw_l3_mon_domain.
Fix alignment for structure members in rdt_hw_l3_mon_domain.
--- 11 ---
Replace first two paragraphs of commit message.
s/mbm/MBM/
--- 12 ---
Update kerneldoc for mon_data::evt to match change to field.
--- 13 ---
Add comment before cpu_on_correct_domain() explaining when it is called
in preemptible vs. non-preemptible context.
Make just one call to cpu_on_correct_domain() at head of __mon_event_count()
to cover all subsequent code paths.
--- 14 ---
Fenghua: s/0.0. 0.25/0.0, 0.25/ in commit message.
Kerneldoc for mon_evt::is_floating_point s/may be displayed/are displayed/
Kerneldoc for mon_evt::binary_bits Add "only valid if @is_floating_point is true"
Use consistent "unsigned int" type for binary_bits.
print_event_value(): Use int_pow() instead of hard coding powers of 10.
With only one element left in struct fixed_params, drop it and just use
a simple array of "unsigned int" for number of decimal places.
Change "frac += 1 << (binary_bits - 1);" to "frac += 1ull << (binary_bits - 1);"
s/sprintf/snprintf/ to fill the buf[] with the fractional part of value.
If the architecture did not supply a binary_bits value for an event that
the filesystem designated as floating point, print it as "val.0" for a
consistent user interface.
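The rounding and formatting changes above can be illustrated with a small userspace sketch. The helper name format_fixed_point() and its parameters are hypothetical (this is not the patch's print_event_value()); it assumes an unsigned value with binary_bits fraction bits and a caller-chosen number of decimal places, and shows why the half-LSB addition must use a 64-bit constant and why a rounding carry into the integer part has to be handled.

```c
#include <stdio.h>
#include <stdint.h>

/*
 * Hypothetical helper: format an unsigned binary fixed-point value that
 * has 'binary_bits' fraction bits as a decimal string with 'decplaces'
 * digits after the point, rounding to nearest.
 */
static void format_fixed_point(char *buf, size_t len, uint64_t val,
			       unsigned int binary_bits, unsigned int decplaces)
{
	uint64_t whole, frac, scale = 1;
	unsigned int i;

	/* No fraction bits supplied: keep a consistent "val.0" format. */
	if (binary_bits == 0) {
		snprintf(buf, len, "%llu.0", (unsigned long long)val);
		return;
	}

	for (i = 0; i < decplaces; i++)	/* scale = 10^decplaces */
		scale *= 10;

	whole = val >> binary_bits;
	frac = val & ((1ull << binary_bits) - 1);

	/* Scale to decimal, then round by adding half of one binary LSB. */
	frac *= scale;
	frac += 1ull << (binary_bits - 1);
	frac >>= binary_bits;

	/* Rounding may carry into the integer part (0.99999 -> 1.00). */
	if (frac >= scale) {
		whole++;
		frac -= scale;
	}

	snprintf(buf, len, "%llu.%0*llu", (unsigned long long)whole,
		 (int)decplaces, (unsigned long long)frac);
}
```

For example, with 18 fraction bits and 2 decimal places, the raw value (3 << 18) | (1 << 17) formats as "3.50", and (1 << 18) - 1 rounds up to "1.00".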
--- 15 ---
No comments. Unchanged.
--- 16 ---
Replace commit comment with Reinette version.
--- 17 ---
Commit comment:
s/published in the/published in/
s/The XML files provide/The XML file provides/
Added paragraph tying "aggregators" to "regions":
Each aggregator makes event counters available to Linux in
a region of MMIO memory. Enumeration of these regions is
done by the INTEL_PMT_DISCOVERY discovery driver.
Change name of configure_events() to discover_events()
Introduce a new Kconfig option X86_RESCTRL_CPU_INTEL_AET that is
only visible when INTEL_PMT_DISCOVERY=y so that x86 resctrl can
be configured without forcing inclusion of an Intel specific
driver.
--- 18 ---
Change MMIO size check to require equality with expected value.
s/Configure events/Discover events/ to match new function name
Update comment "Count how many per package" to say what is being
counted ("usable telemetry regions").
s/telemetry_regions/telemetry regions/ in comment.
Split known event group array into energy and perf lists.
Build a list of the enabled event groups.
--- 19 ---
Fix second paragraph to avoid redundancy. s/mmio/MMIO/
Clarify in ascii art that the first level structures refer
to package ID 0, package ID 1.
Rename struct mmio_info to struct pkg_mmio_info and
free_mmio_info() to free_pkg_mmio_info() to match.
Change type of pkg_mmio_info::num_regions to unsigned int.
Update step 2 in header comment for discover_events().
Drop the test for duplicate telemetry information for a guid.
It can't happen because get_pmt_feature() stops scanning as
soon as discover_events() succeeds.
--- 20 ---
Change subject to say adding events for two specific GUIDs.
Added note that these two GUIDs are the ones used by Clearwater
Forest.
Changed the commit comment on MMIO counter layout to use
the "energy" guid=0x26696143 case as a specific example.
Changed type of pmt_event::idx and pmt_event::bin_bits to unsigned int
Ditto for event_group::num_events.
--- 21 ---
Add more detail for kerneldoc for resctrl_arch_rmid_read() arch_priv argument.
Provide the QOS_L3_OCCUP_EVENT_ID arch_priv value to resctrl_arch_rmid_read()
in the call from __check_limbo().
--- 22 ---
Drop confusing "core" from Subject.
--- 23 ---
Drop stale part of commit message about changing functions to
add/remove domains and bring them online or take them offline.
Those changes were incorporated into earlier patches in the
series.
--- 24 ---
Drop the "software_" part from rdt_is_software_feature_enabled().
Update comments to refer to hardware features instead of software
features. Expand "h/w" to "hardware". Drop use of "s/w".
Drop a stray "*" that appeared when reformatting block comment.
--- 25 ---
"as any time" -> "at any time"
drop "to ensure that all resctrl groups ..."
Fix rdt_num_rmids_show() to print r->num_rmid with "%u"
Dropped the rdt_is_software_feature_force_enabled() function.
Added a new rdt_set_feature_disabled().
Code flow now matches more closely that for the X86_FEATURE
enable/disable.
1) Check at the start of discover_events() to see if there
are sufficient RMIDs. If there are not, then use
rdt_set_feature_disabled() to mark the feature disabled.
2) Then call rdt_is_feature_enabled() which will override
the decision if the user specified rdt={feature} on the
kernel command line.
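The two-step decision above can be modelled in a few lines of userspace C. All names here (struct feature_state, check_rmids(), feature_enabled()) are hypothetical stand-ins for rdt_set_feature_disabled() and rdt_is_feature_enabled(); the sketch assumes an explicit rdt= choice on the command line always wins over the kernel's default decision.

```c
#include <stdbool.h>

/* Hypothetical per-feature state (not the kernel's rdt_options layout). */
struct feature_state {
	bool disabled;		/* default decision made by the kernel */
	bool force_on;		/* user asked for the feature via rdt= */
	bool force_off;		/* user disabled the feature via rdt= */
};

/* Step 1: mark the feature disabled if too few RMIDs are available. */
static void check_rmids(struct feature_state *f, unsigned int hw_rmids,
			unsigned int needed_rmids)
{
	if (hw_rmids < needed_rmids)
		f->disabled = true;
}

/* Step 2: an explicit boot option overrides the default decision. */
static bool feature_enabled(const struct feature_state *f)
{
	if (f->force_off)
		return false;
	if (f->force_on)
		return true;
	return !f->disabled;
}
```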
--- 26 ---
Split into two:
Part 0027 cleans up the life cycle of closid_num_dirty_rmid and renames
some functions to add "_l3" components to their names.
Part 0028 moves the scan of resources and allocation of RMIDs to mount time
and renames dom_data_{init,exit}(). Dropped the "once" check from
rdt_get_tree().
--- 27 ---
No comments. Moved the "{name} monitoring detected" message to discover_events()
--- 28 ---
Fixed commit comment to match the function name and path name in the code.
Added a note that the {arch} component is "uname -m".
Added usage guidance in the commit comment and the block comment for
resctrl_debugfs_mon_info_arch_mkdir().
Added kerneldoc on life cycle (removed by resctrl_exit()) and
return value.
Clear debugfs_resctrl_info in resctrl_exit()
--- 29 ---
Fix Subject to refer to debugfs instead of info/ directory
s/arch/x86_64/ in pathname description
Use DEFINE_SIMPLE_ATTRIBUTE() to define file ops.
Exit create_debugfs_status_file early if debugfs directory cannot be created
--- 30 ---
Update details about core_energy and activity events to explain
they measure only "core" amounts.
Remove section about new status file under info directory.
Add section about debugfs use.
Background
----------
On Intel systems that support per-RMID telemetry monitoring, each logical
processor keeps a local count for various events. When the IA32_PQR_ASSOC.RMID
value for the logical processor changes (or when a two-millisecond counter
expires), these event counts are transmitted to an event aggregator on
the same package as the processor together with the current RMID value. The
event counters are reset to zero to begin counting again.
Each aggregator takes the incoming event counts and adds them to
cumulative counts for each event for each RMID. Note that there can be
multiple aggregators on each package with no architectural association
between logical processors and an aggregator.
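The flow described above (local per-CPU counts flushed, tagged with the current RMID, into per-RMID cumulative totals) can be sketched as a userspace model. The structures and functions below are hypothetical illustrations of the behaviour, not hardware or kernel interfaces, and the RMID count is deliberately tiny.

```c
#include <stdint.h>

#define NUM_RMIDS 4	/* illustrative only; real hardware has many more */

/* Hypothetical model of one logical processor counting one event. */
struct cpu_state {
	unsigned int rmid;	/* current IA32_PQR_ASSOC.RMID value */
	uint64_t local_count;	/* events counted since the last flush */
};

/* Hypothetical aggregator: cumulative count per RMID for one event. */
struct aggregator {
	uint64_t total[NUM_RMIDS];
};

/* Flush the local count under the CPU's current RMID, then reset it. */
static void flush_to_aggregator(struct cpu_state *cpu, struct aggregator *agg)
{
	agg->total[cpu->rmid] += cpu->local_count;
	cpu->local_count = 0;
}

/* An RMID change flushes first, so earlier counts stay with the old RMID. */
static void switch_rmid(struct cpu_state *cpu, struct aggregator *agg,
			unsigned int new_rmid)
{
	flush_to_aggregator(cpu, agg);
	cpu->rmid = new_rmid;
}
```

The periodic two-millisecond flush corresponds to calling flush_to_aggregator() without changing the RMID.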
All of these aggregated counters can be read by an operating system from
the MMIO space of the Out Of Band Management Service Module (OOBMSM)
device(s) on a system. Any counter can be read from any logical processor.
Intel publishes details for each processor generation showing which
events are counted by each logical processor and the offsets for each
accumulated counter value within the MMIO space in XML files here:
https://github.com/intel/Intel-PMT.
For example, there are two energy-related telemetry events for the Clearwater
Forest family of processors and the MMIO space looks like this:
Offset   RMID   Event
------   ----   -----
0x0000   0      core_energy
0x0008   0      activity
0x0010   1      core_energy
0x0018   1      activity
...
0x23F0   575    core_energy
0x23F8   575    activity
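The layout above interleaves the events for each RMID, with each counter occupying 8 bytes. A minimal sketch of the offset calculation follows; counter_offset() is a hypothetical helper, and it assumes every RMID has the same number of consecutive counters, as in this two-event example.

```c
#include <stdint.h>

#define COUNTER_SIZE	8	/* each counter is a 64-bit value */

/*
 * Hypothetical helper: byte offset of the counter for (rmid, event_idx)
 * when each RMID has 'num_events' consecutive 64-bit counters.
 */
static uint64_t counter_offset(unsigned int rmid, unsigned int event_idx,
			       unsigned int num_events)
{
	return ((uint64_t)rmid * num_events + event_idx) * COUNTER_SIZE;
}
```

With num_events = 2 (core_energy at index 0, activity at index 1), RMID 575 yields offsets 0x23F0 and 0x23F8, matching the last two rows of the table.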
In addition, the XML file provides the units (Joules for core_energy,
Farads for activity) and the type of data (fixed-point binary with
bit 63 used to indicate the data is valid, and the low 18 bits as a
binary fraction).
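A userspace sketch of decoding one raw counter in the format just described: bit 63 is the valid flag and the low 18 bits are a binary fraction. decode_counter() and the macro names are hypothetical, and converting to double is for illustration only, since very large counts lose precision in a double.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_VALID_BIT	(1ull << 63)	/* bit 63: data is valid */
#define FRACTION_BITS	18		/* low 18 bits: binary fraction */

/*
 * Hypothetical decoder: returns false if the valid bit is clear,
 * otherwise stores the fixed-point value in *out.
 */
static bool decode_counter(uint64_t raw, double *out)
{
	uint64_t val;

	if (!(raw & EVT_VALID_BIT))
		return false;

	val = raw & ~EVT_VALID_BIT;	/* strip the valid bit */
	*out = (double)val / (double)(1ull << FRACTION_BITS);
	return true;
}
```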
Finally, each XML file provides a 32-bit unique id (or guid) that is
used as an index to find the correct XML description file for each
telemetry implementation.
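Using the guid as a key amounts to a table lookup. In the sketch below, struct event_group and find_group() are hypothetical. The two guids are the ones named in this series; 0x26696143 is the "energy" group per the changelog, while labelling 0x26557651 as "perf" is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one known telemetry implementation. */
struct event_group {
	const char *name;
	uint32_t guid;		/* unique id from the XML description */
};

static const struct event_group known_groups[] = {
	{ .name = "energy", .guid = 0x26696143 },
	{ .name = "perf",   .guid = 0x26557651 },	/* name assumed */
};

/* Return the matching group, or NULL for an unknown implementation. */
static const struct event_group *find_group(uint32_t guid)
{
	size_t i;

	for (i = 0; i < sizeof(known_groups) / sizeof(known_groups[0]); i++)
		if (known_groups[i].guid == guid)
			return &known_groups[i];
	return NULL;
}
```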
The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
to enumerate the aggregator instances (also referred to as "telemetry
regions" in this series) on a platform. It provides:
1) guid - so resctrl can determine which events are supported
2) MMIO base address of counters
3) package id
Resctrl accumulates counts from all aggregators on a package in order
to provide a consistent user interface across processor generations.
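Per-package accumulation can be sketched as a filter-and-sum over the enumerated regions. The struct below is a hypothetical stand-in for what the discovery driver reports (guid, MMIO base, package id), with a plain array in place of MMIO. It assumes the counters have already been validated and had bit 63 stripped, because summing raw values with the valid bit set would overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for one enumerated telemetry region. */
struct telem_region {
	uint32_t guid;
	const uint64_t *counters;	/* stands in for the MMIO base address */
	unsigned int package_id;
};

/* Sum one (already decoded) counter across all regions on a package. */
static uint64_t package_sum(const struct telem_region *regions,
			    unsigned int nregions, unsigned int package_id,
			    unsigned int counter_idx)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < nregions; i++)
		if (regions[i].package_id == package_id)
			sum += regions[i].counters[counter_idx];
	return sum;
}
```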
Directory structure for the telemetry events looks like this:
$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
mon_data
├── mon_PERF_PKG_00
│   ├── activity
│   └── core_energy
└── mon_PERF_PKG_01
    ├── activity
    └── core_energy
Reading the "core_energy" file from a resctrl mon_data directory shows
the cumulative energy (in Joules) used by all tasks that ran with the RMID
associated with that directory on a given package. Note that "core_energy"
reports only energy consumed by CPU cores (data processing units,
L1/L2 caches, etc.). It does not include energy used in the "uncore"
(L3 cache, on-package devices, etc.) or by memory or I/O devices.
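Because core_energy is cumulative, a userspace tool can derive average power from two readings taken some interval apart. The sketch below is a hypothetical illustration. It assumes the file contains a single decimal number, and read_resctrl_event() and average_watts() are illustrative names, not part of any resctrl API.

```c
#include <stdio.h>

/* Read one event value (assumed to be a single decimal number). */
static int read_resctrl_event(const char *path, double *val)
{
	FILE *f = fopen(path, "r");
	int ret;

	if (!f)
		return -1;
	ret = (fscanf(f, "%lf", val) == 1) ? 0 : -1;
	fclose(f);
	return ret;
}

/* Average power in watts from two cumulative energy readings in Joules. */
static double average_watts(double joules_before, double joules_after,
			    double seconds)
{
	return (joules_after - joules_before) / seconds;
}
```

A caller might read /sys/fs/resctrl/mon_data/mon_PERF_PKG_00/core_energy, sleep for a known interval, read it again, and pass both values and the interval to average_watts().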
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tony Luck (31):
x86,fs/resctrl: Consolidate monitor event descriptions
x86,fs/resctrl: Replace architecture event enabled checks
x86/resctrl: Remove 'rdt_mon_features' global variable
x86,fs/resctrl: Prepare for more monitor events
x86,fs/resctrl: Improve domain type checking
x86/resctrl: Move L3 initialization into new helper function
x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain
types
x86/resctrl: Clean up domain_remove_cpu_ctrl()
x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct
rdt_mon_domain
x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
x86,fs/resctrl: Rename some L3 specific functions
fs/resctrl: Make event details accessible to functions when reading
events
x86,fs/resctrl: Handle events that can be read from any CPU
x86,fs/resctrl: Support binary fixed point event counters
x86,fs/resctrl: Add an architectural hook called for each mount
x86,fs/resctrl: Add and initialize rdt_resource for package scope core
monitor
x86/resctrl: Discover hardware telemetry events
x86/resctrl: Count valid telemetry aggregators per package
x86/resctrl: Complete telemetry event enumeration
x86,fs/resctrl: Fill in details of events for guid 0x26696143 and
0x26557651
x86,fs/resctrl: Add architectural event pointer
x86/resctrl: Read telemetry events
x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
x86/resctrl: Add energy/perf choices to rdt boot option
x86/resctrl: Handle number of RMIDs supported by telemetry resources
fs/resctrl: Fix life-cycle of closid_num_dirty_rmid
x86,fs/resctrl: Move RMID initialization to first mount
x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
fs/resctrl: Provide interface to create architecture specific debugfs
area
x86/resctrl: Add debugfs files to show telemetry aggregator status
x86,fs/resctrl: Update Documentation for package events
.../admin-guide/kernel-parameters.txt | 2 +-
Documentation/filesystems/resctrl.rst | 85 +++-
include/linux/resctrl.h | 89 +++-
include/linux/resctrl_types.h | 26 +-
arch/x86/include/asm/resctrl.h | 16 -
arch/x86/kernel/cpu/resctrl/internal.h | 45 +-
fs/resctrl/internal.h | 66 ++-
arch/x86/kernel/cpu/resctrl/core.c | 331 ++++++++++----
arch/x86/kernel/cpu/resctrl/intel_aet.c | 426 ++++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 78 ++--
fs/resctrl/ctrlmondata.c | 127 +++++-
fs/resctrl/monitor.c | 306 ++++++++-----
fs/resctrl/rdtgroup.c | 266 +++++++----
arch/x86/Kconfig | 13 +
arch/x86/kernel/cpu/resctrl/Makefile | 1 +
15 files changed, 1439 insertions(+), 438 deletions(-)
create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
base-tree: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
base-branch: rdt-aet-v7-base
base-commit: 882f32fbcc7ba5b46cf8889607fa677de1e222e0
--
2.50.0
* [PATCH v7 01/31] x86,fs/resctrl: Consolidate monitor event descriptions
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
There are currently only three monitor events, all associated with
the RDT_RESOURCE_L3 resource. Growing support for additional events
will be easier with some restructuring to have a single point in
file system code where all attributes of all events are defined.
Place all event descriptions into an array mon_event_all[]. Doing
this has the beneficial side effect of removing the need for
rdt_resource::evt_list.
Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range
checks for event ids and as the starting index to scan mon_event_all[].
Drop the code that builds evt_list and change the two places where
the list is scanned to scan mon_event_all[] instead, using a new
helper macro for_each_mon_event().
Architecture code now informs file system code which events are
available with resctrl_enable_mon_event().
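The restructuring can be tried out in userspace with simplified copies of the types. This is a sketch, not the kernel code: RDT_RESOURCE_OTHER and count_enabled() are hypothetical, and struct mon_evt is trimmed to the fields needed here. It shows why iteration starts at QOS_FIRST_EVENT (index 0 of the designated-initializer array is unused) and how callers filter on rid and enabled instead of walking a per-resource list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types touched by this patch. */
enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_OTHER };

enum resctrl_event_id {
	QOS_FIRST_EVENT			= 0x01,
	QOS_L3_OCCUP_EVENT_ID		= 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
	QOS_NUM_EVENTS,
};

struct mon_evt {
	enum resctrl_event_id evtid;
	enum resctrl_res_level rid;
	const char *name;
	bool enabled;
};

/* Entry 0 is unused, so iteration starts at QOS_FIRST_EVENT. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
	[QOS_L3_OCCUP_EVENT_ID] = { .evtid = QOS_L3_OCCUP_EVENT_ID,
		.rid = RDT_RESOURCE_L3, .name = "llc_occupancy" },
	[QOS_L3_MBM_TOTAL_EVENT_ID] = { .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
		.rid = RDT_RESOURCE_L3, .name = "mbm_total_bytes" },
	[QOS_L3_MBM_LOCAL_EVENT_ID] = { .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
		.rid = RDT_RESOURCE_L3, .name = "mbm_local_bytes" },
};

#define for_each_mon_event(mevt)				\
	for (mevt = &mon_event_all[QOS_FIRST_EVENT];		\
	     mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)

/* Hypothetical: count enabled events for one resource. */
static size_t count_enabled(enum resctrl_res_level rid)
{
	struct mon_evt *mevt;
	size_t n = 0;

	for_each_mon_event(mevt) {
		if (mevt->rid != rid || !mevt->enabled)
			continue;
		n++;
	}
	return n;
}
```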
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
---
include/linux/resctrl.h | 4 +-
include/linux/resctrl_types.h | 12 ++++--
fs/resctrl/internal.h | 13 ++++--
arch/x86/kernel/cpu/resctrl/core.c | 12 ++++--
fs/resctrl/monitor.c | 63 +++++++++++++++---------------
fs/resctrl/rdtgroup.c | 11 +++---
6 files changed, 66 insertions(+), 49 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 6fb4894b8cfd..2944042bd84c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -269,7 +269,6 @@ enum resctrl_schema_fmt {
* @mon_domains: RCU list of all monitor domains for this resource
* @name: Name to use in "schemata" file.
* @schema_fmt: Which format string and parser is used for this schema.
- * @evt_list: List of monitoring events
* @mbm_cfg_mask: Bandwidth sources that can be tracked when bandwidth
* monitoring events can be configured.
* @cdp_capable: Is the CDP feature available on this resource
@@ -287,7 +286,6 @@ struct rdt_resource {
struct list_head mon_domains;
char *name;
enum resctrl_schema_fmt schema_fmt;
- struct list_head evt_list;
unsigned int mbm_cfg_mask;
bool cdp_capable;
};
@@ -372,6 +370,8 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
u32 resctrl_arch_system_num_rmid_idx(void);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
/**
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index a25fb9c4070d..2dadbc54e4b3 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -34,11 +34,15 @@
/* Max event bits supported */
#define MAX_EVT_CONFIG_BITS GENMASK(6, 0)
-/*
- * Event IDs, the values match those used to program IA32_QM_EVTSEL before
- * reading IA32_QM_CTR on RDT systems.
- */
+/* Event IDs */
enum resctrl_event_id {
+ /* Must match value of first event below */
+ QOS_FIRST_EVENT = 0x01,
+
+ /*
+ * These values match those used to program IA32_QM_EVTSEL before
+ * reading IA32_QM_CTR on RDT systems.
+ */
QOS_L3_OCCUP_EVENT_ID = 0x01,
QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0a1eedba2b03..4f315b7e9ec0 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -52,19 +52,26 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
}
/**
- * struct mon_evt - Entry in the event list of a resource
+ * struct mon_evt - Properties of a monitor event
* @evtid: event id
+ * @rid: resource id for this event
* @name: name of the event
* @configurable: true if the event is configurable
- * @list: entry in &rdt_resource->evt_list
+ * @enabled: true if the event is enabled
*/
struct mon_evt {
enum resctrl_event_id evtid;
+ enum resctrl_res_level rid;
char *name;
bool configurable;
- struct list_head list;
+ bool enabled;
};
+extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
+
+#define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT]; \
+ mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)
+
/**
* struct mon_data - Monitoring details for each event file.
* @list: Member of the global @mon_data_kn_priv_list list.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 187d527ef73b..7fcae25874fe 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -864,12 +864,18 @@ static __init bool get_rdt_mon_resources(void)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
- if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
+ if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
- if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
+ }
+ if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID);
- if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
+ }
+ if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
+ resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID);
+ }
if (!rdt_mon_features)
return false;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index f5637855c3ac..2313e48de55f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -844,38 +844,39 @@ static void dom_data_exit(struct rdt_resource *r)
mutex_unlock(&rdtgroup_mutex);
}
-static struct mon_evt llc_occupancy_event = {
- .name = "llc_occupancy",
- .evtid = QOS_L3_OCCUP_EVENT_ID,
-};
-
-static struct mon_evt mbm_total_event = {
- .name = "mbm_total_bytes",
- .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
-};
-
-static struct mon_evt mbm_local_event = {
- .name = "mbm_local_bytes",
- .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
-};
-
/*
- * Initialize the event list for the resource.
- *
- * Note that MBM events are also part of RDT_RESOURCE_L3 resource
- * because as per the SDM the total and local memory bandwidth
- * are enumerated as part of L3 monitoring.
+ * All available events. Architecture code marks the ones that
+ * are supported by a system using resctrl_enable_mon_event()
+ * to set .enabled.
*/
-static void l3_mon_evt_init(struct rdt_resource *r)
+struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
+ [QOS_L3_OCCUP_EVENT_ID] = {
+ .name = "llc_occupancy",
+ .evtid = QOS_L3_OCCUP_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ },
+ [QOS_L3_MBM_TOTAL_EVENT_ID] = {
+ .name = "mbm_total_bytes",
+ .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ },
+ [QOS_L3_MBM_LOCAL_EVENT_ID] = {
+ .name = "mbm_local_bytes",
+ .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
+ .rid = RDT_RESOURCE_L3,
+ },
+};
+
+void resctrl_enable_mon_event(enum resctrl_event_id eventid)
{
- INIT_LIST_HEAD(&r->evt_list);
+ if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+ return;
+ if (mon_event_all[eventid].enabled) {
+ pr_warn("Duplicate enable for event %d\n", eventid);
+ return;
+ }
- if (resctrl_arch_is_llc_occupancy_enabled())
- list_add_tail(&llc_occupancy_event.list, &r->evt_list);
- if (resctrl_arch_is_mbm_total_enabled())
- list_add_tail(&mbm_total_event.list, &r->evt_list);
- if (resctrl_arch_is_mbm_local_enabled())
- list_add_tail(&mbm_local_event.list, &r->evt_list);
+ mon_event_all[eventid].enabled = true;
}
/**
@@ -902,15 +903,13 @@ int resctrl_mon_resource_init(void)
if (ret)
return ret;
- l3_mon_evt_init(r);
-
if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
- mbm_total_event.configurable = true;
+ mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
resctrl_file_fflags_init("mbm_total_bytes_config",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
- mbm_local_event.configurable = true;
+ mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].configurable = true;
resctrl_file_fflags_init("mbm_local_bytes_config",
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 77d08229d855..b95501d4b5de 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1152,7 +1152,9 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
struct mon_evt *mevt;
- list_for_each_entry(mevt, &r->evt_list, list) {
+ for_each_mon_event(mevt) {
+ if (mevt->rid != r->rid || !mevt->enabled)
+ continue;
seq_printf(seq, "%s\n", mevt->name);
if (mevt->configurable)
seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,10 +3059,9 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
struct mon_evt *mevt;
int ret, domid;
- if (WARN_ON(list_empty(&r->evt_list)))
- return -EPERM;
-
- list_for_each_entry(mevt, &r->evt_list, list) {
+ for_each_mon_event(mevt) {
+ if (mevt->rid != r->rid || !mevt->enabled)
+ continue;
domid = do_sum ? d->ci_id : d->hdr.id;
priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
if (WARN_ON_ONCE(!priv))
--
2.50.0
* [PATCH v7 02/31] x86,fs/resctrl: Replace architecture event enabled checks
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The resctrl file system now has complete knowledge of the status
of every event, so there is no need for per-event architecture
function calls to check it.
Replace each of the resctrl_arch_is_{event}_enabled() calls with
resctrl_is_mon_event_enabled(QOS_{EVENT}).
No functional change.
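The single bounds-checked lookup that replaces the per-event helpers can be modelled in userspace. In this sketch, enable_event(), is_event_enabled(), FIRST_EVENT and NUM_EVENTS are hypothetical names mirroring resctrl_enable_mon_event() and resctrl_is_mon_event_enabled(). Out-of-range ids are rejected (the kernel uses WARN_ON_ONCE() for that case), and duplicate enables are reported rather than silently accepted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: one "enabled" flag per event id, index 0 unused. */
enum { FIRST_EVENT = 1, NUM_EVENTS = 4 };

static bool event_enabled[NUM_EVENTS];

/* Mirrors the range and duplicate checks in resctrl_enable_mon_event(). */
static void enable_event(int eventid)
{
	if (eventid < FIRST_EVENT || eventid >= NUM_EVENTS)
		return;
	if (event_enabled[eventid]) {
		fprintf(stderr, "Duplicate enable for event %d\n", eventid);
		return;
	}
	event_enabled[eventid] = true;
}

/* One lookup replaces the per-event resctrl_arch_is_*_enabled() helpers. */
static bool is_event_enabled(int eventid)
{
	return eventid >= FIRST_EVENT && eventid < NUM_EVENTS &&
	       event_enabled[eventid];
}
```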
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
include/linux/resctrl.h | 2 ++
arch/x86/include/asm/resctrl.h | 15 ---------------
arch/x86/kernel/cpu/resctrl/core.c | 4 ++--
arch/x86/kernel/cpu/resctrl/monitor.c | 4 ++--
fs/resctrl/ctrlmondata.c | 4 ++--
fs/resctrl/monitor.c | 16 +++++++++++-----
fs/resctrl/rdtgroup.c | 18 +++++++++---------
7 files changed, 28 insertions(+), 35 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 2944042bd84c..40aba6b5d4f0 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -372,6 +372,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
+
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
/**
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index feb93b50e990..b1dd5d6b87db 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -84,21 +84,6 @@ static inline void resctrl_arch_disable_mon(void)
static_branch_dec_cpuslocked(&rdt_enable_key);
}
-static inline bool resctrl_arch_is_llc_occupancy_enabled(void)
-{
- return (rdt_mon_features & (1 << QOS_L3_OCCUP_EVENT_ID));
-}
-
-static inline bool resctrl_arch_is_mbm_total_enabled(void)
-{
- return (rdt_mon_features & (1 << QOS_L3_MBM_TOTAL_EVENT_ID));
-}
-
-static inline bool resctrl_arch_is_mbm_local_enabled(void)
-{
- return (rdt_mon_features & (1 << QOS_L3_MBM_LOCAL_EVENT_ID));
-}
-
/*
* __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
*
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7fcae25874fe..1a319ce9328c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -402,13 +402,13 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
size_t tsize;
- if (resctrl_arch_is_mbm_total_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
tsize = sizeof(*hw_dom->arch_mbm_total);
hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
if (!hw_dom->arch_mbm_total)
return -ENOMEM;
}
- if (resctrl_arch_is_mbm_local_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
tsize = sizeof(*hw_dom->arch_mbm_local);
hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
if (!hw_dom->arch_mbm_local) {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c261558276cd..61d38517e2bf 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -207,11 +207,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
{
struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
- if (resctrl_arch_is_mbm_total_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
memset(hw_dom->arch_mbm_total, 0,
sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
memset(hw_dom->arch_mbm_local, 0,
sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
}
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index d98e0d2de09f..ad7ffc6acf13 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -473,12 +473,12 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
rdt_last_cmd_clear();
if (!strcmp(buf, "mbm_local_bytes")) {
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
else
ret = -EINVAL;
} else if (!strcmp(buf, "mbm_total_bytes")) {
- if (resctrl_arch_is_mbm_total_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
else
ret = -EINVAL;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 2313e48de55f..9e988b2c1a22 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -336,7 +336,7 @@ void free_rmid(u32 closid, u32 rmid)
entry = __rmid_entry(idx);
- if (resctrl_arch_is_llc_occupancy_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
add_rmid_to_limbo(entry);
else
list_add_tail(&entry->list, &rmid_free_lru);
@@ -637,10 +637,10 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
* This is protected from concurrent reads from user as both
* the user and overflow handler hold the global mutex.
*/
- if (resctrl_arch_is_mbm_total_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
}
@@ -879,6 +879,12 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
mon_event_all[eventid].enabled = true;
}
+bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
+{
+ return eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS &&
+ mon_event_all[eventid].enabled;
+}
+
/**
* resctrl_mon_resource_init() - Initialise global monitoring structures.
*
@@ -914,9 +920,9 @@ int resctrl_mon_resource_init(void)
RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
}
- if (resctrl_arch_is_mbm_local_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
- else if (resctrl_arch_is_mbm_total_enabled())
+ else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b95501d4b5de..a7eeb33501da 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -123,8 +123,8 @@ void rdt_staged_configs_clear(void)
static bool resctrl_is_mbm_enabled(void)
{
- return (resctrl_arch_is_mbm_total_enabled() ||
- resctrl_arch_is_mbm_local_enabled());
+ return (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID) ||
+ resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID));
}
static bool resctrl_is_mbm_event(int e)
@@ -196,7 +196,7 @@ static int closid_alloc(void)
lockdep_assert_held(&rdtgroup_mutex);
if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID) &&
- resctrl_arch_is_llc_occupancy_enabled()) {
+ resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
cleanest_closid = resctrl_find_cleanest_closid();
if (cleanest_closid < 0)
return cleanest_closid;
@@ -4051,7 +4051,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
if (resctrl_is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
- if (resctrl_arch_is_llc_occupancy_enabled() && has_busy_rmid(d)) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
/*
* When a package is going down, forcefully
* decrement rmid->ebusy. There is no way to know
@@ -4087,12 +4087,12 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
size_t tsize;
- if (resctrl_arch_is_llc_occupancy_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
if (!d->rmid_busy_llc)
return -ENOMEM;
}
- if (resctrl_arch_is_mbm_total_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
tsize = sizeof(*d->mbm_total);
d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
if (!d->mbm_total) {
@@ -4100,7 +4100,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
return -ENOMEM;
}
}
- if (resctrl_arch_is_mbm_local_enabled()) {
+ if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
tsize = sizeof(*d->mbm_local);
d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
if (!d->mbm_local) {
@@ -4145,7 +4145,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
RESCTRL_PICK_ANY_CPU);
}
- if (resctrl_arch_is_llc_occupancy_enabled())
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
/*
@@ -4220,7 +4220,7 @@ void resctrl_offline_cpu(unsigned int cpu)
cancel_delayed_work(&d->mbm_over);
mbm_setup_overflow_handler(d, 0, cpu);
}
- if (resctrl_arch_is_llc_occupancy_enabled() &&
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
cancel_delayed_work(&d->cqm_limbo);
cqm_setup_limbo_handler(d, 0, cpu);
--
2.50.0
* [PATCH v7 03/31] x86/resctrl: Remove 'rdt_mon_features' global variable
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
2025-07-11 23:53 ` [PATCH v7 01/31] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
2025-07-11 23:53 ` [PATCH v7 02/31] x86,fs/resctrl: Replace architecture event enabled checks Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-11 23:53 ` [PATCH v7 04/31] x86,fs/resctrl: Prepare for more monitor events Tony Luck
` (28 subsequent siblings)
31 siblings, 0 replies; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
rdt_mon_features is a bitmask of enabled monitor events. It is now
redundant: each monitor event's status is maintained in mon_evt::enabled,
and the mon_evt structures for all monitor events live in the
filesystem's mon_event_all[] array.
Remove rdt_mon_features and its remaining uses.
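The replacement of the global bitmask by per-event flags can be sketched in user space as follows. This is a minimal sketch, not the kernel code: struct mon_evt is cut down to two fields, the numeric event values are illustrative, and the three bool parameters of the hypothetical enable_l3_mon_events() stand in for the rdt_cpu_has(X86_FEATURE_CQM_*) checks in get_rdt_mon_resources().

```c
#include <assert.h>
#include <stdbool.h>

/* Event names from the patch; the numeric values are illustrative. */
enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID = 1,
	QOS_L3_MBM_TOTAL_EVENT_ID = 2,
	QOS_L3_MBM_LOCAL_EVENT_ID = 3,
	QOS_NUM_EVENTS,
};

/* Simplified stand-in for the filesystem's struct mon_evt. */
struct mon_evt {
	const char *name;
	bool enabled;
};

/* One entry per event, indexed directly by enum resctrl_event_id. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
	[QOS_L3_OCCUP_EVENT_ID]     = { .name = "llc_occupancy" },
	[QOS_L3_MBM_TOTAL_EVENT_ID] = { .name = "mbm_total_bytes" },
	[QOS_L3_MBM_LOCAL_EVENT_ID] = { .name = "mbm_local_bytes" },
};

static bool event_id_valid(enum resctrl_event_id eventid)
{
	return eventid >= QOS_L3_OCCUP_EVENT_ID && eventid < QOS_NUM_EVENTS;
}

static void resctrl_enable_mon_event(enum resctrl_event_id eventid)
{
	assert(event_id_valid(eventid));
	mon_event_all[eventid].enabled = true;
}

static bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
{
	return event_id_valid(eventid) && mon_event_all[eventid].enabled;
}

/*
 * Same shape as get_rdt_mon_resources() after the patch: the local "ret"
 * replaces the old "if (!rdt_mon_features)" test.
 */
static bool enable_l3_mon_events(bool has_occup, bool has_total, bool has_local)
{
	bool ret = false;

	if (has_occup) {
		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
		ret = true;
	}
	if (has_total) {
		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
		ret = true;
	}
	if (has_local) {
		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
		ret = true;
	}
	return ret;
}
```

With this layout, "is any event enabled?" becomes a local decision at enumeration time, and "is this event enabled?" is a lookup in the one array that already describes every event.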
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
arch/x86/include/asm/resctrl.h | 1 -
arch/x86/kernel/cpu/resctrl/core.c | 9 +++++----
arch/x86/kernel/cpu/resctrl/monitor.c | 5 -----
3 files changed, 5 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index b1dd5d6b87db..575f8408a9e7 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -44,7 +44,6 @@ DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
extern bool rdt_alloc_capable;
extern bool rdt_mon_capable;
-extern unsigned int rdt_mon_features;
DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1a319ce9328c..5d14f9a14eda 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -863,21 +863,22 @@ static __init bool get_rdt_alloc_resources(void)
static __init bool get_rdt_mon_resources(void)
{
struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ bool ret = false;
if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
- rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
+ ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
- rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID);
+ ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
- rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID);
+ ret = true;
}
- if (!rdt_mon_features)
+ if (!ret)
return false;
return !rdt_get_mon_l3_config(r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 61d38517e2bf..07f8ab097cbe 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -31,11 +31,6 @@
*/
bool rdt_mon_capable;
-/*
- * Global to indicate which monitoring events are enabled.
- */
-unsigned int rdt_mon_features;
-
#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))
static int snc_nodes_per_l3_cache = 1;
--
2.50.0
* [PATCH v7 04/31] x86,fs/resctrl: Prepare for more monitor events
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (2 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 03/31] x86/resctrl: Remove 'rdt_mon_features' global variable Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-11 23:53 ` [PATCH v7 05/31] x86,fs/resctrl: Improve domain type checking Tony Luck
` (27 subsequent siblings)
31 siblings, 0 replies; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
A rule of thumb in programming is that things occur zero times, once,
or many times, and code should be written accordingly.
There are two MBM events, and resctrl is coded with a lot of:
    if (local)
        do one thing
    if (total)
        do a different thing
Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays
of pointers to per-event data instead of explicit fields for total and
local bandwidth.
Simplify the code by looping over whichever events are enabled.
Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
widely. Also provide for_each_mbm_event_id() and for_each_mbm_idx()
helper macros.
Clean up variable names in the functions touched so that variables of
type enum resctrl_event_id are consistently named "eventid".
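The array-of-pointers layout and the loop-based allocation can be sketched in user space as below. This is a minimal sketch under stated assumptions: the QOS_NUM_L3_MBM_EVENTS, MBM_STATE_IDX() and iterator macros are copied from the patch, while struct mon_domain, the one-field struct mbm_state, the event_enabled[] table (standing in for resctrl_is_mon_event_enabled()) and the mbm_states_alloc()/mbm_states_free() helpers are hypothetical simplifications, with calloc()/free() in place of kcalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID = 1,
	QOS_L3_MBM_TOTAL_EVENT_ID = 2,
	QOS_L3_MBM_LOCAL_EVENT_ID = 3,
	QOS_NUM_EVENTS,
};

/* Macros as added by the patch. */
#define QOS_NUM_L3_MBM_EVENTS (QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
#define MBM_STATE_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)

#define for_each_mbm_event_id(eventid) \
	for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
	     eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)

#define for_each_mbm_idx(idx) \
	for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)

static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
{
	return eventid >= QOS_L3_MBM_TOTAL_EVENT_ID &&
	       eventid <= QOS_L3_MBM_LOCAL_EVENT_ID;
}

/* Simplified per-RMID state; the real struct mbm_state has more fields. */
struct mbm_state {
	uint64_t prev_bw_bytes;
};

/* Only the array added by the patch; the rest of the domain is omitted. */
struct mon_domain {
	struct mbm_state *mbm_states[QOS_NUM_L3_MBM_EVENTS];
};

/* Hypothetical stand-in for resctrl_is_mon_event_enabled(). */
static bool event_enabled[QOS_NUM_EVENTS];

static void mbm_states_free(struct mon_domain *d)
{
	int idx;

	/* free(NULL) is a no-op, so disabled events need no special case. */
	for_each_mbm_idx(idx) {
		free(d->mbm_states[idx]);
		d->mbm_states[idx] = NULL;
	}
}

static int mbm_states_alloc(struct mon_domain *d, size_t num_rmid)
{
	enum resctrl_event_id eventid;
	int idx;

	for_each_mbm_event_id(eventid) {
		if (!event_enabled[eventid])
			continue;
		idx = MBM_STATE_IDX(eventid);
		d->mbm_states[idx] = calloc(num_rmid, sizeof(*d->mbm_states[idx]));
		if (!d->mbm_states[idx]) {
			/* Undo every allocation made so far. */
			mbm_states_free(d);
			return -ENOMEM;
		}
	}
	return 0;
}

/* Returns NULL for non-MBM events and for MBM events that are disabled. */
static struct mbm_state *get_mbm_state(struct mon_domain *d, uint32_t rmid,
				       enum resctrl_event_id eventid)
{
	struct mbm_state *state;

	if (!resctrl_is_mbm_event(eventid))
		return NULL;
	state = d->mbm_states[MBM_STATE_IDX(eventid)];
	return state ? &state[rmid] : NULL;
}
```

Because the error path frees every slot unconditionally, adding a third MBM event would only mean extending the enum range; none of the allocation, lookup or cleanup code would change.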
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
---
include/linux/resctrl.h | 23 +++++++++---
include/linux/resctrl_types.h | 3 ++
arch/x86/kernel/cpu/resctrl/internal.h | 8 ++---
arch/x86/kernel/cpu/resctrl/core.c | 40 +++++++++++----------
arch/x86/kernel/cpu/resctrl/monitor.c | 36 +++++++++----------
fs/resctrl/monitor.c | 13 ++++---
fs/resctrl/rdtgroup.c | 50 +++++++++++++-------------
7 files changed, 96 insertions(+), 77 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 40aba6b5d4f0..478d7a935ca3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -161,8 +161,9 @@ struct rdt_ctrl_domain {
* @hdr: common header for different domain types
* @ci_id: cache info id for this domain
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
- * @mbm_total: saved state for MBM total bandwidth
- * @mbm_local: saved state for MBM local bandwidth
+ * @mbm_states: Per-event pointer to the MBM event's saved state.
+ * An MBM event's state is an array of struct mbm_state
+ * indexed by RMID on x86 or combined CLOSID, RMID on Arm.
* @mbm_over: worker to periodically read MBM h/w counters
* @cqm_limbo: worker to periodically read CQM h/w counters
* @mbm_work_cpu: worker CPU for MBM h/w counters
@@ -172,8 +173,7 @@ struct rdt_mon_domain {
struct rdt_domain_hdr hdr;
unsigned int ci_id;
unsigned long *rmid_busy_llc;
- struct mbm_state *mbm_total;
- struct mbm_state *mbm_local;
+ struct mbm_state *mbm_states[QOS_NUM_L3_MBM_EVENTS];
struct delayed_work mbm_over;
struct delayed_work cqm_limbo;
int mbm_work_cpu;
@@ -376,6 +376,21 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
+static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
+{
+ return (eventid >= QOS_L3_MBM_TOTAL_EVENT_ID &&
+ eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
+/* Iterate over all memory bandwidth events */
+#define for_each_mbm_event_id(eventid) \
+ for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID; \
+ eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
+
+/* Iterate over memory bandwidth arrays in domain structures */
+#define for_each_mbm_idx(idx) \
+ for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)
+
/**
* resctrl_arch_mon_event_config_write() - Write the config for an event.
* @config_info: struct resctrl_mon_config_info describing the resource, domain
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index 2dadbc54e4b3..d98351663c2c 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -51,4 +51,7 @@ enum resctrl_event_id {
QOS_NUM_EVENTS,
};
+#define QOS_NUM_L3_MBM_EVENTS (QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
+#define MBM_STATE_IDX(evt) ((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
+
#endif /* __LINUX_RESCTRL_TYPES_H */
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5e3c41b36437..58dca892a5df 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -54,15 +54,15 @@ struct rdt_hw_ctrl_domain {
* struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
* a resource for a monitor function
* @d_resctrl: Properties exposed to the resctrl file system
- * @arch_mbm_total: arch private state for MBM total bandwidth
- * @arch_mbm_local: arch private state for MBM local bandwidth
+ * @arch_mbm_states: Per-event pointer to the MBM event's saved state.
+ * An MBM event's state is an array of struct arch_mbm_state
+ * indexed by RMID on x86.
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
struct rdt_hw_mon_domain {
struct rdt_mon_domain d_resctrl;
- struct arch_mbm_state *arch_mbm_total;
- struct arch_mbm_state *arch_mbm_local;
+ struct arch_mbm_state *arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
};
static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5d14f9a14eda..fbf019c1ff11 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -365,8 +365,10 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
{
- kfree(hw_dom->arch_mbm_total);
- kfree(hw_dom->arch_mbm_local);
+ int idx;
+
+ for_each_mbm_idx(idx)
+ kfree(hw_dom->arch_mbm_states[idx]);
kfree(hw_dom);
}
@@ -400,25 +402,27 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
*/
static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
{
- size_t tsize;
-
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
- tsize = sizeof(*hw_dom->arch_mbm_total);
- hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
- if (!hw_dom->arch_mbm_total)
- return -ENOMEM;
- }
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
- tsize = sizeof(*hw_dom->arch_mbm_local);
- hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
- if (!hw_dom->arch_mbm_local) {
- kfree(hw_dom->arch_mbm_total);
- hw_dom->arch_mbm_total = NULL;
- return -ENOMEM;
- }
+ size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
+ enum resctrl_event_id eventid;
+ int idx;
+
+ for_each_mbm_event_id(eventid) {
+ if (!resctrl_is_mon_event_enabled(eventid))
+ continue;
+ idx = MBM_STATE_IDX(eventid);
+ hw_dom->arch_mbm_states[idx] = kcalloc(num_rmid, tsize, GFP_KERNEL);
+ if (!hw_dom->arch_mbm_states[idx])
+ goto cleanup;
}
return 0;
+cleanup:
+ for_each_mbm_idx(idx) {
+ kfree(hw_dom->arch_mbm_states[idx]);
+ hw_dom->arch_mbm_states[idx] = NULL;
+ }
+
+ return -ENOMEM;
}
static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 07f8ab097cbe..f01db2034d08 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -161,18 +161,14 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
u32 rmid,
enum resctrl_event_id eventid)
{
- switch (eventid) {
- case QOS_L3_OCCUP_EVENT_ID:
- return NULL;
- case QOS_L3_MBM_TOTAL_EVENT_ID:
- return &hw_dom->arch_mbm_total[rmid];
- case QOS_L3_MBM_LOCAL_EVENT_ID:
- return &hw_dom->arch_mbm_local[rmid];
- default:
- /* Never expect to get here */
- WARN_ON_ONCE(1);
+ struct arch_mbm_state *state;
+
+ if (!resctrl_is_mbm_event(eventid))
return NULL;
- }
+
+ state = hw_dom->arch_mbm_states[MBM_STATE_IDX(eventid)];
+
+ return state ? &state[rmid] : NULL;
}
void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
@@ -201,14 +197,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
{
struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
-
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
- memset(hw_dom->arch_mbm_total, 0,
- sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
-
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
- memset(hw_dom->arch_mbm_local, 0,
- sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+ enum resctrl_event_id eventid;
+ int idx;
+
+ for_each_mbm_event_id(eventid) {
+ if (!resctrl_is_mon_event_enabled(eventid))
+ continue;
+ idx = MBM_STATE_IDX(eventid);
+ memset(hw_dom->arch_mbm_states[idx], 0,
+ sizeof(*hw_dom->arch_mbm_states[0]) * r->num_rmid);
+ }
}
static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9e988b2c1a22..dcc6c00eb362 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -346,15 +346,14 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
u32 rmid, enum resctrl_event_id evtid)
{
u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+ struct mbm_state *state;
- switch (evtid) {
- case QOS_L3_MBM_TOTAL_EVENT_ID:
- return &d->mbm_total[idx];
- case QOS_L3_MBM_LOCAL_EVENT_ID:
- return &d->mbm_local[idx];
- default:
+ if (!resctrl_is_mbm_event(evtid))
return NULL;
- }
+
+ state = d->mbm_states[MBM_STATE_IDX(evtid)];
+
+ return state ? &state[idx] : NULL;
}
static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a7eeb33501da..77336d5e4915 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -127,12 +127,6 @@ static bool resctrl_is_mbm_enabled(void)
resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID));
}
-static bool resctrl_is_mbm_event(int e)
-{
- return (e >= QOS_L3_MBM_TOTAL_EVENT_ID &&
- e <= QOS_L3_MBM_LOCAL_EVENT_ID);
-}
-
/*
* Trivial allocator for CLOSIDs. Use BITMAP APIs to manipulate a bitmap
* of free CLOSIDs.
@@ -4023,9 +4017,13 @@ static void rdtgroup_setup_default(void)
static void domain_destroy_mon_state(struct rdt_mon_domain *d)
{
+ int idx;
+
bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- kfree(d->mbm_local);
+ for_each_mbm_idx(idx) {
+ kfree(d->mbm_states[idx]);
+ d->mbm_states[idx] = NULL;
+ }
}
void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
@@ -4085,32 +4083,34 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
- size_t tsize;
+ size_t tsize = sizeof(*d->mbm_states[0]);
+ enum resctrl_event_id eventid;
+ int idx;
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
if (!d->rmid_busy_llc)
return -ENOMEM;
}
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
- tsize = sizeof(*d->mbm_total);
- d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
- if (!d->mbm_total) {
- bitmap_free(d->rmid_busy_llc);
- return -ENOMEM;
- }
- }
- if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
- tsize = sizeof(*d->mbm_local);
- d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
- if (!d->mbm_local) {
- bitmap_free(d->rmid_busy_llc);
- kfree(d->mbm_total);
- return -ENOMEM;
- }
+
+ for_each_mbm_event_id(eventid) {
+ if (!resctrl_is_mon_event_enabled(eventid))
+ continue;
+ idx = MBM_STATE_IDX(eventid);
+ d->mbm_states[idx] = kcalloc(idx_limit, tsize, GFP_KERNEL);
+ if (!d->mbm_states[idx])
+ goto cleanup;
}
return 0;
+cleanup:
+ bitmap_free(d->rmid_busy_llc);
+ for_each_mbm_idx(idx) {
+ kfree(d->mbm_states[idx]);
+ d->mbm_states[idx] = NULL;
+ }
+
+ return -ENOMEM;
}
int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
--
2.50.0
* [PATCH v7 05/31] x86,fs/resctrl: Improve domain type checking
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (3 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 04/31] x86,fs/resctrl: Prepare for more monitor events Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:17 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 06/31] x86/resctrl: Move L3 initialization into new helper function Tony Luck
` (26 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The rdt_domain_hdr structure is embedded in both control and monitor
domain structures to provide common methods for operations such as
adding a CPU to a domain, removing a CPU from a domain, and accessing
the mask of all CPUs in a domain.
The "type" field provides a simple check of whether a domain is a
control or a monitor domain, so that programming errors that operate
on the wrong kind of domain are caught quickly.
To prepare for additional domain types that depend on the rdt_resource
to which they are connected, add the resource id to the header and a
domain_header_is_valid() helper that checks it in addition to the type.
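The effect of the extra check can be sketched in user space as follows. This is a minimal sketch, assuming a cut-down rdt_domain_hdr without the list and CPU mask, a two-entry resource enum (RDT_RESOURCE_L2 serves only as an example of a second resource), and a hypothetical warn_on_once() helper that imitates WARN_ON_ONCE() by reporting only the first failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum resctrl_domain_type { RESCTRL_CTRL_DOMAIN, RESCTRL_MON_DOMAIN };

/* Simplified resource list for this sketch. */
enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_L2 };

/* Cut-down header: list linkage and cpu_mask omitted. */
struct rdt_domain_hdr {
	int id;
	enum resctrl_domain_type type;
	enum resctrl_res_level rid;
};

/* User-space imitation of WARN_ON_ONCE(): returns cond, reports once. */
static bool warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		fprintf(stderr, "WARNING: domain header mismatch\n");
		warned = true;
	}
	return cond;
}

/* Same logic as the helper added to <linux/resctrl.h>. */
static bool domain_header_is_valid(const struct rdt_domain_hdr *hdr,
				   enum resctrl_domain_type type,
				   enum resctrl_res_level rid)
{
	return !warn_on_once(hdr->type != type || hdr->rid != rid);
}
```

A monitor-domain header that belongs to a different resource now fails the check, whereas the previous "hdr->type != RESCTRL_MON_DOMAIN" test alone would have accepted it.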
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 9 +++++++++
arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
fs/resctrl/ctrlmondata.c | 2 +-
3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 478d7a935ca3..091135eca2b8 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -131,15 +131,24 @@ enum resctrl_domain_type {
* @list: all instances of this resource
* @id: unique id for this instance
* @type: type of this instance
+ * @rid: resource id for this instance
* @cpu_mask: which CPUs share this resource
*/
struct rdt_domain_hdr {
struct list_head list;
int id;
enum resctrl_domain_type type;
+ enum resctrl_res_level rid;
struct cpumask cpu_mask;
};
+static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
+ enum resctrl_domain_type type,
+ enum resctrl_res_level rid)
+{
+ return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
+}
+
/**
* struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
* @hdr: common header for different domain types
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index fbf019c1ff11..420e4eb7c160 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -459,7 +459,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
if (hdr) {
- if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
return;
d = container_of(hdr, struct rdt_ctrl_domain, hdr);
@@ -476,6 +476,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
d = &hw_dom->d_resctrl;
d->hdr.id = id;
d->hdr.type = RESCTRL_CTRL_DOMAIN;
+ d->hdr.rid = r->rid;
cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
rdt_domain_reconfigure_cdp(r);
@@ -515,7 +516,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
if (hdr) {
- if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
return;
d = container_of(hdr, struct rdt_mon_domain, hdr);
@@ -530,6 +531,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
d = &hw_dom->d_resctrl;
d->hdr.id = id;
d->hdr.type = RESCTRL_MON_DOMAIN;
+ d->hdr.rid = r->rid;
ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
if (!ci) {
pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
@@ -586,7 +588,7 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
return;
}
- if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+ if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
return;
d = container_of(hdr, struct rdt_ctrl_domain, hdr);
@@ -632,7 +634,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
return;
}
- if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
return;
d = container_of(hdr, struct rdt_mon_domain, hdr);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index ad7ffc6acf13..a7d60e74a29d 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -643,7 +643,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
* the resource to find the domain with "domid".
*/
hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
- if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+ if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, resid)) {
ret = -ENOENT;
goto out;
}
--
2.50.0
* [PATCH v7 06/31] x86/resctrl: Move L3 initialization into new helper function
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (4 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 05/31] x86,fs/resctrl: Improve domain type checking Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:21 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 07/31] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
` (25 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
To prepare for additional types of monitoring domains, move the open-coded
L3 resource monitoring domain initialization from domain_add_cpu_mon() into
a new helper function, l3_mon_domain_setup(), called by domain_add_cpu_mon().
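The resulting split between the generic lookup and the per-resource setup can be sketched in user space. This is a minimal sketch with hypothetical simplified types: a singly linked list replaces list_head, an unsigned long replaces struct cpumask, and domain_add_cpu_mon() returns a bool (where the patch returns void and uses WARN_ON_ONCE(1)) so that the outcome is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* RDT_RESOURCE_L2 is used only as an example of a non-L3 resource. */
enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_L2 };

/* Simplified generic header. */
struct domain_hdr {
	struct domain_hdr *next;
	int id;
	unsigned long cpu_mask;	/* one bit per CPU, enough for a sketch */
};

/* Hypothetical L3 monitor domain embedding the generic header. */
struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned long *rmid_busy_llc;	/* stands in for L3-only state */
};

struct resource {
	enum resctrl_res_level rid;
	struct domain_hdr *mon_domains;
};

static struct domain_hdr *find_domain(struct resource *r, int id)
{
	for (struct domain_hdr *h = r->mon_domains; h; h = h->next)
		if (h->id == id)
			return h;
	return NULL;
}

/* Resource-specific step: allocate and link a brand-new L3 domain. */
static bool l3_mon_domain_setup(int cpu, int id, struct resource *r)
{
	struct l3_mon_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return false;
	d->hdr.id = id;
	d->hdr.cpu_mask = 1UL << cpu;
	d->hdr.next = r->mon_domains;
	r->mon_domains = &d->hdr;
	return true;
}

/*
 * Generic step: a CPU joining an existing domain only touches the header,
 * whatever the resource. Only a new domain dispatches on the resource id.
 */
static bool domain_add_cpu_mon(int cpu, int id, struct resource *r)
{
	struct domain_hdr *hdr = find_domain(r, id);

	if (hdr) {
		hdr->cpu_mask |= 1UL << cpu;
		return true;
	}

	switch (r->rid) {
	case RDT_RESOURCE_L3:
		return l3_mon_domain_setup(cpu, id, r);
	default:
		return false;	/* the patch uses WARN_ON_ONCE(1) here */
	}
}
```

Supporting a new resource would then mean adding one case and one setup helper, while the "add CPU to an existing domain" path stays shared.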
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 55 ++++++++++++++++++------------
1 file changed, 33 insertions(+), 22 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 420e4eb7c160..20b6f2bbf858 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -496,34 +496,13 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
}
}
-static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
{
- int id = get_domain_id_from_scope(cpu, r->mon_scope);
- struct list_head *add_pos = NULL;
struct rdt_hw_mon_domain *hw_dom;
- struct rdt_domain_hdr *hdr;
struct rdt_mon_domain *d;
struct cacheinfo *ci;
int err;
- lockdep_assert_held(&domain_list_lock);
-
- if (id < 0) {
- pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
- cpu, r->mon_scope, r->name);
- return;
- }
-
- hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
- if (hdr) {
- if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
- return;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
-
- cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
- return;
- }
-
hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
if (!hw_dom)
return;
@@ -558,6 +537,38 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
}
}
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+ int id = get_domain_id_from_scope(cpu, r->mon_scope);
+ struct list_head *add_pos = NULL;
+ struct rdt_domain_hdr *hdr;
+
+ lockdep_assert_held(&domain_list_lock);
+
+ if (id < 0) {
+ pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+ cpu, r->mon_scope, r->name);
+ return;
+ }
+
+ hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
+ if (hdr) {
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
+ return;
+ cpumask_set_cpu(cpu, &hdr->cpu_mask);
+
+ return;
+ }
+
+ switch (r->rid) {
+ case RDT_RESOURCE_L3:
+ l3_mon_domain_setup(cpu, id, r, add_pos);
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+
static void domain_add_cpu(int cpu, struct rdt_resource *r)
{
if (r->alloc_capable)
--
2.50.0
* [PATCH v7 07/31] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (5 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 06/31] x86/resctrl: Move L3 initialization into new helper function Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:29 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 08/31] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
` (24 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Historically, all monitoring events have been associated with the L3
resource. This will change when support for telemetry events is added.
The RDT_RESOURCE_L3 resource carries a lot of state in its domain
structures that must be cleaned up when a domain is taken offline
because its last CPU has been removed.
Refactor domain_remove_cpu_mon() to separate all the L3-specific
processing from the generic actions of clearing the CPU bit in the
domain's mask and removing directories from mon_data.
resctrl_offline_mon_domain() still needs to remove domain-specific
directories and files from the "mon_data" directories, but can skip the
L3-specific cleanup when called for other resource types.
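The reordered removal path (clear the CPU bit, return early while the domain is still in use, then run the per-resource teardown) can be sketched in user space. This is a minimal sketch with hypothetical simplified types: an unsigned long replaces struct cpumask, rmid_busy_llc stands in for all of the L3-only state, and an l3_teardowns counter exists only so the behavior can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Same definition shape as the kernel's container_of(), minus type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_L2 };

struct domain_hdr {
	int id;
	unsigned long cpu_mask;	/* one bit per CPU */
};

/* Hypothetical L3 domain; rmid_busy_llc stands in for all L3-only state. */
struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned long *rmid_busy_llc;
};

static int l3_teardowns;	/* counts L3 teardowns so tests can observe them */

static void l3_mon_domain_offline(struct l3_mon_domain *d)
{
	free(d->rmid_busy_llc);
	free(d);
	l3_teardowns++;
}

/*
 * Returns true if the domain was torn down. Clearing the CPU bit and the
 * early return are generic; only the teardown is resource-specific.
 */
static bool domain_remove_cpu_mon(int cpu, enum resctrl_res_level rid,
				  struct domain_hdr *hdr)
{
	hdr->cpu_mask &= ~(1UL << cpu);
	if (hdr->cpu_mask)
		return false;	/* other CPUs still use this domain */

	switch (rid) {
	case RDT_RESOURCE_L3:
		l3_mon_domain_offline(container_of(hdr, struct l3_mon_domain, hdr));
		return true;
	default:
		/* the patch issues pr_warn_once() for unknown resources */
		return false;
	}
}
```

Only the generic header is needed until the last CPU leaves, so the conversion to the L3-specific structure happens inside the L3 case, which is the ordering the refactored domain_remove_cpu_mon() uses.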
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 17 +++++++++++------
fs/resctrl/rdtgroup.c | 5 ++++-
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 20b6f2bbf858..49e17c246c60 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -648,17 +648,22 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
return;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
- hw_dom = resctrl_to_arch_mon_dom(d);
+ cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+ if (!cpumask_empty(&hdr->cpu_mask))
+ return;
- cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
- if (cpumask_empty(&d->hdr.cpu_mask)) {
+ switch (r->rid) {
+ case RDT_RESOURCE_L3:
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ hw_dom = resctrl_to_arch_mon_dom(d);
resctrl_offline_mon_domain(r, d);
list_del_rcu(&d->hdr.list);
synchronize_rcu();
mon_domain_free(hw_dom);
-
- return;
+ break;
+ default:
+ pr_warn_once("Unknown resource rid=%d\n", r->rid);
+ break;
}
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 77336d5e4915..05438e15e2ca 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4047,6 +4047,9 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
if (resctrl_mounted && resctrl_arch_mon_capable())
rmdir_mondata_subdir_allrdtgrp(r, d);
+ if (r->rid != RDT_RESOURCE_L3)
+ goto out_unlock;
+
if (resctrl_is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4063,7 +4066,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
}
domain_destroy_mon_state(d);
-
+out_unlock:
mutex_unlock(&rdtgroup_mutex);
}
--
2.50.0
* [PATCH v7 08/31] x86/resctrl: Clean up domain_remove_cpu_ctrl()
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (6 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 07/31] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:22 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 09/31] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
` (23 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
For symmetry with domain_remove_cpu_mon(), refactor domain_remove_cpu_ctrl()
to return early when removing a CPU does not empty the domain.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 49e17c246c60..0c5ada54bb20 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -602,25 +602,24 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
return;
+ cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+ if (!cpumask_empty(&hdr->cpu_mask))
+ return;
+
d = container_of(hdr, struct rdt_ctrl_domain, hdr);
hw_dom = resctrl_to_arch_ctrl_dom(d);
- cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
- if (cpumask_empty(&d->hdr.cpu_mask)) {
- resctrl_offline_ctrl_domain(r, d);
- list_del_rcu(&d->hdr.list);
- synchronize_rcu();
-
- /*
- * rdt_ctrl_domain "d" is going to be freed below, so clear
- * its pointer from pseudo_lock_region struct.
- */
- if (d->plr)
- d->plr->d = NULL;
- ctrl_domain_free(hw_dom);
+ resctrl_offline_ctrl_domain(r, d);
+ list_del_rcu(&d->hdr.list);
+ synchronize_rcu();
- return;
- }
+ /*
+ * rdt_ctrl_domain "d" is going to be freed below, so clear
+ * its pointer from pseudo_lock_region struct.
+ */
+ if (d->plr)
+ d->plr->d = NULL;
+ ctrl_domain_free(hw_dom);
}
static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
--
2.50.0
* [PATCH v7 09/31] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (7 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 08/31] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:25 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 10/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
` (22 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Historically all monitoring events have been associated with
the L3 resource, so it made sense to use the L3-specific "struct
rdt_mon_domain *" argument for functions that manipulate domains. The
addition of monitor events tied to other resources breaks this
assumption.
To enable enumeration of domains for events in other resources, change
the calling sequence for domain addition and deletion to use the generic
struct rdt_domain_hdr, preserving as much common code as possible.
Make the same change to allow reading events in other resources. Here
the code flow passes from mon_event_read() via smp_call*() to
__mon_event_count(), so replace the rmid_read::d field with a new
rmid_read::hdr field.
The mon_data structure is unchanged, but documentation is updated
to note that mon_data::sum is only used for RDT_RESOURCE_L3.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
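A minimal userspace sketch of the conversion pattern described in the commit message: generic code passes only the header, and code that needs L3-only fields first validates the header's type and resource id, then steps back to the enclosing structure with container_of(). The structures, enums and helpers here (domain_hdr, l3_mon_domain, header_is_valid(), l3_domain_ci_id()) are simplified, hypothetical stand-ins for rdt_domain_hdr, rdt_mon_domain and domain_header_is_valid(). They are not the kernel definitions, and container_of() is reduced to plain offsetof() arithmetic without the kernel's type checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down versions of the resctrl enums. */
enum domain_type { CTRL_DOMAIN, MON_DOMAIN };
enum res_level { RES_L3, RES_OTHER };

/* Generic header embedded in every domain (cf. struct rdt_domain_hdr). */
struct domain_hdr {
	enum domain_type type;
	enum res_level rid;
	int id;
};

/* L3-specific monitor domain (cf. struct rdt_mon_domain). */
struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned int ci_id;
};

/* Check both domain type and owning resource (cf. domain_header_is_valid()). */
static int header_is_valid(const struct domain_hdr *hdr,
			   enum domain_type type, enum res_level rid)
{
	return hdr->type == type && hdr->rid == rid;
}

/*
 * Generic entry point that receives only the header. The header is
 * validated before container_of() is used to reach the L3 fields.
 */
static int l3_domain_ci_id(struct domain_hdr *hdr, unsigned int *ci_id)
{
	struct l3_mon_domain *d;

	assert(hdr != NULL);
	if (!header_is_valid(hdr, MON_DOMAIN, RES_L3))
		return -EINVAL;

	d = container_of(hdr, struct l3_mon_domain, hdr);
	*ci_id = d->ci_id;
	return 0;
}
```

With the header of an L3 monitor domain, l3_domain_ci_id() returns 0 and fills in ci_id; a header belonging to any other resource is rejected with -EINVAL before any conversion takes place, which is the same ordering the patch uses in resctrl_arch_rmid_read().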
include/linux/resctrl.h | 8 +-
fs/resctrl/internal.h | 14 ++--
arch/x86/kernel/cpu/resctrl/core.c | 4 +-
arch/x86/kernel/cpu/resctrl/monitor.c | 18 ++++-
fs/resctrl/ctrlmondata.c | 14 ++--
fs/resctrl/monitor.c | 31 +++++---
fs/resctrl/rdtgroup.c | 103 ++++++++++++++++++--------
7 files changed, 129 insertions(+), 63 deletions(-)
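The new rmid_read::hdr semantics can be sketched the same way. A non-NULL header reads exactly one domain. A NULL header (the SNC summing case, valid only for L3) sums every domain whose ci_id matches and succeeds if at least one read worked, as __mon_event_count() does. Everything below is a hypothetical simplification: read_one() stands in for resctrl_arch_rmid_read(), which in this patch also takes a header, and the count and read_fails fields only simulate hardware counters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum res_level { RES_L3, RES_OTHER };

struct domain_hdr {
	enum res_level rid;
	int id;
};

struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned int ci_id;
	uint64_t count;   /* hypothetical: current counter value */
	int read_fails;   /* hypothetical: simulate a failing hardware read */
};

/* Hypothetical stand-in for resctrl_arch_rmid_read(): takes only the header. */
static int read_one(struct domain_hdr *hdr, uint64_t *val)
{
	struct l3_mon_domain *d;

	if (hdr->rid != RES_L3)
		return -EINVAL;
	d = container_of(hdr, struct l3_mon_domain, hdr);
	if (d->read_fails)
		return -EIO;
	*val = d->count;
	return 0;
}

/*
 * hdr != NULL: read exactly that domain.
 * hdr == NULL: sum all domains sharing ci_id; succeed if any read worked.
 */
static int event_count(struct domain_hdr *hdr, struct l3_mon_domain *doms,
		       size_t ndoms, unsigned int ci_id, uint64_t *val)
{
	int ret = -EINVAL;
	uint64_t tval = 0;
	size_t i;

	assert(val != NULL);
	*val = 0;

	if (hdr) {
		ret = read_one(hdr, &tval);
		if (!ret)
			*val = tval;
		return ret;
	}

	for (i = 0; i < ndoms; i++) {
		if (doms[i].ci_id != ci_id)
			continue;
		/* Callers pass &d->hdr rather than the L3 domain itself. */
		if (!read_one(&doms[i].hdr, &tval)) {
			*val += tval;
			ret = 0;
		}
	}
	return ret;
}
```

Treating a partial failure as success, as long as at least one domain sharing the cache was read, mirrors the existing summing loop. Only a sum in which no matching domain could be read reports an error.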
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 091135eca2b8..c8200626b91a 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -452,9 +452,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
u32 closid, enum resctrl_conf_type type);
int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
void resctrl_online_cpu(unsigned int cpu);
void resctrl_offline_cpu(unsigned int cpu);
@@ -462,7 +462,7 @@ void resctrl_offline_cpu(unsigned int cpu);
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
* for this resource and domain.
* @r: resource that the counter should be read from.
- * @d: domain that the counter should be read from.
+ * @hdr: Header of domain that the counter should be read from.
* @closid: closid that matches the rmid. Depending on the architecture, the
* counter may match traffic of both @closid and @rmid, or @rmid
* only.
@@ -483,7 +483,7 @@ void resctrl_offline_cpu(unsigned int cpu);
* Return:
* 0 on success, or -EIO, -EINVAL etc on error.
*/
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
u32 closid, u32 rmid, enum resctrl_event_id eventid,
u64 *val, void *arch_mon_ctx);
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 4f315b7e9ec0..b19e974b7865 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -77,8 +77,8 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
* @list: Member of the global @mon_data_kn_priv_list list.
* @rid: Resource id associated with the event file.
* @evtid: Event id associated with the event file.
- * @sum: Set when event must be summed across multiple
- * domains.
+ * @sum: Set for RDT_RESOURCE_L3 when event must be summed
+ * across multiple domains.
* @domid: When @sum is zero this is the domain to which
* the event file belongs. When @sum is one this
* is the id of the L3 cache that all domains to be
@@ -101,22 +101,22 @@ struct mon_data {
* resource group then its event count is summed with the count from all
* its child resource groups.
* @r: Resource describing the properties of the event being read.
- * @d: Domain that the counter should be read from. If NULL then sum all
+ * @hdr: Header of domain that the counter should be read from. If NULL then sum all
* domains in @r sharing L3 @ci.id
* @evtid: Which monitor event to read.
* @first: Initialize MBM counter when true.
- * @ci_id: Cacheinfo id for L3. Only set when @d is NULL. Used when summing domains.
+ * @ci_id: Cacheinfo id for L3. Only set when @hdr is NULL. Used when summing domains.
* @err: Error encountered when reading counter.
* @val: Returned value of event counter. If @rgrp is a parent resource group,
* @val includes the sum of event counts from its child resource groups.
- * If @d is NULL, @val includes the sum of all domains in @r sharing @ci.id,
+ * If @hdr is NULL, @val includes the sum of all domains in @r sharing @ci.id,
* (summed across child resource groups if @rgrp is a parent resource group).
* @arch_mon_ctx: Hardware monitor allocated for this read request (MPAM only).
*/
struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
- struct rdt_mon_domain *d;
+ struct rdt_domain_hdr *hdr;
enum resctrl_event_id evtid;
bool first;
unsigned int ci_id;
@@ -352,7 +352,7 @@ void mon_event_count(void *info);
int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
cpumask_t *cpumask, int evtid, int first);
int resctrl_mon_resource_init(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 0c5ada54bb20..0bf793579b9a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -529,7 +529,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
list_add_tail_rcu(&d->hdr.list, add_pos);
- err = resctrl_online_mon_domain(r, d);
+ err = resctrl_online_mon_domain(r, &d->hdr);
if (err) {
list_del_rcu(&d->hdr.list);
synchronize_rcu();
@@ -655,7 +655,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
case RDT_RESOURCE_L3:
d = container_of(hdr, struct rdt_mon_domain, hdr);
hw_dom = resctrl_to_arch_mon_dom(d);
- resctrl_offline_mon_domain(r, d);
+ resctrl_offline_mon_domain(r, hdr);
list_del_rcu(&d->hdr.list);
synchronize_rcu();
mon_domain_free(hw_dom);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f01db2034d08..b31794c5dcd4 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -217,20 +217,30 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
return chunks >> shift;
}
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
u32 unused, u32 rmid, enum resctrl_event_id eventid,
u64 *val, void *ignored)
{
- struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
- struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
- int cpu = cpumask_any(&d->hdr.cpu_mask);
+ int cpu = cpumask_any(&hdr->cpu_mask);
+ struct rdt_hw_mon_domain *hw_dom;
+ struct rdt_hw_resource *hw_res;
struct arch_mbm_state *am;
+ struct rdt_mon_domain *d;
u64 msr_val, chunks;
u32 prmid;
int ret;
resctrl_arch_rmid_read_context_check();
+ if (r->rid != RDT_RESOURCE_L3)
+ return -EINVAL;
+
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+ return -EINVAL;
+
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ hw_dom = resctrl_to_arch_mon_dom(d);
+ hw_res = resctrl_to_arch_res(r);
prmid = logical_rmid_to_physical_rmid(cpu, rmid);
ret = __rmid_read_phys(prmid, eventid, &msr_val);
if (ret)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index a7d60e74a29d..1c1c0e7bbc11 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -547,7 +547,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
}
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
- struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+ struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
cpumask_t *cpumask, int evtid, int first)
{
int cpu;
@@ -561,7 +561,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
rr->rgrp = rdtgrp;
rr->evtid = evtid;
rr->r = r;
- rr->d = d;
+ rr->hdr = hdr;
rr->first = first;
rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
if (IS_ERR(rr->arch_mon_ctx)) {
@@ -592,7 +592,6 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
enum resctrl_event_id evtid;
struct rdt_domain_hdr *hdr;
struct rmid_read rr = {0};
- struct rdt_mon_domain *d;
struct rdtgroup *rdtgrp;
int domid, cpu, ret = 0;
struct rdt_resource *r;
@@ -617,6 +616,12 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
r = resctrl_arch_get_resource(resid);
if (md->sum) {
+ struct rdt_mon_domain *d;
+
+ if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3)) {
+ ret = -EIO;
+ goto out;
+ }
/*
* This file requires summing across all domains that share
* the L3 cache id that was provided in the "domid" field of the
@@ -647,8 +652,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
ret = -ENOENT;
goto out;
}
- d = container_of(hdr, struct rdt_mon_domain, hdr);
- mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
+ mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evtid, false);
}
checkresult:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index dcc6c00eb362..85fe88b965fa 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -159,7 +159,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
break;
entry = __rmid_entry(idx);
- if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
+ if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
QOS_L3_OCCUP_EVENT_ID, &val,
arch_mon_ctx)) {
rmid_dirty = true;
@@ -365,19 +365,23 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
int err, ret;
u64 tval = 0;
- if (rr->first) {
- resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
- m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+ if (rr->r->rid == RDT_RESOURCE_L3 && rr->first) {
+ if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
+ RDT_RESOURCE_L3)))
+ return -EINVAL;
+ d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+ resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
+ m = get_mbm_state(d, closid, rmid, rr->evtid);
if (m)
memset(m, 0, sizeof(struct mbm_state));
return 0;
}
- if (rr->d) {
+ if (rr->hdr) {
/* Reading a single domain, must be on a CPU in that domain. */
- if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
+ if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
return -EINVAL;
- rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
+ rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
rr->evtid, &tval, rr->arch_mon_ctx);
if (rr->err)
return rr->err;
@@ -387,6 +391,9 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
return 0;
}
+ if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
+ return -EINVAL;
+
/* Summing domains that share a cache, must be on a CPU for that cache. */
ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
if (!ci || ci->id != rr->ci_id)
@@ -403,7 +410,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
if (d->ci_id != rr->ci_id)
continue;
- err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
+ err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
rr->evtid, &tval, rr->arch_mon_ctx);
if (!err) {
rr->val += tval;
@@ -432,9 +439,13 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
u64 cur_bw, bytes, cur_bytes;
+ struct rdt_mon_domain *d;
struct mbm_state *m;
- m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+ if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
+ return;
+ d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+ m = get_mbm_state(d, closid, rmid, rr->evtid);
if (WARN_ON_ONCE(!m))
return;
@@ -608,7 +619,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
struct rmid_read rr = {0};
rr.r = r;
- rr.d = d;
+ rr.hdr = &d->hdr;
rr.evtid = evtid;
rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
if (IS_ERR(rr.arch_mon_ctx)) {
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 05438e15e2ca..32f9134fdec4 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2887,7 +2887,8 @@ static void rmdir_all_sub(void)
* @rid: The resource id for the event file being created.
* @domid: The domain id for the event file being created.
* @mevt: The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ * when @rid == RDT_RESOURCE_L3.
*/
static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
struct mon_evt *mevt,
@@ -2897,6 +2898,9 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
lockdep_assert_held(&rdtgroup_mutex);
+ if (WARN_ON_ONCE(do_sum && rid != RDT_RESOURCE_L3))
+ return NULL;
+
list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
if (priv->rid == rid && priv->domid == domid &&
priv->sum == do_sum && priv->evtid == mevt->evtid)
@@ -3024,17 +3028,27 @@ static void mon_rmdir_one_subdir(struct kernfs_node *pkn, char *name, char *subn
* when last domain being summed is removed.
*/
static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_mon_domain *d)
+ struct rdt_domain_hdr *hdr)
{
struct rdtgroup *prgrp, *crgrp;
+ int domid = hdr->id;
char subname[32];
- bool snc_mode;
char name[32];
- snc_mode = r->mon_scope == RESCTRL_L3_NODE;
- sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
- if (snc_mode)
- sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
+ if (r->rid == RDT_RESOURCE_L3) {
+ struct rdt_mon_domain *d;
+
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+ return;
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+
+ /* SNC mode? */
+ if (r->mon_scope == RESCTRL_L3_NODE) {
+ domid = d->ci_id;
+ sprintf(subname, "mon_sub_%s_%02d", r->name, hdr->id);
+ }
+ }
+ sprintf(name, "mon_%s_%02d", r->name, domid);
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mon_rmdir_one_subdir(prgrp->mon.mon_data_kn, name, subname);
@@ -3044,19 +3058,18 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
}
}
-static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
+static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
struct rdt_resource *r, struct rdtgroup *prgrp,
- bool do_sum)
+ int domid, bool do_sum)
{
struct rmid_read rr = {0};
struct mon_data *priv;
struct mon_evt *mevt;
- int ret, domid;
+ int ret;
for_each_mon_event(mevt) {
if (mevt->rid != r->rid || !mevt->enabled)
continue;
- domid = do_sum ? d->ci_id : d->hdr.id;
priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
if (WARN_ON_ONCE(!priv))
return -EINVAL;
@@ -3065,26 +3078,38 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
if (ret)
return ret;
- if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
- mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
+ if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
+ mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
}
return 0;
}
static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
- struct rdt_mon_domain *d,
+ struct rdt_domain_hdr *hdr,
struct rdt_resource *r, struct rdtgroup *prgrp)
{
struct kernfs_node *kn, *ckn;
+ bool snc_mode = false;
+ int domid = hdr->id;
char name[32];
- bool snc_mode;
int ret = 0;
lockdep_assert_held(&rdtgroup_mutex);
- snc_mode = r->mon_scope == RESCTRL_L3_NODE;
- sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
+ if (r->rid == RDT_RESOURCE_L3) {
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+ return -EINVAL;
+ snc_mode = r->mon_scope == RESCTRL_L3_NODE;
+ if (snc_mode) {
+ struct rdt_mon_domain *d;
+
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
+ domid = d->ci_id;
+ }
+ }
+ sprintf(name, "mon_%s_%02d", r->name, domid);
+
kn = kernfs_find_and_get(parent_kn, name);
if (kn) {
/*
@@ -3100,13 +3125,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
ret = rdtgroup_kn_set_ugid(kn);
if (ret)
goto out_destroy;
- ret = mon_add_all_files(kn, d, r, prgrp, snc_mode);
+ ret = mon_add_all_files(kn, hdr, r, prgrp, domid, snc_mode);
if (ret)
goto out_destroy;
}
if (snc_mode) {
- sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id);
+ sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id);
ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp);
if (IS_ERR(ckn)) {
ret = -EINVAL;
@@ -3117,7 +3142,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
if (ret)
goto out_destroy;
- ret = mon_add_all_files(ckn, d, r, prgrp, false);
+ ret = mon_add_all_files(ckn, hdr, r, prgrp, hdr->id, false);
if (ret)
goto out_destroy;
}
@@ -3135,7 +3160,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
* and "monitor" groups with given domain id.
*/
static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
- struct rdt_mon_domain *d)
+ struct rdt_domain_hdr *hdr)
{
struct kernfs_node *parent_kn;
struct rdtgroup *prgrp, *crgrp;
@@ -3143,12 +3168,12 @@ static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
parent_kn = prgrp->mon.mon_data_kn;
- mkdir_mondata_subdir(parent_kn, d, r, prgrp);
+ mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
head = &prgrp->mon.crdtgrp_list;
list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
parent_kn = crgrp->mon.mon_data_kn;
- mkdir_mondata_subdir(parent_kn, d, r, crgrp);
+ mkdir_mondata_subdir(parent_kn, hdr, r, crgrp);
}
}
}
@@ -3157,14 +3182,14 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
struct rdt_resource *r,
struct rdtgroup *prgrp)
{
- struct rdt_mon_domain *dom;
+ struct rdt_domain_hdr *hdr;
int ret;
/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
- list_for_each_entry(dom, &r->mon_domains, hdr.list) {
- ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
+ list_for_each_entry(hdr, &r->mon_domains, list) {
+ ret = mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
if (ret)
return ret;
}
@@ -4036,8 +4061,10 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
mutex_unlock(&rdtgroup_mutex);
}
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
{
+ struct rdt_mon_domain *d;
+
mutex_lock(&rdtgroup_mutex);
/*
@@ -4045,11 +4072,15 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
* per domain monitor data directories.
*/
if (resctrl_mounted && resctrl_arch_mon_capable())
- rmdir_mondata_subdir_allrdtgrp(r, d);
+ rmdir_mondata_subdir_allrdtgrp(r, hdr);
if (r->rid != RDT_RESOURCE_L3)
goto out_unlock;
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+ goto out_unlock;
+
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
if (resctrl_is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4132,12 +4163,20 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
return err;
}
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
{
- int err;
+ struct rdt_mon_domain *d;
+ int err = -EINVAL;
mutex_lock(&rdtgroup_mutex);
+ if (r->rid != RDT_RESOURCE_L3)
+ goto mkdir;
+
+ if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+ goto out_unlock;
+
+ d = container_of(hdr, struct rdt_mon_domain, hdr);
err = domain_setup_mon_state(r, d);
if (err)
goto out_unlock;
@@ -4151,6 +4190,8 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
+mkdir:
+ err = 0;
/*
* If the filesystem is not mounted then only the default resource group
* exists. Creation of its directories is deferred until mount time
@@ -4158,7 +4199,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
* If resctrl is mounted, add per domain monitor data directories.
*/
if (resctrl_mounted && resctrl_arch_mon_capable())
- mkdir_mondata_subdir_allrdtgrp(r, d);
+ mkdir_mondata_subdir_allrdtgrp(r, hdr);
out_unlock:
mutex_unlock(&rdtgroup_mutex);
--
2.50.0
* [PATCH v7 10/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (8 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 09/31] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:26 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 11/31] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
` (21 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Historically all monitoring events have been associated with the L3
resource. This will change when support for telemetry events is added.
The structures to track monitor domains of the L3 resource at both the
file system and architecture level have generic names. This may cause
confusion when support for monitoring events in other resources is added.
Rename them by adding "l3_" to the names:
rdt_mon_domain -> rdt_l3_mon_domain
rdt_hw_mon_domain -> rdt_hw_l3_mon_domain
No functional change.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 16 ++++++------
arch/x86/kernel/cpu/resctrl/internal.h | 16 ++++++------
fs/resctrl/internal.h | 8 +++---
arch/x86/kernel/cpu/resctrl/core.c | 14 +++++-----
arch/x86/kernel/cpu/resctrl/monitor.c | 18 ++++++-------
fs/resctrl/ctrlmondata.c | 2 +-
fs/resctrl/monitor.c | 34 ++++++++++++------------
fs/resctrl/rdtgroup.c | 36 +++++++++++++-------------
8 files changed, 72 insertions(+), 72 deletions(-)
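Because the rename is mechanical, it may help to sketch the nesting that the new names make explicit. The architecture-private rdt_hw_l3_mon_domain embeds the file system's rdt_l3_mon_domain, which in turn embeds the generic rdt_domain_hdr. Each step outward is a container_of(). The types and helper names below are simplified, hypothetical stand-ins: to_arch_mon_dom() mirrors resctrl_to_arch_mon_dom(), and the size and element type of arch_mbm_states are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic header (cf. struct rdt_domain_hdr), reduced to an id. */
struct domain_hdr {
	int id;
};

/* File-system view of an L3 monitor domain (cf. struct rdt_l3_mon_domain). */
struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned int ci_id;
};

/* Architecture-private wrapper (cf. struct rdt_hw_l3_mon_domain). */
struct hw_l3_mon_domain {
	struct l3_mon_domain d_resctrl;
	unsigned long long *arch_mbm_states[2];  /* placeholder per-event state */
};

/* cf. resctrl_to_arch_mon_dom(): file-system domain -> arch wrapper. */
static struct hw_l3_mon_domain *to_arch_mon_dom(struct l3_mon_domain *d)
{
	return container_of(d, struct hw_l3_mon_domain, d_resctrl);
}

/* Header -> L3 domain, as the L3-only paths do after validating the header. */
static struct l3_mon_domain *hdr_to_l3_mon_dom(struct domain_hdr *hdr)
{
	assert(hdr != NULL);
	return container_of(hdr, struct l3_mon_domain, hdr);
}
```

Only the type names change in this patch; the member layout, and therefore every offset used by these conversions, stays the same, which is consistent with the "No functional change" statement above.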
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index c8200626b91a..5788e1970d8c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -166,7 +166,7 @@ struct rdt_ctrl_domain {
};
/**
- * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
+ * struct rdt_l3_mon_domain - group of CPUs sharing a resctrl monitor resource
* @hdr: common header for different domain types
* @ci_id: cache info id for this domain
* @rmid_busy_llc: bitmap of which limbo RMIDs are above threshold
@@ -178,7 +178,7 @@ struct rdt_ctrl_domain {
* @mbm_work_cpu: worker CPU for MBM h/w counters
* @cqm_work_cpu: worker CPU for CQM h/w counters
*/
-struct rdt_mon_domain {
+struct rdt_l3_mon_domain {
struct rdt_domain_hdr hdr;
unsigned int ci_id;
unsigned long *rmid_busy_llc;
@@ -334,10 +334,10 @@ struct resctrl_cpu_defaults {
};
struct resctrl_mon_config_info {
- struct rdt_resource *r;
- struct rdt_mon_domain *d;
- u32 evtid;
- u32 mon_config;
+ struct rdt_resource *r;
+ struct rdt_l3_mon_domain *d;
+ u32 evtid;
+ u32 mon_config;
};
/**
@@ -530,7 +530,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
u32 closid, u32 rmid,
enum resctrl_event_id eventid);
@@ -543,7 +543,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
*
* This can be called from any CPU.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
/**
* resctrl_arch_reset_all_ctrls() - Reset the control for each CLOSID to its
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 58dca892a5df..684a1b830ced 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -51,17 +51,17 @@ struct rdt_hw_ctrl_domain {
};
/**
- * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
- * a resource for a monitor function
- * @d_resctrl: Properties exposed to the resctrl file system
+ * struct rdt_hw_l3_mon_domain - Arch private attributes of a set of CPUs that share
+ * a resource for a monitor function
+ * @d_resctrl: Properties exposed to the resctrl file system
* @arch_mbm_states: Per-event pointer to the MBM event's saved state.
* An MBM event's state is an array of struct arch_mbm_state
* indexed by RMID on x86.
*
* Members of this structure are accessed via helpers that provide abstraction.
*/
-struct rdt_hw_mon_domain {
- struct rdt_mon_domain d_resctrl;
+struct rdt_hw_l3_mon_domain {
+ struct rdt_l3_mon_domain d_resctrl;
struct arch_mbm_state *arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
};
@@ -70,9 +70,9 @@ static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctr
return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
}
-static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
+static inline struct rdt_hw_l3_mon_domain *resctrl_to_arch_mon_dom(struct rdt_l3_mon_domain *r)
{
- return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
+ return container_of(r, struct rdt_hw_l3_mon_domain, d_resctrl);
}
/**
@@ -124,7 +124,7 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
extern struct rdt_hw_resource rdt_resources_all[];
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
/* CPUID.(EAX=10H, ECX=ResID=1).EAX */
union cpuid_0x10_1_eax {
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index b19e974b7865..e4f06f700063 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -357,7 +357,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
int resctrl_mon_resource_init(void);
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
unsigned long delay_ms,
int exclude_cpu);
@@ -365,14 +365,14 @@ void mbm_handle_overflow(struct work_struct *work);
bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
int exclude_cpu);
void cqm_handle_limbo(struct work_struct *work);
-bool has_busy_rmid(struct rdt_mon_domain *d);
+bool has_busy_rmid(struct rdt_l3_mon_domain *d);
-void __check_limbo(struct rdt_mon_domain *d, bool force_free);
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);
void resctrl_file_fflags_init(const char *config, unsigned long fflags);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 0bf793579b9a..46c5e2a7565d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -363,7 +363,7 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
kfree(hw_dom);
}
-static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
+static void mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
{
int idx;
@@ -400,7 +400,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
* @num_rmid: The size of the MBM counter array
* @hw_dom: The domain that owns the allocated arrays
*/
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
{
size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
enum resctrl_event_id eventid;
@@ -498,8 +498,8 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
{
- struct rdt_hw_mon_domain *hw_dom;
- struct rdt_mon_domain *d;
+ struct rdt_hw_l3_mon_domain *hw_dom;
+ struct rdt_l3_mon_domain *d;
struct cacheinfo *ci;
int err;
@@ -625,9 +625,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
- struct rdt_hw_mon_domain *hw_dom;
+ struct rdt_hw_l3_mon_domain *hw_dom;
+ struct rdt_l3_mon_domain *d;
struct rdt_domain_hdr *hdr;
- struct rdt_mon_domain *d;
lockdep_assert_held(&domain_list_lock);
@@ -653,7 +653,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
switch (r->rid) {
case RDT_RESOURCE_L3:
- d = container_of(hdr, struct rdt_mon_domain, hdr);
+ d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
hw_dom = resctrl_to_arch_mon_dom(d);
resctrl_offline_mon_domain(r, hdr);
list_del_rcu(&d->hdr.list);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b31794c5dcd4..043f777378a6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -157,7 +157,7 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
return 0;
}
-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_l3_mon_domain *hw_dom,
u32 rmid,
enum resctrl_event_id eventid)
{
@@ -171,11 +171,11 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
return state ? &state[rmid] : NULL;
}
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
u32 unused, u32 rmid,
enum resctrl_event_id eventid)
{
- struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+ struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
int cpu = cpumask_any(&d->hdr.cpu_mask);
struct arch_mbm_state *am;
u32 prmid;
@@ -194,9 +194,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
* Assumes that hardware counters are also reset and thus that there is
* no need to record initial non-zero counts.
*/
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
{
- struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+ struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
enum resctrl_event_id eventid;
int idx;
@@ -222,10 +222,10 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
u64 *val, void *ignored)
{
int cpu = cpumask_any(&hdr->cpu_mask);
- struct rdt_hw_mon_domain *hw_dom;
+ struct rdt_hw_l3_mon_domain *hw_dom;
struct rdt_hw_resource *hw_res;
+ struct rdt_l3_mon_domain *d;
struct arch_mbm_state *am;
- struct rdt_mon_domain *d;
u64 msr_val, chunks;
u32 prmid;
int ret;
@@ -238,7 +238,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
return -EINVAL;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
+ d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
hw_dom = resctrl_to_arch_mon_dom(d);
hw_res = resctrl_to_arch_res(r);
prmid = logical_rmid_to_physical_rmid(cpu, rmid);
@@ -275,7 +275,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
* must adjust RMID counter numbers based on SNC node. See
* logical_rmid_to_physical_rmid() for code that does this.
*/
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d)
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
{
if (snc_nodes_per_l3_cache > 1)
msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 1c1c0e7bbc11..1d7086509bfa 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -616,7 +616,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
r = resctrl_arch_get_resource(resid);
if (md->sum) {
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3)) {
ret = -EIO;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 85fe88b965fa..28d96147b9f4 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -130,7 +130,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
* decrement the count. If the busy count gets to zero on an RMID, we
* free the RMID
*/
-void __check_limbo(struct rdt_mon_domain *d, bool force_free)
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
{
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
@@ -188,7 +188,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
resctrl_arch_mon_ctx_free(r, QOS_L3_OCCUP_EVENT_ID, arch_mon_ctx);
}
-bool has_busy_rmid(struct rdt_mon_domain *d)
+bool has_busy_rmid(struct rdt_l3_mon_domain *d)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
@@ -289,7 +289,7 @@ int alloc_rmid(u32 closid)
static void add_rmid_to_limbo(struct rmid_entry *entry)
{
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
u32 idx;
lockdep_assert_held(&rdtgroup_mutex);
@@ -342,7 +342,7 @@ void free_rmid(u32 closid, u32 rmid)
list_add_tail(&entry->list, &rmid_free_lru);
}
-static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
+static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
u32 rmid, enum resctrl_event_id evtid)
{
u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
@@ -359,7 +359,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
int cpu = smp_processor_id();
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
struct cacheinfo *ci;
struct mbm_state *m;
int err, ret;
@@ -369,7 +369,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
RDT_RESOURCE_L3)))
return -EINVAL;
- d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+ d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
m = get_mbm_state(d, closid, rmid, rr->evtid);
if (m)
@@ -439,12 +439,12 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
u64 cur_bw, bytes, cur_bytes;
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
struct mbm_state *m;
if (WARN_ON_ONCE(domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
return;
- d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+ d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
m = get_mbm_state(d, closid, rmid, rr->evtid);
if (WARN_ON_ONCE(!m))
return;
@@ -545,7 +545,7 @@ static struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu,
* throttle MSRs already have low percentage values. To avoid
* unnecessarily restricting such rdtgroups, we also increase the bandwidth.
*/
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_mbm)
{
u32 closid, rmid, cur_msr_val, new_msr_val;
struct mbm_state *pmbm_data, *cmbm_data;
@@ -613,7 +613,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
}
-static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
u32 closid, u32 rmid, enum resctrl_event_id evtid)
{
struct rmid_read rr = {0};
@@ -640,7 +640,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
}
-static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
u32 closid, u32 rmid)
{
/*
@@ -661,12 +661,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
void cqm_handle_limbo(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
- d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);
+ d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
__check_limbo(d, false);
@@ -689,7 +689,7 @@ void cqm_handle_limbo(struct work_struct *work)
* @exclude_cpu: Which CPU the handler should not run on,
* RESCTRL_PICK_ANY_CPU to pick any CPU.
*/
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
int exclude_cpu)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
@@ -706,7 +706,7 @@ void mbm_handle_overflow(struct work_struct *work)
{
unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
struct rdtgroup *prgrp, *crgrp;
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
struct list_head *head;
struct rdt_resource *r;
@@ -721,7 +721,7 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;
r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
- d = container_of(work, struct rdt_mon_domain, mbm_over.work);
+ d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
@@ -755,7 +755,7 @@ void mbm_handle_overflow(struct work_struct *work)
* @exclude_cpu: Which CPU the handler should not run on,
* RESCTRL_PICK_ANY_CPU to pick any CPU.
*/
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
int exclude_cpu)
{
unsigned long delay = msecs_to_jiffies(delay_ms);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 32f9134fdec4..d93a8bf18792 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1617,7 +1617,7 @@ static void mondata_config_read(struct resctrl_mon_config_info *mon_info)
static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
{
struct resctrl_mon_config_info mon_info;
- struct rdt_mon_domain *dom;
+ struct rdt_l3_mon_domain *dom;
bool sep = false;
cpus_read_lock();
@@ -1665,7 +1665,7 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
}
static void mbm_config_write_domain(struct rdt_resource *r,
- struct rdt_mon_domain *d, u32 evtid, u32 val)
+ struct rdt_l3_mon_domain *d, u32 evtid, u32 val)
{
struct resctrl_mon_config_info mon_info = {0};
@@ -1706,8 +1706,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
{
char *dom_str = NULL, *id_str;
+ struct rdt_l3_mon_domain *d;
unsigned long dom_id, val;
- struct rdt_mon_domain *d;
/* Walking r->domains, ensure it can't race with cpuhp */
lockdep_assert_cpus_held();
@@ -2581,7 +2581,7 @@ static int rdt_get_tree(struct fs_context *fc)
{
struct rdt_fs_context *ctx = rdt_fc2context(fc);
unsigned long flags = RFTYPE_CTRL_BASE;
- struct rdt_mon_domain *dom;
+ struct rdt_l3_mon_domain *dom;
struct rdt_resource *r;
int ret;
@@ -3036,11 +3036,11 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
char name[32];
if (r->rid == RDT_RESOURCE_L3) {
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
return;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
+ d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
/* SNC mode? */
if (r->mon_scope == RESCTRL_L3_NODE) {
@@ -3102,9 +3102,9 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
return -EINVAL;
snc_mode = r->mon_scope == RESCTRL_L3_NODE;
if (snc_mode) {
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
+ d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
domid = d->ci_id;
}
}
@@ -4040,7 +4040,7 @@ static void rdtgroup_setup_default(void)
mutex_unlock(&rdtgroup_mutex);
}
-static void domain_destroy_mon_state(struct rdt_mon_domain *d)
+static void domain_destroy_mon_state(struct rdt_l3_mon_domain *d)
{
int idx;
@@ -4063,7 +4063,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
{
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
mutex_lock(&rdtgroup_mutex);
@@ -4080,7 +4080,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
goto out_unlock;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
+ d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
if (resctrl_is_mbm_enabled())
cancel_delayed_work(&d->mbm_over);
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4114,7 +4114,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
*
* Returns 0 for success, or -ENOMEM.
*/
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
size_t tsize = sizeof(*d->mbm_states[0]);
@@ -4165,7 +4165,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
{
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
int err = -EINVAL;
mutex_lock(&rdtgroup_mutex);
@@ -4176,7 +4176,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
goto out_unlock;
- d = container_of(hdr, struct rdt_mon_domain, hdr);
+ d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
err = domain_setup_mon_state(r, d);
if (err)
goto out_unlock;
@@ -4225,10 +4225,10 @@ static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
}
}
-static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu,
- struct rdt_resource *r)
+static struct rdt_l3_mon_domain *get_mon_domain_from_cpu(int cpu,
+ struct rdt_resource *r)
{
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
lockdep_assert_cpus_held();
@@ -4244,7 +4244,7 @@ static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu,
void resctrl_offline_cpu(unsigned int cpu)
{
struct rdt_resource *l3 = resctrl_arch_get_resource(RDT_RESOURCE_L3);
- struct rdt_mon_domain *d;
+ struct rdt_l3_mon_domain *d;
struct rdtgroup *rdtgrp;
mutex_lock(&rdtgroup_mutex);
--
2.50.0
* [PATCH v7 11/31] x86,fs/resctrl: Rename some L3 specific functions
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (9 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 10/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:26 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 12/31] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
` (20 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Until now all monitor functions have been tied to the RDT_RESOURCE_L3
resource, so generic names for the functions that set up and tear down
domains made sense. With the arrival of monitor events tied to new
domains associated with different resources, it would be clearer if
these functions were more accurately named.
Two groups of functions are renamed here:
Functions that allocate/free architecture per-RMID MBM state information:
arch_domain_mbm_alloc() -> l3_mon_domain_mbm_alloc()
mon_domain_free() -> l3_mon_domain_free()
Functions that allocate/free filesystem per-RMID MBM state information:
domain_setup_mon_state() -> domain_setup_l3_mon_state()
domain_destroy_mon_state() -> domain_destroy_l3_mon_state()
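A minimal sketch of the naming split, assuming a simplified user-space model. The enum res_level, struct l3_mon_domain and mon_domain_setup() below are hypothetical stand-ins, not the kernel types. The generic entry point dispatches on the resource id (much as domain_remove_cpu_mon() switches on r->rid), and only the L3 case calls the L3-specific allocator:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for resource ids; RES_OTHER models a future resource. */
enum res_level { RES_L3, RES_OTHER };

/* Hypothetical per-domain state: one MBM counter slot per RMID. */
struct l3_mon_domain {
	unsigned int num_rmid;
	unsigned long long *mbm_state;
};

/* L3-specific allocator: the name states which resource it serves. */
static int l3_mon_domain_mbm_alloc(struct l3_mon_domain *d, unsigned int num_rmid)
{
	d->mbm_state = calloc(num_rmid, sizeof(*d->mbm_state));
	if (!d->mbm_state)
		return -1;
	d->num_rmid = num_rmid;
	return 0;
}

/* L3-specific free, paired with l3_mon_domain_mbm_alloc(). */
static void l3_mon_domain_free(struct l3_mon_domain *d)
{
	assert(d);
	free(d->mbm_state);
	d->mbm_state = NULL;
	d->num_rmid = 0;
}

/* Generic entry point: picks resource-specific helpers by resource id. */
static int mon_domain_setup(enum res_level rid, struct l3_mon_domain *d,
			    unsigned int num_rmid)
{
	switch (rid) {
	case RES_L3:
		return l3_mon_domain_mbm_alloc(d, num_rmid);
	default:
		fprintf(stderr, "Unknown resource rid=%d\n", (int)rid);
		return -1;
	}
}
```

In this model a new resource would add its own case with its own setup/free helpers instead of reusing the L3 ones.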
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 16 ++++++++--------
fs/resctrl/rdtgroup.c | 10 +++++-----
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 46c5e2a7565d..304fe0b61e6d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -363,7 +363,7 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
kfree(hw_dom);
}
-static void mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
+static void l3_mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
{
int idx;
@@ -396,11 +396,11 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
}
/**
- * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
+ * l3_mon_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
* @num_rmid: The size of the MBM counter array
* @hw_dom: The domain that owns the allocated arrays
*/
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
+static int l3_mon_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
{
size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
enum resctrl_event_id eventid;
@@ -514,7 +514,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
if (!ci) {
pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
- mon_domain_free(hw_dom);
+ l3_mon_domain_free(hw_dom);
return;
}
d->ci_id = ci->id;
@@ -522,8 +522,8 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
arch_mon_domain_online(r, d);
- if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
- mon_domain_free(hw_dom);
+ if (l3_mon_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+ l3_mon_domain_free(hw_dom);
return;
}
@@ -533,7 +533,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
if (err) {
list_del_rcu(&d->hdr.list);
synchronize_rcu();
- mon_domain_free(hw_dom);
+ l3_mon_domain_free(hw_dom);
}
}
@@ -658,7 +658,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
resctrl_offline_mon_domain(r, hdr);
list_del_rcu(&d->hdr.list);
synchronize_rcu();
- mon_domain_free(hw_dom);
+ l3_mon_domain_free(hw_dom);
break;
default:
pr_warn_once("Unknown resource rid=%d\n", r->rid);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index d93a8bf18792..2467db5bb5e8 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4040,7 +4040,7 @@ static void rdtgroup_setup_default(void)
mutex_unlock(&rdtgroup_mutex);
}
-static void domain_destroy_mon_state(struct rdt_l3_mon_domain *d)
+static void domain_destroy_l3_mon_state(struct rdt_l3_mon_domain *d)
{
int idx;
@@ -4096,13 +4096,13 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
cancel_delayed_work(&d->cqm_limbo);
}
- domain_destroy_mon_state(d);
+ domain_destroy_l3_mon_state(d);
out_unlock:
mutex_unlock(&rdtgroup_mutex);
}
/**
- * domain_setup_mon_state() - Initialise domain monitoring structures.
+ * domain_setup_l3_mon_state() - Initialise domain monitoring structures.
* @r: The resource for the newly online domain.
* @d: The newly online domain.
*
@@ -4114,7 +4114,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
*
* Returns 0 for success, or -ENOMEM.
*/
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
size_t tsize = sizeof(*d->mbm_states[0]);
@@ -4177,7 +4177,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
goto out_unlock;
d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
- err = domain_setup_mon_state(r, d);
+ err = domain_setup_l3_mon_state(r, d);
if (err)
goto out_unlock;
--
2.50.0
* [PATCH v7 12/31] fs/resctrl: Make event details accessible to functions when reading events
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (10 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 11/31] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:27 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 13/31] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
` (19 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
All details about a monitor event are kept in the mon_evt structure,
but upper levels of code only pass the event id to lower levels. This
will become a problem when new attributes are added to the mon_evt
structure, because the lower levels will have no way to access them.
Change the mon_data and rmid_read structures to hold a pointer
to the mon_evt structure instead of just taking a copy of the
event id.
No functional change.
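To illustrate why a pointer is more useful than a copied id, here is a simplified sketch. The trimmed struct mon_evt, the enum values and the read_evtid()/read_any_cpu() helpers are hypothetical. A reader holding the pointer can still pass evt->evtid to interfaces that take an id, and it also sees any attribute added to the structure later:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down versions of the resctrl types. */
enum event_id { EVT_L3_OCCUP, EVT_MBM_TOTAL, EVT_MBM_LOCAL, NUM_EVENTS };

struct mon_evt {
	enum event_id evtid;
	const char *name;
	bool any_cpu;	/* an attribute that an event id alone cannot carry */
};

static struct mon_evt mon_event_all[NUM_EVENTS] = {
	[EVT_L3_OCCUP]  = { EVT_L3_OCCUP,  "llc_occupancy",   false },
	[EVT_MBM_TOTAL] = { EVT_MBM_TOTAL, "mbm_total_bytes", false },
	[EVT_MBM_LOCAL] = { EVT_MBM_LOCAL, "mbm_local_bytes", false },
};

/* Per-read context stores a pointer, not a copy of the id. */
struct rmid_read {
	struct mon_evt *evt;
};

/* Lower level can still hand the id to id-based interfaces... */
static enum event_id read_evtid(const struct rmid_read *rr)
{
	assert(rr->evt);
	return rr->evt->evtid;
}

/* ...and can also consult any other attribute of the event. */
static bool read_any_cpu(const struct rmid_read *rr)
{
	assert(rr->evt);
	return rr->evt->any_cpu;
}
```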
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
fs/resctrl/internal.h | 10 +++++-----
fs/resctrl/ctrlmondata.c | 16 ++++++++--------
fs/resctrl/monitor.c | 17 +++++++++--------
fs/resctrl/rdtgroup.c | 6 +++---
4 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index e4f06f700063..ef3ec2a4860f 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -76,7 +76,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
* struct mon_data - Monitoring details for each event file.
* @list: Member of the global @mon_data_kn_priv_list list.
* @rid: Resource id associated with the event file.
- * @evtid: Event id associated with the event file.
+ * @evt: Event structure associated with the event file.
* @sum: Set for RDT_RESOURCE_L3 when event must be summed
* across multiple domains.
* @domid: When @sum is zero this is the domain to which
@@ -90,7 +90,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
struct mon_data {
struct list_head list;
enum resctrl_res_level rid;
- enum resctrl_event_id evtid;
+ struct mon_evt *evt;
int domid;
bool sum;
};
@@ -103,7 +103,7 @@ struct mon_data {
* @r: Resource describing the properties of the event being read.
* @hdr: Header of domain that the counter should be read from. If NULL then sum all
* domains in @r sharing L3 @ci.id
- * @evtid: Which monitor event to read.
+ * @evt: Event associated with the event file.
* @first: Initialize MBM counter when true.
* @ci_id: Cacheinfo id for L3. Only set when @hdr is NULL. Used when summing domains.
* @err: Error encountered when reading counter.
@@ -117,7 +117,7 @@ struct rmid_read {
struct rdtgroup *rgrp;
struct rdt_resource *r;
struct rdt_domain_hdr *hdr;
- enum resctrl_event_id evtid;
+ struct mon_evt *evt;
bool first;
unsigned int ci_id;
int err;
@@ -353,7 +353,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
- cpumask_t *cpumask, int evtid, int first);
+ cpumask_t *cpumask, struct mon_evt *evt, int first);
int resctrl_mon_resource_init(void);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 1d7086509bfa..a99903ac5d27 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -548,7 +548,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
- cpumask_t *cpumask, int evtid, int first)
+ cpumask_t *cpumask, struct mon_evt *evt, int first)
{
int cpu;
@@ -559,11 +559,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
* Setup the parameters to pass to mon_event_count() to read the data.
*/
rr->rgrp = rdtgrp;
- rr->evtid = evtid;
+ rr->evt = evt;
rr->r = r;
rr->hdr = hdr;
rr->first = first;
- rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
+ rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evt->evtid);
if (IS_ERR(rr->arch_mon_ctx)) {
rr->err = -EINVAL;
return;
@@ -582,20 +582,20 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
else
smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
- resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
+ resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
}
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
enum resctrl_res_level resid;
- enum resctrl_event_id evtid;
struct rdt_domain_hdr *hdr;
struct rmid_read rr = {0};
struct rdtgroup *rdtgrp;
int domid, cpu, ret = 0;
struct rdt_resource *r;
struct cacheinfo *ci;
+ struct mon_evt *evt;
struct mon_data *md;
rdtgrp = rdtgroup_kn_lock_live(of->kn);
@@ -612,7 +612,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
resid = md->rid;
domid = md->domid;
- evtid = md->evtid;
+ evt = md->evt;
r = resctrl_arch_get_resource(resid);
if (md->sum) {
@@ -636,7 +636,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
if (!ci)
continue;
mon_event_read(&rr, r, NULL, rdtgrp,
- &ci->shared_cpu_map, evtid, false);
+ &ci->shared_cpu_map, evt, false);
goto checkresult;
}
}
@@ -652,7 +652,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
ret = -ENOENT;
goto out;
}
- mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evtid, false);
+ mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evt, false);
}
checkresult:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 28d96147b9f4..6d4191eff391 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -370,8 +370,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
RDT_RESOURCE_L3)))
return -EINVAL;
d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
- resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
- m = get_mbm_state(d, closid, rmid, rr->evtid);
+ resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evt->evtid);
+ m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
if (m)
memset(m, 0, sizeof(struct mbm_state));
return 0;
@@ -382,7 +382,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
return -EINVAL;
rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
- rr->evtid, &tval, rr->arch_mon_ctx);
+ rr->evt->evtid, &tval, rr->arch_mon_ctx);
if (rr->err)
return rr->err;
@@ -411,7 +411,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (d->ci_id != rr->ci_id)
continue;
err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
- rr->evtid, &tval, rr->arch_mon_ctx);
+ rr->evt->evtid, &tval, rr->arch_mon_ctx);
if (!err) {
rr->val += tval;
ret = 0;
@@ -445,7 +445,7 @@ static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (WARN_ON_ONCE(domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
return;
d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
- m = get_mbm_state(d, closid, rmid, rr->evtid);
+ m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
if (WARN_ON_ONCE(!m))
return;
@@ -616,12 +616,13 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_m
static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
u32 closid, u32 rmid, enum resctrl_event_id evtid)
{
+ struct mon_evt *evt = &mon_event_all[evtid];
struct rmid_read rr = {0};
rr.r = r;
rr.hdr = &d->hdr;
- rr.evtid = evtid;
- rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+ rr.evt = evt;
+ rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, evt->evtid);
if (IS_ERR(rr.arch_mon_ctx)) {
pr_warn_ratelimited("Failed to allocate monitor context: %ld",
PTR_ERR(rr.arch_mon_ctx));
@@ -637,7 +638,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domai
if (is_mba_sc(NULL))
mbm_bw_count(closid, rmid, &rr);
- resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
+ resctrl_arch_mon_ctx_free(rr.r, evt->evtid, rr.arch_mon_ctx);
}
static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 2467db5bb5e8..6df06bf0e694 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2903,7 +2903,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
if (priv->rid == rid && priv->domid == domid &&
- priv->sum == do_sum && priv->evtid == mevt->evtid)
+ priv->sum == do_sum && priv->evt == mevt)
return priv;
}
@@ -2914,7 +2914,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
priv->rid = rid;
priv->domid = domid;
priv->sum = do_sum;
- priv->evtid = mevt->evtid;
+ priv->evt = mevt;
list_add_tail(&priv->list, &mon_data_kn_priv_list);
return priv;
@@ -3079,7 +3079,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
return ret;
if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
- mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
+ mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt, true);
}
return 0;
--
2.50.0
* [PATCH v7 13/31] x86,fs/resctrl: Handle events that can be read from any CPU
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (11 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 12/31] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:32 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 14/31] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
` (18 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Resctrl file system code was built with the assumption that monitor
events can only be read from a CPU in the cpumask_t set for each
domain.
This was true for x86 events accessed with an MSR interface, but may
not be true for other access methods such as MMIO.
Add a flag to struct mon_evt to indicate if the event can be read on
any CPU.
Architecture code calls resctrl_enable_mon_event() to enable an event
and to set the flag appropriately.
Bypass all the smp_call*() code for events that can be read on any CPU
and call mon_event_count() directly from mon_event_read().
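A rough model of the resulting dispatch in mon_event_read(), assuming a CPU mask represented as a 64-bit bitmap. struct read_ctx, pick_cpu() and event_read() are hypothetical, and the cross-CPU call is only simulated by recording which CPU would run the count:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event descriptor carrying only the new flag. */
struct evt {
	bool any_cpu;
};

struct read_ctx {
	const struct evt *evt;
	unsigned long long cpumask;	/* CPUs in the target domain */
	int ran_on;			/* CPU that ran the count; -1 = caller's context */
};

/* Stand-in for mon_event_count(): just records where it "ran". */
static void count(struct read_ctx *rr, int cpu)
{
	rr->ran_on = cpu;
}

/* Lowest-numbered CPU in the mask, or -1 if the mask is empty. */
static int pick_cpu(unsigned long long mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return -1;
}

static int event_read(struct read_ctx *rr)
{
	int cpu;

	assert(rr && rr->evt);
	if (rr->evt->any_cpu) {
		/* e.g. MMIO counters: call the count directly, no IPI needed */
		count(rr, -1);
		return 0;
	}
	/* e.g. MSR counters: must execute on a CPU in the domain */
	cpu = pick_cpu(rr->cpumask);
	if (cpu < 0)
		return -1;
	count(rr, cpu);	/* models smp_call_function_any()/smp_call_on_cpu() */
	return 0;
}
```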
For events that can be read from any CPU, skip the checks in
__mon_event_count() that verify the read is being done from a CPU in
the correct domain or cache scope.
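The three cases the new cpu_on_correct_domain() helper distinguishes can be modelled as below. This is a sketch under assumed conditions: an invented topology where CPUs 0-3 share L3 id 0 and CPUs 4-7 share L3 id 1, and hypothetical types in place of struct rmid_read and struct cacheinfo:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical domain: just the CPUs that belong to it. */
struct dom {
	unsigned long long cpu_mask;
};

struct read_req {
	bool any_cpu;		/* event readable from any CPU */
	const struct dom *hdr;	/* single domain, or NULL when summing */
	int ci_id;		/* L3 cache id used when summing */
};

/* Invented topology: 4 CPUs per L3; -1 models missing cacheinfo. */
static int l3_id_of(int cpu)
{
	return (cpu >= 0 && cpu < 8) ? cpu / 4 : -1;
}

static bool on_correct_domain(const struct read_req *rr, int cpu)
{
	/* Any CPU is OK for this event */
	if (rr->any_cpu)
		return true;

	assert(cpu >= 0 && cpu < 64);

	/* Single domain: must be on a CPU in that domain */
	if (rr->hdr)
		return (rr->hdr->cpu_mask >> cpu) & 1;

	/* Summing domains that share a cache: must be on a CPU of that cache */
	return l3_id_of(cpu) >= 0 && l3_id_of(cpu) == rr->ci_id;
}
```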
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 2 +-
fs/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 6 ++--
fs/resctrl/ctrlmondata.c | 7 ++++-
fs/resctrl/monitor.c | 46 +++++++++++++++++++++++-------
5 files changed, 47 insertions(+), 16 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 5788e1970d8c..17a21f193a3d 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -379,7 +379,7 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
u32 resctrl_arch_system_num_rmid_idx(void);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
-void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index ef3ec2a4860f..23dd0b39a117 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -57,6 +57,7 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
* @rid: resource id for this event
* @name: name of the event
* @configurable: true if the event is configurable
+ * @any_cpu: true if the event can be read from any CPU
* @enabled: true if the event is enabled
*/
struct mon_evt {
@@ -64,6 +65,7 @@ struct mon_evt {
enum resctrl_res_level rid;
char *name;
bool configurable;
+ bool any_cpu;
bool enabled;
};
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 304fe0b61e6d..0a564285d829 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -887,15 +887,15 @@ static __init bool get_rdt_mon_resources(void)
bool ret = false;
if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
- resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
- resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
- resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+ resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
ret = true;
}
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index a99903ac5d27..2e65fddc3408 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -569,6 +569,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
return;
}
+ if (evt->any_cpu) {
+ mon_event_count(rr);
+ goto out_ctx_free;
+ }
+
cpu = cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU);
/*
@@ -581,7 +586,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
smp_call_function_any(cpumask, mon_event_count, rr, 1);
else
smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
-
+out_ctx_free:
resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
}
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 6d4191eff391..a6d11011cb8e 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -356,15 +356,43 @@ static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
return state ? &state[idx] : NULL;
}
+/*
+ * For events that can be read on any CPU this function is called
+ * in preemptible context with a direct call from mon_event_read()
+ * to mon_event_count() instead of using smp_call*() to execute on a
+ * specific CPU. For other events it is called in non-preemptible context.
+ */
+static bool cpu_on_correct_domain(struct rmid_read *rr)
+{
+ struct cacheinfo *ci;
+ int cpu;
+
+ /* Any CPU is OK for this event */
+ if (rr->evt->any_cpu)
+ return true;
+
+ cpu = smp_processor_id();
+
+ /* Single domain. Must be on a CPU in that domain. */
+ if (rr->hdr)
+ return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
+
+ /* Summing domains that share a cache, must be on a CPU for that cache. */
+ ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
+
+ return ci && ci->id == rr->ci_id;
+}
+
static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
{
- int cpu = smp_processor_id();
struct rdt_l3_mon_domain *d;
- struct cacheinfo *ci;
struct mbm_state *m;
int err, ret;
u64 tval = 0;
+ if (!cpu_on_correct_domain(rr))
+ return -EINVAL;
+
if (rr->r->rid == RDT_RESOURCE_L3 && rr->first) {
if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
RDT_RESOURCE_L3)))
@@ -378,9 +406,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
}
if (rr->hdr) {
- /* Reading a single domain, must be on a CPU in that domain. */
- if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
- return -EINVAL;
+ /* Single domain. */
rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
rr->evt->evtid, &tval, rr->arch_mon_ctx);
if (rr->err)
@@ -394,12 +420,9 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
return -EINVAL;
- /* Summing domains that share a cache, must be on a CPU for that cache. */
- ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
- if (!ci || ci->id != rr->ci_id)
- return -EINVAL;
-
/*
+ * Sum across multiple domains.
+ *
* Legacy files must report the sum of an event across all
* domains that share the same L3 cache instance.
* Report success if a read from any domain succeeds, -EINVAL
@@ -878,7 +901,7 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
},
};
-void resctrl_enable_mon_event(enum resctrl_event_id eventid)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
{
if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
return;
@@ -887,6 +910,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
return;
}
+ mon_event_all[eventid].any_cpu = any_cpu;
mon_event_all[eventid].enabled = true;
}
--
2.50.0
* [PATCH v7 14/31] x86,fs/resctrl: Support binary fixed point event counters
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (12 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 13/31] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:34 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 15/31] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
` (17 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Resctrl was written with the assumption that all monitor events can be
displayed as unsigned decimal integers.
Hardware architecture counters may provide some telemetry events with
greater precision where the event is not a simple count, but is a
measurement of some sort (e.g. Joules for energy consumed).
Add a new argument to resctrl_enable_mon_event() for architecture code
to inform the file system that the value for a counter is a fixed-point
value with a specific number of binary places. The file system will
only allow architecture to use floating point format on events that it
marked with mon_evt::is_floating_point.
Fixed-point values are displayed rounded to a number of decimal places
appropriate to the precision of the binary places provided. In general,
one extra decimal place is added for every three additional binary
places. The exceptions are low-precision binary values that can be
represented exactly:
  1 binary place:   0.0, 0.5                => 1 decimal place
  2 binary places:  0.0, 0.25, 0.5, 0.75    => 2 decimal places
  3 binary places:  0.0, 0.125, etc.        => 3 decimal places
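The rounding scheme described above can be tried out in user space with a small C sketch. It is not the kernel implementation. `format_fixed()` is a hypothetical name, standard C stands in for `seq_printf()`, `int_pow()` and `GENMASK_ULL()`, and the decimal-places table is copied from the `decplaces[]` array in this patch. The mask keeps only the fractional bits `binary_bits - 1 .. 0`, i.e. `(1 << binary_bits) - 1`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_BINARY_BITS 27

/* Decimal places for each number of binary places, as in decplaces[]. */
static const unsigned int decplaces[MAX_BINARY_BITS + 1] = {
	0, 1, 2, 3, 3, 3, 3, 3, 3, 3,	/* 0 .. 9 */
	4, 4, 4, 5, 5, 5, 6, 6, 6,	/* 10 .. 18 */
	7, 7, 7, 8, 8, 8, 9, 9, 9,	/* 19 .. 27 */
};

/* Hypothetical helper: format val, which has binary_bits fractional bits. */
static void format_fixed(char *out, size_t len, unsigned int binary_bits, uint64_t val)
{
	char digits[10];	/* up to 9 decimal digits plus NUL */
	unsigned int places;
	uint64_t frac;

	assert(binary_bits <= MAX_BINARY_BITS);
	if (!binary_bits) {
		snprintf(out, len, "%llu.0", (unsigned long long)val);
		return;
	}
	places = decplaces[binary_bits];

	/* Keep only the fractional bits: bits binary_bits - 1 .. 0. */
	frac = val & ((UINT64_C(1) << binary_bits) - 1);

	/* Scale by 10^places, add one half, then drop the fractional bits. */
	for (unsigned int i = 0; i < places; i++)
		frac *= 10;
	frac += UINT64_C(1) << (binary_bits - 1);
	frac >>= binary_bits;

	/* frac < 10^places, so it fits in "places" zero-padded digits. */
	snprintf(digits, sizeof(digits), "%0*llu", (int)places, (unsigned long long)frac);

	/* Trim trailing zeroes, always keeping at least one digit. */
	for (int i = (int)places - 1; i > 0 && digits[i] == '0'; i--)
		digits[i] = '\0';

	snprintf(out, len, "%llu.%s", (unsigned long long)(val >> binary_bits), digits);
}
```

For example, 5 with 2 binary places (binary 1.01) prints as "1.25", and 8 with 3 binary places prints as "1.0" once trailing zeroes are trimmed.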
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 4 +-
fs/resctrl/internal.h | 5 ++
arch/x86/kernel/cpu/resctrl/core.c | 6 +-
fs/resctrl/ctrlmondata.c | 88 ++++++++++++++++++++++++++++++
fs/resctrl/monitor.c | 10 +++-
5 files changed, 107 insertions(+), 6 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 17a21f193a3d..e9a1cabfc724 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -379,7 +379,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
u32 resctrl_arch_system_num_rmid_idx(void);
int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
+#define MAX_BINARY_BITS 27
+
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits);
bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 23dd0b39a117..263a34f06a5b 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -58,6 +58,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
* @name: name of the event
* @configurable: true if the event is configurable
* @any_cpu: true if the event can be read from any CPU
+ * @is_floating_point: event values are displayed in floating point format
+ * @binary_bits: number of fixed-point binary bits from architecture,
+ * only valid if @is_floating_point is true
* @enabled: true if the event is enabled
*/
struct mon_evt {
@@ -66,6 +69,8 @@ struct mon_evt {
char *name;
bool configurable;
bool any_cpu;
+ bool is_floating_point;
+ unsigned int binary_bits;
bool enabled;
};
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 0a564285d829..0286d3cf6754 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -887,15 +887,15 @@ static __init bool get_rdt_mon_resources(void)
bool ret = false;
if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
- resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
- resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
- resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
+ resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
ret = true;
}
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 2e65fddc3408..71d61c96c2b8 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -17,6 +17,7 @@
#include <linux/cpu.h>
#include <linux/kernfs.h>
+#include <linux/math.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/tick.h>
@@ -590,6 +591,91 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
}
+/*
+ * Decimal place precision to use for each number of fixed-point
+ * binary bits.
+ */
+static unsigned int decplaces[MAX_BINARY_BITS + 1] = {
+ [1] = 1,
+ [2] = 2,
+ [3] = 3,
+ [4] = 3,
+ [5] = 3,
+ [6] = 3,
+ [7] = 3,
+ [8] = 3,
+ [9] = 3,
+ [10] = 4,
+ [11] = 4,
+ [12] = 4,
+ [13] = 5,
+ [14] = 5,
+ [15] = 5,
+ [16] = 6,
+ [17] = 6,
+ [18] = 6,
+ [19] = 7,
+ [20] = 7,
+ [21] = 7,
+ [22] = 8,
+ [23] = 8,
+ [24] = 8,
+ [25] = 9,
+ [26] = 9,
+ [27] = 9
+};
+
+static void print_event_value(struct seq_file *m, unsigned int binary_bits, u64 val)
+{
+ unsigned long long frac;
+ char buf[10];
+
+ if (!binary_bits) {
+ seq_printf(m, "%llu.0\n", val);
+ return;
+ }
+
+ /* Mask off the integer part of the fixed-point value. */
+ frac = val & GENMASK_ULL(binary_bits - 1, 0);
+
+ /*
+ * Multiply by 10^{desired decimal places}. The
+ * integer part of the fixed point value is now
+ * almost what is needed.
+ */
+ frac *= int_pow(10ull, decplaces[binary_bits]);
+
+ /*
+ * Round to nearest by adding a value that
+ * would be a "1" in the binary_bit + 1 place.
+ * Integer part of fixed point value is now
+ * the needed value.
+ */
+ frac += 1ull << (binary_bits - 1);
+
+ /*
+ * Extract the integer part of the value. This
+ * is the decimal representation of the original
+ * fixed-point fractional value.
+ */
+ frac >>= binary_bits;
+
+ /*
+ * "frac" is now in the range [0 .. 10^decplaces).
+ * I.e. string representation will fit into
+ * chosen number of decimal places.
+ */
+ snprintf(buf, sizeof(buf), "%0*llu", decplaces[binary_bits], frac);
+
+ /* Trim trailing zeroes */
+ for (int i = decplaces[binary_bits] - 1; i > 0; i--) {
+ if (buf[i] != '0')
+ break;
+ buf[i] = '\0';
+ }
+ seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
+}
+
int rdtgroup_mondata_show(struct seq_file *m, void *arg)
{
struct kernfs_open_file *of = m->private;
@@ -666,6 +752,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
seq_puts(m, "Error\n");
else if (rr.err == -EINVAL)
seq_puts(m, "Unavailable\n");
+ else if (evt->is_floating_point)
+ print_event_value(m, evt->binary_bits, rr.val);
else
seq_printf(m, "%llu\n", rr.val);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index a6d11011cb8e..adb14a9be3d2 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -901,16 +901,22 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
},
};
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits)
{
- if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+ if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
+ binary_bits > MAX_BINARY_BITS))
return;
if (mon_event_all[eventid].enabled) {
pr_warn("Duplicate enable for event %d\n", eventid);
return;
}
+ if (binary_bits && !mon_event_all[eventid].is_floating_point) {
+ pr_warn("Event %d may not be floating point\n", eventid);
+ return;
+ }
mon_event_all[eventid].any_cpu = any_cpu;
+ mon_event_all[eventid].binary_bits = binary_bits;
mon_event_all[eventid].enabled = true;
}
--
2.50.0
* [PATCH v7 15/31] x86,fs/resctrl: Add an architectural hook called for each mount
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (13 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 14/31] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:35 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 16/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
` (16 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Enumeration of Intel telemetry events is not complete when the
resctrl "late_init" code is executed.
Add a hook at the beginning of the mount code that will be used
to check for telemetry events and initialize if any are found.
The hook is called on every attempted mount, but most actions (such as
enumeration) are expected to be needed only on the first call.
The call is made with no locks held. Architecture code is responsible
for any required locking.
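A minimal user-space sketch of the "only the first call does the work" pattern, assuming C11 `<stdatomic.h>`: `atomic_compare_exchange_strong()` plays the role of the kernel's `atomic_try_cmpxchg()`, and `pre_mount()` and `enumerate_events()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static int enumerations;	/* counts how often the one-time work ran */

/* Hypothetical stand-in for expensive one-time work such as enumeration. */
static void enumerate_events(void)
{
	enumerations++;
}

/* Hypothetical hook called on every mount attempt. */
static void pre_mount(void)
{
	static atomic_int only_once;	/* zero-initialized, like ATOMIC_INIT(0) */
	int expected = 0;

	/* Atomically change 0 -> 1; only the first caller succeeds. */
	if (!atomic_compare_exchange_strong(&only_once, &expected, 1))
		return;

	enumerate_events();
}
```

Because a caller that loses the compare-and-exchange returns immediately, a second mount racing with the first could continue before enumeration has finished. If later mount steps depend on the result, the architecture code would need its own locking, which fits the statement above that it is responsible for any required locking.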
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 6 ++++++
arch/x86/kernel/cpu/resctrl/core.c | 9 +++++++++
fs/resctrl/rdtgroup.c | 2 ++
3 files changed, 17 insertions(+)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index e9a1cabfc724..d2fc0fcd0226 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -460,6 +460,12 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
void resctrl_online_cpu(unsigned int cpu);
void resctrl_offline_cpu(unsigned int cpu);
+/*
+ * Architecture hook called for each attempted file system mount.
+ * No locks are held.
+ */
+void resctrl_arch_pre_mount(void);
+
/**
* resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
* for this resource and domain.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 0286d3cf6754..22ff91b666d0 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -717,6 +717,15 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
return 0;
}
+void resctrl_arch_pre_mount(void)
+{
+ static atomic_t only_once = ATOMIC_INIT(0);
+ int old = 0;
+
+ if (!atomic_try_cmpxchg(&only_once, &old, 1))
+ return;
+}
+
enum {
RDT_FLAG_CMT,
RDT_FLAG_MBM_TOTAL,
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 6df06bf0e694..627243a1175c 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2585,6 +2585,8 @@ static int rdt_get_tree(struct fs_context *fc)
struct rdt_resource *r;
int ret;
+ resctrl_arch_pre_mount();
+
cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
/*
--
2.50.0
* [PATCH v7 16/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (14 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 15/31] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:36 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 17/31] x86/resctrl: Discover hardware telemetry events Tony Luck
` (15 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Add a new PERF_PKG resource and introduce package-level scope for
monitoring telemetry events so that CPU hotplug notifiers can build
domains at package granularity.
Use the physical package ID available via topology_physical_package_id()
to identify the monitoring domains with package level scope.
This enables user space to use
/sys/devices/system/cpu/cpuX/topology/physical_package_id
to identify the monitoring domain a CPU is associated with.
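To illustrate how package scope partitions CPUs into monitoring domains, here is a hypothetical user-space sketch that groups CPUs by the package ID each one reports (as `physical_package_id` does). It assumes at most 64 CPUs and 8 packages and uses a plain 64-bit mask. The kernel builds domains from CPU hotplug callbacks using `struct cpumask`, not with this function.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PKGS 8

/*
 * Hypothetical helper: cpu_pkg[cpu] is the physical package ID of each
 * CPU. Fills masks[id] with the CPUs of package id and returns the
 * number of distinct package-scope domains, or -1 on a bad package ID.
 * CPUs numbered 64 and above are ignored in this sketch.
 */
static int build_pkg_domains(const int *cpu_pkg, int ncpus, uint64_t masks[MAX_PKGS])
{
	int ndomains = 0;

	for (int id = 0; id < MAX_PKGS; id++)
		masks[id] = 0;

	for (int cpu = 0; cpu < ncpus && cpu < 64; cpu++) {
		int id = cpu_pkg[cpu];

		if (id < 0 || id >= MAX_PKGS)
			return -1;
		if (!masks[id])
			ndomains++;	/* first CPU seen on this package: new domain */
		masks[id] |= UINT64_C(1) << cpu;
	}
	return ndomains;
}
```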
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 2 ++
fs/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++++++
fs/resctrl/rdtgroup.c | 2 ++
4 files changed, 16 insertions(+)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d2fc0fcd0226..d89378346044 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,6 +53,7 @@ enum resctrl_res_level {
RDT_RESOURCE_L2,
RDT_RESOURCE_MBA,
RDT_RESOURCE_SMBA,
+ RDT_RESOURCE_PERF_PKG,
/* Must be the last */
RDT_NUM_RESOURCES,
@@ -252,6 +253,7 @@ enum resctrl_scope {
RESCTRL_L2_CACHE = 2,
RESCTRL_L3_CACHE = 3,
RESCTRL_L3_NODE,
+ RESCTRL_PACKAGE,
};
/**
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 263a34f06a5b..b0bacadd9786 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -240,6 +240,8 @@ struct rdtgroup {
#define RFTYPE_DEBUG BIT(10)
+#define RFTYPE_RES_PERF_PKG BIT(11)
+
#define RFTYPE_CTRL_INFO (RFTYPE_INFO | RFTYPE_CTRL)
#define RFTYPE_MON_INFO (RFTYPE_INFO | RFTYPE_MON)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 22ff91b666d0..2b2f76c76d73 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,14 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
.schema_fmt = RESCTRL_SCHEMA_RANGE,
},
},
+ [RDT_RESOURCE_PERF_PKG] =
+ {
+ .r_resctrl = {
+ .name = "PERF_PKG",
+ .mon_scope = RESCTRL_PACKAGE,
+ .mon_domains = mon_domain_init(RDT_RESOURCE_PERF_PKG),
+ },
+ },
};
u32 resctrl_arch_system_num_rmid_idx(void)
@@ -433,6 +441,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
return get_cpu_cacheinfo_id(cpu, scope);
case RESCTRL_L3_NODE:
return cpu_to_node(cpu);
+ case RESCTRL_PACKAGE:
+ return topology_physical_package_id(cpu);
default:
break;
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 627243a1175c..11792e841525 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2195,6 +2195,8 @@ static unsigned long fflags_from_resource(struct rdt_resource *r)
case RDT_RESOURCE_MBA:
case RDT_RESOURCE_SMBA:
return RFTYPE_RES_MB;
+ case RDT_RESOURCE_PERF_PKG:
+ return RFTYPE_RES_PERF_PKG;
}
return WARN_ON_ONCE(1);
--
2.50.0
* [PATCH v7 17/31] x86/resctrl: Discover hardware telemetry events
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (15 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 16/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:39 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 18/31] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
` (14 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Hardware has one or more telemetry event aggregators per package
for each group of telemetry events. Each aggregator provides access
to event counts in an array of 64-bit values in MMIO space. There
is a "guid" (in this case a unique 32-bit integer) which refers to
an XML file published in https://github.com/intel/Intel-PMT
that provides all the details about each aggregator.
The XML file provides the following information:
1) Which telemetry events are included in the group for this aggregator.
2) The order in which the event counters appear for each RMID.
3) The value type of each event counter (integer or fixed-point).
4) The number of RMIDs supported.
5) Which additional aggregator status registers are included.
6) The total size of the MMIO region for this aggregator.
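As an illustration of how items 1), 2), 4) and 5) combine, the sketch below computes byte offsets inside one aggregator's MMIO region. It assumes a layout in which each RMID has `num_events` consecutive 64-bit counters in XML order, RMIDs follow one another, and the extra status registers come after the last RMID. That assumption matches the `XML_MMIO_SIZE()` formula used later in this series, but the XML file remains the authority. `struct agg_layout` and both helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one aggregator's XML description. */
struct agg_layout {
	unsigned int num_rmids;		/* 4) number of RMIDs supported */
	unsigned int num_events;	/* 1) + 2) counters per RMID, in XML order */
	unsigned int num_status;	/* 5) extra status registers */
};

/* Byte offset of counter event_idx for rmid, or -1 if out of range. */
static long counter_offset(const struct agg_layout *l, unsigned int rmid,
			   unsigned int event_idx)
{
	if (rmid >= l->num_rmids || event_idx >= l->num_events)
		return -1;
	return (long)(((size_t)rmid * l->num_events + event_idx) * sizeof(uint64_t));
}

/* Byte offset of extra status register idx, assumed to follow all counters. */
static long status_offset(const struct agg_layout *l, unsigned int idx)
{
	if (idx >= l->num_status)
		return -1;
	return (long)(((size_t)l->num_rmids * l->num_events + idx) * sizeof(uint64_t));
}
```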
Each aggregator makes event counters available to Linux in
a region of MMIO memory. Enumeration of these regions is
done by the INTEL_PMT_DISCOVERY discovery driver.
Add a new Kconfig option CONFIG_X86_RESCTRL_CPU_INTEL_AET for the
Intel-specific parts of the telemetry code. This depends on the
INTEL_PMT_DISCOVERY driver being built into the kernel for
enumeration of telemetry features.
Call intel_pmt_get_regions_by_feature() for each pmt_feature_id
that indicates per-RMID telemetry.
Save the returned pmt_feature_group pointers with guids that are known
to resctrl for use at run time.
Those pointers are returned to the INTEL_PMT_DISCOVERY driver at
resctrl_arch_exit() time.
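The guid matching step can be pictured with a simplified, hypothetical sketch: given the regions of one feature group and a table of guids that resctrl knows, pick the first known group that has at least one matching region. In the patch this decision is delegated to `discover_events()`, which also validates the regions, so treat this only as an outline of the selection logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct region { uint32_t guid; };
struct known_group { uint32_t guid; const char *name; };

/*
 * Return the first known group with at least one matching region,
 * or NULL when resctrl knows none of the guids in this feature group.
 */
static const struct known_group *match_known(const struct region *regions, size_t nregions,
					     const struct known_group *known, size_t nknown)
{
	for (size_t k = 0; k < nknown; k++)
		for (size_t r = 0; r < nregions; r++)
			if (regions[r].guid == known[k].guid)
				return &known[k];
	return NULL;
}
```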
Note that checkpatch complains about the alignment of additional
lines in the definition of the intel_pmt_put_feature_group
cleanup helper. I didn't find a way to appease conflicting
requirements from checkpatch.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/internal.h | 8 ++
arch/x86/kernel/cpu/resctrl/core.c | 5 +
arch/x86/kernel/cpu/resctrl/intel_aet.c | 133 ++++++++++++++++++++++++
arch/x86/Kconfig | 13 +++
arch/x86/kernel/cpu/resctrl/Makefile | 1 +
5 files changed, 160 insertions(+)
create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 684a1b830ced..36a2072c19c7 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -169,4 +169,12 @@ void __init intel_rdt_mbm_apply_quirk(void);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
+#ifdef CONFIG_X86_RESCTRL_CPU_INTEL_AET
+bool intel_aet_get_events(void);
+void __exit intel_aet_exit(void);
+#else
+static inline bool intel_aet_get_events(void) { return false; }
+static inline void __exit intel_aet_exit(void) { }
+#endif
+
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2b2f76c76d73..b8288f5d4aff 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -734,6 +734,9 @@ void resctrl_arch_pre_mount(void)
if (!atomic_try_cmpxchg(&only_once, &old, 1))
return;
+
+ if (!intel_aet_get_events())
+ return;
}
enum {
@@ -1086,6 +1089,8 @@ late_initcall(resctrl_arch_late_init);
static void __exit resctrl_arch_exit(void)
{
+ intel_aet_exit();
+
cpuhp_remove_state(rdt_online);
resctrl_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
new file mode 100644
index 000000000000..d177e5aa1f6a
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Resource Director Technology(RDT)
+ * - Intel Application Energy Telemetry
+ *
+ * Copyright (C) 2025 Intel Corporation
+ *
+ * Author:
+ * Tony Luck <tony.luck@intel.com>
+ */
+
+#define pr_fmt(fmt) "resctrl: " fmt
+
+#include <linux/cleanup.h>
+#include <linux/cpu.h>
+#include <linux/intel_vsec.h>
+#include <linux/resctrl.h>
+
+#include "internal.h"
+
+/**
+ * struct event_group - All information about a group of telemetry events.
+ * @pfg: Points to the aggregated telemetry space information
+ * within the OOBMSM driver that contains data for all
+ * telemetry regions.
+ * @list: List of active event groups.
+ * @guid: Unique number per XML description file.
+ */
+struct event_group {
+ /* Data fields for additional structures to manage this group. */
+ struct pmt_feature_group *pfg;
+ struct list_head list;
+
+ /* Remaining fields initialized from XML file. */
+ u32 guid;
+};
+
+static LIST_HEAD(active_event_groups);
+
+/*
+ * Link: https://github.com/intel/Intel-PMT
+ * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
+ */
+static struct event_group energy_0x26696143 = {
+ .guid = 0x26696143,
+};
+
+/*
+ * Link: https://github.com/intel/Intel-PMT
+ * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
+ */
+static struct event_group perf_0x26557651 = {
+ .guid = 0x26557651,
+};
+
+static struct event_group *known_energy_event_groups[] = {
+ &energy_0x26696143,
+};
+
+#define NUM_KNOWN_ENERGY_GROUPS ARRAY_SIZE(known_energy_event_groups)
+
+static struct event_group *known_perf_event_groups[] = {
+ &perf_0x26557651,
+};
+
+#define NUM_KNOWN_PERF_GROUPS ARRAY_SIZE(known_perf_event_groups)
+
+/* Stub for now */
+static int discover_events(struct event_group *e, struct pmt_feature_group *p)
+{
+ return -EINVAL;
+}
+
+DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
+ if (!IS_ERR_OR_NULL(_T))
+ intel_pmt_put_feature_group(_T))
+
+/*
+ * Make a request to the INTEL_PMT_DISCOVERY driver for the
+ * pmt_feature_group for a specific feature. If there is
+ * one, the returned structure has an array of telemetry_region
+ * structures, each describing one telemetry aggregator.
+ * Try to use any with a known matching guid.
+ */
+static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **evgs,
+ unsigned int num_evg)
+{
+ struct pmt_feature_group *p __free(intel_pmt_put_feature_group) = NULL;
+ struct event_group **peg;
+ int ret;
+
+ p = intel_pmt_get_regions_by_feature(feature);
+
+ if (IS_ERR_OR_NULL(p))
+ return false;
+
+ for (peg = evgs; peg < &evgs[num_evg]; peg++) {
+ ret = discover_events(*peg, p);
+ if (!ret) {
+ (*peg)->pfg = no_free_ptr(p);
+ return true;
+ }
+ }
+
+ return false;
+}
+
+/*
+ * Ask OOBMSM discovery driver for all the RMID based telemetry groups
+ * that it supports.
+ */
+bool intel_aet_get_events(void)
+{
+ bool ret1, ret2;
+
+ ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM,
+ known_energy_event_groups, NUM_KNOWN_ENERGY_GROUPS);
+ ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM,
+ known_perf_event_groups, NUM_KNOWN_PERF_GROUPS);
+
+ return ret1 || ret2;
+}
+
+void __exit intel_aet_exit(void)
+{
+ struct event_group *evg, *tmp;
+
+ list_for_each_entry_safe(evg, tmp, &active_event_groups, list) {
+ intel_pmt_put_feature_group(evg->pfg);
+ evg->pfg = NULL;
+ list_del(&evg->list);
+ }
+}
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 340e5468980e..21c2d1022b15 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -528,6 +528,19 @@ config X86_CPU_RESCTRL
Say N if unsure.
+config X86_RESCTRL_CPU_INTEL_AET
+ bool "Intel Application Energy Telemetry" if INTEL_PMT_DISCOVERY=y
+ depends on X86_CPU_RESCTRL && CPU_SUP_INTEL
+ help
+ Enable per-RMID telemetry events in resctrl.
+
+ An Intel feature that collects per-RMID execution data
+ about energy consumption, a measure of frequency-independent
+ activity, and other performance metrics. Data is aggregated
+ per package.
+
+ Say N if unsure.
+
config X86_FRED
bool "Flexible Return and Event Delivery"
depends on X86_64
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index d8a04b195da2..c86df4b23993 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_X86_CPU_RESCTRL) += core.o rdtgroup.o monitor.o
obj-$(CONFIG_X86_CPU_RESCTRL) += ctrlmondata.o
+obj-$(CONFIG_X86_RESCTRL_CPU_INTEL_AET) += intel_aet.o
obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK) += pseudo_lock.o
# To allow define_trace.h's recursive include:
--
2.50.0
* [PATCH v7 18/31] x86/resctrl: Count valid telemetry aggregators per package
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (16 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 17/31] x86/resctrl: Discover hardware telemetry events Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:40 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 19/31] x86/resctrl: Complete telemetry event enumeration Tony Luck
` (13 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
There may be multiple telemetry aggregators per package, each enumerated
by a telemetry region structure in the feature group.
Scan the array of telemetry region structures and count how many are
in each package, in preparation for allocating structures that save
the MMIO addresses of each region in a format convenient for reading
event counters.
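A user-space sketch of the counting pass, assuming a simplified `struct tregion` whose `usable` flag stands in for the checks done by `skip_this_region()`. As in the patch, the counter array is allocated only when the first usable region is found, so "no usable regions" can be reported separately from an allocation failure. All names here are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical simplified telemetry region. */
struct tregion {
	int package_id;
	int usable;	/* result of checks such as skip_this_region() */
};

/*
 * Count usable regions per package. On success *out points to a
 * calloc()ed array of num_pkgs counters that the caller must free.
 */
static int count_per_package(const struct tregion *tr, int n, int num_pkgs, int **out)
{
	int *counts = NULL;

	for (int i = 0; i < n; i++) {
		if (!tr[i].usable || tr[i].package_id < 0 || tr[i].package_id >= num_pkgs)
			continue;
		/* Allocate on first use so "no usable regions" costs nothing. */
		if (!counts) {
			counts = calloc(num_pkgs, sizeof(*counts));
			if (!counts)
				return -ENOMEM;
		}
		counts[tr[i].package_id]++;
	}
	if (!counts)
		return -ENODEV;

	*out = counts;
	return 0;
}
```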
Sanity check that each telemetry region structure has a valid
package_id and that the size it reports for the MMIO space matches
the size expected from the XML description of the registers in
the region.
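For the two guids described in this series, the expected sizes work out to (576 * 2 + 3) * 8 = 9240 bytes for 0x26696143 and (576 * 7 + 3) * 8 = 32280 bytes for 0x26557651. The sketch below reuses the `XML_MMIO_SIZE()` formula from this patch together with a hypothetical `region_usable()` check that mirrors the three conditions in `skip_this_region()`, with the result inverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Expected MMIO size in bytes, same formula as XML_MMIO_SIZE() in this patch. */
#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status) \
	(((num_rmids) * (num_events) + (num_extra_status)) * sizeof(uint64_t))

/* Hypothetical simplified region descriptor. */
struct tregion {
	uint32_t guid;
	int package_id;
	size_t size;
};

/* True if the region belongs to guid, sits on a valid package and has the expected size. */
static bool region_usable(const struct tregion *tr, uint32_t guid, size_t expected_size,
			  int max_pkgs)
{
	if (tr->guid != guid)
		return false;
	if (tr->package_id < 0 || tr->package_id >= max_pkgs)
		return false;
	return tr->size == expected_size;
}
```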
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/intel_aet.c | 55 ++++++++++++++++++++++++-
1 file changed, 53 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index d177e5aa1f6a..7cd6c06f9205 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -15,6 +15,7 @@
#include <linux/cpu.h>
#include <linux/intel_vsec.h>
#include <linux/resctrl.h>
+#include <linux/slab.h>
#include "internal.h"
@@ -25,6 +26,7 @@
* telemetry regions.
* @list: List of active event groups.
* @guid: Unique number per XML description file.
+ * @mmio_size: Number of bytes of MMIO registers for this group.
*/
struct event_group {
/* Data fields for additional structures to manage this group. */
@@ -33,16 +35,21 @@ struct event_group {
/* Remaining fields initialized from XML file. */
u32 guid;
+ size_t mmio_size;
};
static LIST_HEAD(active_event_groups);
+#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status) \
+ (((num_rmids) * (num_events) + (num_extra_status)) * sizeof(u64))
+
/*
* Link: https://github.com/intel/Intel-PMT
* File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
*/
static struct event_group energy_0x26696143 = {
.guid = 0x26696143,
+ .mmio_size = XML_MMIO_SIZE(576, 2, 3),
};
/*
@@ -51,6 +58,7 @@ static struct event_group energy_0x26696143 = {
*/
static struct event_group perf_0x26557651 = {
.guid = 0x26557651,
+ .mmio_size = XML_MMIO_SIZE(576, 7, 3),
};
static struct event_group *known_energy_event_groups[] = {
@@ -65,10 +73,53 @@ static struct event_group *known_perf_event_groups[] = {
#define NUM_KNOWN_PERF_GROUPS ARRAY_SIZE(known_perf_event_groups)
-/* Stub for now */
+static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
+{
+ if (tr->guid != e->guid)
+ return true;
+ if (tr->plat_info.package_id >= topology_max_packages()) {
+ pr_warn_once("Bad package %d in guid 0x%x\n", tr->plat_info.package_id,
+ tr->guid);
+ return true;
+ }
+ if (tr->size != e->mmio_size) {
+ pr_warn_once("MMIO space %zu wrong size for guid 0x%x\n", tr->size, e->guid);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Discover events from one pmt_feature_group.
+ * 1) Count how many usable telemetry regions per package.
+ * 2...) To be continued.
+ */
static int discover_events(struct event_group *e, struct pmt_feature_group *p)
{
- return -EINVAL;
+ int *pkgcounts __free(kfree) = NULL;
+ struct telemetry_region *tr;
+ int num_pkgs;
+
+ num_pkgs = topology_max_packages();
+
+ /* Get per-package counts of telemetry regions for this event group */
+ for (int i = 0; i < p->count; i++) {
+ tr = &p->regions[i];
+ if (skip_this_region(tr, e))
+ continue;
+ if (!pkgcounts) {
+ pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
+ if (!pkgcounts)
+ return -ENOMEM;
+ }
+ pkgcounts[tr->plat_info.package_id]++;
+ }
+
+ if (!pkgcounts)
+ return -ENODEV;
+
+ return 0;
}
DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
--
2.50.0
* [PATCH v7 19/31] x86/resctrl: Complete telemetry event enumeration
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (17 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 18/31] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:41 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 20/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
` (12 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Counters for telemetry events are in MMIO space. Each telemetry_region
structure in the pmt_feature_group returned by OOBMSM contains the
base MMIO address for its counters.
There may be multiple aggregators per package. Scan all the
telemetry_region structures again and save the number of regions together
with a flex array of the MMIO addresses for each aggregator indexed by
package id.
The completed structure for each event group looks like this:
              +---------------------+---------------------+
 pkginfo** -->|pkginfo[package ID 0]|pkginfo[package ID 1]|
              +---------------------+---------------------+
                         |                     |
                         v                     v
              +---------------------+ +---------------------+
              |struct pkg_mmio_info | |struct pkg_mmio_info |
              +---------------------+ +---------------------+
              |num_regions = N      | |num_regions = N      |
              |  addrs[0]           | |  addrs[0]           |
              |  addrs[1]           | |  addrs[1]           |
              |    ...              | |    ...              |
              |  addrs[N-1]         | |  addrs[N-1]         |
              +---------------------+ +---------------------+
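A user-space sketch of the two passes (count per package, then allocate and fill) may make the structure above easier to follow. It is not the kernel code: struct region, struct pkg_info, build_pkginfo() and free_pkginfo() are hypothetical stand-ins for telemetry_region, pkg_mmio_info and discover_events(), addresses are plain integers rather than void __iomem pointers, and calloc()/free() replace kcalloc()/kfree() and the __free() cleanup helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct telemetry_region: only the fields used here. */
struct region {
	int package_id;
	uintptr_t addr;		/* base MMIO address (plain integer in this sketch) */
	int usable;		/* models !skip_this_region() */
};

/* Mirrors struct pkg_mmio_info: a count plus a flexible array of addresses. */
struct pkg_info {
	unsigned int num_regions;
	uintptr_t addrs[];
};

static void free_pkginfo(struct pkg_info **pkginfo, int num_pkgs)
{
	if (!pkginfo)
		return;
	for (int i = 0; i < num_pkgs; i++)
		free(pkginfo[i]);	/* free(NULL) is a no-op */
	free(pkginfo);
}

/* Returns an array of num_pkgs pointers, or NULL on allocation failure. */
static struct pkg_info **build_pkginfo(const struct region *r, int nr, int num_pkgs)
{
	unsigned int *counts = calloc(num_pkgs, sizeof(*counts));
	struct pkg_info **pkginfo = calloc(num_pkgs, sizeof(*pkginfo));

	if (!counts || !pkginfo)
		goto fail;

	/* Pass 1: count usable regions on each package. */
	for (int i = 0; i < nr; i++) {
		if (!r[i].usable)
			continue;
		assert(r[i].package_id >= 0 && r[i].package_id < num_pkgs);
		counts[r[i].package_id]++;
	}

	/* Size each per-package structure for exactly its region count. */
	for (int p = 0; p < num_pkgs; p++) {
		pkginfo[p] = calloc(1, sizeof(*pkginfo[p]) + counts[p] * sizeof(uintptr_t));
		if (!pkginfo[p])
			goto fail;
		pkginfo[p]->num_regions = counts[p];
	}

	/* Pass 2: fill addrs[], counting each package's free slots down to zero. */
	for (int i = 0; i < nr; i++) {
		if (!r[i].usable)
			continue;
		int p = r[i].package_id;
		pkginfo[p]->addrs[--counts[p]] = r[i].addr;
	}
	free(counts);
	return pkginfo;

fail:
	free(counts);
	free_pkginfo(pkginfo, num_pkgs);
	return NULL;
}
```

As in the patch, the second pass fills each addrs[] array from the end by pre-decrementing the per-package count, so the count array doubles as a cursor and addresses within one package end up in reverse order of the region list.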
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/intel_aet.c | 64 ++++++++++++++++++++++++-
1 file changed, 63 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 7cd6c06f9205..3f383f0a9d08 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -19,12 +19,26 @@
#include "internal.h"
+/**
+ * struct pkg_mmio_info - MMIO address information for one event group of a package.
+ * @num_regions: Number of telemetry regions on this package.
+ * @addrs: Array of MMIO addresses, one per telemetry region on this package.
+ *
+ * Provides convenient access to all MMIO addresses of one event group
+ * for one package. Used when reading event data on a package.
+ */
+struct pkg_mmio_info {
+ unsigned int num_regions;
+ void __iomem *addrs[] __counted_by(num_regions);
+};
+
/**
* struct event_group - All information about a group of telemetry events.
* @pfg: Points to the aggregated telemetry space information
* within the OOBMSM driver that contains data for all
* telemetry regions.
* @list: List of active event groups.
+ * @pkginfo: Per-package MMIO addresses of telemetry regions belonging to this group.
* @guid: Unique number per XML description file.
* @mmio_size: Number of bytes of MMIO registers for this group.
*/
@@ -32,6 +46,7 @@ struct event_group {
/* Data fields for additional structures to manage this group. */
struct pmt_feature_group *pfg;
struct list_head list;
+ struct pkg_mmio_info **pkginfo;
/* Remaining fields initialized from XML file. */
u32 guid;
@@ -90,15 +105,32 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
return false;
}
+static void free_pkg_mmio_info(struct pkg_mmio_info **mmi)
+{
+ int num_pkgs = topology_max_packages();
+
+ if (!mmi)
+ return;
+
+ for (int i = 0; i < num_pkgs; i++)
+ kfree(mmi[i]);
+ kfree(mmi);
+}
+
+DEFINE_FREE(pkg_mmio_info, struct pkg_mmio_info **, free_pkg_mmio_info(_T))
+
/*
* Discover events from one pmt_feature_group.
* 1) Count how many usable telemetry regions per package.
- * 2...) To be continued.
+ * 2) Allocate per-package structures and populate with MMIO
+ * addresses of the telemetry regions used by each aggregator.
*/
static int discover_events(struct event_group *e, struct pmt_feature_group *p)
{
+ struct pkg_mmio_info **pkginfo __free(pkg_mmio_info) = NULL;
int *pkgcounts __free(kfree) = NULL;
struct telemetry_region *tr;
+ struct pkg_mmio_info *mmi;
int num_pkgs;
num_pkgs = topology_max_packages();
@@ -108,6 +140,7 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
tr = &p->regions[i];
if (skip_this_region(tr, e))
continue;
+
if (!pkgcounts) {
pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
if (!pkgcounts)
@@ -119,6 +152,32 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
if (!pkgcounts)
return -ENODEV;
+ /* Allocate array for per-package struct pkg_mmio_info data */
+ pkginfo = kcalloc(num_pkgs, sizeof(*pkginfo), GFP_KERNEL);
+ if (!pkginfo)
+ return -ENOMEM;
+
+ /*
+ * Allocate per-package pkg_mmio_info structures and initialize
+ * count of telemetry_regions in each one.
+ */
+ for (int i = 0; i < num_pkgs; i++) {
+ pkginfo[i] = kzalloc(struct_size(pkginfo[i], addrs, pkgcounts[i]), GFP_KERNEL);
+ if (!pkginfo[i])
+ return -ENOMEM;
+ pkginfo[i]->num_regions = pkgcounts[i];
+ }
+
+ /* Save MMIO address(es) for each telemetry region in per-package structures */
+ for (int i = 0; i < p->count; i++) {
+ tr = &p->regions[i];
+ if (skip_this_region(tr, e))
+ continue;
+ mmi = pkginfo[tr->plat_info.package_id];
+ mmi->addrs[--pkgcounts[tr->plat_info.package_id]] = tr->addr;
+ }
+ e->pkginfo = no_free_ptr(pkginfo);
+
return 0;
}
@@ -151,6 +210,7 @@ static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **ev
(*peg)->pfg = no_free_ptr(p);
return true;
}
+ free_pkg_mmio_info((*peg)->pkginfo);
}
return false;
@@ -179,6 +239,8 @@ void __exit intel_aet_exit(void)
list_for_each_entry_safe(evg, tmp, &active_event_groups, list) {
intel_pmt_put_feature_group(evg->pfg);
evg->pfg = NULL;
+ free_pkg_mmio_info(evg->pkginfo);
+ evg->pkginfo = NULL;
list_del(&evg->list);
}
}
--
2.50.0
* [PATCH v7 20/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (18 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 19/31] x86/resctrl: Complete telemetry event enumeration Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:43 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 21/31] x86,fs/resctrl: Add architectural event pointer Tony Luck
` (11 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
These two guids describe the events supported on Clearwater Forest.
The offsets in MMIO space are arranged in groups for each RMID.
E.g., the "energy" counters for guid 0x26696143 are arranged like this:
MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY
MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY
MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY
MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY
...
MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY
MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY
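The layout above means a counter's byte offset follows from the RMID, the number of events in the group, and the event's index within each per-RMID block. intel_aet_read_event() in patch 22/31 computes the same index (rmid * num_events + idx) before scaling by sizeof(u64). counter_offset() is a hypothetical helper, not part of the patch, and assumes every counter is 64 bits wide as in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Byte offset of one event counter: counters are grouped per RMID,
 * each group holds num_events 64-bit counters, and idx selects the
 * event within the group.
 */
static size_t counter_offset(unsigned int rmid, unsigned int num_events,
			     unsigned int idx)
{
	assert(idx < num_events);
	return ((size_t)rmid * num_events + idx) * sizeof(uint64_t);
}
```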
Define these events in the file system code and add the events
to the event_group structures.
PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are reported in fixed-point
format. File system code must output them as floating-point values.
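A raw value with bin_bits fractional bits (18 for the two energy-group events below) represents raw / 2^bin_bits. The sketch illustrates that conversion under stated assumptions and is not the resctrl file system code: fixed_to_milli() and print_fixed_point() are hypothetical names, and truncating to three decimal places is an arbitrary choice.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Convert a fixed-point value with bin_bits fractional bits to
 * thousandths, truncating any remainder. bin_bits <= 27 (the patch's
 * MAX_BINARY_BITS) keeps frac * 1000 well inside 64 bits; whole * 1000
 * is assumed not to overflow for realistic counter values.
 */
static uint64_t fixed_to_milli(uint64_t val, unsigned int bin_bits)
{
	uint64_t whole, frac;

	assert(bin_bits <= 27);
	whole = val >> bin_bits;
	frac = val & ((UINT64_C(1) << bin_bits) - 1);
	return whole * 1000 + ((frac * 1000) >> bin_bits);
}

/* Print the value as a user might see it, e.g. "3.500". */
static void print_fixed_point(uint64_t val, unsigned int bin_bits)
{
	uint64_t milli = fixed_to_milli(val, bin_bits);

	printf("%" PRIu64 ".%03" PRIu64 "\n", milli / 1000, milli % 1000);
}
```

With bin_bits = 0 the mask is zero, so integer-valued events pass through unchanged.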
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl_types.h | 11 ++++++++
arch/x86/kernel/cpu/resctrl/intel_aet.c | 35 +++++++++++++++++++++++++
fs/resctrl/monitor.c | 35 ++++++++++++++-----------
3 files changed, 66 insertions(+), 15 deletions(-)
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index d98351663c2c..6838b02d5ca3 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -47,6 +47,17 @@ enum resctrl_event_id {
QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
+ /* Intel Telemetry Events */
+ PMT_EVENT_ENERGY,
+ PMT_EVENT_ACTIVITY,
+ PMT_EVENT_STALLS_LLC_HIT,
+ PMT_EVENT_C1_RES,
+ PMT_EVENT_UNHALTED_CORE_CYCLES,
+ PMT_EVENT_STALLS_LLC_MISS,
+ PMT_EVENT_AUTO_C6_RES,
+ PMT_EVENT_UNHALTED_REF_CYCLES,
+ PMT_EVENT_UOPS_RETIRED,
+
/* Must be the last */
QOS_NUM_EVENTS,
};
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 3f383f0a9d08..f4bf0f2ccf26 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -32,6 +32,20 @@ struct pkg_mmio_info {
void __iomem *addrs[] __counted_by(num_regions);
};
+/**
+ * struct pmt_event - Telemetry event.
+ * @id: Resctrl event id.
+ * @idx: Counter index within each per-RMID block of counters.
+ * @bin_bits: Zero for integer valued events, else number of bits in fixed-point.
+ */
+struct pmt_event {
+ enum resctrl_event_id id;
+ unsigned int idx;
+ unsigned int bin_bits;
+};
+
+#define EVT(_id, _idx, _bits) { .id = _id, .idx = _idx, .bin_bits = _bits }
+
/**
* struct event_group - All information about a group of telemetry events.
* @pfg: Points to the aggregated telemetry space information
@@ -41,6 +55,8 @@ struct pkg_mmio_info {
* @pkginfo: Per-package MMIO addresses of telemetry regions belonging to this group.
* @guid: Unique number per XML description file.
* @mmio_size: Number of bytes of MMIO registers for this group.
+ * @num_events: Number of events in this group.
+ * @evts: Array of event descriptors.
*/
struct event_group {
/* Data fields for additional structures to manage this group. */
@@ -51,6 +67,8 @@ struct event_group {
/* Remaining fields initialized from XML file. */
u32 guid;
size_t mmio_size;
+ unsigned int num_events;
+ struct pmt_event evts[] __counted_by(num_events);
};
static LIST_HEAD(active_event_groups);
@@ -65,6 +83,11 @@ static LIST_HEAD(active_event_groups);
static struct event_group energy_0x26696143 = {
.guid = 0x26696143,
.mmio_size = XML_MMIO_SIZE(576, 2, 3),
+ .num_events = 2,
+ .evts = {
+ EVT(PMT_EVENT_ENERGY, 0, 18),
+ EVT(PMT_EVENT_ACTIVITY, 1, 18),
+ }
};
/*
@@ -74,6 +97,16 @@ static struct event_group energy_0x26696143 = {
static struct event_group perf_0x26557651 = {
.guid = 0x26557651,
.mmio_size = XML_MMIO_SIZE(576, 7, 3),
+ .num_events = 7,
+ .evts = {
+ EVT(PMT_EVENT_STALLS_LLC_HIT, 0, 0),
+ EVT(PMT_EVENT_C1_RES, 1, 0),
+ EVT(PMT_EVENT_UNHALTED_CORE_CYCLES, 2, 0),
+ EVT(PMT_EVENT_STALLS_LLC_MISS, 3, 0),
+ EVT(PMT_EVENT_AUTO_C6_RES, 4, 0),
+ EVT(PMT_EVENT_UNHALTED_REF_CYCLES, 5, 0),
+ EVT(PMT_EVENT_UOPS_RETIRED, 6, 0),
+ }
};
static struct event_group *known_energy_event_groups[] = {
@@ -178,6 +211,8 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
}
e->pkginfo = no_free_ptr(pkginfo);
+ list_add(&e->list, &active_event_groups);
+
return 0;
}
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index adb14a9be3d2..fa1cd649b0f0 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -878,27 +878,32 @@ static void dom_data_exit(struct rdt_resource *r)
mutex_unlock(&rdtgroup_mutex);
}
+#define MON_EVENT(_eventid, _name, _res, _fp) \
+ [_eventid] = { \
+ .name = _name, \
+ .evtid = _eventid, \
+ .rid = _res, \
+ .is_floating_point = _fp, \
+}
+
/*
* All available events. Architecture code marks the ones that
* are supported by a system using resctrl_enable_mon_event()
* to set .enabled.
*/
struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
- [QOS_L3_OCCUP_EVENT_ID] = {
- .name = "llc_occupancy",
- .evtid = QOS_L3_OCCUP_EVENT_ID,
- .rid = RDT_RESOURCE_L3,
- },
- [QOS_L3_MBM_TOTAL_EVENT_ID] = {
- .name = "mbm_total_bytes",
- .evtid = QOS_L3_MBM_TOTAL_EVENT_ID,
- .rid = RDT_RESOURCE_L3,
- },
- [QOS_L3_MBM_LOCAL_EVENT_ID] = {
- .name = "mbm_local_bytes",
- .evtid = QOS_L3_MBM_LOCAL_EVENT_ID,
- .rid = RDT_RESOURCE_L3,
- },
+ MON_EVENT(QOS_L3_OCCUP_EVENT_ID, "llc_occupancy", RDT_RESOURCE_L3, false),
+ MON_EVENT(QOS_L3_MBM_TOTAL_EVENT_ID, "mbm_total_bytes", RDT_RESOURCE_L3, false),
+ MON_EVENT(QOS_L3_MBM_LOCAL_EVENT_ID, "mbm_local_bytes", RDT_RESOURCE_L3, false),
+ MON_EVENT(PMT_EVENT_ENERGY, "core_energy", RDT_RESOURCE_PERF_PKG, true),
+ MON_EVENT(PMT_EVENT_ACTIVITY, "activity", RDT_RESOURCE_PERF_PKG, true),
+ MON_EVENT(PMT_EVENT_STALLS_LLC_HIT, "stalls_llc_hit", RDT_RESOURCE_PERF_PKG, false),
+ MON_EVENT(PMT_EVENT_C1_RES, "c1_res", RDT_RESOURCE_PERF_PKG, false),
+ MON_EVENT(PMT_EVENT_UNHALTED_CORE_CYCLES, "unhalted_core_cycles", RDT_RESOURCE_PERF_PKG, false),
+ MON_EVENT(PMT_EVENT_STALLS_LLC_MISS, "stalls_llc_miss", RDT_RESOURCE_PERF_PKG, false),
+ MON_EVENT(PMT_EVENT_AUTO_C6_RES, "c6_res", RDT_RESOURCE_PERF_PKG, false),
+ MON_EVENT(PMT_EVENT_UNHALTED_REF_CYCLES, "unhalted_ref_cycles", RDT_RESOURCE_PERF_PKG, false),
+ MON_EVENT(PMT_EVENT_UOPS_RETIRED, "uops_retired", RDT_RESOURCE_PERF_PKG, false),
};
void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits)
--
2.50.0
* [PATCH v7 21/31] x86,fs/resctrl: Add architectural event pointer
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (19 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 20/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:43 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 22/31] x86/resctrl: Read telemetry events Tony Luck
` (10 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The resctrl file system layer passes the domain, rmid, and event id to
resctrl_arch_rmid_read() to fetch an event counter.
For some resources this may not be enough information to efficiently
access the counter.
Add a void pointer, mon_evt::arch_priv. Architecture code can initialize
this when marking each event enabled.
File system code passes this pointer to resctrl_arch_rmid_read().
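One way architecture code can use the opaque pointer is to pass the address of its own per-event descriptor to resctrl_enable_mon_event() and later walk back to the enclosing group, which is what intel_aet_read_event() in patch 22/31 does with pevt - pevt->idx and container_of(). The user-space sketch assumes, as the patch's event tables do, that each pmt_event's idx equals its position in evts[]. group_from_arch_priv() is a hypothetical name, and the structures are trimmed copies that use a fixed-size array instead of a flexible array member.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed copies of the patch's structures. */
struct pmt_event {
	int id;
	unsigned int idx;	/* must equal this element's position in evts[] */
	unsigned int bin_bits;
};

struct event_group {
	unsigned int num_events;
	struct pmt_event evts[2];
};

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* What an architecture read hook could do with the opaque arch_priv pointer. */
static struct event_group *group_from_arch_priv(void *arch_priv)
{
	struct pmt_event *pevt = arch_priv;

	/* Step back to evts[0], then up to the enclosing event_group. */
	return container_of(pevt - pevt->idx, struct event_group, evts);
}
```

If an idx ever differed from the element's array position, the pointer arithmetic would land on the wrong address, so that invariant has to hold in every event table.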
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 8 ++++++--
fs/resctrl/internal.h | 4 ++++
arch/x86/kernel/cpu/resctrl/core.c | 6 +++---
arch/x86/kernel/cpu/resctrl/monitor.c | 2 +-
fs/resctrl/monitor.c | 14 ++++++++++----
5 files changed, 24 insertions(+), 10 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index d89378346044..da76e9c37b69 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -383,7 +383,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
#define MAX_BINARY_BITS 27
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+ unsigned int binary_bits, void *arch_priv);
bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
@@ -478,6 +479,9 @@ void resctrl_arch_pre_mount(void);
* only.
* @rmid: rmid of the counter to read.
* @eventid: eventid to read, e.g. L3 occupancy.
+ * @arch_priv: Architecture private data for this event.
+ * This is the @arch_priv value provided by the architecture via
+ * resctrl_enable_mon_event().
* @val: result of the counter read in bytes.
* @arch_mon_ctx: An architecture specific value from
* resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
@@ -495,7 +499,7 @@ void resctrl_arch_pre_mount(void);
*/
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
u32 closid, u32 rmid, enum resctrl_event_id eventid,
- u64 *val, void *arch_mon_ctx);
+ void *arch_priv, u64 *val, void *arch_mon_ctx);
/**
* resctrl_arch_rmid_read_context_check() - warn about invalid contexts
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index b0bacadd9786..56fdccb39375 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -62,6 +62,9 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
* @binary_bits: number of fixed-point binary bits from architecture,
* only valid if @is_floating_point is true
* @enabled: true if the event is enabled
+ * @arch_priv: Architecture private data for this event.
+ * This is the @arch_priv value provided by the architecture via
+ * resctrl_enable_mon_event().
*/
struct mon_evt {
enum resctrl_event_id evtid;
@@ -72,6 +75,7 @@ struct mon_evt {
bool is_floating_point;
unsigned int binary_bits;
bool enabled;
+ void *arch_priv;
};
extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b8288f5d4aff..63baab53821a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -909,15 +909,15 @@ static __init bool get_rdt_mon_resources(void)
bool ret = false;
if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
- resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
+ resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
- resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
+ resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
ret = true;
}
if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
- resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
+ resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0, NULL);
ret = true;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 043f777378a6..185b203f6321 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -219,7 +219,7 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
u32 unused, u32 rmid, enum resctrl_event_id eventid,
- u64 *val, void *ignored)
+ void *arch_priv, u64 *val, void *ignored)
{
int cpu = cpumask_any(&hdr->cpu_mask);
struct rdt_hw_l3_mon_domain *hw_dom;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index fa1cd649b0f0..92798e1fb5b0 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -137,9 +137,11 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
struct rmid_entry *entry;
u32 idx, cur_idx = 1;
void *arch_mon_ctx;
+ void *arch_priv;
bool rmid_dirty;
u64 val = 0;
+ arch_priv = mon_event_all[QOS_L3_OCCUP_EVENT_ID].arch_priv;
arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, QOS_L3_OCCUP_EVENT_ID);
if (IS_ERR(arch_mon_ctx)) {
pr_warn_ratelimited("Failed to allocate monitor context: %ld",
@@ -160,7 +162,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
entry = __rmid_entry(idx);
if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
- QOS_L3_OCCUP_EVENT_ID, &val,
+ QOS_L3_OCCUP_EVENT_ID, arch_priv, &val,
arch_mon_ctx)) {
rmid_dirty = true;
} else {
@@ -408,7 +410,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (rr->hdr) {
/* Single domain. */
rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
- rr->evt->evtid, &tval, rr->arch_mon_ctx);
+ rr->evt->evtid, rr->evt->arch_priv,
+ &tval, rr->arch_mon_ctx);
if (rr->err)
return rr->err;
@@ -434,7 +437,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
if (d->ci_id != rr->ci_id)
continue;
err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
- rr->evt->evtid, &tval, rr->arch_mon_ctx);
+ rr->evt->evtid, rr->evt->arch_priv,
+ &tval, rr->arch_mon_ctx);
if (!err) {
rr->val += tval;
ret = 0;
@@ -906,7 +910,8 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
MON_EVENT(PMT_EVENT_UOPS_RETIRED, "uops_retired", RDT_RESOURCE_PERF_PKG, false),
};
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+ unsigned int binary_bits, void *arch_priv)
{
if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
binary_bits > MAX_BINARY_BITS))
@@ -922,6 +927,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsig
mon_event_all[eventid].any_cpu = any_cpu;
mon_event_all[eventid].binary_bits = binary_bits;
+ mon_event_all[eventid].arch_priv = arch_priv;
mon_event_all[eventid].enabled = true;
}
--
2.50.0
* [PATCH v7 22/31] x86/resctrl: Read telemetry events
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (20 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 21/31] x86,fs/resctrl: Add architectural event pointer Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:45 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 23/31] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
` (9 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The resctrl file system passes requests to read event monitor files to
the architecture's resctrl_arch_rmid_read() to collect values
from hardware counters.
Use the resctrl resource to differentiate between calls to read legacy
L3 events and calls to read the new telemetry events (which are attached
to RDT_RESOURCE_PERF_PKG).
There may be multiple aggregators tracking each package, so scan all of
them and add up all counters.
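The summing step amounts to the loop below. It mirrors the DATA_VALID/DATA_BITS handling in intel_aet_read_event(), but sum_aggregators() is a hypothetical helper that takes an array of raw values in place of readq() on each aggregator's MMIO address. One deliberate difference: the sketch accumulates into a local variable and writes *val only when every aggregator reports valid data.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DATA_VALID	(UINT64_C(1) << 63)
#define DATA_BITS	(DATA_VALID - 1)	/* bits 62:0 */

/*
 * Sum one counter across all aggregators of a package. raw[i] stands in
 * for readq() of aggregator i; a counter without the valid bit fails the
 * whole read.
 */
static int sum_aggregators(const uint64_t *raw, unsigned int num_regions,
			   uint64_t *val)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < num_regions; i++) {
		if (!(raw[i] & DATA_VALID))
			return -EINVAL;
		total += raw[i] & DATA_BITS;
	}
	*val = total;
	return 0;
}
```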
Enable the events, marking them as readable from any CPU, and provide a
mon_evt::arch_priv pointer to the struct pmt_event for each
event.
At run time, when a user reads an event file, the file system code
provides the enum resctrl_event_id for the event and the arch_priv
pointer that was supplied when the event was enabled.
The telemetry code uses readq(), so make X86_RESCTRL_CPU_INTEL_AET
depend on X86_64.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/internal.h | 7 ++++
arch/x86/kernel/cpu/resctrl/intel_aet.c | 46 +++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 3 ++
arch/x86/Kconfig | 2 +-
4 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 36a2072c19c7..0081fb5a4420 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -172,9 +172,16 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
#ifdef CONFIG_X86_RESCTRL_CPU_INTEL_AET
bool intel_aet_get_events(void);
void __exit intel_aet_exit(void);
+int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
+ void *arch_priv, u64 *val);
#else
static inline bool intel_aet_get_events(void) { return false; }
static inline void __exit intel_aet_exit(void) { }
+static inline int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
+ void *arch_priv, u64 *val)
+{
+ return -EINVAL;
+}
#endif
#endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index f4bf0f2ccf26..bd6011a95d12 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -14,6 +14,7 @@
#include <linux/cleanup.h>
#include <linux/cpu.h>
#include <linux/intel_vsec.h>
+#include <linux/io.h>
#include <linux/resctrl.h>
#include <linux/slab.h>
@@ -213,6 +214,13 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
list_add(&e->list, &active_event_groups);
+ for (int i = 0; i < e->num_events; i++) {
+ enum resctrl_event_id eventid;
+
+ eventid = e->evts[i].id;
+ resctrl_enable_mon_event(eventid, true, e->evts[i].bin_bits, &e->evts[i]);
+ }
+
return 0;
}
@@ -279,3 +287,41 @@ void __exit intel_aet_exit(void)
list_del(&evg->list);
}
}
+
+#define DATA_VALID BIT_ULL(63)
+#define DATA_BITS GENMASK_ULL(62, 0)
+
+/*
+ * Read counter for an event on a domain (summing all aggregators
+ * on the domain).
+ */
+int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id eventid,
+ void *arch_priv, u64 *val)
+{
+ struct pmt_event *pevt = arch_priv;
+ struct pkg_mmio_info *mmi;
+ struct event_group *e;
+ u64 evtcount;
+ void *pevt0;
+ int idx;
+
+ pevt0 = pevt - pevt->idx;
+ e = container_of(pevt0, struct event_group, evts);
+ idx = rmid * e->num_events;
+ idx += pevt->idx;
+ mmi = e->pkginfo[domid];
+
+ if (idx * sizeof(u64) + sizeof(u64) > e->mmio_size) {
+ pr_warn_once("MMIO index %d out of range\n", idx);
+ return -EIO;
+ }
+
+ for (int i = 0; i < mmi->num_regions; i++) {
+ evtcount = readq(mmi->addrs[i] + idx * sizeof(u64));
+ if (!(evtcount & DATA_VALID))
+ return -EINVAL;
+ *val += evtcount & DATA_BITS;
+ }
+
+ return 0;
+}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 185b203f6321..51d7d99336c6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -232,6 +232,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
resctrl_arch_rmid_read_context_check();
+ if (r->rid == RDT_RESOURCE_PERF_PKG)
+ return intel_aet_read_event(hdr->id, rmid, eventid, arch_priv, val);
+
if (r->rid != RDT_RESOURCE_L3)
return -EINVAL;
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21c2d1022b15..512286ef6d71 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -530,7 +530,7 @@ config X86_CPU_RESCTRL
config X86_RESCTRL_CPU_INTEL_AET
bool "Intel Application Energy Telemetry" if INTEL_PMT_DISCOVERY=y
- depends on X86_CPU_RESCTRL && CPU_SUP_INTEL
+ depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL
help
Enable per-RMID telemetry events in resctrl
--
2.50.0
* [PATCH v7 23/31] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (21 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 22/31] x86/resctrl: Read telemetry events Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:46 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 24/31] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
` (8 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The L3 resource has several requirements for domains. There are structures
that hold the 64-bit values of counters, and elements to keep track of
the overflow and limbo threads.
None of these are needed for the PERF_PKG resource. The hardware counters
are wide enough that they do not wrap around for decades.
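A quick calculation supports the wraparound claim. Telemetry counters carry 63 data bits (DATA_BITS in patch 22/31). The increment rate is not stated here, so the sketch assumes a hypothetical 10^9 increments per second, roughly one per nanosecond. At that rate a 63-bit counter takes about 292 years to wrap, while a 32-bit counter would wrap in about 4.3 seconds. years_to_wrap() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Years until a counter of `width` bits wraps if it increments `rate`
 * times per second. Both inputs are caller-supplied assumptions.
 */
static double years_to_wrap(unsigned int width, double rate)
{
	const double seconds_per_year = 365.25 * 24 * 3600;
	double max_count;

	assert(width >= 1 && width <= 64);
	/* 2^width, computed without ever shifting a 64-bit value by 64. */
	max_count = (double)(UINT64_C(1) << (width - 1)) * 2.0;
	return max_count / rate / seconds_per_year;
}
```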
Define a new rdt_perf_pkg_mon_domain structure that consists only of
the standard rdt_domain_hdr to keep track of domain id and CPU mask.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 41 ++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 63baab53821a..c954171073c7 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -547,6 +547,38 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
}
}
+/**
+ * struct rdt_perf_pkg_mon_domain - CPUs sharing an Intel-PMT-scoped resctrl monitor resource
+ * @hdr: common header for different domain types
+ */
+struct rdt_perf_pkg_mon_domain {
+ struct rdt_domain_hdr hdr;
+};
+
+static void setup_intel_aet_mon_domain(int cpu, int id, struct rdt_resource *r,
+ struct list_head *add_pos)
+{
+ struct rdt_perf_pkg_mon_domain *d;
+ int err;
+
+ d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
+ if (!d)
+ return;
+
+ d->hdr.id = id;
+ d->hdr.type = RESCTRL_MON_DOMAIN;
+ d->hdr.rid = r->rid;
+ cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+ list_add_tail_rcu(&d->hdr.list, add_pos);
+
+ err = resctrl_online_mon_domain(r, &d->hdr);
+ if (err) {
+ list_del_rcu(&d->hdr.list);
+ synchronize_rcu();
+ kfree(d);
+ }
+}
+
static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
{
int id = get_domain_id_from_scope(cpu, r->mon_scope);
@@ -574,6 +606,9 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
case RDT_RESOURCE_L3:
l3_mon_domain_setup(cpu, id, r, add_pos);
break;
+ case RDT_RESOURCE_PERF_PKG:
+ setup_intel_aet_mon_domain(cpu, id, r, add_pos);
+ break;
default:
WARN_ON_ONCE(1);
}
@@ -670,6 +705,12 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
synchronize_rcu();
l3_mon_domain_free(hw_dom);
break;
+ case RDT_RESOURCE_PERF_PKG:
+ resctrl_offline_mon_domain(r, hdr);
+ list_del_rcu(&hdr->list);
+ synchronize_rcu();
+ kfree(container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr));
+ break;
default:
pr_warn_once("Unknown resource rid=%d\n", r->rid);
break;
--
2.50.0
* [PATCH v7 24/31] x86/resctrl: Add energy/perf choices to rdt boot option
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (22 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 23/31] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:46 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 25/31] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
` (7 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Legacy resctrl features are enumerated by X86_FEATURE_* flags. These
may be overridden by quirks to disable features in the case of errata.
Users can use kernel command line options to either disable a feature,
or to force enable a feature that was disabled by a quirk.
Provide similar functionality for hardware features that do not have an
X86_FEATURE_* flag.
Unlike other options that are tied to X86_FEATURE_* flags, these must be
queried by name. Add rdt_is_feature_enabled() to check whether quirks
or the kernel command line have disabled a feature. As with the legacy
feature options, a command line enable overrides a quirk disable.
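The precedence rule can be modeled in a few lines. This is a hypothetical user-space sketch, not the kernel's rdt= parser: struct opt, set_option() and is_enabled() are invented names, and a quirk is modeled as recording a "!name" token in the same force_off flag that the command line uses. Because force_on is checked first, an explicit enable wins regardless of the order in which the tokens were recorded, matching the logic of rdt_is_feature_enabled().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Cut-down model of struct rdt_options: a name plus the two override flags. */
struct opt {
	const char *name;
	bool force_off;
	bool force_on;
};

static struct opt opts[] = {
	{ .name = "energy" },
	{ .name = "perf" },
};

#define NUM_OPTS (sizeof(opts) / sizeof(opts[0]))

/* Record "name" (force on) or "!name" (force off), as one rdt= token would. */
static void set_option(const char *tok)
{
	bool off = tok[0] == '!';

	if (off)
		tok++;
	for (size_t i = 0; i < NUM_OPTS; i++) {
		if (strcmp(tok, opts[i].name))
			continue;
		if (off)
			opts[i].force_off = true;
		else
			opts[i].force_on = true;
	}
}

/* force_on beats force_off; with neither set, the feature stays enabled. */
static bool is_enabled(const char *name)
{
	for (size_t i = 0; i < NUM_OPTS; i++) {
		if (strcmp(name, opts[i].name))
			continue;
		if (opts[i].force_on)
			return true;
		return !opts[i].force_off;
	}
	return true;	/* unknown names are treated as enabled */
}
```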
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
.../admin-guide/kernel-parameters.txt | 2 +-
arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 30 +++++++++++++++++++
arch/x86/kernel/cpu/resctrl/intel_aet.c | 7 +++++
4 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..4c12159f3ea0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6066,7 +6066,7 @@
rdt= [HW,X86,RDT]
Turn on/off individual RDT features. List is:
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
- mba, smba, bmec.
+ mba, smba, bmec, energy, perf.
E.g. to turn on cmt and turn off mba use:
rdt=cmt,!mba
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 0081fb5a4420..83166dd0b9c8 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -169,6 +169,8 @@ void __init intel_rdt_mbm_apply_quirk(void);
void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
+bool rdt_is_feature_enabled(char *option);
+
#ifdef CONFIG_X86_RESCTRL_CPU_INTEL_AET
bool intel_aet_get_events(void);
void __exit intel_aet_exit(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c954171073c7..83e046313600 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -791,6 +791,8 @@ enum {
RDT_FLAG_MBA,
RDT_FLAG_SMBA,
RDT_FLAG_BMEC,
+ RDT_FLAG_ENERGY,
+ RDT_FLAG_PERF,
};
#define RDT_OPT(idx, n, f) \
@@ -816,6 +818,8 @@ static struct rdt_options rdt_options[] __ro_after_init = {
RDT_OPT(RDT_FLAG_MBA, "mba", X86_FEATURE_MBA),
RDT_OPT(RDT_FLAG_SMBA, "smba", X86_FEATURE_SMBA),
RDT_OPT(RDT_FLAG_BMEC, "bmec", X86_FEATURE_BMEC),
+ RDT_OPT(RDT_FLAG_ENERGY, "energy", 0),
+ RDT_OPT(RDT_FLAG_PERF, "perf", 0),
};
#define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
@@ -865,6 +869,32 @@ bool rdt_cpu_has(int flag)
return ret;
}
+/*
+ * Hardware features that do not have X86_FEATURE_* bits.
+ * There is no "hardware does not support this at all" case.
+ * Assume that the caller has already determined that
+ * support is present and just needs to check if the option has been
+ * disabled by a quirk that has not been overridden by a command
+ * line option.
+ */
+bool rdt_is_feature_enabled(char *name)
+{
+ struct rdt_options *o;
+ bool ret = true;
+
+ for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
+ if (!strcmp(name, o->name)) {
+ if (o->force_off)
+ ret = false;
+ if (o->force_on)
+ ret = true;
+ break;
+ }
+ }
+
+ return ret;
+}
+
bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
{
if (!rdt_cpu_has(X86_FEATURE_BMEC))
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index bd6011a95d12..e64a4630e95c 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -49,6 +49,7 @@ struct pmt_event {
/**
* struct event_group - All information about a group of telemetry events.
+ * @name: Name for this group (used by boot rdt= option)
* @pfg: Points to the aggregated telemetry space information
* within the OOBMSM driver that contains data for all
* telemetry regions.
@@ -61,6 +62,7 @@ struct pmt_event {
*/
struct event_group {
/* Data fields for additional structures to manage this group. */
+ char *name;
struct pmt_feature_group *pfg;
struct list_head list;
struct pkg_mmio_info **pkginfo;
@@ -82,6 +84,7 @@ static LIST_HEAD(active_event_groups);
* File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
*/
static struct event_group energy_0x26696143 = {
+ .name = "energy",
.guid = 0x26696143,
.mmio_size = XML_MMIO_SIZE(576, 2, 3),
.num_events = 2,
@@ -96,6 +99,7 @@ static struct event_group energy_0x26696143 = {
* File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
*/
static struct event_group perf_0x26557651 = {
+ .name = "perf",
.guid = 0x26557651,
.mmio_size = XML_MMIO_SIZE(576, 7, 3),
.num_events = 7,
@@ -167,6 +171,9 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
struct pkg_mmio_info *mmi;
int num_pkgs;
+ if (!rdt_is_feature_enabled(e->name))
+ return -EINVAL;
+
num_pkgs = topology_max_packages();
/* Get per-package counts of telemetry regions for this event group */
--
2.50.0
* [PATCH v7 25/31] x86/resctrl: Handle number of RMIDs supported by telemetry resources
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (23 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 24/31] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:49 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 26/31] fs/resctrl: Fix life-cycle of closid_num_dirty_rmid Tony Luck
` (6 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
There are now three meanings for "number of RMIDs":
1) The number for legacy features enumerated by CPUID leaf 0xF. This
is the maximum number of distinct values that can be loaded into the
IA32_PQR_ASSOC MSR. On systems with Sub-NUMA Cluster (SNC) mode enabled,
the CPUID-enumerated value is scaled down by the number of SNC nodes per
L3 cache.
2) The number of registers in MMIO space for each event. This
is enumerated in the XML files and is the value used to initialize
event_group::num_rmids. It is overwritten with a lower value if
hardware does not support all these registers at the same time
(see next case).
3) The number of "h/w counters" (this isn't a strictly accurate
description of how things work, but serves as a useful analogy that
does describe the limitations) feeding those MMIO registers. This
is enumerated in telemetry_region::num_rmids returned from the call to
intel_pmt_get_regions_by_feature().
Event groups with insufficient "h/w counters" to track all RMIDs are
difficult to use, since the system may reassign "h/w counters" at any
time. This means that users cannot reliably collect two consecutive
event counts to compute the rate at which events are occurring.
Add a variable rdt_num_system_rmids which holds the number of RMIDs
supported by the system (including adjustments if Sub-NUMA Cluster
mode is enabled).
Use rdt_set_feature_disabled() to mark such under-resourced event groups
as unusable. Note that the rdt_options[] array must now be writable
at run-time. The request to disable is overridden if the user
explicitly enables the group with the "rdt=" Linux boot argument.
Scan all enabled event groups and set the RDT_RESOURCE_PERF_PKG
resource's "num_rmid" value to the smallest of their num_rmids values,
since this value will be compared later against the number of RMIDs
supported by other resources.
N.B. Changed the type of rdt_resource::num_rmid to u32 to match, and
print it as an unsigned value in rdt_num_rmids_show().
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 2 +-
arch/x86/kernel/cpu/resctrl/internal.h | 4 +++
arch/x86/kernel/cpu/resctrl/core.c | 18 ++++++++++++-
arch/x86/kernel/cpu/resctrl/intel_aet.c | 36 +++++++++++++++++++++++++
arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
fs/resctrl/rdtgroup.c | 2 +-
6 files changed, 61 insertions(+), 3 deletions(-)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index da76e9c37b69..74cd2979549b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -288,7 +288,7 @@ struct rdt_resource {
int rid;
bool alloc_capable;
bool mon_capable;
- int num_rmid;
+ u32 num_rmid;
enum resctrl_scope ctrl_scope;
enum resctrl_scope mon_scope;
struct resctrl_cache cache;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 83166dd0b9c8..a6c41068dc2f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -18,6 +18,8 @@
#define RMID_VAL_UNAVAIL BIT_ULL(62)
+extern u32 rdt_num_system_rmids;
+
/*
* With the above fields in use 62 bits remain in MSR_IA32_QM_CTR for
* data to be returned. The counter width is discovered from the hardware
@@ -171,6 +173,8 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
bool rdt_is_feature_enabled(char *option);
+void rdt_set_feature_disabled(char *name);
+
#ifdef CONFIG_X86_RESCTRL_CPU_INTEL_AET
bool intel_aet_get_events(void);
void __exit intel_aet_exit(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 83e046313600..31fb598482bf 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -807,7 +807,7 @@ struct rdt_options {
bool force_off, force_on;
};
-static struct rdt_options rdt_options[] __ro_after_init = {
+static struct rdt_options rdt_options[] = {
RDT_OPT(RDT_FLAG_CMT, "cmt", X86_FEATURE_CQM_OCCUP_LLC),
RDT_OPT(RDT_FLAG_MBM_TOTAL, "mbmtotal", X86_FEATURE_CQM_MBM_TOTAL),
RDT_OPT(RDT_FLAG_MBM_LOCAL, "mbmlocal", X86_FEATURE_CQM_MBM_LOCAL),
@@ -869,6 +869,22 @@ bool rdt_cpu_has(int flag)
return ret;
}
+/*
+ * Can be called during feature enumeration if a sanity check of
+ * a feature's parameters indicates problems with the feature.
+ */
+void rdt_set_feature_disabled(char *name)
+{
+ struct rdt_options *o;
+
+ for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
+ if (!strcmp(name, o->name)) {
+ o->force_off = true;
+ return;
+ }
+ }
+}
+
/*
* Hardware features that do not have X86_FEATURE_* bits.
* There is no "hardware does not support this at all" case.
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index e64a4630e95c..6958efbf7e81 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -15,6 +15,7 @@
#include <linux/cpu.h>
#include <linux/intel_vsec.h>
#include <linux/io.h>
+#include <linux/minmax.h>
#include <linux/resctrl.h>
#include <linux/slab.h>
@@ -56,6 +57,9 @@ struct pmt_event {
* @list: List of active event groups.
* @pkginfo: Per-package MMIO addresses of telemetry regions belonging to this group.
* @guid: Unique number per XML description file.
+ * @num_rmids: Number of RMIDs supported by this group. Adjusted downwards
+ * if enumeration from intel_pmt_get_regions_by_feature() indicates
+ * fewer RMIDs can be tracked simultaneously.
* @mmio_size: Number of bytes of MMIO registers for this group.
* @num_events: Number of events in this group.
* @evts: Array of event descriptors.
@@ -69,6 +73,7 @@ struct event_group {
/* Remaining fields initialized from XML file. */
u32 guid;
+ u32 num_rmids;
size_t mmio_size;
unsigned int num_events;
struct pmt_event evts[] __counted_by(num_events);
@@ -86,6 +91,7 @@ static LIST_HEAD(active_event_groups);
static struct event_group energy_0x26696143 = {
.name = "energy",
.guid = 0x26696143,
+ .num_rmids = 576,
.mmio_size = XML_MMIO_SIZE(576, 2, 3),
.num_events = 2,
.evts = {
@@ -101,6 +107,7 @@ static struct event_group energy_0x26696143 = {
static struct event_group perf_0x26557651 = {
.name = "perf",
.guid = 0x26557651,
+ .num_rmids = 576,
.mmio_size = XML_MMIO_SIZE(576, 7, 3),
.num_events = 7,
.evts = {
@@ -143,6 +150,22 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
return false;
}
+static bool check_rmid_count(struct event_group *e, struct pmt_feature_group *p)
+{
+ struct telemetry_region *tr;
+
+ for (int i = 0; i < p->count; i++) {
+ tr = &p->regions[i];
+ if (skip_this_region(tr, e))
+ continue;
+
+ if (tr->num_rmids < rdt_num_system_rmids)
+ return false;
+ }
+
+ return true;
+}
+
static void free_pkg_mmio_info(struct pkg_mmio_info **mmi)
{
int num_pkgs = topology_max_packages();
@@ -165,12 +188,18 @@ DEFINE_FREE(pkg_mmio_info, struct pkg_mmio_info **, free_pkg_mmio_info(_T))
*/
static int discover_events(struct event_group *e, struct pmt_feature_group *p)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
struct pkg_mmio_info **pkginfo __free(pkg_mmio_info) = NULL;
int *pkgcounts __free(kfree) = NULL;
struct telemetry_region *tr;
struct pkg_mmio_info *mmi;
int num_pkgs;
+ /* Potentially disable feature if insufficient RMIDs */
+ if (!check_rmid_count(e, p))
+ rdt_set_feature_disabled(e->name);
+
+ /* User can override above disable from kernel command line */
if (!rdt_is_feature_enabled(e->name))
return -EINVAL;
@@ -182,6 +211,8 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
if (skip_this_region(tr, e))
continue;
+ e->num_rmids = min(e->num_rmids, tr->num_rmids);
+
if (!pkgcounts) {
pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
if (!pkgcounts)
@@ -228,6 +259,11 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
resctrl_enable_mon_event(eventid, true, e->evts[i].bin_bits, &e->evts[i]);
}
+ if (r->num_rmid)
+ r->num_rmid = min(r->num_rmid, e->num_rmids);
+ else
+ r->num_rmid = e->num_rmids;
+
return 0;
}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 51d7d99336c6..aac7b7310d81 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -33,6 +33,7 @@ bool rdt_mon_capable;
#define CF(cf) ((unsigned long)(1048576 * (cf) + 0.5))
+u32 rdt_num_system_rmids;
static int snc_nodes_per_l3_cache = 1;
/*
@@ -358,6 +359,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+ rdt_num_system_rmids = r->num_rmid;
hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 11792e841525..9e4df213906f 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1135,7 +1135,7 @@ static int rdt_num_rmids_show(struct kernfs_open_file *of,
{
struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
- seq_printf(seq, "%d\n", r->num_rmid);
+ seq_printf(seq, "%u\n", r->num_rmid);
return 0;
}
--
2.50.0
* [PATCH v7 26/31] fs/resctrl: Fix life-cycle of closid_num_dirty_rmid
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (24 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 25/31] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:51 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 27/31] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
` (5 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
closid_num_dirty_rmid is specific to the L3 resource, but it
is allocated/freed in the more generic dom_data_{init,exit}().
Add helpers to allocate/free closid_num_dirty_rmid.
Rename resctrl_mon_resource_init() to resctrl_mon_l3_resource_init()
and call closid_num_dirty_rmid_init() there, instead of
allocating in dom_data_init().
Make matching changes to the exit path by renaming
resctrl_mon_resource_exit() to resctrl_mon_l3_resource_exit()
and freeing closid_num_dirty_rmid there instead of in dom_data_exit().
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
fs/resctrl/internal.h | 6 ++--
fs/resctrl/monitor.c | 69 ++++++++++++++++++++++++-------------------
fs/resctrl/rdtgroup.c | 12 ++++----
3 files changed, 48 insertions(+), 39 deletions(-)
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 56fdccb39375..28d505efdb7c 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -358,7 +358,9 @@ int alloc_rmid(u32 closid);
void free_rmid(u32 closid, u32 rmid);
-void resctrl_mon_resource_exit(void);
+int resctrl_mon_l3_resource_init(void);
+
+void resctrl_mon_l3_resource_exit(void);
void mon_event_count(void *info);
@@ -368,8 +370,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
cpumask_t *cpumask, struct mon_evt *evt, int first);
-int resctrl_mon_resource_init(void);
-
void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
unsigned long delay_ms,
int exclude_cpu);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 92798e1fb5b0..e3eceba70713 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -86,6 +86,37 @@ unsigned int resctrl_rmid_realloc_threshold;
*/
unsigned int resctrl_rmid_realloc_limit;
+static int closid_num_dirty_rmid_init(struct rdt_resource *r)
+{
+ if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID) &&
+ !closid_num_dirty_rmid) {
+ u32 num_closid = resctrl_arch_get_num_closid(r);
+ u32 *tmp;
+
+ /*
+ * If the architecture hasn't provided a sanitised value here,
+ * this may result in larger arrays than necessary. Resctrl will
+ * use a smaller system wide value based on the resources in
+ * use.
+ */
+ tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ closid_num_dirty_rmid = tmp;
+ }
+
+ return 0;
+}
+
+static void closid_num_dirty_rmid_exit(void)
+{
+ if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
+ kfree(closid_num_dirty_rmid);
+ closid_num_dirty_rmid = NULL;
+ }
+}
+
/*
* x86 and arm64 differ in their handling of monitoring.
* x86's RMID are independent numbers, there is only one source of traffic
@@ -805,36 +836,14 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
static int dom_data_init(struct rdt_resource *r)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
- u32 num_closid = resctrl_arch_get_num_closid(r);
struct rmid_entry *entry = NULL;
int err = 0, i;
u32 idx;
mutex_lock(&rdtgroup_mutex);
- if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
- u32 *tmp;
-
- /*
- * If the architecture hasn't provided a sanitised value here,
- * this may result in larger arrays than necessary. Resctrl will
- * use a smaller system wide value based on the resources in
- * use.
- */
- tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
- if (!tmp) {
- err = -ENOMEM;
- goto out_unlock;
- }
-
- closid_num_dirty_rmid = tmp;
- }
rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
if (!rmid_ptrs) {
- if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
- kfree(closid_num_dirty_rmid);
- closid_num_dirty_rmid = NULL;
- }
err = -ENOMEM;
goto out_unlock;
}
@@ -870,11 +879,6 @@ static void dom_data_exit(struct rdt_resource *r)
if (!r->mon_capable)
goto out_unlock;
- if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
- kfree(closid_num_dirty_rmid);
- closid_num_dirty_rmid = NULL;
- }
-
kfree(rmid_ptrs);
rmid_ptrs = NULL;
@@ -938,7 +942,7 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
}
/**
- * resctrl_mon_resource_init() - Initialise global monitoring structures.
+ * resctrl_mon_l3_resource_init() - Initialise global monitoring structures.
*
* Allocate and initialise global monitor resources that do not belong to a
* specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
@@ -949,7 +953,7 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
*
* Returns 0 for success, or -ENOMEM.
*/
-int resctrl_mon_resource_init(void)
+int resctrl_mon_l3_resource_init(void)
{
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
int ret;
@@ -957,6 +961,10 @@ int resctrl_mon_resource_init(void)
if (!r->mon_capable)
return 0;
+ ret = closid_num_dirty_rmid_init(r);
+ if (ret)
+ return ret;
+
ret = dom_data_init(r);
if (ret)
return ret;
@@ -980,9 +988,10 @@ int resctrl_mon_resource_init(void)
return 0;
}
-void resctrl_mon_resource_exit(void)
+void resctrl_mon_l3_resource_exit(void)
{
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
+ closid_num_dirty_rmid_exit();
dom_data_exit(r);
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 9e4df213906f..b45f3d63c629 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4114,7 +4114,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
* Called when the first CPU of a domain comes online, regardless of whether
* the filesystem is mounted.
* During boot this may be called before global allocations have been made by
- * resctrl_mon_resource_init().
+ * resctrl_mon_l3_resource_init().
*
* Returns 0 for success, or -ENOMEM.
*/
@@ -4298,13 +4298,13 @@ int resctrl_init(void)
thread_throttle_mode_init();
- ret = resctrl_mon_resource_init();
+ ret = resctrl_mon_l3_resource_init();
if (ret)
return ret;
ret = sysfs_create_mount_point(fs_kobj, "resctrl");
if (ret) {
- resctrl_mon_resource_exit();
+ resctrl_mon_l3_resource_exit();
return ret;
}
@@ -4339,7 +4339,7 @@ int resctrl_init(void)
cleanup_mountpoint:
sysfs_remove_mount_point(fs_kobj, "resctrl");
- resctrl_mon_resource_exit();
+ resctrl_mon_l3_resource_exit();
return ret;
}
@@ -4375,7 +4375,7 @@ static bool resctrl_online_domains_exist(void)
* When called by the architecture code, all CPUs and resctrl domains must be
* offline. This ensures the limbo and overflow handlers are not scheduled to
* run, meaning the data structures they access can be freed by
- * resctrl_mon_resource_exit().
+ * resctrl_mon_l3_resource_exit().
*
* After resctrl_exit() returns, the architecture code should return an
* error from all resctrl_arch_ functions that can do this.
@@ -4402,5 +4402,5 @@ void resctrl_exit(void)
* it can be used to umount resctrl.
*/
- resctrl_mon_resource_exit();
+ resctrl_mon_l3_resource_exit();
}
--
2.50.0
* [PATCH v7 27/31] x86,fs/resctrl: Move RMID initialization to first mount
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (25 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 26/31] fs/resctrl: Fix life-cycle of closid_num_dirty_rmid Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:53 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
` (4 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The resctrl file system code assumed that the only monitor events were
tied to the RDT_RESOURCE_L3 resource, and that the number of supported
RMIDs was enumerated during early initialization.
RDT_RESOURCE_PERF_PKG breaks both of those assumptions.
Delay the final enumeration of the number of RMIDs and subsequent
allocation of structures until first mount of the resctrl file system
so that the number of usable RMIDs can be computed as the minimum
value from all enabled monitor resources.
Since the dom_data* functions now only allocate/free structures
used for RMIDs, rename: dom_data_init() -> rmid_init(),
dom_data_exit() -> rmid_exit().
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
fs/resctrl/internal.h | 2 ++
arch/x86/kernel/cpu/resctrl/core.c | 8 ++++++--
fs/resctrl/monitor.c | 26 +++++++++-----------------
fs/resctrl/rdtgroup.c | 6 ++++++
4 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 28d505efdb7c..7fca1849742f 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -358,6 +358,8 @@ int alloc_rmid(u32 closid);
void free_rmid(u32 closid, u32 rmid);
+int rmid_init(void);
+
int resctrl_mon_l3_resource_init(void);
void resctrl_mon_l3_resource_exit(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 31fb598482bf..1a6635cc5b37 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -112,10 +112,14 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
u32 resctrl_arch_system_num_rmid_idx(void)
{
- struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+ u32 num_rmids = U32_MAX;
+ struct rdt_resource *r;
+
+ for_each_mon_capable_rdt_resource(r)
+ num_rmids = min(num_rmids, r->num_rmid);
/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
- return r->num_rmid;
+ return num_rmids == U32_MAX ? 0 : num_rmids;
}
struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index e3eceba70713..3fe81c43e5e8 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -833,20 +833,19 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
}
-static int dom_data_init(struct rdt_resource *r)
+int rmid_init(void)
{
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
struct rmid_entry *entry = NULL;
- int err = 0, i;
u32 idx;
+ int i;
- mutex_lock(&rdtgroup_mutex);
+ if (rmid_ptrs)
+ return 0;
rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
- if (!rmid_ptrs) {
- err = -ENOMEM;
- goto out_unlock;
- }
+ if (!rmid_ptrs)
+ return -ENOMEM;
for (i = 0; i < idx_limit; i++) {
entry = &rmid_ptrs[i];
@@ -866,13 +865,10 @@ static int dom_data_init(struct rdt_resource *r)
entry = __rmid_entry(idx);
list_del(&entry->list);
-out_unlock:
- mutex_unlock(&rdtgroup_mutex);
-
- return err;
+ return 0;
}
-static void dom_data_exit(struct rdt_resource *r)
+static void rmid_exit(struct rdt_resource *r)
{
mutex_lock(&rdtgroup_mutex);
@@ -965,10 +961,6 @@ int resctrl_mon_l3_resource_init(void)
if (ret)
return ret;
- ret = dom_data_init(r);
- if (ret)
- return ret;
-
if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
resctrl_file_fflags_init("mbm_total_bytes_config",
@@ -993,5 +985,5 @@ void resctrl_mon_l3_resource_exit(void)
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
closid_num_dirty_rmid_exit();
- dom_data_exit(r);
+ rmid_exit(r);
}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b45f3d63c629..9e667d3a93ae 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2599,6 +2599,12 @@ static int rdt_get_tree(struct fs_context *fc)
goto out;
}
+ if (resctrl_arch_mon_capable()) {
+ ret = rmid_init();
+ if (ret)
+ goto out;
+ }
+
ret = rdtgroup_setup_root(ctx);
if (ret)
goto out;
--
2.50.0
* [PATCH v7 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (26 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 27/31] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:54 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
` (3 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
The RDT_RESOURCE_PERF_PKG resource is not marked as "mon_capable" during
early resctrl initialization. This means that the domain lists for the
resource are not built when the CPU hot plug notifiers are registered.
Mark the resource as mon_capable and call domain_add_cpu_mon() for
each online CPU to build the domain lists in the first call to the
resctrl_arch_pre_mount() hook.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 14 +++++++++++++-
arch/x86/kernel/cpu/resctrl/intel_aet.c | 3 +++
2 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1a6635cc5b37..1d07c38ed528 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -774,14 +774,26 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
void resctrl_arch_pre_mount(void)
{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
static atomic_t only_once = ATOMIC_INIT(0);
- int old = 0;
+ int cpu, old = 0;
if (!atomic_try_cmpxchg(&only_once, &old, 1))
return;
if (!intel_aet_get_events())
return;
+
+ /*
+ * Late discovery of telemetry events means the domains for the
+ * resource were not built. Do that now.
+ */
+ cpus_read_lock();
+ mutex_lock(&domain_list_lock);
+ for_each_online_cpu(cpu)
+ domain_add_cpu_mon(cpu, r);
+ mutex_unlock(&domain_list_lock);
+ cpus_read_unlock();
}
enum {
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 6958efbf7e81..ea7a782c1661 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -264,6 +264,9 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
else
r->num_rmid = e->num_rmids;
+ pr_info("%s %s monitoring detected\n", r->name, e->name);
+ r->mon_capable = true;
+
return 0;
}
--
2.50.0
* [PATCH v7 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (27 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-25 23:55 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 30/31] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
` (2 subsequent siblings)
31 siblings, 1 reply; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Architectures are constrained to just the file interfaces provided by
the file system for each resource. This does not allow for
architecture-specific debug interfaces.
Add resctrl_debugfs_mon_info_arch_mkdir() which creates a directory in the
debugfs file system for a resource. Naming follows the layout of the
main resctrl hierarchy:
/sys/kernel/debug/resctrl/info/{resource}_MON/{arch}
The last-level {arch} directory name matches the output of
the user-level "uname -m" command.
Architecture code may use this directory for debug information,
or for minor tuning of features. It must not be used for basic
feature enabling as debugfs may not be configured/mounted on
production systems.
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
include/linux/resctrl.h | 9 +++++++++
fs/resctrl/rdtgroup.c | 29 +++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 74cd2979549b..ed5085eeee1b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -571,6 +571,15 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
extern unsigned int resctrl_rmid_realloc_threshold;
extern unsigned int resctrl_rmid_realloc_limit;
+/**
+ * resctrl_debugfs_mon_info_arch_mkdir() - Create a debugfs info directory.
+ * Removed by resctrl_exit().
+ * @r: Resource (must be mon_capable).
+ *
+ * Return: dentry pointer on success, or NULL on error.
+ */
+struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r);
+
int resctrl_init(void);
void resctrl_exit(void);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 9e667d3a93ae..fdd6cf372d6c 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -24,6 +24,7 @@
#include <linux/sched/task.h>
#include <linux/slab.h>
#include <linux/user_namespace.h>
+#include <linux/utsname.h>
#include <uapi/linux/magic.h>
@@ -4350,6 +4351,33 @@ int resctrl_init(void)
return ret;
}
+static struct dentry *debugfs_resctrl_info;
+
+/*
+ * Create /sys/kernel/debug/resctrl/info/{r->name}_MON/{arch} directory
+ * on request, for the architecture to use for debugging or minor tuning.
+ * Basic functionality of features must not be controlled by files
+ * added to this directory, as debugfs may not be configured/mounted
+ * on production systems.
+ */
+struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r)
+{
+ struct dentry *moninfodir;
+ char name[32];
+
+ if (!r->mon_capable)
+ return NULL;
+
+ if (!debugfs_resctrl_info)
+ debugfs_resctrl_info = debugfs_create_dir("info", debugfs_resctrl);
+
+ sprintf(name, "%s_MON", r->name);
+
+ moninfodir = debugfs_create_dir(name, debugfs_resctrl_info);
+
+ return debugfs_create_dir(utsname()->machine, moninfodir);
+}
+
static bool resctrl_online_domains_exist(void)
{
struct rdt_resource *r;
@@ -4401,6 +4429,7 @@ void resctrl_exit(void)
debugfs_remove_recursive(debugfs_resctrl);
debugfs_resctrl = NULL;
+ debugfs_resctrl_info = NULL;
unregister_filesystem(&rdt_fs_type);
/*
--
2.50.0
* [PATCH v7 30/31] x86/resctrl: Add debugfs files to show telemetry aggregator status
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (28 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-11 23:53 ` [PATCH v7 31/31] x86,fs/resctrl: Update Documentation for package events Tony Luck
2025-07-30 18:42 ` [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Moger, Babu
31 siblings, 0 replies; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Each telemetry aggregator provides three status registers at the top
end of MMIO space after all the per-RMID per-event counters:
agg_data_loss_count: This counts the number of times that this aggregator
failed to accumulate a counter value supplied by a CPU core.
agg_data_loss_timestamp: This is a "timestamp" from a free running
25MHz uncore timer indicating when the most recent data loss occurred.
last_update_timestamp: Another 25MHz timestamp indicating when the
most recent counter update was successfully applied.
Create files in /sys/kernel/debug/resctrl/info/PERF_PKG_MON/x86_64/
to display the value of each of these status registers for each aggregator
in each enabled event group.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/resctrl/intel_aet.c | 53 +++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index ea7a782c1661..80c0dbe33150 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -13,6 +13,7 @@
#include <linux/cleanup.h>
#include <linux/cpu.h>
+#include <linux/debugfs.h>
#include <linux/intel_vsec.h>
#include <linux/io.h>
#include <linux/minmax.h>
@@ -305,6 +306,55 @@ static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **ev
return false;
}
+static int status_read(void *priv, u64 *val)
+{
+ void __iomem *info = (void __iomem *)priv;
+
+ *val = readq(info);
+
+ return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(status_fops, status_read, NULL, "%llu\n");
+
+static void make_status_files(struct dentry *dir, struct event_group *e, int pkg, int instance)
+{
+ void *info = (void __force *)e->pkginfo[pkg]->addrs[instance] + e->mmio_size;
+ char name[64];
+
+ sprintf(name, "%s_pkg%d_agg%d_data_loss_count", e->name, pkg, instance);
+ debugfs_create_file(name, 0400, dir, info - 24, &status_fops);
+
+ sprintf(name, "%s_pkg%d_agg%d_data_loss_timestamp", e->name, pkg, instance);
+ debugfs_create_file(name, 0400, dir, info - 16, &status_fops);
+
+ sprintf(name, "%s_pkg%d_agg%d_last_update_timestamp", e->name, pkg, instance);
+ debugfs_create_file(name, 0400, dir, info - 8, &status_fops);
+}
+
+static void create_debug_event_status_files(struct dentry *dir, struct event_group *e)
+{
+ int num_pkgs = topology_max_packages();
+
+ for (int i = 0; i < num_pkgs; i++)
+ for (int j = 0; j < e->pkginfo[i]->num_regions; j++)
+ make_status_files(dir, e, i, j);
+}
+
+static void create_debugfs_status_file(void)
+{
+ struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+ struct event_group *evg;
+ struct dentry *infodir;
+
+ infodir = resctrl_debugfs_mon_info_arch_mkdir(r);
+ if (!infodir)
+ return;
+
+ list_for_each_entry(evg, &active_event_groups, list)
+ create_debug_event_status_files(infodir, evg);
+}
+
/*
* Ask OOBMSM discovery driver for all the RMID based telemetry groups
* that it supports.
@@ -318,6 +368,9 @@ bool intel_aet_get_events(void)
ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM,
known_perf_event_groups, NUM_KNOWN_PERF_GROUPS);
+ if (ret1 || ret2)
+ create_debugfs_status_file();
+
return ret1 || ret2;
}
--
2.50.0
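The offset arithmetic in make_status_files() places the three status registers in the last 24 bytes of each aggregator's MMIO region: data loss count at mmio_size - 24, data loss timestamp at mmio_size - 16, and last update timestamp at mmio_size - 8. The user-space sketch below models that layout with an ordinary buffer standing in for the MMIO mapping and memcpy() standing in for readq(). struct agg_status, read_agg_status() and ticks_to_ns() are hypothetical names. The only hardware facts used are the register order and the 25MHz timer rate from the commit message, which gives 40ns per tick.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the three status registers of one aggregator. */
struct agg_status {
	uint64_t data_loss_count;	/* at mmio_size - 24 */
	uint64_t data_loss_timestamp;	/* at mmio_size - 16 */
	uint64_t last_update_timestamp;	/* at mmio_size - 8 */
};

#define UNCORE_TIMER_HZ 25000000ULL	/* free running 25MHz uncore timer */

/*
 * Copy the status registers from the top of a region of mmio_size bytes.
 * memcpy() stands in for readq() on a real MMIO mapping.
 */
static void read_agg_status(const void *base, size_t mmio_size,
			    struct agg_status *st)
{
	const unsigned char *top = (const unsigned char *)base + mmio_size;

	memcpy(&st->data_loss_count, top - 24, sizeof(uint64_t));
	memcpy(&st->data_loss_timestamp, top - 16, sizeof(uint64_t));
	memcpy(&st->last_update_timestamp, top - 8, sizeof(uint64_t));
}

/* Convert a difference between two 25MHz timestamps to nanoseconds. */
static uint64_t ticks_to_ns(uint64_t ticks)
{
	return ticks * (1000000000ULL / UNCORE_TIMER_HZ);	/* 40ns per tick */
}
```

Subtracting agg_data_loss_timestamp from last_update_timestamp and passing the result to ticks_to_ns() gives how long the aggregator has been updating successfully since its most recent data loss.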
* [PATCH v7 31/31] x86,fs/resctrl: Update Documentation for package events
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (29 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 30/31] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
@ 2025-07-11 23:53 ` Tony Luck
2025-07-30 18:42 ` [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Moger, Babu
31 siblings, 0 replies; 61+ messages in thread
From: Tony Luck @ 2025-07-11 23:53 UTC (permalink / raw)
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches, Tony Luck
Each "mon_data" directory is now divided between L3 events and package
events.
The "info/PERF_PKG_MON" directory contains parameters for perf events.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
Documentation/filesystems/resctrl.rst | 85 +++++++++++++++++++++++----
1 file changed, 75 insertions(+), 10 deletions(-)
diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index c7949dd44f2f..065f9fdd8f95 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -167,7 +167,7 @@ with respect to allocation:
bandwidth percentages are directly applied to
the threads running on the core
-If RDT monitoring is available there will be an "L3_MON" directory
+If L3 monitoring is available there will be an "L3_MON" directory
with the following files:
"num_rmids":
@@ -261,6 +261,18 @@ with the following files:
bytes) at which a previously used LLC_occupancy
counter can be considered for re-use.
+If telemetry monitoring is available there will be a "PERF_PKG_MON" directory
+with the following files:
+
+"num_rmids":
+ The number of telemetry RMIDs supported. If this is different
+ from the number reported in the L3_MON directory, the limit
+ on the number of "CTRL_MON" + "MON" directories is the
+ minimum of the values.
+
+"mon_features":
+ Lists the telemetry monitoring events that are enabled on this system.
+
Finally, in the top level of the "info" directory there is a file
named "last_cmd_status". This is reset with every "command" issued
via the file system (making new directories or writing to any of the
@@ -366,15 +378,36 @@ When control is enabled all CTRL_MON groups will also contain:
When monitoring is enabled all MON groups will also contain:
"mon_data":
- This contains a set of files organized by L3 domain and by
- RDT event. E.g. on a system with two L3 domains there will
- be subdirectories "mon_L3_00" and "mon_L3_01". Each of these
- directories have one file per event (e.g. "llc_occupancy",
- "mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
- files provide a read out of the current value of the event for
- all tasks in the group. In CTRL_MON groups these files provide
- the sum for all tasks in the CTRL_MON group and all tasks in
- MON groups. Please see example section for more details on usage.
+ This contains a set of directories, one for each instance
+ of an L3 cache or of a processor package. The L3 cache
+ directories are named "mon_L3_00", "mon_L3_01" etc. The
+ package directories are named "mon_PERF_PKG_00", "mon_PERF_PKG_01" etc.
+
+ Within each directory there is one file per event. In
+ the L3 directories: "llc_occupancy", "mbm_total_bytes",
+ and "mbm_local_bytes". In the PERF_PKG directories: "core_energy",
+ "activity", etc.
+
+ "core_energy" reports a floating point number for the energy
+ (in Joules) used by CPUs for each RMID.
+
+ "activity" also reports a floating point value (in Farads).
+ This provides an estimate of work done independent of the
+ frequency that the CPUs used for execution.
+
+ Note that these two counters only measure energy/activity
+ in the "core" of the CPU (arithmetic units, TLB, L1 and L2
+ caches, etc.). They do not include L3 cache, memory, I/O
+ devices etc.
+
+ All other events report decimal integer values.
+
+ In a MON group these files provide a read out of the current
+ value of the event for all tasks in the group. In CTRL_MON groups
+ these files provide the sum for all tasks in the CTRL_MON group
+ and all tasks in MON groups. Please see example section for more
+ details on usage.
+
On systems with Sub-NUMA Cluster (SNC) enabled there are extra
directories for each node (located within the "mon_L3_XX" directory
for the L3 cache they occupy). These are named "mon_sub_L3_YY"
@@ -1300,6 +1333,38 @@ Example with C::
resctrl_release_lock(fd);
}
+Debugfs
+=======
+In addition to the use of debugfs for tracing of pseudo-locking
+performance, architecture code may create debugfs directories
+associated with monitoring features for a specific resource.
+
+The full pathname for these is in the form:
+
+ /sys/kernel/debug/resctrl/info/{resource_name}_MON/{arch}/
+
+The presence, names, and format of these files will vary
+between architectures even if the same resource is present.
+
+PERF_PKG_MON/x86_64
+-------------------
+Three files are present per telemetry aggregator instance
+that show when and how often the hardware has failed to
+collect and accumulate data from the CPUs.
+
+agg_data_loss_count:
+ This counts the number of times that this aggregator
+ failed to accumulate a counter value supplied by a CPU.
+
+agg_data_loss_timestamp:
+ This is a "timestamp" from a free running 25MHz uncore
+ timer indicating when the most recent data loss occurred.
+
+last_update_timestamp:
+ Another 25MHz timestamp indicating when the
+ most recent counter update was successfully applied.
+
+
Examples for RDT Monitoring along with allocation usage
=======================================================
Reading monitored data
--
2.50.0
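From user space, the documentation above reduces to two small calculations. First, PERF_PKG event files such as "core_energy" hold floating point text, while other events hold decimal integers. Second, the usable number of "CTRL_MON" + "MON" groups is the smaller of the two num_rmids values. The sketch below assumes resctrl is mounted at /sys/fs/resctrl, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one value from a resctrl event file, e.g.
 * /sys/fs/resctrl/mon_data/mon_PERF_PKG_00/core_energy (Joules).
 * fscanf("%lf") accepts both the floating point PERF_PKG values and
 * the decimal integers reported by other events. Non-numeric content
 * makes this return -1.
 */
static int read_event_value(const char *path, double *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lf", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Limit on "CTRL_MON" + "MON" groups when both L3_MON and PERF_PKG_MON exist. */
static unsigned int effective_num_rmids(unsigned int l3_rmids,
					unsigned int pkg_rmids)
{
	return l3_rmids < pkg_rmids ? l3_rmids : pkg_rmids;
}
```

Parsing everything as double keeps one reader for both kinds of event file. A caller that needs exact integer byte counts from the L3 events would use an integer conversion instead.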
* Re: [PATCH v7 01/31] x86,fs/resctrl: Consolidate monitor event descriptions
2025-07-11 23:53 ` [PATCH v7 01/31] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
@ 2025-07-17 17:51 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-17 17:51 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> There are currently only three monitor events, all associated with
> the RDT_RESOURCE_L3 resource. Growing support for additional events
> will be easier with some restructuring to have a single point in
> file system code where all attributes of all events are defined.
>
> Place all event descriptions into an array mon_event_all[]. Doing
> this has the beneficial side effect of removing the need for
> rdt_resource::evt_list.
>
> Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range
> checks for event ids and as the starting index to scan mon_event_all[].
>
> Drop the code that builds evt_list and change the two places where
> the list is scanned to scan mon_event_all[] instead using a new
> helper macro for_each_mon_event().
>
> Architecture code now informs file system code which events are
> available with resctrl_enable_mon_event().
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
> ---
Thank you.
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reinette
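The consolidation described in patch 01 can be illustrated in isolation. It places every event in one array indexed by event id, scans that array from QOS_FIRST_EVENT, and has architecture code enable the supported events. The sketch below is a simplified user-space model written for illustration, not copied from the kernel. The enum values, the struct fields (including the "enabled" flag), the loop macro body and the one-argument resctrl_enable_mon_event() are all assumptions. The real signature may differ. count_enabled_events() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Event ids; 0 is unused so QOS_FIRST_EVENT gives the lower bound. */
enum resctrl_event_id {
	QOS_FIRST_EVENT = 1,
	QOS_L3_OCCUP_EVENT_ID = QOS_FIRST_EVENT,
	QOS_L3_MBM_TOTAL_EVENT_ID,
	QOS_L3_MBM_LOCAL_EVENT_ID,
	QOS_NUM_EVENTS,
};

struct mon_evt {
	enum resctrl_event_id evtid;
	const char *name;
	bool enabled;		/* set by architecture code */
};

/* One entry per event id: all attributes defined in a single place. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
	[QOS_L3_OCCUP_EVENT_ID]     = { QOS_L3_OCCUP_EVENT_ID, "llc_occupancy", false },
	[QOS_L3_MBM_TOTAL_EVENT_ID] = { QOS_L3_MBM_TOTAL_EVENT_ID, "mbm_total_bytes", false },
	[QOS_L3_MBM_LOCAL_EVENT_ID] = { QOS_L3_MBM_LOCAL_EVENT_ID, "mbm_local_bytes", false },
};

/* Scan every defined event, skipping the unused slot 0. */
#define for_each_mon_event(mevt) \
	for ((mevt) = &mon_event_all[QOS_FIRST_EVENT]; \
	     (mevt) < &mon_event_all[QOS_NUM_EVENTS]; (mevt)++)

/* Architecture code reports which events the hardware supports. */
static void resctrl_enable_mon_event(enum resctrl_event_id evtid)
{
	if (evtid < QOS_FIRST_EVENT || evtid >= QOS_NUM_EVENTS)
		return;		/* range check uses QOS_FIRST_EVENT */
	mon_event_all[evtid].enabled = true;
}

static int count_enabled_events(void)
{
	struct mon_evt *mevt;
	int n = 0;

	for_each_mon_event(mevt)
		if (mevt->enabled)
			n++;
	return n;
}
```

Because the array replaces a separately built list, adding a new event means adding one initializer and one enable call, with no list construction code to maintain.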
* Re: [PATCH v7 05/31] x86,fs/resctrl: Improve domain type checking
2025-07-11 23:53 ` [PATCH v7 05/31] x86,fs/resctrl: Improve domain type checking Tony Luck
@ 2025-07-25 23:17 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:17 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> The rdt_domain_hdr structure is used in both control and monitor
> domain structures to provide common methods for operations such as
> adding a CPU to a domain, removing a CPU from a domain, accessing
> the mask of all CPUs in a domain.
>
> The "type" field provides a simple check whether a domain is a
> control or monitor domain so that programming errors operating
> on domains will be quickly caught.
Above is context.
Below is a mixup of problem and solution. Please separate these clearly.
> To prepare for additional domain types that depend on the rdt_resource
If "rdt_resource" refers to the struct then please "struct rdt_resource".
If it instead just refers to the concept of a resource, just "resource".
> to which they are connected add the resource id into the header
> and check that in addition to the type.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
Reinette
* Re: [PATCH v7 06/31] x86/resctrl: Move L3 initialization into new helper function
2025-07-11 23:53 ` [PATCH v7 06/31] x86/resctrl: Move L3 initialization into new helper function Tony Luck
@ 2025-07-25 23:21 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:21 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
Missing context and problem description.
> To prepare for additional types of monitoring domains, move open coded L3
> resource monitoring domain initialization from domain_add_cpu_mon() into
> a new helper function l3_mon_domain_setup() called by domain_add_cpu_mon().
Please drop "function" that is unnecessary when using ().
(this highlights the need to consider my proposals critically, as it confirms how easily
I too get it wrong)
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
...
> +static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> +{
> + int id = get_domain_id_from_scope(cpu, r->mon_scope);
> + struct list_head *add_pos = NULL;
> + struct rdt_domain_hdr *hdr;
> +
> + lockdep_assert_held(&domain_list_lock);
> +
> + if (id < 0) {
> + pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
> + cpu, r->mon_scope, r->name);
> + return;
> + }
> +
> + hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
> + if (hdr) {
> + if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
> + return;
> + cpumask_set_cpu(cpu, &hdr->cpu_mask);
> +
> + return;
> + }
> +
> + switch (r->rid) {
> + case RDT_RESOURCE_L3:
> + l3_mon_domain_setup(cpu, id, r, add_pos);
> + break;
> + default:
> + WARN_ON_ONCE(1);
Please add a "break". For consistency with the changes to the partner
domain_remove_cpu_mon() in the next patch, I'd vote for matching how the
default branch in domain_remove_cpu_mon() looks.
Reinette
* Re: [PATCH v7 08/31] x86/resctrl: Clean up domain_remove_cpu_ctrl()
2025-07-11 23:53 ` [PATCH v7 08/31] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
@ 2025-07-25 23:22 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:22 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> For symmetry with domain_remove_cpu_mon() refactor to take an
> early return when removing a CPU does not empty the domain.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 29 ++++++++++++++---------------
> 1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 49e17c246c60..0c5ada54bb20 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -602,25 +602,24 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
> if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
> return;
>
> + cpumask_clear_cpu(cpu, &hdr->cpu_mask);
> + if (!cpumask_empty(&hdr->cpu_mask))
> + return;
> +
> d = container_of(hdr, struct rdt_ctrl_domain, hdr);
> hw_dom = resctrl_to_arch_ctrl_dom(d);
>
> - cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
> - if (cpumask_empty(&d->hdr.cpu_mask)) {
> - resctrl_offline_ctrl_domain(r, d);
> - list_del_rcu(&d->hdr.list);
> - synchronize_rcu();
> -
> - /*
> - * rdt_ctrl_domain "d" is going to be freed below, so clear
> - * its pointer from pseudo_lock_region struct.
> - */
> - if (d->plr)
> - d->plr->d = NULL;
> - ctrl_domain_free(hw_dom);
> + resctrl_offline_ctrl_domain(r, d);
> + list_del_rcu(&d->hdr.list);
nit: this can just reference hdr directly.
Reinette
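Patch 08 restructures domain_remove_cpu_ctrl() into a general pattern. It clears the CPU, returns early while the domain still has CPUs, and only then tears the domain down. The sketch below models this in user space. A 64-bit mask replaces struct cpumask, and a counter stands in for the offline/unlink/free steps. struct domain_sketch, teardowns and remove_cpu() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct domain_sketch {
	uint64_t cpu_mask;	/* stand-in for hdr->cpu_mask */
	bool online;
};

static int teardowns;	/* counts how often the teardown path runs */

static void remove_cpu(struct domain_sketch *d, int cpu)
{
	d->cpu_mask &= ~(UINT64_C(1) << cpu);	/* cpumask_clear_cpu() */
	if (d->cpu_mask)			/* !cpumask_empty(): domain stays */
		return;

	/* Last CPU gone: offline, unlink and free the domain here. */
	d->online = false;
	teardowns++;
}
```

The early return keeps the teardown code at one indentation level, which is the symmetry with domain_remove_cpu_mon() that the patch aims for.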
* Re: [PATCH v7 09/31] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain
2025-07-11 23:53 ` [PATCH v7 09/31] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
@ 2025-07-25 23:25 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:25 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Historically all monitoring events have been associated with
> the L3 resource and it made sense to use the L3 specific "struct
> rdt_mon_domain *" arguments to functions manipulating domains.
Above is context describing current implementation so can be in present
tense. Needs imperative tone, eg. "All monitoring events are associated ..."
But
> the addition of monitor events tied to other resources changes this
> assumption.
>
> To enable enumeration of domains for events in other resources, change
> the calling sequence to use the generic struct rdt_domain_hdr for domain
> addition and deletion to preserve as much common code as possible.
>
> Same change to allow reading events in other resources. In this case
> the code flow passes from mon_event_read() via smp_call*() eventually
> to __mon_event_count() so the rmid_read::d field is replaced with
> the new rmid_read::hdr field.
>
> The mon_data structure is unchanged, but documentation is updated
> to note that mon_data::sum is only used for RDT_RESOURCE_L3.
Needs imperative tone.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
...
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -159,7 +159,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
> break;
>
> entry = __rmid_entry(idx);
> - if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
> + if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
> QOS_L3_OCCUP_EVENT_ID, &val,
> arch_mon_ctx)) {
> rmid_dirty = true;
> @@ -365,19 +365,23 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> int err, ret;
> u64 tval = 0;
>
> - if (rr->first) {
> - resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
> - m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
> + if (rr->r->rid == RDT_RESOURCE_L3 && rr->first) {
> + if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
This seems like a double WARN_ON_ONCE() considering the one within domain_header_is_valid().
> + RDT_RESOURCE_L3)))
> + return -EINVAL;
> + d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
> + resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
> + m = get_mbm_state(d, closid, rmid, rr->evtid);
> if (m)
> memset(m, 0, sizeof(struct mbm_state));
> return 0;
> }
>
> - if (rr->d) {
> + if (rr->hdr) {
> /* Reading a single domain, must be on a CPU in that domain. */
> - if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
> + if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
> return -EINVAL;
> - rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
> + rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
> rr->evtid, &tval, rr->arch_mon_ctx);
> if (rr->err)
> return rr->err;
> @@ -387,6 +391,9 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> return 0;
> }
>
> + if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
> + return -EINVAL;
> +
> /* Summing domains that share a cache, must be on a CPU for that cache. */
> ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
> if (!ci || ci->id != rr->ci_id)
> @@ -403,7 +410,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
> if (d->ci_id != rr->ci_id)
> continue;
> - err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
> + err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
> rr->evtid, &tval, rr->arch_mon_ctx);
> if (!err) {
> rr->val += tval;
> @@ -432,9 +439,13 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
> {
> u64 cur_bw, bytes, cur_bytes;
> + struct rdt_mon_domain *d;
> struct mbm_state *m;
>
> - m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
> + if (WARN_ON_ONCE(domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
> + return;
Double WARN_ON_ONCE()?
> + d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
> + m = get_mbm_state(d, closid, rmid, rr->evtid);
> if (WARN_ON_ONCE(!m))
> return;
>
...
> @@ -3065,26 +3078,38 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
> if (ret)
> return ret;
>
> - if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
> - mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
> + if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
> + mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
> }
>
> return 0;
> }
>
> static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
> - struct rdt_mon_domain *d,
> + struct rdt_domain_hdr *hdr,
> struct rdt_resource *r, struct rdtgroup *prgrp)
> {
> struct kernfs_node *kn, *ckn;
> + bool snc_mode = false;
> + int domid = hdr->id;
> char name[32];
> - bool snc_mode;
> int ret = 0;
>
> lockdep_assert_held(&rdtgroup_mutex);
>
> - snc_mode = r->mon_scope == RESCTRL_L3_NODE;
> - sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
> + if (r->rid == RDT_RESOURCE_L3) {
> + if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
> + return -EINVAL;
The domain header check only seems necessary when needing to run the container_of()
below that depends on SNC mode so can be moved into the if() below?
> + snc_mode = r->mon_scope == RESCTRL_L3_NODE;
> + if (snc_mode) {
> + struct rdt_mon_domain *d;
> +
> + d = container_of(hdr, struct rdt_mon_domain, hdr);
> + domid = d->ci_id;
> + }
> + }
> + sprintf(name, "mon_%s_%02d", r->name, domid);
> +
> kn = kernfs_find_and_get(parent_kn, name);
> if (kn) {
> /*
Reinette
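mkdir_mondata_subdir() follows a specific pattern. It validates the generic header before converting it with container_of(), and only then reads L3 specific fields. The user-space sketch below shows that pattern as a simplified model. The structures keep only the fields needed, container_of() is the usual offsetof() based definition, and hdr_is_valid() returns a bool without warning. Following the review suggestion, the header check sits inside the SNC branch, where container_of() is actually needed. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum domain_type { CTRL_DOMAIN, MON_DOMAIN };
enum res_id { RES_L3, RES_PERF_PKG };

struct domain_hdr {
	enum domain_type type;
	enum res_id rid;
	int id;
};

struct l3_mon_domain {
	int ci_id;		/* L3 cache id, used for naming in SNC mode */
	struct domain_hdr hdr;	/* not the first member: container_of() matters */
};

static bool hdr_is_valid(const struct domain_hdr *hdr, enum domain_type type,
			 enum res_id rid)
{
	return hdr->type == type && hdr->rid == rid;
}

/* Build "mon_<res>_<NN>"; returns -1 if an SNC lookup gets a non-L3 header. */
static int mon_dir_name(char *buf, size_t len, const char *res_name,
			struct domain_hdr *hdr, bool snc_mode)
{
	int domid = hdr->id;

	if (snc_mode) {
		struct l3_mon_domain *d;

		/* Only validate where the conversion below depends on it. */
		if (!hdr_is_valid(hdr, MON_DOMAIN, RES_L3))
			return -1;
		d = container_of(hdr, struct l3_mon_domain, hdr);
		domid = d->ci_id;
	}
	snprintf(buf, len, "mon_%s_%02d", res_name, domid);
	return 0;
}
```

Resources other than L3 never take the SNC branch, so their directory names come straight from hdr->id without any container_of() conversion.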
* Re: [PATCH v7 10/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
2025-07-11 23:53 ` [PATCH v7 10/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
@ 2025-07-25 23:26 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:26 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Historically all monitoring events have been associated with the L3
> resource. This will change when support for telemetry events is added.
Same comment as previous patches. Please keep context separate from problem
statement and write in imperative tone. You can review "Changelog" section
of Documentation/process/maintainer-tip.rst for what tip maintainers will
be looking for.
>
> The structures to track monitor domains of the L3 resource at both the
> file system and architecture level have generic names. This may cause
> confusion when support for monitoring events in other resources is added.
>
> Rename by adding "l3_" into the names:
> rdt_mon_domain -> rdt_l3_mon_domain
> rdt_hw_mon_domain -> rdt_hw_l3_mon_domain
>
> No functional change.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
patch looks good
Reinette
* Re: [PATCH v7 11/31] x86,fs/resctrl: Rename some L3 specific functions
2025-07-11 23:53 ` [PATCH v7 11/31] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
@ 2025-07-25 23:26 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:26 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> All monitor functions are tied to the RDT_RESOURCE_L3 resource,
> so generic function names to setup and tear down domains makes sense.
>
> With the arrival of monitor events tied to new domains associated with
> different resources it would be clearer if these functions are more
> accurately named.
>
> Two groups of functions renamed here:
>
> Functions that allocate/free architecture per-RMID MBM state information:
> arch_domain_mbm_alloc() -> l3_mon_domain_mbm_alloc()
> mon_domain_free() -> l3_mon_domain_free()
>
> Functions that allocate/free filesystem per-RMID MBM state information:
> domain_setup_mon_state() -> domain_setup_l3_mon_state()
> domain_destroy_mon_state() -> domain_destroy_l3_mon_state()
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
This patch looks good to me. I also think the L3 renaming done as
part of patch #26 should rather move to this patch to keep the
renaming together and helping patch #26 to focus on one thing:
resctrl_mon_resource_init() -> resctrl_mon_l3_resource_init()
resctrl_mon_resource_exit() -> resctrl_mon_l3_resource_exit()
Reinette
* Re: [PATCH v7 12/31] fs/resctrl: Make event details accessible to functions when reading events
2025-07-11 23:53 ` [PATCH v7 12/31] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
@ 2025-07-25 23:27 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:27 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> All details about a monitor event are kept in the mon_evt structure.
> Upper levels of code only provide the event id to lower levels.
<newline to separate context from problem>
> This will become a problem when new attributes are added to the
> mon_evt structure.
Please be specific what is meant with "This" (is it that all details are
kept in the mon_evt structure or that upper levels only provide event id?)
as well as what "a problem" refers to.
>
> Change the mon_data and rmid_read structures to hold a pointer
> to the mon_evt structure instead of just taking a copy of the
> event id.
>
> No functional change.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> fs/resctrl/internal.h | 10 +++++-----
> fs/resctrl/ctrlmondata.c | 16 ++++++++--------
> fs/resctrl/monitor.c | 17 +++++++++--------
> fs/resctrl/rdtgroup.c | 6 +++---
> 4 files changed, 25 insertions(+), 24 deletions(-)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index e4f06f700063..ef3ec2a4860f 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -76,7 +76,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
> * struct mon_data - Monitoring details for each event file.
> * @list: Member of the global @mon_data_kn_priv_list list.
> * @rid: Resource id associated with the event file.
> - * @evtid: Event id associated with the event file.
> + * @evt: Event structure associated with the event file.
> * @sum: Set for RDT_RESOURCE_L3 when event must be summed
> * across multiple domains.
> * @domid: When @sum is zero this is the domain to which
> @@ -90,7 +90,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
> struct mon_data {
> struct list_head list;
> enum resctrl_res_level rid;
> - enum resctrl_event_id evtid;
> + struct mon_evt *evt;
> int domid;
> bool sum;
> };
> @@ -103,7 +103,7 @@ struct mon_data {
> * @r: Resource describing the properties of the event being read.
> * @hdr: Header of domain that the counter should be read from. If NULL then sum all
> * domains in @r sharing L3 @ci.id
> - * @evtid: Which monitor event to read.
> + * @evt: Event associated with the event file.
There is not always an event file involved when struct rmid_read is used to
read a counter. I believe that the original description is sufficient.
> * @first: Initialize MBM counter when true.
> * @ci_id: Cacheinfo id for L3. Only set when @hdr is NULL. Used when summing domains.
> * @err: Error encountered when reading counter.
> @@ -117,7 +117,7 @@ struct rmid_read {
> struct rdtgroup *rgrp;
> struct rdt_resource *r;
> struct rdt_domain_hdr *hdr;
> - enum resctrl_event_id evtid;
> + struct mon_evt *evt;
> bool first;
> unsigned int ci_id;
> int err;
Reinette
* Re: [PATCH v7 07/31] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types
2025-07-11 23:53 ` [PATCH v7 07/31] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
@ 2025-07-25 23:29 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:29 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Historically all monitoring events have been associated with the L3
> resource. This will change when support for telemetry events is added.
This is not "history" but the current state. How about:
"All monitoring events are associated with the L3 resource."
Drop the "This will change when support for telemetry events is added." from the
first paragraph since it does not contribute to the context but can be used
in problem statement later.
>
> The RDT_RESOURCE_L3 resource carries a lot of state in the domain
> structures which needs to be dealt with when a domain is taken offline
> by removing the last CPU in the domain.
Above can be part of the context.
<insert problem statement here>
> Refactor domain_remove_cpu_mon() so all the L3 processing is separated
> from general actions of clearing the CPU bit in the mask and removing
> directories from mon_data.
"directories from mon_data" -> "sub-directories from the mon_data directory"
>
> resctrl_offline_mon_domain() needs to remove domain specific
"needs" -> "continues"
> directories and files from the "mon_data" directories, but can skip the
> L3 resource specific cleanup when called for other resource types.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
Patch looks good.
Reinette
* Re: [PATCH v7 13/31] x86,fs/resctrl: Handle events that can be read from any CPU
2025-07-11 23:53 ` [PATCH v7 13/31] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
@ 2025-07-25 23:32 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:32 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Resctrl file system code was built with the assumption that monitor
"resctrl assumes that monitor events can only be read from a CPU in the
cpumask_t set of each domain."
> events can only be read from a CPU in the cpumask_t set for each
> domain.
>
> This was true for x86 events accessed with an MSR interface, but may
"This is true ..."
> not be true for other access methods such as MMIO.
>
> Add a flag to struct mon_evt to indicate if the event can be read on
> any CPU.
>
> Architecture uses resctrl_enable_mon_event() to enable an event and
> set the flag appropriately.
>
> Bypass all the smp_call*() code for events that can be read on any CPU
> and call mon_event_count() directly from mon_event_read().
>
> Add a test for events that can be read from any domain to skip checks
> in __mon_event_count() that the read is being done from a CPU in the
> correct domain or cache scope.
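A minimal userspace model of the dispatch described above. It assumes simplified stand-ins: `struct evt_model` for struct mon_evt, `count_event()` for mon_event_count(), and a `run_on_cpu` callback in place of the smp_call*() helpers. These names are hypothetical; only the any_cpu decision mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mon_evt: only the flag matters here. */
struct evt_model {
	bool any_cpu;		/* readable from any CPU (e.g. via MMIO) */
};

/* Hypothetical stand-in for struct rmid_read. */
struct read_model {
	const struct evt_model *evt;
	int cross_cpu_calls;	/* times the smp_call*()-style path was taken */
	unsigned long long val;
};

/* Stand-in for mon_event_count(): pretend to read a counter. */
static void count_event(void *arg)
{
	struct read_model *rr = arg;

	rr->val = 42;
}

/* Hypothetical helper type that runs fn(arg) on a given CPU. */
typedef void (*run_on_cpu_fn)(int cpu, void (*fn)(void *), void *arg);

/* Test double: runs fn locally and records that a cross-CPU call was made. */
static void fake_run_on_cpu(int cpu, void (*fn)(void *), void *arg)
{
	struct read_model *rr = arg;

	(void)cpu;
	rr->cross_cpu_calls++;
	fn(arg);
}

/* Model of mon_event_read(): any_cpu events skip the cross-CPU call. */
static void event_read(struct read_model *rr, int target_cpu, run_on_cpu_fn run)
{
	assert(rr && rr->evt);

	if (rr->evt->any_cpu) {
		count_event(rr);	/* direct call in the caller's context */
		return;
	}
	run(target_cpu, count_event, rr);
}
```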
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
...
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 6d4191eff391..a6d11011cb8e 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -356,15 +356,43 @@ static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
> return state ? &state[idx] : NULL;
> }
>
> +/*
> + * For events that can be read on any CPU this function is called
> + * in preemptible context with a direct call from mon_event_read()
> + * to mon_event_count() instead of using smp_call*() to execute on a
> + * specific CPU. For other events it is called in non-preemptible context.
Thinking about this more, there are a few more things involved that make
simplifying it to preemptible/non-preemptible inaccurate.
We know from resctrl_arch_rmid_read_context_check() that resctrl_arch_rmid_read()
can (usually) sleep and that is because mon_event_count() is usually called via
smp_call_on_cpu() that runs mon_event_count() in (preemptible but not-migratable)
task context. You can confirm this with a closer look at [1] that shows the
preempt_count() is 0.
Here is an attempt to clarify the context; please consider it critically and
improve it:
Called from preemptible context via a direct call of mon_event_count() for
events that can be read on any CPU.
Called from preemptible but non-migratable process context (mon_event_count()
via smp_call_on_cpu()) OR non-preemptible context (mon_event_count() via
smp_call_function_any()) for events that need to be read on a specific CPU.
[1] https://lore.kernel.org/lkml/e818906f-b03a-474b-8a6b-d291cf1a74fe@intel.com/
> + */
> +static bool cpu_on_correct_domain(struct rmid_read *rr)
> +{
> + struct cacheinfo *ci;
> + int cpu;
> +
> + /* Any CPU is OK for this event */
> + if (rr->evt->any_cpu)
> + return true;
> +
> + cpu = smp_processor_id();
> +
> + /* Single domain. Must be on a CPU in that domain. */
> + if (rr->hdr)
> + return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
> +
> + /* Summing domains that share a cache, must be on a CPU for that cache. */
> + ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
> +
> + return ci && ci->id == rr->ci_id;
> +}
> +
> static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> {
> - int cpu = smp_processor_id();
> struct rdt_l3_mon_domain *d;
> - struct cacheinfo *ci;
> struct mbm_state *m;
> int err, ret;
> u64 tval = 0;
>
> + if (!cpu_on_correct_domain(rr))
> + return -EINVAL;
> +
> if (rr->r->rid == RDT_RESOURCE_L3 && rr->first) {
> if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
> RDT_RESOURCE_L3)))
> @@ -378,9 +406,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> }
>
> if (rr->hdr) {
> - /* Reading a single domain, must be on a CPU in that domain. */
> - if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
> - return -EINVAL;
> + /* Single domain. */
> rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
> rr->evt->evtid, &tval, rr->arch_mon_ctx);
> if (rr->err)
> @@ -394,12 +420,9 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
> return -EINVAL;
As I understand it, the above WARN ensures that only an L3 resource can proceed, considering
that the code that follows explicitly uses RESCTRL_L3_CACHE. Now that this hardcoded
RESCTRL_L3_CACHE is moved elsewhere, should the WARN not follow it?
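A userspace sketch of what that could look like, with the L3-only check placed next to the code that assumes an L3 cache. The types, the `l3_id_of_cpu[]` table (standing in for get_cpu_cacheinfo_level()) and the function name are hypothetical; where the kernel would use WARN_ON_ONCE(), the model simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum res_model { RES_L3, RES_PERF_PKG };	/* hypothetical resource ids */

struct domain_model {
	unsigned long cpu_mask;		/* bit N set: CPU N is in the domain */
};

struct rr_model {
	bool any_cpu;
	enum res_model rid;
	const struct domain_model *hdr;	/* NULL when summing domains */
	int ci_id;			/* L3 cache id when summing */
};

/* Hypothetical per-CPU L3 cache ids (stand-in for cacheinfo). */
static const int l3_id_of_cpu[] = { 0, 0, 1, 1 };
#define NR_MODEL_CPUS (sizeof(l3_id_of_cpu) / sizeof(l3_id_of_cpu[0]))

static bool cpu_on_correct_domain_model(const struct rr_model *rr, int cpu)
{
	assert(rr);

	/* Any CPU is OK for this event. */
	if (rr->any_cpu)
		return true;
	if (cpu < 0)
		return false;

	/* Single domain: must be on a CPU in that domain. */
	if (rr->hdr)
		return cpu < (int)(8 * sizeof(rr->hdr->cpu_mask)) &&
		       ((rr->hdr->cpu_mask >> cpu) & 1);

	/* Summing only makes sense for L3: check it beside the L3 lookup. */
	if (rr->rid != RES_L3)
		return false;

	return (size_t)cpu < NR_MODEL_CPUS && l3_id_of_cpu[cpu] == rr->ci_id;
}
```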
>
> - /* Summing domains that share a cache, must be on a CPU for that cache. */
> - ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
> - if (!ci || ci->id != rr->ci_id)
> - return -EINVAL;
> -
> /*
> + * Sum across multiple domains.
> + *
> * Legacy files must report the sum of an event across all
> * domains that share the same L3 cache instance.
> * Report success if a read from any domain succeeds, -EINVAL
Reinette
* Re: [PATCH v7 14/31] x86,fs/resctrl: Support binary fixed point event counters
2025-07-11 23:53 ` [PATCH v7 14/31] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
@ 2025-07-25 23:34 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:34 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Resctrl was written with the assumption that all monitor events can be
> displayed as unsigned decimal integers.
"resctrl assumes that ..."
>
> Hardware architecture counters may provide some telemetry events with
> greater precision where the event is not a simple count, but is a
> measurement of some sort (e.g. Joules for energy consumed).
>
> Add a new argument to resctrl_enable_mon_event() for architecture code
> to inform the file system that the value for a counter is a fixed-point
> value with a specific number of binary places. The file system will
> only allow architecture to use floating point format on events that it
"The file system will only allow ..." -> "Only allow ..."
> marked with mon_evt::is_floating_point.
>
> Fixed point values are displayed with values rounded to an appropriate
"Display fixed point values ..."
(please review all changelogs to ensure tip requirements are met)
> number of decimal places for the precision of the number of binary places
> provided. In general one extra decimal place is added for every three
Needs imperative. For example:
"Add one extra decimal place for every three additional binary places,
except for low precision binary values where exact representation is possible"?
> additional binary places. There are some exceptions for low precision
> binary values where exact representation is possible:
>
> 1 binary place is 0.0 or 0.5. => 1 decimal place
> 2 binary places is 0.0, 0.25, 0.5, 0.75 => 2 decimal places
> 3 binary places is 0.0, 0.125, etc. => 3 decimal places
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl.h | 4 +-
> fs/resctrl/internal.h | 5 ++
> arch/x86/kernel/cpu/resctrl/core.c | 6 +-
> fs/resctrl/ctrlmondata.c | 88 ++++++++++++++++++++++++++++++
> fs/resctrl/monitor.c | 10 +++-
> 5 files changed, 107 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 17a21f193a3d..e9a1cabfc724 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -379,7 +379,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
> u32 resctrl_arch_system_num_rmid_idx(void);
> int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
> +#define MAX_BINARY_BITS 27
> +
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, unsigned int binary_bits);
This declaration is now over 100 columns and will be split across two lines
in patch #21 anyway. That split can be done here to eliminate one checkpatch.pl complaint.
>
> bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
>
...
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 2e65fddc3408..71d61c96c2b8 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -17,6 +17,7 @@
>
> #include <linux/cpu.h>
> #include <linux/kernfs.h>
> +#include <linux/math.h>
> #include <linux/seq_file.h>
> #include <linux/slab.h>
> #include <linux/tick.h>
> @@ -590,6 +591,91 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
> }
>
> +/*
> + * Decimal place precision to use for each number of fixed-point
> + * binary bits.
> + */
> +static unsigned int decplaces[MAX_BINARY_BITS + 1] = {
> + [1] = 1,
> + [2] = 2,
> + [3] = 3,
> + [4] = 3,
> + [5] = 3,
> + [6] = 3,
> + [7] = 3,
> + [8] = 3,
> + [9] = 3,
> + [10] = 4,
> + [11] = 4,
> + [12] = 4,
> + [13] = 5,
> + [14] = 5,
> + [15] = 5,
> + [16] = 6,
> + [17] = 6,
> + [18] = 6,
> + [19] = 7,
> + [20] = 7,
> + [21] = 7,
> + [22] = 8,
> + [23] = 8,
> + [24] = 8,
> + [25] = 9,
> + [26] = 9,
> + [27] = 9
> +};
> +
> +static void print_event_value(struct seq_file *m, unsigned int binary_bits, u64 val)
> +{
> + unsigned long long frac;
> + char buf[10];
> +
> + if (!binary_bits) {
> + seq_printf(m, "%llu.0\n", val);
> + return;
> + }
> +
> + /* Mask off the integer part of the fixed-point value. */
> + frac = val & GENMASK_ULL(binary_bits, 0);
> +
> + /*
> + * Multiply by 10^{desired decimal places}. The
> + * integer part of the fixed point value is now
> + * almost what is needed.
Please expand comments to 80 columns.
> + */
> + frac *= int_pow(10ull, decplaces[binary_bits]);
> +
> + /*
> + * Round to nearest by adding a value that
> + * would be a "1" in the binary_bit + 1 place.
"binary_bit" -> "binary_bits"?
> + * Integer part of fixed point value is now
> + * the needed value.
> + */
> + frac += 1ull << (binary_bits - 1);
> +
> + /*
> + * Extract the integer part of the value. This
> + * is the decimal representation of the original
> + * fixed-point fractional value.
> + */
> + frac >>= binary_bits;
> +
> + /*
> + * "frac" is now in the range [0 .. 10^decplaces).
> + * I.e. string representation will fit into
> + * chosemn number of decimal places.
chosemn -> chosen
> + */
> + snprintf(buf, sizeof(buf), "%0*llu", decplaces[binary_bits], frac);
> +
> + /* Trim trailing zeroes */
> + for (int i = decplaces[binary_bits] - 1; i > 0; i--) {
> + if (buf[i] != '0')
> + break;
> + buf[i] = '\0';
> + }
> + seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
> +}
> +
> int rdtgroup_mondata_show(struct seq_file *m, void *arg)
> {
> struct kernfs_open_file *of = m->private;
> @@ -666,6 +752,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
> seq_puts(m, "Error\n");
> else if (rr.err == -EINVAL)
> seq_puts(m, "Unavailable\n");
> + else if (evt->is_floating_point)
> + print_event_value(m, evt->binary_bits, rr.val);
> else
> seq_printf(m, "%llu\n", rr.val);
>
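A userspace sketch of the same conversion, formatting into a buffer instead of a seq_file; `format_fixed_point()` is a hypothetical name. One detail worth checking: GENMASK_ULL(h, l) includes bit h, so a mask covering only the fractional bits is GENMASK_ULL(binary_bits - 1, 0). As quoted, GENMASK_ULL(binary_bits, 0) also keeps the lowest integer bit, which appears to turn 1.5 with one binary place into "1.15". The sketch masks bits [binary_bits - 1:0] only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same values as the patch's decplaces[] table, indexed by binary bits. */
static const unsigned int decplaces[28] = {
	0, 1, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5,
	5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9,
};

/* Format fixed-point val with binary_bits fractional bits into out. */
static void format_fixed_point(char *out, size_t len, unsigned int binary_bits, uint64_t val)
{
	uint64_t frac, scale = 1;
	unsigned int places;
	char digits[24];	/* frac < 10^places <= 10^9, so this is ample */

	assert(binary_bits < sizeof(decplaces) / sizeof(decplaces[0]));

	if (!binary_bits) {
		snprintf(out, len, "%" PRIu64 ".0", val);
		return;
	}

	places = decplaces[binary_bits];
	/* Keep only the fractional bits [binary_bits - 1:0]. */
	frac = val & ((UINT64_C(1) << binary_bits) - 1);

	for (unsigned int i = 0; i < places; i++)
		scale *= 10;

	/* Scale, round to nearest by adding half an LSB, drop the binary bits. */
	frac = (frac * scale + (UINT64_C(1) << (binary_bits - 1))) >> binary_bits;

	snprintf(digits, sizeof(digits), "%0*" PRIu64, (int)places, frac);

	/* Trim trailing zeroes but always keep at least one digit. */
	for (int i = (int)places - 1; i > 0 && digits[i] == '0'; i--)
		digits[i] = '\0';

	snprintf(out, len, "%" PRIu64 ".%s", val >> binary_bits, digits);
}
```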
Reinette
* Re: [PATCH v7 15/31] x86,fs/resctrl: Add an architectural hook called for each mount
2025-07-11 23:53 ` [PATCH v7 15/31] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
@ 2025-07-25 23:35 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:35 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Enumeration of Intel telemetry events is not complete when the
> resctrl "late_init" code is executed.
Could you please expand the context to provide detail on why
this enumeration is not complete at resctrl init time? What guarantee
is there that the enumeration will be complete at resctrl mount time?
>
> Add a hook at the beginning of the mount code that will be used
> to check for telemetry events and initialize if any are found.
>
> The hook is called on every attempted mount. But expectations are that
"Call the hook ..." (I do not think I need to point these out anymore,
will leave to you to rework all changelogs)
> most actions (like enumeration) will only need to be performed
> on the first call.
>
> The call is made with no locks held. Architecture code is responsible
> for any required locking.
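A userspace sketch of the "only the first call does the work" pattern, using a C11 compare-and-exchange where the x86 resctrl_arch_pre_mount() uses atomic_try_cmpxchg(). `arch_pre_mount_model()` and `enumerate_telemetry()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int only_once;	/* 0 until the first mount attempt claims it */
static int enumerations;	/* how many times the one-time work ran */

/* Hypothetical stand-in for telemetry enumeration. */
static bool enumerate_telemetry(void)
{
	enumerations++;
	return true;
}

/*
 * Model of an architecture pre-mount hook: called on every mount attempt
 * with no locks held, but only the first caller performs enumeration.
 */
static void arch_pre_mount_model(void)
{
	int old = 0;

	if (!atomic_compare_exchange_strong(&only_once, &old, 1))
		return;

	enumerate_telemetry();
}
```

A compare-and-exchange only guarantees that the work runs once. A concurrent second mount attempt returns immediately, possibly before the first has finished enumerating, so code that depends on the result may need a mutex instead.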
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl.h | 6 ++++++
> arch/x86/kernel/cpu/resctrl/core.c | 9 +++++++++
> fs/resctrl/rdtgroup.c | 2 ++
> 3 files changed, 17 insertions(+)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index e9a1cabfc724..d2fc0fcd0226 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -460,6 +460,12 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
> void resctrl_online_cpu(unsigned int cpu);
> void resctrl_offline_cpu(unsigned int cpu);
>
> +/*
> + * Architecture hook called for each attempted file system mount.
"for each attempted file system mount" -> "at beginning of each file system mount attempt"?
Reinette
* Re: [PATCH v7 16/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor
2025-07-11 23:53 ` [PATCH v7 16/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
@ 2025-07-25 23:36 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:36 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
In subject, should "core" be dropped?
On 7/11/25 4:53 PM, Tony Luck wrote:
> Add a new PERF_PKG resource and introduce package level scope for
> monitoring these events so that CPU hotplug notifiers can build domains
It is not obvious what "these events" refer to here.
> at the package granularity.
>
> Use the physical package ID available via topology_physical_package_id()
> to identify the monitoring domains with package level scope.
>
> This enables user space to use
> /sys/devices/system/cpu/cpuX/topology/physical_package_id
> to identify the monitoring domain a CPU is associated with.
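From user space, the CPU-to-domain mapping could be looked up as below. Only the sysfs path comes from the changelog; the helper names are hypothetical, and the file is assumed to hold a plain decimal integer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path that names the package of cpu. */
static int package_id_path(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len,
			"/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
}

/* Read a single decimal integer from path; return -1 on any error. */
static int read_int_file(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* Package id (the package-scope monitoring domain id) of cpu, or -1. */
static int cpu_monitor_domain(int cpu)
{
	char path[128];

	package_id_path(path, sizeof(path), cpu);
	return read_int_file(path);
}
```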
Above two paragraphs can be merged?
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
Patch looks good.
Reinette
* Re: [PATCH v7 17/31] x86/resctrl: Discover hardware telemetry events
2025-07-11 23:53 ` [PATCH v7 17/31] x86/resctrl: Discover hardware telemetry events Tony Luck
@ 2025-07-25 23:39 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:39 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Hardware has one or more telemetry event aggregators per package
> for each group of telemetry events. Each aggregator provides access
From what I can tell this is the first mention of "group of telemetry
events", yet the reader is assumed to know what it represents.
Where is this concept introduced?
> to event counts in an array of 64-bit values in MMIO space. There
> is a "guid" (in this case a unique 32-bit integer) which refers to
> an XML file published in https://github.com/intel/Intel-PMT
> that provides all the details about each aggregator.
>
> The XML file provides the following information:
> 1) Which telemetry events are included in the group for this aggregator.
Regarding "in the group for this aggregator" ... does this mean that
each aggregator can have different events from the "group"?
> 2) The order in which the event counters appear for each RMID.
> 3) The value type of each event counter (integer or fixed-point).
> 4) The number of RMIDs supported.
> 5) Which additional aggregator status registers are included.
> 6) The total size of the MMIO region for this aggregator.
Does this mean that aggregators belonging to an event group
can have different MMIO sizes? (I expect this ties in with (1).)
So, specifically, "for this aggregator" implies that these properties
can differ between aggregators belonging to the same "event group", so the reader
will be looking out for how the code handles this ... but the code does not
handle it yet (that can only be seen in later patches).
> Each aggregator makes event counters available to Linux in
> a region of MMIO memory. Enumeration of these regions is
> done by the INTEL_PMT_DISCOVERY discovery driver.
>
> Add a new Kconfig option CONFIG_X86_RESCTRL_CPU_INTEL_AET for the
I proposed the same namespace prefix before, yet this keeps being different.
Why not just be consistent with the existing CONFIG_X86_CPU_RESCTRL?
> Intel specific parts of telemetry code. This depends on the
> INTEL_PMT_DISCOVERY driver being built-in to the kernel for
> enumeration of telemetry features.
hmmm ... attempting to build with these dependencies met results in:
ld: vmlinux.o: in function `get_pmt_feature':
SNIP/linux/arch/x86/kernel/cpu/resctrl/intel_aet.c:292:(.text+0x6f4e9): undefined reference to `intel_pmt_get_regions_by_feature'
ld: vmlinux.o: in function `__free_intel_pmt_put_feature_group':
SNIP/linux/arch/x86/kernel/cpu/resctrl/intel_aet.c:274:(.text+0x6f708): undefined reference to `intel_pmt_put_feature_group'
ld: vmlinux.o: in function `intel_aet_exit':
SNIP/linux/arch/x86/kernel/cpu/resctrl/intel_aet.c:382:(.exit.text+0x286): undefined reference to `intel_pmt_put_feature_group'
Looks like the dependency should be INTEL_PMT_TELEMETRY instead?
>
> Call intel_pmt_get_regions_by_feature() for each pmt_feature_id
> that indicates per-RMID telemetry.
>
> Save the returned pmt_feature_group pointers with guids that are known
> to resctrl for use at run time.
>
> Those pointers are returned to the INTEL_PMT_DISCOVERY driver at
INTEL_PMT_TELEMETRY?
> resctrl_arch_exit() time.
>
What follows is not appropriate for the merged changelog; it is instead material for
the "maintainer notes" that go below the "---".
> Note that checkpatch complains about the alignment of additional
> lines in the definition of the intel_pmt_put_feature_group
> cleanup helper. I didn't find a way to appease conflicting
> requirements from checkpatch.
This just mentions one checkpatch complaint. Could you please expand this to
mention what the conflicting checkpatch.pl requirements are?
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/internal.h | 8 ++
> arch/x86/kernel/cpu/resctrl/core.c | 5 +
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 133 ++++++++++++++++++++++++
> arch/x86/Kconfig | 13 +++
> arch/x86/kernel/cpu/resctrl/Makefile | 1 +
> 5 files changed, 160 insertions(+)
> create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 684a1b830ced..36a2072c19c7 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -169,4 +169,12 @@ void __init intel_rdt_mbm_apply_quirk(void);
>
> void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>
> +#ifdef CONFIG_X86_RESCTRL_CPU_INTEL_AET
> +bool intel_aet_get_events(void);
> +void __exit intel_aet_exit(void);
> +#else
> +static inline bool intel_aet_get_events(void) { return false; }
> +static inline void __exit intel_aet_exit(void) { }
> +#endif
> +
> #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 2b2f76c76d73..b8288f5d4aff 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -734,6 +734,9 @@ void resctrl_arch_pre_mount(void)
>
> if (!atomic_try_cmpxchg(&only_once, &old, 1))
> return;
> +
> + if (!intel_aet_get_events())
> + return;
> }
>
> enum {
> @@ -1086,6 +1089,8 @@ late_initcall(resctrl_arch_late_init);
>
> static void __exit resctrl_arch_exit(void)
> {
> + intel_aet_exit();
> +
> cpuhp_remove_state(rdt_online);
>
> resctrl_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> new file mode 100644
> index 000000000000..d177e5aa1f6a
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -0,0 +1,133 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Resource Director Technology(RDT)
> + * - Intel Application Energy Telemetry
> + *
> + * Copyright (C) 2025 Intel Corporation
> + *
> + * Author:
> + * Tony Luck <tony.luck@intel.com>
> + */
> +
> +#define pr_fmt(fmt) "resctrl: " fmt
> +
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/intel_vsec.h>
> +#include <linux/resctrl.h>
> +
> +#include "internal.h"
> +
> +/**
> + * struct event_group - All information about a group of telemetry events.
> + * @pfg: Points to the aggregated telemetry space information
> + * within the OOBMSM driver that contains data for all
> + * telemetry regions.
> + * @list: List of active event groups.
How about "Member of active_event_groups."? (although unclear how this
list is used at this point ... may be more appropriate for a later patch)
> + * @guid: Unique number per XML description file.
> + */
> +struct event_group {
> + /* Data fields for additional structures to manage this group. */
> + struct pmt_feature_group *pfg;
> + struct list_head list;
> +
> + /* Remaining fields initialized from XML file. */
> + u32 guid;
> +};
> +
> +static LIST_HEAD(active_event_groups);
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
> + */
> +static struct event_group energy_0x26696143 = {
> + .guid = 0x26696143,
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
> + */
> +static struct event_group perf_0x26557651 = {
> + .guid = 0x26557651,
> +};
> +
> +static struct event_group *known_energy_event_groups[] = {
> + &energy_0x26696143,
> +};
> +
> +#define NUM_KNOWN_ENERGY_GROUPS ARRAY_SIZE(known_energy_event_groups)
Why is this macro needed? I think the code will be easier to understand if this
"ARRAY_SIZE" is open coded instead of using this macro.
> +
> +static struct event_group *known_perf_event_groups[] = {
> + &perf_0x26557651,
> +};
> +
> +#define NUM_KNOWN_PERF_GROUPS ARRAY_SIZE(known_perf_event_groups)
Same wrt macro.
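For comparison, open-coding ARRAY_SIZE() at the call site keeps each array visibly paired with its length. A userspace sketch with hypothetical types; the macro here is a simplified ARRAY_SIZE() (the kernel's version also rejects non-array arguments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified ARRAY_SIZE(); no array-type check in this sketch. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct event_group. */
struct event_group_model {
	uint32_t guid;
	const char *name;
};

static struct event_group_model energy = { 0x26696143, "energy" };
static struct event_group_model perf = { 0x26557651, "perf" };

static struct event_group_model *known_energy_groups[] = { &energy };
static struct event_group_model *known_perf_groups[] = { &perf };

/* Return the known group with a matching guid, or NULL. */
static struct event_group_model *find_group(struct event_group_model **groups,
					    size_t n, uint32_t guid)
{
	for (size_t i = 0; i < n; i++)
		if (groups[i]->guid == guid)
			return groups[i];
	return NULL;
}
```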
> +
> +/* Stub for now */
> +static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> + return -EINVAL;
> +}
> +
> +DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
> + if (!IS_ERR_OR_NULL(_T))
> + intel_pmt_put_feature_group(_T))
> +
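For readers less familiar with linux/cleanup.h: DEFINE_FREE()/__free() build on the GCC/Clang `cleanup` attribute, and no_free_ptr() hands ownership out so the automatic cleanup becomes a no-op. A rough userspace approximation, with hypothetical names and a plain NULL check where the patch uses IS_ERR_OR_NULL():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource standing in for struct pmt_feature_group. */
struct feature_group_model {
	int id;
};

static int puts_done;	/* counts releases */
static struct feature_group_model *saved;

static void put_group(struct feature_group_model *p)
{
	puts_done++;
	free(p);
}

/* Roughly what DEFINE_FREE() generates: takes a pointer to the variable. */
static void cleanup_put_group(struct feature_group_model **pp)
{
	if (*pp)
		put_group(*pp);
}

#define auto_put_group __attribute__((cleanup(cleanup_put_group)))

/* Roughly no_free_ptr(): return the pointer and disarm the cleanup. */
static struct feature_group_model *take(struct feature_group_model **pp)
{
	struct feature_group_model *p = *pp;

	*pp = NULL;
	return p;
}

/* Keep the group only when keep is set; otherwise it is put on return. */
static void use_group(int keep)
{
	struct feature_group_model *p auto_put_group = malloc(sizeof(*p));

	if (!p)
		return;
	p->id = 1;
	if (keep)
		saved = take(&p);
}
```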
> +/*
> + * Make a request to the INTEL_PMT_DISCOVERY driver for the
> + * pmt_feature_group for a specific feature. If there is
> + * one the returned structure has an array of telemetry_region
> + * structures. Each describes one telemetry aggregator.
> + * Try to use any with a known matching guid.
"Try to use every telemetry aggregator with a known guid."?
> + */
> +static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **evgs,
> + unsigned int num_evg)
> +{
> + struct pmt_feature_group *p __free(intel_pmt_put_feature_group) = NULL;
> + struct event_group **peg;
> + bool ret;
"ret" is of type bool but is only used for return value of discover_events() that
returns an int? So when discover_events() return -EINVAL ...?
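To illustrate: any nonzero int converts to true, so `!ret` still happens to mean success, but the error code is lost and the code reads as if discover_events() returned a bool. A sketch that keeps the int, with hypothetical stand-ins for discover_events() and the guid loop:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for discover_events(): 0 on success, -errno on failure. */
static int discover_model(uint32_t guid, uint32_t supported)
{
	return guid == supported ? 0 : -EINVAL;
}

/* Index of the first usable group, or -1; ret keeps the int type. */
static int first_usable(const uint32_t *guids, int n, uint32_t supported)
{
	for (int i = 0; i < n; i++) {
		int ret = discover_model(guids[i], supported);

		if (!ret)
			return i;
	}
	return -1;
}
```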
> +
> + p = intel_pmt_get_regions_by_feature(feature);
> +
> + if (IS_ERR_OR_NULL(p))
> + return false;
> +
> + for (peg = evgs; peg < &evgs[num_evg]; peg++) {
> + ret = discover_events(*peg, p);
> + if (!ret) {
> + (*peg)->pfg = no_free_ptr(p);
> + return true;
> + }
> + }
> +
> + return false;
> +}
> +
> +/*
> + * Ask OOBMSM discovery driver for all the RMID based telemetry groups
> + * that it supports.
> + */
> +bool intel_aet_get_events(void)
> +{
> + bool ret1, ret2;
> +
> + ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM,
> + known_energy_event_groups, NUM_KNOWN_ENERGY_GROUPS);
Just call ARRAY_SIZE() directly here? I think it will make the code easier to understand.
> + ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM,
> + known_perf_event_groups, NUM_KNOWN_PERF_GROUPS);
Same here.
> +
> + return ret1 || ret2;
> +}
> +
> +void __exit intel_aet_exit(void)
> +{
> + struct event_group *evg, *tmp;
> +
> + list_for_each_entry_safe(evg, tmp, &active_event_groups, list) {
This cleanup does not match initialization done in this patch.
> + intel_pmt_put_feature_group(evg->pfg);
> + evg->pfg = NULL;
> + list_del(&evg->list);
> + }
> +}
Reinette
* Re: [PATCH v7 18/31] x86/resctrl: Count valid telemetry aggregators per package
2025-07-11 23:53 ` [PATCH v7 18/31] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
@ 2025-07-25 23:40 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:40 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> There may be multiple telemetry aggregators per package, each enumerated
> by a telemetry region structure in the feature group.
>
> Scan the array of telemetry region structures and count how many are
> in each package in preparation to allocate structures to save the MMIO
> addresses for each in a convenient format for use when reading event
> counters.
>
> Sanity check that the telemetry region structures have a valid
> package_id and that the size they report for the MMIO space is as
package_id -> package id
> large as expected from the XML description of the registers in
> the region.
The above two paragraphs seem to describe the actual flow backwards:
first it mentions that telemetry regions are counted (implying that
all telemetry regions are counted, valid as well as invalid) and then how
to actually determine whether a telemetry region is valid.
It will be less confusing to first describe how it is determined that a
telemetry region is valid, after which it will be easier to explain that only
*valid* telemetry regions are counted?
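A userspace sketch of that ordering: check validity first, then count only valid regions. The struct is a hypothetical subset of struct telemetry_region, and the expected size would come from the XML description of the group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subset of struct telemetry_region. */
struct region_model {
	int package_id;
	size_t size;		/* bytes of MMIO space the region reports */
};

/* A region is usable if its package id is in range and it is big enough. */
static int region_is_valid(const struct region_model *r, int num_pkgs,
			   size_t expected_size)
{
	return r->package_id >= 0 && r->package_id < num_pkgs &&
	       r->size >= expected_size;
}

/* Count valid regions per package; returns a calloc()ed array or NULL. */
static int *count_valid_regions(const struct region_model *regions, size_t n,
				int num_pkgs, size_t expected_size)
{
	int *counts = calloc(num_pkgs, sizeof(*counts));

	if (!counts)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (region_is_valid(&regions[i], num_pkgs, expected_size))
			counts[regions[i].package_id]++;
	return counts;
}
```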
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
Patch looks good.
Reinette
* Re: [PATCH v7 19/31] x86/resctrl: Complete telemetry event enumeration
2025-07-11 23:53 ` [PATCH v7 19/31] x86/resctrl: Complete telemetry event enumeration Tony Luck
@ 2025-07-25 23:41 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:41 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Counters for telemetry events are in MMIO space. Each telemetry_region
> structure returned in the pmt_feature_group returned from OOBMSM contains
> the base MMIO address for the counters.
>
> There may be multiple aggregators per package. Scan all the
> telemetry_region structures again and save the number of regions together
> with a flex array of the MMIO addresses for each aggregator indexed by
> package id.
I do not see why it is necessary to switch back and forth between the
interchangeable terms "regions" and "aggregators". Why not just stick with
telemetry regions? It is confusing when, for example, "number of regions"
is followed by "for each aggregator". Why not just say "number of regions"
followed by "for each region"?
>
> Completed structure for each event group looks like this:
>
> +---------------------+---------------------+
> pkginfo** -->|pkginfo[package ID 0]|pkginfo[package ID 1]|
> +---------------------+---------------------+
> | |
> v v
> +----------------+ +----------------+
> |struct mmio_info| |struct mmio_info|
mmio_info -> pkg_mmio_info
> +----------------+ +----------------+
> |num_regions = N | |num_regions = N |
The above "There may be multiple aggregators (telemetry regions?) per
package." could add that the number of telemetry regions may differ
between packages, supported by an example where one package has "N"
regions and the other "M".
> | addrs[0] | | addrs[0] |
> | addrs[1] | | addrs[1] |
> | ... | | ... |
> | addrs[N-1] | | addrs[N-1] |
> +----------------+ +----------------+
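A userspace sketch of the two-pass construction, with packages holding different numbers of regions (2 and 3). The types are hypothetical analogues of struct pkg_mmio_info and struct telemetry_region; addresses are plain integers rather than __iomem pointers, and the fill order mirrors the patch's `--pkgcounts[...]` indexing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace analogue of struct pkg_mmio_info. */
struct pkg_mmio_info_model {
	unsigned int num_regions;
	uintptr_t addrs[];	/* one base address per telemetry region */
};

struct valid_region {		/* hypothetical: a region already known valid */
	int package_id;
	uintptr_t addr;
};

static void free_pkginfo(struct pkg_mmio_info_model **pkginfo, int num_pkgs)
{
	if (!pkginfo)
		return;
	for (int i = 0; i < num_pkgs; i++)
		free(pkginfo[i]);
	free(pkginfo);
}

/* Pass 1 counts regions per package; pass 2 allocates and fills. */
static struct pkg_mmio_info_model **build_pkginfo(const struct valid_region *r,
						  int n, int num_pkgs)
{
	struct pkg_mmio_info_model **pkginfo = calloc(num_pkgs, sizeof(*pkginfo));
	unsigned int *fill = calloc(num_pkgs, sizeof(*fill));

	if (!pkginfo || !fill)
		goto fail;

	for (int i = 0; i < n; i++)
		fill[r[i].package_id]++;

	for (int p = 0; p < num_pkgs; p++) {
		pkginfo[p] = calloc(1, sizeof(*pkginfo[p]) + fill[p] * sizeof(uintptr_t));
		if (!pkginfo[p])
			goto fail;
		pkginfo[p]->num_regions = fill[p];
	}

	/* Fill from the end, consuming each package's count. */
	for (int i = 0; i < n; i++)
		pkginfo[r[i].package_id]->addrs[--fill[r[i].package_id]] = r[i].addr;

	free(fill);
	return pkginfo;
fail:
	free(fill);
	free_pkginfo(pkginfo, num_pkgs);
	return NULL;
}
```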
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 64 ++++++++++++++++++++++++-
> 1 file changed, 63 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 7cd6c06f9205..3f383f0a9d08 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -19,12 +19,26 @@
>
> #include "internal.h"
>
> +/**
> + * struct pkg_mmio_info - MMIO address information for one event group of a package.
> + * @num_regions: Number of telemetry regions on this package.
> + * @addrs: Array of MMIO addresses, one per telemetry region on this package.
> + *
> + * Provides convenient access to all MMIO addresses of one event group
> + * for one package. Used when reading event data on a package.
> + */
> +struct pkg_mmio_info {
> + unsigned int num_regions;
> + void __iomem *addrs[] __counted_by(num_regions);
> +};
> +
> /**
> * struct event_group - All information about a group of telemetry events.
> * @pfg: Points to the aggregated telemetry space information
> * within the OOBMSM driver that contains data for all
> * telemetry regions.
> * @list: List of active event groups.
> + * @pkginfo: Per-package MMIO addresses of telemetry regions belonging to this group.
> * @guid: Unique number per XML description file.
> * @mmio_size: Number of bytes of MMIO registers for this group.
> */
> @@ -32,6 +46,7 @@ struct event_group {
> /* Data fields for additional structures to manage this group. */
> struct pmt_feature_group *pfg;
> struct list_head list;
> + struct pkg_mmio_info **pkginfo;
>
> /* Remaining fields initialized from XML file. */
> u32 guid;
> @@ -90,15 +105,32 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
> return false;
> }
>
> +static void free_pkg_mmio_info(struct pkg_mmio_info **mmi)
> +{
> + int num_pkgs = topology_max_packages();
> +
> + if (!mmi)
> + return;
> +
> + for (int i = 0; i < num_pkgs; i++)
> + kfree(mmi[i]);
> + kfree(mmi);
> +}
> +
> +DEFINE_FREE(pkg_mmio_info, struct pkg_mmio_info **, free_pkg_mmio_info(_T))
> +
> /*
> * Discover events from one pmt_feature_group.
> * 1) Count how many usable telemetry regions per package.
> - * 2...) To be continued.
> + * 2) Allocate per-package structures and populate with MMIO
> + * addresses of the telemetry regions used by each aggregator.
"the telemetry regions used by each aggregator" does not sound right. "telemetry region == aggregator", no?
> */
> static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> {
> + struct pkg_mmio_info **pkginfo __free(pkg_mmio_info) = NULL;
> int *pkgcounts __free(kfree) = NULL;
> struct telemetry_region *tr;
> + struct pkg_mmio_info *mmi;
> int num_pkgs;
>
> num_pkgs = topology_max_packages();
> @@ -108,6 +140,7 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> tr = &p->regions[i];
> if (skip_this_region(tr, e))
> continue;
> +
> if (!pkgcounts) {
> pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
> if (!pkgcounts)
Squash with previous patch.
> @@ -119,6 +152,32 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> if (!pkgcounts)
> return -ENODEV;
>
> + /* Allocate array for per-package struct pkg_mmio_info data */
> + pkginfo = kcalloc(num_pkgs, sizeof(*pkginfo), GFP_KERNEL);
> + if (!pkginfo)
> + return -ENOMEM;
> +
> + /*
> + * Allocate per-package pkg_mmio_info structures and initialize
> + * count of telemetry_regions in each one.
> + */
> + for (int i = 0; i < num_pkgs; i++) {
> + pkginfo[i] = kzalloc(struct_size(pkginfo[i], addrs, pkgcounts[i]), GFP_KERNEL);
> + if (!pkginfo[i])
> + return -ENOMEM;
> + pkginfo[i]->num_regions = pkgcounts[i];
> + }
> +
> + /* Save MMIO address(es) for each telemetry region in per-package structures */
> + for (int i = 0; i < p->count; i++) {
> + tr = &p->regions[i];
> + if (skip_this_region(tr, e))
> + continue;
> + mmi = pkginfo[tr->plat_info.package_id];
> + mmi->addrs[--pkgcounts[tr->plat_info.package_id]] = tr->addr;
> + }
> + e->pkginfo = no_free_ptr(pkginfo);
> +
> return 0;
> }
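For readers following the two-step allocation in discover_events(), here is a minimal user-space sketch of the same pattern. It assumes calloc()/free() in place of kcalloc()/kfree() and simplified types; `struct region`, `struct mmio_info`, `build_pkg_info()` and `free_pkg_info()` are hypothetical stand-ins, not the resctrl code. It counts usable regions per package, allocates one flexible-array struct per package, then fills slots by decrementing the per-package count, as the quoted `--pkgcounts[...]` line does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct telemetry_region. */
struct region {
	int package_id;
	uintptr_t addr;   /* stands in for the MMIO base address */
	bool skip;        /* stands in for skip_this_region() */
};

/* Hypothetical stand-in for struct pkg_mmio_info. */
struct mmio_info {
	unsigned int num_regions;
	uintptr_t addrs[];  /* flexible array with num_regions entries */
};

/* Free the per-package array; safe on NULL and on partially filled arrays. */
static void free_pkg_info(struct mmio_info **mmi, int num_pkgs)
{
	if (!mmi)
		return;
	for (int i = 0; i < num_pkgs; i++)
		free(mmi[i]);   /* free(NULL) is a no-op */
	free(mmi);
}

/* Returns NULL if there is no usable region or on allocation failure. */
static struct mmio_info **build_pkg_info(const struct region *r, int count, int num_pkgs)
{
	struct mmio_info **info = NULL;
	bool found = false;
	int *pkgcounts;

	pkgcounts = calloc(num_pkgs, sizeof(*pkgcounts));
	if (!pkgcounts)
		return NULL;

	/* Step 1: count usable regions per package. */
	for (int i = 0; i < count; i++) {
		if (r[i].skip)
			continue;
		pkgcounts[r[i].package_id]++;
		found = true;
	}
	if (!found)
		goto out;

	/* Step 2: allocate per-package structs sized by those counts. */
	info = calloc(num_pkgs, sizeof(*info));
	if (!info)
		goto out;
	for (int p = 0; p < num_pkgs; p++) {
		info[p] = calloc(1, sizeof(*info[p]) + pkgcounts[p] * sizeof(uintptr_t));
		if (!info[p]) {
			free_pkg_info(info, num_pkgs);
			info = NULL;
			goto out;
		}
		info[p]->num_regions = pkgcounts[p];
	}

	/* Fill each package's slots from the top down, consuming the counts. */
	for (int i = 0; i < count; i++) {
		if (r[i].skip)
			continue;
		int p = r[i].package_id;
		info[p]->addrs[--pkgcounts[p]] = r[i].addr;
	}
out:
	free(pkgcounts);
	return info;
}
```

Because slots are filled top down, the last usable region of a package lands in addrs[0]. That ordering is harmless if readers simply sum across all addrs[].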
>
> @@ -151,6 +210,7 @@ static bool get_pmt_feature(enum pmt_feature_id feature, struct event_group **ev
> (*peg)->pfg = no_free_ptr(p);
> return true;
> }
> + free_pkg_mmio_info((*peg)->pkginfo);
Is this necessary? pkginfo will only be set on success, no?
> }
>
> return false;
> @@ -179,6 +239,8 @@ void __exit intel_aet_exit(void)
> list_for_each_entry_safe(evg, tmp, &active_event_groups, list) {
> intel_pmt_put_feature_group(evg->pfg);
> evg->pfg = NULL;
> + free_pkg_mmio_info(evg->pkginfo);
> + evg->pkginfo = NULL;
> list_del(&evg->list);
> }
> }
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 20/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651
2025-07-11 23:53 ` [PATCH v7 20/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
@ 2025-07-25 23:43 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:43 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> These two guids describe the events supported on Clearwater Forest.
>
> The offsets in MMIO space are arranged in groups for each RMID.
>
> E.g the "energy counters for guid 0x26696143 are arranged like this:
Missing end-quote.
>
> MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY
> MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY
> MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY
> MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY
> ...
> MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY
> MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY
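The layout above implies a simple offset formula, assuming every counter is an 8-byte register and each per-RMID block holds the group's events in idx order: offset = (rmid * num_events + idx) * 8. A small sketch (the helper name `pmt_counter_offset()` is hypothetical) that reproduces the table entries:

```c
#include <stddef.h>

/* Each counter is assumed to be a 64-bit (8-byte) register. */
#define PMT_COUNTER_BYTES 8

/*
 * Byte offset of the counter for event @idx within the block of
 * @num_events counters belonging to @rmid (hypothetical helper).
 */
static size_t pmt_counter_offset(unsigned int rmid, unsigned int num_events,
				 unsigned int idx)
{
	return ((size_t)rmid * num_events + idx) * PMT_COUNTER_BYTES;
}
```

For the energy group (two events), RMID 575 / PMT_EVENT_ACTIVITY gives (575 * 2 + 1) * 8 = 9208 = 0x23F8, the last line of the table.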
>
> Define these events in the file system code and add the events
> to the event_group structures.
>
> PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point
> format. File system code must output as floating point values.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl_types.h | 11 ++++++++
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 35 +++++++++++++++++++++++++
> fs/resctrl/monitor.c | 35 ++++++++++++++-----------
> 3 files changed, 66 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> index d98351663c2c..6838b02d5ca3 100644
> --- a/include/linux/resctrl_types.h
> +++ b/include/linux/resctrl_types.h
> @@ -47,6 +47,17 @@ enum resctrl_event_id {
> QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
> QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
>
> + /* Intel Telemetry Events */
> + PMT_EVENT_ENERGY,
> + PMT_EVENT_ACTIVITY,
> + PMT_EVENT_STALLS_LLC_HIT,
> + PMT_EVENT_C1_RES,
> + PMT_EVENT_UNHALTED_CORE_CYCLES,
> + PMT_EVENT_STALLS_LLC_MISS,
> + PMT_EVENT_AUTO_C6_RES,
> + PMT_EVENT_UNHALTED_REF_CYCLES,
> + PMT_EVENT_UOPS_RETIRED,
> +
> /* Must be the last */
> QOS_NUM_EVENTS,
> };
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 3f383f0a9d08..f4bf0f2ccf26 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -32,6 +32,20 @@ struct pkg_mmio_info {
> void __iomem *addrs[] __counted_by(num_regions);
> };
>
> +/**
> + * struct pmt_event - Telemetry event.
> + * @id: Resctrl event id.
> + * @idx: Counter index within each per-RMID block of counters.
> + * @bin_bits: Zero for integer valued events, else number bits in fixed-point.
Is it obvious which part of the fixed-point value this refers to? Compare with, for example,
"else number of bits in fraction part of fixed-point"?
> + */
> +struct pmt_event {
> + enum resctrl_event_id id;
> + unsigned int idx;
> + unsigned int bin_bits;
> +};
> +
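To illustrate why the fraction-bits wording matters, here is a sketch of printing such a value using integer arithmetic only. It assumes @bin_bits is the number of fraction bits (the reading the suggested kerneldoc makes explicit) and that bin_bits <= 32 so the scaling cannot overflow; `format_fixed()` is a hypothetical helper, not the resctrl output code, and it truncates rather than rounds.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format @raw, a fixed-point value with @bin_bits fraction bits, as a
 * decimal string with six fractional digits (truncated). Integer-valued
 * events (bin_bits == 0) are printed without a fraction.
 */
static int format_fixed(char *buf, size_t len, uint64_t raw, unsigned int bin_bits)
{
	uint64_t whole = raw >> bin_bits;
	uint64_t frac = raw & ((1ULL << bin_bits) - 1);

	if (!bin_bits)
		return snprintf(buf, len, "%llu", (unsigned long long)whole);

	/* Scale the fraction to millionths, then drop the binary fraction bits. */
	frac = (frac * 1000000ULL) >> bin_bits;
	return snprintf(buf, len, "%llu.%06llu",
			(unsigned long long)whole, (unsigned long long)frac);
}
```

For example, raw 0x18 with bin_bits 4 is 1 + 8/16 and prints as "1.500000".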
...
> @@ -178,6 +211,8 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> }
> e->pkginfo = no_free_ptr(pkginfo);
>
> + list_add(&e->list, &active_event_groups);
> +
Stray change?
I have not seen any changelog mention active_event_groups and how it is used.
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 21/31] x86,fs/resctrl: Add architectural event pointer
2025-07-11 23:53 ` [PATCH v7 21/31] x86,fs/resctrl: Add architectural event pointer Tony Luck
@ 2025-07-25 23:43 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:43 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> The resctrl file system layer passed the domain, rmid, and event id to
passed -> passes
rmid -> RMID
> resctrl_arch_rmid_read() to fetch an event counter.
>
> For some resources this may not be enough information to efficiently
"For some resources this may" this is vague and speculative. This can be
made definitive because there is a clear problem solved by this, for example,
"Fetching a telemetry event counter requires additional information that is
private to the architecture, for example, the offset into MMIO space from where
the counter should be read."
> access the counter.
>
> Add mon_evt::arch_priv void pointer. Architecture code can initialize
> this when marking each event enabled.
>
> File system code passes this pointer to resctrl_arch_rmid_read().
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
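A minimal user-space model of the arch_priv plumbing described above, assuming simplified stand-ins for struct mon_evt and struct pmt_event and an in-memory array in place of MMIO. The function names and the extra mmio/num_events parameters are hypothetical. The point it shows is that the file system layer treats the pointer as opaque, while the architecture read hook recovers its private descriptor (here, the counter index) without a lookup keyed on the event id.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the structures in the patch. */
struct pmt_event {
	int id;
	unsigned int idx;
	unsigned int bin_bits;
};

struct mon_evt {
	int evtid;
	void *arch_priv;   /* opaque to the file system layer */
};

/* Architecture side: enable an event and stash its private descriptor. */
static void enable_mon_event(struct mon_evt *evt, int id, void *arch_priv)
{
	evt->evtid = id;
	evt->arch_priv = arch_priv;
}

/* Architecture read hook: uses arch_priv to locate the counter. */
static uint64_t arch_rmid_read(const uint64_t *mmio, unsigned int num_events,
			       unsigned int rmid, int evtid, void *arch_priv)
{
	const struct pmt_event *pevt = arch_priv;

	(void)evtid;   /* the id alone does not say where the counter lives */
	return mmio[rmid * num_events + pevt->idx];
}

/* File system side: passes the pointer through unchanged. */
static uint64_t fs_read_event(const struct mon_evt *evt, const uint64_t *mmio,
			      unsigned int num_events, unsigned int rmid)
{
	return arch_rmid_read(mmio, num_events, rmid, evt->evtid, evt->arch_priv);
}
```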
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 22/31] x86/resctrl: Read telemetry events
2025-07-11 23:53 ` [PATCH v7 22/31] x86/resctrl: Read telemetry events Tony Luck
@ 2025-07-25 23:45 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:45 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> The resctrl file system passes requests to read event monitor files to
> the architecture resctrl_arch_rmid_read() to collect values
> from hardware counters.
>
> Use the resctrl resource to differentiate between calls to read legacy
> L3 events from the new telemetry events (which are attached to
> RDT_RESOURCE_PERF_PKG).
>
> There may be multiple aggregators tracking each package, so scan all of
> them and add up all counters.
>
> Enable the events marked as readable from any CPU providing an
> mon_evt::arch_priv pointer to the struct pmt_event for each
> event.
>
> At run time when a user reads an event file the file system code
> provides the enum resctrl_event_id for the event and the arch_priv
> pointer that was supplied when the event was enabled.
The changelog ordering seems random. It starts by describing how reading of events is
handled and how counters are added when an event is read, then describes enabling the
events (this should happen before an event can be read?), then how data is passed when
reading an event (that should be followed by adding up the counters?).
I think it may help to clearly describe the phases involved. For example, start
with how events are enabled during enumeration/discovery, then how data is
passed during runtime when a user reads an event file, then how the
data is collected.
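As a sketch of the final "data is collected" phase: a package may be tracked by several aggregators, each exposing its own copy of the per-RMID counter block, and the event value is the sum across all of them. The types and `read_pkg_event()` below are hypothetical user-space stand-ins; plain array indexing replaces readq() on an __iomem address.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pkg_mmio_info. */
struct pkg_regions {
	unsigned int num_regions;
	const uint64_t *const *addrs;   /* one counter block per aggregator */
};

/* Sum the counter for (@rmid, @idx) across every aggregator in a package. */
static uint64_t read_pkg_event(const struct pkg_regions *pkg, unsigned int rmid,
			       unsigned int num_events, unsigned int idx)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < pkg->num_regions; i++)
		/* In the kernel this would be a readq() of MMIO space. */
		total += pkg->addrs[i][rmid * num_events + idx];
	return total;
}
```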
>
> Resctrl now uses readq() so depends on X86_64. Update Kconfig.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
...
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index f4bf0f2ccf26..bd6011a95d12 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -14,6 +14,7 @@
> #include <linux/cleanup.h>
> #include <linux/cpu.h>
> #include <linux/intel_vsec.h>
> +#include <linux/io.h>
> #include <linux/resctrl.h>
> #include <linux/slab.h>
>
> @@ -213,6 +214,13 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
>
> list_add(&e->list, &active_event_groups);
>
Should this addition be documented as "step 3"?
> + for (int i = 0; i < e->num_events; i++) {
> + enum resctrl_event_id eventid;
> +
> + eventid = e->evts[i].id;
> + resctrl_enable_mon_event(eventid, true, e->evts[i].bin_bits, &e->evts[i]);
Why is eventid needed? I think using e->evts[i].id makes it more obvious how
the parameters relate.
> + }
> +
> return 0;
> }
>
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 23/31] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
2025-07-11 23:53 ` [PATCH v7 23/31] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-07-25 23:46 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:46 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> The L3 resource has several requirements for domains. There are structures
> that hold the 64-bit values of counters, and elements to keep track of
> the overflow and limbo threads.
>
> None of these are needed for the PERF_PKG resource. The hardware counters
> are wide enough that they do not wrap around for decades.
>
> Define a new rdt_perf_pkg_mon_domain structure which just consists of
> the standard rdt_domain_hdr to keep track of domain id and CPU mask.
This patch does more than this.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/resctrl/core.c | 41 ++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 63baab53821a..c954171073c7 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -547,6 +547,38 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
> }
> }
>
> +/**
> + * struct rdt_perf_pkg_mon_domain - CPUs sharing an Intel-PMT-scoped resctrl monitor resource
What does "Intel-PMT-scoped" mean?
> + * @hdr: common header for different domain types
> + */
> +struct rdt_perf_pkg_mon_domain {
> + struct rdt_domain_hdr hdr;
> +};
> +
> +static void setup_intel_aet_mon_domain(int cpu, int id, struct rdt_resource *r,
> + struct list_head *add_pos)
This belongs in arch/x86/kernel/cpu/resctrl/intel_aet.c?
> +{
> + struct rdt_perf_pkg_mon_domain *d;
> + int err;
> +
> + d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
> + if (!d)
> + return;
> +
> + d->hdr.id = id;
> + d->hdr.type = RESCTRL_MON_DOMAIN;
> + d->hdr.rid = r->rid;
> + cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
> + list_add_tail_rcu(&d->hdr.list, add_pos);
> +
> + err = resctrl_online_mon_domain(r, &d->hdr);
> + if (err) {
> + list_del_rcu(&d->hdr.list);
> + synchronize_rcu();
> + kfree(d);
> + }
> +}
> +
> static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> {
> int id = get_domain_id_from_scope(cpu, r->mon_scope);
> @@ -574,6 +606,9 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
> case RDT_RESOURCE_L3:
> l3_mon_domain_setup(cpu, id, r, add_pos);
> break;
> + case RDT_RESOURCE_PERF_PKG:
> + setup_intel_aet_mon_domain(cpu, id, r, add_pos);
> + break;
> default:
> WARN_ON_ONCE(1);
> }
> @@ -670,6 +705,12 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
> synchronize_rcu();
> l3_mon_domain_free(hw_dom);
> break;
> + case RDT_RESOURCE_PERF_PKG:
> + resctrl_offline_mon_domain(r, hdr);
> + list_del_rcu(&hdr->list);
> + synchronize_rcu();
> + kfree(container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr));
> + break;
> default:
> pr_warn_once("Unknown resource rid=%d\n", r->rid);
> break;
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 24/31] x86/resctrl: Add energy/perf choices to rdt boot option
2025-07-11 23:53 ` [PATCH v7 24/31] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
@ 2025-07-25 23:46 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:46 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
>
> @@ -865,6 +869,32 @@ bool rdt_cpu_has(int flag)
> return ret;
> }
>
> +/*
> + * Hardware features that do not have X86_FEATURE_* bits.
> + * There is no "hardware does not support this at all" case.
> + * Assume that the caller has already determined that
> + * support is present and just needs to check if the option has been
nit: "that support is present" -> "that hardware support is present"
option -> feature?
> + * disabled by a quirk that has not been overridden by a command
> + * line option.
> + */
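The quirk-versus-command-line semantics described in this comment can be modeled as below. This is a sketch under assumptions: it assumes force_on is checked after force_off so an explicit "rdt=" request wins, mirroring the ordering used by the existing rdt_cpu_has(); `struct opt` and the helper names are hypothetical simplifications of struct rdt_options, and `set_feature_disabled()` models the rdt_set_feature_disabled() added in patch 25.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rdt_options. */
struct opt {
	const char *name;
	bool force_off, force_on;
};

/* Writable at run time so a quirk can set force_off. */
static struct opt opts[] = {
	{ .name = "energy" },
	{ .name = "perf" },
};

#define NUM_OPTS (sizeof(opts) / sizeof(opts[0]))

static struct opt *find_opt(const char *name)
{
	for (size_t i = 0; i < NUM_OPTS; i++)
		if (!strcmp(name, opts[i].name))
			return &opts[i];
	return NULL;
}

/* Quirk path: disable a feature whose parameters look insufficient. */
static void set_feature_disabled(const char *name)
{
	struct opt *o = find_opt(name);

	if (o)
		o->force_off = true;
}

/* Command line token: "energy" forces on, "!energy" forces off. */
static void parse_rdt_token(const char *tok)
{
	bool off = tok[0] == '!';
	struct opt *o = find_opt(off ? tok + 1 : tok);

	if (!o)
		return;
	if (off)
		o->force_off = true;
	else
		o->force_on = true;
}

/* Caller has already established hardware support; force_on wins. */
static bool is_feature_enabled(const char *name)
{
	struct opt *o = find_opt(name);
	bool ret = true;

	if (!o)
		return ret;
	if (o->force_off)
		ret = false;
	if (o->force_on)
		ret = true;
	return ret;
}
```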
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 25/31] x86/resctrl: Handle number of RMIDs supported by telemetry resources
2025-07-11 23:53 ` [PATCH v7 25/31] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
@ 2025-07-25 23:49 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:49 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> There are now three meanings for "number of RMIDs":
>
> 1) The number for legacy features enumerated by CPUID leaf 0xF. This
> is the maximum number of distinct values that can be loaded into the
> IA32_PQR_ASSOC MSR. Note that systems with Sub-NUMA Cluster mode enabled
> will force scaling down the CPUID enumerated value by the number of SNC
> nodes per L3-cache.
>
> 2) The number of registers in MMIO space for each event. This
> is enumerated in the XML files and is the value initialized into
> event_group::num_rmids. This will be overwritten with a lower
> value if hardware does not support all these registers at the
> same time (see next case).
>
> 3) The number of "h/w counters" (this isn't a strictly accurate
"h/w" -> hardware (throughout please)
> description of how things work, but serves as a useful analogy that
> does describe the limitations) feeding to those MMIO registers. This
> is enumerated in telemetry_region::num_rmids returned from the call to
> intel_pmt_get_regions_by_feature()
>
> Event groups with insufficient "h/w counter" to track all RMIDs are
> difficult for users to use, since the system may reassign "h/w counters"
> at any time. This means that users cannot reliably collect two consecutive
> event counts to compute the rate at which events are occurring.
Based on the definitions in (1), (2), (3) I interpret the above paragraph to mean that
event groups with insufficient hardware counters, that is, when
"telemetry_region::num_rmids < event_group::num_rmids", are hard to use.
> Add a variable rdt_num_system_rmids which holds the number of RMIDs
> supported by the system (including adjustments if Sub-NUMA Cluster
> mode is enabled).
I asked in v6 why rdt_num_system_rmids is necessary but that was not answered
so here we are again. It still is not clear how this fits in.
>
> Use rdt_set_feature_disabled() to mark such under-resourced event groups
> as unusable. Note that the rdt_options[] structure must now be writable
So an "under resourced event group" is one that does not have sufficient
hardware counters, aka telemetry_region::num_rmids, right?
> at run-time. The request to disable will be overridden if the user
> explicitly requests to enable using the "rdt=" Linux boot argument.
>
> Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG
> resource "num_rmids" value to the smallest of these values as this
> value will be used later to compare against the number of RMIDs
> supported by other resources.
... and that "later" is the spot where the max RMID that can be loaded into
IA32_PQR_ASSOC will be taken into account.
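A sketch of how meanings (2) and (3) combine into the RDT_RESOURCE_PERF_PKG count, following the quoted code: each group's XML value is lowered to what every region can track, then the resource takes the minimum across enabled groups, with 0 meaning "not yet set". The types and helper names are hypothetical simplifications, and the region counts in the usage note are illustrative only. The IA32_PQR_ASSOC limit (meaning 1) is applied later and is not modeled here.

```c
#include <stdint.h>

/* Hypothetical simplified model; not the resctrl data structures. */
struct group {
	uint32_t num_rmids;             /* from the XML description, meaning (2) */
	unsigned int num_regions;
	const uint32_t *region_rmids;   /* per-region counters, meaning (3) */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Lower a group's RMID count to what every region can track. */
static void adjust_group(struct group *g)
{
	for (unsigned int i = 0; i < g->num_regions; i++)
		g->num_rmids = min_u32(g->num_rmids, g->region_rmids[i]);
}

/* Fold enabled groups into the resource count; 0 means "not yet set". */
static uint32_t resource_num_rmid(struct group *groups, unsigned int n)
{
	uint32_t num_rmid = 0;

	for (unsigned int i = 0; i < n; i++) {
		adjust_group(&groups[i]);
		num_rmid = num_rmid ? min_u32(num_rmid, groups[i].num_rmids)
				    : groups[i].num_rmids;
	}
	return num_rmid;
}
```

With an energy group at 576 RMIDs backed by regions reporting {576, 576} and a perf group at 576 backed by {512, 576}, the resource count becomes 512.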
>
> N.B. Changed type of rdt_resource::num_rmid to u32 to match, and
> print as unsigned value in rdt_num_rmids_show().
Only if rdt_num_system_rmids is really needed, but even then
resctrl_arch_system_num_rmid_idx() already exists and returns a u32.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl.h | 2 +-
> arch/x86/kernel/cpu/resctrl/internal.h | 4 +++
> arch/x86/kernel/cpu/resctrl/core.c | 18 ++++++++++++-
> arch/x86/kernel/cpu/resctrl/intel_aet.c | 36 +++++++++++++++++++++++++
> arch/x86/kernel/cpu/resctrl/monitor.c | 2 ++
> fs/resctrl/rdtgroup.c | 2 +-
> 6 files changed, 61 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index da76e9c37b69..74cd2979549b 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -288,7 +288,7 @@ struct rdt_resource {
> int rid;
> bool alloc_capable;
> bool mon_capable;
> - int num_rmid;
> + u32 num_rmid;
> enum resctrl_scope ctrl_scope;
> enum resctrl_scope mon_scope;
> struct resctrl_cache cache;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 83166dd0b9c8..a6c41068dc2f 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -18,6 +18,8 @@
>
> #define RMID_VAL_UNAVAIL BIT_ULL(62)
>
> +extern u32 rdt_num_system_rmids;
> +
> /*
> * With the above fields in use 62 bits remain in MSR_IA32_QM_CTR for
> * data to be returned. The counter width is discovered from the hardware
> @@ -171,6 +173,8 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>
> bool rdt_is_feature_enabled(char *option);
>
> +void rdt_set_feature_disabled(char *name);
> +
> #ifdef CONFIG_X86_RESCTRL_CPU_INTEL_AET
> bool intel_aet_get_events(void);
> void __exit intel_aet_exit(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 83e046313600..31fb598482bf 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -807,7 +807,7 @@ struct rdt_options {
> bool force_off, force_on;
> };
>
> -static struct rdt_options rdt_options[] __ro_after_init = {
> +static struct rdt_options rdt_options[] = {
> RDT_OPT(RDT_FLAG_CMT, "cmt", X86_FEATURE_CQM_OCCUP_LLC),
> RDT_OPT(RDT_FLAG_MBM_TOTAL, "mbmtotal", X86_FEATURE_CQM_MBM_TOTAL),
> RDT_OPT(RDT_FLAG_MBM_LOCAL, "mbmlocal", X86_FEATURE_CQM_MBM_LOCAL),
> @@ -869,6 +869,22 @@ bool rdt_cpu_has(int flag)
> return ret;
> }
>
> +/*
> + * Can be called during feature enumeration if sanity check of
> + * a features parameters indicates problems with the feature.
"features" -> "feature's"
> + */
> +void rdt_set_feature_disabled(char *name)
> +{
> + struct rdt_options *o;
> +
> + for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
> + if (!strcmp(name, o->name)) {
> + o->force_off = true;
> + return;
> + }
> + }
> +}
> +
> /*
> * Hardware features that do not have X86_FEATURE_* bits.
> * There is no "hardware does not support this at all" case.
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index e64a4630e95c..6958efbf7e81 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -15,6 +15,7 @@
> #include <linux/cpu.h>
> #include <linux/intel_vsec.h>
> #include <linux/io.h>
> +#include <linux/minmax.h>
> #include <linux/resctrl.h>
> #include <linux/slab.h>
>
> @@ -56,6 +57,9 @@ struct pmt_event {
> * @list: List of active event groups.
> * @pkginfo: Per-package MMIO addresses of telemetry regions belonging to this group.
> * @guid: Unique number per XML description file.
> + * @num_rmids: Number of RMIDS supported by this group. Adjusted downwards
> + * if enumeration from intel_pmt_get_regions_by_feature() indicates
> + * fewer RMIDs can be tracked simultaneously.
ok ... still matches changelog by confirming "downward adjustment" is based on telemetry
region data ...
> * @mmio_size: Number of bytes of MMIO registers for this group.
> * @num_events: Number of events in this group.
> * @evts: Array of event descriptors.
> @@ -69,6 +73,7 @@ struct event_group {
>
> /* Remaining fields initialized from XML file. */
> u32 guid;
> + u32 num_rmids;
> size_t mmio_size;
> unsigned int num_events;
> struct pmt_event evts[] __counted_by(num_events);
> @@ -86,6 +91,7 @@ static LIST_HEAD(active_event_groups);
> static struct event_group energy_0x26696143 = {
> .name = "energy",
> .guid = 0x26696143,
> + .num_rmids = 576,
> .mmio_size = XML_MMIO_SIZE(576, 2, 3),
> .num_events = 2,
> .evts = {
> @@ -101,6 +107,7 @@ static struct event_group energy_0x26696143 = {
> static struct event_group perf_0x26557651 = {
> .name = "perf",
> .guid = 0x26557651,
> + .num_rmids = 576,
> .mmio_size = XML_MMIO_SIZE(576, 7, 3),
> .num_events = 7,
> .evts = {
> @@ -143,6 +150,22 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
> return false;
> }
>
> +static bool check_rmid_count(struct event_group *e, struct pmt_feature_group *p)
> +{
> + struct telemetry_region *tr;
> +
> + for (int i = 0; i < p->count; i++) {
> + tr = &p->regions[i];
> + if (skip_this_region(tr, e))
> + continue;
> +
> + if (tr->num_rmids < rdt_num_system_rmids)
This is the unexpected check that seems to contradict everything described thus far.
Should this not be "tr->num_rmids < e->num_rmids"?
> + return false;
> + }
> +
> + return true;
> +}
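For comparison, a sketch of the check with the comparison suggested above (region count against the group's XML-enumerated count rather than rdt_num_system_rmids). The types are hypothetical simplifications. Note that it only makes sense if run before the group's count is lowered with min(), as the quoted discover_events() does, otherwise it could never fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for struct telemetry_region. */
struct region {
	uint32_t num_rmids;
	bool skip;   /* stands in for skip_this_region() */
};

/* True if every usable region can track all @group_rmids RMIDs. */
static bool check_rmid_count(const struct region *r, unsigned int count,
			     uint32_t group_rmids)
{
	for (unsigned int i = 0; i < count; i++) {
		if (r[i].skip)
			continue;
		if (r[i].num_rmids < group_rmids)
			return false;
	}
	return true;
}
```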
> +
> static void free_pkg_mmio_info(struct pkg_mmio_info **mmi)
> {
> int num_pkgs = topology_max_packages();
> @@ -165,12 +188,18 @@ DEFINE_FREE(pkg_mmio_info, struct pkg_mmio_info **, free_pkg_mmio_info(_T))
> */
> static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> {
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> struct pkg_mmio_info **pkginfo __free(pkg_mmio_info) = NULL;
> int *pkgcounts __free(kfree) = NULL;
> struct telemetry_region *tr;
> struct pkg_mmio_info *mmi;
> int num_pkgs;
>
> + /* Potentially disable feature if insufficient RMIDs */
> + if (!check_rmid_count(e, p))
> + rdt_set_feature_disabled(e->name);
> +
> + /* User can override above disable from kernel command line */
> if (!rdt_is_feature_enabled(e->name))
> return -EINVAL;
>
> @@ -182,6 +211,8 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> if (skip_this_region(tr, e))
> continue;
>
> + e->num_rmids = min(e->num_rmids, tr->num_rmids);
This seems to confirm that the check_rmid_count() check should be against e->num_rmids.
> +
> if (!pkgcounts) {
> pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
> if (!pkgcounts)
> @@ -228,6 +259,11 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> resctrl_enable_mon_event(eventid, true, e->evts[i].bin_bits, &e->evts[i]);
> }
>
Would this be "step 4" of discover_events()? If you started a "let's keep track of the
steps" then the expectation is that all steps will be tracked.
> + if (r->num_rmid)
> + r->num_rmid = min(r->num_rmid, e->num_rmids);
> + else
> + r->num_rmid = e->num_rmids;
> +
> return 0;
> }
>
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 26/31] fs/resctrl: Fix life-cycle of closid_num_dirty_rmid
2025-07-11 23:53 ` [PATCH v7 26/31] fs/resctrl: Fix life-cycle of closid_num_dirty_rmid Tony Luck
@ 2025-07-25 23:51 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:51 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
"Fix" is a loaded word to use in a patch subject and its use cannot be
justified for a non-functional change such as this.
Do not make false claims of code being broken as patch motivation.
On 7/11/25 4:53 PM, Tony Luck wrote:
> closid_num_dirty_rmid is specific to the L3 resource, but it
> is allocated/freed in the more generic dom_data_{init,exit}().
It is quite bold to make this argument when "the more generic
dom_data_{init,exit}()" is only called with the L3 resource as argument.
This is a very straightforward change, but the description so far
totally obfuscates this.
This patch does two things:
a) Rename resctrl_mon_resource_init()/resctrl_mon_resource_exit() to
resctrl_mon_l3_resource_init()/resctrl_mon_l3_resource_exit()
respectively. As mentioned earlier, this can be done as part of
the earlier patch that does the renaming.
b) Separate closid_num_dirty_rmid and rmid_ptrs[] allocation done in
dom_data_init() in preparation for rmid_ptrs[] to be allocated on
resctrl mount in support of the new telemetry events.
>
> Add helpers to allocate/free closid_num_dirty_rmid.
>
> Rename resctrl_mon_resource_init() to resctrl_mon_l3_resource_init()
> and call the closid_num_dirty_rmid_init() here, instead of
> allocating in dom_data_init().
>
> Making matching changes to the exit path by renaming
> resctrl_mon_resource_exit() to resctrl_mon_l3_resource_exit()
> and free closid_num_dirty_rmid here instead of in dom_data_exit().
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> fs/resctrl/internal.h | 6 ++--
> fs/resctrl/monitor.c | 69 ++++++++++++++++++++++++-------------------
> fs/resctrl/rdtgroup.c | 12 ++++----
> 3 files changed, 48 insertions(+), 39 deletions(-)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 56fdccb39375..28d505efdb7c 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -358,7 +358,9 @@ int alloc_rmid(u32 closid);
>
> void free_rmid(u32 closid, u32 rmid);
>
> -void resctrl_mon_resource_exit(void);
> +int resctrl_mon_l3_resource_init(void);
> +
> +void resctrl_mon_l3_resource_exit(void);
>
> void mon_event_count(void *info);
>
> @@ -368,8 +370,6 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
> struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
> cpumask_t *cpumask, struct mon_evt *evt, int first);
>
> -int resctrl_mon_resource_init(void);
> -
> void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
> unsigned long delay_ms,
> int exclude_cpu);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 92798e1fb5b0..e3eceba70713 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -86,6 +86,37 @@ unsigned int resctrl_rmid_realloc_threshold;
> */
> unsigned int resctrl_rmid_realloc_limit;
>
> +static int closid_num_dirty_rmid_init(struct rdt_resource *r)
It is not clear to me that these new helpers are needed. To me it seems
easier to follow if they are just open coded.
> +{
> + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID) &&
> + !closid_num_dirty_rmid) {
> + u32 num_closid = resctrl_arch_get_num_closid(r);
> + u32 *tmp;
> +
> + /*
> + * If the architecture hasn't provided a sanitised value here,
> + * this may result in larger arrays than necessary. Resctrl will
> + * use a smaller system wide value based on the resources in
> + * use.
> + */
> + tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
> + if (!tmp)
> + return -ENOMEM;
> +
> + closid_num_dirty_rmid = tmp;
> + }
> +
> + return 0;
> +}
> +
> +static void closid_num_dirty_rmid_exit(void)
> +{
> + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
> + kfree(closid_num_dirty_rmid);
> + closid_num_dirty_rmid = NULL;
> + }
> +}
> +
> /*
> * x86 and arm64 differ in their handling of monitoring.
> * x86's RMID are independent numbers, there is only one source of traffic
...
> @@ -938,7 +942,7 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
> }
>
> /**
> - * resctrl_mon_resource_init() - Initialise global monitoring structures.
> + * resctrl_mon_l3_resource_init() - Initialise global monitoring structures.
> *
> * Allocate and initialise global monitor resources that do not belong to a
> * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
> @@ -949,7 +953,7 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
> *
> * Returns 0 for success, or -ENOMEM.
> */
> -int resctrl_mon_resource_init(void)
> +int resctrl_mon_l3_resource_init(void)
> {
> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> int ret;
> @@ -957,6 +961,10 @@ int resctrl_mon_resource_init(void)
> if (!r->mon_capable)
> return 0;
>
> + ret = closid_num_dirty_rmid_init(r);
> + if (ret)
> + return ret;
> +
> ret = dom_data_init(r);
> if (ret)
Leaking closid_num_dirty_rmid here?
> return ret;
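One way to address the leak is to unwind the first allocation when the second fails. A user-space sketch, assuming calloc()/free() in place of kcalloc()/kfree() and dropping the CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID condition; all names are hypothetical stand-ins, and `fail_table_alloc` is only a test hook that simulates dom_data_init() failing.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static unsigned int *closid_dirty;  /* models closid_num_dirty_rmid */
static void *rmid_table;            /* models rmid_ptrs[] */
static bool fail_table_alloc;       /* test hook: simulate dom_data_init() failing */

static int dirty_init(size_t num_closid)
{
	if (closid_dirty)
		return 0;
	closid_dirty = calloc(num_closid, sizeof(*closid_dirty));
	return closid_dirty ? 0 : -ENOMEM;
}

static void dirty_exit(void)
{
	free(closid_dirty);
	closid_dirty = NULL;
}

static int table_init(size_t num_rmid)
{
	if (fail_table_alloc)
		return -ENOMEM;
	rmid_table = calloc(num_rmid, sizeof(long));
	return rmid_table ? 0 : -ENOMEM;
}

static int mon_l3_init(size_t num_closid, size_t num_rmid)
{
	int ret;

	ret = dirty_init(num_closid);
	if (ret)
		return ret;

	ret = table_init(num_rmid);
	if (ret) {
		/* Undo the first allocation so it does not leak. */
		dirty_exit();
		return ret;
	}
	return 0;
}
```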
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 27/31] x86,fs/resctrl: Move RMID initialization to first mount
2025-07-11 23:53 ` [PATCH v7 27/31] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
@ 2025-07-25 23:53 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:53 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> The resctrl file system code assumed that the only monitor events were
resctrl assumes ... etc. etc.
> tied to the RDT_RESOURCE_L3 resource. Also that the number of supported
> RMIDs was enumerated during early initialization.
>
> RDT_RESOURCE_PERF_PKG breaks both of those assumptions.
Please give details on how these assumptions are broken.
>
> Delay the final enumeration of the number of RMIDs and subsequent
> allocation of structures until first mount of the resctrl file system
> so that the number of usable RMIDs can be computed as the minimum
> value from all enabled monitor resources.
This needs more thought. The idea of "final enumeration of the number of RMIDs"
does not exist. This patch modifies resctrl_arch_system_num_rmid_idx() to
compute the number of RMIDs differently, but there is *no* change to when
resctrl_arch_system_num_rmid_idx() is called. For example,
resctrl_arch_system_num_rmid_idx() is called as part of L3 monitor domain
initialization during CPU online that can happen long before resctrl mount.
resctrl_arch_system_num_rmid_idx() will return the number of RMIDs known
at that time that may be different from the "final enumeration". The
L3 monitor domain structures are thus created with potentially a
different RMID count than what the system will end up being able to use.
Claiming that this "delay of final enumeration of number of RMIDs and subsequent
allocation of structures until first mount" applies to all enabled monitor
resources is false.
Allocating the L3 monitoring data structures early, based on the RMID values known at
that time, may be ok given how the system-wide RMID count is chosen (it can only get smaller).
This will waste some space, but I expect it will work just fine.
A comment similar to the one used for the closid_num_dirty_rmid allocation
could work, but hiding implications like this just makes the resctrl code
harder to understand and maintain.
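To make the timing issue concrete, here is a sketch of the quoted resctrl_arch_system_num_rmid_idx() logic: the minimum num_rmid over monitor-capable resources, with a U32_MAX sentinel that maps to 0 if no resource is monitor capable. The struct and the RMID counts are hypothetical and illustrative. The same function gives a larger answer when called at CPU online (only L3 enumerated) than after PERF_PKG is enumerated, and the answer can only shrink.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for struct rdt_resource. */
struct res {
	bool mon_capable;
	uint32_t num_rmid;
};

/* Minimum num_rmid over monitor-capable resources, 0 if there are none. */
static uint32_t system_num_rmid(const struct res *r, unsigned int n)
{
	uint32_t num = UINT32_MAX;

	for (unsigned int i = 0; i < n; i++) {
		if (!r[i].mon_capable)
			continue;
		if (r[i].num_rmid < num)
			num = r[i].num_rmid;
	}
	return num == UINT32_MAX ? 0 : num;
}
```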
> Since the dom_data* functions now only allocate/free structures
> used for RMIDs, rename: dom_data_init() -> rmid_init(),
> dom_data_exit() -> rmid_exit().
These names seem very generic. How about setup_rmid_lru_list()/free_rmid_lru_list()?
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> fs/resctrl/internal.h | 2 ++
> arch/x86/kernel/cpu/resctrl/core.c | 8 ++++++--
> fs/resctrl/monitor.c | 26 +++++++++-----------------
> fs/resctrl/rdtgroup.c | 6 ++++++
> 4 files changed, 23 insertions(+), 19 deletions(-)
>
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 28d505efdb7c..7fca1849742f 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -358,6 +358,8 @@ int alloc_rmid(u32 closid);
>
> void free_rmid(u32 closid, u32 rmid);
>
> +int rmid_init(void);
> +
> int resctrl_mon_l3_resource_init(void);
>
> void resctrl_mon_l3_resource_exit(void);
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 31fb598482bf..1a6635cc5b37 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -112,10 +112,14 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
>
> u32 resctrl_arch_system_num_rmid_idx(void)
> {
> - struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> + u32 num_rmids = U32_MAX;
> + struct rdt_resource *r;
> +
> + for_each_mon_capable_rdt_resource(r)
> + num_rmids = min(num_rmids, r->num_rmid);
>
> /* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
> - return r->num_rmid;
> + return num_rmids == U32_MAX ? 0 : num_rmids;
> }
>
> struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index e3eceba70713..3fe81c43e5e8 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -833,20 +833,19 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
> schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
> }
>
> -static int dom_data_init(struct rdt_resource *r)
> +int rmid_init(void)
> {
> u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> struct rmid_entry *entry = NULL;
> - int err = 0, i;
> u32 idx;
> + int i;
>
> - mutex_lock(&rdtgroup_mutex);
> + if (rmid_ptrs)
> + return 0;
>
> rmid_ptrs = kcalloc(idx_limit, sizeof(struct rmid_entry), GFP_KERNEL);
> - if (!rmid_ptrs) {
> - err = -ENOMEM;
> - goto out_unlock;
> - }
> + if (!rmid_ptrs)
> + return -ENOMEM;
>
> for (i = 0; i < idx_limit; i++) {
> entry = &rmid_ptrs[i];
> @@ -866,13 +865,10 @@ static int dom_data_init(struct rdt_resource *r)
> entry = __rmid_entry(idx);
> list_del(&entry->list);
>
> -out_unlock:
> - mutex_unlock(&rdtgroup_mutex);
> -
> - return err;
> + return 0;
> }
>
> -static void dom_data_exit(struct rdt_resource *r)
> +static void rmid_exit(struct rdt_resource *r)
> {
> mutex_lock(&rdtgroup_mutex);
>
> @@ -965,10 +961,6 @@ int resctrl_mon_l3_resource_init(void)
Please take a look at all functions modified to ensure function comments
are still accurate. For example, resctrl_mon_l3_resource_init() still
claims to manage rmid_ptrs[] after this change ...
> if (ret)
> return ret;
>
> - ret = dom_data_init(r);
> - if (ret)
> - return ret;
> -
> if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
> mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
> resctrl_file_fflags_init("mbm_total_bytes_config",
> @@ -993,5 +985,5 @@ void resctrl_mon_l3_resource_exit(void)
> struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
>
> closid_num_dirty_rmid_exit();
> - dom_data_exit(r);
> + rmid_exit(r);
Please do not call rmid_exit() from resctrl_mon_l3_resource_exit(). Doing so
breaks symmetry with resctrl_mon_l3_resource_init() which is especially
confusing with resctrl_mon_l3_resource_exit() called on failure exit
from resctrl_mon_l3_resource_init().
It can just be called directly from resctrl_exit()?
> }
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index b45f3d63c629..9e667d3a93ae 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2599,6 +2599,12 @@ static int rdt_get_tree(struct fs_context *fc)
> goto out;
> }
>
> + if (resctrl_arch_mon_capable()) {
> + ret = rmid_init();
The references to "resctrl_init()" in the comments seem quite stale now.
> + if (ret)
> + goto out;
> + }
> +
> ret = rdtgroup_setup_root(ctx);
> if (ret)
> goto out;
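A userspace sketch of the symmetric life-cycle suggested above, assuming simplified types: the allocation is idempotent so it can run on every mount, and the free happens only on the top-level exit path. The names setup_rmid_lru_list()/free_rmid_lru_list() follow the naming suggestion above; struct rmid_entry is reduced to a single index field, and calloc()/free() stand in for kcalloc()/kfree(). Locking is omitted; the patch drops mutex_lock() from rmid_init(), which suggests the caller in rdt_get_tree() is expected to already hold rdtgroup_mutex.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct rmid_entry (hypothetical). */
struct rmid_entry {
	uint32_t idx;
};

static struct rmid_entry *rmid_ptrs;
static uint32_t rmid_count;

/* Called on every mount; only the first call allocates. */
static int setup_rmid_lru_list(uint32_t idx_limit)
{
	if (rmid_ptrs)
		return 0;

	rmid_ptrs = calloc(idx_limit, sizeof(*rmid_ptrs));
	if (!rmid_ptrs)
		return -ENOMEM;

	for (uint32_t i = 0; i < idx_limit; i++)
		rmid_ptrs[i].idx = i;
	rmid_count = idx_limit;

	return 0;
}

/* Called once from the top-level exit path, not from the L3 exit path. */
static void free_rmid_lru_list(void)
{
	free(rmid_ptrs);
	rmid_ptrs = NULL;
	rmid_count = 0;
}
```

A second mount finds rmid_ptrs already set and returns 0 without reallocating; only free_rmid_lru_list() resets it, so the init and exit halves each have exactly one caller.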
Reinette
^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v7 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
2025-07-11 23:53 ` [PATCH v7 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-07-25 23:54 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:54 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 1a6635cc5b37..1d07c38ed528 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -774,14 +774,26 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
>
> void resctrl_arch_pre_mount(void)
> {
> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> static atomic_t only_once = ATOMIC_INIT(0);
> - int old = 0;
> + int cpu, old = 0;
>
> if (!atomic_try_cmpxchg(&only_once, &old, 1))
> return;
>
> if (!intel_aet_get_events())
> return;
> +
> + /*
> + * Late discovery of telemetry events means the domains for the
> + * resource were not built. Do that now.
> + */
> + cpus_read_lock();
> + mutex_lock(&domain_list_lock);
> + for_each_online_cpu(cpu)
> + domain_add_cpu_mon(cpu, r);
Without an explicit "mon_capable" check this change now creates a new "contract" that
"if intel_aet_get_events() succeeds then PERF_PKG is mon_capable". I do not believe this
additional complication justifies saving a line of code.
> + mutex_unlock(&domain_list_lock);
> + cpus_read_unlock();
> }
>
> enum {
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 6958efbf7e81..ea7a782c1661 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -264,6 +264,9 @@ static int discover_events(struct event_group *e, struct pmt_feature_group *p)
> else
> r->num_rmid = e->num_rmids;
>
> + pr_info("%s %s monitoring detected\n", r->name, e->name);
> + r->mon_capable = true;
> +
> return 0;
> }
>
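A userspace model of the requested change, assuming C11 atomics in place of the kernel's atomic_try_cmpxchg() (both take a pointer to the expected value and return whether the swap happened). get_events_stub() is a hypothetical stand-in for intel_aet_get_events(), and incrementing domains_built stands in for calling domain_add_cpu_mon() for each online CPU. The point is the explicit mon_capable test, so domain creation does not silently depend on intel_aet_get_events() having set it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PERF_PKG resource. */
struct res {
	bool mon_capable;
	int domains_built;
};

/* Hypothetical stub for intel_aet_get_events(); may mark r mon_capable. */
static bool get_events_stub(struct res *r, bool found)
{
	if (found)
		r->mon_capable = true;
	return found;
}

static void pre_mount(struct res *r, bool events_found)
{
	static atomic_int only_once;	/* zero-initialized */
	int old = 0;

	/* Only the first caller swaps 0 -> 1 and proceeds. */
	if (!atomic_compare_exchange_strong(&only_once, &old, 1))
		return;

	/* Explicit check: do not rely on get_events() implying mon_capable. */
	if (!get_events_stub(r, events_found) || !r->mon_capable)
		return;

	r->domains_built++;	/* stands in for domain_add_cpu_mon() per CPU */
}
```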
Reinette
* Re: [PATCH v7 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area
2025-07-11 23:53 ` [PATCH v7 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
@ 2025-07-25 23:55 ` Reinette Chatre
0 siblings, 0 replies; 61+ messages in thread
From: Reinette Chatre @ 2025-07-25 23:55 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Tony,
On 7/11/25 4:53 PM, Tony Luck wrote:
> Architectures are constrained to just the file interfaces provided by
> the file system for each resource. This does not allow for architecture
> specific debug interfaces.
(squashed context and problem description)
>
> Add resctrl_debugfs_mon_info_arch_mkdir() which creates a directory in the
> debugfs file system for a resource. Naming follows the layout of the
> main resctrl hierarchy:
>
> /sys/kernel/debug/resctrl/info/{resource}_MON/{arch}
>
> The {arch} last level directory name matches the output of
> the user level "uname -m" command.
>
> Architecture code may use this directory for debug information,
> or for minor tuning of features. It must not be used for basic
> feature enabling as debugfs may not be configured/mounted on
> production systems.
>
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> include/linux/resctrl.h | 9 +++++++++
> fs/resctrl/rdtgroup.c | 29 +++++++++++++++++++++++++++++
> 2 files changed, 38 insertions(+)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 74cd2979549b..ed5085eeee1b 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -571,6 +571,15 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
> extern unsigned int resctrl_rmid_realloc_threshold;
> extern unsigned int resctrl_rmid_realloc_limit;
>
> +/**
> + * resctrl_debugfs_mon_info_arch_mkdir() - Create a debugfs info directory.
> + * Removed by resctrl_exit().
> + * @r: Resource (must be mon_capable).
> + *
> + * Return: dentry pointer on success, or NULL on error.
> + */
> +struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r);
> +
> int resctrl_init(void);
> void resctrl_exit(void);
>
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 9e667d3a93ae..fdd6cf372d6c 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -24,6 +24,7 @@
> #include <linux/sched/task.h>
> #include <linux/slab.h>
> #include <linux/user_namespace.h>
> +#include <linux/utsname.h>
>
> #include <uapi/linux/magic.h>
>
> @@ -4350,6 +4351,33 @@ int resctrl_init(void)
> return ret;
> }
>
> +static struct dentry *debugfs_resctrl_info;
Please move this declaration to be with its partner, debugfs_resctrl.
> +
> +/*
> + * Create /sys/kernel/debug/resctrl/info/{r->name}_MON/{arch} directory
> + * by request for architecture to use for debugging or minor tuning.
> + * Basic functionality of features must not be controlled by files
> + * added to this directory as debugs may not be configured/mounted
debugs -> debugfs
> + * on production systems.
> + */
> +struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r)
> +{
> + struct dentry *moninfodir;
> + char name[32];
> +
> + if (!r->mon_capable)
> + return NULL;
> +
> + if (!debugfs_resctrl_info)
> + debugfs_resctrl_info = debugfs_create_dir("info", debugfs_resctrl);
> +
> + sprintf(name, "%s_MON", r->name);
> +
> + moninfodir = debugfs_create_dir(name, debugfs_resctrl_info);
> +
> + return debugfs_create_dir(utsname()->machine, moninfodir);
> +}
> +
> static bool resctrl_online_domains_exist(void)
> {
> struct rdt_resource *r;
> @@ -4401,6 +4429,7 @@ void resctrl_exit(void)
>
> debugfs_remove_recursive(debugfs_resctrl);
> debugfs_resctrl = NULL;
> + debugfs_resctrl_info = NULL;
> unregister_filesystem(&rdt_fs_type);
>
> /*
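A userspace sketch of the naming scheme in the commit message, assuming only POSIX uname(2): it builds /sys/kernel/debug/resctrl/info/{resource}_MON/{arch}, with {arch} taken from utsname.machine (the same string "uname -m" prints). resctrl_debugfs_arch_path() is a hypothetical helper; the kernel code instead calls debugfs_create_dir() with utsname()->machine. It uses snprintf() with a truncation check, which is one way to bound a fixed buffer such as the patch's char name[32] that is filled with sprintf().

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Build /sys/kernel/debug/resctrl/info/{res}_MON/{arch} where {arch} is
 * what "uname -m" prints. Returns 0 on success, -1 on error or truncation.
 */
static int resctrl_debugfs_arch_path(const char *res, char *buf, size_t len)
{
	struct utsname u;
	int n;

	if (uname(&u) < 0)
		return -1;

	n = snprintf(buf, len, "/sys/kernel/debug/resctrl/info/%s_MON/%s",
		     res, u.machine);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

On an x86_64 machine, passing "PERF_PKG" produces /sys/kernel/debug/resctrl/info/PERF_PKG_MON/x86_64.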
Reinette
* Re: [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
` (30 preceding siblings ...)
2025-07-11 23:53 ` [PATCH v7 31/31] x86,fs/resctrl: Update Documentation for package events Tony Luck
@ 2025-07-30 18:42 ` Moger, Babu
2025-07-30 20:27 ` Reinette Chatre
31 siblings, 1 reply; 61+ messages in thread
From: Moger, Babu @ 2025-07-30 18:42 UTC (permalink / raw)
To: Tony Luck, Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman,
Peter Newman, James Morse, Drew Fustini, Dave Martin,
Anil Keshavamurthy, Chen Yu
Cc: x86, linux-kernel, patches
Hi Reinette,
On 7/11/25 18:53, Tony Luck wrote:
> The prerequisite patch series to the Intel Telemetry code is
> now in the Linux x86 platform drivers tree:
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git/
> queued for the v6.17 merge window.
>
> That series is based on v6.16-rc1. One resctrl bugfix went into
> Linus' tree after -rc1: commit 594902c986e2 ("x86,fs/resctrl: Remove
> inappropriate references to cacheinfo in the resctrl subsystem")
>
> These patches are based on the x86 platform drivers tree plus cherry
> pick of that patch. For convenience I've pushed that base, and this
> series to the rdt-aet-v7-base and rdt-aet-v7 branches of:
> git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
>
> Changes since v6 was posted here:
> Link: https://lore.kernel.org/all/20250626164941.106341-1-tony.luck@intel.com/
>
>
> --- cover-letter ---
> Rewritten - and then updated with comments from:
> Link: https://lore.kernel.org/all/f3ba783a-6387-4997-9e8c-897109ee3559@intel.com/
>
> --- 1 ---
> Added review tag from Fenghua
> Change kerneldoc for mon_evt::rid to "resource id for this event"
>
>
> --- 2 ---
> Added review tags from Fenghua & Reinette
>
>
> --- 3 ---
> Added review tag from Reinette
>
>
> --- 4 ---
> Added review tag from Reinette
We both (Tony and I) are carrying these patches. As we are in a merge
window now, any plans to send the first four patches for the 6.17 queue?
Just a thought.
Thanks
Babu
* Re: [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring
2025-07-30 18:42 ` [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Moger, Babu
@ 2025-07-30 20:27 ` Reinette Chatre
2025-07-30 22:05 ` Moger, Babu
0 siblings, 1 reply; 61+ messages in thread
From: Reinette Chatre @ 2025-07-30 20:27 UTC (permalink / raw)
To: babu.moger, Tony Luck, Fenghua Yu, Maciej Wieczor-Retman,
Peter Newman, James Morse, Drew Fustini, Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Babu,
On 7/30/25 11:42 AM, Moger, Babu wrote:
>
>
> We both (Tony and I) are carrying these patches. As we are in a merge
> window now, any plans to send the first four patches for the 6.17 queue?
>
No. Any resctrl patches targeting this merge window should already have been
merged into tip's x86/cache a couple of weeks before this merge window opened.
We can definitely consider asking x86 maintainers to pick up these four patches
when they start taking patches for v6.18, at which time one or both of these
contributions may be ready anyway. Will know better at that time.
Reinette
* Re: [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring
2025-07-30 20:27 ` Reinette Chatre
@ 2025-07-30 22:05 ` Moger, Babu
0 siblings, 0 replies; 61+ messages in thread
From: Moger, Babu @ 2025-07-30 22:05 UTC (permalink / raw)
To: Reinette Chatre, babu.moger, Tony Luck, Fenghua Yu,
Maciej Wieczor-Retman, Peter Newman, James Morse, Drew Fustini,
Dave Martin, Chen Yu
Cc: x86, linux-kernel, patches
Hi Reinette,
On 7/30/2025 3:27 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 7/30/25 11:42 AM, Moger, Babu wrote:
>>
>>
>> We both (Tony and I) are carrying these patches. As we are in a merge
>> window now, any plans to send the first four patches for the 6.17 queue?
>>
>
> No. Any resctrl patches targeting this merge window should already have been
> merged into tip's x86/cache a couple of weeks before this merge window opened.
>
> We can definitely consider asking x86 maintainers to pick up these four patches
> when they start taking patches for v6.18, at which time one or both of these
> contributions may be ready anyway. Will know better at that time.
>
Ok. Sounds good.
Thanks
Babu
Thread overview: 61+ messages
2025-07-11 23:53 [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Tony Luck
2025-07-11 23:53 ` [PATCH v7 01/31] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
2025-07-17 17:51 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 02/31] x86,fs/resctrl: Replace architecture event enabled checks Tony Luck
2025-07-11 23:53 ` [PATCH v7 03/31] x86/resctrl: Remove 'rdt_mon_features' global variable Tony Luck
2025-07-11 23:53 ` [PATCH v7 04/31] x86,fs/resctrl: Prepare for more monitor events Tony Luck
2025-07-11 23:53 ` [PATCH v7 05/31] x86,fs/resctrl: Improve domain type checking Tony Luck
2025-07-25 23:17 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 06/31] x86/resctrl: Move L3 initialization into new helper function Tony Luck
2025-07-25 23:21 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 07/31] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
2025-07-25 23:29 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 08/31] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
2025-07-25 23:22 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 09/31] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
2025-07-25 23:25 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 10/31] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
2025-07-25 23:26 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 11/31] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
2025-07-25 23:26 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 12/31] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
2025-07-25 23:27 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 13/31] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
2025-07-25 23:32 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 14/31] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
2025-07-25 23:34 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 15/31] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
2025-07-25 23:35 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 16/31] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
2025-07-25 23:36 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 17/31] x86/resctrl: Discover hardware telemetry events Tony Luck
2025-07-25 23:39 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 18/31] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
2025-07-25 23:40 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 19/31] x86/resctrl: Complete telemetry event enumeration Tony Luck
2025-07-25 23:41 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 20/31] x86,fs/resctrl: Fill in details of events for guid 0x26696143 and 0x26557651 Tony Luck
2025-07-25 23:43 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 21/31] x86,fs/resctrl: Add architectural event pointer Tony Luck
2025-07-25 23:43 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 22/31] x86/resctrl: Read telemetry events Tony Luck
2025-07-25 23:45 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 23/31] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
2025-07-25 23:46 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 24/31] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
2025-07-25 23:46 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 25/31] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
2025-07-25 23:49 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 26/31] fs/resctrl: Fix life-cycle of closid_num_dirty_rmid Tony Luck
2025-07-25 23:51 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 27/31] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
2025-07-25 23:53 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 28/31] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
2025-07-25 23:54 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 29/31] fs/resctrl: Provide interface to create architecture specific debugfs area Tony Luck
2025-07-25 23:55 ` Reinette Chatre
2025-07-11 23:53 ` [PATCH v7 30/31] x86/resctrl: Add debugfs files to show telemetry aggregator status Tony Luck
2025-07-11 23:53 ` [PATCH v7 31/31] x86,fs/resctrl: Update Documentation for package events Tony Luck
2025-07-30 18:42 ` [PATCH v7 00/31] x86,fs/resctrl telemetry monitoring Moger, Babu
2025-07-30 20:27 ` Reinette Chatre
2025-07-30 22:05 ` Moger, Babu