linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
From: Tony Luck @ 2025-06-26 16:49 UTC
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

These patches are based on v6.16-rc3 plus David Box's V2 series for
"Intel VSEC/PMT: Introduce Discovery Driver" posted here:

Link: https://lore.kernel.org/all/20250617014041.2861032-1-david.e.box@linux.intel.com/

I've pushed "v6.16-rc3 + David's patches" to:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git davidboxv2

The total set (v6.16-rc3 + David's patches + this series) is here:

git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git rdt-aet-v6

The first four patches of this series are shared with Babu's ABMC
series. Perhaps, if all issues in these four patches are resolved here,
these patches could move upstream?

Note that patches 0017 onwards of this V6 series depend on David's patches
and so cannot go upstream until that series is merged.

Changes since v5, which was posted here:

Link: https://lore.kernel.org/all/20250521225049.132551-1-tony.luck@intel.com/

Change map indexed by patch numbers in v5. Some patches have been merged,
split, dropped, or re-ordered. The v6 patch numbers are referred to
by their 4-digit git format-patch numbers in an attempt to avoid
confusion.

 --- 1 ---
Fixed extra space in commit message

Changed to consistent use of "eventid" for variables/arguments
of type "enum resctrl_event_id" in routines touched by this
series.

Added resctrl_event_id::QOS_FIRST_EVENT and used it as lower bound
when looping over all events.

Added #define for_each_mon_event() to iterate over all events.

s/Description of a monitor event/Properties of a monitor event/

 --- 2 ---
Use QOS_FIRST_EVENT as the lower-bound check in resctrl_is_mon_event_enabled().


 --- 3 ---
Replaced changelog with Reinette's improved version.

 --- 4 ---
Add for_each_mbm_event_id() helper to iterate over MBM events.

Peter: Use sizeof(*hw_dom->arch_mbm_states[0]) instead of sizeof(struct arch_mbm_state)
(ditto for "struct mbm_state" instance).

Replaced kerneldoc description of rdt_mon_domain::mbm_states with Reinette's
improved version. Ditto for rdt_hw_mon_domain::arch_mbm_state.

Fixed resctrl_arch_reset_rmid_all() to use same coding pattern of
looping on enabled events instead of checking if rdt_hw_mon_domain::arch_mbm_state
has been allocated.

In get_mbm_state() use local variable name "state" (singular) to match other
code patterns.

Drop duplicate @arch_mbm_states: in kerneldoc for struct rdt_hw_mon_domain.

Drop " or combined CLOSID, RMID on Arm" from @arch_mbm_states description.

Use sizeof(*hw_dom->arch_mbm_states[0]) in resctrl_arch_reset_rmid_all().

Add new macro for_each_mbm_idx() (suggested by Fenghua). Use
it in mon_domain_free() and domain_destroy_mon_state(). Also use it
in the "cleanup:" code in arch_domain_mbm_alloc() and
domain_setup_mon_state() to avoid landmines there.

 --- 5 ---
Patch dropped. No need for fake definitions of OOBMSM access routines and
structures because the real patches have been posted by David Box. Version
2 of his series here:
https://lore.kernel.org/all/20250617014041.2861032-1-david.e.box@linux.intel.com/

 --- 6 ---
Now 0005 in V6 series.
Updated the domain_header_is_valid() check in rdtgroup_mondata_show() to
explicitly test for RDT_RESOURCE_L3, since at this point in the series only
the L3 resource is possible or valid. Similar changes in subsequent patches
where routines that process only struct rdt_mon_domain make consistency checks.


 --- 7 ---
Moved later to patch 0011 when it becomes a little clearer which renames
are useful, and which may be just noise. Functions that now take a
"struct rdt_l3_mon_domain" argument are obviously for L3 only and don't
need a rename to make that clear.


 --- 8 ---
Now patch 0006. No comments received on V5 version of this patch.


 --- 9 ---
Now patch 0007.
s/goto done/goto out_unlock/

Changes to make symmetric cleanups to domain_remove_cpu_ctrl()
to match the code flow changes that were made to domain_remove_cpu_mon()
split out into new patch 0008.


 --- 10 ---
Now patch 0009.
Anil reported that "d" was used uninitialized in domain_remove_cpu_ctrl().

Fenghua reported an error path that did not unlock the rdtgroup_mutex.

Reinette reported a change unrelated to the commit message. Moved this to
new patch 0008 (now with its own changelog).

Several domain_header_is_valid() checks now check for RDT_RESOURCE_L3.

Updated kerneldoc comments for struct mon_data to say that @sum is only
used for RDT_RESOURCE_L3.

Updated kerneldoc for mon_get_kn_priv() @do_sum to say it is only
meaningful for L3 domains, and added a WARN_ON_ONCE() if some other
resource tries to set do_sum.

More functions changed to pass struct rdt_domain_hdr. Specifically the
call chain from rdtgroup_mondata_show() down to resctrl_arch_rmid_read()
(via the smp_call*()) so rmid_read::d has been replaced by rmid_read::hdr.


 --- 11 ---
Now patch 0010. Completing this patch before the function renaming
(was patch 7, now 0011) makes it clearer where renames are useful.


 --- 12 ---
No comments received for this patch (patch numbers now aligned as
this is 0012 in V6).

 --- 13 ---
Updated commit comment with better text from Reinette.

s/goto done/goto out_ctx_free/

Changed polarity and name of helper function from cpu_on_wrong_domain()
to cpu_on_correct_domain() to avoid double negatives.


 --- 14 ---
Add mon_evt::is_floating_point set by resctrl file system code to limit
which events architecture code can request be displayed in floating point.

Simplified the fixed-point to floating-point algorithm. Reinette is
correct that the additional "lshift" and "rshift" operations are not
required. All that is needed is to multiply the fixed-point fractional
part by 10**decimal_places, add a rounding amount equivalent to a "1"
in the binary place after those supplied, and finally divide by
2**binary_places (with a right shift).
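The conversion steps above can be sketched in user-space C as follows. This is a minimal illustration, not the code from the patch: the helper name fixed_to_decimal(), the limits on binary and decimal places (chosen so the intermediate product fits in 64 bits), and the carry into the integer part when rounding reaches 10**decimal_places are all assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format an unsigned binary fixed-point value that has
 * @bp fractional bits as a decimal string with @dp decimal places, rounding
 * to nearest.
 */
static void fixed_to_decimal(uint64_t val, unsigned int bp, unsigned int dp,
			     char *buf, size_t len)
{
	uint64_t ipart, frac, scale = 1;
	unsigned int i;

	/* Assumed limits: frac < 2^32 and 10^dp <= 10^9 keep frac * scale in 64 bits. */
	assert(bp >= 1 && bp <= 32 && dp >= 1 && dp <= 9);

	for (i = 0; i < dp; i++)
		scale *= 10;			/* scale = 10**dp */

	ipart = val >> bp;			/* integer part */
	frac = val & ((1ULL << bp) - 1);	/* fractional bits only */

	frac *= scale;			/* multiply by 10**dp */
	frac += 1ULL << (bp - 1);	/* half of one output unit, so the shift rounds */
	frac >>= bp;			/* divide by 2**bp */

	/* Rounding can reach 10**dp (e.g. 0.996 -> "1.00"): carry it. */
	if (frac == scale) {
		ipart++;
		frac = 0;
	}

	snprintf(buf, len, "%llu.%0*llu", (unsigned long long)ipart,
		 (int)dp, (unsigned long long)frac);
}
```

For example, 0b1011 with two binary places formats as "2.75", while 255 with eight binary places rounded to two decimal places carries into "1.00".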

Explained in commit comment how I chose the number of decimal places to
use for each binary places value.

N.B. Dave Martin expressed an opinion that the kernel should not do
this conversion. Instead it should enumerate the scaling factor for
each event where hardware reported a fixed point value. This patch
could be dropped and replaced with one to enumerate scaling factors
per event if others agree with Dave.


 --- 15 ---
Initialize atomic in resctrl_arch_pre_mount() using ATOMIC_INIT(0).

 --- 16 ---
No comments received on this patch.


 --- 17 ---
Changed comment in struct event_group from " Data fields used by this code."
to "Data fields for additional structures to manage this group."

Removed line continuations for DEFINE_FREE(), though checkpatch is still
not happy. The following line is an "if" statement, and checkpatch wants it
to both 1) line up vertically with the "(" on the preceding line, and 2) not
use <TAB> followed by some <SPACE> characters to make that happen.

Refactored the interface between get_pmt_feature() and configure_event()
per Reinette's suggestion to avoid both functions looping through the
p->count entries in the pmt_feature_group structure.

Kconfig changes here ensure that David's INTEL_PMT_TELEMETRY code is
built-in to the kernel so it can be used by resctrl.


 --- 18 ---
Define a macro XML_MMIO_SIZE() as a way to document the hard-coded numbers
used to calculate the expected size of the MMIO region.

If the MMIO size reported by intel_pmt_get_regions_by_feature() is smaller
than expected, print that size as part of the warning message.


 --- 19 ---
Rename mmio_info::count to mmio_info::num_regions.

Fix typo s/[0]/[1]/g in the ASCII-art structure diagram in the commit message.

Take better suggestions for kerneldoc descriptions of mmio_info
structure and the @num_regions and @addrs fields.

Add a period to the event_group::pkginfo kerneldoc description.

Fix declaration of **pkginfo in configure_events()

Add "_once" for "Duplicate telemetry" warning.

 --- 20 ---
Shorten field names: pmt_event::evtid becomes pmt_event::id and
pmt_event::evt_idx becomes pmt_event::idx.

Fenghua: Align the fields of each struct event_group initializer on the "=".

Anil: Use a macro to initialize entries in mon_event_all[]. This could
be done in patch 1, but with only three events at that point the visual
clutter wasn't too awful.


 --- 21 ---
Split into:
0021: Add mon_evt::arch_priv void pointer that can be set by architectural
code when enabling an event, and passed through to resctrl_arch_rmid_read().

0022: Code to set the arch_priv pointer and use it to find the containing
struct event_group, which holds the parameters needed to read the counter
from MMIO space. Note that, due to changes in patch 0009,
resctrl_arch_rmid_read() takes a struct rdt_domain_hdr argument.


 --- 22 ---
Now 0023: Keep "default:" as last option in switch in domain_remove_cpu_mon().

Comments about "goto mkdir" (was "goto do_mkdir") are covered by the change
in patch 0009.

 --- 23 ---
Moved to patch 0027.

No comments received for this patch, but there is one small change: it now
sets r->mon_capable = true in intel_aet_get_events() so that this is
done before the calculation of the minimum number of supported RMIDs
in patch 0025.


 --- 24 ---
Changed to address Reinette's point that the initial implementation
would not work in the same way as other boot choices. Specifically,
if a quirk disables a feature because of an erratum, the user should
be able to override it from the command line and use the feature anyway.
This patch provides the option for the user to disable a telemetry
feature from the command line. The force enable option moved to the
next patch, where it is used.


 --- 25 ---
Improved commit comment per Reinette's suggestion.

s/Will be adjusted/Adjusted/ in kerneldoc for event_group::num_rmids

Improved text for comment on not configuring a telemetry feature that
has fewer RMIDs than supported by IA32_PQR_ASSOC.

The second part of the command line implementation is here, to allow the
user to override the fewer-RMIDs check and use a resource anyway.

 --- 26 ---
Back in sync with patch 0026. No comments received for this patch.

 --- 27 ---
Changed from providing a mechanism for architecture code to create
a custom "info/{resource}" file to providing a debugfs directory
for use by a monitor resource. Discussion on the name of the directory
fizzled out. I've gone with:
	/sys/kernel/debug/resctrl/info/{resource}_MON/{utsname()->machine}

 --- 28 ---
No comments on this patch. Changed to create one debugfs file for each
value from each aggregator instance.


 --- 29 ---
No comments on this patch.



Background
----------

Telemetry features are being implemented in conjunction with the
IA32_PQR_ASSOC.RMID value on each logical CPU. The RMID value is used
to send counts for various events to a collector in a nearby OOBMSM
device, where they are accumulated with the counts for each <RMID, event>
pair received from other CPUs. Cores send event counts when the RMID
value changes, or after each 2ms of elapsed time.

Each OOBMSM device may implement multiple event collectors with each
servicing a subset of the logical CPUs on a package.  In the initial
hardware implementation, there are two categories of events: energy
and perf.

1) Energy - Two counters
core_energy: This is an estimate of Joules consumed by each core. It is
calculated based on the types of instructions executed, not from a power
meter. This counter is useful to understand how much energy a workload
is consuming.

activity: This measures "accumulated dynamic capacitance". Users who
want to optimize energy consumption for a workload may use this rather
than core_energy because it provides consistent results independent of
any frequency or voltage changes that may occur during the runtime of
the application (e.g. entry/exit from turbo mode).

2) Performance - Seven counters
These are similar events to those available via the Linux "perf" tool,
but collected in a way with much lower overhead (no need to collect data
on every context switch).

stalls_llc_hit - Counts the total number of unhalted core clock cycles
when the core is stalled due to a demand load miss which hit in the LLC

c1_res - Counts the total C1 residency across all cores. The underlying
counter increments on 100MHz clock ticks

unhalted_core_cycles - Counts the total number of unhalted core clock
cycles

stalls_llc_miss - Counts the total number of unhalted core clock cycles
when the core is stalled due to a demand load miss which missed all the
local caches

c6_res - Counts the total C6 residency. The underlying counter increments
on crystal clock (25MHz) ticks

unhalted_ref_cycles - Counts the total number of unhalted reference clock
(TSC) cycles

uops_retired - Counts the total number of uops retired

The counters are arranged in groups in MMIO space of the OOBMSM device.
E.g. for the energy counters the layout is:

Offset: Counter
0x00	core energy for RMID 0
0x08	core activity for RMID 0
0x10	core energy for RMID 1
0x18	core activity for RMID 1
...
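Assuming each group follows the same pattern as the energy layout above (a fixed number of 64-bit counters per RMID, stored back to back), the offset of any <RMID, event> counter is a simple multiply-add. The sketch below is hypothetical: counter_offset(), read_counter() and the enum names are illustrative only, and an ordinary array stands in for the mapped MMIO region that real code would read with the kernel's MMIO accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counter index within one RMID's block of the energy group (see table). */
enum energy_counter {
	ENERGY_CORE_ENERGY = 0,
	ENERGY_ACTIVITY    = 1,
	ENERGY_NUM_COUNTERS		/* = 2 counters per RMID */
};

/* Every counter is a 64-bit value, which gives the 8-byte stride. */
#define COUNTER_SIZE sizeof(uint64_t)

/* Byte offset of the counter for <rmid, idx> in a group of num_counters. */
static size_t counter_offset(unsigned int rmid, unsigned int idx,
			     unsigned int num_counters)
{
	assert(idx < num_counters);
	return ((size_t)rmid * num_counters + idx) * COUNTER_SIZE;
}

/*
 * Read one counter. @base stands in for the mapped MMIO region; this is a
 * plain load, so no cross-CPU call is involved.
 */
static uint64_t read_counter(const volatile uint64_t *base, unsigned int rmid,
			     unsigned int idx, unsigned int num_counters)
{
	return base[counter_offset(rmid, idx, num_counters) / COUNTER_SIZE];
}
```

With two counters per RMID this reproduces the table: RMID 1 activity is at 0x18.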

Enumeration
-----------

The only CPUID based enumeration for this feature is the legacy
CPUID(eax=7,ecx=0).ebx{12} that indicates the presence of the
IA32_PQR_ASSOC MSR and the RMID field within it.
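A user-space sketch of this legacy CPUID check, assuming a GCC or Clang toolchain that provides __get_cpuid_count() in <cpuid.h>. The helper names are hypothetical; kernel code uses its own CPU feature infrastructure rather than executing CPUID like this. The bit test is split into a pure function so it can be checked without specific hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID(EAX=7, ECX=0).EBX bit 12: IA32_PQR_ASSOC with an RMID field exists. */
#define PQM_EBX_BIT 12

static bool ebx_has_pqm(uint32_t ebx)
{
	return (ebx >> PQM_EBX_BIT) & 1;
}

/* Query the running CPU; false on non-x86 builds or if leaf 7 is absent. */
static bool cpu_has_pqm(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 if the leaf is not supported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return false;
	return ebx_has_pqm(ebx);
#else
	return false;
#endif
}
```

Note that this bit says nothing about which telemetry events exist; that comes from the OOBMSM enumeration described next.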

The OOBMSM driver discovers which features are present via
PCIe VSEC capabilities. Each feature is tagged with a unique
identifier. These identifiers indicate which XML description file from
https://github.com/intel/Intel-PMT describes which event counters are
available and their layout within the MMIO BAR space of the OOBMSM device.

Resctrl User Interface
----------------------

Because there may be multiple OOBMSM collection agents per processor
package, resctrl accumulates event counts from all agents on a package
and presents a single value to users. This will provide a consistent
user interface on future platforms that vary the number of collectors,
or the mappings from logical CPUs to collectors.
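The per-package aggregation amounts to summing the same <RMID, event> counter across every collector that belongs to the package. In this sketch, struct aggregator and package_event_sum() are hypothetical, and each aggregator is reduced to a single event's per-RMID array for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one OOBMSM collector for a single event. */
struct aggregator {
	unsigned int package;		/* package this collector belongs to */
	const uint64_t *counts;		/* count per RMID for the event */
	unsigned int num_rmids;		/* length of @counts */
};

/* Sum one RMID's count across every collector on @package. */
static uint64_t package_event_sum(const struct aggregator *aggs,
				  unsigned int num_aggs,
				  unsigned int package, unsigned int rmid)
{
	uint64_t sum = 0;
	unsigned int i;

	for (i = 0; i < num_aggs; i++) {
		if (aggs[i].package != package)
			continue;	/* collector on another package */
		assert(rmid < aggs[i].num_rmids);
		sum += aggs[i].counts[rmid];
	}
	return sum;
}
```

Because users only see the sum, the number of collectors behind each package can vary between platforms without changing what the file reports.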

Users will continue to see the legacy monitoring files in the "L3"
directories and the telemetry files in the new "PERF_PKG" directories
(with each file providing the aggregated value from all OOBMSM collectors
on that package).

$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
├── mon_L3_00
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
├── mon_L3_01
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
├── mon_PERF_PKG_00
│   ├── activity
│   ├── c1_res
│   ├── c6_res
│   ├── core_energy
│   ├── stalls_llc_hit
│   ├── stalls_llc_miss
│   ├── unhalted_core_cycles
│   ├── unhalted_ref_cycles
│   └── uops_retired
└── mon_PERF_PKG_01
    ├── activity
    ├── c1_res
    ├── c6_res
    ├── core_energy
    ├── stalls_llc_hit
    ├── stalls_llc_miss
    ├── unhalted_core_cycles
    ├── unhalted_ref_cycles
    └── uops_retired

Resctrl Implementation
----------------------

The OOBMSM driver exposes "intel_pmt_get_regions_by_feature()"
that returns an array of structures describing the per-RMID groups it
found from the VSEC enumeration. Linux looks at the unique identifiers
for each group and enables resctrl for all groups with known unique
identifiers.
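Matching discovered groups against a built-in table of known identifiers could look like the following sketch. The identifier values, field names and lookup_group() are placeholders invented for illustration; the real identifiers come from the VSEC enumeration and the Intel-PMT XML files and are not reproduced here. The counter counts (2 for energy, 7 for perf) are taken from the Background section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder identifiers; NOT the real unique IDs. */
#define EXAMPLE_ENERGY_ID 0x00000001u
#define EXAMPLE_PERF_ID   0x00000002u

/* Hypothetical built-in description of a known event group. */
struct known_group {
	uint32_t id;			/* unique identifier from VSEC enumeration */
	const char *name;		/* category of events */
	unsigned int num_counters;	/* counters per RMID */
};

static const struct known_group known_groups[] = {
	{ EXAMPLE_ENERGY_ID, "energy", 2 },
	{ EXAMPLE_PERF_ID,   "perf",   7 },
};

/* Return the built-in description for @id, or NULL if it is unknown. */
static const struct known_group *lookup_group(uint32_t id)
{
	size_t i;

	for (i = 0; i < sizeof(known_groups) / sizeof(known_groups[0]); i++)
		if (known_groups[i].id == id)
			return &known_groups[i];
	return NULL;	/* unknown identifier: group is not enabled */
}
```

Keeping this table per platform matches the choice, described below, to make event definitions platform-specific.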

The memory map for the counters for each <RMID, event> pair is described
by the XML file. This is too unwieldy to use in the Linux kernel, so a
simplified representation is built into the resctrl code. Note that the
counters are in MMIO space instead of being accessed using the IA32_QM_EVTSEL
and IA32_QM_CTR MSRs. This means there is no need for cross-processor
calls to read counters from a CPU in a specific domain. The counters
can be read from any CPU.

High level description of code changes:

1) New scope RESCTRL_PACKAGE
2) New struct rdt_resource RDT_RESOURCE_PERF_PKG
3) Refactor monitor code paths to split existing L3 paths from new ones. In some cases this ends up with:
        switch (r->rid) {
        case RDT_RESOURCE_L3:
                helper for L3
                break;
        case RDT_RESOURCE_PERF_PKG:
                helper for PKG
                break;
        }
4) New source code file "intel_aet.c" for the code to enumerate, configure, and report event counts.

With only one platform providing this feature, it's tricky to tell
exactly where it is going to go. I've made the event definitions
platform-specific (based on the unique ID from the VSEC enumeration). It
seems possible, even likely, that the list of events will change from
generation to generation.

I've picked names for events based on the descriptions in the XML file.

Signed-off-by: Tony Luck <tony.luck@intel.com>

Tony Luck (30):
  x86,fs/resctrl: Consolidate monitor event descriptions
  x86,fs/resctrl: Replace architecture event enabled checks
  x86/resctrl: Remove 'rdt_mon_features' global variable
  x86,fs/resctrl: Prepare for more monitor events
  x86,fs/resctrl: Improve domain type checking
  x86/resctrl: Move L3 initialization out of domain_add_cpu_mon()
  x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain
    types
  x86/resctrl: Clean up domain_remove_cpu_ctrl()
  x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct
    rdt_mon_domain
  x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  x86,fs/resctrl: Rename some L3 specific functions
  fs/resctrl: Make event details accessible to functions when reading
    events
  x86,fs/resctrl: Handle events that can be read from any CPU
  x86,fs/resctrl: Support binary fixed point event counters
  x86,fs/resctrl: Add an architectural hook called for each mount
  x86,fs/resctrl: Add and initialize rdt_resource for package scope core
    monitor
  x86/resctrl: Discover hardware telemetry events
  x86/resctrl: Count valid telemetry aggregators per package
  x86/resctrl: Complete telemetry event enumeration
  x86,fs/resctrl: Fill in details of Clearwater Forest events
  x86,fs/resctrl: Add architectural event pointer
  x86/resctrl: Read core telemetry events
  x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
  x86/resctrl: Add energy/perf choices to rdt boot option
  x86/resctrl: Handle number of RMIDs supported by telemetry resources
  x86,fs/resctrl: Move RMID initialization to first mount
  x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
  fs/resctrl: Provide interface to create a debugfs info directory
  x86/resctrl: Add debug info/PERF_PKG_MON/status files
  x86,fs/resctrl: Update Documentation for package events

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/filesystems/resctrl.rst         |  53 ++-
 include/linux/resctrl.h                       |  84 +++-
 include/linux/resctrl_types.h                 |  26 +-
 arch/x86/include/asm/resctrl.h                |  16 -
 arch/x86/kernel/cpu/resctrl/internal.h        |  31 +-
 fs/resctrl/internal.h                         |  56 ++-
 arch/x86/kernel/cpu/resctrl/core.c            | 333 ++++++++++----
 arch/x86/kernel/cpu/resctrl/intel_aet.c       | 411 ++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c         |  78 ++--
 fs/resctrl/ctrlmondata.c                      | 130 +++++-
 fs/resctrl/monitor.c                          | 267 +++++++-----
 fs/resctrl/rdtgroup.c                         | 253 +++++++----
 arch/x86/Kconfig                              |   5 +-
 arch/x86/kernel/cpu/resctrl/Makefile          |   1 +
 15 files changed, 1327 insertions(+), 419 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c


base-tree: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
base-branch: davidboxv2
base-commit: 4742bf1fab91403ca48efc45f7f7fd68a156a955
-- 
2.49.0



* [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions
From: Tony Luck @ 2025-06-26 16:49 UTC
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

There are currently only three monitor events, all associated with
the RDT_RESOURCE_L3 resource. Growing support for additional events
will be easier with some restructuring to have a single point in
file system code where all attributes of all events are defined.

Place all event descriptions into an array mon_event_all[]. Doing
this has the beneficial side effect of removing the need for
rdt_resource::evt_list.

Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range
checks for event ids and as the starting index to scan mon_event_all[].

Drop the code that builds evt_list and change the two places where
the list is scanned to scan mon_event_all[] instead, using a new
helper macro for_each_mon_event().

Architecture code now informs file system code which events are
available with resctrl_enable_mon_event().
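The table-driven approach can be modeled in user-space C as below. This is a simplified sketch, not the patch itself: the structure drops the rid and configurable fields, and QOS_NUM_EVENTS is assumed to be the enumerator following the last event ID. It shows designated initializers indexed by event ID, iteration starting at QOS_FIRST_EVENT, and a bounds-checked enable that rejects duplicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum resctrl_event_id {
	QOS_FIRST_EVENT           = 0x01,	/* must match first event below */
	QOS_L3_OCCUP_EVENT_ID     = 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
	QOS_NUM_EVENTS,				/* assumed: one past the last ID */
};

/* Simplified struct mon_evt: no rid or configurable fields. */
struct mon_evt {
	enum resctrl_event_id	evtid;
	const char		*name;
	bool			enabled;
};

/* Indexed by event ID; entry 0 is unused. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
	[QOS_L3_OCCUP_EVENT_ID]     = { .evtid = QOS_L3_OCCUP_EVENT_ID,     .name = "llc_occupancy" },
	[QOS_L3_MBM_TOTAL_EVENT_ID] = { .evtid = QOS_L3_MBM_TOTAL_EVENT_ID, .name = "mbm_total_bytes" },
	[QOS_L3_MBM_LOCAL_EVENT_ID] = { .evtid = QOS_L3_MBM_LOCAL_EVENT_ID, .name = "mbm_local_bytes" },
};

#define for_each_mon_event(mevt)				\
	for (mevt = &mon_event_all[QOS_FIRST_EVENT];		\
	     mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)

/* Out-of-range IDs are rejected (the patch also warns once). */
static void enable_mon_event(enum resctrl_event_id eventid)
{
	if (eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS)
		return;
	if (mon_event_all[eventid].enabled) {
		fprintf(stderr, "Duplicate enable for event %d\n", (int)eventid);
		return;
	}
	mon_event_all[eventid].enabled = true;
}

/* Walk the table the way the converted callers do. */
static int count_enabled(void)
{
	struct mon_evt *mevt;
	int n = 0;

	for_each_mon_event(mevt)
		if (mevt->enabled)
			n++;
	return n;
}
```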

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  4 +-
 include/linux/resctrl_types.h      | 12 ++++--
 fs/resctrl/internal.h              | 13 ++++--
 arch/x86/kernel/cpu/resctrl/core.c | 12 ++++--
 fs/resctrl/monitor.c               | 63 +++++++++++++++---------------
 fs/resctrl/rdtgroup.c              | 11 +++---
 6 files changed, 66 insertions(+), 49 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 6fb4894b8cfd..2944042bd84c 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -269,7 +269,6 @@ enum resctrl_schema_fmt {
  * @mon_domains:	RCU list of all monitor domains for this resource
  * @name:		Name to use in "schemata" file.
  * @schema_fmt:		Which format string and parser is used for this schema.
- * @evt_list:		List of monitoring events
  * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
  *			monitoring events can be configured.
  * @cdp_capable:	Is the CDP feature available on this resource
@@ -287,7 +286,6 @@ struct rdt_resource {
 	struct list_head	mon_domains;
 	char			*name;
 	enum resctrl_schema_fmt	schema_fmt;
-	struct list_head	evt_list;
 	unsigned int		mbm_cfg_mask;
 	bool			cdp_capable;
 };
@@ -372,6 +370,8 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
+void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+
 bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
 
 /**
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index a25fb9c4070d..2dadbc54e4b3 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -34,11 +34,15 @@
 /* Max event bits supported */
 #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
 
-/*
- * Event IDs, the values match those used to program IA32_QM_EVTSEL before
- * reading IA32_QM_CTR on RDT systems.
- */
+/* Event IDs */
 enum resctrl_event_id {
+	/* Must match value of first event below */
+	QOS_FIRST_EVENT			= 0x01,
+
+	/*
+	 * These values match those used to program IA32_QM_EVTSEL before
+	 * reading IA32_QM_CTR on RDT systems.
+	 */
 	QOS_L3_OCCUP_EVENT_ID		= 0x01,
 	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
 	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 0a1eedba2b03..445a41060724 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -52,19 +52,26 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
 }
 
 /**
- * struct mon_evt - Entry in the event list of a resource
+ * struct mon_evt - Properties of a monitor event
  * @evtid:		event id
+ * @rid:		index of the resource for this event
  * @name:		name of the event
  * @configurable:	true if the event is configurable
- * @list:		entry in &rdt_resource->evt_list
+ * @enabled:		true if the event is enabled
  */
 struct mon_evt {
 	enum resctrl_event_id	evtid;
+	enum resctrl_res_level	rid;
 	char			*name;
 	bool			configurable;
-	struct list_head	list;
+	bool			enabled;
 };
 
+extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
+
+#define for_each_mon_event(mevt) for (mevt = &mon_event_all[QOS_FIRST_EVENT];	\
+				      mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)
+
 /**
  * struct mon_data - Monitoring details for each event file.
  * @list:            Member of the global @mon_data_kn_priv_list list.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 187d527ef73b..7fcae25874fe 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -864,12 +864,18 @@ static __init bool get_rdt_mon_resources(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
-	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
+	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
 		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
-	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL))
+	}
+	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
 		rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID);
-	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))
+	}
+	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
 		rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID);
+	}
 
 	if (!rdt_mon_features)
 		return false;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index f5637855c3ac..2313e48de55f 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -844,38 +844,39 @@ static void dom_data_exit(struct rdt_resource *r)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-static struct mon_evt llc_occupancy_event = {
-	.name		= "llc_occupancy",
-	.evtid		= QOS_L3_OCCUP_EVENT_ID,
-};
-
-static struct mon_evt mbm_total_event = {
-	.name		= "mbm_total_bytes",
-	.evtid		= QOS_L3_MBM_TOTAL_EVENT_ID,
-};
-
-static struct mon_evt mbm_local_event = {
-	.name		= "mbm_local_bytes",
-	.evtid		= QOS_L3_MBM_LOCAL_EVENT_ID,
-};
-
 /*
- * Initialize the event list for the resource.
- *
- * Note that MBM events are also part of RDT_RESOURCE_L3 resource
- * because as per the SDM the total and local memory bandwidth
- * are enumerated as part of L3 monitoring.
+ * All available events. Architecture code marks the ones that
+ * are supported by a system using resctrl_enable_mon_event()
+ * to set .enabled.
  */
-static void l3_mon_evt_init(struct rdt_resource *r)
+struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
+	[QOS_L3_OCCUP_EVENT_ID] = {
+		.name	= "llc_occupancy",
+		.evtid	= QOS_L3_OCCUP_EVENT_ID,
+		.rid	= RDT_RESOURCE_L3,
+	},
+	[QOS_L3_MBM_TOTAL_EVENT_ID] = {
+		.name	= "mbm_total_bytes",
+		.evtid	= QOS_L3_MBM_TOTAL_EVENT_ID,
+		.rid	= RDT_RESOURCE_L3,
+	},
+	[QOS_L3_MBM_LOCAL_EVENT_ID] = {
+		.name	= "mbm_local_bytes",
+		.evtid	= QOS_L3_MBM_LOCAL_EVENT_ID,
+		.rid	= RDT_RESOURCE_L3,
+	},
+};
+
+void resctrl_enable_mon_event(enum resctrl_event_id eventid)
 {
-	INIT_LIST_HEAD(&r->evt_list);
+	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+		return;
+	if (mon_event_all[eventid].enabled) {
+		pr_warn("Duplicate enable for event %d\n", eventid);
+		return;
+	}
 
-	if (resctrl_arch_is_llc_occupancy_enabled())
-		list_add_tail(&llc_occupancy_event.list, &r->evt_list);
-	if (resctrl_arch_is_mbm_total_enabled())
-		list_add_tail(&mbm_total_event.list, &r->evt_list);
-	if (resctrl_arch_is_mbm_local_enabled())
-		list_add_tail(&mbm_local_event.list, &r->evt_list);
+	mon_event_all[eventid].enabled = true;
 }
 
 /**
@@ -902,15 +903,13 @@ int resctrl_mon_resource_init(void)
 	if (ret)
 		return ret;
 
-	l3_mon_evt_init(r);
-
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
-		mbm_total_event.configurable = true;
+		mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
 		resctrl_file_fflags_init("mbm_total_bytes_config",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 	}
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_LOCAL_EVENT_ID)) {
-		mbm_local_event.configurable = true;
+		mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].configurable = true;
 		resctrl_file_fflags_init("mbm_local_bytes_config",
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 	}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 77d08229d855..b95501d4b5de 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1152,7 +1152,9 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
 	struct mon_evt *mevt;
 
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	for_each_mon_event(mevt) {
+		if (mevt->rid != r->rid || !mevt->enabled)
+			continue;
 		seq_printf(seq, "%s\n", mevt->name);
 		if (mevt->configurable)
 			seq_printf(seq, "%s_config\n", mevt->name);
@@ -3057,10 +3059,9 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 	struct mon_evt *mevt;
 	int ret, domid;
 
-	if (WARN_ON(list_empty(&r->evt_list)))
-		return -EPERM;
-
-	list_for_each_entry(mevt, &r->evt_list, list) {
+	for_each_mon_event(mevt) {
+		if (mevt->rid != r->rid || !mevt->enabled)
+			continue;
 		domid = do_sum ? d->ci_id : d->hdr.id;
 		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
 		if (WARN_ON_ONCE(!priv))
-- 
2.49.0



* [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks
From: Tony Luck @ 2025-06-26 16:49 UTC
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The resctrl file system now has complete knowledge of the status
of every event. So there is no need for per-event function calls
to check.

Replace each of the resctrl_arch_is_{event}_enabled() calls with
resctrl_is_mon_event_enabled(QOS_{EVENT}).

No functional change.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h               |  2 ++
 arch/x86/include/asm/resctrl.h        | 15 ---------------
 arch/x86/kernel/cpu/resctrl/core.c    |  4 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c |  4 ++--
 fs/resctrl/ctrlmondata.c              |  4 ++--
 fs/resctrl/monitor.c                  | 16 +++++++++++-----
 fs/resctrl/rdtgroup.c                 | 18 +++++++++---------
 7 files changed, 28 insertions(+), 35 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 2944042bd84c..40aba6b5d4f0 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -372,6 +372,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
 void resctrl_enable_mon_event(enum resctrl_event_id eventid);
 
+bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
+
 bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
 
 /**
diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index feb93b50e990..b1dd5d6b87db 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -84,21 +84,6 @@ static inline void resctrl_arch_disable_mon(void)
 	static_branch_dec_cpuslocked(&rdt_enable_key);
 }
 
-static inline bool resctrl_arch_is_llc_occupancy_enabled(void)
-{
-	return (rdt_mon_features & (1 << QOS_L3_OCCUP_EVENT_ID));
-}
-
-static inline bool resctrl_arch_is_mbm_total_enabled(void)
-{
-	return (rdt_mon_features & (1 << QOS_L3_MBM_TOTAL_EVENT_ID));
-}
-
-static inline bool resctrl_arch_is_mbm_local_enabled(void)
-{
-	return (rdt_mon_features & (1 << QOS_L3_MBM_LOCAL_EVENT_ID));
-}
-
 /*
  * __resctrl_sched_in() - Writes the task's CLOSid/RMID to IA32_PQR_MSR
  *
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7fcae25874fe..1a319ce9328c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -402,13 +402,13 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
 {
 	size_t tsize;
 
-	if (resctrl_arch_is_mbm_total_enabled()) {
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
 		tsize = sizeof(*hw_dom->arch_mbm_total);
 		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
 		if (!hw_dom->arch_mbm_total)
 			return -ENOMEM;
 	}
-	if (resctrl_arch_is_mbm_local_enabled()) {
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
 		tsize = sizeof(*hw_dom->arch_mbm_local);
 		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
 		if (!hw_dom->arch_mbm_local) {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c261558276cd..61d38517e2bf 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -207,11 +207,11 @@ void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *
 {
 	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 
-	if (resctrl_arch_is_mbm_total_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
 		memset(hw_dom->arch_mbm_total, 0,
 		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
 
-	if (resctrl_arch_is_mbm_local_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
 		memset(hw_dom->arch_mbm_local, 0,
 		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
 }
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index d98e0d2de09f..ad7ffc6acf13 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -473,12 +473,12 @@ ssize_t rdtgroup_mba_mbps_event_write(struct kernfs_open_file *of,
 	rdt_last_cmd_clear();
 
 	if (!strcmp(buf, "mbm_local_bytes")) {
-		if (resctrl_arch_is_mbm_local_enabled())
+		if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
 			rdtgrp->mba_mbps_event = QOS_L3_MBM_LOCAL_EVENT_ID;
 		else
 			ret = -EINVAL;
 	} else if (!strcmp(buf, "mbm_total_bytes")) {
-		if (resctrl_arch_is_mbm_total_enabled())
+		if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
 			rdtgrp->mba_mbps_event = QOS_L3_MBM_TOTAL_EVENT_ID;
 		else
 			ret = -EINVAL;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 2313e48de55f..9e988b2c1a22 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -336,7 +336,7 @@ void free_rmid(u32 closid, u32 rmid)
 
 	entry = __rmid_entry(idx);
 
-	if (resctrl_arch_is_llc_occupancy_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
 		add_rmid_to_limbo(entry);
 	else
 		list_add_tail(&entry->list, &rmid_free_lru);
@@ -637,10 +637,10 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
 	 * This is protected from concurrent reads from user as both
 	 * the user and overflow handler hold the global mutex.
 	 */
-	if (resctrl_arch_is_mbm_total_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
 		mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_TOTAL_EVENT_ID);
 
-	if (resctrl_arch_is_mbm_local_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
 		mbm_update_one_event(r, d, closid, rmid, QOS_L3_MBM_LOCAL_EVENT_ID);
 }
 
@@ -879,6 +879,12 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
 	mon_event_all[eventid].enabled = true;
 }
 
+bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
+{
+	return eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS &&
+	       mon_event_all[eventid].enabled;
+}
+
 /**
  * resctrl_mon_resource_init() - Initialise global monitoring structures.
  *
@@ -914,9 +920,9 @@ int resctrl_mon_resource_init(void)
 					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
 	}
 
-	if (resctrl_arch_is_mbm_local_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
 		mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
-	else if (resctrl_arch_is_mbm_total_enabled())
+	else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
 		mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
 
 	return 0;
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index b95501d4b5de..a7eeb33501da 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -123,8 +123,8 @@ void rdt_staged_configs_clear(void)
 
 static bool resctrl_is_mbm_enabled(void)
 {
-	return (resctrl_arch_is_mbm_total_enabled() ||
-		resctrl_arch_is_mbm_local_enabled());
+	return (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID) ||
+		resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID));
 }
 
 static bool resctrl_is_mbm_event(int e)
@@ -196,7 +196,7 @@ static int closid_alloc(void)
 	lockdep_assert_held(&rdtgroup_mutex);
 
 	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID) &&
-	    resctrl_arch_is_llc_occupancy_enabled()) {
+	    resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
 		cleanest_closid = resctrl_find_cleanest_closid();
 		if (cleanest_closid < 0)
 			return cleanest_closid;
@@ -4051,7 +4051,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 
 	if (resctrl_is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
-	if (resctrl_arch_is_llc_occupancy_enabled() && has_busy_rmid(d)) {
+	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
 		/*
 		 * When a package is going down, forcefully
 		 * decrement rmid->ebusy. There is no way to know
@@ -4087,12 +4087,12 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	size_t tsize;
 
-	if (resctrl_arch_is_llc_occupancy_enabled()) {
+	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
 		d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
 		if (!d->rmid_busy_llc)
 			return -ENOMEM;
 	}
-	if (resctrl_arch_is_mbm_total_enabled()) {
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
 		tsize = sizeof(*d->mbm_total);
 		d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
 		if (!d->mbm_total) {
@@ -4100,7 +4100,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain
 			return -ENOMEM;
 		}
 	}
-	if (resctrl_arch_is_mbm_local_enabled()) {
+	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
 		tsize = sizeof(*d->mbm_local);
 		d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
 		if (!d->mbm_local) {
@@ -4145,7 +4145,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
 					   RESCTRL_PICK_ANY_CPU);
 	}
 
-	if (resctrl_arch_is_llc_occupancy_enabled())
+	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
 		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
 
 	/*
@@ -4220,7 +4220,7 @@ void resctrl_offline_cpu(unsigned int cpu)
 			cancel_delayed_work(&d->mbm_over);
 			mbm_setup_overflow_handler(d, 0, cpu);
 		}
-		if (resctrl_arch_is_llc_occupancy_enabled() &&
+		if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) &&
 		    cpu == d->cqm_work_cpu && has_busy_rmid(d)) {
 			cancel_delayed_work(&d->cqm_limbo);
 			cqm_setup_limbo_handler(d, 0, cpu);
-- 
2.49.0



* [PATCH v6 03/30] x86/resctrl: Remove 'rdt_mon_features' global variable
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
  2025-06-26 16:49 ` [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
  2025-06-26 16:49 ` [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 20:53   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 04/30] x86,fs/resctrl: Prepare for more monitor events Tony Luck
                   ` (29 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

rdt_mon_features is used as a bitmask of enabled monitor events. Each
monitor event's status is now maintained in mon_evt::enabled, and the
mon_evt structures for all monitor events live in the file system's
mon_event_all[] array.

Remove the remaining uses of rdt_mon_features. get_rdt_mon_resources()
now uses a local boolean to record whether any monitor event was found.
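A minimal user-space sketch of the pattern get_rdt_mon_resources() follows
after this change: per-event flags are the single source of truth, and a
local bool replaces the global bitmask. struct model_cpu_features, the EV_*
values and all model_* names are hypothetical; the real function checks
rdt_cpu_has() and also calls rdt_get_mon_l3_config(), which is omitted here.

```c
#include <stdbool.h>

/* Hypothetical CPU feature flags standing in for rdt_cpu_has() checks. */
struct model_cpu_features {
	bool cqm_occup_llc;
	bool cqm_mbm_total;
	bool cqm_mbm_local;
};

enum { EV_OCCUP, EV_MBM_TOTAL, EV_MBM_LOCAL, EV_NUM };

/* Per-event enabled flags, playing the role of mon_evt::enabled. */
static bool event_enabled[EV_NUM];

static void enable_event(int ev)
{
	event_enabled[ev] = true;
}

/*
 * Same shape as get_rdt_mon_resources() after this patch: "ret" only
 * answers "was any event enabled?", so no global bitmask is kept.
 */
static bool model_get_mon_resources(const struct model_cpu_features *f)
{
	bool ret = false;

	if (f->cqm_occup_llc) {
		enable_event(EV_OCCUP);
		ret = true;
	}
	if (f->cqm_mbm_total) {
		enable_event(EV_MBM_TOTAL);
		ret = true;
	}
	if (f->cqm_mbm_local) {
		enable_event(EV_MBM_LOCAL);
		ret = true;
	}

	return ret;
}
```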

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/resctrl.h        | 1 -
 arch/x86/kernel/cpu/resctrl/core.c    | 9 +++++----
 arch/x86/kernel/cpu/resctrl/monitor.c | 5 -----
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index b1dd5d6b87db..575f8408a9e7 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -44,7 +44,6 @@ DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
 extern bool rdt_alloc_capable;
 extern bool rdt_mon_capable;
-extern unsigned int rdt_mon_features;
 
 DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1a319ce9328c..5d14f9a14eda 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -863,21 +863,22 @@ static __init bool get_rdt_alloc_resources(void)
 static __init bool get_rdt_mon_resources(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
 		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
-		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
+		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
 		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
-		rdt_mon_features |= (1 << QOS_L3_MBM_TOTAL_EVENT_ID);
+		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
 		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
-		rdt_mon_features |= (1 << QOS_L3_MBM_LOCAL_EVENT_ID);
+		ret = true;
 	}
 
-	if (!rdt_mon_features)
+	if (!ret)
 		return false;
 
 	return !rdt_get_mon_l3_config(r);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 61d38517e2bf..07f8ab097cbe 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -31,11 +31,6 @@
  */
 bool rdt_mon_capable;
 
-/*
- * Global to indicate which monitoring events are enabled.
- */
-unsigned int rdt_mon_features;
-
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
 static int snc_nodes_per_l3_cache = 1;
-- 
2.49.0



* [PATCH v6 04/30] x86,fs/resctrl: Prepare for more monitor events
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (2 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 03/30] x86/resctrl: Remove 'rdt_mon_features' global variable Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 20:55   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 05/30] x86,fs/resctrl: Improve domain type checking Tony Luck
                   ` (28 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

There's a rule in computer programming that objects should appear zero
times, once, or many times (the "zero, one, infinity" rule). So code
accordingly.

There are two MBM events, and resctrl has a lot of code of the form:

        if (local)
                do one thing
        if (total)
                do a different thing

Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays
of pointers to per event data instead of explicit fields for total and
local bandwidth.

Simplify the code by looping over all MBM events and acting only on
those that are enabled.
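As a rough user-space sketch of the new layout: one pointer per MBM event,
indexed by MBM_STATE_IDX(), with a single allocation loop and one cleanup
path replacing the separate total/local branches. The event values mirror
this patch's definitions but are assumptions here; struct
model_mbm_state, struct model_mon_domain, model_mbm_alloc() and the
"enabled" array (indexed by MBM index rather than event id) are
hypothetical simplifications of arch_domain_mbm_alloc().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed event ids, modeled on enum resctrl_event_id. */
enum {
	QOS_L3_OCCUP_EVENT_ID     = 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

#define QOS_NUM_L3_MBM_EVENTS	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
#define MBM_STATE_IDX(evt)	((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)

/* Hypothetical per-RMID counter state. */
struct model_mbm_state {
	unsigned long long prev_msr;
};

/* One pointer per MBM event instead of separate total/local fields. */
struct model_mon_domain {
	struct model_mbm_state *mbm_states[QOS_NUM_L3_MBM_EVENTS];
};

/*
 * Allocate state only for enabled MBM events. On failure, free every
 * slot (free(NULL) is a no-op) so the domain is left with all NULLs.
 */
static int model_mbm_alloc(struct model_mon_domain *d, size_t num_rmid,
			   const bool enabled[QOS_NUM_L3_MBM_EVENTS])
{
	int eventid, idx;

	for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID;
	     eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++) {
		idx = MBM_STATE_IDX(eventid);
		if (!enabled[idx])
			continue;
		d->mbm_states[idx] = calloc(num_rmid, sizeof(*d->mbm_states[idx]));
		if (!d->mbm_states[idx])
			goto cleanup;
	}
	return 0;

cleanup:
	for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++) {
		free(d->mbm_states[idx]);
		d->mbm_states[idx] = NULL;
	}
	return -ENOMEM;
}
```

Adding a third MBM event would then only require widening the event range,
not another copy of the allocate/free code.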

Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
widely. Also provide the for_each_mbm_event_id() and for_each_mbm_idx()
helper macros.
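A small user-space sketch of how the helpers fit together when looking up
per-RMID state, following the shape of get_mbm_state() in this patch:
non-MBM event IDs and unallocated arrays both yield NULL. The model_*
names and struct model_mbm_state are hypothetical, and the event values
are assumed to match enum resctrl_event_id.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed event ids, modeled on enum resctrl_event_id. */
enum model_event_id {
	QOS_L3_OCCUP_EVENT_ID     = 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID = 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID = 0x03,
};

#define QOS_NUM_L3_MBM_EVENTS	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
#define MBM_STATE_IDX(evt)	((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)

static inline bool model_is_mbm_event(enum model_event_id eventid)
{
	return eventid >= QOS_L3_MBM_TOTAL_EVENT_ID &&
	       eventid <= QOS_L3_MBM_LOCAL_EVENT_ID;
}

/* Iterate over all MBM event ids (C allows ++ on an enum variable). */
#define for_each_mbm_event_id(eventid)				\
	for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID;		\
	     eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)

/* Hypothetical per-RMID saved state. */
struct model_mbm_state {
	unsigned long long chunks;
};

/* NULL for non-MBM ids or when the event's array was never allocated. */
static struct model_mbm_state *
model_get_mbm_state(struct model_mbm_state *states[QOS_NUM_L3_MBM_EVENTS],
		    unsigned int rmid, enum model_event_id eventid)
{
	struct model_mbm_state *state;

	if (!model_is_mbm_event(eventid))
		return NULL;

	state = states[MBM_STATE_IDX(eventid)];
	return state ? &state[rmid] : NULL;
}
```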

Clean up variable names in the functions touched so that variables of
type enum resctrl_event_id are consistently named "eventid".

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h                | 23 +++++++++---
 include/linux/resctrl_types.h          |  3 ++
 arch/x86/kernel/cpu/resctrl/internal.h |  8 ++---
 arch/x86/kernel/cpu/resctrl/core.c     | 40 +++++++++++----------
 arch/x86/kernel/cpu/resctrl/monitor.c  | 36 +++++++++----------
 fs/resctrl/monitor.c                   | 13 ++++---
 fs/resctrl/rdtgroup.c                  | 50 +++++++++++++-------------
 7 files changed, 96 insertions(+), 77 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 40aba6b5d4f0..478d7a935ca3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -161,8 +161,9 @@ struct rdt_ctrl_domain {
  * @hdr:		common header for different domain types
  * @ci_id:		cache info id for this domain
  * @rmid_busy_llc:	bitmap of which limbo RMIDs are above threshold
- * @mbm_total:		saved state for MBM total bandwidth
- * @mbm_local:		saved state for MBM local bandwidth
+ * @mbm_states:		Per-event pointer to the MBM event's saved state.
+ *			An MBM event's state is an array of struct mbm_state
+ *			indexed by RMID on x86 or combined CLOSID, RMID on Arm.
  * @mbm_over:		worker to periodically read MBM h/w counters
  * @cqm_limbo:		worker to periodically read CQM h/w counters
  * @mbm_work_cpu:	worker CPU for MBM h/w counters
@@ -172,8 +173,7 @@ struct rdt_mon_domain {
 	struct rdt_domain_hdr		hdr;
 	unsigned int			ci_id;
 	unsigned long			*rmid_busy_llc;
-	struct mbm_state		*mbm_total;
-	struct mbm_state		*mbm_local;
+	struct mbm_state		*mbm_states[QOS_NUM_L3_MBM_EVENTS];
 	struct delayed_work		mbm_over;
 	struct delayed_work		cqm_limbo;
 	int				mbm_work_cpu;
@@ -376,6 +376,21 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
 bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
 
+static inline bool resctrl_is_mbm_event(enum resctrl_event_id eventid)
+{
+	return (eventid >= QOS_L3_MBM_TOTAL_EVENT_ID &&
+		eventid <= QOS_L3_MBM_LOCAL_EVENT_ID);
+}
+
+/* Iterate over all memory bandwidth events */
+#define for_each_mbm_event_id(eventid)				\
+	for (eventid = QOS_L3_MBM_TOTAL_EVENT_ID;		\
+	     eventid <= QOS_L3_MBM_LOCAL_EVENT_ID; eventid++)
+
+/* Iterate over memory bandwidth arrays in domain structures */
+#define for_each_mbm_idx(idx)					\
+	for (idx = 0; idx < QOS_NUM_L3_MBM_EVENTS; idx++)
+
 /**
  * resctrl_arch_mon_event_config_write() - Write the config for an event.
  * @config_info: struct resctrl_mon_config_info describing the resource, domain
diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index 2dadbc54e4b3..d98351663c2c 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -51,4 +51,7 @@ enum resctrl_event_id {
 	QOS_NUM_EVENTS,
 };
 
+#define QOS_NUM_L3_MBM_EVENTS	(QOS_L3_MBM_LOCAL_EVENT_ID - QOS_L3_MBM_TOTAL_EVENT_ID + 1)
+#define MBM_STATE_IDX(evt)	((evt) - QOS_L3_MBM_TOTAL_EVENT_ID)
+
 #endif /* __LINUX_RESCTRL_TYPES_H */
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5e3c41b36437..58dca892a5df 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -54,15 +54,15 @@ struct rdt_hw_ctrl_domain {
  * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
  *			      a resource for a monitor function
  * @d_resctrl:	Properties exposed to the resctrl file system
- * @arch_mbm_total:	arch private state for MBM total bandwidth
- * @arch_mbm_local:	arch private state for MBM local bandwidth
+ * @arch_mbm_states:	Per-event pointer to the MBM event's saved state.
+ *			An MBM event's state is an array of struct arch_mbm_state
+ *			indexed by RMID on x86.
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
 struct rdt_hw_mon_domain {
 	struct rdt_mon_domain		d_resctrl;
-	struct arch_mbm_state		*arch_mbm_total;
-	struct arch_mbm_state		*arch_mbm_local;
+	struct arch_mbm_state		*arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
 };
 
 static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctrl_domain *r)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5d14f9a14eda..fbf019c1ff11 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -365,8 +365,10 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
 
 static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
 {
-	kfree(hw_dom->arch_mbm_total);
-	kfree(hw_dom->arch_mbm_local);
+	int idx;
+
+	for_each_mbm_idx(idx)
+		kfree(hw_dom->arch_mbm_states[idx]);
 	kfree(hw_dom);
 }
 
@@ -400,25 +402,27 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  */
 static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
 {
-	size_t tsize;
-
-	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
-		tsize = sizeof(*hw_dom->arch_mbm_total);
-		hw_dom->arch_mbm_total = kcalloc(num_rmid, tsize, GFP_KERNEL);
-		if (!hw_dom->arch_mbm_total)
-			return -ENOMEM;
-	}
-	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
-		tsize = sizeof(*hw_dom->arch_mbm_local);
-		hw_dom->arch_mbm_local = kcalloc(num_rmid, tsize, GFP_KERNEL);
-		if (!hw_dom->arch_mbm_local) {
-			kfree(hw_dom->arch_mbm_total);
-			hw_dom->arch_mbm_total = NULL;
-			return -ENOMEM;
-		}
+	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
+	enum resctrl_event_id eventid;
+	int idx;
+
+	for_each_mbm_event_id(eventid) {
+		if (!resctrl_is_mon_event_enabled(eventid))
+			continue;
+		idx = MBM_STATE_IDX(eventid);
+		hw_dom->arch_mbm_states[idx] = kcalloc(num_rmid, tsize, GFP_KERNEL);
+		if (!hw_dom->arch_mbm_states[idx])
+			goto cleanup;
 	}
 
 	return 0;
+cleanup:
+	for_each_mbm_idx(idx) {
+		kfree(hw_dom->arch_mbm_states[idx]);
+		hw_dom->arch_mbm_states[idx] = NULL;
+	}
+
+	return -ENOMEM;
 }
 
 static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 07f8ab097cbe..f01db2034d08 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -161,18 +161,14 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
 						 u32 rmid,
 						 enum resctrl_event_id eventid)
 {
-	switch (eventid) {
-	case QOS_L3_OCCUP_EVENT_ID:
-		return NULL;
-	case QOS_L3_MBM_TOTAL_EVENT_ID:
-		return &hw_dom->arch_mbm_total[rmid];
-	case QOS_L3_MBM_LOCAL_EVENT_ID:
-		return &hw_dom->arch_mbm_local[rmid];
-	default:
-		/* Never expect to get here */
-		WARN_ON_ONCE(1);
+	struct arch_mbm_state *state;
+
+	if (!resctrl_is_mbm_event(eventid))
 		return NULL;
-	}
+
+	state = hw_dom->arch_mbm_states[MBM_STATE_IDX(eventid)];
+
+	return state ? &state[rmid] : NULL;
 }
 
 void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
@@ -201,14 +197,16 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
 void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
 {
 	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
-
-	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
-		memset(hw_dom->arch_mbm_total, 0,
-		       sizeof(*hw_dom->arch_mbm_total) * r->num_rmid);
-
-	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
-		memset(hw_dom->arch_mbm_local, 0,
-		       sizeof(*hw_dom->arch_mbm_local) * r->num_rmid);
+	enum resctrl_event_id eventid;
+	int idx;
+
+	for_each_mbm_event_id(eventid) {
+		if (!resctrl_is_mon_event_enabled(eventid))
+			continue;
+		idx = MBM_STATE_IDX(eventid);
+		memset(hw_dom->arch_mbm_states[idx], 0,
+		       sizeof(*hw_dom->arch_mbm_states[0]) * r->num_rmid);
+	}
 }
 
 static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9e988b2c1a22..dcc6c00eb362 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -346,15 +346,14 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
 				       u32 rmid, enum resctrl_event_id evtid)
 {
 	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
+	struct mbm_state *state;
 
-	switch (evtid) {
-	case QOS_L3_MBM_TOTAL_EVENT_ID:
-		return &d->mbm_total[idx];
-	case QOS_L3_MBM_LOCAL_EVENT_ID:
-		return &d->mbm_local[idx];
-	default:
+	if (!resctrl_is_mbm_event(evtid))
 		return NULL;
-	}
+
+	state = d->mbm_states[MBM_STATE_IDX(evtid)];
+
+	return state ? &state[idx] : NULL;
 }
 
 static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a7eeb33501da..77336d5e4915 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -127,12 +127,6 @@ static bool resctrl_is_mbm_enabled(void)
 		resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID));
 }
 
-static bool resctrl_is_mbm_event(int e)
-{
-	return (e >= QOS_L3_MBM_TOTAL_EVENT_ID &&
-		e <= QOS_L3_MBM_LOCAL_EVENT_ID);
-}
-
 /*
  * Trivial allocator for CLOSIDs. Use BITMAP APIs to manipulate a bitmap
  * of free CLOSIDs.
@@ -4023,9 +4017,13 @@ static void rdtgroup_setup_default(void)
 
 static void domain_destroy_mon_state(struct rdt_mon_domain *d)
 {
+	int idx;
+
 	bitmap_free(d->rmid_busy_llc);
-	kfree(d->mbm_total);
-	kfree(d->mbm_local);
+	for_each_mbm_idx(idx) {
+		kfree(d->mbm_states[idx]);
+		d->mbm_states[idx] = NULL;
+	}
 }
 
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
@@ -4085,32 +4083,34 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
-	size_t tsize;
+	size_t tsize = sizeof(*d->mbm_states[0]);
+	enum resctrl_event_id eventid;
+	int idx;
 
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID)) {
 		d->rmid_busy_llc = bitmap_zalloc(idx_limit, GFP_KERNEL);
 		if (!d->rmid_busy_llc)
 			return -ENOMEM;
 	}
-	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID)) {
-		tsize = sizeof(*d->mbm_total);
-		d->mbm_total = kcalloc(idx_limit, tsize, GFP_KERNEL);
-		if (!d->mbm_total) {
-			bitmap_free(d->rmid_busy_llc);
-			return -ENOMEM;
-		}
-	}
-	if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID)) {
-		tsize = sizeof(*d->mbm_local);
-		d->mbm_local = kcalloc(idx_limit, tsize, GFP_KERNEL);
-		if (!d->mbm_local) {
-			bitmap_free(d->rmid_busy_llc);
-			kfree(d->mbm_total);
-			return -ENOMEM;
-		}
+
+	for_each_mbm_event_id(eventid) {
+		if (!resctrl_is_mon_event_enabled(eventid))
+			continue;
+		idx = MBM_STATE_IDX(eventid);
+		d->mbm_states[idx] = kcalloc(idx_limit, tsize, GFP_KERNEL);
+		if (!d->mbm_states[idx])
+			goto cleanup;
 	}
 
 	return 0;
+cleanup:
+	bitmap_free(d->rmid_busy_llc);
+	for_each_mbm_idx(idx) {
+		kfree(d->mbm_states[idx]);
+		d->mbm_states[idx] = NULL;
+	}
+
+	return -ENOMEM;
 }
 
 int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d)
-- 
2.49.0



* [PATCH v6 05/30] x86,fs/resctrl: Improve domain type checking
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (3 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 04/30] x86,fs/resctrl: Prepare for more monitor events Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 21:01   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 06/30] x86/resctrl: Move L3 initialization out of domain_add_cpu_mon() Tony Luck
                   ` (27 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The rdt_domain_hdr structure is used in both control and monitor
domain structures to provide common methods for operations such as
adding a CPU to a domain, removing a CPU from a domain, and accessing
the mask of all CPUs in a domain.

The "type" field provides a simple check whether a domain is a
control or monitor domain so that programming errors operating
on domains will be quickly caught.

To prepare for additional domain types that depend on the rdt_resource
to which they are connected, add the resource id to the header and
check it in addition to the type.

At this point all monitoring events are tied to the RDT_RESOURCE_L3
resource, so hard-code RDT_RESOURCE_L3 as the resource id checked in
rdtgroup_mondata_show().
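A user-space sketch of the combined type and resource check, plus recovery
of the enclosing domain from its header in the style of container_of().
The enums, structs, model_container_of() and model_header_is_valid() are
hypothetical simplifications; the kernel helper domain_header_is_valid()
uses WARN_ON_ONCE(), which this sketch replaces with a message on stderr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel enums; values are assumptions. */
enum model_domain_type { MODEL_CTRL_DOMAIN, MODEL_MON_DOMAIN };
enum model_res_level { MODEL_RESOURCE_L3, MODEL_RESOURCE_OTHER };

struct model_domain_hdr {
	int id;
	enum model_domain_type type;
	enum model_res_level rid;
};

struct model_mon_domain {
	struct model_domain_hdr hdr;
	unsigned int ci_id;
};

/* Same arithmetic as the kernel's container_of(), without type checks. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Both the domain type and the owning resource must match. */
static bool model_header_is_valid(const struct model_domain_hdr *hdr,
				  enum model_domain_type type,
				  enum model_res_level rid)
{
	if (hdr->type != type || hdr->rid != rid) {
		fprintf(stderr, "domain %d: unexpected type/rid\n", hdr->id);
		return false;
	}
	return true;
}
```

Checking the resource id as well as the type catches a monitor domain of one
resource being handed to code that expects another resource's monitor
domain, which the type check alone cannot detect.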

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  9 +++++++++
 arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
 fs/resctrl/ctrlmondata.c           |  2 +-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 478d7a935ca3..dc7ccd60e8c2 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -131,15 +131,24 @@ enum resctrl_domain_type {
  * @list:		all instances of this resource
  * @id:			unique id for this instance
  * @type:		type of this instance
+ * @rid:		index of resource for this domain
  * @cpu_mask:		which CPUs share this resource
  */
 struct rdt_domain_hdr {
 	struct list_head		list;
 	int				id;
 	enum resctrl_domain_type	type;
+	enum resctrl_res_level		rid;
 	struct cpumask			cpu_mask;
 };
 
+static inline bool domain_header_is_valid(struct rdt_domain_hdr *hdr,
+					  enum resctrl_domain_type type,
+					  enum resctrl_res_level rid)
+{
+	return !WARN_ON_ONCE(hdr->type != type || hdr->rid != rid);
+}
+
 /**
  * struct rdt_ctrl_domain - group of CPUs sharing a resctrl control resource
  * @hdr:		common header for different domain types
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index fbf019c1ff11..420e4eb7c160 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -459,7 +459,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 
 	hdr = resctrl_find_domain(&r->ctrl_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_ctrl_domain, hdr);
 
@@ -476,6 +476,7 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_CTRL_DOMAIN;
+	d->hdr.rid = r->rid;
 	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
 
 	rdt_domain_reconfigure_cdp(r);
@@ -515,7 +516,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 
 	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
 	if (hdr) {
-		if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 			return;
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 
@@ -530,6 +531,7 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	d = &hw_dom->d_resctrl;
 	d->hdr.id = id;
 	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = r->rid;
 	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!ci) {
 		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
@@ -586,7 +588,7 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (WARN_ON_ONCE(hdr->type != RESCTRL_CTRL_DOMAIN))
+	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
@@ -632,7 +634,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		return;
 	}
 
-	if (WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN))
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 		return;
 
 	d = container_of(hdr, struct rdt_mon_domain, hdr);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index ad7ffc6acf13..cdb4bc8baa99 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -643,7 +643,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		 * the resource to find the domain with "domid".
 		 */
 		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
-		if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
+		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
 			ret = -ENOENT;
 			goto out;
 		}
-- 
2.49.0



* [PATCH v6 06/30] x86/resctrl: Move L3 initialization out of domain_add_cpu_mon()
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (4 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 05/30] x86,fs/resctrl: Improve domain type checking Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 20:56   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 07/30] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
                   ` (26 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

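To prepare for additional types of monitoring domains, move all of the
L3 resource monitoring domain initialization out of domain_add_cpu_mon()
and into a new helper function, l3_mon_domain_setup(). The name was
chosen to pair with the existing l3_mon_domain_free().

domain_add_cpu_mon() keeps the generic lookup of an existing domain and
dispatches on the resource id when a new domain must be created.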
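A heavily simplified user-space sketch of the resulting control flow: the
generic part either adds the CPU to an existing domain or dispatches on the
resource id to a resource-specific setup routine. Every name here
(struct model_resource, find_domain(), l3_setup(), model_add_cpu_mon(),
MAX_DOMAINS) is hypothetical; the kernel version uses list_head, cpumask
and resctrl_find_domain() instead of a fixed array and a bitmask.

```c
#include <stdio.h>

enum model_res_level { MODEL_RESOURCE_L3, MODEL_RESOURCE_OTHER };

#define MAX_DOMAINS 8

/* Hypothetical monitor domain: id plus a CPU bitmask (cpu < bits in long). */
struct model_domain {
	int id;
	unsigned long cpu_mask;
};

struct model_resource {
	enum model_res_level rid;
	struct model_domain domains[MAX_DOMAINS];
	int num_domains;
};

static struct model_domain *find_domain(struct model_resource *r, int id)
{
	for (int i = 0; i < r->num_domains; i++)
		if (r->domains[i].id == id)
			return &r->domains[i];
	return NULL;
}

/* Resource-specific setup, analogous in role to l3_mon_domain_setup(). */
static void l3_setup(struct model_resource *r, int cpu, int id)
{
	struct model_domain *d;

	if (r->num_domains == MAX_DOMAINS)
		return;
	d = &r->domains[r->num_domains++];
	d->id = id;
	d->cpu_mask = 1UL << cpu;
}

/* Generic part: join an existing domain, else dispatch on resource id. */
static void model_add_cpu_mon(struct model_resource *r, int cpu, int id)
{
	struct model_domain *d = find_domain(r, id);

	if (d) {
		d->cpu_mask |= 1UL << cpu;
		return;
	}

	switch (r->rid) {
	case MODEL_RESOURCE_L3:
		l3_setup(r, cpu, id);
		break;
	default:
		fprintf(stderr, "no setup for resource %d\n", (int)r->rid);
		break;
	}
}
```

A new monitoring resource would then need only its own setup routine and an
extra case in the switch.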

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 55 ++++++++++++++++++------------
 1 file changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 420e4eb7c160..20b6f2bbf858 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -496,34 +496,13 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 	}
 }
 
-static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
 {
-	int id = get_domain_id_from_scope(cpu, r->mon_scope);
-	struct list_head *add_pos = NULL;
 	struct rdt_hw_mon_domain *hw_dom;
-	struct rdt_domain_hdr *hdr;
 	struct rdt_mon_domain *d;
 	struct cacheinfo *ci;
 	int err;
 
-	lockdep_assert_held(&domain_list_lock);
-
-	if (id < 0) {
-		pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
-			     cpu, r->mon_scope, r->name);
-		return;
-	}
-
-	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
-	if (hdr) {
-		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
-			return;
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
-
-		cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
-		return;
-	}
-
 	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
 	if (!hw_dom)
 		return;
@@ -558,6 +537,38 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	}
 }
 
+static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
+{
+	int id = get_domain_id_from_scope(cpu, r->mon_scope);
+	struct list_head *add_pos = NULL;
+	struct rdt_domain_hdr *hdr;
+
+	lockdep_assert_held(&domain_list_lock);
+
+	if (id < 0) {
+		pr_warn_once("Can't find monitor domain id for CPU:%d scope:%d for resource %s\n",
+			     cpu, r->mon_scope, r->name);
+		return;
+	}
+
+	hdr = resctrl_find_domain(&r->mon_domains, id, &add_pos);
+	if (hdr) {
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
+			return;
+		cpumask_set_cpu(cpu, &hdr->cpu_mask);
+
+		return;
+	}
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L3:
+		l3_mon_domain_setup(cpu, id, r, add_pos);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
 	if (r->alloc_capable)
-- 
2.49.0



* [PATCH v6 07/30] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (5 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 06/30] x86/resctrl: Move L3 initialization out of domain_add_cpu_mon() Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 20:57   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 08/30] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
                   ` (25 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Historically all monitoring events have been associated with the L3
resource. This will change when support for telemetry events is added.

The RDT_RESOURCE_L3 resource carries a lot of state in its domain
structures, which must be cleaned up when a domain is taken offline
because its last CPU has been removed.

Refactor domain_remove_cpu_mon() to separate the L3-specific processing
from the generic actions of clearing the CPU bit in the domain mask and
removing directories from mon_data.

resctrl_offline_mon_domain() still needs to remove domain-specific
directories and files from the "mon_data" directories, but can skip the
L3-specific cleanup when called for other resource types.
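A minimal userspace sketch of the refactored removal path, assuming simplified hypothetical types: only the generic header is touched until the domain is known to be empty, and the L3 container is recovered with a container_of() equivalent only inside the L3 branch. The function name mirrors the patch, but the types, the l3_domains_freed counter and the int return value are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace container_of(), without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum res_id { RES_L3, RES_OTHER };

struct domain_hdr {
	int id;
	unsigned long cpu_mask;	/* stand-in for cpumask_t, up to 64 CPUs */
};

struct l3_mon_domain {
	unsigned int ci_id;	/* placed first here so container_of() must subtract an offset */
	struct domain_hdr hdr;
};

static int l3_domains_freed;

/*
 * Returns 1 if an L3 domain was torn down, otherwise 0. Before the
 * switch only the generic header is used, so no L3 pointer has to be
 * derived on the common "other CPUs remain" path.
 */
static int domain_remove_cpu_mon(int cpu, enum res_id rid,
				 struct domain_hdr *hdr)
{
	hdr->cpu_mask &= ~(1UL << cpu);
	if (hdr->cpu_mask)
		return 0;	/* other CPUs remain in the domain */

	switch (rid) {
	case RES_L3: {
		struct l3_mon_domain *d =
			container_of(hdr, struct l3_mon_domain, hdr);

		/* L3-only cleanup (MBM/limbo work, RMID state) would go here. */
		free(d);
		l3_domains_freed++;
		return 1;
	}
	default:
		return 0;	/* the patch warns about an unknown resource */
	}
}
```

Removing CPU 0 from a domain holding CPUs 0 and 2 only clears a bit; removing CPU 2 afterwards empties the mask and frees the enclosing L3 structure.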

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 19 ++++++++++++-------
 fs/resctrl/rdtgroup.c              |  5 ++++-
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 20b6f2bbf858..4bf264b6a333 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -648,17 +648,22 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 		return;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
-	hw_dom = resctrl_to_arch_mon_dom(d);
-
-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (cpumask_empty(&d->hdr.cpu_mask)) {
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
+		return;
+
+	switch (r->rid) {
+	case RDT_RESOURCE_L3:
+		d = container_of(hdr, struct rdt_mon_domain, hdr);
+		hw_dom = resctrl_to_arch_mon_dom(d);
 		resctrl_offline_mon_domain(r, d);
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
 		mon_domain_free(hw_dom);
-
-		return;
+		break;
+	default:
+		pr_warn_once("Unknown resource rid=%d\n", r->rid);
+		break;
 	}
 }
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 77336d5e4915..05438e15e2ca 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4047,6 +4047,9 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 	if (resctrl_mounted && resctrl_arch_mon_capable())
 		rmdir_mondata_subdir_allrdtgrp(r, d);
 
+	if (r->rid != RDT_RESOURCE_L3)
+		goto out_unlock;
+
 	if (resctrl_is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4063,7 +4066,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 	}
 
 	domain_destroy_mon_state(d);
-
+out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-- 
2.49.0



* [PATCH v6 08/30] x86/resctrl: Clean up domain_remove_cpu_ctrl()
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (6 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 07/30] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-06-26 16:49 ` [PATCH v6 09/30] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

For symmetry with domain_remove_cpu_mon(), refactor domain_remove_cpu_ctrl()
to return early when removing a CPU does not empty the domain.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 4bf264b6a333..2075c98aa4e7 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -602,25 +602,24 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;
 
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
+		return;
+
 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
 	hw_dom = resctrl_to_arch_ctrl_dom(d);
 
-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (cpumask_empty(&d->hdr.cpu_mask)) {
-		resctrl_offline_ctrl_domain(r, d);
-		list_del_rcu(&d->hdr.list);
-		synchronize_rcu();
+	resctrl_offline_ctrl_domain(r, d);
+	list_del_rcu(&d->hdr.list);
+	synchronize_rcu();
 
-		/*
-		 * rdt_ctrl_domain "d" is going to be freed below, so clear
-		 * its pointer from pseudo_lock_region struct.
-		 */
-		if (d->plr)
-			d->plr->d = NULL;
-		ctrl_domain_free(hw_dom);
-
-		return;
-	}
+	/*
+	 * rdt_ctrl_domain "d" is going to be freed below, so clear
+	 * its pointer from pseudo_lock_region struct.
+	 */
+	if (d->plr)
+		d->plr->d = NULL;
+	ctrl_domain_free(hw_dom);
 }
 
 static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
-- 
2.49.0



* [PATCH v6 09/30] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (7 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 08/30] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 21:04   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 10/30] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
                   ` (23 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Historically, all monitoring events have been associated with the L3
resource, and it made sense to use "struct rdt_mon_domain *" arguments
to functions manipulating domains. The addition of monitor events tied
to other resources breaks this assumption.

Change the calling convention of the domain addition and deletion
functions, and of the event reading functions, to pass a
"struct rdt_domain_hdr *" instead. This includes the smp_call*() IPI
path, where struct rmid_read now holds a pointer to struct
rdt_domain_hdr.

The mon_data structure is unchanged, but its documentation is updated
to note that mon_data::sum is only used for RDT_RESOURCE_L3.
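The pattern used throughout this patch (accept a generic header, validate it, then convert it to the L3 domain) might look like this in a simplified userspace form. hdr_is_valid() plays the role of domain_header_is_valid() but, unlike the kernel helper, does not warn; rmid_read(), the enum and field names, and the counter field are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace container_of(), without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

enum res_id { RES_L3, RES_OTHER };
enum domain_type { CTRL_DOMAIN, MON_DOMAIN };

/* Simplified header: records which kind of domain embeds it. */
struct domain_hdr {
	int id;
	enum domain_type type;
	enum res_id rid;
};

struct l3_mon_domain {
	struct domain_hdr hdr;
	uint64_t counter;	/* stand-in for per-RMID hardware state */
};

/* Same idea as domain_header_is_valid(): check before converting. */
static int hdr_is_valid(const struct domain_hdr *hdr, enum domain_type type,
			enum res_id rid)
{
	return hdr->type == type && hdr->rid == rid;
}

/* Generic entry point takes a header; only L3 is supported so far. */
static int rmid_read(enum res_id rid, struct domain_hdr *hdr, uint64_t *val)
{
	struct l3_mon_domain *d;

	if (rid != RES_L3 || !hdr_is_valid(hdr, MON_DOMAIN, RES_L3))
		return -EINVAL;

	d = container_of(hdr, struct l3_mon_domain, hdr);
	*val = d->counter;
	return 0;
}
```

Callers only ever pass &d->hdr, so adding a new domain type later means adding another validated conversion rather than changing every function signature again.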

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h               |   8 +-
 fs/resctrl/internal.h                 |  14 ++--
 arch/x86/kernel/cpu/resctrl/core.c    |   4 +-
 arch/x86/kernel/cpu/resctrl/monitor.c |  18 ++++-
 fs/resctrl/ctrlmondata.c              |  16 ++--
 fs/resctrl/monitor.c                  |  31 +++++---
 fs/resctrl/rdtgroup.c                 | 103 ++++++++++++++++++--------
 7 files changed, 130 insertions(+), 64 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index dc7ccd60e8c2..b332466312e1 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -452,9 +452,9 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_ctrl_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
 int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
 void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d);
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr);
 void resctrl_online_cpu(unsigned int cpu);
 void resctrl_offline_cpu(unsigned int cpu);
 
@@ -462,7 +462,7 @@ void resctrl_offline_cpu(unsigned int cpu);
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
  *			      for this resource and domain.
  * @r:			resource that the counter should be read from.
- * @d:			domain that the counter should be read from.
+ * @hdr:		Header of domain that the counter should be read from.
  * @closid:		closid that matches the rmid. Depending on the architecture, the
  *			counter may match traffic of both @closid and @rmid, or @rmid
  *			only.
@@ -483,7 +483,7 @@ void resctrl_offline_cpu(unsigned int cpu);
  * Return:
  * 0 on success, or -EIO, -EINVAL etc on error.
  */
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val, void *arch_mon_ctx);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 445a41060724..ce3d24c512e3 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -77,8 +77,8 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
  * @list:            Member of the global @mon_data_kn_priv_list list.
  * @rid:             Resource id associated with the event file.
  * @evtid:           Event id associated with the event file.
- * @sum:             Set when event must be summed across multiple
- *                   domains.
+ * @sum:             Set for RDT_RESOURCE_L3 when event must be summed
+ *                   across multiple domains.
  * @domid:           When @sum is zero this is the domain to which
  *                   the event file belongs. When @sum is one this
  *                   is the id of the L3 cache that all domains to be
@@ -101,22 +101,22 @@ struct mon_data {
  *	   resource group then its event count is summed with the count from all
  *	   its child resource groups.
  * @r:	   Resource describing the properties of the event being read.
- * @d:	   Domain that the counter should be read from. If NULL then sum all
+ * @hdr:   Header of domain that the counter should be read from. If NULL then sum all
  *	   domains in @r sharing L3 @ci.id
  * @evtid: Which monitor event to read.
  * @first: Initialize MBM counter when true.
- * @ci_id: Cacheinfo id for L3. Only set when @d is NULL. Used when summing domains.
+ * @ci_id: Cacheinfo id for L3. Only set when @hdr is NULL. Used when summing domains.
  * @err:   Error encountered when reading counter.
  * @val:   Returned value of event counter. If @rgrp is a parent resource group,
  *	   @val includes the sum of event counts from its child resource groups.
- *	   If @d is NULL, @val includes the sum of all domains in @r sharing @ci.id,
+ *	   If @hdr is NULL, @val includes the sum of all domains in @r sharing @ci.id,
  *	   (summed across child resource groups if @rgrp is a parent resource group).
  * @arch_mon_ctx: Hardware monitor allocated for this read request (MPAM only).
  */
 struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
-	struct rdt_mon_domain	*d;
+	struct rdt_domain_hdr	*hdr;
 	enum resctrl_event_id	evtid;
 	bool			first;
 	unsigned int		ci_id;
@@ -352,7 +352,7 @@ void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, int evtid, int first);
 
 int resctrl_mon_resource_init(void);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2075c98aa4e7..1fecb6425b9e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -529,7 +529,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 
 	list_add_tail_rcu(&d->hdr.list, add_pos);
 
-	err = resctrl_online_mon_domain(r, d);
+	err = resctrl_online_mon_domain(r, &d->hdr);
 	if (err) {
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
@@ -655,7 +655,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 	case RDT_RESOURCE_L3:
 		d = container_of(hdr, struct rdt_mon_domain, hdr);
 		hw_dom = resctrl_to_arch_mon_dom(d);
-		resctrl_offline_mon_domain(r, d);
+		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
 		mon_domain_free(hw_dom);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f01db2034d08..b31794c5dcd4 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -217,20 +217,30 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 	return chunks >> shift;
 }
 
-int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
 			   u64 *val, void *ignored)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
-	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
-	int cpu = cpumask_any(&d->hdr.cpu_mask);
+	int cpu = cpumask_any(&hdr->cpu_mask);
+	struct rdt_hw_mon_domain *hw_dom;
+	struct rdt_hw_resource *hw_res;
 	struct arch_mbm_state *am;
+	struct rdt_mon_domain *d;
 	u64 msr_val, chunks;
 	u32 prmid;
 	int ret;
 
 	resctrl_arch_rmid_read_context_check();
 
+	if (r->rid != RDT_RESOURCE_L3)
+		return -EINVAL;
+
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		return -EINVAL;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	hw_dom = resctrl_to_arch_mon_dom(d);
+	hw_res = resctrl_to_arch_res(r);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
 	ret = __rmid_read_phys(prmid, eventid, &msr_val);
 	if (ret)
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index cdb4bc8baa99..1c1c0e7bbc11 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -547,7 +547,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
 }
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
-		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
+		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, int evtid, int first)
 {
 	int cpu;
@@ -561,7 +561,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->rgrp = rdtgrp;
 	rr->evtid = evtid;
 	rr->r = r;
-	rr->d = d;
+	rr->hdr = hdr;
 	rr->first = first;
 	rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
 	if (IS_ERR(rr->arch_mon_ctx)) {
@@ -592,7 +592,6 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	enum resctrl_event_id evtid;
 	struct rdt_domain_hdr *hdr;
 	struct rmid_read rr = {0};
-	struct rdt_mon_domain *d;
 	struct rdtgroup *rdtgrp;
 	int domid, cpu, ret = 0;
 	struct rdt_resource *r;
@@ -617,6 +616,12 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	r = resctrl_arch_get_resource(resid);
 
 	if (md->sum) {
+		struct rdt_mon_domain *d;
+
+		if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3)) {
+			ret = -EIO;
+			goto out;
+		}
 		/*
 		 * This file requires summing across all domains that share
 		 * the L3 cache id that was provided in the "domid" field of the
@@ -643,12 +648,11 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		 * the resource to find the domain with "domid".
 		 */
 		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
-		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
+		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, resid)) {
 			ret = -ENOENT;
 			goto out;
 		}
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
-		mon_event_read(&rr, r, d, rdtgrp, &d->hdr.cpu_mask, evtid, false);
+		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evtid, false);
 	}
 
 checkresult:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index dcc6c00eb362..85fe88b965fa 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -159,7 +159,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
 			break;
 
 		entry = __rmid_entry(idx);
-		if (resctrl_arch_rmid_read(r, d, entry->closid, entry->rmid,
+		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
 					   QOS_L3_OCCUP_EVENT_ID, &val,
 					   arch_mon_ctx)) {
 			rmid_dirty = true;
@@ -365,19 +365,23 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 	int err, ret;
 	u64 tval = 0;
 
-	if (rr->first) {
-		resctrl_arch_reset_rmid(rr->r, rr->d, closid, rmid, rr->evtid);
-		m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+	if (rr->r->rid == RDT_RESOURCE_L3 && rr->first) {
+		if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
+							 RDT_RESOURCE_L3)))
+			return -EINVAL;
+		d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+		resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
+		m = get_mbm_state(d, closid, rmid, rr->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
 		return 0;
 	}
 
-	if (rr->d) {
+	if (rr->hdr) {
 		/* Reading a single domain, must be on a CPU in that domain. */
-		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
+		if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
 			return -EINVAL;
-		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
+		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
 						 rr->evtid, &tval, rr->arch_mon_ctx);
 		if (rr->err)
 			return rr->err;
@@ -387,6 +391,9 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 		return 0;
 	}
 
+	if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
+		return -EINVAL;
+
 	/* Summing domains that share a cache, must be on a CPU for that cache. */
 	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!ci || ci->id != rr->ci_id)
@@ -403,7 +410,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
 		if (d->ci_id != rr->ci_id)
 			continue;
-		err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
+		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
 					     rr->evtid, &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
@@ -432,9 +439,13 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
 	u64 cur_bw, bytes, cur_bytes;
+	struct rdt_mon_domain *d;
 	struct mbm_state *m;
 
-	m = get_mbm_state(rr->d, closid, rmid, rr->evtid);
+	if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
+		return;
+	d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+	m = get_mbm_state(d, closid, rmid, rr->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;
 
@@ -608,7 +619,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
 	struct rmid_read rr = {0};
 
 	rr.r = r;
-	rr.d = d;
+	rr.hdr = &d->hdr;
 	rr.evtid = evtid;
 	rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
 	if (IS_ERR(rr.arch_mon_ctx)) {
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 05438e15e2ca..3828480e0426 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2887,7 +2887,8 @@ static void rmdir_all_sub(void)
  * @rid:    The resource id for the event file being created.
  * @domid:  The domain id for the event file being created.
  * @mevt:   The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ *          when @rid == RDT_RESOURCE_L3.
  */
 static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 					struct mon_evt *mevt,
@@ -2897,6 +2898,9 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
+	if (WARN_ON_ONCE(do_sum && rid != RDT_RESOURCE_L3))
+		return NULL;
+
 	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
 		if (priv->rid == rid && priv->domid == domid &&
 		    priv->sum == do_sum && priv->evtid == mevt->evtid)
@@ -3024,17 +3028,27 @@ static void mon_rmdir_one_subdir(struct kernfs_node *pkn, char *name, char *subn
  * when last domain being summed is removed.
  */
 static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-					   struct rdt_mon_domain *d)
+					   struct rdt_domain_hdr *hdr)
 {
 	struct rdtgroup *prgrp, *crgrp;
+	int domid = hdr->id;
 	char subname[32];
-	bool snc_mode;
 	char name[32];
 
-	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
-	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
-	if (snc_mode)
-		sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
+	if (r->rid == RDT_RESOURCE_L3) {
+		struct rdt_mon_domain *d;
+
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+			return;
+		d = container_of(hdr, struct rdt_mon_domain, hdr);
+
+		/* SNC mode? */
+		if (r->mon_scope == RESCTRL_L3_NODE) {
+			domid = d->ci_id;
+			sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
+		}
+	}
+	sprintf(name, "mon_%s_%02d", r->name, domid);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		mon_rmdir_one_subdir(prgrp->mon.mon_data_kn, name, subname);
@@ -3044,19 +3058,18 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	}
 }
 
-static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
+static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 			     struct rdt_resource *r, struct rdtgroup *prgrp,
-			     bool do_sum)
+			     int domid, bool do_sum)
 {
 	struct rmid_read rr = {0};
 	struct mon_data *priv;
 	struct mon_evt *mevt;
-	int ret, domid;
+	int ret;
 
 	for_each_mon_event(mevt) {
 		if (mevt->rid != r->rid || !mevt->enabled)
 			continue;
-		domid = do_sum ? d->ci_id : d->hdr.id;
 		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
 		if (WARN_ON_ONCE(!priv))
 			return -EINVAL;
@@ -3065,26 +3078,38 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
 		if (ret)
 			return ret;
 
-		if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
-			mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
+		if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
+			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
 	}
 
 	return 0;
 }
 
 static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
-				struct rdt_mon_domain *d,
+				struct rdt_domain_hdr *hdr,
 				struct rdt_resource *r, struct rdtgroup *prgrp)
 {
 	struct kernfs_node *kn, *ckn;
+	int domid = hdr->id;
+	bool snc_mode = false;
 	char name[32];
-	bool snc_mode;
 	int ret = 0;
 
 	lockdep_assert_held(&rdtgroup_mutex);
 
-	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
-	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
+	if (r->rid == RDT_RESOURCE_L3) {
+		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+			return -EINVAL;
+		snc_mode = r->mon_scope == RESCTRL_L3_NODE;
+		if (snc_mode) {
+			struct rdt_mon_domain *d;
+
+			d = container_of(hdr, struct rdt_mon_domain, hdr);
+			domid = d->ci_id;
+		}
+	}
+	sprintf(name, "mon_%s_%02d", r->name, domid);
+
 	kn = kernfs_find_and_get(parent_kn, name);
 	if (kn) {
 		/*
@@ -3100,13 +3125,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 		ret = rdtgroup_kn_set_ugid(kn);
 		if (ret)
 			goto out_destroy;
-		ret = mon_add_all_files(kn, d, r, prgrp, snc_mode);
+		ret = mon_add_all_files(kn, hdr, r, prgrp, domid, snc_mode);
 		if (ret)
 			goto out_destroy;
 	}
 
 	if (snc_mode) {
-		sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id);
+		sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id);
 		ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp);
 		if (IS_ERR(ckn)) {
 			ret = -EINVAL;
@@ -3117,7 +3142,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 		if (ret)
 			goto out_destroy;
 
-		ret = mon_add_all_files(ckn, d, r, prgrp, false);
+		ret = mon_add_all_files(ckn, hdr, r, prgrp, hdr->id, false);
 		if (ret)
 			goto out_destroy;
 	}
@@ -3135,7 +3160,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
  * and "monitor" groups with given domain id.
  */
 static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
-					   struct rdt_mon_domain *d)
+					   struct rdt_domain_hdr *hdr)
 {
 	struct kernfs_node *parent_kn;
 	struct rdtgroup *prgrp, *crgrp;
@@ -3143,12 +3168,12 @@ static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		parent_kn = prgrp->mon.mon_data_kn;
-		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
+		mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
 
 		head = &prgrp->mon.crdtgrp_list;
 		list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
 			parent_kn = crgrp->mon.mon_data_kn;
-			mkdir_mondata_subdir(parent_kn, d, r, crgrp);
+			mkdir_mondata_subdir(parent_kn, hdr, r, crgrp);
 		}
 	}
 }
@@ -3157,14 +3182,14 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
 				       struct rdt_resource *r,
 				       struct rdtgroup *prgrp)
 {
-	struct rdt_mon_domain *dom;
+	struct rdt_domain_hdr *hdr;
 	int ret;
 
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
 
-	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
-		ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
+	list_for_each_entry(hdr, &r->mon_domains, list) {
+		ret = mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
 		if (ret)
 			return ret;
 	}
@@ -4036,8 +4061,10 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
+	struct rdt_mon_domain *d;
+
 	mutex_lock(&rdtgroup_mutex);
 
 	/*
@@ -4045,11 +4072,15 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
 	 * per domain monitor data directories.
 	 */
 	if (resctrl_mounted && resctrl_arch_mon_capable())
-		rmdir_mondata_subdir_allrdtgrp(r, d);
+		rmdir_mondata_subdir_allrdtgrp(r, hdr);
 
 	if (r->rid != RDT_RESOURCE_L3)
 		goto out_unlock;
 
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
+		goto out_unlock;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
 	if (resctrl_is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4132,12 +4163,20 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
 	return err;
 }
 
-int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
+int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
-	int err;
+	struct rdt_mon_domain *d;
+	int err = -EINVAL;
 
 	mutex_lock(&rdtgroup_mutex);
 
+	if (r->rid != RDT_RESOURCE_L3)
+		goto mkdir;
+
+	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
+		goto out_unlock;
+
+	d = container_of(hdr, struct rdt_mon_domain, hdr);
 	err = domain_setup_mon_state(r, d);
 	if (err)
 		goto out_unlock;
@@ -4151,6 +4190,8 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
 		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
 
+mkdir:
+	err = 0;
 	/*
 	 * If the filesystem is not mounted then only the default resource group
 	 * exists. Creation of its directories is deferred until mount time
@@ -4158,7 +4199,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
 	 * If resctrl is mounted, add per domain monitor data directories.
 	 */
 	if (resctrl_mounted && resctrl_arch_mon_capable())
-		mkdir_mondata_subdir_allrdtgrp(r, d);
+		mkdir_mondata_subdir_allrdtgrp(r, hdr);
 
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
-- 
2.49.0



* [PATCH v6 10/30] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (8 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 09/30] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 21:06   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 11/30] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
                   ` (22 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Historically all monitoring events have been associated with the L3
resource. This will change when support for telemetry events is added.

The structures that track monitor domains at both the file system and
architecture levels have generic names. This may cause confusion once
support for monitoring events in other resources is added.

Rename them by adding "l3_" to the names:
rdt_mon_domain		-> rdt_l3_mon_domain
rdt_hw_mon_domain	-> rdt_hw_l3_mon_domain

No functional change.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h                | 16 ++++++------
 arch/x86/kernel/cpu/resctrl/internal.h | 12 ++++-----
 fs/resctrl/internal.h                  |  8 +++---
 arch/x86/kernel/cpu/resctrl/core.c     | 14 +++++-----
 arch/x86/kernel/cpu/resctrl/monitor.c  | 18 ++++++-------
 fs/resctrl/ctrlmondata.c               |  2 +-
 fs/resctrl/monitor.c                   | 34 ++++++++++++------------
 fs/resctrl/rdtgroup.c                  | 36 +++++++++++++-------------
 8 files changed, 70 insertions(+), 70 deletions(-)
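
The following is a minimal userspace sketch of the relationship that the
renamed structures encode. It uses simplified stand-in types (only hdr
and ci_id are kept) and hypothetical enum names. The arch-private
rdt_hw_l3_mon_domain embeds the filesystem-visible rdt_l3_mon_domain as
d_resctrl. Code that holds only a struct rdt_domain_hdr pointer first
validates the header and then uses container_of() to reach the L3
domain, as resctrl_arch_rmid_read() does with domain_header_is_valid().
The kernel's own container_of() also type-checks its arguments, which the
local macro here does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for the kernel enums. */
enum domain_type { CTRL_DOMAIN, MON_DOMAIN };
enum res_level { RES_L3, RES_OTHER };

struct domain_hdr {
	enum domain_type type;
	enum res_level rid;
	int id;
};

/* Filesystem-visible L3 monitor domain (cf. struct rdt_l3_mon_domain). */
struct l3_mon_domain {
	struct domain_hdr hdr;
	unsigned int ci_id;
};

/* Arch-private wrapper (cf. struct rdt_hw_l3_mon_domain). */
struct hw_l3_mon_domain {
	struct l3_mon_domain d_resctrl;
	unsigned long long *arch_mbm_state;
};

/* Same intent as domain_header_is_valid(): check before downcasting. */
static bool header_is_valid(const struct domain_hdr *hdr,
			    enum domain_type type, enum res_level rid)
{
	return hdr->type == type && hdr->rid == rid;
}

/* Recover the L3 monitor domain from its embedded header, or NULL. */
static struct l3_mon_domain *hdr_to_l3_mon(struct domain_hdr *hdr)
{
	assert(hdr != NULL);
	if (!header_is_valid(hdr, MON_DOMAIN, RES_L3))
		return NULL;
	return container_of(hdr, struct l3_mon_domain, hdr);
}

/* Same idea as resctrl_to_arch_mon_dom(): step out to the arch wrapper. */
static struct hw_l3_mon_domain *to_arch_mon_dom(struct l3_mon_domain *d)
{
	return container_of(d, struct hw_l3_mon_domain, d_resctrl);
}
```

Because both conversions are pure pointer arithmetic, the rename changes
only the type names that appear in container_of(). The memory layout
and the validation step stay the same.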

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b332466312e1..01740acebcd1 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -166,7 +166,7 @@ struct rdt_ctrl_domain {
 };
 
 /**
- * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
+ * struct rdt_l3_mon_domain - group of CPUs sharing a resctrl monitor resource
  * @hdr:		common header for different domain types
  * @ci_id:		cache info id for this domain
  * @rmid_busy_llc:	bitmap of which limbo RMIDs are above threshold
@@ -178,7 +178,7 @@ struct rdt_ctrl_domain {
  * @mbm_work_cpu:	worker CPU for MBM h/w counters
  * @cqm_work_cpu:	worker CPU for CQM h/w counters
  */
-struct rdt_mon_domain {
+struct rdt_l3_mon_domain {
 	struct rdt_domain_hdr		hdr;
 	unsigned int			ci_id;
 	unsigned long			*rmid_busy_llc;
@@ -334,10 +334,10 @@ struct resctrl_cpu_defaults {
 };
 
 struct resctrl_mon_config_info {
-	struct rdt_resource	*r;
-	struct rdt_mon_domain	*d;
-	u32			evtid;
-	u32			mon_config;
+	struct rdt_resource		*r;
+	struct rdt_l3_mon_domain	*d;
+	u32				evtid;
+	u32				mon_config;
 };
 
 /**
@@ -530,7 +530,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 closid, u32 rmid,
 			     enum resctrl_event_id eventid);
 
@@ -543,7 +543,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  *
  * This can be called from any CPU.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
 
 /**
  * resctrl_arch_reset_all_ctrls() - Reset the control for each CLOSID to its
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 58dca892a5df..224b71730cc3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -51,7 +51,7 @@ struct rdt_hw_ctrl_domain {
 };
 
 /**
- * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
+ * struct rdt_hw_l3_mon_domain - Arch private attributes of a set of CPUs that share
  *			      a resource for a monitor function
  * @d_resctrl:	Properties exposed to the resctrl file system
  * @arch_mbm_states:	Per-event pointer to the MBM event's saved state.
@@ -60,8 +60,8 @@ struct rdt_hw_ctrl_domain {
  *
  * Members of this structure are accessed via helpers that provide abstraction.
  */
-struct rdt_hw_mon_domain {
-	struct rdt_mon_domain		d_resctrl;
+struct rdt_hw_l3_mon_domain {
+	struct rdt_l3_mon_domain		d_resctrl;
 	struct arch_mbm_state		*arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];
 };
 
@@ -70,9 +70,9 @@ static inline struct rdt_hw_ctrl_domain *resctrl_to_arch_ctrl_dom(struct rdt_ctr
 	return container_of(r, struct rdt_hw_ctrl_domain, d_resctrl);
 }
 
-static inline struct rdt_hw_mon_domain *resctrl_to_arch_mon_dom(struct rdt_mon_domain *r)
+static inline struct rdt_hw_l3_mon_domain *resctrl_to_arch_mon_dom(struct rdt_l3_mon_domain *r)
 {
-	return container_of(r, struct rdt_hw_mon_domain, d_resctrl);
+	return container_of(r, struct rdt_hw_l3_mon_domain, d_resctrl);
 }
 
 /**
@@ -124,7 +124,7 @@ static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r
 
 extern struct rdt_hw_resource rdt_resources_all[];
 
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d);
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
 
 /* CPUID.(EAX=10H, ECX=ResID=1).EAX */
 union cpuid_0x10_1_eax {
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index ce3d24c512e3..b12242d20e61 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -357,7 +357,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 
 int resctrl_mon_resource_init(void);
 
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
 				unsigned long delay_ms,
 				int exclude_cpu);
 
@@ -365,14 +365,14 @@ void mbm_handle_overflow(struct work_struct *work);
 
 bool is_mba_sc(struct rdt_resource *r);
 
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu);
 
 void cqm_handle_limbo(struct work_struct *work);
 
-bool has_busy_rmid(struct rdt_mon_domain *d);
+bool has_busy_rmid(struct rdt_l3_mon_domain *d);
 
-void __check_limbo(struct rdt_mon_domain *d, bool force_free);
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free);
 
 void resctrl_file_fflags_init(const char *config, unsigned long fflags);
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1fecb6425b9e..b6bb94444943 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -363,7 +363,7 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
 	kfree(hw_dom);
 }
 
-static void mon_domain_free(struct rdt_hw_mon_domain *hw_dom)
+static void mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	int idx;
 
@@ -400,7 +400,7 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
  * @num_rmid:	The size of the MBM counter array
  * @hw_dom:	The domain that owns the allocated arrays
  */
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_mon_domain *hw_dom)
+static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
 	enum resctrl_event_id eventid;
@@ -498,8 +498,8 @@ static void domain_add_cpu_ctrl(int cpu, struct rdt_resource *r)
 
 static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct list_head *add_pos)
 {
-	struct rdt_hw_mon_domain *hw_dom;
-	struct rdt_mon_domain *d;
+	struct rdt_hw_l3_mon_domain *hw_dom;
+	struct rdt_l3_mon_domain *d;
 	struct cacheinfo *ci;
 	int err;
 
@@ -625,9 +625,9 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_domain_id_from_scope(cpu, r->mon_scope);
-	struct rdt_hw_mon_domain *hw_dom;
+	struct rdt_hw_l3_mon_domain *hw_dom;
+	struct rdt_l3_mon_domain *d;
 	struct rdt_domain_hdr *hdr;
-	struct rdt_mon_domain *d;
 
 	lockdep_assert_held(&domain_list_lock);
 
@@ -653,7 +653,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 
 	switch (r->rid) {
 	case RDT_RESOURCE_L3:
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
+		d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 		hw_dom = resctrl_to_arch_mon_dom(d);
 		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&d->hdr.list);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index b31794c5dcd4..043f777378a6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -157,7 +157,7 @@ static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)
 	return 0;
 }
 
-static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_dom,
+static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_l3_mon_domain *hw_dom,
 						 u32 rmid,
 						 enum resctrl_event_id eventid)
 {
@@ -171,11 +171,11 @@ static struct arch_mbm_state *get_arch_mbm_state(struct rdt_hw_mon_domain *hw_do
 	return state ? &state[rmid] : NULL;
 }
 
-void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
+void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 			     u32 unused, u32 rmid,
 			     enum resctrl_event_id eventid)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	int cpu = cpumask_any(&d->hdr.cpu_mask);
 	struct arch_mbm_state *am;
 	u32 prmid;
@@ -194,9 +194,9 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
  * Assumes that hardware counters are also reset and thus that there is
  * no need to record initial non-zero counts.
  */
-void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d)
+void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
-	struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+	struct rdt_hw_l3_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
 	enum resctrl_event_id eventid;
 	int idx;
 
@@ -222,10 +222,10 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u64 *val, void *ignored)
 {
 	int cpu = cpumask_any(&hdr->cpu_mask);
-	struct rdt_hw_mon_domain *hw_dom;
+	struct rdt_hw_l3_mon_domain *hw_dom;
 	struct rdt_hw_resource *hw_res;
+	struct rdt_l3_mon_domain *d;
 	struct arch_mbm_state *am;
-	struct rdt_mon_domain *d;
 	u64 msr_val, chunks;
 	u32 prmid;
 	int ret;
@@ -238,7 +238,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		return -EINVAL;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	hw_dom = resctrl_to_arch_mon_dom(d);
 	hw_res = resctrl_to_arch_res(r);
 	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
@@ -275,7 +275,7 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
  * must adjust RMID counter numbers based on SNC node. See
  * logical_rmid_to_physical_rmid() for code that does this.
  */
-void arch_mon_domain_online(struct rdt_resource *r, struct rdt_mon_domain *d)
+void arch_mon_domain_online(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	if (snc_nodes_per_l3_cache > 1)
 		msr_clear_bit(MSR_RMID_SNC_CONFIG, 0);
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 1c1c0e7bbc11..1d7086509bfa 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -616,7 +616,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	r = resctrl_arch_get_resource(resid);
 
 	if (md->sum) {
-		struct rdt_mon_domain *d;
+		struct rdt_l3_mon_domain *d;
 
 		if (WARN_ON_ONCE(resid != RDT_RESOURCE_L3)) {
 			ret = -EIO;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 85fe88b965fa..28d96147b9f4 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -130,7 +130,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
  * decrement the count. If the busy count gets to zero on an RMID, we
  * free the RMID
  */
-void __check_limbo(struct rdt_mon_domain *d, bool force_free)
+void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
@@ -188,7 +188,7 @@ void __check_limbo(struct rdt_mon_domain *d, bool force_free)
 	resctrl_arch_mon_ctx_free(r, QOS_L3_OCCUP_EVENT_ID, arch_mon_ctx);
 }
 
-bool has_busy_rmid(struct rdt_mon_domain *d)
+bool has_busy_rmid(struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 
@@ -289,7 +289,7 @@ int alloc_rmid(u32 closid)
 static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	u32 idx;
 
 	lockdep_assert_held(&rdtgroup_mutex);
@@ -342,7 +342,7 @@ void free_rmid(u32 closid, u32 rmid)
 		list_add_tail(&entry->list, &rmid_free_lru);
 }
 
-static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
+static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
 				       u32 rmid, enum resctrl_event_id evtid)
 {
 	u32 idx = resctrl_arch_rmid_idx_encode(closid, rmid);
@@ -359,7 +359,7 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
 static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
 	int cpu = smp_processor_id();
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct cacheinfo *ci;
 	struct mbm_state *m;
 	int err, ret;
@@ -369,7 +369,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 		if (WARN_ON_ONCE(!domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN,
 							 RDT_RESOURCE_L3)))
 			return -EINVAL;
-		d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+		d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
 		resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
 		m = get_mbm_state(d, closid, rmid, rr->evtid);
 		if (m)
@@ -439,12 +439,12 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
 	u64 cur_bw, bytes, cur_bytes;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct mbm_state *m;
 
 	if (WARN_ON_ONCE(domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
 		return;
-	d = container_of(rr->hdr, struct rdt_mon_domain, hdr);
+	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
 	m = get_mbm_state(d, closid, rmid, rr->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;
@@ -545,7 +545,7 @@ static struct rdt_ctrl_domain *get_ctrl_domain_from_cpu(int cpu,
  * throttle MSRs already have low percentage values.  To avoid
  * unnecessarily restricting such rdtgroups, we also increase the bandwidth.
  */
-static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
+static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_mbm)
 {
 	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
@@ -613,7 +613,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_mon_domain *dom_mbm)
 	resctrl_arch_update_one(r_mba, dom_mba, closid, CDP_NONE, new_msr_val);
 }
 
-static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 				 u32 closid, u32 rmid, enum resctrl_event_id evtid)
 {
 	struct rmid_read rr = {0};
@@ -640,7 +640,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_mon_domain *
 	resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
 }
 
-static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
+static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 		       u32 closid, u32 rmid)
 {
 	/*
@@ -661,12 +661,12 @@ static void mbm_update(struct rdt_resource *r, struct rdt_mon_domain *d,
 void cqm_handle_limbo(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 
-	d = container_of(work, struct rdt_mon_domain, cqm_limbo.work);
+	d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
 
 	__check_limbo(d, false);
 
@@ -689,7 +689,7 @@ void cqm_handle_limbo(struct work_struct *work)
  * @exclude_cpu:   Which CPU the handler should not run on,
  *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
  */
-void cqm_setup_limbo_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void cqm_setup_limbo_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 			     int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
@@ -706,7 +706,7 @@ void mbm_handle_overflow(struct work_struct *work)
 {
 	unsigned long delay = msecs_to_jiffies(MBM_OVERFLOW_INTERVAL);
 	struct rdtgroup *prgrp, *crgrp;
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct list_head *head;
 	struct rdt_resource *r;
 
@@ -721,7 +721,7 @@ void mbm_handle_overflow(struct work_struct *work)
 		goto out_unlock;
 
 	r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	d = container_of(work, struct rdt_mon_domain, mbm_over.work);
+	d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
 
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
 		mbm_update(r, d, prgrp->closid, prgrp->mon.rmid);
@@ -755,7 +755,7 @@ void mbm_handle_overflow(struct work_struct *work)
  * @exclude_cpu:   Which CPU the handler should not run on,
  *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
  */
-void mbm_setup_overflow_handler(struct rdt_mon_domain *dom, unsigned long delay_ms,
+void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long delay_ms,
 				int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 3828480e0426..4d369f6df8e8 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1617,7 +1617,7 @@ static void mondata_config_read(struct resctrl_mon_config_info *mon_info)
 static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid)
 {
 	struct resctrl_mon_config_info mon_info;
-	struct rdt_mon_domain *dom;
+	struct rdt_l3_mon_domain *dom;
 	bool sep = false;
 
 	cpus_read_lock();
@@ -1665,7 +1665,7 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
 }
 
 static void mbm_config_write_domain(struct rdt_resource *r,
-				    struct rdt_mon_domain *d, u32 evtid, u32 val)
+				    struct rdt_l3_mon_domain *d, u32 evtid, u32 val)
 {
 	struct resctrl_mon_config_info mon_info = {0};
 
@@ -1706,8 +1706,8 @@ static void mbm_config_write_domain(struct rdt_resource *r,
 static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 {
 	char *dom_str = NULL, *id_str;
+	struct rdt_l3_mon_domain *d;
 	unsigned long dom_id, val;
-	struct rdt_mon_domain *d;
 
 	/* Walking r->domains, ensure it can't race with cpuhp */
 	lockdep_assert_cpus_held();
@@ -2581,7 +2581,7 @@ static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
 	unsigned long flags = RFTYPE_CTRL_BASE;
-	struct rdt_mon_domain *dom;
+	struct rdt_l3_mon_domain *dom;
 	struct rdt_resource *r;
 	int ret;
 
@@ -3036,11 +3036,11 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
 	char name[32];
 
 	if (r->rid == RDT_RESOURCE_L3) {
-		struct rdt_mon_domain *d;
+		struct rdt_l3_mon_domain *d;
 
 		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 			return;
-		d = container_of(hdr, struct rdt_mon_domain, hdr);
+		d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 
 		/* SNC mode? */
 		if (r->mon_scope == RESCTRL_L3_NODE) {
@@ -3102,9 +3102,9 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
 			return -EINVAL;
 		snc_mode = r->mon_scope == RESCTRL_L3_NODE;
 		if (snc_mode) {
-			struct rdt_mon_domain *d;
+			struct rdt_l3_mon_domain *d;
 
-			d = container_of(hdr, struct rdt_mon_domain, hdr);
+			d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 			domid = d->ci_id;
 		}
 	}
@@ -4040,7 +4040,7 @@ static void rdtgroup_setup_default(void)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-static void domain_destroy_mon_state(struct rdt_mon_domain *d)
+static void domain_destroy_mon_state(struct rdt_l3_mon_domain *d)
 {
 	int idx;
 
@@ -4063,7 +4063,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
 
 void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	mutex_lock(&rdtgroup_mutex);
 
@@ -4080,7 +4080,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
 		goto out_unlock;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	if (resctrl_is_mbm_enabled())
 		cancel_delayed_work(&d->mbm_over);
 	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
@@ -4114,7 +4114,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
  *
  * Returns 0 for success, or -ENOMEM.
  */
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_mon_domain *d)
+static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	size_t tsize = sizeof(*d->mbm_states[0]);
@@ -4165,7 +4165,7 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
 
 int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	int err = -EINVAL;
 
 	mutex_lock(&rdtgroup_mutex);
@@ -4176,7 +4176,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 		goto out_unlock;
 
-	d = container_of(hdr, struct rdt_mon_domain, hdr);
+	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
 	err = domain_setup_mon_state(r, d);
 	if (err)
 		goto out_unlock;
@@ -4225,10 +4225,10 @@ static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
 	}
 }
 
-static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu,
-						      struct rdt_resource *r)
+static struct rdt_l3_mon_domain *get_mon_domain_from_cpu(int cpu,
+							 struct rdt_resource *r)
 {
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 
 	lockdep_assert_cpus_held();
 
@@ -4244,7 +4244,7 @@ static struct rdt_mon_domain *get_mon_domain_from_cpu(int cpu,
 void resctrl_offline_cpu(unsigned int cpu)
 {
 	struct rdt_resource *l3 = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	struct rdt_mon_domain *d;
+	struct rdt_l3_mon_domain *d;
 	struct rdtgroup *rdtgrp;
 
 	mutex_lock(&rdtgroup_mutex);
-- 
2.49.0



* [PATCH v6 11/30] x86,fs/resctrl: Rename some L3 specific functions
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (9 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 10/30] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 21:08   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 12/30] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
                   ` (21 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

All monitor functions used to be tied to the RDT_RESOURCE_L3 resource,
so generic function names to set up and tear down domains made sense.

With the arrival of monitor events tied to other resources, it is
clearer to name these functions in a way that shows they are L3
specific.

Two groups of functions are renamed:

Functions that allocate/free architecture per-RMID MBM state information:
arch_domain_mbm_alloc()		-> l3_mon_domain_mbm_alloc()
mon_domain_free()		-> l3_mon_domain_free()

Functions that allocate/free filesystem per-RMID MBM state information:
domain_setup_mon_state()	-> domain_setup_l3_mon_state()
domain_destroy_mon_state()	-> domain_destroy_l3_mon_state()

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 16 ++++++++--------
 fs/resctrl/rdtgroup.c              | 10 +++++-----
 2 files changed, 13 insertions(+), 13 deletions(-)
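
The following is a hedged userspace sketch of the allocate/free pairing
behind l3_mon_domain_mbm_alloc() and l3_mon_domain_free(). It assumes
two L3 MBM events (standing in for QOS_NUM_L3_MBM_EVENTS) and a
hypothetical per-RMID state struct. Each enabled event gets an array of
num_rmid entries, and a partial failure releases everything allocated so
far. The kernel code uses kcalloc()/kfree(), and its free helper also
releases hw_dom itself. Here calloc()/free() are used, and the
caller-owned struct is left in place.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumption: two L3 MBM events (total and local bandwidth). */
#define NUM_L3_MBM_EVENTS 2

/* Hypothetical per-RMID saved counter state. */
struct mbm_state_sketch {
	unsigned long long prev_msr;
	unsigned long long chunks;
};

struct hw_l3_mon_domain_sketch {
	struct mbm_state_sketch *arch_mbm_states[NUM_L3_MBM_EVENTS];
};

/* Free every per-event array; free(NULL) is a no-op, so this is safe to repeat. */
static void l3_mon_domain_free_sketch(struct hw_l3_mon_domain_sketch *hw_dom)
{
	for (int idx = 0; idx < NUM_L3_MBM_EVENTS; idx++) {
		free(hw_dom->arch_mbm_states[idx]);
		hw_dom->arch_mbm_states[idx] = NULL;
	}
}

/* Allocate a zeroed per-RMID array for each enabled MBM event. */
static int l3_mon_domain_mbm_alloc_sketch(unsigned int num_rmid,
					  const bool *enabled,
					  struct hw_l3_mon_domain_sketch *hw_dom)
{
	for (int idx = 0; idx < NUM_L3_MBM_EVENTS; idx++) {
		if (!enabled[idx])
			continue;
		hw_dom->arch_mbm_states[idx] =
			calloc(num_rmid, sizeof(*hw_dom->arch_mbm_states[idx]));
		if (!hw_dom->arch_mbm_states[idx]) {
			/* Undo partial allocation before reporting failure. */
			l3_mon_domain_free_sketch(hw_dom);
			return -ENOMEM;
		}
	}
	return 0;
}
```

Disabled events keep a NULL array. This matches how get_arch_mbm_state()
can return NULL when no state exists for an event.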

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b6bb94444943..976b4f9d1197 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -363,7 +363,7 @@ static void ctrl_domain_free(struct rdt_hw_ctrl_domain *hw_dom)
 	kfree(hw_dom);
 }
 
-static void mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
+static void l3_mon_domain_free(struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	int idx;
 
@@ -396,11 +396,11 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *
 }
 
 /**
- * arch_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
+ * l3_mon_domain_mbm_alloc() - Allocate arch private storage for the MBM counters
  * @num_rmid:	The size of the MBM counter array
  * @hw_dom:	The domain that owns the allocated arrays
  */
-static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
+static int l3_mon_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_l3_mon_domain *hw_dom)
 {
 	size_t tsize = sizeof(*hw_dom->arch_mbm_states[0]);
 	enum resctrl_event_id eventid;
@@ -514,7 +514,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
 	if (!ci) {
 		pr_warn_once("Can't find L3 cache for CPU:%d resource %s\n", cpu, r->name);
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
 		return;
 	}
 	d->ci_id = ci->id;
@@ -522,8 +522,8 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 
 	arch_mon_domain_online(r, d);
 
-	if (arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
-		mon_domain_free(hw_dom);
+	if (l3_mon_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+		l3_mon_domain_free(hw_dom);
 		return;
 	}
 
@@ -533,7 +533,7 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 	if (err) {
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
 	}
 }
 
@@ -658,7 +658,7 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		resctrl_offline_mon_domain(r, hdr);
 		list_del_rcu(&d->hdr.list);
 		synchronize_rcu();
-		mon_domain_free(hw_dom);
+		l3_mon_domain_free(hw_dom);
 		break;
 	default:
 		pr_warn_once("Unknown resource rid=%d\n", r->rid);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 4d369f6df8e8..39018f6c8b14 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4040,7 +4040,7 @@ static void rdtgroup_setup_default(void)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-static void domain_destroy_mon_state(struct rdt_l3_mon_domain *d)
+static void domain_destroy_l3_mon_state(struct rdt_l3_mon_domain *d)
 {
 	int idx;
 
@@ -4096,13 +4096,13 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 		cancel_delayed_work(&d->cqm_limbo);
 	}
 
-	domain_destroy_mon_state(d);
+	domain_destroy_l3_mon_state(d);
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
 /**
- * domain_setup_mon_state() -  Initialise domain monitoring structures.
+ * domain_setup_l3_mon_state() -  Initialise domain monitoring structures.
  * @r:	The resource for the newly online domain.
  * @d:	The newly online domain.
  *
@@ -4114,7 +4114,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
  *
  * Returns 0 for success, or -ENOMEM.
  */
-static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
+static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	size_t tsize = sizeof(*d->mbm_states[0]);
@@ -4177,7 +4177,7 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr
 		goto out_unlock;
 
 	d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
-	err = domain_setup_mon_state(r, d);
+	err = domain_setup_l3_mon_state(r, d);
 	if (err)
 		goto out_unlock;
 
-- 
2.49.0



* [PATCH v6 12/30] fs/resctrl: Make event details accessible to functions when reading events
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (10 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 11/30] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:12   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 13/30] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
                   ` (20 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

All details about a monitor event are kept in the mon_evt structure.
Upper levels of code pass only the event id down to lower levels, so
the lower levels cannot see any other attribute of the event. This
becomes a problem as new attributes are added to the mon_evt structure.

Change the mon_data and rmid_read structures to hold a pointer
to the mon_evt structure instead of just taking a copy of the
event id.

No functional change.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 fs/resctrl/internal.h    |  8 ++++----
 fs/resctrl/ctrlmondata.c | 16 ++++++++--------
 fs/resctrl/monitor.c     | 17 +++++++++--------
 fs/resctrl/rdtgroup.c    |  6 +++---
 4 files changed, 24 insertions(+), 23 deletions(-)
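
The following is a minimal sketch of why passing a struct mon_evt
pointer helps. It uses simplified stand-in structs, and the any_cpu
field is a hypothetical example of a future attribute. Once rmid_read
carries the pointer, a low-level reader can test any attribute of the
event as well as its id, which a copied evtid cannot provide.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for fs/resctrl's struct mon_evt. */
struct mon_evt_sketch {
	int evtid;
	const char *name;
	bool any_cpu;	/* hypothetical future attribute */
};

/* Simplified stand-in for struct rmid_read: holds a pointer, not an id copy. */
struct rmid_read_sketch {
	struct mon_evt_sketch *evt;
	bool first;
};

/* Upper level: record which event is being read (cf. mon_event_read()). */
static void setup_read(struct rmid_read_sketch *rr,
		       struct mon_evt_sketch *evt, bool first)
{
	rr->evt = evt;
	rr->first = first;
}

/* Lower level: can consult any event attribute through rr->evt. */
static bool read_needs_domain_cpu(const struct rmid_read_sketch *rr)
{
	return !rr->evt->any_cpu;
}
```

Because rr->evt points into the shared event table, a later change to an
attribute is seen by every reader. A copied id would carry only the
identifier.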

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index b12242d20e61..1458fda64423 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -90,7 +90,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
 struct mon_data {
 	struct list_head	list;
 	enum resctrl_res_level	rid;
-	enum resctrl_event_id	evtid;
+	struct mon_evt		*evt;
 	int			domid;
 	bool			sum;
 };
@@ -103,7 +103,7 @@ struct mon_data {
  * @r:	   Resource describing the properties of the event being read.
  * @hdr:   Header of domain that the counter should be read from. If NULL then sum all
  *	   domains in @r sharing L3 @ci.id
- * @evtid: Which monitor event to read.
+ * @evt:   Event associated with the event file.
  * @first: Initialize MBM counter when true.
  * @ci_id: Cacheinfo id for L3. Only set when @hdr is NULL. Used when summing domains.
  * @err:   Error encountered when reading counter.
@@ -117,7 +117,7 @@ struct rmid_read {
 	struct rdtgroup		*rgrp;
 	struct rdt_resource	*r;
 	struct rdt_domain_hdr	*hdr;
-	enum resctrl_event_id	evtid;
+	struct mon_evt		*evt;
 	bool			first;
 	unsigned int		ci_id;
 	int			err;
@@ -353,7 +353,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg);
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
-		    cpumask_t *cpumask, int evtid, int first);
+		    cpumask_t *cpumask, struct mon_evt *evt, int first);
 
 int resctrl_mon_resource_init(void);
 
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 1d7086509bfa..a99903ac5d27 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -548,7 +548,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
 
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
-		    cpumask_t *cpumask, int evtid, int first)
+		    cpumask_t *cpumask, struct mon_evt *evt, int first)
 {
 	int cpu;
 
@@ -559,11 +559,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
-	rr->evtid = evtid;
+	rr->evt = evt;
 	rr->r = r;
 	rr->hdr = hdr;
 	rr->first = first;
-	rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evtid);
+	rr->arch_mon_ctx = resctrl_arch_mon_ctx_alloc(r, evt->evtid);
 	if (IS_ERR(rr->arch_mon_ctx)) {
 		rr->err = -EINVAL;
 		return;
@@ -582,20 +582,20 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	else
 		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
 
-	resctrl_arch_mon_ctx_free(r, evtid, rr->arch_mon_ctx);
+	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
 	enum resctrl_res_level resid;
-	enum resctrl_event_id evtid;
 	struct rdt_domain_hdr *hdr;
 	struct rmid_read rr = {0};
 	struct rdtgroup *rdtgrp;
 	int domid, cpu, ret = 0;
 	struct rdt_resource *r;
 	struct cacheinfo *ci;
+	struct mon_evt *evt;
 	struct mon_data *md;
 
 	rdtgrp = rdtgroup_kn_lock_live(of->kn);
@@ -612,7 +612,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 
 	resid = md->rid;
 	domid = md->domid;
-	evtid = md->evtid;
+	evt = md->evt;
 	r = resctrl_arch_get_resource(resid);
 
 	if (md->sum) {
@@ -636,7 +636,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 				if (!ci)
 					continue;
 				mon_event_read(&rr, r, NULL, rdtgrp,
-					       &ci->shared_cpu_map, evtid, false);
+					       &ci->shared_cpu_map, evt, false);
 				goto checkresult;
 			}
 		}
@@ -652,7 +652,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 			ret = -ENOENT;
 			goto out;
 		}
-		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evtid, false);
+		mon_event_read(&rr, r, hdr, rdtgrp, &hdr->cpu_mask, evt, false);
 	}
 
 checkresult:
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 28d96147b9f4..6d4191eff391 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -370,8 +370,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 							 RDT_RESOURCE_L3)))
 			return -EINVAL;
 		d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
-		resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evtid);
-		m = get_mbm_state(d, closid, rmid, rr->evtid);
+		resctrl_arch_reset_rmid(rr->r, d, closid, rmid, rr->evt->evtid);
+		m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
 		if (m)
 			memset(m, 0, sizeof(struct mbm_state));
 		return 0;
@@ -382,7 +382,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 		if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
 			return -EINVAL;
 		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
-						 rr->evtid, &tval, rr->arch_mon_ctx);
+						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
 		if (rr->err)
 			return rr->err;
 
@@ -411,7 +411,7 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 		if (d->ci_id != rr->ci_id)
 			continue;
 		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
-					     rr->evtid, &tval, rr->arch_mon_ctx);
+					     rr->evt->evtid, &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
 			ret = 0;
@@ -445,7 +445,7 @@ static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 	if (WARN_ON_ONCE(domain_header_is_valid(rr->hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)))
 		return;
 	d = container_of(rr->hdr, struct rdt_l3_mon_domain, hdr);
-	m = get_mbm_state(d, closid, rmid, rr->evtid);
+	m = get_mbm_state(d, closid, rmid, rr->evt->evtid);
 	if (WARN_ON_ONCE(!m))
 		return;
 
@@ -616,12 +616,13 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_l3_mon_domain *dom_m
 static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
 				 u32 closid, u32 rmid, enum resctrl_event_id evtid)
 {
+	struct mon_evt *evt = &mon_event_all[evtid];
 	struct rmid_read rr = {0};
 
 	rr.r = r;
 	rr.hdr = &d->hdr;
-	rr.evtid = evtid;
-	rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, rr.evtid);
+	rr.evt = evt;
+	rr.arch_mon_ctx = resctrl_arch_mon_ctx_alloc(rr.r, evt->evtid);
 	if (IS_ERR(rr.arch_mon_ctx)) {
 		pr_warn_ratelimited("Failed to allocate monitor context: %ld",
 				    PTR_ERR(rr.arch_mon_ctx));
@@ -637,7 +638,7 @@ static void mbm_update_one_event(struct rdt_resource *r, struct rdt_l3_mon_domai
 	if (is_mba_sc(NULL))
 		mbm_bw_count(closid, rmid, &rr);
 
-	resctrl_arch_mon_ctx_free(rr.r, rr.evtid, rr.arch_mon_ctx);
+	resctrl_arch_mon_ctx_free(rr.r, evt->evtid, rr.arch_mon_ctx);
 }
 
 static void mbm_update(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 39018f6c8b14..a10f2f6825fc 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2903,7 +2903,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 
 	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
 		if (priv->rid == rid && priv->domid == domid &&
-		    priv->sum == do_sum && priv->evtid == mevt->evtid)
+		    priv->sum == do_sum && priv->evt == mevt)
 			return priv;
 	}
 
@@ -2914,7 +2914,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 	priv->rid = rid;
 	priv->domid = domid;
 	priv->sum = do_sum;
-	priv->evtid = mevt->evtid;
+	priv->evt = mevt;
 	list_add_tail(&priv->list, &mon_data_kn_priv_list);
 
 	return priv;
@@ -3079,7 +3079,7 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 			return ret;
 
 		if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
-			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
+			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt, true);
 	}
 
 	return 0;
-- 
2.49.0


* [PATCH v6 13/30] x86,fs/resctrl: Handle events that can be read from any CPU
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (11 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 12/30] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 21:15   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
                   ` (19 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Resctrl file system code was built with the assumption that monitor
events can only be read from a CPU in the cpumask_t set for each
domain.

This was true for x86 events accessed with an MSR interface, but may
not be true for other access methods such as MMIO.

Add a flag to struct mon_evt to indicate if the event can be read on
any CPU.

Add an any_cpu argument to resctrl_enable_mon_event() so that
architecture code sets the flag when it enables an event.

Bypass all the smp_call*() code for events that can be read on any CPU
and call mon_event_count() directly from mon_event_read().

Move the checks in __mon_event_count() that the read is being done
from a CPU in the correct domain or cache scope into a helper,
cpu_on_correct_domain(), that skips them for events that can be read
from any CPU.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  2 +-
 fs/resctrl/internal.h              |  2 ++
 arch/x86/kernel/cpu/resctrl/core.c |  6 ++---
 fs/resctrl/ctrlmondata.c           |  7 +++++-
 fs/resctrl/monitor.c               | 36 +++++++++++++++++++++++-------
 5 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 01740acebcd1..e05a1abb25d4 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -379,7 +379,7 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 1458fda64423..f51d10d6a510 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -57,6 +57,7 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  * @rid:		index of the resource for this event
  * @name:		name of the event
  * @configurable:	true if the event is configurable
+ * @any_cpu:		true if the event can be read from any CPU
  * @enabled:		true if the event is enabled
  */
 struct mon_evt {
@@ -64,6 +65,7 @@ struct mon_evt {
 	enum resctrl_res_level	rid;
 	char			*name;
 	bool			configurable;
+	bool			any_cpu;
 	bool			enabled;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 976b4f9d1197..b83861ab504f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -887,15 +887,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
 		ret = true;
 	}
 
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index a99903ac5d27..2e65fddc3408 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -569,6 +569,11 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		return;
 	}
 
+	if (evt->any_cpu) {
+		mon_event_count(rr);
+		goto out_ctx_free;
+	}
+
 	cpu = cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU);
 
 	/*
@@ -581,7 +586,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		smp_call_function_any(cpumask, mon_event_count, rr, 1);
 	else
 		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
-
+out_ctx_free:
 	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
 
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 6d4191eff391..aec26457d82c 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -356,11 +356,30 @@ static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
 	return state ? &state[idx] : NULL;
 }
 
+static bool cpu_on_correct_domain(struct rmid_read *rr)
+{
+	struct cacheinfo *ci;
+	int cpu;
+
+	/* Any CPU is OK for this event */
+	if (rr->evt->any_cpu)
+		return true;
+
+	cpu = smp_processor_id();
+
+	/* Single domain. Must be on a CPU in that domain. */
+	if (rr->hdr)
+		return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
+
+	/* Summing domains that share a cache, must be on a CPU for that cache. */
+	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
+
+	return ci && ci->id == rr->ci_id;
+}
+
 static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 {
-	int cpu = smp_processor_id();
 	struct rdt_l3_mon_domain *d;
-	struct cacheinfo *ci;
 	struct mbm_state *m;
 	int err, ret;
 	u64 tval = 0;
@@ -378,9 +397,10 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 	}
 
 	if (rr->hdr) {
-		/* Reading a single domain, must be on a CPU in that domain. */
-		if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
+		/* Single domain. */
+		if (!cpu_on_correct_domain(rr))
 			return -EINVAL;
+
 		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
 						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
 		if (rr->err)
@@ -394,9 +414,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 	if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
 		return -EINVAL;
 
-	/* Summing domains that share a cache, must be on a CPU for that cache. */
-	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
-	if (!ci || ci->id != rr->ci_id)
+	/* Sum across multiple domains. */
+	if (!cpu_on_correct_domain(rr))
 		return -EINVAL;
 
 	/*
@@ -878,7 +897,7 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	},
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
 {
 	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
 		return;
@@ -887,6 +906,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
 		return;
 	}
 
+	mon_event_all[eventid].any_cpu = any_cpu;
 	mon_event_all[eventid].enabled = true;
 }
 
-- 
2.49.0


* [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (12 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 13/30] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-06-27 21:22   ` Fenghua Yu
                     ` (2 more replies)
  2025-06-26 16:49 ` [PATCH v6 15/30] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
                   ` (18 subsequent siblings)
  32 siblings, 3 replies; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Resctrl was written with the assumption that all monitor events can be
displayed as unsigned decimal integers.

Some hardware telemetry events are not simple counts but measurements
of some sort (e.g. Joules for energy consumed), which the hardware may
report with fractional precision.

Add a new argument to resctrl_enable_mon_event() for architecture code
to inform the file system that the value for a counter is a fixed-point
value with a specific number of binary places. The file system only
allows architecture code to use fixed-point format for events that the
file system has marked with mon_evt::is_floating_point.

Fixed-point values are displayed rounded to a number of decimal places
appropriate to the precision of the binary places provided. In general,
one extra decimal place is added for every three additional binary
places (each decimal digit carries about 3.3 bits of precision). There
are some exceptions for low-precision values where an exact decimal
representation is possible:

  1 binary place gives 0.0 or 0.5			=> 1 decimal place
  2 binary places give 0.0, 0.25, 0.5, 0.75	=> 2 decimal places
  3 binary places give 0.0, 0.125, etc.		=> 3 decimal places

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  4 +-
 fs/resctrl/internal.h              |  4 ++
 arch/x86/kernel/cpu/resctrl/core.c |  6 +-
 fs/resctrl/ctrlmondata.c           | 91 +++++++++++++++++++++++++++++-
 fs/resctrl/monitor.c               | 10 +++-
 5 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index e05a1abb25d4..1060a54cc9fa 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -379,7 +379,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
+#define MAX_BINARY_BITS	27
+
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index f51d10d6a510..4dc678af005c 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -58,6 +58,8 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
  * @name:		name of the event
  * @configurable:	true if the event is configurable
  * @any_cpu:		true if the event can be read from any CPU
+ * @is_floating_point:	event values may be displayed in floating point format
+ * @binary_bits:	number of fixed-point binary bits from architecture
  * @enabled:		true if the event is enabled
  */
 struct mon_evt {
@@ -66,6 +68,8 @@ struct mon_evt {
 	char			*name;
 	bool			configurable;
 	bool			any_cpu;
+	bool			is_floating_point;
+	int			binary_bits;
 	bool			enabled;
 };
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index b83861ab504f..2b6c6b61707d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -887,15 +887,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
 		ret = true;
 	}
 
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 2e65fddc3408..29de0e380ccc 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -590,6 +590,93 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
 }
 
+/**
+ * struct fixed_params - parameters to decode a binary fixed point value
+ * @decplaces:	Number of decimal places for this number of binary places.
+ * @pow10:	Multiplier (10 ^ decimal places).
+ */
+struct fixed_params {
+	int	decplaces;
+	int	pow10;
+};
+
+static struct fixed_params fixed_params[MAX_BINARY_BITS + 1] = {
+	[1]  = { .decplaces = 1, .pow10 = 10 },
+	[2]  = { .decplaces = 2, .pow10 = 100 },
+	[3]  = { .decplaces = 3, .pow10 = 1000 },
+	[4]  = { .decplaces = 3, .pow10 = 1000 },
+	[5]  = { .decplaces = 3, .pow10 = 1000 },
+	[6]  = { .decplaces = 3, .pow10 = 1000 },
+	[7]  = { .decplaces = 3, .pow10 = 1000 },
+	[8]  = { .decplaces = 3, .pow10 = 1000 },
+	[9]  = { .decplaces = 3, .pow10 = 1000 },
+	[10] = { .decplaces = 4, .pow10 = 10000 },
+	[11] = { .decplaces = 4, .pow10 = 10000 },
+	[12] = { .decplaces = 4, .pow10 = 10000 },
+	[13] = { .decplaces = 5, .pow10 = 100000 },
+	[14] = { .decplaces = 5, .pow10 = 100000 },
+	[15] = { .decplaces = 5, .pow10 = 100000 },
+	[16] = { .decplaces = 6, .pow10 = 1000000 },
+	[17] = { .decplaces = 6, .pow10 = 1000000 },
+	[18] = { .decplaces = 6, .pow10 = 1000000 },
+	[19] = { .decplaces = 7, .pow10 = 10000000 },
+	[20] = { .decplaces = 7, .pow10 = 10000000 },
+	[21] = { .decplaces = 7, .pow10 = 10000000 },
+	[22] = { .decplaces = 8, .pow10 = 100000000 },
+	[23] = { .decplaces = 8, .pow10 = 100000000 },
+	[24] = { .decplaces = 8, .pow10 = 100000000 },
+	[25] = { .decplaces = 9, .pow10 = 1000000000 },
+	[26] = { .decplaces = 9, .pow10 = 1000000000 },
+	[27] = { .decplaces = 9, .pow10 = 1000000000 }
+};
+
+static void print_event_value(struct seq_file *m, int binary_bits, u64 val)
+{
+	struct fixed_params *fp = &fixed_params[binary_bits];
+	unsigned long long frac;
+	char buf[10];
+
+	/* Mask off the integer part of the fixed-point value. */
+	frac = val & GENMASK_ULL(binary_bits - 1, 0);
+
+	/*
+	 * Multiply by 10^{desired decimal places}. The
+	 * integer part of the fixed point value is now
+	 * almost what is needed.
+	 */
+	frac *= fp->pow10;
+
+	/*
+	 * Round to nearest by adding one half, i.e. a
+	 * "1" in the top fractional bit (binary_bits - 1).
+	 * Integer part of fixed point value is now
+	 * the needed value.
+	 */
+	frac += 1 << (binary_bits - 1);
+
+	/*
+	 * Extract the integer part of the value. This
+	 * is the decimal representation of the original
+	 * fixed-point fractional value.
+	 */
+	frac >>= binary_bits;
+
+	/*
+	 * "frac" is now in the range [0 .. fp->pow10).
+	 * I.e. string representation will fit into
+	 * fp->decplaces.
+	 */
+	sprintf(buf, "%0*llu", fp->decplaces, frac);
+
+	/* Trim trailing zeroes */
+	for (int i = fp->decplaces - 1; i > 0; i--) {
+		if (buf[i] != '0')
+			break;
+		buf[i] = '\0';
+	}
+	seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
+}
+
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 {
 	struct kernfs_open_file *of = m->private;
@@ -666,8 +753,10 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
-	else
+	else if (evt->binary_bits == 0)
 		seq_printf(m, "%llu\n", rr.val);
+	else
+		print_event_value(m, evt->binary_bits, rr.val);
 
 out:
 	rdtgroup_kn_unlock(of->kn);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index aec26457d82c..076c0cc6e53a 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -897,16 +897,22 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	},
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
 {
-	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
+			 binary_bits > MAX_BINARY_BITS))
 		return;
 	if (mon_event_all[eventid].enabled) {
 		pr_warn("Duplicate enable for event %d\n", eventid);
 		return;
 	}
+	if (binary_bits && !mon_event_all[eventid].is_floating_point) {
+		pr_warn("Event %d may not be floating point\n", eventid);
+		return;
+	}
 
 	mon_event_all[eventid].any_cpu = any_cpu;
+	mon_event_all[eventid].binary_bits = binary_bits;
 	mon_event_all[eventid].enabled = true;
 }
 
-- 
2.49.0


* [PATCH v6 15/30] x86,fs/resctrl: Add an architectural hook called for each mount
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (13 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-06-26 16:49 ` [PATCH v6 16/30] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Enumeration of Intel telemetry events is not complete when the
resctrl "late_init" code is executed.

Add a hook at the beginning of the mount code that will be used
to check for telemetry events and initialize them if any are found.

The hook is called on every attempted mount, but most actions (like
enumeration) are expected to be needed only on the first call.

The call is made with no locks held. Architecture code is responsible
for any required locking.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            | 6 ++++++
 arch/x86/kernel/cpu/resctrl/core.c | 9 +++++++++
 fs/resctrl/rdtgroup.c              | 2 ++
 3 files changed, 17 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 1060a54cc9fa..23e2874105e3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -460,6 +460,12 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
 void resctrl_online_cpu(unsigned int cpu);
 void resctrl_offline_cpu(unsigned int cpu);
 
+/*
+ * Architecture hook called for each attempted file system mount.
+ * No locks are held.
+ */
+void resctrl_arch_pre_mount(void);
+
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
  *			      for this resource and domain.
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 2b6c6b61707d..03c481725fdb 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -717,6 +717,15 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
+void resctrl_arch_pre_mount(void)
+{
+	static atomic_t only_once = ATOMIC_INIT(0);
+	int old = 0;
+
+	if (!atomic_try_cmpxchg(&only_once, &old, 1))
+		return;
+}
+
 enum {
 	RDT_FLAG_CMT,
 	RDT_FLAG_MBM_TOTAL,
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a10f2f6825fc..9dac8017a2f8 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2585,6 +2585,8 @@ static int rdt_get_tree(struct fs_context *fc)
 	struct rdt_resource *r;
 	int ret;
 
+	resctrl_arch_pre_mount();
+
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
 	/*
-- 
2.49.0


* [PATCH v6 16/30] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (14 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 15/30] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-08 22:05   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events Tony Luck
                   ` (16 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Counts for each Intel telemetry event are periodically sent to one or
more aggregators on each package, where accumulated totals are made
available in MMIO registers.

Add a new resource for monitoring these events so that CPU hotplug
notifiers will build domains at the package granularity.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  2 ++
 fs/resctrl/internal.h              |  2 ++
 arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++++++
 fs/resctrl/rdtgroup.c              |  2 ++
 4 files changed, 16 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 23e2874105e3..76c54b81e426 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -53,6 +53,7 @@ enum resctrl_res_level {
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
 	RDT_RESOURCE_SMBA,
+	RDT_RESOURCE_PERF_PKG,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,
@@ -252,6 +253,7 @@ enum resctrl_scope {
 	RESCTRL_L2_CACHE = 2,
 	RESCTRL_L3_CACHE = 3,
 	RESCTRL_L3_NODE,
+	RESCTRL_PACKAGE,
 };
 
 /**
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 4dc678af005c..53ced959a27d 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -239,6 +239,8 @@ struct rdtgroup {
 
 #define RFTYPE_DEBUG			BIT(10)
 
+#define RFTYPE_RES_PERF_PKG		BIT(11)
+
 #define RFTYPE_CTRL_INFO		(RFTYPE_INFO | RFTYPE_CTRL)
 
 #define RFTYPE_MON_INFO			(RFTYPE_INFO | RFTYPE_MON)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 03c481725fdb..a5f01cac2363 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,14 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
 			.schema_fmt		= RESCTRL_SCHEMA_RANGE,
 		},
 	},
+	[RDT_RESOURCE_PERF_PKG] =
+	{
+		.r_resctrl = {
+			.name			= "PERF_PKG",
+			.mon_scope		= RESCTRL_PACKAGE,
+			.mon_domains		= mon_domain_init(RDT_RESOURCE_PERF_PKG),
+		},
+	},
 };
 
 u32 resctrl_arch_system_num_rmid_idx(void)
@@ -433,6 +441,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
 		return get_cpu_cacheinfo_id(cpu, scope);
 	case RESCTRL_L3_NODE:
 		return cpu_to_node(cpu);
+	case RESCTRL_PACKAGE:
+		return topology_physical_package_id(cpu);
 	default:
 		break;
 	}
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 9dac8017a2f8..d9bb01edd582 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2195,6 +2195,8 @@ static unsigned long fflags_from_resource(struct rdt_resource *r)
 	case RDT_RESOURCE_MBA:
 	case RDT_RESOURCE_SMBA:
 		return RFTYPE_RES_MB;
+	case RDT_RESOURCE_PERF_PKG:
+		return RFTYPE_RES_PERF_PKG;
 	}
 
 	return WARN_ON_ONCE(1);
-- 
2.49.0


* [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (15 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 16/30] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-06-27 18:06   ` Luck, Tony
  2025-07-08 23:51   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
                   ` (15 subsequent siblings)
  32 siblings, 2 replies; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Hardware has one or more telemetry event aggregators per package
for each group of telemetry events. Each aggregator provides access
to event counts in an array of 64-bit values in MMIO space. There
is a "guid" (in this case a unique 32-bit integer) that refers to
an XML file published in the Intel-PMT repository
(https://github.com/intel/Intel-PMT) describing each aggregator in detail.

The XML files provide the following information:
1) Which telemetry events are included in the group for this aggregator.
2) The order in which the event counters appear for each RMID.
3) The value type of each event counter (integer or fixed-point).
4) The number of RMIDs supported.
5) Which additional aggregator status registers are included.
6) The total size of the MMIO region for this aggregator.

Make CONFIG_X86_CPU_RESCTRL select X86_PLATFORM_DEVICES, INTEL_VSEC
and INTEL_PMT_TELEMETRY to enable use of the discovery driver, which
enumerates all aggregators on the system via
intel_pmt_get_regions_by_feature(). Call this function for each
pmt_feature_id that indicates per-RMID telemetry.

Save the returned pmt_feature_group pointers with guids that are known
to resctrl for use at run time.

Those pointers are returned to the INTEL_PMT_DISCOVERY driver at
resctrl_arch_exit() time.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h  |   3 +
 arch/x86/kernel/cpu/resctrl/core.c      |   5 +
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 122 ++++++++++++++++++++++++
 arch/x86/Kconfig                        |   3 +
 arch/x86/kernel/cpu/resctrl/Makefile    |   1 +
 5 files changed, 134 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 224b71730cc3..e93b15bf6aab 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -169,4 +169,7 @@ void __init intel_rdt_mbm_apply_quirk(void);
 
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 
+bool intel_aet_get_events(void);
+void __exit intel_aet_exit(void);
+
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index a5f01cac2363..9144766da836 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -734,6 +734,9 @@ void resctrl_arch_pre_mount(void)
 
 	if (!atomic_try_cmpxchg(&only_once, &old, 1))
 		return;
+
+	if (!intel_aet_get_events())
+		return;
 }
 
 enum {
@@ -1086,6 +1089,8 @@ late_initcall(resctrl_arch_late_init);
 
 static void __exit resctrl_arch_exit(void)
 {
+	intel_aet_exit();
+
 	cpuhp_remove_state(rdt_online);
 
 	resctrl_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
new file mode 100644
index 000000000000..b09044b093dd
--- /dev/null
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Resource Director Technology(RDT)
+ * - Intel Application Energy Telemetry
+ *
+ * Copyright (C) 2025 Intel Corporation
+ *
+ * Author:
+ *    Tony Luck <tony.luck@intel.com>
+ */
+
+#define pr_fmt(fmt)   "resctrl: " fmt
+
+#include <linux/cleanup.h>
+#include <linux/cpu.h>
+#include <linux/intel_vsec.h>
+#include <linux/resctrl.h>
+
+#include "internal.h"
+
+/**
+ * struct event_group - All information about a group of telemetry events.
+ * @pfg:		Points to the aggregated telemetry space information
+ *			within the OOBMSM driver that contains data for all
+ *			telemetry regions.
+ * @guid:		Unique number per XML description file.
+ */
+struct event_group {
+	/* Data fields for additional structures to manage this group. */
+	struct pmt_feature_group	*pfg;
+
+	/* Remaining fields initialized from XML file. */
+	u32				guid;
+};
+
+/*
+ * Link: https://github.com/intel/Intel-PMT
+ * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
+ */
+static struct event_group energy_0x26696143 = {
+	.guid		= 0x26696143,
+};
+
+/*
+ * Link: https://github.com/intel/Intel-PMT
+ * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
+ */
+static struct event_group perf_0x26557651 = {
+	.guid		= 0x26557651,
+};
+
+static struct event_group *known_event_groups[] = {
+	&energy_0x26696143,
+	&perf_0x26557651,
+};
+
+#define NUM_KNOWN_GROUPS ARRAY_SIZE(known_event_groups)
+
+/* Stub for now */
+static int configure_events(struct event_group *e, struct pmt_feature_group *p)
+{
+	return -EINVAL;
+}
+
+DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
+		if (!IS_ERR_OR_NULL(_T))
+			intel_pmt_put_feature_group(_T))
+
+/*
+ * Make a request to the INTEL_PMT_DISCOVERY driver for the
+ * pmt_feature_group for a specific feature. If there is
+ * one, the returned structure has an array of telemetry_region
+ * structures. Each describes one telemetry aggregator.
+ * Try to configure any with a known matching guid.
+ */
+static bool get_pmt_feature(enum pmt_feature_id feature)
+{
+	struct pmt_feature_group *p __free(intel_pmt_put_feature_group) = NULL;
+	struct event_group **peg;
+	int ret;
+
+	p = intel_pmt_get_regions_by_feature(feature);
+
+	if (IS_ERR_OR_NULL(p))
+		return false;
+
+	for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++) {
+		ret = configure_events(*peg, p);
+		if (!ret) {
+			(*peg)->pfg = no_free_ptr(p);
+			return true;
+		}
+	}
+
+	return false;
+}
+
+/*
+ * Ask OOBMSM discovery driver for all the RMID based telemetry groups
+ * that it supports.
+ */
+bool intel_aet_get_events(void)
+{
+	bool ret1, ret2;
+
+	ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM);
+	ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM);
+
+	return ret1 || ret2;
+}
+
+void __exit intel_aet_exit(void)
+{
+	struct event_group **peg;
+
+	for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++) {
+		if ((*peg)->pfg) {
+			intel_pmt_put_feature_group((*peg)->pfg);
+			(*peg)->pfg = NULL;
+		}
+	}
+}
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 71019b3b54ea..8eb68d2230be 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -512,6 +512,9 @@ config X86_CPU_RESCTRL
 	select ARCH_HAS_CPU_RESCTRL
 	select RESCTRL_FS
 	select RESCTRL_FS_PSEUDO_LOCK
+	select X86_PLATFORM_DEVICES
+	select INTEL_VSEC
+	select INTEL_PMT_TELEMETRY
 	help
 	  Enable x86 CPU resource control support.
 
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index d8a04b195da2..97ceb4e44dfa 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
+obj-$(CONFIG_X86_CPU_RESCTRL)		+= intel_aet.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
 
 # To allow define_trace.h's recursive include:
-- 
2.49.0



* [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (16 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09  2:20   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 19/30] x86/resctrl: Complete telemetry event enumeration Tony Luck
                   ` (14 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

There may be multiple telemetry aggregators per package, each enumerated
by a telemetry region structure in the feature group.

Scan the array of telemetry region structures and count how many are
in each package. This prepares for allocating structures that save each
region's MMIO address in a format convenient for reading event
counters.
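
A rough user-space sketch of this counting pass, assuming a simplified
region record (struct demo_region and demo_count_per_package() are
hypothetical names). Like configure_events(), it allocates the count
array lazily on the first matching region, so a NULL result means either
that nothing matched or that allocation failed.

```c
#include <stdlib.h>

/* Hypothetical region record: only the fields needed for counting. */
struct demo_region {
	unsigned int guid;
	int package_id;
};

/*
 * Count regions with a matching guid in each package. Regions with an
 * out-of-range package_id are skipped, as skip_this_region() does.
 * Returns a calloc()ed array of num_pkgs counts (caller frees), or NULL.
 */
int *demo_count_per_package(const struct demo_region *r, int count,
			    unsigned int guid, int num_pkgs)
{
	int *pkgcounts = NULL;

	for (int i = 0; i < count; i++) {
		if (r[i].guid != guid)
			continue;
		if (r[i].package_id < 0 || r[i].package_id >= num_pkgs)
			continue;
		if (!pkgcounts) {
			pkgcounts = calloc(num_pkgs, sizeof(*pkgcounts));
			if (!pkgcounts)
				return NULL;
		}
		pkgcounts[r[i].package_id]++;
	}
	return pkgcounts;
}
```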

Sanity check that each telemetry region structure has a valid
package_id and that the MMIO size it reports is at least as large as
expected from the XML description of the registers in the region.
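
As a worked example of the expected-size check, the XML_MMIO_SIZE()
formula from this patch gives the values below for the two Clearwater
Forest groups (576 RMIDs, 2 or 7 events per RMID, 3 extra status
registers, 8 bytes per register). demo_region_big_enough() is a
hypothetical helper illustrating the comparison that skip_this_region()
makes.

```c
#include <stddef.h>
#include <stdint.h>

/* Same formula as the XML_MMIO_SIZE() macro in this patch. */
#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status)	\
	(((num_rmids) * (num_events) + (num_extra_status)) * sizeof(uint64_t))

/* Energy group: (576 * 2 + 3) * 8 = 9240 bytes. */
const size_t energy_size = XML_MMIO_SIZE(576, 2, 3);
/* Perf group: (576 * 7 + 3) * 8 = 32280 bytes. */
const size_t perf_size = XML_MMIO_SIZE(576, 7, 3);

/* A region is usable only if it is at least as large as the XML says. */
int demo_region_big_enough(size_t reported, size_t expected)
{
	return reported >= expected;
}
```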

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 55 ++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index b09044b093dd..8d67ed709a74 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -15,6 +15,7 @@
 #include <linux/cpu.h>
 #include <linux/intel_vsec.h>
 #include <linux/resctrl.h>
+#include <linux/slab.h>
 
 #include "internal.h"
 
@@ -24,6 +25,7 @@
  *			within the OOBMSM driver that contains data for all
  *			telemetry regions.
  * @guid:		Unique number per XML description file.
+ * @mmio_size:		Number of bytes of MMIO registers for this group.
  */
 struct event_group {
 	/* Data fields for additional structures to manage this group. */
@@ -31,14 +33,19 @@ struct event_group {
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
+	size_t				mmio_size;
 };
 
+#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status)	\
+	(((num_rmids) * (num_events) + (num_extra_status)) * sizeof(u64))
+
 /*
  * Link: https://github.com/intel/Intel-PMT
  * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
  */
 static struct event_group energy_0x26696143 = {
 	.guid		= 0x26696143,
+	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
 };
 
 /*
@@ -47,6 +54,7 @@ static struct event_group energy_0x26696143 = {
  */
 static struct event_group perf_0x26557651 = {
 	.guid		= 0x26557651,
+	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
 };
 
 static struct event_group *known_event_groups[] = {
@@ -56,10 +64,53 @@ static struct event_group *known_event_groups[] = {
 
 #define NUM_KNOWN_GROUPS ARRAY_SIZE(known_event_groups)
 
-/* Stub for now */
+static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
+{
+	if (tr->guid != e->guid)
+		return true;
+	if (tr->plat_info.package_id >= topology_max_packages()) {
+		pr_warn_once("Bad package %d in guid 0x%x\n", tr->plat_info.package_id,
+			     tr->guid);
+		return true;
+	}
+	if (tr->size < e->mmio_size) {
+		pr_warn_once("MMIO space %zu too small for guid 0x%x\n", tr->size, e->guid);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Configure events from one pmt_feature_group.
+ * 1) Count how many per package.
+ * 2...) To be continued.
+ */
 static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 {
-	return -EINVAL;
+	int *pkgcounts __free(kfree) = NULL;
+	struct telemetry_region *tr;
+	int num_pkgs;
+
+	num_pkgs = topology_max_packages();
+
+	/* Get per-package counts of telemetry_regions for this event group */
+	for (int i = 0; i < p->count; i++) {
+		tr = &p->regions[i];
+		if (skip_this_region(tr, e))
+			continue;
+		if (!pkgcounts) {
+			pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
+			if (!pkgcounts)
+				return -ENOMEM;
+		}
+		pkgcounts[tr->plat_info.package_id]++;
+	}
+
+	if (!pkgcounts)
+		return -ENODEV;
+
+	return 0;
 }
 
 DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
-- 
2.49.0



* [PATCH v6 19/30] x86/resctrl: Complete telemetry event enumeration
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (17 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09  2:38   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 20/30] x86,fs/resctrl: Fill in details of Clearwater Forest events Tony Luck
                   ` (13 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Counters for telemetry events are in MMIO space. Each telemetry_region
structure in the pmt_feature_group returned from OOBMSM contains the
base MMIO address of its counters.

Scan all the telemetry_region structures again and save the number of
regions together with a flexible array of MMIO addresses, one per
aggregator, indexed by package ID. Note that there may be multiple
aggregators per package.

The completed structure for each event group looks like this (N0 and N1
are the per-package region counts, which need not be equal):

             +---------------------+---------------------+
pkginfo** -->|     pkginfo[0]      |     pkginfo[1]      |
             +---------------------+---------------------+
                        |                     |
                        v                     v
                +----------------+    +----------------+
                |struct mmio_info|    |struct mmio_info|
                +----------------+    +----------------+
                |num_regions = N0|    |num_regions = N1|
                |  addrs[0]      |    |  addrs[0]      |
                |  addrs[1]      |    |  addrs[1]      |
                |    ...         |    |    ...         |
                |  addrs[N0-1]   |    |  addrs[N1-1]   |
                +----------------+    +----------------+
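
The second pass can be sketched in user space as below. This is a
hypothetical illustration (struct demo_mmio_info, struct demo_region and
demo_build_pkginfo() are not kernel names) that assumes the regions have
already been filtered by guid and package ID, and that pkgcounts[] comes
from the first pass. It fills each per-package flexible array from the
top down by decrementing the count, as the patch does with
addrs[--pkgcounts[...]], so addresses land in reverse discovery order.

```c
#include <stdlib.h>

/* User-space stand-in for struct mmio_info: a count plus a flexible array. */
struct demo_mmio_info {
	int num_regions;
	void *addrs[];
};

/* Hypothetical, already-validated region record. */
struct demo_region {
	int package_id;
	void *addr;
};

/*
 * Build one demo_mmio_info per package. pkgcounts[] is consumed
 * (decremented to zero) while addresses are stored. Returns NULL on
 * allocation failure.
 */
struct demo_mmio_info **demo_build_pkginfo(const struct demo_region *r, int count,
					   int *pkgcounts, int num_pkgs)
{
	struct demo_mmio_info **pkginfo = calloc(num_pkgs, sizeof(*pkginfo));

	if (!pkginfo)
		return NULL;

	for (int i = 0; i < num_pkgs; i++) {
		pkginfo[i] = calloc(1, sizeof(*pkginfo[i]) +
				    pkgcounts[i] * sizeof(pkginfo[i]->addrs[0]));
		if (!pkginfo[i])
			goto fail;
		pkginfo[i]->num_regions = pkgcounts[i];
	}

	for (int i = 0; i < count; i++) {
		int pkg = r[i].package_id;

		pkginfo[pkg]->addrs[--pkgcounts[pkg]] = r[i].addr;
	}
	return pkginfo;

fail:
	for (int i = 0; i < num_pkgs; i++)
		free(pkginfo[i]);	/* entries not yet allocated are NULL */
	free(pkginfo);
	return NULL;
}
```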

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 64 +++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 8d67ed709a74..c770039b2525 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -19,17 +19,32 @@
 
 #include "internal.h"
 
+/**
+ * struct mmio_info - MMIO address information for one event group of a package.
+ * @num_regions:	Number of telemetry regions on this package.
+ * @addrs:		Array of MMIO addresses, one per telemetry region on this package.
+ *
+ * Provides convenient access to all MMIO addresses of one event group
+ * for one package. Used when reading event data on a package.
+ */
+struct mmio_info {
+	int		num_regions;
+	void __iomem	*addrs[] __counted_by(num_regions);
+};
+
 /**
  * struct event_group - All information about a group of telemetry events.
  * @pfg:		Points to the aggregated telemetry space information
  *			within the OOBMSM driver that contains data for all
  *			telemetry regions.
+ * @pkginfo:		Per-package MMIO addresses of telemetry regions belonging to this group.
  * @guid:		Unique number per XML description file.
  * @mmio_size:		Number of bytes of MMIO registers for this group.
  */
 struct event_group {
 	/* Data fields for additional structures to manage this group. */
 	struct pmt_feature_group	*pfg;
+	struct mmio_info		**pkginfo;
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
@@ -81,6 +96,20 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
 	return false;
 }
 
+static void free_mmio_info(struct mmio_info **mmi)
+{
+	int num_pkgs = topology_max_packages();
+
+	if (!mmi)
+		return;
+
+	for (int i = 0; i < num_pkgs; i++)
+		kfree(mmi[i]);
+	kfree(mmi);
+}
+
+DEFINE_FREE(mmio_info, struct mmio_info **, free_mmio_info(_T))
+
 /*
  * Configure events from one pmt_feature_group.
  * 1) Count how many per package.
@@ -88,8 +117,10 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
  */
 static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 {
+	struct mmio_info **pkginfo __free(mmio_info) = NULL;
 	int *pkgcounts __free(kfree) = NULL;
 	struct telemetry_region *tr;
+	struct mmio_info *mmi;
 	int num_pkgs;
 
 	num_pkgs = topology_max_packages();
@@ -99,6 +130,12 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 		tr = &p->regions[i];
 		if (skip_this_region(tr, e))
 			continue;
+
+		if (e->pkginfo) {
+			pr_warn_once("Duplicate telemetry information for guid 0x%x\n", e->guid);
+			return -EINVAL;
+		}
+
 		if (!pkgcounts) {
 			pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
 			if (!pkgcounts)
@@ -110,6 +147,32 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 	if (!pkgcounts)
 		return -ENODEV;
 
+	/* Allocate array for per-package struct mmio_info data */
+	pkginfo = kcalloc(num_pkgs, sizeof(*pkginfo), GFP_KERNEL);
+	if (!pkginfo)
+		return -ENOMEM;
+
+	/*
+	 * Allocate per-package mmio_info structures and initialize
+	 * count of telemetry_regions in each one.
+	 */
+	for (int i = 0; i < num_pkgs; i++) {
+		pkginfo[i] = kzalloc(struct_size(pkginfo[i], addrs, pkgcounts[i]), GFP_KERNEL);
+		if (!pkginfo[i])
+			return -ENOMEM;
+		pkginfo[i]->num_regions = pkgcounts[i];
+	}
+
+	/* Save MMIO address(es) for each telemetry region in per-package structures */
+	for (int i = 0; i < p->count; i++) {
+		tr = &p->regions[i];
+		if (skip_this_region(tr, e))
+			continue;
+		mmi = pkginfo[tr->plat_info.package_id];
+		mmi->addrs[--pkgcounts[tr->plat_info.package_id]] = tr->addr;
+	}
+	e->pkginfo = no_free_ptr(pkginfo);
+
 	return 0;
 }
 
@@ -169,5 +232,6 @@ void __exit intel_aet_exit(void)
 			intel_pmt_put_feature_group((*peg)->pfg);
 			(*peg)->pfg = NULL;
 		}
+		free_mmio_info((*peg)->pkginfo);
 	}
 }
-- 
2.49.0



* [PATCH v6 20/30] x86,fs/resctrl: Fill in details of Clearwater Forest events
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (18 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 19/30] x86/resctrl: Complete telemetry event enumeration Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09  3:00   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer Tony Luck
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Clearwater Forest supports two energy-related telemetry events
and seven perf-style events. The counters are arranged in per-RMID
blocks; for example, with three events per RMID the layout is:

	MMIO offset:0x00 Counter for RMID 0 Event 0
	MMIO offset:0x08 Counter for RMID 0 Event 1
	MMIO offset:0x10 Counter for RMID 0 Event 2
	MMIO offset:0x18 Counter for RMID 1 Event 0
	MMIO offset:0x20 Counter for RMID 1 Event 1
	MMIO offset:0x28 Counter for RMID 1 Event 2
	...
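
The byte offset of any counter follows directly from this layout. A
minimal sketch, assuming 8-byte counters and no padding between RMID
blocks (demo_counter_offset() is a hypothetical helper):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte offset of the counter for (rmid, evt_idx) in a region where each
 * RMID owns a contiguous block of num_events 64-bit counters.
 */
size_t demo_counter_offset(unsigned int rmid, unsigned int num_events,
			   unsigned int evt_idx)
{
	return ((size_t)rmid * num_events + evt_idx) * sizeof(uint64_t);
}
```

With three events per RMID, RMID 1 event 2 lands at 0x28, matching the
table above; for the two-event energy group the last counter (RMID 575,
event 1) sits at byte 9208, inside the 9240-byte region.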

Define these events in the file system code and add the events
to the event_group structures.

PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed-point
format (18 fractional bits). File system code must output them as
floating-point values.
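
Kernel code generally avoids floating point, so one way to present a
fixed-point count as a decimal is integer-only digit extraction. The
sketch below is an assumption about how such formatting could work, not
the resctrl file system's implementation; demo_format_fixed() is
hypothetical, and bin_bits = 18 matches the energy events in this patch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format raw (fixed point with bin_bits fractional bits) into buf as
 * "<integer>.<digits decimal places>", truncating the fraction. Uses
 * integer arithmetic only; assumes bin_bits <= 27 (MAX_BINARY_BITS in
 * this series) so frac * 10 cannot overflow.
 */
void demo_format_fixed(char *buf, size_t len, uint64_t raw,
		       unsigned int bin_bits, unsigned int digits)
{
	uint64_t mask = (1ULL << bin_bits) - 1;
	uint64_t frac = raw & mask;
	int n = snprintf(buf, len, "%" PRIu64, raw >> bin_bits);
	size_t pos;

	/* Integer-valued event, error, or no room left for ".<digit>". */
	if (n < 0 || bin_bits == 0 || (size_t)n + 1 >= len)
		return;

	pos = (size_t)n;
	buf[pos++] = '.';
	for (unsigned int i = 0; i < digits && pos + 1 < len; i++) {
		frac *= 10;
		buf[pos++] = (char)('0' + (frac >> bin_bits));
		frac &= mask;
	}
	buf[pos] = '\0';
}
```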

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl_types.h           | 11 ++++++++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 33 +++++++++++++++++++++++
 fs/resctrl/monitor.c                    | 35 ++++++++++++++-----------
 3 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
index d98351663c2c..6838b02d5ca3 100644
--- a/include/linux/resctrl_types.h
+++ b/include/linux/resctrl_types.h
@@ -47,6 +47,17 @@ enum resctrl_event_id {
 	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
 	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
 
+	/* Intel Telemetry Events */
+	PMT_EVENT_ENERGY,
+	PMT_EVENT_ACTIVITY,
+	PMT_EVENT_STALLS_LLC_HIT,
+	PMT_EVENT_C1_RES,
+	PMT_EVENT_UNHALTED_CORE_CYCLES,
+	PMT_EVENT_STALLS_LLC_MISS,
+	PMT_EVENT_AUTO_C6_RES,
+	PMT_EVENT_UNHALTED_REF_CYCLES,
+	PMT_EVENT_UOPS_RETIRED,
+
 	/* Must be the last */
 	QOS_NUM_EVENTS,
 };
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index c770039b2525..f9b2959693a0 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -32,6 +32,20 @@ struct mmio_info {
 	void __iomem	*addrs[] __counted_by(num_regions);
 };
 
+/**
+ * struct pmt_event - Telemetry event.
+ * @id:		Resctrl event id.
+ * @idx:	Counter index within each per-RMID block of counters.
+ * @bin_bits:	Zero for integer-valued events, else number of fractional bits in fixed-point.
+ */
+struct pmt_event {
+	enum resctrl_event_id	id;
+	int			idx;
+	int			bin_bits;
+};
+
+#define EVT(_id, _idx, _bits) { .id = _id, .idx = _idx, .bin_bits = _bits }
+
 /**
  * struct event_group - All information about a group of telemetry events.
  * @pfg:		Points to the aggregated telemetry space information
@@ -40,6 +54,8 @@ struct mmio_info {
  * @pkginfo:		Per-package MMIO addresses of telemetry regions belonging to this group.
  * @guid:		Unique number per XML description file.
  * @mmio_size:		Number of bytes of MMIO registers for this group.
+ * @num_events:		Number of events in this group.
+ * @evts:		Array of event descriptors.
  */
 struct event_group {
 	/* Data fields for additional structures to manage this group. */
@@ -49,6 +65,8 @@ struct event_group {
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
 	size_t				mmio_size;
+	int				num_events;
+	struct pmt_event		evts[] __counted_by(num_events);
 };
 
 #define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status)	\
@@ -61,6 +79,11 @@ struct event_group {
 static struct event_group energy_0x26696143 = {
 	.guid		= 0x26696143,
 	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
+	.num_events	= 2,
+	.evts				= {
+		EVT(PMT_EVENT_ENERGY, 0, 18),
+		EVT(PMT_EVENT_ACTIVITY, 1, 18),
+	}
 };
 
 /*
@@ -70,6 +93,16 @@ static struct event_group energy_0x26696143 = {
 static struct event_group perf_0x26557651 = {
 	.guid		= 0x26557651,
 	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
+	.num_events	= 7,
+	.evts				= {
+		EVT(PMT_EVENT_STALLS_LLC_HIT, 0, 0),
+		EVT(PMT_EVENT_C1_RES, 1, 0),
+		EVT(PMT_EVENT_UNHALTED_CORE_CYCLES, 2, 0),
+		EVT(PMT_EVENT_STALLS_LLC_MISS, 3, 0),
+		EVT(PMT_EVENT_AUTO_C6_RES, 4, 0),
+		EVT(PMT_EVENT_UNHALTED_REF_CYCLES, 5, 0),
+		EVT(PMT_EVENT_UOPS_RETIRED, 6, 0),
+	}
 };
 
 static struct event_group *known_event_groups[] = {
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 076c0cc6e53a..cff8af3a263e 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -874,27 +874,32 @@ static void dom_data_exit(struct rdt_resource *r)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
+#define MON_EVENT(_eventid, _name, _res, _fp)	\
+	[_eventid] = {				\
+	.name			= _name,	\
+	.evtid			= _eventid,	\
+	.rid			= _res,		\
+	.is_floating_point	= _fp,		\
+}
+
 /*
  * All available events. Architecture code marks the ones that
  * are supported by a system using resctrl_enable_mon_event()
  * to set .enabled.
  */
 struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
-	[QOS_L3_OCCUP_EVENT_ID] = {
-		.name	= "llc_occupancy",
-		.evtid	= QOS_L3_OCCUP_EVENT_ID,
-		.rid	= RDT_RESOURCE_L3,
-	},
-	[QOS_L3_MBM_TOTAL_EVENT_ID] = {
-		.name	= "mbm_total_bytes",
-		.evtid	= QOS_L3_MBM_TOTAL_EVENT_ID,
-		.rid	= RDT_RESOURCE_L3,
-	},
-	[QOS_L3_MBM_LOCAL_EVENT_ID] = {
-		.name	= "mbm_local_bytes",
-		.evtid	= QOS_L3_MBM_LOCAL_EVENT_ID,
-		.rid	= RDT_RESOURCE_L3,
-	},
+	MON_EVENT(QOS_L3_OCCUP_EVENT_ID,		"llc_occupancy",	RDT_RESOURCE_L3,	false),
+	MON_EVENT(QOS_L3_MBM_TOTAL_EVENT_ID,		"mbm_total_bytes",	RDT_RESOURCE_L3,	false),
+	MON_EVENT(QOS_L3_MBM_LOCAL_EVENT_ID,		"mbm_local_bytes",	RDT_RESOURCE_L3,	false),
+	MON_EVENT(PMT_EVENT_ENERGY,			"core_energy",		RDT_RESOURCE_PERF_PKG,	true),
+	MON_EVENT(PMT_EVENT_ACTIVITY,			"activity",		RDT_RESOURCE_PERF_PKG,	true),
+	MON_EVENT(PMT_EVENT_STALLS_LLC_HIT,		"stalls_llc_hit",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_C1_RES,			"c1_res",		RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_UNHALTED_CORE_CYCLES,	"unhalted_core_cycles",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_STALLS_LLC_MISS,		"stalls_llc_miss",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_AUTO_C6_RES,		"c6_res",		RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_UNHALTED_REF_CYCLES,	"unhalted_ref_cycles",	RDT_RESOURCE_PERF_PKG,	false),
+	MON_EVENT(PMT_EVENT_UOPS_RETIRED,		"uops_retired",		RDT_RESOURCE_PERF_PKG,	false),
 };
 
 void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
-- 
2.49.0



* [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (19 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 20/30] x86,fs/resctrl: Fill in details of Clearwater Forest events Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09  3:21   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 22/30] x86/resctrl: Read core telemetry events Tony Luck
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The resctrl file system layer passes the domain, RMID, and event ID to
resctrl_arch_rmid_read() to fetch an event counter.

For some resources this may not be enough information to efficiently
access the counter.

Add mon_evt::arch_priv void pointer. Architecture code can initialize
this when marking each event enabled.

File system code passes this pointer to resctrl_arch_rmid_read().
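
The pattern is an opaque pointer that the file system stores and hands
back without ever dereferencing. A small user-space sketch under that
reading; every demo_* name here is hypothetical, and the "counter value"
returned by demo_arch_read() is a placeholder rather than a hardware
read.

```c
#include <stdint.h>

enum demo_event_id { DEMO_EV_A, DEMO_EV_B, DEMO_NUM_EVENTS };

/* File-system side: per-event state, including an opaque arch pointer. */
struct demo_mon_evt {
	int enabled;
	void *arch_priv;
};

static struct demo_mon_evt demo_events[DEMO_NUM_EVENTS];

/* Arch code calls this once per supported event. */
void demo_enable_event(enum demo_event_id id, void *arch_priv)
{
	demo_events[id].arch_priv = arch_priv;
	demo_events[id].enabled = 1;
}

/* Arch-private type that the file system never looks inside. */
struct demo_arch_event {
	int counter_idx;
};

/* Arch read hook: recovers its own type from the opaque pointer. */
int demo_arch_read(enum demo_event_id id, void *arch_priv, uint64_t *val)
{
	const struct demo_arch_event *ae = arch_priv;

	(void)id;
	if (!ae)
		return -1;
	*val = (uint64_t)ae->counter_idx;	/* placeholder value */
	return 0;
}

/* File-system read path: passes arch_priv through unchanged. */
int demo_fs_read(enum demo_event_id id, uint64_t *val)
{
	if (!demo_events[id].enabled)
		return -1;
	return demo_arch_read(id, demo_events[id].arch_priv, val);
}
```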

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h               |  6 ++++--
 fs/resctrl/internal.h                 |  1 +
 arch/x86/kernel/cpu/resctrl/core.c    |  6 +++---
 arch/x86/kernel/cpu/resctrl/monitor.c |  2 +-
 fs/resctrl/monitor.c                  | 12 ++++++++----
 5 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 76c54b81e426..b9f2690bee1e 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -383,7 +383,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
 #define MAX_BINARY_BITS	27
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits);
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+			      u32 binary_bits, void *arch_priv);
 
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
@@ -478,6 +479,7 @@ void resctrl_arch_pre_mount(void);
  *			only.
  * @rmid:		rmid of the counter to read.
  * @eventid:		eventid to read, e.g. L3 occupancy.
+ * @arch_priv:		architecture private data for this event.
  * @val:		result of the counter read in bytes.
  * @arch_mon_ctx:	An architecture specific value from
  *			resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
@@ -495,7 +497,7 @@ void resctrl_arch_pre_mount(void);
  */
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *arch_mon_ctx);
+			   void *arch_priv, u64 *val, void *arch_mon_ctx);
 
 /**
  * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 53ced959a27d..2126006075f3 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -71,6 +71,7 @@ struct mon_evt {
 	bool			is_floating_point;
 	int			binary_bits;
 	bool			enabled;
+	void			*arch_priv;
 };
 
 extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9144766da836..f3144fe918dd 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -909,15 +909,15 @@ static __init bool get_rdt_mon_resources(void)
 	bool ret = false;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
-		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
+		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
+		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
-		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
+		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0, NULL);
 		ret = true;
 	}
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 043f777378a6..185b203f6321 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -219,7 +219,7 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
 
 int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
-			   u64 *val, void *ignored)
+			   void *arch_priv, u64 *val, void *ignored)
 {
 	int cpu = cpumask_any(&hdr->cpu_mask);
 	struct rdt_hw_l3_mon_domain *hw_dom;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index cff8af3a263e..c4b092aec9f8 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -160,7 +160,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 
 		entry = __rmid_entry(idx);
 		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
-					   QOS_L3_OCCUP_EVENT_ID, &val,
+					   QOS_L3_OCCUP_EVENT_ID, NULL, &val,
 					   arch_mon_ctx)) {
 			rmid_dirty = true;
 		} else {
@@ -402,7 +402,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 			return -EINVAL;
 
 		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
-						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
+						 rr->evt->evtid, rr->evt->arch_priv,
+						 &tval, rr->arch_mon_ctx);
 		if (rr->err)
 			return rr->err;
 
@@ -430,7 +431,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
 		if (d->ci_id != rr->ci_id)
 			continue;
 		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
-					     rr->evt->evtid, &tval, rr->arch_mon_ctx);
+					     rr->evt->evtid, rr->evt->arch_priv,
+					     &tval, rr->arch_mon_ctx);
 		if (!err) {
 			rr->val += tval;
 			ret = 0;
@@ -902,7 +904,8 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
 	MON_EVENT(PMT_EVENT_UOPS_RETIRED,		"uops_retired",		RDT_RESOURCE_PERF_PKG,	false),
 };
 
-void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
+void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
+			      u32 binary_bits, void *arch_priv)
 {
 	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS) ||
 			 binary_bits > MAX_BINARY_BITS)
@@ -918,6 +921,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 b
 
 	mon_event_all[eventid].any_cpu = any_cpu;
 	mon_event_all[eventid].binary_bits = binary_bits;
+	mon_event_all[eventid].arch_priv = arch_priv;
 	mon_event_all[eventid].enabled = true;
 }
 
-- 
2.49.0



* [PATCH v6 22/30] x86/resctrl: Read core telemetry events
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (20 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 15:48   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 23/30] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
                   ` (10 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The resctrl file system passes requests to read event monitor files to
the architecture's resctrl_arch_rmid_read(), which collects values from
hardware counters.

Use the resctrl resource to distinguish calls to read legacy L3 events
from calls to read the new telemetry events (which are attached to
RDT_RESOURCE_PERF_PKG).

There may be multiple aggregators tracking each package, so scan all of
them and add up all counters.
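
The summing rule can be sketched in user space as follows, assuming the
bit layout defined in this patch (bit 63 = data valid, bits 62:0 =
count); demo_sum_aggregators() is a hypothetical helper operating on
already-read values instead of readq() results.

```c
#include <stdint.h>

#define DEMO_DATA_VALID	(1ULL << 63)
#define DEMO_DATA_BITS	((1ULL << 63) - 1)

/*
 * Sum one counter across all aggregators of a package. Any counter
 * without the valid bit makes the whole read fail.
 */
int demo_sum_aggregators(const uint64_t *raw, int n, uint64_t *val)
{
	uint64_t sum = 0;

	for (int i = 0; i < n; i++) {
		if (!(raw[i] & DEMO_DATA_VALID))
			return -1;
		sum += raw[i] & DEMO_DATA_BITS;
	}
	*val = sum;
	return 0;
}
```

One design difference to note: this sketch accumulates locally and only
writes *val on success, whereas intel_aet_read_event() adds directly
into *val and so relies on the caller passing a zeroed value.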

Enable the events, marking them as readable from any CPU and providing
a mon_evt::arch_priv pointer to the struct pmt_event for each event.

At run time, when a user reads an event file, the file system code
provides the enum resctrl_event_id for the event and the arch_priv
pointer that was supplied when the event was enabled.
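
intel_aet_read_event() recovers the owning event_group from arch_priv by
stepping back to evts[0] and applying container_of(). A user-space
sketch of the same idea follows; demo_container_of(), struct demo_event,
struct demo_group and demo_group_of() are hypothetical, and a fixed-size
array replaces the kernel's flexible array to stay in standard C. The
trick assumes each entry's idx equals its position in evts[], which
holds for the Clearwater Forest tables in this series but is not
enforced by the code.

```c
#include <stddef.h>

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_event {
	int id;
	int idx;	/* assumed equal to this entry's position in evts[] */
};

struct demo_group {
	unsigned int guid;
	int num_events;
	struct demo_event evts[4];
};

/* Recover the group from a pointer to one of its evts[] entries. */
struct demo_group *demo_group_of(struct demo_event *pevt)
{
	struct demo_event *first = pevt - pevt->idx;	/* &evts[0] */

	return demo_container_of(first, struct demo_group, evts);
}
```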

Resctrl now uses readq(), so it depends on X86_64. Update Kconfig.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h  |  2 ++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 46 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c   |  3 ++
 arch/x86/Kconfig                        |  2 +-
 4 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e93b15bf6aab..e8d2a754bc0c 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -171,5 +171,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
+int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
+			 void *arch_priv, u64 *val);
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index f9b2959693a0..10fd8b04105e 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -14,6 +14,7 @@
 #include <linux/cleanup.h>
 #include <linux/cpu.h>
 #include <linux/intel_vsec.h>
+#include <linux/io.h>
 #include <linux/resctrl.h>
 #include <linux/slab.h>
 
@@ -206,6 +207,13 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 	}
 	e->pkginfo = no_free_ptr(pkginfo);
 
+	for (int i = 0; i < e->num_events; i++) {
+		enum resctrl_event_id eventid;
+
+		eventid = e->evts[i].id;
+		resctrl_enable_mon_event(eventid, true, e->evts[i].bin_bits, &e->evts[i]);
+	}
+
 	return 0;
 }
 
@@ -268,3 +276,41 @@ void __exit intel_aet_exit(void)
 		free_mmio_info((*peg)->pkginfo);
 	}
 }
+
+#define DATA_VALID	BIT_ULL(63)
+#define DATA_BITS	GENMASK_ULL(62, 0)
+
+/*
+ * Read counter for an event on a domain (summing all aggregators
+ * on the domain).
+ */
+int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id eventid,
+			 void *arch_priv, u64 *val)
+{
+	struct pmt_event *pevt = arch_priv;
+	struct mmio_info *mmi;
+	struct event_group *e;
+	u64 evtcount;
+	void *pevt0;
+	int idx;
+
+	pevt0 = pevt - pevt->idx;
+	e = container_of(pevt0, struct event_group, evts);
+	idx = rmid * e->num_events;
+	idx += pevt->idx;
+	mmi = e->pkginfo[domid];
+
+	if (idx * sizeof(u64) + sizeof(u64) > e->mmio_size) {
+		pr_warn_once("MMIO index %d out of range\n", idx);
+		return -EIO;
+	}
+
+	for (int i = 0; i < mmi->num_regions; i++) {
+		evtcount = readq(mmi->addrs[i] + idx * sizeof(u64));
+		if (!(evtcount & DATA_VALID))
+			return -EINVAL;
+		*val += evtcount & DATA_BITS;
+	}
+
+	return 0;
+}
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 185b203f6321..51d7d99336c6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -232,6 +232,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
 
 	resctrl_arch_rmid_read_context_check();
 
+	if (r->rid == RDT_RESOURCE_PERF_PKG)
+		return intel_aet_read_event(hdr->id, rmid, eventid, arch_priv, val);
+
 	if (r->rid != RDT_RESOURCE_L3)
 		return -EINVAL;
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8eb68d2230be..a6b6ecbd3877 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -507,7 +507,7 @@ config X86_MPPARSE
 
 config X86_CPU_RESCTRL
 	bool "x86 CPU resource control support"
-	depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
+	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
 	depends on MISC_FILESYSTEMS
 	select ARCH_HAS_CPU_RESCTRL
 	select RESCTRL_FS
-- 
2.49.0


* [PATCH v6 23/30] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (21 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 22/30] x86/resctrl: Read core telemetry events Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:13   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 24/30] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
                   ` (9 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The L3 resource has several requirements for domains: structures that
hold the 64-bit counter values, and elements that track the overflow
and limbo threads.

None of these are needed for the PERF_PKG resource. The hardware counters
are wide enough that they do not wrap around for decades.
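
A back-of-the-envelope check of the wraparound claim, assuming the 63
data bits of DATA_BITS from the preceding patch and an illustrative
increment rate. The real rate depends on each event's units and is not
stated here; years_to_wrap() is a hypothetical helper.

```c
/*
 * Years until a counter with 'bits' usable bits wraps, assuming it
 * increments at a constant 'rate_per_sec'. Both inputs are illustrative.
 */
static double years_to_wrap(unsigned int bits, double rate_per_sec)
{
	const double secs_per_year = 365.25 * 24 * 3600;
	double max = 1.0;

	for (unsigned int i = 0; i < bits; i++)
		max *= 2.0;	/* 2^bits, computed in floating point */

	return max / rate_per_sec / secs_per_year;
}
```

For 63 bits at 10^9 counts per second this gives roughly 292 years; each
thousandfold increase in the rate shrinks the margin by the same factor.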

Define a new struct rdt_perf_pkg_mon_domain that consists only of the
standard rdt_domain_hdr, which tracks the domain id and CPU mask.

Change domain_add_cpu_mon(), domain_remove_cpu_mon(),
resctrl_offline_mon_domain(), and resctrl_online_mon_domain() to check
the resource type and perform only the operations needed for domains in
the PERF_PKG resource.
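
The bookkeeping left for a header-only domain is small: find the domain
for the CPU's package id or create it, set or clear the CPU's bit, and
free the domain when its last CPU goes away. A simplified,
single-threaded user-space model follows. struct domain_hdr and the
helpers are hypothetical, a 64-bit mask stands in for cpumask_t (so
CPUs 0..63 only), and the RCU list handling of the real code is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a domain that holds only a header. */
struct domain_hdr {
	int id;			/* package id */
	uint64_t cpu_mask;	/* one bit per CPU, CPUs 0..63 */
	struct domain_hdr *next;
};

/* Add 'cpu' to the domain for 'id', creating the domain on first use. */
static struct domain_hdr *domain_add_cpu(struct domain_hdr **head, int cpu, int id)
{
	struct domain_hdr *d;

	for (d = *head; d; d = d->next) {
		if (d->id == id) {
			d->cpu_mask |= UINT64_C(1) << cpu;
			return d;
		}
	}

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->id = id;
	d->cpu_mask = UINT64_C(1) << cpu;
	d->next = *head;
	*head = d;
	return d;
}

/* Remove 'cpu'; unlink and free the domain when its last CPU goes away. */
static void domain_remove_cpu(struct domain_hdr **head, int cpu, int id)
{
	for (struct domain_hdr **pp = head; *pp; pp = &(*pp)->next) {
		struct domain_hdr *d = *pp;

		if (d->id != id)
			continue;
		d->cpu_mask &= ~(UINT64_C(1) << cpu);
		if (!d->cpu_mask) {
			*pp = d->next;
			free(d);
		}
		return;
	}
}
```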

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 41 ++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f3144fe918dd..f857f92e7b8b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -547,6 +547,38 @@ static void l3_mon_domain_setup(int cpu, int id, struct rdt_resource *r, struct
 	}
 }
 
+/**
+ * struct rdt_perf_pkg_mon_domain - CPUs sharing an Intel-PMT-scoped resctrl monitor resource
+ * @hdr:	common header for different domain types
+ */
+struct rdt_perf_pkg_mon_domain {
+	struct rdt_domain_hdr   hdr;
+};
+
+static void setup_intel_aet_mon_domain(int cpu, int id, struct rdt_resource *r,
+				       struct list_head *add_pos)
+{
+	struct rdt_perf_pkg_mon_domain *d;
+	int err;
+
+	d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
+	if (!d)
+		return;
+
+	d->hdr.id = id;
+	d->hdr.type = RESCTRL_MON_DOMAIN;
+	d->hdr.rid = r->rid;
+	cpumask_set_cpu(cpu, &d->hdr.cpu_mask);
+	list_add_tail_rcu(&d->hdr.list, add_pos);
+
+	err = resctrl_online_mon_domain(r, &d->hdr);
+	if (err) {
+		list_del_rcu(&d->hdr.list);
+		synchronize_rcu();
+		kfree(d);
+	}
+}
+
 static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 {
 	int id = get_domain_id_from_scope(cpu, r->mon_scope);
@@ -574,6 +606,9 @@ static void domain_add_cpu_mon(int cpu, struct rdt_resource *r)
 	case RDT_RESOURCE_L3:
 		l3_mon_domain_setup(cpu, id, r, add_pos);
 		break;
+	case RDT_RESOURCE_PERF_PKG:
+		setup_intel_aet_mon_domain(cpu, id, r, add_pos);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 	}
@@ -670,6 +705,12 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 		synchronize_rcu();
 		l3_mon_domain_free(hw_dom);
 		break;
+	case RDT_RESOURCE_PERF_PKG:
+		resctrl_offline_mon_domain(r, hdr);
+		list_del_rcu(&hdr->list);
+		synchronize_rcu();
+		kfree(container_of(hdr, struct rdt_perf_pkg_mon_domain, hdr));
+		break;
 	default:
 		pr_warn_once("Unknown resource rid=%d\n", r->rid);
 		break;
-- 
2.49.0


* [PATCH v6 24/30] x86/resctrl: Add energy/perf choices to rdt boot option
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (22 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 23/30] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:14   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 25/30] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Hardware-backed resctrl features are enumerated by X86_FEATURE_*
flags. These may be overridden by quirks to disable features in the case
of errata.

Users can use kernel command line options either to disable a feature
or to force-enable a feature that was disabled by a quirk.

Provide similar functionality for software-defined features that do not
have an X86_FEATURE_* flag.

Unlike the options that are tied to X86_FEATURE_* flags, these must be
queried by name. Add rdt_is_software_feature_enabled() to check whether
a quirk or the kernel command line has disabled a feature. Just as for
the hardware feature options, a command line enable overrides a quirk
disable.
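
A user-space model of that option handling, assuming (as the existing
rdt= parsing does) that a leading '!' records force_off and a bare name
records force_on, and that a quirk disables a feature by setting the
same force_off flag. struct sw_option, parse_rdt_arg() and
sw_feature_enabled() are illustrative names; the lookup mirrors the
precedence in rdt_is_software_feature_enabled(), where force_on wins.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified model of struct rdt_options: a name and the two flags. */
struct sw_option {
	const char *name;
	bool force_on;
	bool force_off;
};

static struct sw_option opts[] = {
	{ .name = "energy" },
	{ .name = "perf" },
};
#define NUM_OPTS (sizeof(opts) / sizeof(opts[0]))

/* Match the first 'len' characters of 'name' exactly against an option. */
static struct sw_option *find_opt(const char *name, size_t len)
{
	for (size_t i = 0; i < NUM_OPTS; i++)
		if (strlen(opts[i].name) == len && !strncmp(opts[i].name, name, len))
			return &opts[i];
	return NULL;
}

/* Parse e.g. "energy,!perf": '!' sets force_off, otherwise force_on. */
static void parse_rdt_arg(const char *arg)
{
	while (*arg) {
		size_t len = strcspn(arg, ",");
		bool off = (*arg == '!');
		struct sw_option *o = find_opt(arg + off, len - off);

		if (o) {
			if (off)
				o->force_off = true;
			else
				o->force_on = true;
		}
		arg += len;
		if (*arg == ',')
			arg++;
	}
}

/* Unknown names default to enabled; force_on overrides force_off. */
static bool sw_feature_enabled(const char *name)
{
	struct sw_option *o = find_opt(name, strlen(name));

	if (!o)
		return true;
	return o->force_on || !o->force_off;
}
```

For example, a quirk calling parse_rdt_arg("!energy") disables
"energy", and a later rdt=energy on the command line turns it back on,
regardless of the order in which the two flags were set.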

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  2 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  2 ++
 arch/x86/kernel/cpu/resctrl/core.c            | 30 +++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/intel_aet.c       |  7 +++++
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..4c12159f3ea0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6066,7 +6066,7 @@
 	rdt=		[HW,X86,RDT]
 			Turn on/off individual RDT features. List is:
 			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
-			mba, smba, bmec.
+			mba, smba, bmec, energy, perf.
 			E.g. to turn on cmt and turn off mba use:
 				rdt=cmt,!mba
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e8d2a754bc0c..ee1c6204722e 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -169,6 +169,8 @@ void __init intel_rdt_mbm_apply_quirk(void);
 
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 
+bool rdt_is_software_feature_enabled(char *option);
+
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
 int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f857f92e7b8b..f9f3bc58290e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -791,6 +791,8 @@ enum {
 	RDT_FLAG_MBA,
 	RDT_FLAG_SMBA,
 	RDT_FLAG_BMEC,
+	RDT_FLAG_ENERGY,
+	RDT_FLAG_PERF,
 };
 
 #define RDT_OPT(idx, n, f)	\
@@ -816,6 +818,8 @@ static struct rdt_options rdt_options[]  __ro_after_init = {
 	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
 	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
 	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
+	RDT_OPT(RDT_FLAG_ENERGY,    "energy",	0),
+	RDT_OPT(RDT_FLAG_PERF,	    "perf",	0),
 };
 #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
 
@@ -865,6 +869,32 @@ bool rdt_cpu_has(int flag)
 	return ret;
 }
 
+/*
+ * Software options that are not based on X86_FEATURE_* bits.
+ * There is no "h/w does not support this at all" case.
+ * Assume that the caller has already determined that s/w
+ * support is present and just needs to check if the option has been
+ * disabled by a quirk that has not been overridden by a command
+ * line option.
+ */
+bool rdt_is_software_feature_enabled(char *name)
+{
+	struct rdt_options *o;
+	bool ret = true;
+
+	for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
+		if (!strcmp(name, o->name)) {
+			if (o->force_off)
+				ret = false;
+			if (o->force_on)
+				ret = true;
+			break;
+		}
+	}
+
+	return ret;
+}
+
 bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
 {
 	if (!rdt_cpu_has(X86_FEATURE_BMEC))
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 10fd8b04105e..1d2511984156 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -49,6 +49,7 @@ struct pmt_event {
 
 /**
  * struct event_group - All information about a group of telemetry events.
+ * @name:		Name for this group (used by boot rdt= option)
  * @pfg:		Points to the aggregated telemetry space information
  *			within the OOBMSM driver that contains data for all
  *			telemetry regions.
@@ -60,6 +61,7 @@ struct pmt_event {
  */
 struct event_group {
 	/* Data fields for additional structures to manage this group. */
+	char				*name;
 	struct pmt_feature_group	*pfg;
 	struct mmio_info		**pkginfo;
 
@@ -78,6 +80,7 @@ struct event_group {
  * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
  */
 static struct event_group energy_0x26696143 = {
+	.name		= "energy",
 	.guid		= 0x26696143,
 	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
 	.num_events	= 2,
@@ -92,6 +95,7 @@ static struct event_group energy_0x26696143 = {
  * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
  */
 static struct event_group perf_0x26557651 = {
+	.name		= "perf",
 	.guid		= 0x26557651,
 	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
 	.num_events	= 7,
@@ -157,6 +161,9 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 	struct mmio_info *mmi;
 	int num_pkgs;
 
+	if (!rdt_is_software_feature_enabled(e->name))
+		return -EINVAL;
+
 	num_pkgs = topology_max_packages();
 
 	/* Get per-package counts of telemetry_regions for this event group */
-- 
2.49.0


* [PATCH v6 25/30] x86/resctrl: Handle number of RMIDs supported by telemetry resources
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (23 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 24/30] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:17   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 26/30] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
                   ` (7 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

There are now three meanings for "number of RMIDs":

1) The number for legacy features enumerated by CPUID leaf 0xF. This
is the maximum number of distinct values that can be loaded into the
IA32_PQR_ASSOC MSR. On systems with Sub-NUMA Cluster (SNC) mode
enabled, the CPUID-enumerated value is scaled down by the number of
SNC nodes per L3 cache.

2) The number of registers in MMIO space for each event. This
is enumerated in the XML files and is the value initialized into
event_group::num_rmids. This will be overwritten with a lower
value if hardware does not support all these registers at the
same time (see next case).

3) The number of "h/w counters" (this isn't a strictly accurate
description of how things work, but serves as a useful analogy that
does describe the limitations) feeding those MMIO registers. This
is enumerated in telemetry_region::num_rmids returned by
intel_pmt_get_regions_by_feature().
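
A worked sketch of how these counts are derived, using hypothetical
helper names and illustrative numbers. The legacy count is the
CPUID-enumerated maximum RMID plus one, divided by the SNC nodes per L3
cache (as in rdt_get_mon_l3_config()). An event group's count starts at
the XML value and is lowered to the smallest "h/w counter" count
reported for any of its telemetry regions.

```c
#include <stdint.h>

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Case 1: RMIDs usable in IA32_PQR_ASSOC, scaled down for SNC. */
static uint32_t legacy_num_rmids(uint32_t cpuid_max_rmid, uint32_t snc_nodes_per_l3)
{
	return (cpuid_max_rmid + 1) / snc_nodes_per_l3;
}

/* Cases 2 and 3: XML register count, lowered per telemetry region. */
static uint32_t group_num_rmids(uint32_t xml_num_rmids,
				const uint32_t *region_rmids, int nregions)
{
	uint32_t n = xml_num_rmids;

	for (int i = 0; i < nregions; i++)
		n = min_u32(n, region_rmids[i]);
	return n;
}
```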

Event groups with too few "h/w counters" to track all RMIDs are
difficult to use, since the system may reassign "h/w counters" at
any time. This means that users cannot reliably collect two
consecutive event counts to compute the rate at which events occur.

Ignore such under-resourced event groups unless the user explicitly
requests to enable them using the "rdt=" Linux boot argument.

Scan all enabled event groups and set the RDT_RESOURCE_PERF_PKG
resource's num_rmid to the smallest of their num_rmids values, so
that all resctrl groups have equal monitoring capabilities.
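
A simplified model of that selection logic, with hypothetical names.
struct group_info summarizes one event group after enumeration (the
patch applies the under-resourced check per telemetry region, which the
sketch collapses into a single value), and system_rmids plays the role
of rdt_num_system_rmids.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one event group after enumeration. */
struct group_info {
	bool present;		/* group was found during enumeration */
	bool force_enabled;	/* user requested it with rdt= */
	uint32_t num_rmids;	/* already lowered to the h/w counter count */
};

/*
 * Return the num_rmid value for the PERF_PKG resource: the minimum over
 * all usable groups, or 0 if none is usable. A group with fewer RMIDs
 * than system_rmids is skipped unless it was force-enabled.
 */
static uint32_t perf_pkg_num_rmid(const struct group_info *g, int ngroups,
				  uint32_t system_rmids)
{
	uint32_t num = 0;

	for (int i = 0; i < ngroups; i++) {
		if (!g[i].present)
			continue;
		if (g[i].num_rmids < system_rmids && !g[i].force_enabled)
			continue;
		if (num == 0 || g[i].num_rmids < num)
			num = g[i].num_rmids;
	}
	return num;
}
```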

N.B. Changed the type of rdt_resource::num_rmid to u32 to match.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h                 |  2 +-
 arch/x86/kernel/cpu/resctrl/internal.h  |  4 ++++
 arch/x86/kernel/cpu/resctrl/core.c      | 20 +++++++++++++++++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 29 +++++++++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/monitor.c   |  2 ++
 5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b9f2690bee1e..35ae24822493 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -288,7 +288,7 @@ struct rdt_resource {
 	int			rid;
 	bool			alloc_capable;
 	bool			mon_capable;
-	int			num_rmid;
+	u32			num_rmid;
 	enum resctrl_scope	ctrl_scope;
 	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index ee1c6204722e..11f25c225837 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -18,6 +18,8 @@
 
 #define RMID_VAL_UNAVAIL		BIT_ULL(62)
 
+extern int rdt_num_system_rmids;
+
 /*
  * With the above fields in use 62 bits remain in MSR_IA32_QM_CTR for
  * data to be returned. The counter width is discovered from the hardware
@@ -171,6 +173,8 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 
 bool rdt_is_software_feature_enabled(char *option);
 
+bool rdt_is_software_feature_force_enabled(char *name);
+
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
 int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index f9f3bc58290e..7fe4e8111773 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -895,6 +895,26 @@ bool rdt_is_software_feature_enabled(char *name)
 	return ret;
 }
 
+/*
+ * Similar to rdt_is_software_feature_enabled() but the test is whether
+ * the user has force enabled the feature on the kernel command line.
+ */
+bool rdt_is_software_feature_force_enabled(char *name)
+{
+	struct rdt_options *o;
+	bool ret = false;
+
+	for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
+		if (!strcmp(name, o->name)) {
+			if (o->force_on)
+				ret = true;
+			break;
+		}
+	}
+
+	return ret;
+}
+
 bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
 {
 	if (!rdt_cpu_has(X86_FEATURE_BMEC))
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 1d2511984156..1d9edd409883 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -15,6 +15,7 @@
 #include <linux/cpu.h>
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
+#include <linux/minmax.h>
 #include <linux/resctrl.h>
 #include <linux/slab.h>
 
@@ -55,6 +56,9 @@ struct pmt_event {
  *			telemetry regions.
  * @pkginfo:		Per-package MMIO addresses of telemetry regions belonging to this group.
  * @guid:		Unique number per XML description file.
+ * @num_rmids:		Number of RMIDS supported by this group. Adjusted downwards
+ *			if enumeration from intel_pmt_get_regions_by_feature() indicates
+ *			fewer RMIDs can be tracked simultaneously.
  * @mmio_size:		Number of bytes of MMIO registers for this group.
  * @num_events:		Number of events in this group.
  * @evts:		Array of event descriptors.
@@ -67,6 +71,7 @@ struct event_group {
 
 	/* Remaining fields initialized from XML file. */
 	u32				guid;
+	u32				num_rmids;
 	size_t				mmio_size;
 	int				num_events;
 	struct pmt_event		evts[] __counted_by(num_events);
@@ -82,6 +87,7 @@ struct event_group {
 static struct event_group energy_0x26696143 = {
 	.name		= "energy",
 	.guid		= 0x26696143,
+	.num_rmids	= 576,
 	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
 	.num_events	= 2,
 	.evts				= {
@@ -97,6 +103,7 @@ static struct event_group energy_0x26696143 = {
 static struct event_group perf_0x26557651 = {
 	.name		= "perf",
 	.guid		= 0x26557651,
+	.num_rmids	= 576,
 	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
 	.num_events	= 7,
 	.evts				= {
@@ -177,6 +184,17 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
 			return -EINVAL;
 		}
 
+		/*
+		 * Ignore event group with fewer RMIDs than can be loaded
+		 * into the IA32_PQR_ASSOC MSR unless the user used
+		 * the rdt= boot option to specifically ask for it to
+		 * be enabled.
+		 */
+		if (tr->num_rmids < rdt_num_system_rmids &&
+		    !rdt_is_software_feature_force_enabled(e->name))
+			return -EINVAL;
+		e->num_rmids = min(e->num_rmids, tr->num_rmids);
+
 		if (!pkgcounts) {
 			pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
 			if (!pkgcounts)
@@ -263,11 +281,22 @@ static bool get_pmt_feature(enum pmt_feature_id feature)
  */
 bool intel_aet_get_events(void)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+	struct event_group **eg;
 	bool ret1, ret2;
 
 	ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM);
 	ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM);
 
+	for (eg = &known_event_groups[0]; eg < &known_event_groups[NUM_KNOWN_GROUPS]; eg++) {
+		if (!(*eg)->pfg)
+			continue;
+		if (r->num_rmid)
+			r->num_rmid = min(r->num_rmid, (*eg)->num_rmids);
+		else
+			r->num_rmid = (*eg)->num_rmids;
+	}
+
 	return ret1 || ret2;
 }
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 51d7d99336c6..b36634f1439b 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -33,6 +33,7 @@ bool rdt_mon_capable;
 
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
+int rdt_num_system_rmids;
 static int snc_nodes_per_l3_cache = 1;
 
 /*
@@ -358,6 +359,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
 	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
 	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
+	rdt_num_system_rmids = r->num_rmid;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
-- 
2.49.0


* [PATCH v6 26/30] x86,fs/resctrl: Move RMID initialization to first mount
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (24 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 25/30] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:18   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 27/30] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
                   ` (6 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The resctrl file system code assumed that the only monitor events were
tied to the RDT_RESOURCE_L3 resource, and that the number of supported
RMIDs was enumerated during early initialization.

RDT_RESOURCE_PERF_PKG breaks both of those assumptions.

Delay the final enumeration of the number of RMIDs and subsequent
allocation of structures until first mount of the resctrl file system.
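
A user-space sketch of the "initialize on first successful mount"
pattern used in rdt_get_tree(): the flag is set only after
initialization succeeds, so a failed allocation is retried on the next
mount. rmid_table, mon_data_init() and on_mount() are hypothetical
stand-ins for rmid_ptrs[], resctrl_mon_dom_data_init() and
rdt_get_tree(). The patch drops the locking from the init function, so
the caller is presumably expected to serialize mounts; the sketch is
single-threaded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for rmid_ptrs[]: sized from the final RMID count. */
static int *rmid_table;
static unsigned int rmid_table_len;

static int mon_data_init(unsigned int num_rmids)
{
	rmid_table = calloc(num_rmids, sizeof(*rmid_table));
	if (!rmid_table)
		return -ENOMEM;
	rmid_table_len = num_rmids;
	return 0;
}

/* Called on every mount; only the first successful call allocates. */
static int on_mount(unsigned int num_rmids)
{
	static bool once;
	int ret;

	if (!once) {
		ret = mon_data_init(num_rmids);
		if (ret)
			return ret;	/* 'once' stays false: next mount retries */
		once = true;
	}
	return 0;
}
```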

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 fs/resctrl/internal.h              |  4 ++-
 arch/x86/kernel/cpu/resctrl/core.c |  8 +++--
 fs/resctrl/monitor.c               | 58 +++++++++++++-----------------
 fs/resctrl/rdtgroup.c              | 12 +++++--
 4 files changed, 42 insertions(+), 40 deletions(-)

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 2126006075f3..4704ea7228ca 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -354,6 +354,8 @@ int alloc_rmid(u32 closid);
 
 void free_rmid(u32 closid, u32 rmid);
 
+int resctrl_mon_dom_data_init(void);
+
 void resctrl_mon_resource_exit(void);
 
 void mon_event_count(void *info);
@@ -364,7 +366,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain_hdr *hdr, struct rdtgroup *rdtgrp,
 		    cpumask_t *cpumask, struct mon_evt *evt, int first);
 
-int resctrl_mon_resource_init(void);
+void resctrl_mon_l3_resource_init(void);
 
 void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom,
 				unsigned long delay_ms,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7fe4e8111773..50de0c29704f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -112,10 +112,14 @@ struct rdt_hw_resource rdt_resources_all[RDT_NUM_RESOURCES] = {
 
 u32 resctrl_arch_system_num_rmid_idx(void)
 {
-	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	u32 num_rmids = U32_MAX;
+	struct rdt_resource *r;
+
+	for_each_mon_capable_rdt_resource(r)
+		num_rmids = min(num_rmids, r->num_rmid);
 
 	/* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-	return r->num_rmid;
+	return num_rmids;
 }
 
 struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index c4b092aec9f8..e877f5b97d18 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -796,15 +796,27 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
 		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
 
-static int dom_data_init(struct rdt_resource *r)
+/**
+ * resctrl_mon_dom_data_init() - Initialise global monitoring structures.
+ *
+ * Allocate and initialise global monitor resources that do not belong to a
+ * specific domain, i.e. the rmid_ptrs[] used for the limbo and free lists.
+ * Called once, on first mount of the filesystem, after all struct rdt_resource
+ * instances have been configured and the final number of RMIDs is known.
+ * Resctrl's cpuhp callbacks may be called before this point to bring a domain
+ * online.
+ *
+ * Returns 0 for success, or -ENOMEM.
+ */
+int resctrl_mon_dom_data_init(void)
 {
+	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	u32 num_closid = resctrl_arch_get_num_closid(r);
 	struct rmid_entry *entry = NULL;
-	int err = 0, i;
 	u32 idx;
+	int i;
 
-	mutex_lock(&rdtgroup_mutex);
 	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
 		u32 *tmp;
 
@@ -815,10 +827,8 @@ static int dom_data_init(struct rdt_resource *r)
 		 * use.
 		 */
 		tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
-		if (!tmp) {
-			err = -ENOMEM;
-			goto out_unlock;
-		}
+		if (!tmp)
+			return -ENOMEM;
 
 		closid_num_dirty_rmid = tmp;
 	}
@@ -829,8 +839,7 @@ static int dom_data_init(struct rdt_resource *r)
 			kfree(closid_num_dirty_rmid);
 			closid_num_dirty_rmid = NULL;
 		}
-		err = -ENOMEM;
-		goto out_unlock;
+		return -ENOMEM;
 	}
 
 	for (i = 0; i < idx_limit; i++) {
@@ -851,13 +860,10 @@ static int dom_data_init(struct rdt_resource *r)
 	entry = __rmid_entry(idx);
 	list_del(&entry->list);
 
-out_unlock:
-	mutex_unlock(&rdtgroup_mutex);
-
-	return err;
+	return 0;
 }
 
-static void dom_data_exit(struct rdt_resource *r)
+static void resctrl_mon_dom_data_exit(struct rdt_resource *r)
 {
 	mutex_lock(&rdtgroup_mutex);
 
@@ -932,28 +938,14 @@ bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
 }
 
 /**
- * resctrl_mon_resource_init() - Initialise global monitoring structures.
- *
- * Allocate and initialise global monitor resources that do not belong to a
- * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
- * Called once during boot after the struct rdt_resource's have been configured
- * but before the filesystem is mounted.
- * Resctrl's cpuhp callbacks may be called before this point to bring a domain
- * online.
- *
- * Returns 0 for success, or -ENOMEM.
+ * resctrl_mon_l3_resource_init() - Initialise L3 configuration options.
  */
-int resctrl_mon_resource_init(void)
+void resctrl_mon_l3_resource_init(void)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	int ret;
 
 	if (!r->mon_capable)
-		return 0;
-
-	ret = dom_data_init(r);
-	if (ret)
-		return ret;
+		return;
 
 	if (resctrl_arch_is_evt_configurable(QOS_L3_MBM_TOTAL_EVENT_ID)) {
 		mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].configurable = true;
@@ -970,13 +962,11 @@ int resctrl_mon_resource_init(void)
 		mba_mbps_default_event = QOS_L3_MBM_LOCAL_EVENT_ID;
 	else if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
 		mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
-
-	return 0;
 }
 
 void resctrl_mon_resource_exit(void)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
 
-	dom_data_exit(r);
+	resctrl_mon_dom_data_exit(r);
 }
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index d9bb01edd582..3d87e6c4c600 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2585,6 +2585,7 @@ static int rdt_get_tree(struct fs_context *fc)
 	unsigned long flags = RFTYPE_CTRL_BASE;
 	struct rdt_l3_mon_domain *dom;
 	struct rdt_resource *r;
+	static bool once;
 	int ret;
 
 	resctrl_arch_pre_mount();
@@ -2599,6 +2600,13 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out;
 	}
 
+	if (resctrl_arch_mon_capable() && !once) {
+		ret = resctrl_mon_dom_data_init();
+		if (ret)
+			goto out;
+		once = true;
+	}
+
 	ret = rdtgroup_setup_root(ctx);
 	if (ret)
 		goto out;
@@ -4298,9 +4306,7 @@ int resctrl_init(void)
 
 	thread_throttle_mode_init();
 
-	ret = resctrl_mon_resource_init();
-	if (ret)
-		return ret;
+	resctrl_mon_l3_resource_init();
 
 	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
 	if (ret) {
-- 
2.49.0


* [PATCH v6 27/30] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (25 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 26/30] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-06-26 16:49 ` [PATCH v6 28/30] fs/resctrl: Provide interface to create a debugfs info directory Tony Luck
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

The RDT_RESOURCE_PERF_PKG resource is not marked as "mon_capable" during
early resctrl initialization. This means that the domain lists for the
resource are not built when the CPU hotplug notifiers are registered.

Mark the resource as mon_capable and call domain_add_cpu_mon() for
each online CPU to build the domain lists during the first call to the
resctrl_arch_pre_mount() hook.
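
A user-space sketch of the run-once pattern in
resctrl_arch_pre_mount(), with C11 atomic_compare_exchange_strong()
standing in for the kernel's atomic_try_cmpxchg(). events_found() and
the domains_built counter are hypothetical stand-ins for
intel_aet_get_events() and domain_add_cpu_mon(); the sketch omits
cpus_read_lock() and domain_list_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int only_once;	/* zero-initialized */
static int domains_built;	/* counts simulated domain_add_cpu_mon() calls */

/* Hypothetical stand-in for intel_aet_get_events(). */
static bool events_found(void)
{
	return true;
}

static void pre_mount(int num_online_cpus)
{
	int old = 0;

	/* Only the caller that changes 0 -> 1 proceeds; later calls return. */
	if (!atomic_compare_exchange_strong(&only_once, &old, 1))
		return;

	if (!events_found())
		return;

	for (int cpu = 0; cpu < num_online_cpus; cpu++)
		domains_built++;	/* domain_add_cpu_mon(cpu, r) in the patch */
}
```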

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/core.c      | 14 +++++++++++++-
 arch/x86/kernel/cpu/resctrl/intel_aet.c |  3 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 50de0c29704f..3ec8fbd2f778 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -774,14 +774,26 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
 
 void resctrl_arch_pre_mount(void)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
 	static atomic_t only_once = ATOMIC_INIT(0);
-	int old = 0;
+	int cpu, old = 0;
 
 	if (!atomic_try_cmpxchg(&only_once, &old, 1))
 		return;
 
 	if (!intel_aet_get_events())
 		return;
+
+	/*
+	 * Late discovery of telemetry events means the domains for the
+	 * resource were not built. Do that now.
+	 */
+	cpus_read_lock();
+	mutex_lock(&domain_list_lock);
+	for_each_online_cpu(cpu)
+		domain_add_cpu_mon(cpu, r);
+	mutex_unlock(&domain_list_lock);
+	cpus_read_unlock();
 }
 
 enum {
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 1d9edd409883..090e7b35c3e2 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -295,6 +295,9 @@ bool intel_aet_get_events(void)
 			r->num_rmid = min(r->num_rmid, (*eg)->num_rmids);
 		else
 			r->num_rmid = (*eg)->num_rmids;
+		pr_info("%s %s monitoring detected\n", r->name, (*eg)->name);
+
+		r->mon_capable = true;
 	}
 
 	return ret1 || ret2;
-- 
2.49.0


* [PATCH v6 28/30] fs/resctrl: Provide interface to create a debugfs info directory
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (26 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 27/30] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:19   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 29/30] x86/resctrl: Add debug info/PERF_PKG_MON/status files Tony Luck
                   ` (4 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Architecture code is limited to the file interfaces that the file system
provides for each resource, so there is no way to add architecture
specific debug interfaces.

Add resctrl_debugfs_mon_info_arch_mkdir(), which creates a directory in
the debugfs file system for a resource. Naming follows the layout of the
main resctrl hierarchy, with a final {arch} component named after the
machine architecture (utsname()->machine, e.g. "x86_64"):

	/sys/kernel/debug/resctrl/info/{resource}_MON/{arch}

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h |  6 ++++++
 fs/resctrl/rdtgroup.c   | 24 ++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 35ae24822493..a8ffd9f61c46 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -569,6 +569,12 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
 extern unsigned int resctrl_rmid_realloc_threshold;
 extern unsigned int resctrl_rmid_realloc_limit;
 
+/**
+ * resctrl_debugfs_mon_info_arch_mkdir() - Create a debugfs info directory.
+ * @r:	Resource (must be mon_capable).
+ */
+struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r);
+
 int resctrl_init(void);
 void resctrl_exit(void);
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 3d87e6c4c600..511362a67532 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -24,6 +24,7 @@
 #include <linux/sched/task.h>
 #include <linux/slab.h>
 #include <linux/user_namespace.h>
+#include <linux/utsname.h>
 
 #include <uapi/linux/magic.h>
 
@@ -4350,6 +4351,29 @@ int resctrl_init(void)
 	return ret;
 }
 
+/*
+ * On request, create /sys/kernel/debug/resctrl/info/{r->name}_MON/{arch}
+ * (where {arch} is utsname()->machine) for architecture code to use.
+ */
+struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r)
+{
+	static struct dentry *debugfs_resctrl_info;
+	struct dentry *moninfodir;
+	char name[32];
+
+	if (!r->mon_capable)
+		return NULL;
+
+	if (!debugfs_resctrl_info)
+		debugfs_resctrl_info = debugfs_create_dir("info", debugfs_resctrl);
+
+	snprintf(name, sizeof(name), "%s_MON", r->name);
+
+	moninfodir = debugfs_create_dir(name, debugfs_resctrl_info);
+
+	return debugfs_create_dir(utsname()->machine, moninfodir);
+}
+
 static bool resctrl_online_domains_exist(void)
 {
 	struct rdt_resource *r;
-- 
2.49.0

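Putting the pieces of resctrl_debugfs_mon_info_arch_mkdir() together, the directory handed back to architecture code is "info/{resource}_MON/" plus a final component taken from utsname()->machine. The user-space sketch below builds that path with snprintf() and reports truncation instead of silently cutting the name. That is the same concern that applies to sizing the kernel's `char name[32]` buffer. `mon_info_arch_path()` is a hypothetical helper. Its `machine` argument stands in for utsname()->machine. In user space, that value comes from the `machine` field filled in by uname(2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the debugfs directory path for a monitor
 * resource. Returns false if a component contains '/' or if the
 * result does not fit in buf.
 */
static bool mon_info_arch_path(char *buf, size_t len, const char *res,
			       const char *machine)
{
	int n;

	assert(buf && res && machine);
	if (strchr(res, '/') || strchr(machine, '/'))
		return false;

	n = snprintf(buf, len, "/sys/kernel/debug/resctrl/info/%s_MON/%s",
		     res, machine);
	return n >= 0 && (size_t)n < len;
}
```

With `res` set to "PERF_PKG" and `machine` set to "x86_64", this yields "/sys/kernel/debug/resctrl/info/PERF_PKG_MON/x86_64".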


* [PATCH v6 29/30] x86/resctrl: Add debug info/PERF_PKG_MON/status files
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (27 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 28/30] fs/resctrl: Provide interface to create a debugfs info directory Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:22   ` Reinette Chatre
  2025-06-26 16:49 ` [PATCH v6 30/30] x86,fs/resctrl: Update Documentation for package events Tony Luck
                   ` (3 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Each telemetry aggregator provides three status registers at the top end
of its MMIO space, after all of the per-RMID, per-event counters:

  agg_data_loss_count: This counts the number of times that this aggregator
  failed to accumulate a counter value supplied by a CPU core.

  agg_data_loss_timestamp: This is a "timestamp" from a free running
  25MHz uncore timer indicating when the most recent data loss occurred.

  last_update_timestamp: Another 25MHz timestamp indicating when the
  most recent counter update was successfully applied.

Create files in /sys/kernel/debug/resctrl/info/PERF_PKG_MON/arch/
to display the value of each of these status registers for each aggregator
in each enabled event group.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 56 +++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 090e7b35c3e2..422e3e126255 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -13,6 +13,7 @@
 
 #include <linux/cleanup.h>
 #include <linux/cpu.h>
+#include <linux/debugfs.h>
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
 #include <linux/minmax.h>
@@ -275,6 +276,58 @@ static bool get_pmt_feature(enum pmt_feature_id feature)
 	return false;
 }
 
+static ssize_t status_read(struct file *f, char __user *buf, size_t count, loff_t *off)
+{
+	void __iomem *info = (void __iomem *)f->f_inode->i_private;
+	char status[32];
+	int len;
+
+	len = sprintf(status, "%llu\n", readq(info));
+
+	return simple_read_from_buffer(buf, count, off, status, len);
+}
+
+static const struct file_operations status_fops = {
+	.read = status_read
+};
+
+static void make_status_files(struct dentry *dir, struct event_group *e, int pkg, int instance)
+{
+	void *info = (void __force *)e->pkginfo[pkg]->addrs[instance] + e->mmio_size;
+	char name[64];
+
+	sprintf(name, "%s_pkg%d_agg%d_data_loss_count", e->name, pkg, instance);
+	debugfs_create_file(name, 0400, dir, info - 24, &status_fops);
+
+	sprintf(name, "%s_pkg%d_agg%d_data_loss_timestamp", e->name, pkg, instance);
+	debugfs_create_file(name, 0400, dir, info - 16, &status_fops);
+
+	sprintf(name, "%s_pkg%d_agg%d_last_update_timestamp", e->name, pkg, instance);
+	debugfs_create_file(name, 0400, dir, info - 8, &status_fops);
+}
+
+static void create_debug_event_status_files(struct dentry *dir, struct event_group *e)
+{
+	int num_pkgs = topology_max_packages();
+
+	for (int i = 0; i < num_pkgs; i++)
+		for (int j = 0; j < e->pkginfo[i]->num_regions; j++)
+			make_status_files(dir, e, i, j);
+}
+
+static void create_debugfs_status_file(struct rdt_resource *r)
+{
+	struct event_group **eg;
+	struct dentry *infodir;
+
+	infodir = resctrl_debugfs_mon_info_arch_mkdir(r);
+	for (eg = &known_event_groups[0]; eg < &known_event_groups[NUM_KNOWN_GROUPS]; eg++) {
+		if (!(*eg)->pfg)
+			continue;
+		create_debug_event_status_files(infodir, *eg);
+	}
+}
+
 /*
  * Ask OOBMSM discovery driver for all the RMID based telemetry groups
  * that it supports.
@@ -300,6 +353,9 @@ bool intel_aet_get_events(void)
 		r->mon_capable = true;
 	}
 
+	if (ret1 || ret2)
+		create_debugfs_status_file(r);
+
 	return ret1 || ret2;
 }
 
-- 
2.49.0

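The three status registers occupy the last 24 bytes of each aggregator's MMIO region, which is why make_status_files() passes `info - 24`, `info - 16` and `info - 8`. The sketch below decodes those slots from an ordinary byte buffer. It also converts a difference between two 25 MHz timestamps into microseconds, at 25 ticks per microsecond. This is a user-space illustration only. `struct agg_status`, `read_agg_status()` and `ticks_to_us()` are hypothetical, and `memcpy()` stands in for readq(). It assumes both timestamps come from the same free-running counter with no wrap between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNCORE_TIMER_HZ	25000000ULL	/* 25 MHz, per the commit message */

/* Hypothetical decoded copy of one aggregator's status registers. */
struct agg_status {
	uint64_t data_loss_count;	/* at region_size - 24 */
	uint64_t data_loss_timestamp;	/* at region_size - 16 */
	uint64_t last_update_timestamp;	/* at region_size - 8 */
};

/* Copy the three trailing 64-bit registers out of a region of 'size' bytes. */
static struct agg_status read_agg_status(const unsigned char *region, size_t size)
{
	struct agg_status s;

	assert(size >= 3 * sizeof(uint64_t));
	memcpy(&s.data_loss_count, region + size - 24, sizeof(uint64_t));
	memcpy(&s.data_loss_timestamp, region + size - 16, sizeof(uint64_t));
	memcpy(&s.last_update_timestamp, region + size - 8, sizeof(uint64_t));
	return s;
}

/* Convert a delta between two 25 MHz timestamps to whole microseconds. */
static uint64_t ticks_to_us(uint64_t ticks)
{
	return ticks / (UNCORE_TIMER_HZ / 1000000ULL);
}
```

For example, if last_update_timestamp is 25000 ticks after agg_data_loss_timestamp, the most recent successful update landed 1000 microseconds after the last data loss.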


* [PATCH v6 30/30] x86,fs/resctrl: Update Documentation for package events
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (28 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 29/30] x86/resctrl: Add debug info/PERF_PKG_MON/status files Tony Luck
@ 2025-06-26 16:49 ` Tony Luck
  2025-07-09 22:24   ` Reinette Chatre
  2025-06-27  0:26 ` [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Luck, Tony
                   ` (2 subsequent siblings)
  32 siblings, 1 reply; 89+ messages in thread
From: Tony Luck @ 2025-06-26 16:49 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches, Tony Luck

Each "mon_data" directory is now divided between L3 events and package
events.

The "info/PERF_PKG_MON" directory contains parameters for the package
telemetry events.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/filesystems/resctrl.rst | 53 ++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index c7949dd44f2f..a452fd54b3ae 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -167,7 +167,7 @@ with respect to allocation:
 			bandwidth percentages are directly applied to
 			the threads running on the core
 
-If RDT monitoring is available there will be an "L3_MON" directory
+If L3 monitoring is available there will be an "L3_MON" directory
 with the following files:
 
 "num_rmids":
@@ -261,6 +261,23 @@ with the following files:
 		bytes) at which a previously used LLC_occupancy
 		counter can be considered for re-use.
 
+If telemetry monitoring is available there will be a "PERF_PKG_MON" directory
+with the following files:
+
+"num_rmids":
+		The number of telemetry RMIDs supported. If this is different
+		from the number reported in the L3_MON directory, the limit
+		on the number of "CTRL_MON" + "MON" directories is the
+		minimum of the values.
+
+"mon_features":
+		Lists the telemetry monitoring events that are enabled on this system.
+
+Monitor resources may also provide architecture specific debug files in
+debugfs under "/sys/kernel/debug/resctrl/info/{resource}_MON/". Resources
+may use these to supply debug information about the status of the
+hardware implementing the resource.
+
 Finally, in the top level of the "info" directory there is a file
 named "last_cmd_status". This is reset with every "command" issued
 via the file system (making new directories or writing to any of the
@@ -366,15 +383,31 @@ When control is enabled all CTRL_MON groups will also contain:
 When monitoring is enabled all MON groups will also contain:
 
 "mon_data":
-	This contains a set of files organized by L3 domain and by
-	RDT event. E.g. on a system with two L3 domains there will
-	be subdirectories "mon_L3_00" and "mon_L3_01".	Each of these
-	directories have one file per event (e.g. "llc_occupancy",
-	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
-	files provide a read out of the current value of the event for
-	all tasks in the group. In CTRL_MON groups these files provide
-	the sum for all tasks in the CTRL_MON group and all tasks in
-	MON groups. Please see example section for more details on usage.
+	This contains a set of directories, one for each instance
+	of an L3 cache or of a processor package. The L3 cache
+	directories are named "mon_L3_00", "mon_L3_01", etc. The package
+	directories are named "mon_PERF_PKG_00", "mon_PERF_PKG_01", etc.
+
+	Within each directory there is one file per event. In
+	the L3 directories: "llc_occupancy", "mbm_total_bytes",
+	and "mbm_local_bytes". In the PERF_PKG directories: "core_energy",
+	"activity", etc.
+
+	"core_energy" reports a floating point number for the energy
+	(in Joules) used by cores for each RMID.
+
+	"activity" also reports a floating point value (in Farads).
+	This provides an estimate of work done independent of the
+	frequency that the cores used for execution.
+
+	All other events report decimal integer values.
+
+	In a MON group these files provide a read out of the current
+	value of the event for all tasks in the group. In CTRL_MON groups
+	these files provide the sum for all tasks in the CTRL_MON group
+	and all tasks in MON groups. Please see example section for more
+	details on usage.
+
 	On systems with Sub-NUMA Cluster (SNC) enabled there are extra
 	directories for each node (located within the "mon_L3_XX" directory
 	for the L3 cache they occupy). These are named "mon_sub_L3_YY"
-- 
2.49.0


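A consumer of the "mon_data" files now has to cope with two value formats. One option is to parse every value with strtod(), as sketched below. The sketch assumes that every event file holds either a number, "Unavailable" or "Error", followed by a newline, which are the strings written by rdtgroup_mondata_show(). `parse_mon_value()` and `enum mon_read_status` are hypothetical. A double holds integers exactly only up to 2^53, so a tool that needs exact counts from integer events such as "mbm_total_bytes" should parse those with strtoull() instead. strtod() also honours the current locale's decimal point, so the program should stay in the default "C" locale.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for one read of a mon_data event file. */
enum mon_read_status { MON_OK, MON_UNAVAILABLE, MON_ERROR, MON_PARSE_FAIL };

/*
 * Parse the text read from one event file, e.g. "4096\n" from
 * llc_occupancy or "12.5\n" from core_energy.
 */
static enum mon_read_status parse_mon_value(const char *text, double *out)
{
	char *end;

	assert(text && out);
	if (strcmp(text, "Unavailable\n") == 0)
		return MON_UNAVAILABLE;
	if (strcmp(text, "Error\n") == 0)
		return MON_ERROR;

	errno = 0;
	*out = strtod(text, &end);
	/* Reject text with no numeric prefix, or junk other than a trailing newline. */
	if (errno != 0 || end == text || (*end != '\n' && *end != '\0'))
		return MON_PARSE_FAIL;
	return MON_OK;
}
```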

* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (29 preceding siblings ...)
  2025-06-26 16:49 ` [PATCH v6 30/30] x86,fs/resctrl: Update Documentation for package events Tony Luck
@ 2025-06-27  0:26 ` Luck, Tony
  2025-06-27 18:09   ` Luck, Tony
  2025-06-30 17:51 ` Reinette Chatre
  2025-07-03 16:45 ` Reinette Chatre
  32 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-06-27  0:26 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Bother. Just got e-mail from lkp after posting v6. Apparently
I applied the fixes to avoid "'d' used before set" in
domain_remove_cpu_ctrl() and domain_remove_cpu_mon() to some
other branch than the one that made it to my final version.

Please imagine the hunks below merged into patches 7 & 8.

-Tony

---

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 3ec8fbd2f778..39cee572a121 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -651,8 +651,8 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
 	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
 		return;
 
-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (!cpumask_empty(&d->hdr.cpu_mask))
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
 		return;
 
 	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
@@ -696,8 +696,8 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
 	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
 		return;
 
-	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
-	if (!cpumask_empty(&d->hdr.cpu_mask))
+	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
+	if (!cpumask_empty(&hdr->cpu_mask))
 		return;
 
 	switch (r->rid) {

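The hunks above work because `hdr` is the only valid pointer at the point where the CPU is cleared. `d` is derived from it with container_of() only once the domain is known to be empty. The sketch below shows the same ordering using simplified, hypothetical types: `struct domain_hdr` holds a single-word CPU mask, and `struct ctrl_domain` embeds it. It also defines a local container_of() built on offsetof(), which is a simplified version of the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct domain_hdr {
	unsigned long cpu_mask;		/* hypothetical: one bit per CPU */
};

struct ctrl_domain {
	int id;
	struct domain_hdr hdr;
};

/*
 * Remove 'cpu' from the domain. Returns the enclosing domain once its
 * mask is empty (so the caller can tear it down), otherwise NULL.
 */
static struct ctrl_domain *remove_cpu(struct domain_hdr *hdr, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(hdr->cpu_mask)));

	hdr->cpu_mask &= ~(1UL << cpu);	/* use hdr: no domain pointer yet */
	if (hdr->cpu_mask)
		return NULL;

	return container_of(hdr, struct ctrl_domain, hdr);
}
```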

* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-06-26 16:49 ` [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events Tony Luck
@ 2025-06-27 18:06   ` Luck, Tony
  2025-07-03 18:27     ` Reinette Chatre
  2025-07-08 23:51   ` Reinette Chatre
  1 sibling, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-06-27 18:06 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

On Thu, Jun 26, 2025 at 09:49:26AM -0700, Tony Luck wrote:
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 71019b3b54ea..8eb68d2230be 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -512,6 +512,9 @@ config X86_CPU_RESCTRL
>  	select ARCH_HAS_CPU_RESCTRL
>  	select RESCTRL_FS
>  	select RESCTRL_FS_PSEUDO_LOCK
> +	select X86_PLATFORM_DEVICES
> +	select INTEL_VSEC
> +	select INTEL_PMT_TELEMETRY
>  	help
>  	  Enable x86 CPU resource control support.
>  

The list of dependencies to "select" keeps growing. "lkp"
just told me that "INTEL_VSEC" depends on "PCI".

An alternative approach is to just add:

	depends on INTEL_PMT_DISCOVERY=y

instead of all the extra "select" lines.

Pro: This describes exactly what is needed. The INTEL_PMT_DISCOVERY
driver must be built-in to the kernel so that resctrl can enumerate the
telemetry features.

Con: "make olddefconfig" will now drop X86_CPU_RESCTRL until the user
hunts down and enables the chain of dependencies to get RESCTRL turned
back on again.

-Tony


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-06-27  0:26 ` [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Luck, Tony
@ 2025-06-27 18:09   ` Luck, Tony
  0 siblings, 0 replies; 89+ messages in thread
From: Luck, Tony @ 2025-06-27 18:09 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

On Thu, Jun 26, 2025 at 05:26:46PM -0700, Luck, Tony wrote:
> Bother. Just got e-mail from lkp after posting v6. Apparently
> I applied the fixes to avoid "'d' used before set" in
> domain_remove_cpu_ctrl() and domain_remove_cpu_mon() to some
> other branch than the one that made it to my final version.
> 
> Please imagine the hunks below merged into patches 7 & 8.

I merged these changes back into the series and pushed the updated
version to git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
branch rdt-aet-v6.
> 
> -Tony
> 
> ---
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 3ec8fbd2f778..39cee572a121 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -651,8 +651,8 @@ static void domain_remove_cpu_ctrl(int cpu, struct rdt_resource *r)
>  	if (!domain_header_is_valid(hdr, RESCTRL_CTRL_DOMAIN, r->rid))
>  		return;
>  
> -	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
> -	if (!cpumask_empty(&d->hdr.cpu_mask))
> +	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
> +	if (!cpumask_empty(&hdr->cpu_mask))
>  		return;
>  
>  	d = container_of(hdr, struct rdt_ctrl_domain, hdr);
> @@ -696,8 +696,8 @@ static void domain_remove_cpu_mon(int cpu, struct rdt_resource *r)
>  	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))
>  		return;
>  
> -	cpumask_clear_cpu(cpu, &d->hdr.cpu_mask);
> -	if (!cpumask_empty(&d->hdr.cpu_mask))
> +	cpumask_clear_cpu(cpu, &hdr->cpu_mask);
> +	if (!cpumask_empty(&hdr->cpu_mask))
>  		return;
>  
>  	switch (r->rid) {


* Re: [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters
  2025-06-26 16:49 ` [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
@ 2025-06-27 21:22   ` Fenghua Yu
  2025-06-27 22:28     ` Luck, Tony
  2025-06-27 21:49   ` Fenghua Yu
  2025-07-08 21:46   ` Reinette Chatre
  2 siblings, 1 reply; 89+ messages in thread
From: Fenghua Yu @ 2025-06-27 21:22 UTC (permalink / raw)
  To: Tony Luck, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi, Tony,

On 6/26/25 09:49, Tony Luck wrote:
> Resctrl was written with the assumption that all monitor events can be
> displayed as unsigned decimal integers.
>
> Hardware architecture counters may provide some telemetry events with
> greater precision where the event is not a simple count, but is a
> measurement of some sort (e.g. Joules for energy consumed).
>
> Add a new argument to resctrl_enable_mon_event() for architecture code
> to inform the file system that the value for a counter is a fixed-point
> value with a specific number of binary places.  The file system will
> only allow architecture to use floating point format on events that it
> marked with mon_evt::is_floating_point.

User apps need to know if a number is a floating point value or an
integer value. I see you document the energy and activity events as
floating point values and all others as integer values.

Would it be better to show the value types in the info directory?

e.g. create an info file "events_floating" which shows all events with
floating point values. Events not listed in this file are integers by default.

This may have two benefits:

1. An app can query the type info to parse the values accordingly 
without hard coding event types.

2. Any future floating point events can be added here without changing 
the document.
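
If such a file existed, a tool could decide how to parse each event without a hard-coded list. Here is a minimal sketch, assuming a hypothetical "events_floating" file that lists one event name per line. Neither the file nor `event_is_floating()` exists in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'event' appears as a whole line in 'list', the
 * contents of a hypothetical info/<resource>_MON/events_floating file.
 */
static bool event_is_floating(const char *list, const char *event)
{
	size_t n = strlen(event);
	const char *p = list;

	assert(n > 0);
	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) : strlen(p);

		if (len == n && strncmp(p, event, n) == 0)
			return true;
		if (!eol)
			break;
		p = eol + 1;
	}
	return false;
}
```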

> Fixed point values are displayed with values rounded to an appropriate
> number of decimal places for the precision of the number of binary places
> provided. In general one extra decimal place is added for every three
> additional binary places. There are some exceptions for low precision
> binary values where exact representation is possible:
>
>    1 binary place is 0.0 or 0.5.			=> 1 decimal place
>    2 binary places is 0.0. 0.25, 0.5, 0.75	=> 2 decimal places
>    3 binary places is 0.0, 0.125, etc.		=> 3 decimal places
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>   include/linux/resctrl.h            |  4 +-
>   fs/resctrl/internal.h              |  4 ++
>   arch/x86/kernel/cpu/resctrl/core.c |  6 +-
>   fs/resctrl/ctrlmondata.c           | 91 +++++++++++++++++++++++++++++-
>   fs/resctrl/monitor.c               | 10 +++-
>   5 files changed, 108 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index e05a1abb25d4..1060a54cc9fa 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -379,7 +379,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>   u32 resctrl_arch_system_num_rmid_idx(void);
>   int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>   
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
> +#define MAX_BINARY_BITS	27
> +
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits);
>   
>   bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
>   
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index f51d10d6a510..4dc678af005c 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -58,6 +58,8 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>    * @name:		name of the event
>    * @configurable:	true if the event is configurable
>    * @any_cpu:		true if the event can be read from any CPU
> + * @is_floating_point:	event values may be displayed in floating point format
> + * @binary_bits:	number of fixed-point binary bits from architecture
>    * @enabled:		true if the event is enabled
>    */
>   struct mon_evt {
> @@ -66,6 +68,8 @@ struct mon_evt {
>   	char			*name;
>   	bool			configurable;
>   	bool			any_cpu;
> +	bool			is_floating_point;
> +	int			binary_bits;
>   	bool			enabled;
>   };
>   
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index b83861ab504f..2b6c6b61707d 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -887,15 +887,15 @@ static __init bool get_rdt_mon_resources(void)
>   	bool ret = false;
>   
>   	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
> -		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
>   		ret = true;
>   	}
>   	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
>   		ret = true;
>   	}
>   	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
>   		ret = true;
>   	}
>   
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 2e65fddc3408..29de0e380ccc 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -590,6 +590,93 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
>   }
>   
> +/**
> + * struct fixed_params - parameters to decode a binary fixed point value
> + * @decplaces:	Number of decimal places for this number of binary places.
> + * @pow10:	Multiplier (10 ^ decimal places).
> + */
> +struct fixed_params {
> +	int	decplaces;
> +	int	pow10;
> +};
> +
> +static struct fixed_params fixed_params[MAX_BINARY_BITS + 1] = {
> +	[1]  = { .decplaces = 1, .pow10 = 10 },
> +	[2]  = { .decplaces = 2, .pow10 = 100 },
> +	[3]  = { .decplaces = 3, .pow10 = 1000 },
> +	[4]  = { .decplaces = 3, .pow10 = 1000 },
> +	[5]  = { .decplaces = 3, .pow10 = 1000 },
> +	[6]  = { .decplaces = 3, .pow10 = 1000 },
> +	[7]  = { .decplaces = 3, .pow10 = 1000 },
> +	[8]  = { .decplaces = 3, .pow10 = 1000 },
> +	[9]  = { .decplaces = 3, .pow10 = 1000 },
> +	[10] = { .decplaces = 4, .pow10 = 10000 },
> +	[11] = { .decplaces = 4, .pow10 = 10000 },
> +	[12] = { .decplaces = 4, .pow10 = 10000 },
> +	[13] = { .decplaces = 5, .pow10 = 100000 },
> +	[14] = { .decplaces = 5, .pow10 = 100000 },
> +	[15] = { .decplaces = 5, .pow10 = 100000 },
> +	[16] = { .decplaces = 6, .pow10 = 1000000 },
> +	[17] = { .decplaces = 6, .pow10 = 1000000 },
> +	[18] = { .decplaces = 6, .pow10 = 1000000 },
> +	[19] = { .decplaces = 7, .pow10 = 10000000 },
> +	[20] = { .decplaces = 7, .pow10 = 10000000 },
> +	[21] = { .decplaces = 7, .pow10 = 10000000 },
> +	[22] = { .decplaces = 8, .pow10 = 100000000 },
> +	[23] = { .decplaces = 8, .pow10 = 100000000 },
> +	[24] = { .decplaces = 8, .pow10 = 100000000 },
> +	[25] = { .decplaces = 9, .pow10 = 1000000000 },
> +	[26] = { .decplaces = 9, .pow10 = 1000000000 },
> +	[27] = { .decplaces = 9, .pow10 = 1000000000 }
> +};
> +
> +static void print_event_value(struct seq_file *m, int binary_bits, u64 val)
> +{
> +	struct fixed_params *fp = &fixed_params[binary_bits];

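Is it worth adding a boundary check here, like the one below? I'm afraid
that without a hardening check, a future caller may pass a wrong value
and cause a failure that is hard to debug. Note that fixed_params[] has
MAX_BINARY_BITS + 1 entries, so the check should reject only values
greater than MAX_BINARY_BITS:

	if (WARN_ON_ONCE(binary_bits > MAX_BINARY_BITS))
		return;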

> +	unsigned long long frac;
> +	char buf[10];
> +
> +	/* Mask off the integer part of the fixed-point value. */
> +	frac = val & GENMASK_ULL(binary_bits, 0);
> +
> +	/*
> +	 * Multiply by 10^{desired decimal places}. The
> +	 * integer part of the fixed point value is now
> +	 * almost what is needed.
> +	 */
> +	frac *= fp->pow10;
> +
> +	/*
> +	 * Round to nearest by adding a value that
> +	 * would be a "1" in the binary_bit + 1 place.
> +	 * Integer part of fixed point value is now
> +	 * the needed value.
> +	 */
> +	frac += 1 << (binary_bits - 1);
> +
> +	/*
> +	 * Extract the integer part of the value. This
> +	 * is the decimal representation of the original
> +	 * fixed-point fractional value.
> +	 */
> +	frac >>= binary_bits;
> +
> +	/*
> +	 * "frac" is now in the range [0 .. fp->pow10).
> +	 * I.e. string representation will fit into
> +	 * fp->decplaces.
> +	 */
> +	sprintf(buf, "%0*llu", fp->decplaces, frac);
> +
> +	/* Trim trailing zeroes */
> +	for (int i = fp->decplaces - 1; i > 0; i--) {
> +		if (buf[i] != '0')
> +			break;
> +		buf[i] = '\0';
> +	}
> +	seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
> +}
> +
>   int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>   {
>   	struct kernfs_open_file *of = m->private;
> @@ -666,8 +753,10 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>   		seq_puts(m, "Error\n");
>   	else if (rr.err == -EINVAL)
>   		seq_puts(m, "Unavailable\n");
> -	else
> +	else if (evt->binary_bits == 0)
>   		seq_printf(m, "%llu\n", rr.val);
> +	else
> +		print_event_value(m, evt->binary_bits, rr.val);
>   
>   out:
>   	rdtgroup_kn_unlock(of->kn);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index aec26457d82c..076c0cc6e53a 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -897,16 +897,22 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
>   	},
>   };
>   
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
>   {
> -	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
> +	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS) ||
> +			 binary_bits > MAX_BINARY_BITS)
>   		return;
>   	if (mon_event_all[eventid].enabled) {
>   		pr_warn("Duplicate enable for event %d\n", eventid);
>   		return;
>   	}
> +	if (binary_bits && !mon_event_all[eventid].is_floating_point) {
> +		pr_warn("Event %d may not be floating point\n", eventid);
> +		return;
> +	}
>   
>   	mon_event_all[eventid].any_cpu = any_cpu;
> +	mon_event_all[eventid].binary_bits = binary_bits;
>   	mon_event_all[eventid].enabled = true;
>   }
>   

Thanks.

-Fenghua

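The decimal rendering described in the commit message can be tried out in user space. The sketch below is an illustration, not the patch code. `decimal_places()` is an assumed closed form that reproduces the patch's fixed_params[] table for 1 to 27 binary places, and `format_fixed()` is a hypothetical helper. It keeps only the low `bits` bits as the fraction, the equivalent of GENMASK_ULL(bits - 1, 0). GENMASK_ULL(h, l) includes bit h, so the quoted GENMASK_ULL(binary_bits, 0) also picks up the lowest integer bit. Rounding adds half of the last binary place before shifting. For every table entry, the scaled fraction therefore stays below 10^decplaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_BINARY_BITS	27

/* Assumed closed form of the fixed_params[] decplaces column. */
static int decimal_places(unsigned int bits)
{
	int dp = (int)(bits - 1) / 3 + 1;

	if (bits <= 3)
		return (int)bits;	/* exact: .5, .25, .125 */
	return dp < 3 ? 3 : dp;
}

/* Render fixed-point 'val' with 'bits' fractional bits into 'out'. */
static void format_fixed(char *out, size_t outlen, uint64_t val, unsigned int bits)
{
	char digits[16];
	uint64_t pow10 = 1, frac;
	int dp, i;

	assert(bits >= 1 && bits <= MAX_BINARY_BITS);
	dp = decimal_places(bits);
	for (i = 0; i < dp; i++)
		pow10 *= 10;

	frac = val & ((UINT64_C(1) << bits) - 1);	/* fractional bits only */
	frac *= pow10;					/* scale to decimal places */
	frac += UINT64_C(1) << (bits - 1);		/* round to nearest */
	frac >>= bits;					/* now in [0, pow10) */

	snprintf(digits, sizeof(digits), "%0*llu", dp, (unsigned long long)frac);
	for (i = dp - 1; i > 0 && digits[i] == '0'; i--)	/* trim zeros */
		digits[i] = '\0';

	snprintf(out, outlen, "%llu.%s", (unsigned long long)(val >> bits), digits);
}
```

For example, a value of 5 with 2 binary places is rendered as "1.25", and 4 with 2 binary places as "1.0".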


* Re: [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters
  2025-06-26 16:49 ` [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
  2025-06-27 21:22   ` Fenghua Yu
@ 2025-06-27 21:49   ` Fenghua Yu
  2025-07-08 21:46   ` Reinette Chatre
  2 siblings, 0 replies; 89+ messages in thread
From: Fenghua Yu @ 2025-06-27 21:49 UTC (permalink / raw)
  To: Tony Luck, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi, Tony,

On 6/26/25 09:49, Tony Luck wrote:
> Resctrl was written with the assumption that all monitor events can be
> displayed as unsigned decimal integers.
>
> Hardware architecture counters may provide some telemetry events with
> greater precision where the event is not a simple count, but is a
> measurement of some sort (e.g. Joules for energy consumed).
>
> Add a new argument to resctrl_enable_mon_event() for architecture code
> to inform the file system that the value for a counter is a fixed-point
> value with a specific number of binary places.  The file system will
> only allow architecture to use floating point format on events that it
> marked with mon_evt::is_floating_point.
>
> Fixed point values are displayed with values rounded to an appropriate
> number of decimal places for the precision of the number of binary places
> provided. In general one extra decimal place is added for every three
> additional binary places. There are some exceptions for low precision
> binary values where exact representation is possible:
>
>    1 binary place is 0.0 or 0.5.			=> 1 decimal place
>    2 binary places is 0.0. 0.25, 0.5, 0.75	=> 2 decimal places

nit. s/0.0./0.0,/

[SNIP]

Thanks.

-Fenghua



* Re: [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions
  2025-06-26 16:49 ` [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
@ 2025-06-27 21:55   ` Fenghua Yu
  2025-07-08 20:52   ` Reinette Chatre
  1 sibling, 0 replies; 89+ messages in thread
From: Fenghua Yu @ 2025-06-27 21:55 UTC (permalink / raw)
  To: Tony Luck, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches


On 6/26/25 09:49, Tony Luck wrote:
> There are currently only three monitor events, all associated with
> the RDT_RESOURCE_L3 resource. Growing support for additional events
> will be easier with some restructuring to have a single point in
> file system code where all attributes of all events are defined.
>
> Place all event descriptions into an array mon_event_all[]. Doing
> this has the beneficial side effect of removing the need for
> rdt_resource::evt_list.
>
> Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range
> checks for event ids and as the starting index to scan mon_event_all[].
>
> Drop the code that builds evt_list and change the two places where
> the list is scanned to scan mon_event_all[] instead using a new
> helper macro for_each_mon_event().
>
> Architecture code now informs file system code which events are
> available with resctrl_enable_mon_event().
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua

[SNIP]



* Re: [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks
  2025-06-26 16:49 ` [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks Tony Luck
@ 2025-06-27 22:15   ` Fenghua Yu
  2025-07-08 20:52   ` Reinette Chatre
  1 sibling, 0 replies; 89+ messages in thread
From: Fenghua Yu @ 2025-06-27 22:15 UTC (permalink / raw)
  To: Tony Luck, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches


On 6/26/25 09:49, Tony Luck wrote:
> The resctrl file system now has complete knowledge of the status
> of every event. So there is no need for per-event function calls
> to check.
>
> Replace each of the resctrl_arch_is_{event}_enabled() calls with
> resctrl_is_mon_event_enabled(QOS_{EVENT}).
>
> No functional change.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>

Thanks.

-Fenghua

[SNIP]
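As a hedged illustration of this replacement, a single generic query over a per-event enabled flag can answer what previously needed a separate arch helper per event (for example resctrl_arch_is_llc_occupancy_enabled()). The names is_mon_event_enabled() and mon_event_enabled[] are hypothetical stand-ins for resctrl_is_mon_event_enabled() and the file system's own state.

```c
#include <stdbool.h>

/* Modeled on enum resctrl_event_id; numeric values are illustrative. */
enum resctrl_event_id {
	QOS_FIRST_EVENT			= 0x01,
	QOS_L3_OCCUP_EVENT_ID		= 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
	QOS_NUM_EVENTS,
};

/* Per-event enabled state owned by the file system (simplified). */
static bool mon_event_enabled[QOS_NUM_EVENTS];

/* One range-checked query replaces one helper per event. */
static bool is_mon_event_enabled(enum resctrl_event_id eventid)
{
	return eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS &&
	       mon_event_enabled[eventid];
}
```

A caller then writes `if (is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))` rather than calling an event-specific function.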



* Re: [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters
  2025-06-27 21:22   ` Fenghua Yu
@ 2025-06-27 22:28     ` Luck, Tony
  0 siblings, 0 replies; 89+ messages in thread
From: Luck, Tony @ 2025-06-27 22:28 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Fri, Jun 27, 2025 at 02:22:18PM -0700, Fenghua Yu wrote:
> Hi, Tony,
> 
> On 6/26/25 09:49, Tony Luck wrote:
> > Resctrl was written with the assumption that all monitor events can be
> > displayed as unsigned decimal integers.
> > 
> > Hardware architecture counters may provide some telemetry events with
> > greater precision where the event is not a simple count, but is a
> > measurement of some sort (e.g. Joules for energy consumed).
> > 
> > Add a new argument to resctrl_enable_mon_event() for architecture code
> > to inform the file system that the value for a counter is a fixed-point
> > value with a specific number of binary places.  The file system will
> > only allow architecture to use floating point format on events that it
> > marked with mon_evt::is_floating_point.
> 
> A user app needs to know whether a number is a floating point value or an
> integer value. I see you document the energy and activity events as
> floating point values and all others as integer values.
> 
> Would it be better to show the value types in the info directory?
> 
> E.g., create an info file "events_floating" that lists all events with
> floating point values. Events not listed in this file are integers by default.
> 
> This may have two benefits:
> 
> 1. An app can query the type info to parse the values accordingly without
> hard coding event types.
> 
> 2. Any future floating point events can be added here without changing the
> documentation.

Maybe. It's obvious which values are floating point because they
have a "." in them. Some apps may not care about the difference
and just read everything as floating point. That seems likely,
since the next step is to compute the rate with:
	(current_value - previous_value) / delta_t
which will be done as a floating point calculation with
microsecond timestamps.
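For the rate computation above, a user-space consumer might do something like the following sketch. parse_reading() and event_rate() are hypothetical helpers: they accept both integer and fixed-point output via strtod(), and treat non-numeric text such as "Unavailable" as an error. For core_energy in Joules, the resulting rate is in Watts.

```c
#include <stdlib.h>

/*
 * Parse one reading of an event file. Returns 0 on success, or -1 if
 * the text is not a number (e.g. "Unavailable" or "Error").
 */
static int parse_reading(const char *buf, double *val)
{
	char *end;

	*val = strtod(buf, &end);
	if (end == buf)
		return -1;
	return 0;
}

/*
 * (current_value - previous_value) / delta_t, with timestamps in
 * microseconds. The caller must ensure t1_us > t0_us.
 */
static double event_rate(double prev, double cur,
			 long long t0_us, long long t1_us)
{
	double delta_t = (double)(t1_us - t0_us) / 1e6;	/* seconds */

	return (cur - prev) / delta_t;
}
```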

But it wouldn't be hard to add an info file that lists which are
in floating point (maybe also to provide the precision as
suggested by Dave Martin).

[snip]

> > +static void print_event_value(struct seq_file *m, int binary_bits, u64 val)
> > +{
> > +	struct fixed_params *fp = &fixed_params[binary_bits];
> 
> Is it worth having a boundary check here, like the one below? Without
> such a hardening check, a future caller may pass a wrong value and cause
> a hard-to-debug failure.
> 
> 	if (WARN_ON_ONCE(binary_bits >= MAX_BINARY_BITS))
> 		return;

Seems like belt and braces overkill. resctrl_enable_mon_event()
already has a check for MAX_BINARY_BITS and will not enable
an event if architecture provides a too-large value.
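The argument that a print-time check is redundant rests on validating once, at enable time. A sketch of that idea with a simplified event struct, which also enforces the mon_evt::is_floating_point restriction described in the patch. SKETCH_MAX_BINARY_BITS and sketch_enable_event() are hypothetical; the real resctrl_enable_mon_event() signature and limit may differ.

```c
#include <stdbool.h>

/* Hypothetical limit; the series defines its own MAX_BINARY_BITS. */
#define SKETCH_MAX_BINARY_BITS	27

struct sketch_evt {
	bool		is_floating_point;	/* set by file system code */
	bool		enabled;
	unsigned int	binary_bits;		/* 0 means plain integer */
};

/* Validate once here so display code can trust binary_bits later. */
static bool sketch_enable_event(struct sketch_evt *e, unsigned int binary_bits)
{
	/* Only events the file system marked may be fixed-point. */
	if (binary_bits && !e->is_floating_point)
		return false;
	/* Refuse values that would index past per-binary_bits tables. */
	if (binary_bits > SKETCH_MAX_BINARY_BITS)
		return false;
	e->binary_bits = binary_bits;
	e->enabled = true;
	return true;
}
```

With this check in place, any table indexed by binary_bits needs SKETCH_MAX_BINARY_BITS + 1 entries, and an event with a bad value never becomes visible to the display path.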

-Tony


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (30 preceding siblings ...)
  2025-06-27  0:26 ` [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Luck, Tony
@ 2025-06-30 17:51 ` Reinette Chatre
  2025-06-30 22:46   ` Luck, Tony
  2025-07-03 16:45 ` Reinette Chatre
  32 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-06-30 17:51 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches


Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Background
> ----------
> 
> Telemetry features are being implemented in conjunction with the
> IA32_PQR_ASSOC.RMID value on each logical CPU. This is used to send
> counts for various events to a collector in a nearby OOBMSM device to be
> accumulated with counts for each <RMID, event> pair received from other
> CPUs. Cores send event counts when the RMID value changes, or after each
> 2ms elapsed time.

To start a review of this jumbo series and find that the *first* [1]
(straightforward) request from the previous review has not been addressed is
demoralizing. I was hoping that the previous version's discussions would result
in review feedback either addressed or discussed (never ignored). I
cannot imagine how requesting OOBMSM to be expanded can be invalid though.

Reinette

[1] https://lore.kernel.org/lkml/b8ddce03-65c0-4420-b30d-e43c54943667@intel.com/


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-06-30 17:51 ` Reinette Chatre
@ 2025-06-30 22:46   ` Luck, Tony
  2025-07-08 20:50     ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-06-30 22:46 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Mon, Jun 30, 2025 at 10:51:50AM -0700, Reinette Chatre wrote:
> 
> Tony,
> 
> On 6/26/25 9:49 AM, Tony Luck wrote:
> > Background
> > ----------
> > 
> > Telemetry features are being implemented in conjunction with the
> > IA32_PQR_ASSOC.RMID value on each logical CPU. This is used to send
> > counts for various events to a collector in a nearby OOBMSM device to be
> > accumulated with counts for each <RMID, event> pair received from other
> > CPUs. Cores send event counts when the RMID value changes, or after each
> > 2ms elapsed time.
> 
> To start a review of this jumbo series and find that the *first* [1]
> (straightforward) request from the previous review has not been addressed is
> demoralizing. I was hoping that the previous version's discussions would result
> in review feedback either addressed or discussed (never ignored). I
> cannot imagine how requesting OOBMSM to be expanded can be invalid though.
> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/b8ddce03-65c0-4420-b30d-e43c54943667@intel.com/

My profound apologies for blowing it (again). I went through the comments
to patches multiple times to try and catch all your comments. But somehow
skipped the cover letter :-( .

Here's a re-write to address comments, but also to try to provide
a better story line starting with how the logical processors capture
the event data, following on with aggregator processing, etc.

-Tony

---

On Intel systems that support per-RMID telemetry monitoring, each logical
processor keeps a local count for various events. When the IA32_PQR_ASSOC.RMID
value for the logical processor changes (or when a two-millisecond counter
expires), these event counts, together with the current RMID value, are
transmitted to an event aggregator on the same package as the processor.
The local event counters are then reset to zero to begin counting again.

Each aggregator adds the incoming event counts to its cumulative count
for each <RMID, event> pair. Note that there can be multiple aggregators
on each package, with no architectural association between a logical
processor and any particular aggregator.

All of these aggregated counters can be read by an operating system from
the MMIO space of the Out Of Band Management Service Module (OOBMSM)
device(s) on a system. Any counter can be read from any logical processor.

Intel publishes XML files for each processor generation that show which
events are counted by each logical processor and the offset of each
accumulated counter value within the MMIO space:
https://github.com/intel/Intel-PMT.

For example, there are two energy-related telemetry events for the Clearwater
Forest family of processors, and the MMIO space looks like this:

Offset	RMID	Event
------	----	-----
0x0000	0	core_energy
0x0008	0	activity
0x0010	1	core_energy
0x0018	1	activity
...
0x23F0	575	core_energy
0x23F8	575	activity
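The table implies a simple stride of two 8-byte counters per RMID. A sketch of the offset calculation, assuming exactly this Clearwater Forest layout (other guids describe their own layouts in their XML files, and the names here are hypothetical):

```c
#include <stdint.h>

/* Layout assumed from the Clearwater Forest table above. */
#define NUM_EVENTS_PER_RMID	2	/* core_energy, activity */
#define COUNTER_SIZE		8	/* bytes per counter */

enum cwf_event { CWF_CORE_ENERGY = 0, CWF_ACTIVITY = 1 };

/* Byte offset of the counter for (@rmid, @evt) within the MMIO region. */
static uint64_t counter_offset(unsigned int rmid, enum cwf_event evt)
{
	return ((uint64_t)rmid * NUM_EVENTS_PER_RMID + evt) * COUNTER_SIZE;
}
```

A driver would then read the 64-bit value at that offset from the mapped region.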

In addition, the XML file provides the units (Joules for core_energy,
Farads for activity) and the type of data (fixed-point binary, with
bit 63 used to indicate that the data is valid and the low 18 bits
holding a binary fraction).
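Decoding one such counter might look like the following user-space sketch, assuming the value occupies bits 0-62 below the valid bit. Floating point is used here only for illustration; kernel code would keep the value in fixed point.

```c
#include <stdbool.h>
#include <stdint.h>

#define VALID_BIT	(1ULL << 63)	/* set when the counter holds valid data */
#define FRAC_BITS	18		/* low 18 bits are the binary fraction */

/* Returns false if hardware has not marked the value valid. */
static bool decode_counter(uint64_t raw, double *out)
{
	if (!(raw & VALID_BIT))
		return false;
	raw &= ~VALID_BIT;
	*out = (double)raw / (double)(1ULL << FRAC_BITS);
	return true;
}
```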

Finally, each XML file provides a 32-bit unique id (or guid) that is
used as an index to find the correct XML description file for each
telemetry implementation.

The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
to enumerate the aggregator instances on a platform. It provides:
1) guid - so resctrl can determine which events are supported
2) mmio base address of counters
3) package id

Resctrl accumulates counts from all aggregators on a package in order
to provide a consistent user interface across processor generations.
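One way to picture the per-package accumulation is the sketch below. struct aggregator is a hypothetical stand-in for what discovery reports (guid, MMIO base, package id), with a plain array in place of the MMIO region. Skipping counters whose valid bit is clear is an assumption of this sketch, not necessarily what resctrl does. Because every counter for a given event shares the same binary scale, raw values can be summed before any conversion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VALID_BIT	(1ULL << 63)

/* Hypothetical record for one aggregator found during discovery. */
struct aggregator {
	uint32_t	guid;		/* identifies the counter layout */
	int		package_id;
	const uint64_t	*counters;	/* stands in for the mapped MMIO region */
};

/*
 * Sum the counter at @index over all aggregators on @package that use
 * layout @guid. Invalid counters are skipped. Returns false if no valid
 * counter was found.
 */
static bool sum_package_counter(const struct aggregator *agg, size_t n,
				uint32_t guid, int package, size_t index,
				uint64_t *sum)
{
	bool found = false;

	*sum = 0;
	for (size_t i = 0; i < n; i++) {
		uint64_t raw;

		if (agg[i].guid != guid || agg[i].package_id != package)
			continue;
		raw = agg[i].counters[index];
		if (!(raw & VALID_BIT))
			continue;
		*sum += raw & ~VALID_BIT;
		found = true;
	}
	return found;
}
```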

The directory structure for the telemetry events looks like this:

$ tree /sys/fs/resctrl/mon_data/
/sys/fs/resctrl/mon_data/
├── mon_PERF_PKG_00
│   ├── activity
│   └── core_energy
└── mon_PERF_PKG_01
    ├── activity
    └── core_energy

Reading the "core_energy" file from some resctrl mon_data directory shows
the cumulative energy (in Joules) used by all tasks that ran with the RMID
associated with that directory on a given package. Note that "core_energy"
reports only energy consumed by CPU cores (data processing units,
L1/L2 caches, etc.). It does not include energy used in the "uncore"
(L3 cache, on-package devices, etc.) or by memory or I/O devices.


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
                   ` (31 preceding siblings ...)
  2025-06-30 17:51 ` Reinette Chatre
@ 2025-07-03 16:45 ` Reinette Chatre
  2025-07-03 17:22   ` Luck, Tony
  32 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-03 16:45 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony and Dave,

On 6/26/25 9:49 AM, Tony Luck wrote:
>  --- 14 ---
> Add mon_evt::is_floating_point set by resctrl file system code to limit
> which events architecture code can request be displayed in floating point.
> 
> Simplified the fixed-point to floating point algorithm. Reinette is
> correct that the additional "lshift" and "rshift" operations are not
> required. All that is needed is to multiply the fixed point fractional
> part by 10**decimal_places, add a rounding amount equivalent to a "1"
> in the binary place after those supplied. Finally divide by 2**binary_places
> (with a right shift).
> 
> Explained in commit comment how I chose the number of decimal places to
> use for each binary places value.
> 
> N.B. Dave Martin expressed an opinion that the kernel should not do
> this conversion. Instead it should enumerate the scaling factor for
> each event where hardware reported a fixed point value. This patch
> could be dropped and replaced with one to enumerate scaling factors
> per event if others agree with Dave.

Could resctrl accommodate both usages? For example, it does not
look too invasive to add a second file <mon_evt::name>.raw for the
mon_evt::is_floating_point events that can output something like Dave
suggested in [1]:

.raw file format could be:
	#format:<output that depends on format>
	#fixed-point:<value>/<scaling factor>

Example output:
	fixed-point:0x60000/0x40000

Reinette

[1] https://lore.kernel.org/lkml/aEhMWBemtev%2Ff3yf@e133380.arm.com/
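The fixed-point to decimal conversion described in the v6 notes (scale the fraction by 10^decimal_places, add a rounding amount, shift right by binary_bits) can be sketched as follows. This version rounds half up by adding 2^(binary_bits-1) before the shift. The patch's exact rounding constant and its per-binary_bits choice of decimal places are not reproduced here, so decimal_places is simply a parameter. With binary_bits = 18, the example raw value 0x60000 (scale 0x40000) prints as 1.500000 with six places, and 5775 prints as 0.02203 with five.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Split a fixed-point value with @binary_bits fractional bits (>= 1)
 * into an integer part and a fraction rounded to @decimal_places
 * digits. Assumes fraction * 10^decimal_places fits in 64 bits.
 */
static void fixed_to_decimal(uint64_t val, unsigned int binary_bits,
			     unsigned int decimal_places,
			     uint64_t *whole, uint64_t *frac)
{
	uint64_t pow10 = 1;
	uint64_t f = val & ((1ULL << binary_bits) - 1);

	for (unsigned int i = 0; i < decimal_places; i++)
		pow10 *= 10;

	*whole = val >> binary_bits;

	/* Scale by 10^d, add 2^(b-1) (one half in the last digit), divide by 2^b. */
	f = (f * pow10 + (1ULL << (binary_bits - 1))) >> binary_bits;

	/* Rounding can carry into the integer part, e.g. 0.999... -> 1.0 */
	if (f >= pow10) {
		(*whole)++;
		f -= pow10;
	}
	*frac = f;
}

/* Print as "<whole>.<frac>" with zero padding; decimal_places >= 1. */
static void print_fixed(uint64_t val, unsigned int binary_bits,
			unsigned int decimal_places)
{
	uint64_t whole, frac;

	fixed_to_decimal(val, binary_bits, decimal_places, &whole, &frac);
	printf("%llu.%0*llu\n", (unsigned long long)whole,
	       (int)decimal_places, (unsigned long long)frac);
}
```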



* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-07-03 16:45 ` Reinette Chatre
@ 2025-07-03 17:22   ` Luck, Tony
  2025-07-08 19:08     ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-03 17:22 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Thu, Jul 03, 2025 at 09:45:15AM -0700, Reinette Chatre wrote:
> Hi Tony and Dave,
> 
> On 6/26/25 9:49 AM, Tony Luck wrote:
> >  --- 14 ---
> > Add mon_evt::is_floating_point set by resctrl file system code to limit
> > which events architecture code can request be displayed in floating point.
> > 
> > Simplified the fixed-point to floating point algorithm. Reinette is
> > correct that the additional "lshift" and "rshift" operations are not
> > required. All that is needed is to multiply the fixed point fractional
> > part by 10**decimal_places, add a rounding amount equivalent to a "1"
> > in the binary place after those supplied. Finally divide by 2**binary_places
> > (with a right shift).
> > 
> > Explained in commit comment how I chose the number of decimal places to
> > use for each binary places value.
> > 
> > N.B. Dave Martin expressed an opinion that the kernel should not do
> > this conversion. Instead it should enumerate the scaling factor for
> > each event where hardware reported a fixed point value. This patch
> > could be dropped and replaced with one to enumerate scaling factors
> > per event if others agree with Dave.
> 
> Could resctrl accommodate both usages? For example, it does not
> look too invasive to add a second file <mon_evt::name>.raw for the
> mon_evt::is_floating_point events that can output something like Dave
> suggested in [1]:
> 
> .raw file format could be:
> 	#format:<output that depends on format>
> 	#fixed-point:<value>/<scaling factor>
> 
> Example output:
> 	fixed-point:0x60000/0x40000

Dave: Is that what you want in the ".raw" file? An alternative would be
to put the format information for non-integer events into an
"info" file ("info/{RESOURCE_NAME}_MON/monfeatures.raw.formats"?)
and just put the raw value into the ".raw" file under mon_data.

> 
> Reinette
> 
> [1] https://lore.kernel.org/lkml/aEhMWBemtev%2Ff3yf@e133380.arm.com/

-Tony


* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-06-27 18:06   ` Luck, Tony
@ 2025-07-03 18:27     ` Reinette Chatre
  2025-07-03 20:17       ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-03 18:27 UTC (permalink / raw)
  To: Luck, Tony, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/27/25 11:06 AM, Luck, Tony wrote:
> On Thu, Jun 26, 2025 at 09:49:26AM -0700, Tony Luck wrote:
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 71019b3b54ea..8eb68d2230be 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -512,6 +512,9 @@ config X86_CPU_RESCTRL
>>  	select ARCH_HAS_CPU_RESCTRL
>>  	select RESCTRL_FS
>>  	select RESCTRL_FS_PSEUDO_LOCK
>> +	select X86_PLATFORM_DEVICES
>> +	select INTEL_VSEC
>> +	select INTEL_PMT_TELEMETRY
>>  	help
>>  	  Enable x86 CPU resource control support.
>>  
> 
> The list of dependencies to "select" keeps growing. "lkp"
> just told me that "INTEL_VSEC" depends on "PCI".
> 
> An alternative approach is to just add:
> 
> 	depends on INTEL_PMT_DISCOVERY=y
> 
> instead of all the extra "select" lines.
> 
> Pro: This describes exactly what is needed. The INTEL_PMT_DISCOVERY
> driver must be built-in to the kernel so that resctrl can enumerate the
> telemetry features.

How will this behave on AMD systems?

> 
> Con: "make olddefconfig" will now drop X86_CPU_RESCTRL until the user
> hunts down and enables the chain of dependencies to get RESCTRL turned
> back on again.

Reinette



* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-07-03 18:27     ` Reinette Chatre
@ 2025-07-03 20:17       ` Luck, Tony
  2025-07-03 20:31         ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-03 20:17 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Thu, Jul 03, 2025 at 11:27:19AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/27/25 11:06 AM, Luck, Tony wrote:
> > On Thu, Jun 26, 2025 at 09:49:26AM -0700, Tony Luck wrote:
> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> >> index 71019b3b54ea..8eb68d2230be 100644
> >> --- a/arch/x86/Kconfig
> >> +++ b/arch/x86/Kconfig
> >> @@ -512,6 +512,9 @@ config X86_CPU_RESCTRL
> >>  	select ARCH_HAS_CPU_RESCTRL
> >>  	select RESCTRL_FS
> >>  	select RESCTRL_FS_PSEUDO_LOCK
> >> +	select X86_PLATFORM_DEVICES
> >> +	select INTEL_VSEC
> >> +	select INTEL_PMT_TELEMETRY
> >>  	help
> >>  	  Enable x86 CPU resource control support.
> >>  
> > 
> > The list of dependencies to "select" keeps growing. "lkp"
> > just told me that "INTEL_VSEC" depends on "PCI".
> > 
> > An alternative approach is to just add:
> > 
> > 	depends on INTEL_PMT_DISCOVERY=y
> > 
> > instead of all the extra "select" lines.
> > 
> > Pro: This describes exactly what is needed. The INTEL_PMT_DISCOVERY
> > driver must be built-in to the kernel so that resctrl can enumerate the
> > telemetry features.
> 
> How will this behave on AMD systems?

The call to intel_pmt_get_regions_by_feature() in the INTEL_PMT_DISCOVERY
driver will return that there are no telemetry events.

If it is a problem to force resctrl users building AMD-only kernels
to include the INTEL_PMT_DISCOVERY driver in order to use resctrl, then I can
look at providing stubs for the entry points in intel_aet.c and
create a new CONFIG option to allow resctrl to be built without
Intel telemetry support.

> > 
> > Con: "make olddefconfig" will now drop X86_CPU_RESCTRL until the user
> > hunts down and enables the chain of dependencies to get RESCTRL turned
> > back on again.
> 
> Reinette
> 

-Tony


* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-07-03 20:17       ` Luck, Tony
@ 2025-07-03 20:31         ` Reinette Chatre
  2025-07-03 21:11           ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-03 20:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/3/25 1:17 PM, Luck, Tony wrote:
> On Thu, Jul 03, 2025 at 11:27:19AM -0700, Reinette Chatre wrote:
>> On 6/27/25 11:06 AM, Luck, Tony wrote:
>>> On Thu, Jun 26, 2025 at 09:49:26AM -0700, Tony Luck wrote:
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 71019b3b54ea..8eb68d2230be 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -512,6 +512,9 @@ config X86_CPU_RESCTRL
>>>>  	select ARCH_HAS_CPU_RESCTRL
>>>>  	select RESCTRL_FS
>>>>  	select RESCTRL_FS_PSEUDO_LOCK
>>>> +	select X86_PLATFORM_DEVICES
>>>> +	select INTEL_VSEC
>>>> +	select INTEL_PMT_TELEMETRY
>>>>  	help
>>>>  	  Enable x86 CPU resource control support.
>>>>  
>>>
>>> The list of dependencies to "select" keeps growing. "lkp"
>>> just told me that "INTEL_VSEC" depends on "PCI".
>>>
>>> An alternative approach is to just add:
>>>
>>> 	depends on INTEL_PMT_DISCOVERY=y
>>>
>>> instead of all the extra "select" lines.
>>>
>>> Pro: This describes exactly what is needed. The INTEL_PMT_DISCOVERY
>>> driver must be built-in to the kernel so that resctrl can enumerate the
>>> telemetry features.
>>
>> How will this behave on AMD systems?
> 
> The call to intel_pmt_get_regions_by_feature() in the INTEL_PMT_DISCOVERY
> driver will return that there are no telemetry events.
> 
> If it is a problem to force resctrl users building AMD-only kernels
> to include the INTEL_PMT_DISCOVERY driver in order to use resctrl, then I can
> look at providing stubs for the entry points in intel_aet.c and
> create a new CONFIG option to allow resctrl to be built without
> Intel telemetry support.

I do not think resctrl should enforce dependency on a driver that is not
valid for a platform.

Reinette



* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-07-03 20:31         ` Reinette Chatre
@ 2025-07-03 21:11           ` Luck, Tony
  2025-07-03 22:00             ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-03 21:11 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Thu, Jul 03, 2025 at 01:31:19PM -0700, Reinette Chatre wrote:
> I do not think resctrl should enforce dependency on a driver that is not
> valid for a platform.

Fewer stubs than I thought.  I can merge something along these
lines back into the series for the next version.

Suggestions welcome for the name of the config option. Do
I need a "_CPU" in CONFIG_X86_RESCTRL_INTEL_AET? It's already
very long.

"help" text is a placeholder. I can change that up to add more
details.

-Tony

---

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 11f25c225837..56615b1d3fc3 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -175,9 +175,19 @@ bool rdt_is_software_feature_enabled(char *option);
 
 bool rdt_is_software_feature_force_enabled(char *name);
 
+#ifdef CONFIG_X86_RESCTRL_INTEL_AET
 bool intel_aet_get_events(void);
 void __exit intel_aet_exit(void);
 int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
 			 void *arch_priv, u64 *val);
+#else
+static inline bool intel_aet_get_events(void) { return false; }
+static inline void __exit intel_aet_exit(void) { }
+static inline int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
+				       void *arch_priv, u64 *val)
+{
+	return -EINVAL;
+}
+#endif
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a6b6ecbd3877..ceb3eb371a3d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -512,9 +512,6 @@ config X86_CPU_RESCTRL
 	select ARCH_HAS_CPU_RESCTRL
 	select RESCTRL_FS
 	select RESCTRL_FS_PSEUDO_LOCK
-	select X86_PLATFORM_DEVICES
-	select INTEL_VSEC
-	select INTEL_PMT_TELEMETRY
 	help
 	  Enable x86 CPU resource control support.
 
@@ -531,6 +528,18 @@ config X86_CPU_RESCTRL
 
 	  Say N if unsure.
 
+config X86_RESCTRL_INTEL_AET
+	bool "Intel Application Energy Telemetry"
+	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_DISCOVERY
+	help
+	  Enable per-RMID telemetry events in resctrl
+
+	  Intel feature that collects per-RMID execution data
+	  including core energy consumed by tasks. Data is aggregated
+	  per package.
+
+	  Say N if unsure.
+
 config X86_FRED
 	bool "Flexible Return and Event Delivery"
 	depends on X86_64
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index 97ceb4e44dfa..26fc957fb3dd 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
-obj-$(CONFIG_X86_CPU_RESCTRL)		+= intel_aet.o
+obj-$(CONFIG_X86_RESCTRL_INTEL_AET)	+= intel_aet.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
 
 # To allow define_trace.h's recursive include:
-- 
2.50.0
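The stub pattern in this patch is a common kernel idiom: the header offers real prototypes when the option is enabled and static inline stubs otherwise, so callers need no #ifdef of their own. A reduced user-space sketch, where SKETCH_HAVE_INTEL_AET stands in for CONFIG_X86_RESCTRL_INTEL_AET and all function names are hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

#ifdef SKETCH_HAVE_INTEL_AET
/* Real implementations live in a separate file built only when enabled. */
bool aet_get_events(void);
int aet_read_event(int rmid, unsigned long long *val);
#else
/* Stubs: common code compiles unchanged and sees no telemetry events. */
static inline bool aet_get_events(void) { return false; }
static inline int aet_read_event(int rmid, unsigned long long *val)
{
	(void)rmid;
	(void)val;
	return -EINVAL;
}
#endif

/* Common code is identical whether or not the option is enabled. */
static int read_telemetry(int rmid, unsigned long long *val)
{
	if (!aet_get_events())
		return -ENODEV;
	return aet_read_event(rmid, val);
}
```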



* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-07-03 21:11           ` Luck, Tony
@ 2025-07-03 22:00             ` Reinette Chatre
  2025-07-03 23:29               ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-03 22:00 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/3/25 2:11 PM, Luck, Tony wrote:
> On Thu, Jul 03, 2025 at 01:31:19PM -0700, Reinette Chatre wrote:
>> I do not think resctrl should enforce dependency on a driver that is not
>> valid for a platform.
> 
> Fewer stubs than I thought.  I can merge something along these
> lines back into the series for the next version.
> 
> Suggestions welcome for the name of the config option. Do
> I need a "_CPU" in CONFIG_X86_RESCTRL_INTEL_AET? It's already
> very long.

Looking at other config options in the same file, it does not seem
as though this new name is exceedingly long. I'd vote for keeping the
"_CPU" with the motivation that doing so maintains a "namespace" prefix
for the resctrl options.

> 
> "help" text is a placeholder. I can change that up to add more
> details.
> 
> -Tony
> 
> ---
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 11f25c225837..56615b1d3fc3 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -175,9 +175,19 @@ bool rdt_is_software_feature_enabled(char *option);
>  
>  bool rdt_is_software_feature_force_enabled(char *name);
>  
> +#ifdef CONFIG_X86_RESCTRL_INTEL_AET
>  bool intel_aet_get_events(void);
>  void __exit intel_aet_exit(void);
>  int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
>  			 void *arch_priv, u64 *val);
> +#else
> +static inline bool intel_aet_get_events(void) { return false; }
> +static inline void __exit intel_aet_exit(void) { }
> +static inline int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
> +				       void *arch_priv, u64 *val)
> +{
> +	return -EINVAL;
> +}
> +#endif
>  
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a6b6ecbd3877..ceb3eb371a3d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -512,9 +512,6 @@ config X86_CPU_RESCTRL
>  	select ARCH_HAS_CPU_RESCTRL
>  	select RESCTRL_FS
>  	select RESCTRL_FS_PSEUDO_LOCK
> -	select X86_PLATFORM_DEVICES
> -	select INTEL_VSEC
> -	select INTEL_PMT_TELEMETRY
>  	help
>  	  Enable x86 CPU resource control support.
>  
> @@ -531,6 +528,18 @@ config X86_CPU_RESCTRL
>  
>  	  Say N if unsure.
>  
> +config X86_RESCTRL_INTEL_AET
> +	bool "Intel Application Energy Telemetry"
> +	depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_DISCOVERY

Thank you. This pattern looks more appropriate to me. Do you expect that
the X86_64 dependency (added in patch #22) will move here also?

> +	help
> +	  Enable per-RMID telemetry events in resctrl
> +
> +	  Intel feature that collects per-RMID execution data
> +	  including core energy consumed by tasks. Data is aggregated
> +	  per package.
> +
> +	  Say N if unsure.
> +
>  config X86_FRED
>  	bool "Flexible Return and Event Delivery"
>  	depends on X86_64
> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
> index 97ceb4e44dfa..26fc957fb3dd 100644
> --- a/arch/x86/kernel/cpu/resctrl/Makefile
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -1,7 +1,7 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
>  obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
> -obj-$(CONFIG_X86_CPU_RESCTRL)		+= intel_aet.o
> +obj-$(CONFIG_X86_RESCTRL_INTEL_AET)	+= intel_aet.o
>  obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
>  
>  # To allow define_trace.h's recursive include:

Reinette



* RE: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-07-03 22:00             ` Reinette Chatre
@ 2025-07-03 23:29               ` Luck, Tony
  0 siblings, 0 replies; 89+ messages in thread
From: Luck, Tony @ 2025-07-03 23:29 UTC (permalink / raw)
  To: Chatre, Reinette
  Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Keshavamurthy, Anil S,
	Chen, Yu C, x86@kernel.org, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> > +config X86_RESCTRL_INTEL_AET
> > +   bool "Intel Application Energy Telemetry"
> > +   depends on X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_DISCOVERY
>
> Thank you. This pattern looks more appropriate to me. Do you expect that
> the X86_64 dependency (added in patch #22) will move here also?

Yes. It will move here. The rest of resctrl is still 32-bit clean (presumably ... I
wonder if anyone regularly builds, runs, and tests resctrl in a 32-bit kernel).

-Tony


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-07-03 17:22   ` Luck, Tony
@ 2025-07-08 19:08     ` Luck, Tony
  2025-07-08 20:49       ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-08 19:08 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Thu, Jul 03, 2025 at 10:22:06AM -0700, Luck, Tony wrote:
> On Thu, Jul 03, 2025 at 09:45:15AM -0700, Reinette Chatre wrote:
> > Hi Tony and Dave,
> > 
> > On 6/26/25 9:49 AM, Tony Luck wrote:
> > >  --- 14 ---
> > > Add mon_evt::is_floating_point set by resctrl file system code to limit
> > > which events architecture code can request be displayed in floating point.
> > > 
> > > Simplified the fixed-point to floating point algorithm. Reinette is
> > > correct that the additional "lshift" and "rshift" operations are not
> > > required. All that is needed is to multiply the fixed point fractional
> > > part by 10**decimal_places, add a rounding amount equivalent to a "1"
> > > in the binary place after those supplied. Finally divide by 2**binary_places
> > > (with a right shift).
> > > 
> > > Explained in commit comment how I chose the number of decimal places to
> > > use for each binary places value.
> > > 
> > > N.B. Dave Martin expressed an opinion that the kernel should not do
> > > this conversion. Instead it should enumerate the scaling factor for
> > > each event where hardware reported a fixed point value. This patch
> > > could be dropped and replaced with one to enumerate scaling factors
> > > per event if others agree with Dave.
> > 
> > Could resctrl accommodate both usages? For example, it does not
> > look too invasive to add a second file <mon_evt::name>.raw for the
> > mon_evt::is_floating_point events that can output something like Dave
> > suggested in [1]:
> > 
> > .raw file format could be:
> > 	#format:<output that depends on format>
> > 	#fixed-point:<value>/<scaling factor>
> > 
> > Example output:
> > 	fixed-point:0x60000/0x40000
> 
> Dave: Is that what you want in the ".raw" file? An alternative would be
> to put the format information for non-integer events into an
> "info" file ("info/{RESOURCE_NAME}_MON/monfeatures.raw.formats"?)
> and just put the raw value into the ".raw" file under mon_data.

Note that I thought it would be easier for users if the raw file just
showed a value, rather than also including the formatting details as in
Reinette's proposal.

Patch to implement my alternative suggestion below. To the user things
look like this:

$ cd /sys/fs/resctrl/mon_data/mon_PERF_PKG_01
$ cat core_energy
0.02203
$ cat core_energy.raw
5775
$ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
core_energy 262144
activity 262144
$ bc -ql
5775 / 262144
.02202987670898437500

If this seems useful I can write up a commit message and include
as its own patch in v7. Suggestions for better names?

-Tony

---

diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
index 4704ea7228ca..5ac4e3c98f23 100644
--- a/fs/resctrl/internal.h
+++ b/fs/resctrl/internal.h
@@ -90,6 +90,8 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
  *                   the event file belongs. When @sum is one this
  *                   is the id of the L3 cache that all domains to be
  *                   summed share.
+ * @raw:             Set for ".raw" files that directly show hardware
+ *                   provided counts with no interpretation.
  *
  * Pointed to by the kernfs kn->priv field of monitoring event files.
  * Readers and writers must hold rdtgroup_mutex.
@@ -100,6 +102,7 @@ struct mon_data {
 	struct mon_evt		*evt;
 	int			domid;
 	bool			sum;
+	bool			raw;
 };
 
 /**
diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index 29de0e380ccc..78e7af296d5a 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -753,7 +753,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 		seq_puts(m, "Error\n");
 	else if (rr.err == -EINVAL)
 		seq_puts(m, "Unavailable\n");
-	else if (evt->binary_bits == 0)
+	else if (md->raw || evt->binary_bits == 0)
 		seq_printf(m, "%llu\n", rr.val);
 	else
 		print_event_value(m, evt->binary_bits, rr.val);
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 511362a67532..97786831722a 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -1158,6 +1158,21 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 	return 0;
 }
 
+static int rdt_mon_features_raw_scale_show(struct kernfs_open_file *of,
+					   struct seq_file *seq, void *v)
+{
+	struct rdt_resource *r = rdt_kn_parent_priv(of->kn);
+	struct mon_evt *mevt;
+
+	for_each_mon_event(mevt) {
+		if (mevt->rid != r->rid || !mevt->enabled || !mevt->binary_bits)
+			continue;
+		seq_printf(seq, "%s %u\n", mevt->name, 1 << mevt->binary_bits);
+	}
+
+	return 0;
+}
+
 static int rdt_bw_gran_show(struct kernfs_open_file *of,
 			    struct seq_file *seq, void *v)
 {
@@ -1823,6 +1838,13 @@ static struct rftype res_common_files[] = {
 		.seq_show	= rdt_mon_features_show,
 		.fflags		= RFTYPE_MON_INFO,
 	},
+	{
+		.name		= "mon_features_raw_scale",
+		.mode		= 0444,
+		.kf_ops		= &rdtgroup_kf_single_ops,
+		.seq_show	= rdt_mon_features_raw_scale_show,
+		.fflags		= RFTYPE_MON_INFO,
+	},
 	{
 		.name		= "num_rmids",
 		.mode		= 0444,
@@ -2905,7 +2927,7 @@ static void rmdir_all_sub(void)
  */
 static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 					struct mon_evt *mevt,
-					bool do_sum)
+					bool do_sum, bool rawfile)
 {
 	struct mon_data *priv;
 
@@ -2916,7 +2938,8 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 
 	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
 		if (priv->rid == rid && priv->domid == domid &&
-		    priv->sum == do_sum && priv->evt == mevt)
+		    priv->sum == do_sum && priv->evt == mevt &&
+		    priv->raw == rawfile)
 			return priv;
 	}
 
@@ -2928,6 +2951,7 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
 	priv->domid = domid;
 	priv->sum = do_sum;
 	priv->evt = mevt;
+	priv->raw = rawfile;
 	list_add_tail(&priv->list, &mon_data_kn_priv_list);
 
 	return priv;
@@ -3078,12 +3102,13 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 	struct rmid_read rr = {0};
 	struct mon_data *priv;
 	struct mon_evt *mevt;
+	char rawname[64];
 	int ret;
 
 	for_each_mon_event(mevt) {
 		if (mevt->rid != r->rid || !mevt->enabled)
 			continue;
-		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
+		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum, false);
 		if (WARN_ON_ONCE(!priv))
 			return -EINVAL;
 
@@ -3093,6 +3118,18 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
 
 		if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
 			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt, true);
+
+		if (!mevt->binary_bits)
+			continue;
+
+		sprintf(rawname, "%s.raw", mevt->name);
+		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum, true);
+		if (WARN_ON_ONCE(!priv))
+			return -EINVAL;
+
+		ret = mon_addfile(kn, rawname, priv);
+		if (ret)
+			return ret;
 	}
 
 	return 0;

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-07-08 19:08     ` Luck, Tony
@ 2025-07-08 20:49       ` Reinette Chatre
  2025-07-08 22:43         ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:49 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/8/25 12:08 PM, Luck, Tony wrote:
> On Thu, Jul 03, 2025 at 10:22:06AM -0700, Luck, Tony wrote:
>> On Thu, Jul 03, 2025 at 09:45:15AM -0700, Reinette Chatre wrote:
>>> Hi Tony and Dave,
>>>
>>> On 6/26/25 9:49 AM, Tony Luck wrote:
>>>>  --- 14 ---
>>>> Add mon_evt::is_floating_point set by resctrl file system code to limit
>>>> which events architecture code can request be displayed in floating point.
>>>>
>>>> Simplified the fixed-point to floating point algorithm. Reinette is
>>>> correct that the additional "lshift" and "rshift" operations are not
>>>> required. All that is needed is to multiply the fixed point fractional
>>>> part by 10**decimal_places, add a rounding amount equivalent to a "1"
>>>> in the binary place after those supplied. Finally divide by 2**binary_places
>>>> (with a right shift).
>>>>
>>>> Explained in commit comment how I chose the number of decimal places to
>>>> use for each binary places value.
>>>>
>>>> N.B. Dave Martin expressed an opinion that the kernel should not do
>>>> this conversion. Instead it should enumerate the scaling factor for
>>>> each event where hardware reported a fixed point value. This patch
>>>> could be dropped and replaced with one to enumerate scaling factors
>>>> per event if others agree with Dave.
>>>
>>> Could resctrl accommodate both usages? For example, it does not
>>> look too invasive to add a second file <mon_evt::name>.raw for the
>>> mon_evt::is_floating_point events that can output something like Dave
>>> suggested in [1]:
>>>
>>> .raw file format could be:
>>> 	#format:<output that depends on format>
>>> 	#fixed-point:<value>/<scaling factor>
>>>
>>> Example output:
>>> 	fixed-point:0x60000/0x40000
>>
>> Dave: Is that what you want in the ".raw" file? An alternative would be
>> to put the format information for non-integer events into an
>> "info" file ("info/{RESOURCE_NAME}_MON/monfeatures.raw.formats"?)
>> and just put the raw value into the ".raw" file under mon_data.
> 
> Note that I thought it easier for users to keep the raw file to just
> showing a value, rather than including the formatting details in
> Reinette's proposal.

Could you please elaborate what makes this easier? It is not obvious to me
how it is easier for user to open, parse, and close two files rather than one.
(more below)
> 
> Patch to implement my alternative suggestion below. To the user things
> look like this:
> 
> $ cd /sys/fs/resctrl/mon_data/mon_PERF_PKG_01
> $ cat core_energy
> 0.02203
> $ cat core_energy.raw
> 5775
> $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
> core_energy 262144
> activity 262144
> $ bc -ql
> 5775 / 262144
> .02202987670898437500
> 
> If this seems useful I can write up a commit message and include
> as its own patch in v7. Suggestions for better names?
> 

I expect users to regularly interact with the monitoring files. For example,
"read the core_energy of group x every second". An API like above would require
a contract that the scale value will never change from resctrl mount to
resctrl unmount. I understand that this implementation supports exactly this by
allowing an architecture to only enable an event once, but do you think this is
something that will always be the case? If not then an interface like above will
require user space to open, parse, close two files instead of one on a frequent basis.
This is not ideal if user space wants to read monitoring data of multiple
groups frequently.
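
Under the two-file scheme proposed above, a tool that trusts the scale not to change while resctrl is mounted could read mon_features_raw_scale once. It would then divide each later .raw sample by the cached value. The following is a minimal sketch with a hypothetical helper name, assuming the "<event> <scale>" line format from the example output:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find @event in the text of
 * info/<RESOURCE>_MON/mon_features_raw_scale, which holds one
 * "<event name> <scale>" pair per line. Returns 0 when the event is
 * not listed (integer events have no entry).
 */
static unsigned long long lookup_raw_scale(const char *contents,
					   const char *event)
{
	char name[64];
	unsigned long long scale;
	int consumed;

	assert(contents && event);

	while (sscanf(contents, "%63s %llu%n", name, &scale, &consumed) == 2) {
		if (strcmp(name, event) == 0)
			return scale;
		contents += consumed;	/* skip past this "<name> <scale>" pair */
	}
	return 0;
}
```

Each sample would then be computed as, for example, 5775 / 262144 (about 0.02203 J). This caching is only safe if the scale cannot change between mount and unmount.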

I would also like to keep extensibility in mind. We now know that
unsigned decimal and fixed-point binary need to be supported. I think any
new interface used to communicate formatting information to user space should be done
in a way that can be extended for a new format. That is, for example, why
I used the actual term "fixed-point" in the example. Something like this avoids
needing assumptions that a raw value always implies fixed-point format.
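
To illustrate how a named format helps, here is a sketch of a user-space reader for the single-file proposal. It dispatches on the format name and rejects anything it does not recognize. The sketch is hypothetical: the line syntax follows the example quoted earlier ("fixed-point:0x60000/0x40000"), and the function name is invented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical parser for one "<format>:<payload>" line. Only
 * "fixed-point:<value>/<scaling factor>" is understood; unknown format
 * names are rejected rather than guessed at, so new formats can be
 * added later without being misread.
 */
static int parse_raw_line(const char *line, double *out)
{
	static const char prefix[] = "fixed-point:";
	unsigned long long value, scale;

	assert(line && out);

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return -1;	/* unknown format */

	/* %llx accepts an optional "0x" prefix, as strtoull() does. */
	if (sscanf(line + sizeof(prefix) - 1, "%llx/%llx", &value, &scale) != 2)
		return -1;	/* malformed payload */

	if (scale == 0)
		return -1;

	*out = (double)value / (double)scale;
	return 0;
}
```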

Reinette


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-06-30 22:46   ` Luck, Tony
@ 2025-07-08 20:50     ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:50 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 6/30/25 3:46 PM, Luck, Tony wrote:
> On Mon, Jun 30, 2025 at 10:51:50AM -0700, Reinette Chatre wrote:
>>
>> Tony,
>>
>> On 6/26/25 9:49 AM, Tony Luck wrote:
>>> Background
>>> ----------
>>>
>>> Telemetry features are being implemented in conjunction with the
>>> IA32_PQR_ASSOC.RMID value on each logical CPU. This is used to send
>>> counts for various events to a collector in a nearby OOBMSM device to be
>>> accumulated with counts for each <RMID, event> pair received from other
>>> CPUs. Cores send event counts when the RMID value changes, or after each
>>> 2ms elapsed time.
>>
>> To start a review of this jumbo series and find that the *first* [1]
>> (straight forward) request from previous review has not been addressed is
>> demoralizing. I was hoping that the previous version's discussions would result
>> in review feedback either addressed or discussed (never ignored). I
>> cannot imagine how requesting OOBMSM to be expanded can be invalid though.
>>
>> Reinette
>>
>> [1] https://lore.kernel.org/lkml/b8ddce03-65c0-4420-b30d-e43c54943667@intel.com/
> 
> My profound apologies for blowing it (again). I went through the comments
> to patches multiple times to try and catch all your comments. But somehow
> skipped the cover letter :-( .
> 
> Here's a re-write to address comments, but also to try to provide
> a better story line starting with how the logical processors capture
> the event data, following on with aggregator processing, etc.
> 
> -Tony
> 
> ---
> 
> On Intel systems that support per-RMID telemetry monitoring each logical
> processor keeps a local count for various events. When the IA32_PQR_ASSOC.RMID
> value for the logical processor changes (or when a two millisecond counter
> expires) these event counts are transmitted to an event aggregator on
> the same package as the processor together with the current RMID value. The
> event counters are reset to zero to begin counting again.
> 
> Each aggregator takes the incoming event counts and adds them to
> cumulative counts for each event for each RMID. Note that there can be
> multiple aggregators on each package with no architectural association
> between logical processors and an aggregator.
> 
> All of these aggregated counters can be read by an operating system from
> the MMIO space of the Out Of Band Management Service Module (OOBMSM)
> device(s) on a system. Any counter can be read from any logical processor.
> 
> Intel publishes details for each processor generation showing which
> events are counted by each logical processor and the offsets for each
> accumulated counter value within the MMIO space in XML files here:
> https://github.com/intel/Intel-PMT.
> 
> For example there are two energy related telemetry events for the Clearwater
> Forest family of processors and the MMIO space looks like this:
> 
> Offset	RMID	Event
> ------	----	-----
> 0x0000	0	core_energy
> 0x0008	0	activity
> 0x0010	1	core_energy
> 0x0018	1	activity
> ...
> 0x23F0	575	core_energy
> 0x23F8	575	activity
> 
> In addition the XML file provides the units (Joules for core_energy,
> Farads for activity) and the type of data (fixed-point binary with
> bit 63 used as to indicate the data is valid, and the low 18 bits as a

"bit 63 used as to indicate" -> "bit 63 used to indicate"?

> binary fraction).
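
Combining the layout table with the data format, locating and decoding one counter might look like the sketch below. It assumes exactly the two-events-per-RMID layout listed above (8-byte counters, core_energy first), bit 63 as the valid flag, and 18 fractional bits. The names are illustrative, and other guids will use different layouts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for the guid in the table above. */
#define EVENTS_PER_RMID	2		/* core_energy, activity */
#define COUNTER_BYTES	8
#define VALID_BIT	(1ULL << 63)
#define FRAC_BITS	18

/* Byte offset of counter @event_idx (0 = core_energy, 1 = activity). */
static uint64_t counter_offset(unsigned int rmid, unsigned int event_idx)
{
	assert(event_idx < EVENTS_PER_RMID);
	return ((uint64_t)rmid * EVENTS_PER_RMID + event_idx) * COUNTER_BYTES;
}

/*
 * Strip the valid flag from a raw counter. Returns false when bit 63 is
 * clear; otherwise *fixed holds the value with FRAC_BITS fractional bits.
 */
static bool decode_counter(uint64_t raw, uint64_t *fixed)
{
	if (!(raw & VALID_BIT))
		return false;
	*fixed = raw & ~VALID_BIT;
	return true;
}
```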
> 
> Finally, each XML file provides a 32-bit unique id (or guid) that is
> used as an index to find the correct XML description file for each
> telemetry implementation.
> 
> The INTEL_PMT_DISCOVERY driver provides intel_pmt_get_regions_by_feature()
> to enumerate the aggregator instances on a platform. It provides:

I think it will be helpful to prime the connection between "aggregator"
and "telemetry region" here. For example,

"to enumerate the aggregator instances on a platform" -> "to enumerate
the aggregator instances (also referred to as "telemetry regions" in this series)
on a platform"

> 1) guid  - so resctrl can determine which events are supported
> 2) mmio base address of counters

mmio -> MMIO

> 3) package id
> 
> Resctrl accumulates counts from all aggregators on a package in order
> to provide a consistent user interface across processor generations.
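
Per-package accumulation can be pictured as follows. This is only a sketch. struct telem_region and sum_package() are hypothetical, and the counters live in ordinary memory (indexed by byte offset / 8) instead of MMIO. Treating any invalid contributor as a failed read is an assumed policy, not necessarily what the series does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VALID_BIT	(1ULL << 63)

/* Hypothetical stand-in for one aggregator ("telemetry region"). */
struct telem_region {
	int		package_id;
	const uint64_t	*counters;	/* counter i is at byte offset i * 8 */
};

/*
 * Sum counter @index across every region on @package_id. Returns false
 * if no region matched or any contributing counter lacked the valid bit.
 */
static bool sum_package(const struct telem_region *regions, size_t nr,
			int package_id, size_t index, uint64_t *sum)
{
	bool found = false;

	assert(sum);
	*sum = 0;
	for (size_t i = 0; i < nr; i++) {
		uint64_t raw;

		if (regions[i].package_id != package_id)
			continue;
		raw = regions[i].counters[index];
		if (!(raw & VALID_BIT))
			return false;
		*sum += raw & ~VALID_BIT;
		found = true;
	}
	return found;
}
```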
> 
> Directory structure for the telemetry events looks like this:
> 
> $ tree /sys/fs/resctrl/mon_data/
> /sys/fs/resctrl/mon_data/
> mon_data
> ├── mon_PERF_PKG_00
> │   ├── activity
> │   └── core_energy
> └── mon_PERF_PKG_01
>     ├── activity
>     └── core_energy
> 
> Reading the "core_energy" file from some resctrl mon_data directory shows
> the cumulative energy (in Joules) used by all tasks that ran with the RMID
> associated with that directory on a given package. Note that "core_energy"
> reports only energy consumed by CPU cores (data processing units,
> L1/L2 caches, etc.). It does not include energy used in the "uncore"
> (L3 cache, on package devices, etc.), or used by memory or I/O devices.
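
Because core_energy is cumulative, a consumer would typically sample it twice and divide the difference by the elapsed time to get average power. The following is a sketch. It assumes the decimal text format shown in this thread and the "Error"/"Unavailable" strings printed by rdtgroup_mondata_show(). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse the text of a "core_energy" file. Returns 0 on success, -1 for
 * "Error", "Unavailable" or anything else that is not a plain number.
 */
static int parse_energy(const char *text, double *joules)
{
	char *end;

	*joules = strtod(text, &end);
	if (end == text || (*end != '\n' && *end != '\0'))
		return -1;
	return 0;
}

/* Average power in watts between two cumulative energy samples. */
static double average_watts(double j_start, double j_end, double seconds)
{
	assert(seconds > 0.0);
	return (j_end - j_start) / seconds;
}
```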

Thank you very much for this rework. I found this much easier to follow.

Reinette


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions
  2025-06-26 16:49 ` [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
  2025-06-27 21:55   ` Fenghua Yu
@ 2025-07-08 20:52   ` Reinette Chatre
  1 sibling, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:52 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> There are currently only three monitor events, all associated with
> the RDT_RESOURCE_L3 resource. Growing support for additional events
> will be easier with some restructuring to have a single point in
> file system code where all attributes of all events are defined.
> 
> Place all event descriptions into an array mon_event_all[]. Doing
> this has the beneficial side effect of removing the need for
> rdt_resource::evt_list.
> 
> Add resctrl_event_id::QOS_FIRST_EVENT for a lower bound on range
> checks for event ids and as the starting index to scan mon_event_all[].
> 
> Drop the code that builds evt_list and change the two places where
> the list is scanned to scan mon_event_all[] instead using a new
> helper macro for_each_mon_event().
> 
> Architecture code now informs file system code which events are
> available with resctrl_enable_mon_event().
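
The arrangement described above can be reduced to a simple model: event IDs index a single array, and architecture code sets an enabled flag. The following is a sketch of that model. The struct fields and the macro body are assumptions made for illustration and may not match the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum resctrl_event_id {
	/* Must match the value of the first real event below. */
	QOS_FIRST_EVENT			= 0x01,
	QOS_L3_OCCUP_EVENT_ID		= 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
	QOS_NUM_EVENTS,
};

struct mon_evt {
	enum resctrl_event_id	evtid;
	const char		*name;
	bool			enabled;
};

/* Index 0 is unused because event IDs start at QOS_FIRST_EVENT. */
static struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
	[QOS_L3_OCCUP_EVENT_ID]	    = { QOS_L3_OCCUP_EVENT_ID, "llc_occupancy" },
	[QOS_L3_MBM_TOTAL_EVENT_ID] = { QOS_L3_MBM_TOTAL_EVENT_ID, "mbm_total_bytes" },
	[QOS_L3_MBM_LOCAL_EVENT_ID] = { QOS_L3_MBM_LOCAL_EVENT_ID, "mbm_local_bytes" },
};

/* Assumed shape of the iterator: walk every slot from the first event. */
#define for_each_mon_event(mevt)				\
	for (mevt = &mon_event_all[QOS_FIRST_EVENT];		\
	     mevt < &mon_event_all[QOS_NUM_EVENTS]; mevt++)

/* Architecture code reports which events the hardware supports. */
static void resctrl_enable_mon_event(enum resctrl_event_id eventid)
{
	assert(eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS);
	mon_event_all[eventid].enabled = true;
}
```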
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h            |  4 +-
>  include/linux/resctrl_types.h      | 12 ++++--
>  fs/resctrl/internal.h              | 13 ++++--
>  arch/x86/kernel/cpu/resctrl/core.c | 12 ++++--
>  fs/resctrl/monitor.c               | 63 +++++++++++++++---------------
>  fs/resctrl/rdtgroup.c              | 11 +++---
>  6 files changed, 66 insertions(+), 49 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 6fb4894b8cfd..2944042bd84c 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -269,7 +269,6 @@ enum resctrl_schema_fmt {
>   * @mon_domains:	RCU list of all monitor domains for this resource
>   * @name:		Name to use in "schemata" file.
>   * @schema_fmt:		Which format string and parser is used for this schema.
> - * @evt_list:		List of monitoring events
>   * @mbm_cfg_mask:	Bandwidth sources that can be tracked when bandwidth
>   *			monitoring events can be configured.
>   * @cdp_capable:	Is the CDP feature available on this resource
> @@ -287,7 +286,6 @@ struct rdt_resource {
>  	struct list_head	mon_domains;
>  	char			*name;
>  	enum resctrl_schema_fmt	schema_fmt;
> -	struct list_head	evt_list;
>  	unsigned int		mbm_cfg_mask;
>  	bool			cdp_capable;
>  };
> @@ -372,6 +370,8 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>  u32 resctrl_arch_system_num_rmid_idx(void);
>  int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>  
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid);
> +
>  bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
>  
>  /**
> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> index a25fb9c4070d..2dadbc54e4b3 100644
> --- a/include/linux/resctrl_types.h
> +++ b/include/linux/resctrl_types.h
> @@ -34,11 +34,15 @@
>  /* Max event bits supported */
>  #define MAX_EVT_CONFIG_BITS		GENMASK(6, 0)
>  
> -/*
> - * Event IDs, the values match those used to program IA32_QM_EVTSEL before
> - * reading IA32_QM_CTR on RDT systems.
> - */
> +/* Event IDs */
>  enum resctrl_event_id {
> +	/* Must match value of first event below */
> +	QOS_FIRST_EVENT			= 0x01,
> +
> +	/*
> +	 * These values match those used to program IA32_QM_EVTSEL before
> +	 * reading IA32_QM_CTR on RDT systems.
> +	 */
>  	QOS_L3_OCCUP_EVENT_ID		= 0x01,
>  	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
>  	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 0a1eedba2b03..445a41060724 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -52,19 +52,26 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>  }
>  
>  /**
> - * struct mon_evt - Entry in the event list of a resource
> + * struct mon_evt - Properties of a monitor event
>   * @evtid:		event id
> + * @rid:		index of the resource for this event

x86 uses @rid as an index but this is not something that resctrl
fs enforces (please correct me if I am wrong). To prevent such assumption
this can just be "resource id for this event" or "ID of the resource 
associated with this event" or ?.

Patch looks good otherwise.
| Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks
  2025-06-26 16:49 ` [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks Tony Luck
  2025-06-27 22:15   ` Fenghua Yu
@ 2025-07-08 20:52   ` Reinette Chatre
  1 sibling, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:52 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> The resctrl file system now has complete knowledge of the status
> of every event. So there is no need for per-event function calls
> to check.
> 
> Replace each of the resctrl_arch_is_{event}enabled() calls with
> resctrl_is_mon_event_enabled(QOS_{EVENT}).
> 
> No functional change.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 03/30] x86/resctrl: Remove 'rdt_mon_features' global variable
  2025-06-26 16:49 ` [PATCH v6 03/30] x86/resctrl: Remove 'rdt_mon_features' global variable Tony Luck
@ 2025-07-08 20:53   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:53 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> rdt_mon_features is used as a bitmask of enabled monitor events. A monitor
> event's status is now maintained in mon_evt::enabled with all monitor
> events' mon_evt structures found in the filesystem's mon_event_all[] array.
> 
> Remove the remaining uses of rdt_mon_features.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 04/30] x86,fs/resctrl: Prepare for more monitor events
  2025-06-26 16:49 ` [PATCH v6 04/30] x86,fs/resctrl: Prepare for more monitor events Tony Luck
@ 2025-07-08 20:55   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:55 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> There's a rule in computer programming that objects appear zero,
> once, or many times. So code accordingly.
> 
> There are two MBM events and resctrl is coded with a lot of
> 
>         if (local)
>                 do one thing
>         if (total)
>                 do a different thing
> 
> Change the rdt_mon_domain and rdt_hw_mon_domain structures to hold arrays
> of pointers to per event data instead of explicit fields for total and
> local bandwidth.
> 
> Simplify by coding for many events using loops on which are enabled.
> 
> Move resctrl_is_mbm_event() to <linux/resctrl.h> so it can be used more
> widely. Also provide a for_each_mbm_event_id() helper macro.
> 
> Cleanup variable names in functions touched to consistently use
> "eventid" for those with type enum resctrl_event_id.
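
The "many times" structure might look like the sketch below. The array size, the index macro and the body of for_each_mbm_event_id() are guesses for illustration. struct mon_domain_sketch is a hypothetical stand-in for rdt_mon_domain, and a NULL entry models an MBM event that is not enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum resctrl_event_id {
	QOS_L3_OCCUP_EVENT_ID		= 0x01,
	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
};

#define MBM_FIRST_EVENT_ID	QOS_L3_MBM_TOTAL_EVENT_ID
#define MBM_NUM_EVENTS		2
#define MBM_STATE_IDX(evt)	((evt) - MBM_FIRST_EVENT_ID)

#define for_each_mbm_event_id(evt)			\
	for (evt = QOS_L3_MBM_TOTAL_EVENT_ID;		\
	     evt <= QOS_L3_MBM_LOCAL_EVENT_ID; evt++)

struct mbm_state {
	uint64_t	prev_bw_bytes;
};

/* One pointer per MBM event instead of separate "total" and "local" fields. */
struct mon_domain_sketch {
	struct mbm_state	*mbm_states[MBM_NUM_EVENTS];
};

/* Reset every allocated MBM state array with one loop, no per-event ifs. */
static void reset_all_mbm(struct mon_domain_sketch *d, uint32_t num_rmids)
{
	enum resctrl_event_id evt;

	assert(d);
	for_each_mbm_event_id(evt) {
		struct mbm_state *s = d->mbm_states[MBM_STATE_IDX(evt)];

		if (s)
			memset(s, 0, sizeof(*s) * num_rmids);
	}
}
```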
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 06/30] x86/resctrl: Move L3 initialization out of domain_add_cpu_mon()
  2025-06-26 16:49 ` [PATCH v6 06/30] x86/resctrl: Move L3 initialization out of domain_add_cpu_mon() Tony Luck
@ 2025-07-08 20:56   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:56 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

I do not think the subject is accurate since this patch does not
actually move L3 initialization out of domain_add_cpu_mon().
L3 initialization is still done in domain_add_cpu_mon() but
instead of being open coded it is done in a function called
by domain_add_cpu_mon(). Perhaps something like:
"x86/resctrl: Move L3 initialization into function"?


On 6/26/25 9:49 AM, Tony Luck wrote:
> To prepare for additional types of monitoring domains, move all the L3
> resource monitoring domain initialization out of domain_add_cpu_mon()
> and into a new helper function l3_mon_domain_setup() (name chosen
> as the partner of existing l3_mon_domain_free()).

Similar comment as with subject. Also please note that l3_mon_domain_free()
does not exist at this point in the series. How about something like the following
(please feel free to improve):

	To prepare for additional types of monitoring domains, move open coded
	L3 resource monitoring domain initialization from domain_add_cpu_mon()
	into a new helper function l3_mon_domain_setup() called by
	domain_add_cpu_mon().

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Patch looks good to me.

Reinette

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 07/30] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types
  2025-06-26 16:49 ` [PATCH v6 07/30] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
@ 2025-07-08 20:57   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 20:57 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Historically all monitoring events have been associated with the L3
> resource. This will change when support for telemetry events is added.
> 
> The RDT_RESOURCE_L3 resource carries a lot of state in the domain
> structures which needs to be dealt with when a domain is taken offline
> by removing the last CPU in the domain.
> 
> Refactor domain_remove_cpu_mon() so all the L3 processing is separated
> from general actions of clearing the CPU bit in the mask and removing
> directories from mon_data.
> 
> resctrl_offline_mon_domain() will still need to remove domain specific

"resctrl_offline_mon_domain() will still need to" -> "resctrl_offline_mon_domain()
needs to"?

> directories and files from the "mon_data" directories, but can skip the
> L3 resource specific cleanup when called for other resource types.
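
The split described above amounts to a guard like the one in this sketch. The function names, the RDT_RESOURCE_PERF_PKG enumerator and the counting struct are hypothetical. They only record which cleanup steps ran.

```c
#include <assert.h>

/* RDT_RESOURCE_PERF_PKG is a hypothetical name for a non-L3 resource. */
enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_PERF_PKG };

/* Records which cleanup steps ran, standing in for the real work. */
struct offline_log {
	int	dirs_removed;
	int	l3_cleanups;
};

static void remove_mon_data_dirs(struct offline_log *log)
{
	log->dirs_removed++;
}

static void l3_mon_domain_cleanup(struct offline_log *log)
{
	log->l3_cleanups++;
}

static void offline_mon_domain(enum resctrl_res_level rid,
			       struct offline_log *log)
{
	assert(log);

	/* Every monitoring resource has mon_data directories to remove. */
	remove_mon_data_dirs(log);

	/* State such as limbo and MBM overflow handling exists only for L3. */
	if (rid != RDT_RESOURCE_L3)
		return;

	l3_mon_domain_cleanup(log);
}
```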
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Assuming [1] is applied, this patch looks good to me.

Reinette

[1] https://lore.kernel.org/lkml/aF3lRKURweT4mhAj@agluck-desk3/



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 05/30] x86,fs/resctrl: Improve domain type checking
  2025-06-26 16:49 ` [PATCH v6 05/30] x86,fs/resctrl: Improve domain type checking Tony Luck
@ 2025-07-08 21:01   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 21:01 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> The rdt_domain_hdr structure is used in both control and monitor
> domain structures to provide common methods for operations such as
> adding a CPU to a domain, removing a CPU from a domain, accessing
> the mask of all CPUs in a domain.
> 
> The "type" field provides a simple check whether a domain is a
> control or monitor domain so that programming errors operating
> on domains will be quickly caught.
> 
> To prepare for additional domain types that depend on the rdt_resource
> to which they are connected add the resource id into the header
> and check that in addition to the type.
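
The following is a user-space sketch of the combined check. The structures are trimmed stand-ins for the kernel ones. The kernel helper presumably also warns on a mismatch, since the ctrlmondata.c hunk replaces a WARN_ON_ONCE() type check with it; this version omits the warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed stand-ins; the kernel enums have more members. */
enum resctrl_domain_type { RESCTRL_CTRL_DOMAIN, RESCTRL_MON_DOMAIN };
enum resctrl_res_level { RDT_RESOURCE_L3, RDT_RESOURCE_L2, RDT_RESOURCE_MBA };

struct rdt_domain_hdr {
	int				id;
	enum resctrl_domain_type	type;
	enum resctrl_res_level		rid;
};

/* Both the domain type and the owning resource must match. */
static bool domain_header_is_valid(const struct rdt_domain_hdr *hdr,
				   enum resctrl_domain_type type,
				   enum resctrl_res_level rid)
{
	assert(hdr);
	return hdr->type == type && hdr->rid == rid;
}
```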
> 
> At this point all monitoring events are tied to the RDT_RESOURCE_L3
> resource. So hard code the check in rdtgroup_mondata_show() to
> that resource id.

Comment about this hardcoding below ...

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h            |  9 +++++++++
>  arch/x86/kernel/cpu/resctrl/core.c | 10 ++++++----
>  fs/resctrl/ctrlmondata.c           |  2 +-
>  3 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 478d7a935ca3..dc7ccd60e8c2 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -131,15 +131,24 @@ enum resctrl_domain_type {
>   * @list:		all instances of this resource
>   * @id:			unique id for this instance
>   * @type:		type of this instance
> + * @rid:		index of resource for this domain

Similar comment as in patch #1 about use of "index" in resctrl fs.

>   * @cpu_mask:		which CPUs share this resource
>   */
>  struct rdt_domain_hdr {
>  	struct list_head		list;
>  	int				id;
>  	enum resctrl_domain_type	type;
> +	enum resctrl_res_level		rid;
>  	struct cpumask			cpu_mask;
>  };
>  

...

> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index ad7ffc6acf13..cdb4bc8baa99 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -643,7 +643,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		 * the resource to find the domain with "domid".
>  		 */
>  		hdr = resctrl_find_domain(&r->mon_domains, domid, NULL);
> -		if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
> +		if (!hdr || !domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3)) {
>  			ret = -ENOENT;
>  			goto out;
>  		}

I do not see why this hardcoding is required since the resource ID is available via the
struct mon_data associated with the file. Looks like this is undone in patch #9 anyway
so the explicit handling by this patch is unclear.

Reinette

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 09/30] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain
  2025-06-26 16:49 ` [PATCH v6 09/30] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
@ 2025-07-08 21:04   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 21:04 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Historically all monitoring events have been associated with the L3
> resource and it made sense to use "struct rdt_mon_domain *" arguments

How about:
"it made sense to use" -> "it made sense to use the L3 specific"

> to functions manipulating domains. But the addition of monitor events
> tied to other resources changes this assumption.
> 
> Change calling sequence for domain addition and deletion. Also for
> reading events. This includes the smp_call*() IPI where the rmid_read
> now holds a pointer to struct rdt_domain_hdr.

The above notes which parts of the code are changed, but lacks a description of
what the change involves. Please describe what is changed and why.

> 
> The mon_data structure is unchanged, but documentation is updated
> to not that mon_data::sum is only used for RDT_RESOURCE_L3.

"to not" -> "to note"?

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

...

> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 05438e15e2ca..3828480e0426 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2887,7 +2887,8 @@ static void rmdir_all_sub(void)
>   * @rid:    The resource id for the event file being created.
>   * @domid:  The domain id for the event file being created.
>   * @mevt:   The type of event file being created.
> - * @do_sum: Whether SNC summing monitors are being created.
> + * @do_sum: Whether SNC summing monitors are being created. Only set
> + *          when @rid == RDT_RESOURCE_L3.
>   */
>  static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
>  					struct mon_evt *mevt,
> @@ -2897,6 +2898,9 @@ static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
>  
>  	lockdep_assert_held(&rdtgroup_mutex);
>  
> +	if (WARN_ON_ONCE(do_sum && rid != RDT_RESOURCE_L3))
> +		return NULL;
> +
>  	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
>  		if (priv->rid == rid && priv->domid == domid &&
>  		    priv->sum == do_sum && priv->evtid == mevt->evtid)
> @@ -3024,17 +3028,27 @@ static void mon_rmdir_one_subdir(struct kernfs_node *pkn, char *name, char *subn
>   * when last domain being summed is removed.
>   */
>  static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> -					   struct rdt_mon_domain *d)
> +					   struct rdt_domain_hdr *hdr)
>  {
>  	struct rdtgroup *prgrp, *crgrp;
> +	int domid = hdr->id;
>  	char subname[32];
> -	bool snc_mode;
>  	char name[32];
>  
> -	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
> -	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
> -	if (snc_mode)
> -		sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);
> +	if (r->rid == RDT_RESOURCE_L3) {
> +		struct rdt_mon_domain *d;
> +
> +		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
> +			return;
> +		d = container_of(hdr, struct rdt_mon_domain, hdr);
> +
> +		/* SNC mode? */
> +		if (r->mon_scope == RESCTRL_L3_NODE) {
> +			domid = d->ci_id;
> +			sprintf(subname, "mon_sub_%s_%02d", r->name, d->hdr.id);

nit: "d->hdr.id" -> "hdr->id"?

> +		}
> +	}
> +	sprintf(name, "mon_%s_%02d", r->name, domid);
>  
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>  		mon_rmdir_one_subdir(prgrp->mon.mon_data_kn, name, subname);
> @@ -3044,19 +3058,18 @@ static void rmdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
>  	}
>  }
>  
> -static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
> +static int mon_add_all_files(struct kernfs_node *kn, struct rdt_domain_hdr *hdr,
>  			     struct rdt_resource *r, struct rdtgroup *prgrp,
> -			     bool do_sum)
> +			     int domid, bool do_sum)
>  {
>  	struct rmid_read rr = {0};
>  	struct mon_data *priv;
>  	struct mon_evt *mevt;
> -	int ret, domid;
> +	int ret;
>  
>  	for_each_mon_event(mevt) {
>  		if (mevt->rid != r->rid || !mevt->enabled)
>  			continue;
> -		domid = do_sum ? d->ci_id : d->hdr.id;
>  		priv = mon_get_kn_priv(r->rid, domid, mevt, do_sum);
>  		if (WARN_ON_ONCE(!priv))
>  			return -EINVAL;
> @@ -3065,26 +3078,38 @@ static int mon_add_all_files(struct kernfs_node *kn, struct rdt_mon_domain *d,
>  		if (ret)
>  			return ret;
>  
> -		if (!do_sum && resctrl_is_mbm_event(mevt->evtid))
> -			mon_event_read(&rr, r, d, prgrp, &d->hdr.cpu_mask, mevt->evtid, true);
> +		if (r->rid == RDT_RESOURCE_L3 && !do_sum && resctrl_is_mbm_event(mevt->evtid))
> +			mon_event_read(&rr, r, hdr, prgrp, &hdr->cpu_mask, mevt->evtid, true);
>  	}
>  
>  	return 0;
>  }
>  
>  static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
> -				struct rdt_mon_domain *d,
> +				struct rdt_domain_hdr *hdr,
>  				struct rdt_resource *r, struct rdtgroup *prgrp)
>  {
>  	struct kernfs_node *kn, *ckn;
> +	int domid = hdr->id;
> +	bool snc_mode = 0;

bool snc_mode = false;

>  	char name[32];
> -	bool snc_mode;
>  	int ret = 0;
>  
>  	lockdep_assert_held(&rdtgroup_mutex);
>  
> -	snc_mode = r->mon_scope == RESCTRL_L3_NODE;
> -	sprintf(name, "mon_%s_%02d", r->name, snc_mode ? d->ci_id : d->hdr.id);
> +	if (r->rid == RDT_RESOURCE_L3) {
> +		if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
> +			return -EINVAL;
> +		snc_mode = r->mon_scope == RESCTRL_L3_NODE;
> +		if (snc_mode) {
> +			struct rdt_mon_domain *d;
> +
> +			d = container_of(hdr, struct rdt_mon_domain, hdr);
> +			domid = d->ci_id;
> +		}
> +	}
> +	sprintf(name, "mon_%s_%02d", r->name, domid);
> +
>  	kn = kernfs_find_and_get(parent_kn, name);
>  	if (kn) {
>  		/*
> @@ -3100,13 +3125,13 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>  		ret = rdtgroup_kn_set_ugid(kn);
>  		if (ret)
>  			goto out_destroy;
> -		ret = mon_add_all_files(kn, d, r, prgrp, snc_mode);
> +		ret = mon_add_all_files(kn, hdr, r, prgrp, domid, snc_mode);
>  		if (ret)
>  			goto out_destroy;
>  	}
>  
>  	if (snc_mode) {
> -		sprintf(name, "mon_sub_%s_%02d", r->name, d->hdr.id);
> +		sprintf(name, "mon_sub_%s_%02d", r->name, hdr->id);
>  		ckn = kernfs_create_dir(kn, name, parent_kn->mode, prgrp);
>  		if (IS_ERR(ckn)) {
>  			ret = -EINVAL;
> @@ -3117,7 +3142,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>  		if (ret)
>  			goto out_destroy;
>  
> -		ret = mon_add_all_files(ckn, d, r, prgrp, false);
> +		ret = mon_add_all_files(ckn, hdr, r, prgrp, hdr->id, false);
>  		if (ret)
>  			goto out_destroy;
>  	}
> @@ -3135,7 +3160,7 @@ static int mkdir_mondata_subdir(struct kernfs_node *parent_kn,
>   * and "monitor" groups with given domain id.
>   */
>  static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
> -					   struct rdt_mon_domain *d)
> +					   struct rdt_domain_hdr *hdr)
>  {
>  	struct kernfs_node *parent_kn;
>  	struct rdtgroup *prgrp, *crgrp;
> @@ -3143,12 +3168,12 @@ static void mkdir_mondata_subdir_allrdtgrp(struct rdt_resource *r,
>  
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
>  		parent_kn = prgrp->mon.mon_data_kn;
> -		mkdir_mondata_subdir(parent_kn, d, r, prgrp);
> +		mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
>  
>  		head = &prgrp->mon.crdtgrp_list;
>  		list_for_each_entry(crgrp, head, mon.crdtgrp_list) {
>  			parent_kn = crgrp->mon.mon_data_kn;
> -			mkdir_mondata_subdir(parent_kn, d, r, crgrp);
> +			mkdir_mondata_subdir(parent_kn, hdr, r, crgrp);
>  		}
>  	}
>  }
> @@ -3157,14 +3182,14 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
>  				       struct rdt_resource *r,
>  				       struct rdtgroup *prgrp)
>  {
> -	struct rdt_mon_domain *dom;
> +	struct rdt_domain_hdr *hdr;
>  	int ret;
>  
>  	/* Walking r->domains, ensure it can't race with cpuhp */
>  	lockdep_assert_cpus_held();
>  
> -	list_for_each_entry(dom, &r->mon_domains, hdr.list) {
> -		ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp);
> +	list_for_each_entry(hdr, &r->mon_domains, list) {
> +		ret = mkdir_mondata_subdir(parent_kn, hdr, r, prgrp);
>  		if (ret)
>  			return ret;
>  	}
> @@ -4036,8 +4061,10 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
>  	mutex_unlock(&rdtgroup_mutex);
>  }
>  
> -void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
> +void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
>  {
> +	struct rdt_mon_domain *d;
> +
>  	mutex_lock(&rdtgroup_mutex);
>  
>  	/*
> @@ -4045,11 +4072,15 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d
>  	 * per domain monitor data directories.
>  	 */
>  	if (resctrl_mounted && resctrl_arch_mon_capable())
> -		rmdir_mondata_subdir_allrdtgrp(r, d);
> +		rmdir_mondata_subdir_allrdtgrp(r, hdr);
>  
>  	if (r->rid != RDT_RESOURCE_L3)
>  		goto out_unlock;
>  
> +	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
> +		goto out_unlock;
> +
> +	d = container_of(hdr, struct rdt_mon_domain, hdr);
>  	if (resctrl_is_mbm_enabled())
>  		cancel_delayed_work(&d->mbm_over);
>  	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
> @@ -4132,12 +4163,20 @@ int resctrl_online_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain *d
>  	return err;
>  }
>  
> -int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
> +int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
>  {
> -	int err;
> +	struct rdt_mon_domain *d;
> +	int err = -EINVAL;
>  
>  	mutex_lock(&rdtgroup_mutex);
>  
> +	if (r->rid != RDT_RESOURCE_L3)
> +		goto mkdir;
> +
> +	if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, r->rid))

"r->rid" -> "RDT_RESOURCE_L3"

I understand that the check right before this ensures this is the case
but the goal of this check is to keep it with the following
container_of(). Making this change also keeps the code consistent, compare
for example with resctrl_arch_rmid_read().

> +		goto out_unlock;
> +
> +	d = container_of(hdr, struct rdt_mon_domain, hdr);
>  	err = domain_setup_mon_state(r, d);
>  	if (err)
>  		goto out_unlock;
> @@ -4151,6 +4190,8 @@ int resctrl_online_mon_domain(struct rdt_resource *r, struct rdt_mon_domain *d)
>  	if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
>  		INIT_DELAYED_WORK(&d->cqm_limbo, cqm_handle_limbo);
>  
> +mkdir:
> +	err = 0;
>  	/*
>  	 * If the filesystem is not mounted then only the default resource group
>  	 * exists. Creation of its directories is deferred until mount time


Reinette

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 10/30] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain
  2025-06-26 16:49 ` [PATCH v6 10/30] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
@ 2025-07-08 21:06   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 21:06 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Historically all monitoring events have been associated with the L3
> resource. This will change when support for telemetry events is added.
> 
> The structures to track monitor domains at both the file system and

"The structures to track monitor domains" -> "The structures to track
monitor domains of the L3 resource"?
(the goal is to bridge the context, which is about the resource, and the
problem description, which is about domains)

> architecture level have generic names. This may cause confusion when
> support for monitoring events in other resources is added.
> 
> Rename by adding "l3_" into the names:
> rdt_mon_domain		-> rdt_l3_mon_domain
> rdt_hw_mon_domain	-> rdt_hw_l3_mon_domain
> 
> No functional change.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h                | 16 ++++++------
>  arch/x86/kernel/cpu/resctrl/internal.h | 12 ++++-----
>  fs/resctrl/internal.h                  |  8 +++---
>  arch/x86/kernel/cpu/resctrl/core.c     | 14 +++++-----
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 18 ++++++-------
>  fs/resctrl/ctrlmondata.c               |  2 +-
>  fs/resctrl/monitor.c                   | 34 ++++++++++++------------
>  fs/resctrl/rdtgroup.c                  | 36 +++++++++++++-------------
>  8 files changed, 70 insertions(+), 70 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index b332466312e1..01740acebcd1 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -166,7 +166,7 @@ struct rdt_ctrl_domain {
>  };
>  
>  /**
> - * struct rdt_mon_domain - group of CPUs sharing a resctrl monitor resource
> + * struct rdt_l3_mon_domain - group of CPUs sharing a resctrl monitor resource
>   * @hdr:		common header for different domain types
>   * @ci_id:		cache info id for this domain
>   * @rmid_busy_llc:	bitmap of which limbo RMIDs are above threshold
> @@ -178,7 +178,7 @@ struct rdt_ctrl_domain {
>   * @mbm_work_cpu:	worker CPU for MBM h/w counters
>   * @cqm_work_cpu:	worker CPU for CQM h/w counters
>   */
> -struct rdt_mon_domain {
> +struct rdt_l3_mon_domain {
>  	struct rdt_domain_hdr		hdr;
>  	unsigned int			ci_id;
>  	unsigned long			*rmid_busy_llc;
> @@ -334,10 +334,10 @@ struct resctrl_cpu_defaults {
>  };
>  
>  struct resctrl_mon_config_info {
> -	struct rdt_resource	*r;
> -	struct rdt_mon_domain	*d;
> -	u32			evtid;
> -	u32			mon_config;
> +	struct rdt_resource		*r;
> +	struct rdt_l3_mon_domain	*d;
> +	u32				evtid;
> +	u32				mon_config;
>  };
>  
>  /**
> @@ -530,7 +530,7 @@ struct rdt_domain_hdr *resctrl_find_domain(struct list_head *h, int id,
>   *
>   * This can be called from any CPU.
>   */
> -void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
> +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_l3_mon_domain *d,
>  			     u32 closid, u32 rmid,
>  			     enum resctrl_event_id eventid);
>  
> @@ -543,7 +543,7 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
>   *
>   * This can be called from any CPU.
>   */
> -void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_mon_domain *d);
> +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d);
>  
>  /**
>   * resctrl_arch_reset_all_ctrls() - Reset the control for each CLOSID to its
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 58dca892a5df..224b71730cc3 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -51,7 +51,7 @@ struct rdt_hw_ctrl_domain {
>  };
>  
>  /**
> - * struct rdt_hw_mon_domain - Arch private attributes of a set of CPUs that share
> + * struct rdt_hw_l3_mon_domain - Arch private attributes of a set of CPUs that share
>   *			      a resource for a monitor function

Could you please fix this alignment?

>   * @d_resctrl:	Properties exposed to the resctrl file system
>   * @arch_mbm_states:	Per-event pointer to the MBM event's saved state.
> @@ -60,8 +60,8 @@ struct rdt_hw_ctrl_domain {
>   *
>   * Members of this structure are accessed via helpers that provide abstraction.
>   */
> -struct rdt_hw_mon_domain {
> -	struct rdt_mon_domain		d_resctrl;
> +struct rdt_hw_l3_mon_domain {
> +	struct rdt_l3_mon_domain		d_resctrl;
>  	struct arch_mbm_state		*arch_mbm_states[QOS_NUM_L3_MBM_EVENTS];

Also here, please align struct member names.

>  };
>  
Reinette


* Re: [PATCH v6 11/30] x86,fs/resctrl: Rename some L3 specific functions
  2025-06-26 16:49 ` [PATCH v6 11/30] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
@ 2025-07-08 21:08   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 21:08 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> All monitor functions used to be tied to the RDT_RESOURCE_L3 resource,
> so generic function names to setup and tear down domains made sense.

"All monitor functions are tied to the RDT_RESOURCE_L3 resource,
so generic function names to set up and tear down domains make sense."

> But with the arrival of monitor events tied to other domains it would
> be clearer if these functions were more accurately named.

"With the arrival of monitor events tied to new domains associated with
different resources it would be clearer if these functions are more
accurately named."
 
> Two groups of functions renamed here:
> 
> Functions that allocate/free architecture per-RMID mbm state information:

mbm -> MBM

> arch_domain_mbm_alloc()		-> l3_mon_domain_mbm_alloc()
> mon_domain_free()		-> l3_mon_domain_free()
> 
> Functions that allocate/free filesystem per-RMID mbm state information:

mbm -> MBM

> domain_setup_mon_state()	-> domain_setup_l3_mon_state()
> domain_destroy_mon_state()	-> domain_destroy_l3_mon_state()
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Patch looks good to me.

Reinette


* Re: [PATCH v6 13/30] x86,fs/resctrl: Handle events that can be read from any CPU
  2025-06-26 16:49 ` [PATCH v6 13/30] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
@ 2025-07-08 21:15   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 21:15 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:

> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 6d4191eff391..aec26457d82c 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -356,11 +356,30 @@ static struct mbm_state *get_mbm_state(struct rdt_l3_mon_domain *d, u32 closid,
>  	return state ? &state[idx] : NULL;
>  }
>  


Could you please add a function comment for cpu_on_correct_domain()
to document the different contexts that this function needs to be able
to handle? I think it is a bit subtle how the function is designed to be
run in preemptible as well as non-preemptible context. This will be helpful
when somebody aims to change/use this later.

> +static bool cpu_on_correct_domain(struct rmid_read *rr)
> +{
> +	struct cacheinfo *ci;
> +	int cpu;
> +
> +	/* Any CPU is OK for this event */
> +	if (rr->evt->any_cpu)
> +		return true;
> +
> +	cpu = smp_processor_id();
> +
> +	/* Single domain. Must be on a CPU in that domain. */
> +	if (rr->hdr)
> +		return cpumask_test_cpu(cpu, &rr->hdr->cpu_mask);
> +
> +	/* Summing domains that share a cache, must be on a CPU for that cache. */
> +	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
> +
> +	return ci && ci->id == rr->ci_id;
> +}
> +
>  static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>  {
> -	int cpu = smp_processor_id();
>  	struct rdt_l3_mon_domain *d;
> -	struct cacheinfo *ci;
>  	struct mbm_state *m;
>  	int err, ret;
>  	u64 tval = 0;
> @@ -378,9 +397,10 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>  	}
>  
>  	if (rr->hdr) {
> -		/* Reading a single domain, must be on a CPU in that domain. */
> -		if (!cpumask_test_cpu(cpu, &rr->hdr->cpu_mask))
> +		/* Single domain. */
> +		if (!cpu_on_correct_domain(rr))
>  			return -EINVAL;

cpu_on_correct_domain() duplicates the logic of __mon_event_count() so it
seems redundant to call cpu_on_correct_domain() in these paths. Since
cpu_on_correct_domain() contains complete logic it can be called just
once at the beginning of __mon_event_count() and thus also cover the
rr->first block?

> +
>  		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
>  						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
>  		if (rr->err)
> @@ -394,9 +414,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>  	if (WARN_ON_ONCE(rr->r->rid != RDT_RESOURCE_L3))
>  		return -EINVAL;
>  
> -	/* Summing domains that share a cache, must be on a CPU for that cache. */
> -	ci = get_cpu_cacheinfo_level(cpu, RESCTRL_L3_CACHE);
> -	if (!ci || ci->id != rr->ci_id)
> +	/* Sum across multiple domains. */
> +	if (!cpu_on_correct_domain(rr))
>  		return -EINVAL;
>  
>  	/*
> @@ -878,7 +897,7 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
>  	},
>  };
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid)
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
>  {
>  	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
>  		return;
> @@ -887,6 +906,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid)
>  		return;
>  	}
>  
> +	mon_event_all[eventid].any_cpu = any_cpu;
>  	mon_event_all[eventid].enabled = true;
>  }
>  

Reinette


* Re: [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters
  2025-06-26 16:49 ` [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
  2025-06-27 21:22   ` Fenghua Yu
  2025-06-27 21:49   ` Fenghua Yu
@ 2025-07-08 21:46   ` Reinette Chatre
  2025-07-09 16:52     ` Luck, Tony
  2 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 21:46 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Resctrl was written with the assumption that all monitor events can be
> displayed as unsigned decimal integers.
> 
> Hardware architecture counters may provide some telemetry events with
> greater precision where the event is not a simple count, but is a
> measurement of some sort (e.g. Joules for energy consumed).
> 
> Add a new argument to resctrl_enable_mon_event() for architecture code
> to inform the file system that the value for a counter is a fixed-point
> value with a specific number of binary places.  The file system will
> only allow architecture to use floating point format on events that it
> marked with mon_evt::is_floating_point.
> 
> Fixed point values are displayed with values rounded to an appropriate
> number of decimal places for the precision of the number of binary places
> provided. In general one extra decimal place is added for every three
> additional binary places. There are some exceptions for low precision
> binary values where exact representation is possible:
> 
>   1 binary place is 0.0 or 0.5.			=> 1 decimal place
>   2 binary places is 0.0, 0.25, 0.5, 0.75	=> 2 decimal places
>   3 binary places is 0.0, 0.125, etc.		=> 3 decimal places
> 
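
As a worked example of the conversion described in this series (b binary
places, d decimal places, val the raw fixed-point value): the fractional bits
are scaled by 10^d, half of one unit in the last binary place is added to
round to nearest, and the result is shifted right by b.

```latex
\begin{aligned}
\mathrm{frac} &= \mathrm{val} \bmod 2^{b} \\
\mathrm{dec} &= \left\lfloor \frac{\mathrm{frac}\cdot 10^{d} + 2^{b-1}}{2^{b}} \right\rfloor \\
b=2,\ d=2,\ \mathrm{val}=11:&\quad \lfloor 11/4 \rfloor = 2,\ \mathrm{frac}=3,\
  \mathrm{dec}=\lfloor 302/4 \rfloor = 75 \;\Rightarrow\; \texttt{2.75} \\
b=10,\ d=4,\ \mathrm{val}=1536:&\quad \lfloor 1536/1024 \rfloor = 1,\ \mathrm{frac}=512,\
  \mathrm{dec}=\lfloor 5120512/1024 \rfloor = 5000 \;\Rightarrow\; \texttt{1.5000} \to \texttt{1.5}
\end{aligned}
```
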
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h            |  4 +-
>  fs/resctrl/internal.h              |  4 ++
>  arch/x86/kernel/cpu/resctrl/core.c |  6 +-
>  fs/resctrl/ctrlmondata.c           | 91 +++++++++++++++++++++++++++++-
>  fs/resctrl/monitor.c               | 10 +++-
>  5 files changed, 108 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index e05a1abb25d4..1060a54cc9fa 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -379,7 +379,9 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>  u32 resctrl_arch_system_num_rmid_idx(void);
>  int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu);
> +#define MAX_BINARY_BITS	27
> +
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits);
>  
>  bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
>  
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index f51d10d6a510..4dc678af005c 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -58,6 +58,8 @@ static inline struct rdt_fs_context *rdt_fc2context(struct fs_context *fc)
>   * @name:		name of the event
>   * @configurable:	true if the event is configurable
>   * @any_cpu:		true if the event can be read from any CPU
> + * @is_floating_point:	event values may be displayed in floating point format

To help be specific and match user interface doc in patch #30 (and supported with a change
to this patch, more below):
"event values may be displayed" -> "event values are displayed"

> + * @binary_bits:	number of fixed-point binary bits from architecture

Please append "only valid if @is_floating_point is true".

>   * @enabled:		true if the event is enabled
>   */
>  struct mon_evt {
> @@ -66,6 +68,8 @@ struct mon_evt {
>  	char			*name;
>  	bool			configurable;
>  	bool			any_cpu;
> +	bool			is_floating_point;
> +	int			binary_bits;

hmmm ... the first hunk of this patch uses "u32" as the type for binary_bits
and this hunk uses "int"; the reason for this mix of types is not clear.

Since "binary_bits" is used as index into array I do not think "int" is
appropriate. How about just unsigned int throughout?

>  	bool			enabled;
>  };
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index b83861ab504f..2b6c6b61707d 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -887,15 +887,15 @@ static __init bool get_rdt_mon_resources(void)
>  	bool ret = false;
>  
>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
> -		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false);
> +		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
>  		ret = true;
>  	}
>  
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 2e65fddc3408..29de0e380ccc 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -590,6 +590,93 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  	resctrl_arch_mon_ctx_free(r, evt->evtid, rr->arch_mon_ctx);
>  }
>  
> +/**
> + * struct fixed_params - parameters to decode a binary fixed point value
> + * @decplaces:	Number of decimal places for this number of binary places.
> + * @pow10:	Multiplier (10 ^ decimal places).

To help be specific:
* @pow10:	Multiplier (10 ^ @decplaces).

... but I wonder if this cannot just use int_pow() to avoid this hardcoding?

> + */
> +struct fixed_params {
> +	int	decplaces;
> +	int	pow10;
> +};
> +
> +static struct fixed_params fixed_params[MAX_BINARY_BITS + 1] = {
> +	[1]  = { .decplaces = 1, .pow10 = 10 },
> +	[2]  = { .decplaces = 2, .pow10 = 100 },
> +	[3]  = { .decplaces = 3, .pow10 = 1000 },
> +	[4]  = { .decplaces = 3, .pow10 = 1000 },
> +	[5]  = { .decplaces = 3, .pow10 = 1000 },
> +	[6]  = { .decplaces = 3, .pow10 = 1000 },
> +	[7]  = { .decplaces = 3, .pow10 = 1000 },
> +	[8]  = { .decplaces = 3, .pow10 = 1000 },
> +	[9]  = { .decplaces = 3, .pow10 = 1000 },
> +	[10] = { .decplaces = 4, .pow10 = 10000 },
> +	[11] = { .decplaces = 4, .pow10 = 10000 },
> +	[12] = { .decplaces = 4, .pow10 = 10000 },
> +	[13] = { .decplaces = 5, .pow10 = 100000 },
> +	[14] = { .decplaces = 5, .pow10 = 100000 },
> +	[15] = { .decplaces = 5, .pow10 = 100000 },
> +	[16] = { .decplaces = 6, .pow10 = 1000000 },
> +	[17] = { .decplaces = 6, .pow10 = 1000000 },
> +	[18] = { .decplaces = 6, .pow10 = 1000000 },
> +	[19] = { .decplaces = 7, .pow10 = 10000000 },
> +	[20] = { .decplaces = 7, .pow10 = 10000000 },
> +	[21] = { .decplaces = 7, .pow10 = 10000000 },
> +	[22] = { .decplaces = 8, .pow10 = 100000000 },
> +	[23] = { .decplaces = 8, .pow10 = 100000000 },
> +	[24] = { .decplaces = 8, .pow10 = 100000000 },
> +	[25] = { .decplaces = 9, .pow10 = 1000000000 },
> +	[26] = { .decplaces = 9, .pow10 = 1000000000 },
> +	[27] = { .decplaces = 9, .pow10 = 1000000000 }
> +};
> +
> +static void print_event_value(struct seq_file *m, int binary_bits, u64 val)
> +{
> +	struct fixed_params *fp = &fixed_params[binary_bits];
> +	unsigned long long frac;
> +	char buf[10];
> +
> +	/* Mask off the integer part of the fixed-point value. */
> +	frac = val & GENMASK_ULL(binary_bits, 0);
> +
> +	/*
> +	 * Multiply by 10^{desired decimal places}. The
> +	 * integer part of the fixed point value is now
> +	 * almost what is needed.
> +	 */
> +	frac *= fp->pow10;
> +
> +	/*
> +	 * Round to nearest by adding a value that
> +	 * would be a "1" in the binary_bit + 1 place.
> +	 * Integer part of fixed point value is now
> +	 * the needed value.
> +	 */
> +	frac += 1 << (binary_bits - 1);

The static checker I tried pointed out that since the right side
does "int" math that is assigned to "unsigned long long" this risks
an "overflow before widen" issue. You can avoid overflow by casting
1 to "unsigned long long."

> +
> +	/*
> +	 * Extract the integer part of the value. This
> +	 * is the decimal representation of the original
> +	 * fixed-point fractional value.
> +	 */
> +	frac >>= binary_bits;
> +
> +	/*
> +	 * "frac" is now in the range [0 .. fp->pow10).
> +	 * I.e. string representation will fit into
> +	 * fp->decplaces.
> +	 */
> +	sprintf(buf, "%0*llu", fp->decplaces, frac);

Please use snprintf() to handle changes to fixed_params[].

> +
> +	/* Trim trailing zeroes */
> +	for (int i = fp->decplaces - 1; i > 0; i--) {
> +		if (buf[i] != '0')
> +			break;
> +		buf[i] = '\0';
> +	}
> +	seq_printf(m, "%llu.%s\n", val >> binary_bits, buf);
> +}
> +
>  int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  {
>  	struct kernfs_open_file *of = m->private;
> @@ -666,8 +753,10 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>  		seq_puts(m, "Error\n");
>  	else if (rr.err == -EINVAL)
>  		seq_puts(m, "Unavailable\n");
> -	else
> +	else if (evt->binary_bits == 0)
>  		seq_printf(m, "%llu\n", rr.val);
> +	else
> +		print_event_value(m, evt->binary_bits, rr.val);
>  

As I understand it, it will be clear to user space which events are
reported as floating point numbers. If the architecture in turn does not
provide any "binary bits" for such an event then I think resctrl should
still print a floating point number ("x.0") to match user space
expectations.

>  out:
>  	rdtgroup_kn_unlock(of->kn);
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index aec26457d82c..076c0cc6e53a 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -897,16 +897,22 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
>  	},
>  };
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
>  {
> -	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
> +	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS) ||
> +			 binary_bits > MAX_BINARY_BITS)

This alignment is off.

>  		return;
>  	if (mon_event_all[eventid].enabled) {
>  		pr_warn("Duplicate enable for event %d\n", eventid);
>  		return;
>  	}
> +	if (binary_bits && !mon_event_all[eventid].is_floating_point) {
> +		pr_warn("Event %d may not be floating point\n", eventid);
> +		return;
> +	}
>  
>  	mon_event_all[eventid].any_cpu = any_cpu;
> +	mon_event_all[eventid].binary_bits = binary_bits;
>  	mon_event_all[eventid].enabled = true;
>  }
>  

Reinette


* Re: [PATCH v6 16/30] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor
  2025-06-26 16:49 ` [PATCH v6 16/30] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
@ 2025-07-08 22:05   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 22:05 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Counts for each Intel telemetry event are periodically sent to one or
> more aggregators on each package where accumulated totals are made
> available in MMIO registers.
> 
> Add a new resource for monitoring these events so that CPU hotplug
> notifiers will build domains at the package granularity.

Patch does a bit more than this. This can be expanded to:

	Add a new PERF_PKG resource and introduce package level scope for monitoring
	these events so that CPU hotplug notifiers can build domains at the package
	granularity.

	Use the physical package ID available via topology_physical_package_id()
	to identify the monitoring domains with package level scope. This enables
	user space to use /sys/devices/system/cpu/cpuX/topology/physical_package_id
	to identify the monitoring domain a CPU is associated with.

(Please always feel free to improve.)

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Patch looks good to me.

Reinette


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-07-08 20:49       ` Reinette Chatre
@ 2025-07-08 22:43         ` Luck, Tony
  2025-07-08 23:26           ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-08 22:43 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Tue, Jul 08, 2025 at 01:49:26PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 7/8/25 12:08 PM, Luck, Tony wrote:
> > On Thu, Jul 03, 2025 at 10:22:06AM -0700, Luck, Tony wrote:
> >> On Thu, Jul 03, 2025 at 09:45:15AM -0700, Reinette Chatre wrote:
> >>> Hi Tony and Dave,
> >>>
> >>> On 6/26/25 9:49 AM, Tony Luck wrote:
> >>>>  --- 14 ---
> >>>> Add mon_evt::is_floating_point set by resctrl file system code to limit
> >>>> which events architecture code can request be displayed in floating point.
> >>>>
> >>>> Simplified the fixed-point to floating point algorithm. Reinette is
> >>>> correct that the additional "lshift" and "rshift" operations are not
> >>>> required. All that is needed is to multiply the fixed point fractional
> >>>> part by 10**decimal_places, add a rounding amount equivalent to a "1"
> >>>> in the binary place after those supplied. Finally divide by 2**binary_places
> >>>> (with a right shift).
> >>>>
> >>>> Explained in commit comment how I chose the number of decimal places to
> >>>> use for each binary places value.
> >>>>
> >>>> N.B. Dave Martin expressed an opinion that the kernel should not do
> >>>> this conversion. Instead it should enumerate the scaling factor for
> >>>> each event where hardware reported a fixed point value. This patch
> >>>> could be dropped and replaced with one to enumerate scaling factors
> >>>> per event if others agree with Dave.
> >>>
> >>> Could resctrl accommodate both usages? For example, it does not
> >>> look too invasive to add a second file <mon_evt::name>.raw for the
> >>> mon_evt::is_floating_point events that can output something like Dave
> >>> suggested in [1]:
> >>>
> >>> .raw file format could be:
> >>> 	#format:<output that depends on format>
> >>> 	#fixed-point:<value>/<scaling factor>
> >>>
> >>> Example output:
> >>> 	fixed-point:0x60000/0x40000
> >>
> >> Dave: Is that what you want in the ".raw" file? An alternative would be
> >> to put the format information for non-integer events into an
> >> "info" file ("info/{RESOURCE_NAME}_MON/monfeatures.raw.formats"?)
> >> and just put the raw value into the ".raw" file under mon_data.
> > 
> > Note that I thought it easier for users to keep the raw file to just
> > showing a value, rather than including the formatting details in
> > Reinette's proposal.
> 
> Could you please elaborate what makes this easier? It is not obvious to me
> how it is easier for the user to open, parse, and close two files rather than one.
> (more below)

I had only considered the case where the format does not change while
the resctrl file system is mounted. So users would read the "info" file
to get the scaling factor once, and then read the event files with a
parser that only has to convert a numerical string.
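For reference, the fixed-point to decimal conversion summarized in the v6 notes for patch 14 could look roughly like the user-space sketch below. Those notes describe multiplying the fractional part by 10**decimal_places, adding a rounding term, and shifting right by binary_places. This is not the patch code. The helper names are hypothetical, rounding is assumed to be round-half-up (adding half of 2**binary_bits before the shift), and the carry into the whole part is something this sketch adds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: split an unsigned fixed-point value with
 * binary_bits fractional bits into a whole part and a fraction rounded
 * (half up) to decimal_places decimal digits.
 */
static void fixed_to_decimal(uint64_t val, unsigned int binary_bits,
			     unsigned int decimal_places,
			     uint64_t *whole, uint64_t *frac)
{
	uint64_t pow10 = 1;
	uint64_t f;

	/* Keep f * 10^decimal_places comfortably inside 64 bits. */
	assert(binary_bits >= 1 && binary_bits <= 32);
	assert(decimal_places >= 1 && decimal_places <= 9);

	for (unsigned int i = 0; i < decimal_places; i++)
		pow10 *= 10;

	*whole = val >> binary_bits;
	f = val & ((1ULL << binary_bits) - 1);

	/* Scale to decimal, add half of 2^binary_bits to round, then shift down. */
	f = (f * pow10 + (1ULL << (binary_bits - 1))) >> binary_bits;

	/* Rounding can carry into the whole part (e.g. 0.999999 -> 1.00000). */
	if (f >= pow10) {
		(*whole)++;
		f -= pow10;
	}
	*frac = f;
}

/* Print the value with exactly decimal_places digits after the point. */
static void print_fixed(uint64_t val, unsigned int binary_bits,
			unsigned int decimal_places)
{
	uint64_t whole, frac;

	fixed_to_decimal(val, binary_bits, decimal_places, &whole, &frac);
	printf("%llu.%0*llu\n", (unsigned long long)whole,
	       (int)decimal_places, (unsigned long long)frac);
}
```

With 18 fractional bits (scale 262144) and 5 decimal places, print_fixed(5775, 18, 5) prints 0.02203. That matches the core_energy output shown in this thread.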

> > Patch to implement my alternative suggestion below. To the user things
> > look like this:
> > 
> > $ cd /sys/fs/resctrl/mon_data/mon_PERF_PKG_01
> > $ cat core_energy
> > 0.02203
> > $ cat core_energy.raw
> > 5775
> > $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
> > core_energy 262144
> > activity 262144
> > $ bc -ql
> > 5775 / 262144
> > .02202987670898437500
> > 
> > If this seems useful I can write up a commit message and include
> > as its own patch in v7. Suggestions for better names?
> > 
> 
> I expect users to regularly interact with the monitoring files. For example,
> "read the core_energy of group x every second". An API like above would require
> a contract that the scale value will never change from resctrl mount to
> resctrl unmount. I understand that this implementation supports exactly this by
> allowing an architecture to only enable an event once, but do you think this is
> something that will always be the case? If not then an interface like above will
> require user space to open, parse, close two files instead of one on a frequent basis.
> This is not ideal if user space wants to read monitoring data of multiple
> groups frequently.

While hardware designers do some outlandish things, changing the format
of an event counter on the fly seems beyond the range of possibility.
How would that even work? A driver would have to rerun enumeration of
the feature every time it read a counter. Or hardware would have to
supply some interrupt to tell s/w that the format changed.

I think it reasonable that resctrl be able to guarantee that the format
described in the info file is valid for the life of the mount.

> I would also like to keep extensibility in mind. We now know that
> unsigned decimal and fixed-point binary need to be supported. I think any
> new interface used to communicate formatting information to user space should be done
> in a way that can be extended for a new format. That is, for example, why
> I used the actual term "fixed-point" in the example. Something like this avoids
> needing assumptions that a raw value always implies fixed-point format.

This is fair. But could be covered in the "info" file with some more
descriptive way to describe the format. Perhaps:

$ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
core_energy fixed-point scale=262144
activity fixed-point scale=262144

To allow for other types in the future.
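A user-space consumer of the info file format proposed above could parse each line once at startup and then convert every raw reading. The struct and function names below are hypothetical. Only the "fixed-point" type is handled, and any trailing key=value fields after scale= are ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of the proposed (hypothetical) info file. */
struct raw_format {
	char event[64];
	char type[32];
	unsigned long long scale;
};

/* Returns 0 for "<event> fixed-point scale=<N> ...", -1 otherwise. */
static int parse_raw_format(const char *line, struct raw_format *rf)
{
	if (sscanf(line, "%63s %31s scale=%llu", rf->event, rf->type, &rf->scale) != 3)
		return -1;
	/* Only fixed-point is described so far; reject anything else. */
	if (strcmp(rf->type, "fixed-point") != 0 || rf->scale == 0)
		return -1;
	return 0;
}

/* Convert a raw counter reading using the cached scale. */
static double raw_to_value(unsigned long long raw, const struct raw_format *rf)
{
	assert(rf->scale != 0);
	return (double)raw / (double)rf->scale;
}
```

After parsing "core_energy fixed-point scale=262144", a raw reading of 5775 converts to about 0.0220299. This agrees with the bc calculation earlier in this message.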

> 
> Reinette

-Tony


* Re: [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring
  2025-07-08 22:43         ` Luck, Tony
@ 2025-07-08 23:26           ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 23:26 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/8/25 3:43 PM, Luck, Tony wrote:
> On Tue, Jul 08, 2025 at 01:49:26PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 7/8/25 12:08 PM, Luck, Tony wrote:
>>> On Thu, Jul 03, 2025 at 10:22:06AM -0700, Luck, Tony wrote:
>>>> On Thu, Jul 03, 2025 at 09:45:15AM -0700, Reinette Chatre wrote:
>>>>> Hi Tony and Dave,
>>>>>
>>>>> On 6/26/25 9:49 AM, Tony Luck wrote:
>>>>>>  --- 14 ---
>>>>>> Add mon_evt::is_floating_point set by resctrl file system code to limit
>>>>>> which events architecture code can request be displayed in floating point.
>>>>>>
>>>>>> Simplified the fixed-point to floating point algorithm. Reinette is
>>>>>> correct that the additional "lshift" and "rshift" operations are not
>>>>>> required. All that is needed is to multiply the fixed point fractional
>>>>>> part by 10**decimal_places, add a rounding amount equivalent to a "1"
>>>>>> in the binary place after those supplied. Finally divide by 2**binary_places
>>>>>> (with a right shift).
>>>>>>
>>>>>> Explained in commit comment how I chose the number of decimal places to
>>>>>> use for each binary places value.
>>>>>>
>>>>>> N.B. Dave Martin expressed an opinion that the kernel should not do
>>>>>> this conversion. Instead it should enumerate the scaling factor for
>>>>>> each event where hardware reported a fixed point value. This patch
>>>>>> could be dropped and replaced with one to enumerate scaling factors
>>>>>> per event if others agree with Dave.
>>>>>
>>>>> Could resctrl accommodate both usages? For example, it does not
>>>>> look too invasive to add a second file <mon_evt::name>.raw for the
>>>>> mon_evt::is_floating_point events that can output something like Dave
>>>>> suggested in [1]:
>>>>>
>>>>> .raw file format could be:
>>>>> 	#format:<output that depends on format>
>>>>> 	#fixed-point:<value>/<scaling factor>
>>>>>
>>>>> Example output:
>>>>> 	fixed-point:0x60000/0x40000
>>>>
>>>> Dave: Is that what you want in the ".raw" file? An alternative would be
>>>> to put the format information for non-integer events into an
>>>> "info" file ("info/{RESOURCE_NAME}_MON/monfeatures.raw.formats"?)
>>>> and just put the raw value into the ".raw" file under mon_data.
>>>
>>> Note that I thought it easier for users to keep the raw file to just
>>> showing a value, rather than including the formatting details in
>>> Reinette's proposal.
>>
>> Could you please elaborate what makes this easier? It is not obvious to me
>> how it is easier for the user to open, parse, and close two files rather than one.
>> (more below)
> 
> I had only considered the case where the format does not change while
> the resctrl file system is mounted. So users would read the "info" file
> to get the scaling factor once, and then read the event files with a
> parser that only has to convert a numerical string.
> 
>>> Patch to implement my alternative suggestion below. To the user things
>>> look like this:
>>>
>>> $ cd /sys/fs/resctrl/mon_data/mon_PERF_PKG_01
>>> $ cat core_energy
>>> 0.02203
>>> $ cat core_energy.raw
>>> 5775
>>> $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
>>> core_energy 262144
>>> activity 262144
>>> $ bc -ql
>>> 5775 / 262144
>>> .02202987670898437500
>>>
>>> If this seems useful I can write up a commit message and include
>>> as its own patch in v7. Suggestions for better names?
>>>
>>
>> I expect users to regularly interact with the monitoring files. For example,
>> "read the core_energy of group x every second". An API like above would require
>> a contract that the scale value will never change from resctrl mount to
>> resctrl unmount. I understand that this implementation supports exactly this by
>> allowing an architecture to only enable an event once, but do you think this is
>> something that will always be the case? If not then an interface like above will
>> require user space to open, parse, close two files instead of one on a frequent basis.
>> This is not ideal if user space wants to read monitoring data of multiple
>> groups frequently.
> 
> While hardware designers do some outlandish things, changing the format
> of an event counter on the fly seems beyond the range of possibility.
> How would that even work? A driver would have to rerun enumeration of
> the feature every time it read a counter. Or hardware would have to
> supply some interrupt to tell s/w that the format changed.

There is also the new direction of resctrl dynamically enabling/disabling
hardware capabilities to consider. Since that would be triggered by user
space, a note that "doing this may change the format" could reasonably be
sufficient.

Something else to consider is the possibility of hardware using different scales
in different domains if the packages are not "uniform". 
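A self-describing .raw line, such as the "fixed-point:<value>/<scaling factor>" format suggested earlier in this thread, handles per-domain scales because every read carries its own scale. Nothing is cached across domains. Below is a minimal parser sketch. The function name is hypothetical, and base-0 strtoull() is used so that both hex (0x...) and decimal values are accepted.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse one line of the suggested self-describing ".raw" format,
 * e.g. "fixed-point:0x60000/0x40000". Returns 0 on success.
 */
static int parse_raw_line(const char *line, unsigned long long *value,
			  unsigned long long *scale)
{
	static const char prefix[] = "fixed-point:";
	char *end;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return -1;			/* unknown format tag */
	line += sizeof(prefix) - 1;

	*value = strtoull(line, &end, 0);	/* base 0 accepts a 0x prefix */
	if (end == line || *end != '/')
		return -1;
	line = end + 1;

	*scale = strtoull(line, &end, 0);
	if (end == line || *scale == 0)
		return -1;
	return 0;
}
```

For the example line "fixed-point:0x60000/0x40000", this yields 0x60000 / 0x40000 = 1.5. A domain with a different scale would simply report a different denominator.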

> I think it reasonable that resctrl be able to guarantee that the format
> described in the info file is valid for the life of the mount.

I'd really like to think that it is reasonable also.

> 
>> I would also like to keep extensibility in mind. We now know that
>> unsigned decimal and fixed-point binary need to be supported. I think any
>> new interface used to communicate formatting information to user space should be done
>> in a way that can be extended for a new format. That is, for example, why
>> I used the actual term "fixed-point" in the example. Something like this avoids
>> needing assumptions that a raw value always implies fixed-point format.
> 
> This is fair. But could be covered in the "info" file with some more
> descriptive way to describe the format. Perhaps:
> 
> $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
> core_energy fixed-point scale=262144
> activity fixed-point scale=262144
> 
> To allow for other types in the future.

Note that the filename still has "scale" in its name, making it specific to
fixed-point.

It may be expected that every entry in mon_features has an entry in
mon_features_raw_scale (name TBD). This means the existing possible "mon_features"
need to be accommodated (except the _config ones). This may also be an
opportunity to introduce the unit of measurement. For example,

 $ cat /sys/fs/resctrl/info/PERF_PKG_MON/mon_features_raw_scale
 core_energy fixed-point scale=262144 unit=joules
 activity fixed-point scale=262144 unit=farads
 ...

Reinette



* Re: [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events
  2025-06-26 16:49 ` [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events Tony Luck
  2025-06-27 18:06   ` Luck, Tony
@ 2025-07-08 23:51   ` Reinette Chatre
  1 sibling, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-08 23:51 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Hardware has one or more telemetry event aggregators per package
> for each group of telemetry events. Each aggregator provides access
> to event counts in an array of 64-bit values in MMIO space. There
> is a "guid" (in this case a unique 32-bit integer) which refers to
> an XML file published in the https://github.com/intel/Intel-PMT

"an XML file published in the" -> "an XML file published in"?

> that provides all the details about each aggregator.
> 
> The XML files provide the following information:

"The XML files provide" -> "The XML file provides"
(First paragraph refers to a single XML file so I assume it means
all this information is available from one XML file?)

> 1) Which telemetry events are included in the group for this aggregator.
> 2) The order in which the event counters appear for each RMID.
> 3) The value type of each event counter (integer or fixed-point).
> 4) The number of RMIDs supported.
> 5) Which additional aggregator status registers are included.
> 6) The total size of the MMIO region for this aggregator.
> 
> Add select of X86_PLATFORM_DEVICES, INTEL_VSEC and
> INTEL_PMT_TELEMETRY to CONFIG_X86_CPU_RESCTRL to enable use of the
> discovery driver that enumerates all aggregators on the system with
> intel_pmt_get_regions_by_feature(). Call this for each pmt_feature_id
> that indicates per-RMID telemetry.
> 
> Save the returned pmt_feature_group pointers with guids that are known
> to resctrl for use at run time.
> 
> Those pointers are returned to the INTEL_PMT_DISCOVERY driver at
> resctrl_arch_exit() time.

It is not clear to me why this work needs to use two different terms for
the same thing.
Assuming it is required: up until this point this work has only used the
term "aggregator", and this patch is where it starts using
"telemetry region" interchangeably. When reading this patch,
"telemetry region" is first used in struct event_group without the term
being introduced to the reader.

Could you please add a snippet in changelog to help with this transition?

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h  |   3 +
>  arch/x86/kernel/cpu/resctrl/core.c      |   5 +
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 122 ++++++++++++++++++++++++
>  arch/x86/Kconfig                        |   3 +
>  arch/x86/kernel/cpu/resctrl/Makefile    |   1 +
>  5 files changed, 134 insertions(+)
>  create mode 100644 arch/x86/kernel/cpu/resctrl/intel_aet.c
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 224b71730cc3..e93b15bf6aab 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -169,4 +169,7 @@ void __init intel_rdt_mbm_apply_quirk(void);
>  
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  
> +bool intel_aet_get_events(void);
> +void __exit intel_aet_exit(void);
> +
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index a5f01cac2363..9144766da836 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -734,6 +734,9 @@ void resctrl_arch_pre_mount(void)
>  
>  	if (!atomic_try_cmpxchg(&only_once, &old, 1))
>  		return;
> +
> +	if (!intel_aet_get_events())
> +		return;
>  }
>  
>  enum {
> @@ -1086,6 +1089,8 @@ late_initcall(resctrl_arch_late_init);
>  
>  static void __exit resctrl_arch_exit(void)
>  {
> +	intel_aet_exit();
> +
>  	cpuhp_remove_state(rdt_online);
>  
>  	resctrl_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> new file mode 100644
> index 000000000000..b09044b093dd
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -0,0 +1,122 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Resource Director Technology(RDT)
> + * - Intel Application Energy Telemetry
> + *
> + * Copyright (C) 2025 Intel Corporation
> + *
> + * Author:
> + *    Tony Luck <tony.luck@intel.com>
> + */
> +
> +#define pr_fmt(fmt)   "resctrl: " fmt
> +
> +#include <linux/cleanup.h>
> +#include <linux/cpu.h>
> +#include <linux/intel_vsec.h>
> +#include <linux/resctrl.h>
> +
> +#include "internal.h"
> +
> +/**
> + * struct event_group - All information about a group of telemetry events.
> + * @pfg:		Points to the aggregated telemetry space information
> + *			within the OOBMSM driver that contains data for all
> + *			telemetry regions.
> + * @guid:		Unique number per XML description file.
> + */
> +struct event_group {
> +	/* Data fields for additional structures to manage this group. */
> +	struct pmt_feature_group	*pfg;
> +
> +	/* Remaining fields initialized from XML file. */
> +	u32				guid;
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
> + */
> +static struct event_group energy_0x26696143 = {
> +	.guid		= 0x26696143,
> +};
> +
> +/*
> + * Link: https://github.com/intel/Intel-PMT
> + * File: xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml
> + */
> +static struct event_group perf_0x26557651 = {
> +	.guid		= 0x26557651,
> +};
> +
> +static struct event_group *known_event_groups[] = {
> +	&energy_0x26696143,
> +	&perf_0x26557651,
> +};
> +
> +#define NUM_KNOWN_GROUPS ARRAY_SIZE(known_event_groups)
> +
> +/* Stub for now */
> +static int configure_events(struct event_group *e, struct pmt_feature_group *p)
> +{
> +	return -EINVAL;
> +}
> +
> +DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,
> +		if (!IS_ERR_OR_NULL(_T))
> +			intel_pmt_put_feature_group(_T))

As you state in the cover letter, this snippet cannot make checkpatch.pl happy.
I would propose documenting the attempts to appease checkpatch.pl in the
maintainer notes of this patch.
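For readers less familiar with <linux/cleanup.h>: in get_pmt_feature(), a __free() variable is released automatically on return unless no_free_ptr() hands it off. That ownership pattern can be approximated in user space with the GCC/Clang cleanup attribute, which is what __free() is built on. All names below are hypothetical stand-ins, not kernel APIs.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct pmt_feature_group. */
struct group {
	int id;
};

static int put_count;		/* counts releases so the behavior is observable */
static struct group *saved;	/* plays the role of (*peg)->pfg */

static void put_group(struct group *g)
{
	put_count++;
	free(g);
}

/* Called automatically with &var when an AUTO_PUT variable leaves scope. */
static void auto_put_group(struct group **gp)
{
	if (*gp)
		put_group(*gp);
}

#define AUTO_PUT __attribute__((cleanup(auto_put_group)))

/* Rough analogue of no_free_ptr(): take the pointer and disarm the cleanup. */
static struct group *take_ptr(struct group **gp)
{
	struct group *g = *gp;

	*gp = NULL;
	return g;
}

static int get_feature(int keep)
{
	struct group *p AUTO_PUT = malloc(sizeof(*p));

	if (!p)
		return 0;
	p->id = 42;

	if (keep) {
		saved = take_ptr(&p);	/* ownership moves out; no put on return */
		return 1;
	}
	return 0;			/* p is put automatically here */
}
```

Every early return in such a function is leak-free. Only the path that stores the pointer for later use (here saved, in the patch (*peg)->pfg) has to opt out explicitly.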

> +
> +/*
> + * Make a request to the INTEL_PMT_DISCOVERY driver for the
> + * pmt_feature_group for a specific feature. If there is
> + * one, the returned structure has an array of telemetry_region
> + * structures. Each describes one telemetry aggregator.
> + * Try to configure any with a known matching guid.

I interpret "configure" to involve "write" activity where, for example,
settings are changed and things are ... configured. The way this is
written it seems that resctrl is configuring (i.e. making changes to)
events known to it. This is not the case though (?), resctrl is just
doing discovery of events here. How about above is instead:

	Try to discover any with a known matching guid

and configure_events() -> discover_events()?

> + */
> +static bool get_pmt_feature(enum pmt_feature_id feature)
> +{
> +	struct pmt_feature_group *p __free(intel_pmt_put_feature_group) = NULL;
> +	struct event_group **peg;
> +	bool ret;
> +
> +	p = intel_pmt_get_regions_by_feature(feature);
> +
> +	if (IS_ERR_OR_NULL(p))
> +		return false;
> +
> +	for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++) {
> +		ret = configure_events(*peg, p);
> +		if (!ret) {
> +			(*peg)->pfg = no_free_ptr(p);
> +			return true;
> +		}
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Ask OOBMSM discovery driver for all the RMID based telemetry groups
> + * that it supports.
> + */
> +bool intel_aet_get_events(void)
> +{
> +	bool ret1, ret2;
> +
> +	ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM);
> +	ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM);
> +
> +	return ret1 || ret2;
> +}
> +
> +void __exit intel_aet_exit(void)
> +{
> +	struct event_group **peg;
> +
> +	for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++) {
> +		if ((*peg)->pfg) {
> +			intel_pmt_put_feature_group((*peg)->pfg);
> +			(*peg)->pfg = NULL;
> +		}
> +	}
> +}
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 71019b3b54ea..8eb68d2230be 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig

Kconfig has its own thread.

Reinette



* Re: [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package
  2025-06-26 16:49 ` [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
@ 2025-07-09  2:20   ` Reinette Chatre
  2025-07-09 18:12     ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09  2:20 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> There may be multiple telemetry aggregators per package, each enumerated
> by a telemetry region structure in the feature group.

This is the valuable connection missing from earlier patch changelog.

> 
> Scan the array of telemetry region structures and count how many are
> in each package in preparation to allocate structures to save the MMIO
> addresses for each in a convenient format for use when reading event
> counters.
> 
> Sanity check that the telemetry region structures have a valid
> package_id and that the size they report for the MMIO space is as
> large as expected from the XML description of the registers in
> the region.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 55 ++++++++++++++++++++++++-
>  1 file changed, 53 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index b09044b093dd..8d67ed709a74 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -15,6 +15,7 @@
>  #include <linux/cpu.h>
>  #include <linux/intel_vsec.h>
>  #include <linux/resctrl.h>
> +#include <linux/slab.h>
>  
>  #include "internal.h"
>  
> @@ -24,6 +25,7 @@
>   *			within the OOBMSM driver that contains data for all
>   *			telemetry regions.
>   * @guid:		Unique number per XML description file.
> + * @mmio_size:		Number of bytes of MMIO registers for this group.
>   */
>  struct event_group {
>  	/* Data fields for additional structures to manage this group. */
> @@ -31,14 +33,19 @@ struct event_group {
>  
>  	/* Remaining fields initialized from XML file. */
>  	u32				guid;
> +	size_t				mmio_size;
>  };
>  
> +#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status)	\
> +	(((num_rmids) * (num_events) + (num_extra_status)) * sizeof(u64))
> +
>  /*
>   * Link: https://github.com/intel/Intel-PMT
>   * File: xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
>   */
>  static struct event_group energy_0x26696143 = {
>  	.guid		= 0x26696143,
> +	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
>  };
>  
>  /*
> @@ -47,6 +54,7 @@ static struct event_group energy_0x26696143 = {
>   */
>  static struct event_group perf_0x26557651 = {
>  	.guid		= 0x26557651,
> +	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
>  };
>  
>  static struct event_group *known_event_groups[] = {
> @@ -56,10 +64,53 @@ static struct event_group *known_event_groups[] = {
>  
>  #define NUM_KNOWN_GROUPS ARRAY_SIZE(known_event_groups)
>  
> -/* Stub for now */
> +static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
> +{
> +	if (tr->guid != e->guid)
> +		return true;
> +	if (tr->plat_info.package_id >= topology_max_packages()) {
> +		pr_warn_once("Bad package %d in guid 0x%x\n", tr->plat_info.package_id,

If struct event_group includes the RMID telemetry feature ID (see below) then it
would be helpful to print that here.

> +			     tr->guid);
> +		return true;
> +	}
> +	if (tr->size < e->mmio_size) {

Why not "tr->size != e->mmio_size"?

Patch #25 explains how tr->num_rmids may be smaller than the number of RMIDs in the XML file,
but from that description I got the impression that telemetry regions should always
support all registers documented in the XML file. Similarly, in the earlier "fake OOBMSM"
code the "energy" MMIO size could still accommodate the 576 RMIDs while
the regions were configured to only support 64 RMIDs.
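For reference, the quoted XML_MMIO_SIZE() values work out to 9240 bytes for the energy group, (576 * 2 + 3) * 8, and 32280 bytes for the perf group, (576 * 7 + 3) * 8. Below is a minimal sketch of the stricter equality check asked about above. It uses uint64_t in place of u64, and region_size_ok() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as the quoted XML_MMIO_SIZE(), with uint64_t for u64. */
#define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status)	\
	(((num_rmids) * (num_events) + (num_extra_status)) * sizeof(uint64_t))

static const size_t energy_mmio_size = XML_MMIO_SIZE(576, 2, 3);	/* 9240 */
static const size_t perf_mmio_size = XML_MMIO_SIZE(576, 7, 3);		/* 32280 */

/* Hypothetical strict check: the region must be exactly the XML size. */
static int region_size_ok(size_t region_size, size_t expected)
{
	return region_size == expected;
}
```

Under "!=" a region sized for fewer RMIDs, or padded beyond the XML layout, would be skipped. Under "<" only the undersized case is rejected.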

> +		pr_warn_once("MMIO space %zu too small for guid 0x%x\n", tr->size, e->guid);
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * Configure events from one pmt_feature_group.

"Configure events" -> "Discover events"?

> + * 1) Count how many per package.

It is not clear what is counted here. The first sentence is "Configure events",
followed by the unspecific "Count how many per package", which can be read as
counting events ... but it is actually counting telemetry regions?

> + * 2...) To be continued.

This comment implies that as capabilities are added to this function the comments
will be amended to document these new capabilities ... but at the end of this series
this function comment still reads as "2...) To be continued".

> + */
>  static int configure_events(struct event_group *e, struct pmt_feature_group *p)
>  {
> -	return -EINVAL;
> +	int *pkgcounts __free(kfree) = NULL;
> +	struct telemetry_region *tr;
> +	int num_pkgs;
> +
> +	num_pkgs = topology_max_packages();
> +
> +	/* Get per-package counts of telemetry_regions for this event group */

"telemetry_regions" -> "telemetry regions"? Or is it referring to the actual struct
here?

> +	for (int i = 0; i < p->count; i++) {
> +		tr = &p->regions[i];
> +		if (skip_this_region(tr, e))
> +			continue;

The function calling configure_events() does:

	struct event_group **peg;

	for (peg = &known_event_groups[0]; peg < &known_event_groups[NUM_KNOWN_GROUPS]; peg++)  {
		ret = configure_events(*peg, p);
		...
	}

As I understand it, there is a 1:1 relationship between struct event_group and struct pmt_feature_group.
It thus seems unnecessary to loop through all the telemetry regions of a struct pmt_feature_group
if it is known to not be associated with the "event group"?
Could it be helpful to add a new (hardcoded) event_group::id that is of type enum pmt_feature_id
that can be used to ensure that only relevant struct pmt_feature_group is used to discover events
for a particular struct event_group?

Another consideration is that this implementation seems to require that guids are unique across
all telemetry regions of all RMID telemetry features, is this guaranteed?
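Below is a minimal sketch of the suggested early filter, using simplified stand-in types. The id field in event_group is the proposed addition, not something in the patch, and all type names are hypothetical. A feature group for a different feature is rejected before any telemetry region is scanned. Otherwise, matching regions are counted per package as in the quoted loop.

```c
/* Hypothetical stand-ins for enum pmt_feature_id and the two structures. */
enum feature_id { FEAT_ENERGY, FEAT_PERF };

struct region {
	unsigned int guid;
	int package_id;
};

struct feature_group {
	enum feature_id id;
	int count;
	const struct region *regions;
};

struct event_group {
	enum feature_id id;	/* suggested new, hardcoded field */
	unsigned int guid;
};

/*
 * Count matching regions per package into pkgcounts[num_pkgs].
 * Returns the number of matching regions, or -1 if the feature group
 * belongs to a different feature and is skipped without scanning.
 */
static int count_regions(const struct event_group *e,
			 const struct feature_group *p,
			 int *pkgcounts, int num_pkgs)
{
	int found = 0;

	if (p->id != e->id)
		return -1;

	for (int i = 0; i < p->count; i++) {
		const struct region *tr = &p->regions[i];

		if (tr->guid != e->guid)
			continue;
		if (tr->package_id < 0 || tr->package_id >= num_pkgs)
			continue;	/* mirrors the package_id sanity check */
		pkgcounts[tr->package_id]++;
		found++;
	}
	return found;
}
```

With the feature check in place, a guid would only need to be unique within one feature's regions, not across all RMID telemetry features.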

> +		if (!pkgcounts) {
> +			pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
> +			if (!pkgcounts)
> +				return -ENOMEM;
> +		}
> +		pkgcounts[tr->plat_info.package_id]++;
> +	}
> +
> +	if (!pkgcounts)
> +		return -ENODEV;
> +
> +	return 0;
>  }
>  
>  DEFINE_FREE(intel_pmt_put_feature_group, struct pmt_feature_group *,

Reinette


* Re: [PATCH v6 19/30] x86/resctrl: Complete telemetry event enumeration
  2025-06-26 16:49 ` [PATCH v6 19/30] x86/resctrl: Complete telemetry event enumeration Tony Luck
@ 2025-07-09  2:38   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09  2:38 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Counters for telemetry events are in MMIO space. Each telemetry_region
> structure in the pmt_feature_group returned from OOBMSM contains
> the base MMIO address for the counters.
> 
> Scan all the telemetry_region structures again and save the number
> of regions together with a flex array of the mmio addresses for each

mmio -> MMIO

> aggregator indexed by package id. Note that there may be multiple
> aggregators per package.

This final "note" seems redundant considering the paragraph
reads: "Scan all the telemetry_region structures (plural) again and
save the number of regions (plural) ..." Perhaps just rework the
paragraph to read:

	There may be multiple aggregators per package. Scan all the
	telemetry_region structures ...

> 
> Completed structure for each event group looks like this:
> 
>              +---------------------+---------------------+
> pkginfo** -->|    pkginfo[0]       |    pkginfo[1]       |

Since there are multiple arrays in this depiction, it can be made
more specific with:
	       | pkginfo[package ID 0]  | pkginfo[package ID 1] |


>              +---------------------+---------------------+
>                         |                     |
>                         v                     v
>                 +----------------+    +----------------+
>                 |struct mmio_info|    |struct mmio_info|
>                 +----------------+    +----------------+
>                 |num_regions = N |    |num_regions = N |
>                 |  addrs[0]      |    |  addrs[0]      |
>                 |  addrs[1]      |    |  addrs[1]      |
>                 |    ...         |    |    ...         |
>                 |  addrs[N-1]    |    |  addrs[N-1]    |
>                 +----------------+    +----------------+
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 64 +++++++++++++++++++++++++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 8d67ed709a74..c770039b2525 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -19,17 +19,32 @@
>  
>  #include "internal.h"
>  
> +/**
> + * struct mmio_info - MMIO address information for one event group of a package.
> + * @num_regions:	Number of telemetry regions on this package.
> + * @addrs:		Array of MMIO addresses, one per telemetry region on this package.
> + *
> + * Provides convenient access to all MMIO addresses of one event group
> + * for one package. Used when reading event data on a package.
> + */
> +struct mmio_info {

This struct name is a bit generic. What do you think of "pkg_mmio_info" to
at least help convey that it is per package?
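The two-level layout in the diagram above can be sketched in user space as follows. It consists of an array of per-package pointers, each pointing to a structure that ends in a flexible array of addresses. pkg_mmio_info is the name suggested above, and void * stands in for void __iomem *. The size arithmetic mirrors struct_size() but without its overflow saturation.

```c
#include <stdlib.h>

/* User-space stand-in for struct mmio_info; void * replaces void __iomem *. */
struct pkg_mmio_info {
	int num_regions;
	void *addrs[];		/* flexible array with num_regions entries */
};

static void free_pkginfo(struct pkg_mmio_info **pkginfo, int num_pkgs)
{
	if (!pkginfo)
		return;
	for (int i = 0; i < num_pkgs; i++)
		free(pkginfo[i]);	/* free(NULL) is a no-op */
	free(pkginfo);
}

/* Allocate pkginfo[num_pkgs], each entry sized for pkgcounts[i] addresses. */
static struct pkg_mmio_info **alloc_pkginfo(const int *pkgcounts, int num_pkgs)
{
	struct pkg_mmio_info **pkginfo = calloc(num_pkgs, sizeof(*pkginfo));

	if (!pkginfo)
		return NULL;

	for (int i = 0; i < num_pkgs; i++) {
		/* Equivalent of struct_size(pkginfo[i], addrs, pkgcounts[i]). */
		size_t sz = sizeof(struct pkg_mmio_info) +
			    (size_t)pkgcounts[i] * sizeof(void *);

		pkginfo[i] = calloc(1, sz);
		if (!pkginfo[i]) {
			/* Unallocated entries are still NULL from calloc(). */
			free_pkginfo(pkginfo, num_pkgs);
			return NULL;
		}
		pkginfo[i]->num_regions = pkgcounts[i];
	}
	return pkginfo;
}
```

Because the outer array is zero-filled, one free helper can clean up both a fully built table and a partially built one after an allocation failure.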

> +	int		num_regions;
> +	void __iomem	*addrs[] __counted_by(num_regions);
> +};
> +
>  /**
>   * struct event_group - All information about a group of telemetry events.
>   * @pfg:		Points to the aggregated telemetry space information
>   *			within the OOBMSM driver that contains data for all
>   *			telemetry regions.
> + * @pkginfo:		Per-package MMIO addresses of telemetry regions belonging to this group.
>   * @guid:		Unique number per XML description file.
>   * @mmio_size:		Number of bytes of MMIO registers for this group.
>   */
>  struct event_group {
>  	/* Data fields for additional structures to manage this group. */
>  	struct pmt_feature_group	*pfg;
> +	struct mmio_info		**pkginfo;
>  
>  	/* Remaining fields initialized from XML file. */
>  	u32				guid;
> @@ -81,6 +96,20 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
>  	return false;
>  }
>  
> +static void free_mmio_info(struct mmio_info **mmi)
> +{
> +	int num_pkgs = topology_max_packages();
> +
> +	if (!mmi)
> +		return;
> +
> +	for (int i = 0; i < num_pkgs; i++)
> +		kfree(mmi[i]);
> +	kfree(mmi);
> +}
> +
> +DEFINE_FREE(mmio_info, struct mmio_info **, free_mmio_info(_T))
> +
>  /*
>   * Configure events from one pmt_feature_group.
>   * 1) Count how many per package.

(no update to the function comments)

> @@ -88,8 +117,10 @@ static bool skip_this_region(struct telemetry_region *tr, struct event_group *e)
>   */
>  static int configure_events(struct event_group *e, struct pmt_feature_group *p)
>  {
> +	struct mmio_info **pkginfo __free(mmio_info) = NULL;
>  	int *pkgcounts __free(kfree) = NULL;
>  	struct telemetry_region *tr;
> +	struct mmio_info *mmi;
>  	int num_pkgs;
>  
>  	num_pkgs = topology_max_packages();
> @@ -99,6 +130,12 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
>  		tr = &p->regions[i];
>  		if (skip_this_region(tr, e))
>  			continue;
> +
> +		if (e->pkginfo) {
> +			pr_warn_once("Duplicate telemetry information for guid 0x%x\n", e->guid);
> +			return -EINVAL;
> +		}

It does not seem necessary to repeat this check for every telemetry region. Could this check
be moved to start of function to avoid parsing struct pmt_feature_group entirely?

> +
>  		if (!pkgcounts) {
>  			pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
>  			if (!pkgcounts)
> @@ -110,6 +147,32 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
>  	if (!pkgcounts)
>  		return -ENODEV;
>  
> +	/* Allocate array for per-package struct mmio_info data */
> +	pkginfo = kcalloc(num_pkgs, sizeof(*pkginfo), GFP_KERNEL);
> +	if (!pkginfo)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Allocate per-package mmio_info structures and initialize
> +	 * count of telemetry_regions in each one.
> +	 */
> +	for (int i = 0; i < num_pkgs; i++) {
> +		pkginfo[i] = kzalloc(struct_size(pkginfo[i], addrs, pkgcounts[i]), GFP_KERNEL);
> +		if (!pkginfo[i])
> +			return -ENOMEM;
> +		pkginfo[i]->num_regions = pkgcounts[i];
> +	}
> +
> +	/* Save MMIO address(es) for each telemetry region in per-package structures */
> +	for (int i = 0; i < p->count; i++) {
> +		tr = &p->regions[i];
> +		if (skip_this_region(tr, e))
> +			continue;
> +		mmi = pkginfo[tr->plat_info.package_id];
> +		mmi->addrs[--pkgcounts[tr->plat_info.package_id]] = tr->addr;
> +	}
> +	e->pkginfo = no_free_ptr(pkginfo);
> +
>  	return 0;
>  }
>  
> @@ -169,5 +232,6 @@ void __exit intel_aet_exit(void)
>  			intel_pmt_put_feature_group((*peg)->pfg);
>  			(*peg)->pfg = NULL;
>  		}
> +		free_mmio_info((*peg)->pkginfo);
>  	}
>  }

Reinette
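The two-pass setup in configure_events() counts regions per package, allocates a per-package struct with a flexible address array, then fills it in. It can be modelled in userspace as below. This is a simplified sketch under stated assumptions, not the kernel code: calloc()/free() stand in for kcalloc()/kfree(), the skip_this_region() filter and the -ENODEV "no regions" case are omitted, and the names `region`, `pkg_mmio_info` (the rename suggested above) and `build_pkginfo()` are hypothetical. Following the review comment, the duplicate check runs once at the top instead of inside the region loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct telemetry_region: package and MMIO base. */
struct region {
	unsigned int package_id;
	void *addr;
};

/* Per-package MMIO addresses; mirrors struct mmio_info from the patch. */
struct pkg_mmio_info {
	unsigned int num_regions;
	void *addrs[];		/* flexible array with num_regions entries */
};

static void free_pkginfo(struct pkg_mmio_info **pi, unsigned int num_pkgs)
{
	if (!pi)
		return;
	for (unsigned int i = 0; i < num_pkgs; i++)
		free(pi[i]);	/* free(NULL) is fine for unallocated slots */
	free(pi);
}

/*
 * Build per-package address arrays in *out. Returns -EINVAL if *out is
 * already populated (duplicate group), checked once before any parsing.
 */
static int build_pkginfo(struct pkg_mmio_info ***out, const struct region *r,
			 unsigned int count, unsigned int num_pkgs)
{
	struct pkg_mmio_info **pi;
	unsigned int *pkgcounts;

	if (*out)
		return -EINVAL;

	pkgcounts = calloc(num_pkgs, sizeof(*pkgcounts));
	pi = calloc(num_pkgs, sizeof(*pi));
	if (!pkgcounts || !pi)
		goto nomem;

	/* Pass 1: count regions per package. */
	for (unsigned int i = 0; i < count; i++) {
		assert(r[i].package_id < num_pkgs);
		pkgcounts[r[i].package_id]++;
	}

	/* Allocate each package's struct sized for its region count. */
	for (unsigned int i = 0; i < num_pkgs; i++) {
		pi[i] = calloc(1, sizeof(*pi[i]) + pkgcounts[i] * sizeof(void *));
		if (!pi[i])
			goto nomem;
		pi[i]->num_regions = pkgcounts[i];
	}

	/* Pass 2: store addresses, counting each package's slot index down. */
	for (unsigned int i = 0; i < count; i++) {
		unsigned int p = r[i].package_id;

		pi[p]->addrs[--pkgcounts[p]] = r[i].addr;
	}

	free(pkgcounts);
	*out = pi;
	return 0;

nomem:
	free(pkgcounts);
	free_pkginfo(pi, num_pkgs);
	return -ENOMEM;
}
```

Because the counts are decremented while filling, each package's addresses end up stored in reverse discovery order; that is harmless when the reader simply sums over all of them.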

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH v6 20/30] x86,fs/resctrl: Fill in details of Clearwater Forest events
  2025-06-26 16:49 ` [PATCH v6 20/30] x86,fs/resctrl: Fill in details of Clearwater Forest events Tony Luck
@ 2025-07-09  3:00   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09  3:00 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Clearwater Forest supports two energy related telemetry events
> and seven perf style events. The counters are arranged in per-RMID
> blocks like this:
> 
> 	MMIO offset:0x00 Counter for RMID 0 Event 0
> 	MMIO offset:0x08 Counter for RMID 0 Event 1
> 	MMIO offset:0x10 Counter for RMID 0 Event 2
> 	MMIO offset:0x18 Counter for RMID 1 Event 0
> 	MMIO offset:0x20 Counter for RMID 1 Event 1
> 	MMIO offset:0x28 Counter for RMID 1 Event 2
> 	...

It is a bit unexpected that this patch is (a) specific to Clearwater Forest,
(b) it is noted that Clearwater Forest has _two_ energy related events and
_seven_ perf related events ... but then the example is for a layout with
_three_ events?
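For reference, the per-RMID block layout above places the counter for event index idx of a given RMID at (rmid * num_events + idx) * 8 bytes, which is the same arithmetic intel_aet_read_event() uses later in this series. The sketch below (helper names are hypothetical) reproduces the three-event table and shows how the offsets would shift for a seven-event group such as the perf group described in the commit message, where RMID 1 would start at 0x38.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the counter for (rmid, idx) in a per-RMID block layout. */
static size_t counter_offset(unsigned int rmid, unsigned int num_events,
			     unsigned int idx)
{
	assert(idx < num_events);	/* idx indexes within one RMID block */
	return ((size_t)rmid * num_events + idx) * sizeof(uint64_t);
}

/* True if the 8-byte counter at offset lies within mmio_size bytes. */
static bool counter_in_range(size_t offset, size_t mmio_size)
{
	return offset + sizeof(uint64_t) <= mmio_size;
}
```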

> 
> Define these events in the file system code and add the events
> to the event_group structures.
> 
> PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point
> format. File system code must output as floating point values.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl_types.h           | 11 ++++++++
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 33 +++++++++++++++++++++++
>  fs/resctrl/monitor.c                    | 35 ++++++++++++++-----------
>  3 files changed, 64 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/resctrl_types.h b/include/linux/resctrl_types.h
> index d98351663c2c..6838b02d5ca3 100644
> --- a/include/linux/resctrl_types.h
> +++ b/include/linux/resctrl_types.h
> @@ -47,6 +47,17 @@ enum resctrl_event_id {
>  	QOS_L3_MBM_TOTAL_EVENT_ID	= 0x02,
>  	QOS_L3_MBM_LOCAL_EVENT_ID	= 0x03,
>  
> +	/* Intel Telemetry Events */
> +	PMT_EVENT_ENERGY,
> +	PMT_EVENT_ACTIVITY,
> +	PMT_EVENT_STALLS_LLC_HIT,
> +	PMT_EVENT_C1_RES,
> +	PMT_EVENT_UNHALTED_CORE_CYCLES,
> +	PMT_EVENT_STALLS_LLC_MISS,
> +	PMT_EVENT_AUTO_C6_RES,
> +	PMT_EVENT_UNHALTED_REF_CYCLES,
> +	PMT_EVENT_UOPS_RETIRED,
> +
>  	/* Must be the last */
>  	QOS_NUM_EVENTS,
>  };
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index c770039b2525..f9b2959693a0 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -32,6 +32,20 @@ struct mmio_info {
>  	void __iomem	*addrs[] __counted_by(num_regions);
>  };
>  
> +/**
> + * struct pmt_event - Telemetry event.
> + * @id:		Resctrl event id.
> + * @idx:	Counter index within each per-RMID block of counters.
> + * @bin_bits:	Zero for integer valued events, else number of bits in fixed-point.
> + */
> +struct pmt_event {
> +	enum resctrl_event_id	id;
> +	int			idx;
> +	int			bin_bits;

As I understand a negative value will be inappropriate for idx as well as bin_bits.
It looks like "unsigned int" is more appropriate?

> +};
> +
> +#define EVT(_id, _idx, _bits) { .id = _id, .idx = _idx, .bin_bits = _bits }
> +
>  /**
>   * struct event_group - All information about a group of telemetry events.
>   * @pfg:		Points to the aggregated telemetry space information
> @@ -40,6 +54,8 @@ struct mmio_info {
>   * @pkginfo:		Per-package MMIO addresses of telemetry regions belonging to this group.
>   * @guid:		Unique number per XML description file.
>   * @mmio_size:		Number of bytes of MMIO registers for this group.
> + * @num_events:		Number of events in this group.
> + * @evts:		Array of event descriptors.
>   */
>  struct event_group {
>  	/* Data fields for additional structures to manage this group. */
> @@ -49,6 +65,8 @@ struct event_group {
>  	/* Remaining fields initialized from XML file. */
>  	u32				guid;
>  	size_t				mmio_size;
> +	int				num_events;

unsigned int also seems more appropriate to reflect this is a value that
can never be negative. Also relevant to mmio_info::num_regions.

> +	struct pmt_event		evts[] __counted_by(num_events);
>  };
>  
>  #define XML_MMIO_SIZE(num_rmids, num_events, num_extra_status)	\

Reinette
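The commit message says PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY arrive in binary fixed-point format and must be shown as floating point. A minimal sketch of that conversion follows, assuming bin_bits is the number of fractional bits (as the pmt_event kernel-doc describes) and using unsigned int as suggested above. The six decimal places, truncation instead of rounding, and the names split_fixed()/print_fixed() are arbitrary choices for illustration, not necessarily what resctrl prints.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_BINARY_BITS	27	/* same limit the series defines in resctrl.h */

/*
 * Split a value with bin_bits fractional bits into its whole part and its
 * fraction in millionths (truncated, not rounded).
 */
static void split_fixed(uint64_t val, unsigned int bin_bits,
			uint64_t *whole, uint64_t *micro)
{
	uint64_t frac_mask;

	assert(bin_bits <= MAX_BINARY_BITS);
	frac_mask = (UINT64_C(1) << bin_bits) - 1;	/* 0 when bin_bits == 0 */
	*whole = val >> bin_bits;
	/* The fraction is below 2^27, so multiplying by 10^6 cannot overflow. */
	*micro = ((val & frac_mask) * 1000000) >> bin_bits;
}

/* Print integer events as-is and fixed-point events with six decimals. */
static void print_fixed(uint64_t val, unsigned int bin_bits)
{
	uint64_t whole, micro;

	split_fixed(val, bin_bits, &whole, &micro);
	if (bin_bits)
		printf("%" PRIu64 ".%06" PRIu64 "\n", whole, micro);
	else
		printf("%" PRIu64 "\n", whole);
}
```

For example, a raw value of 13 with bin_bits = 2 is 3.25, so print_fixed(13, 2) prints "3.250000".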


* Re: [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer
  2025-06-26 16:49 ` [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer Tony Luck
@ 2025-07-09  3:21   ` Reinette Chatre
  2025-07-09 21:16     ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09  3:21 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> The resctrl file system layer passed the domain, rmid, and event id to
> resctrl_arch_rmid_read() to fetch an event counter.

Please write in present tense and use uppercase for acronyms.

  The resctrl file system layer passes the domain, RMID, and event id to
  resctrl_arch_rmid_read() to fetch an event counter.


> 
> For some resources this may not be enough information to efficiently
> access the counter.
> 
> Add mon_evt::arch_priv void pointer. Architecture code can initialize
> this when marking each event enabled.
> 
> File system code passes this pointer to resctrl_arch_rmid_read().
> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h               |  6 ++++--
>  fs/resctrl/internal.h                 |  1 +
>  arch/x86/kernel/cpu/resctrl/core.c    |  6 +++---
>  arch/x86/kernel/cpu/resctrl/monitor.c |  2 +-
>  fs/resctrl/monitor.c                  | 12 ++++++++----
>  5 files changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 76c54b81e426..b9f2690bee1e 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -383,7 +383,8 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>  
>  #define MAX_BINARY_BITS	27
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits);
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
> +			      u32 binary_bits, void *arch_priv);
>  
>  bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
>  
> @@ -478,6 +479,7 @@ void resctrl_arch_pre_mount(void);
>   *			only.
>   * @rmid:		rmid of the counter to read.
>   * @eventid:		eventid to read, e.g. L3 occupancy.
> + * @arch_priv:		architecture private data for this event.

Please append some detail on how it is used. For example,
	"Architecture private data for this event. The @arch_priv provided by
	 the architecture via resctrl_enable_mon_event()."

>   * @val:		result of the counter read in bytes.
>   * @arch_mon_ctx:	An architecture specific value from
>   *			resctrl_arch_mon_ctx_alloc(), for MPAM this identifies
> @@ -495,7 +497,7 @@ void resctrl_arch_pre_mount(void);
>   */
>  int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
>  			   u32 closid, u32 rmid, enum resctrl_event_id eventid,
> -			   u64 *val, void *arch_mon_ctx);
> +			   void *arch_priv, u64 *val, void *arch_mon_ctx);
>  
>  /**
>   * resctrl_arch_rmid_read_context_check()  - warn about invalid contexts
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index 53ced959a27d..2126006075f3 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -71,6 +71,7 @@ struct mon_evt {
>  	bool			is_floating_point;
>  	int			binary_bits;
>  	bool			enabled;
> +	void			*arch_priv;
>  };
>  
>  extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 9144766da836..f3144fe918dd 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -909,15 +909,15 @@ static __init bool get_rdt_mon_resources(void)
>  	bool ret = false;
>  
>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
> -		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0);
> +		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0);
> +		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
>  		ret = true;
>  	}
>  	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
> -		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0);
> +		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0, NULL);
>  		ret = true;
>  	}
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 043f777378a6..185b203f6321 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -219,7 +219,7 @@ static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width)
>  
>  int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
>  			   u32 unused, u32 rmid, enum resctrl_event_id eventid,
> -			   u64 *val, void *ignored)
> +			   void *arch_priv, u64 *val, void *ignored)
>  {
>  	int cpu = cpumask_any(&hdr->cpu_mask);
>  	struct rdt_hw_l3_mon_domain *hw_dom;
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index cff8af3a263e..c4b092aec9f8 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -160,7 +160,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
>  
>  		entry = __rmid_entry(idx);
>  		if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
> -					   QOS_L3_OCCUP_EVENT_ID, &val,
> +					   QOS_L3_OCCUP_EVENT_ID, NULL, &val,

This is resctrl fs code. To maintain clear separation it should not assume architecture
behavior, which this does by setting arch_priv to NULL because x86 does not use it.

>  					   arch_mon_ctx)) {
>  			rmid_dirty = true;
>  		} else {
> @@ -402,7 +402,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>  			return -EINVAL;
>  
>  		rr->err = resctrl_arch_rmid_read(rr->r, rr->hdr, closid, rmid,
> -						 rr->evt->evtid, &tval, rr->arch_mon_ctx);
> +						 rr->evt->evtid, rr->evt->arch_priv,
> +						 &tval, rr->arch_mon_ctx);
>  		if (rr->err)
>  			return rr->err;
>  
> @@ -430,7 +431,8 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>  		if (d->ci_id != rr->ci_id)
>  			continue;
>  		err = resctrl_arch_rmid_read(rr->r, &d->hdr, closid, rmid,
> -					     rr->evt->evtid, &tval, rr->arch_mon_ctx);
> +					     rr->evt->evtid, rr->evt->arch_priv,
> +					     &tval, rr->arch_mon_ctx);
>  		if (!err) {
>  			rr->val += tval;
>  			ret = 0;
> @@ -902,7 +904,8 @@ struct mon_evt mon_event_all[QOS_NUM_EVENTS] = {
>  	MON_EVENT(PMT_EVENT_UOPS_RETIRED,		"uops_retired",		RDT_RESOURCE_PERF_PKG,	false),
>  };
>  
> -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
> +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
> +			      u32 binary_bits, void *arch_priv)
>  {
>  	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS) ||
>  			 binary_bits > MAX_BINARY_BITS)
> @@ -918,6 +921,7 @@ void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 b
>  
>  	mon_event_all[eventid].any_cpu = any_cpu;
>  	mon_event_all[eventid].binary_bits = binary_bits;
> +	mon_event_all[eventid].arch_priv = arch_priv;
>  	mon_event_all[eventid].enabled = true;
>  }
>  

Reinette
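The arch_priv plumbing amounts to the filesystem storing an opaque pointer per event at enable time and handing it back unchanged on every read. The simplified userspace model below uses hypothetical names and trimmed fields. Its fs_read() always passes the stored pointer, including for the occupancy event, which is one possible way to address the comment above about __check_limbo() hardcoding NULL. Whether the series adopts exactly this is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum event_id { EVT_OCCUP, EVT_ENERGY, NUM_EVTS };	/* hypothetical subset */

/* Trimmed model of struct mon_evt: only the fields used here. */
struct mon_evt {
	int enabled;
	void *arch_priv;	/* opaque to the "fs" side */
};

static struct mon_evt mon_event_all[NUM_EVTS];

/* "fs" side: record the architecture's pointer when it enables an event. */
static void enable_mon_event(enum event_id id, void *arch_priv)
{
	assert(id < NUM_EVTS);
	mon_event_all[id].arch_priv = arch_priv;
	mon_event_all[id].enabled = 1;
}

/* "arch" side: private per-event data that the fs never dereferences. */
struct arch_event {
	unsigned int idx;
};

static int arch_read(enum event_id id, void *arch_priv, uint64_t *val)
{
	const struct arch_event *ae = arch_priv;

	(void)id;
	/* An arch that supplied no private data simply sees NULL here. */
	*val = ae ? 100 + ae->idx : 0;
	return 0;
}

/* "fs" side: pass back whatever was stored, never a literal NULL. */
static int fs_read(enum event_id id, uint64_t *val)
{
	return arch_read(id, mon_event_all[id].arch_priv, val);
}
```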


* Re: [PATCH v6 22/30] x86/resctrl: Read core telemetry events
  2025-06-26 16:49 ` [PATCH v6 22/30] x86/resctrl: Read core telemetry events Tony Luck
@ 2025-07-09 15:48   ` Reinette Chatre
  2025-07-09 21:57     ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 15:48 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

What does the "core" in the subject refer to?

On 6/26/25 9:49 AM, Tony Luck wrote:
> The resctrl file system passes requests to read event monitor files to
> the architecture resctrl_arch_rmid_read() to collect values
> from hardware counters.
> 
> Use the resctrl resource to differentiate calls to read legacy L3
> events from calls to read the new telemetry events (which are attached
> to RDT_RESOURCE_PERF_PKG).
> 
> There may be multiple aggregators tracking each package, so scan all of
> them and add up all counters.
> 
> Enable the events, marking them as readable from any CPU and providing
> a mon_evt::arch_priv pointer to the struct pmt_event for each
> event.
> 
> At run time when a user reads an event file the file system code
> provides the enum resctrl_event_id for the event and the arch_priv
> pointer that was supplied when the event was enabled.
> 
> Resctrl now uses readq() so depends on X86_64. Update Kconfig.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h  |  2 ++
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 46 +++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c   |  3 ++
>  arch/x86/Kconfig                        |  2 +-
>  4 files changed, 52 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e93b15bf6aab..e8d2a754bc0c 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -171,5 +171,7 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  
>  bool intel_aet_get_events(void);
>  void __exit intel_aet_exit(void);
> +int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
> +			 void *arch_priv, u64 *val);
>  
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index f9b2959693a0..10fd8b04105e 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -14,6 +14,7 @@
>  #include <linux/cleanup.h>
>  #include <linux/cpu.h>
>  #include <linux/intel_vsec.h>
> +#include <linux/io.h>
>  #include <linux/resctrl.h>
>  #include <linux/slab.h>
>  
> @@ -206,6 +207,13 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
>  	}
>  	e->pkginfo = no_free_ptr(pkginfo);
>  
> +	for (int i = 0; i < e->num_events; i++) {
> +		enum resctrl_event_id eventid;
> +
> +		eventid = e->evts[i].id;
> +		resctrl_enable_mon_event(eventid, true, e->evts[i].bin_bits, &e->evts[i]);
> +	}
> +
>  	return 0;
>  }
>  
> @@ -268,3 +276,41 @@ void __exit intel_aet_exit(void)
>  		free_mmio_info((*peg)->pkginfo);
>  	}
>  }
> +
> +#define DATA_VALID	BIT_ULL(63)
> +#define DATA_BITS	GENMASK_ULL(62, 0)
> +
> +/*
> + * Read counter for an event on a domain (summing all aggregators
> + * on the domain).
> + */
> +int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id eventid,
> +			 void *arch_priv, u64 *val)
> +{
> +	struct pmt_event *pevt = arch_priv;
> +	struct mmio_info *mmi;
> +	struct event_group *e;
> +	u64 evtcount;
> +	void *pevt0;

Should this be a struct pmt_event *?

> +	int idx;
> +
> +	pevt0 = pevt - pevt->idx;
> +	e = container_of(pevt0, struct event_group, evts);
> +	idx = rmid * e->num_events;
> +	idx += pevt->idx;
> +	mmi = e->pkginfo[domid];
> +
> +	if (idx * sizeof(u64) + sizeof(u64) > e->mmio_size) {
> +		pr_warn_once("MMIO index %d out of range\n", idx);
> +		return -EIO;
> +	}
> +
> +	for (int i = 0; i < mmi->num_regions; i++) {
> +		evtcount = readq(mmi->addrs[i] + idx * sizeof(u64));
> +		if (!(evtcount & DATA_VALID))
> +			return -EINVAL;
> +		*val += evtcount & DATA_BITS;
> +	}
> +
> +	return 0;
> +}
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 185b203f6321..51d7d99336c6 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -232,6 +232,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
>  
>  	resctrl_arch_rmid_read_context_check();
>  
> +	if (r->rid == RDT_RESOURCE_PERF_PKG)
> +		return intel_aet_read_event(hdr->id, rmid, eventid, arch_priv, val);
> +
>  	if (r->rid != RDT_RESOURCE_L3)
>  		return -EINVAL;
>  

This part looks good to me. I expect Kconfig part to look different next time.

Reinette
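The read path sums one counter across every aggregator on the package and rejects the read if any aggregator has not set bit 63. The model below works without MMIO: the array stands in for the readq() results, and the function name is hypothetical. Unlike the patch, which adds into *val, the sketch accumulates into a local and stores only on success, so the caller need not pre-zero *val. That is a design choice of the sketch, not something requested in review.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DATA_VALID	(UINT64_C(1) << 63)	/* BIT_ULL(63) */
#define DATA_BITS	(DATA_VALID - 1)	/* GENMASK_ULL(62, 0) */

/*
 * Sum one counter across all aggregators of a package. counters[i] is
 * the raw 64-bit value read from aggregator i (readq() in the kernel).
 */
static int sum_aggregators(const uint64_t *counters, unsigned int nr,
			   uint64_t *val)
{
	uint64_t sum = 0;

	assert(nr == 0 || counters);
	for (unsigned int i = 0; i < nr; i++) {
		if (!(counters[i] & DATA_VALID))
			return -EINVAL;	/* any invalid aggregator fails the read */
		sum += counters[i] & DATA_BITS;
	}
	*val = sum;	/* only written on success */
	return 0;
}
```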



* Re: [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters
  2025-07-08 21:46   ` Reinette Chatre
@ 2025-07-09 16:52     ` Luck, Tony
  0 siblings, 0 replies; 89+ messages in thread
From: Luck, Tony @ 2025-07-09 16:52 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Tue, Jul 08, 2025 at 02:46:00PM -0700, Reinette Chatre wrote:
> > -void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu)
> > +void resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu, u32 binary_bits)
> >  {
> > -	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
> > +	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS) ||
> > +			 binary_bits > MAX_BINARY_BITS)
> 
> This alignment is off.

I think the problem is location of the parentheses. An invalid
"binary_bits" value should trigger the WARN, just as an invalid
eventid.

So I'll change to:

	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
			 binary_bits > MAX_BINARY_BITS))

-Tony
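The effect of moving the closing parenthesis can be checked with a stand-in for WARN_ON_ONCE(). In the sketch below, warn_on() returns its condition and counts how often it fired; unlike the real macro it fires every time, not once. The QOS_FIRST_EVENT/QOS_NUM_EVENTS values are placeholders for this sketch only. With the v6 placement an oversized binary_bits is rejected silently; with the corrected placement it also warns.

```c
#include <assert.h>
#include <stdbool.h>

#define QOS_FIRST_EVENT	1	/* placeholder values for this sketch */
#define QOS_NUM_EVENTS	4
#define MAX_BINARY_BITS	27

static int warnings;	/* how many times the stand-in warning fired */

/* Stand-in for WARN_ON_ONCE(): returns cond and records that it fired. */
static bool warn_on(bool cond)
{
	if (cond)
		warnings++;
	return cond;
}

/* v6 placement: an oversized binary_bits is rejected but not warned about. */
static bool invalid_v6(int eventid, unsigned int binary_bits)
{
	return warn_on(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS) ||
	       binary_bits > MAX_BINARY_BITS;
}

/* Corrected placement: every invalid argument triggers the warning. */
static bool invalid_fixed(int eventid, unsigned int binary_bits)
{
	return warn_on(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS ||
		       binary_bits > MAX_BINARY_BITS);
}
```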


* Re: [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package
  2025-07-09  2:20   ` Reinette Chatre
@ 2025-07-09 18:12     ` Luck, Tony
  2025-07-09 22:13       ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-09 18:12 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Tue, Jul 08, 2025 at 07:20:35PM -0700, Reinette Chatre wrote:
> As I understand there is 1:1 relationship between struct event_group and struct pmt_feature_group.
> It thus seems unnecessary to loop through all the telemetry regions of a struct pmt_feature_group
> if it is known to not be associated with the "event group"?
> Could it be helpful to add a new (hardcoded) event_group::id that is of type enum pmt_feature_id
> that can be used to ensure that only relevant struct pmt_feature_group is used to discover events
> for a particular struct event_group?
> 
> Another consideration is that this implementation seems to require that guids are unique across
> all telemetry regions of all RMID telemetry features, is this guaranteed?

The guids are unique. The XML file tags them like this:

	<TELEM:uniqueid>26557651</TELEM:uniqueid>

the "guid" naming of the value comes from the Intel PMT_DISCOVERY driver.

An alternative to adding the new event_group::id field would be to
separate the arrays of known event groups. I.e. change from:

static struct event_group *known_event_groups[] = {
        &energy_0x26696143,
        &perf_0x26557651,
};

to

static struct event_group *known_energy_event_groups[] = {
        &energy_0x26696143,
};

static struct event_group *known_perf_event_groups[] = {
        &perf_0x26557651,
};

then only scan the appropriate array that matches the
enum pmt_feature_id passed to get_pmt_feature().


With only one option in each array today this looks
like extra infrastructure. But I already have a patch
for the next generation system that adds another guid.

-Tony
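A sketch of the split-array lookup described above: only the array matching the requested feature is scanned, so a guid that happened to be reused under a different telemetry feature would not be matched. The FEAT_* enum values and find_group() are hypothetical stand-ins (the real enum pmt_feature_id names may differ), and struct event_group is trimmed to its guid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature ids; the real enum pmt_feature_id may differ. */
enum feature_id { FEAT_ENERGY, FEAT_PERF };

struct event_group {
	unsigned int guid;	/* trimmed: only the guid is needed here */
};

static struct event_group energy_0x26696143 = { .guid = 0x26696143 };
static struct event_group perf_0x26557651 = { .guid = 0x26557651 };

static struct event_group *known_energy_event_groups[] = { &energy_0x26696143 };
static struct event_group *known_perf_event_groups[] = { &perf_0x26557651 };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Look up a guid only among the groups for the requested feature. */
static struct event_group *find_group(enum feature_id feat, unsigned int guid)
{
	struct event_group **groups;
	size_t n;

	switch (feat) {
	case FEAT_ENERGY:
		groups = known_energy_event_groups;
		n = ARRAY_SIZE(known_energy_event_groups);
		break;
	case FEAT_PERF:
		groups = known_perf_event_groups;
		n = ARRAY_SIZE(known_perf_event_groups);
		break;
	default:
		return NULL;
	}

	for (size_t i = 0; i < n; i++)
		if (groups[i]->guid == guid)
			return groups[i];
	return NULL;
}
```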



* Re: [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer
  2025-07-09  3:21   ` Reinette Chatre
@ 2025-07-09 21:16     ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 21:16 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 7/8/25 8:21 PM, Reinette Chatre wrote:
> On 6/26/25 9:49 AM, Tony Luck wrote:
>> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
>> index 53ced959a27d..2126006075f3 100644
>> --- a/fs/resctrl/internal.h
>> +++ b/fs/resctrl/internal.h
>> @@ -71,6 +71,7 @@ struct mon_evt {
>>  	bool			is_floating_point;
>>  	int			binary_bits;
>>  	bool			enabled;
>> +	void			*arch_priv;
>>  };

Missed this earlier, please make associated change 
to kernel-doc. Please include kernel-doc check in your
patch preparation:
 ./scripts/kernel-doc -v -none <files>

Reinette


* Re: [PATCH v6 22/30] x86/resctrl: Read core telemetry events
  2025-07-09 15:48   ` Reinette Chatre
@ 2025-07-09 21:57     ` Luck, Tony
  2025-07-09 22:13       ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-09 21:57 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Wed, Jul 09, 2025 at 08:48:47AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> What does the "core" in the subject refer to?

The events are collected by each core. But since resctrl reports the
aggregated per-package values this is confusing. I'll drop "core" from
the Subject line.

[snip]

> > +/*
> > + * Read counter for an event on a domain (summing all aggregators
> > + * on the domain).
> > + */
> > +int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id eventid,
> > +			 void *arch_priv, u64 *val)
> > +{
> > +	struct pmt_event *pevt = arch_priv;
> > +	struct mmio_info *mmi;
> > +	struct event_group *e;
> > +	u64 evtcount;
> > +	void *pevt0;
> 
> Should this be a struct pmt_event *?

I thought that too. But container_of() gets confused about types (I
think because the evts[] element is a flex array).

With "struct pmt_event *pevt0;" the compiler complains:

arch/x86/kernel/cpu/resctrl/intel_aet.c: In function ‘intel_aet_read_event’:
./include/linux/build_bug.h:78:41: error: static assertion failed: "pointer type mismatch in container_of()"
   78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
      |                                         ^~~~~~~~~~~~~~
./include/linux/build_bug.h:77:34: note: in expansion of macro ‘__static_assert’
   77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
      |                                  ^~~~~~~~~~~~~~~
./include/linux/container_of.h:20:9: note: in expansion of macro ‘static_assert’
   20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
      |         ^~~~~~~~~~~~~
arch/x86/kernel/cpu/resctrl/intel_aet.c:311:13: note: in expansion of macro ‘container_of’
  311 |         e = container_of(pevt0, struct event_group, evts);
      |             ^~~~~~~~~~~~

Making it void * is the "get out of jail free" case in container_of()
with the test " || __same_type(*(ptr), void)"

If there is a better way to do this, let me know.

-Tony
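One possible alternative, not verified against the kernel's container_of(), is to name the first array element as the member: container_of(pevt0, struct event_group, evts[0]). The member's type is then struct pmt_event rather than an array type, so pevt0 could stay a struct pmt_event *. The userspace model below uses a plain offsetof()-based container_of without the kernel's type assertion, and relies on offsetof() accepting an array-element designator, as GCC and Clang do. The struct layouts are trimmed for illustration and group_of() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal container_of without the kernel's static type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct pmt_event {
	unsigned int id;
	unsigned int idx;	/* position of this entry in evts[] */
};

struct event_group {
	unsigned int num_events;
	struct pmt_event evts[];
};

/* Recover the owning group from a pointer to any element of evts[]. */
static struct event_group *group_of(struct pmt_event *pevt)
{
	struct pmt_event *pevt0 = pevt - pevt->idx;	/* &evts[0] */

	return container_of(pevt0, struct event_group, evts[0]);
}
```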


* Re: [PATCH v6 12/30] fs/resctrl: Make event details accessible to functions when reading events
  2025-06-26 16:49 ` [PATCH v6 12/30] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
@ 2025-07-09 22:12   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:12 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> All details about a monitor event are kept in the mon_evt structure.
> Upper levels of code only provide the event id to lower levels.
> This will become a problem when new attributes are added to the
> mon_evt structure.
> 
> Change the mon_data and rmid_read structures to hold a pointer
> to the mon_evt structure instead of just taking a copy of the
> event id.
> 
> No functional change.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  fs/resctrl/internal.h    |  8 ++++----
>  fs/resctrl/ctrlmondata.c | 16 ++++++++--------
>  fs/resctrl/monitor.c     | 17 +++++++++--------
>  fs/resctrl/rdtgroup.c    |  6 +++---
>  4 files changed, 24 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/resctrl/internal.h b/fs/resctrl/internal.h
> index b12242d20e61..1458fda64423 100644
> --- a/fs/resctrl/internal.h
> +++ b/fs/resctrl/internal.h
> @@ -90,7 +90,7 @@ extern struct mon_evt mon_event_all[QOS_NUM_EVENTS];
>  struct mon_data {
>  	struct list_head	list;
>  	enum resctrl_res_level	rid;
> -	enum resctrl_event_id	evtid;
> +	struct mon_evt		*evt;

Please also update associated kernel-doc.

Reinette
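With mon_data holding a pointer to the event's struct mon_evt, lower-level readers can consult any event attribute directly instead of looking it up by id. The sketch below is trimmed to the fields visible in the hunk plus the mon_evt attributes it reads. The @evt kernel-doc wording is only a suggestion, and frac_bits_for() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed model of struct mon_evt (only the fields used here). */
struct mon_evt {
	int evtid;
	bool is_floating_point;
	int binary_bits;
};

/**
 * struct mon_data - Monitor data for one event file (trimmed sketch).
 * @rid:	Resource id (plain int here for brevity).
 * @evt:	Pointer to the event's struct mon_evt, giving readers access
 *		to all event attributes rather than just the event id.
 */
struct mon_data {
	int rid;
	struct mon_evt *evt;
};

/* A lower-level reader consults event attributes through the pointer. */
static int frac_bits_for(const struct mon_data *md)
{
	assert(md && md->evt);
	return md->evt->is_floating_point ? md->evt->binary_bits : 0;
}
```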



* Re: [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package
  2025-07-09 18:12     ` Luck, Tony
@ 2025-07-09 22:13       ` Reinette Chatre
  2025-07-09 22:48         ` Luck, Tony
  0 siblings, 1 reply; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:13 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/9/25 11:12 AM, Luck, Tony wrote:
> On Tue, Jul 08, 2025 at 07:20:35PM -0700, Reinette Chatre wrote:
>> As I understand there is 1:1 relationship between struct event_group and struct pmt_feature_group.
>> It thus seems unnecessary to loop through all the telemetry regions of a struct pmt_feature_group
>> if it is known to not be associated with the "event group"?
>> Could it be helpful to add a new (hardcoded) event_group::id that is of type enum pmt_feature_id
>> that can be used to ensure that only relevant struct pmt_feature_group is used to discover events
>> for a particular struct event_group?
>>
>> Another consideration is that this implementation seems to require that guids are unique across
>> all telemetry regions of all RMID telemetry features, is this guaranteed?
> 
> The guids are unique. The XML file tags them like this:
> 
> 	<TELEM:uniqueid>26557651</TELEM:uniqueid>

I interpret above that guid is expected to be unique for one
telemetry feature. It is not clear to me that it implies that the guid
is unique across all telemetry features. For example, what prevents
a platform from using the same guid for all the telemetry features it
supports?

> 
> the "guid" naming of the value comes from the Intel PMT_DISCOVERY driver.
> 
> An alternative to adding the new event_group::id field would be to
> separate the arrays of known event groups. I.e. change from:
> 
> static struct event_group *known_event_groups[] = {
>         &energy_0x26696143,
>         &perf_0x26557651,
> };
> 
> to
> 
> static struct event_group *known_energy_event_groups[] = {
>         &energy_0x26696143,
> };
> 
> static struct event_group *known_perf_event_groups[] = {
>         &perf_0x26557651,
> };
> 
> then only scan the appropriate array that matches the
> enum pmt_feature_id passed to get_pmt_feature().
> 
> 
> With only one option in each array today this looks
> like extra infrastructure. But I already have a patch
> for the next generation system that adds another guid.

This also sounds good. Thank you.

Reinette



* Re: [PATCH v6 22/30] x86/resctrl: Read core telemetry events
  2025-07-09 21:57     ` Luck, Tony
@ 2025-07-09 22:13       ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:13 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/9/25 2:57 PM, Luck, Tony wrote:
> On Wed, Jul 09, 2025 at 08:48:47AM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> What does the "core" in the subject refer to?
> 
> The events are collected by each core. But since resctrl reports the
> aggregated per-package values this is confusing. I'll drop "core" from
> the Subject line.
> 
> [snip]
> 
>>> +/*
>>> + * Read counter for an event on a domain (summing all aggregators
>>> + * on the domain).
>>> + */
>>> +int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id eventid,
>>> +			 void *arch_priv, u64 *val)
>>> +{
>>> +	struct pmt_event *pevt = arch_priv;
>>> +	struct mmio_info *mmi;
>>> +	struct event_group *e;
>>> +	u64 evtcount;
>>> +	void *pevt0;
>>
>> Should this be a struct pmt_event *?
> 
> I thought that too. But container_of() gets confused about types (I
> think because the evts[] element is a flex array).
> 
> With "struct pmt_event *pevt0;" the compiler complains:
> 
> arch/x86/kernel/cpu/resctrl/intel_aet.c: In function ‘intel_aet_read_event’:
> ./include/linux/build_bug.h:78:41: error: static assertion failed: "pointer type mismatch in container_of()"
>    78 | #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
>       |                                         ^~~~~~~~~~~~~~
> ./include/linux/build_bug.h:77:34: note: in expansion of macro ‘__static_assert’
>    77 | #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
>       |                                  ^~~~~~~~~~~~~~~
> ./include/linux/container_of.h:20:9: note: in expansion of macro ‘static_assert’
>    20 |         static_assert(__same_type(*(ptr), ((type *)0)->member) ||       \
>       |         ^~~~~~~~~~~~~
> arch/x86/kernel/cpu/resctrl/intel_aet.c:311:13: note: in expansion of macro ‘container_of’
>   311 |         e = container_of(pevt0, struct event_group, evts);
>       |             ^~~~~~~~~~~~
> 
> Making it void * is the "get out of jail free" case in container_of()
> with the test " || __same_type(*(ptr), void)"
> 
> If there is a better way to do this, let me know.

Thanks for investigating. I did not consider this.

Reinette
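Here is one possibility, which has not been verified against a kernel build. The kernel's container_of() compares the type of *(ptr) with the type of ((type *)0)->member. The member evts has array type, so a struct pmt_event * fails that check. Naming the first element as the member (evts[0]) gives the member the type struct pmt_event, which should allow pevt0 to keep its real type. The user-space sketch below uses a simplified container_of() that has no type check. It also assumes, hypothetically, that each event records its position in evts[] in an idx field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Simplified user-space container_of(): pointer arithmetic only, without
 * the kernel's static_assert type check.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; idx is assumed to be the event's index in evts[]. */
struct pmt_event {
	int id;
	unsigned int idx;
};

struct event_group {
	const char *name;
	int num_events;
	struct pmt_event evts[];	/* flexible array member */
};

/* Recover the enclosing event_group from a pointer to any of its events. */
static struct event_group *event_to_group(struct pmt_event *pevt)
{
	/* Step back to evts[0]; the pointer stays typed as struct pmt_event *. */
	struct pmt_event *pevt0 = pevt - pevt->idx;

	/* Member "evts[0]" has type struct pmt_event, matching *pevt0. */
	return container_of(pevt0, struct event_group, evts[0]);
}

/* Allocate a group with @n events whose idx fields are filled in. */
static struct event_group *alloc_group(const char *name, int n)
{
	struct event_group *g = malloc(sizeof(*g) + n * sizeof(g->evts[0]));

	if (!g)
		return NULL;
	g->name = name;
	g->num_events = n;
	for (int i = 0; i < n; i++) {
		g->evts[i].id = 100 + i;
		g->evts[i].idx = i;
	}
	return g;
}
```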


* Re: [PATCH v6 23/30] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
  2025-06-26 16:49 ` [PATCH v6 23/30] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
@ 2025-07-09 22:13   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:13 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> The L3 resource has several requirements for domains. There are structures
> that hold the 64-bit values of counters, and elements to keep track of
> the overflow and limbo threads.
> 
> None of these are needed for the PERF_PKG resource. The hardware counters
> are wide enough that they do not wrap around for decades.
> 
> Define a new rdt_perf_pkg_mon_domain structure which just consists of
> the standard rdt_domain_hdr to keep track of domain id and CPU mask.
> 
> Change domain_add_cpu_mon(), domain_remove_cpu_mon(),
> resctrl_offline_mon_domain(), and resctrl_online_mon_domain() to check

This patch does not seem to contain all the changes referred to here;
there are no changes to resctrl_offline_mon_domain() or
resctrl_online_mon_domain().

> resource type and perform only the operations needed for domains in the
> PERF_PKG resource.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

Patch looks good to me.

Reinette



* Re: [PATCH v6 24/30] x86/resctrl: Add energy/perf choices to rdt boot option
  2025-06-26 16:49 ` [PATCH v6 24/30] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
@ 2025-07-09 22:14   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:14 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Hardware backed resctrl features are enumerated by X86_FEATURE_*
> flags. These may be overridden by quirks to disable features in the case
> of errata.
> 
> Users can use kernel command line options to either disable a feature,
> or to force enable a feature that was disabled by a quirk.
> 
> Provide similar functionality for software defined features that do not
> have an X86_FEATURE_* flag.
> 
> Unlike other options that are tied to X86_FEATURE_* flags, these must be
> queried by name. Add rdt_is_software_feature_enabled() to check whether
> quirks or kernel command line have disabled a feature. Just like the
> hardware feature options the command line enable overrides quirk disable.

Referring to "Intel Application Energy Telemetry" as a software feature does
not sound right. It is a hardware feature, no? Perhaps just "a hardware feature
that does not have an X86_FEATURE_* flag"? With this, rdt_is_software_feature_enabled()
could perhaps just be rdt_is_feature_enabled() (open to other ideas)?

The idea that a "software feature" may have a "quirk" also sounds wrong ...
to me that sounds like a different way of saying a "software bug" and we do not
handle software bugs with quirks. Referring to it as a "software feature" just
does not sound right.

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h        |  2 ++
>  arch/x86/kernel/cpu/resctrl/core.c            | 30 +++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/intel_aet.c       |  7 +++++
>  4 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index f1f2c0874da9..4c12159f3ea0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6066,7 +6066,7 @@
>  	rdt=		[HW,X86,RDT]
>  			Turn on/off individual RDT features. List is:
>  			cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
> -			mba, smba, bmec.
> +			mba, smba, bmec, energy, perf.
>  			E.g. to turn on cmt and turn off mba use:
>  				rdt=cmt,!mba
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index e8d2a754bc0c..ee1c6204722e 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -169,6 +169,8 @@ void __init intel_rdt_mbm_apply_quirk(void);
>  
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  
> +bool rdt_is_software_feature_enabled(char *option);
> +
>  bool intel_aet_get_events(void);
>  void __exit intel_aet_exit(void);
>  int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index f857f92e7b8b..f9f3bc58290e 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -791,6 +791,8 @@ enum {
>  	RDT_FLAG_MBA,
>  	RDT_FLAG_SMBA,
>  	RDT_FLAG_BMEC,
> +	RDT_FLAG_ENERGY,
> +	RDT_FLAG_PERF,
>  };
>  
>  #define RDT_OPT(idx, n, f)	\
> @@ -816,6 +818,8 @@ static struct rdt_options rdt_options[]  __ro_after_init = {
>  	RDT_OPT(RDT_FLAG_MBA,	    "mba",	X86_FEATURE_MBA),
>  	RDT_OPT(RDT_FLAG_SMBA,	    "smba",	X86_FEATURE_SMBA),
>  	RDT_OPT(RDT_FLAG_BMEC,	    "bmec",	X86_FEATURE_BMEC),
> +	RDT_OPT(RDT_FLAG_ENERGY,    "energy",	0),
> +	RDT_OPT(RDT_FLAG_PERF,	    "perf",	0),
>  };
>  #define NUM_RDT_OPTIONS ARRAY_SIZE(rdt_options)
>  
> @@ -865,6 +869,32 @@ bool rdt_cpu_has(int flag)
>  	return ret;
>  }
>  
> +/*
> + * Software options that are not based on X86_FEATURE_* bits.

"Hardware features that do not have X86_FEATURE_* bits"?

> + * There is no "h/w does not support this at all" case.
> + * Assume that the caller has already determined that s/w

(please expand h/w to hardware, s/w to software)

> + * support is present and just needs to check if the option has been

Looking at how this function is used, I would instead interpret
intel_pmt_get_regions_by_feature() as what determines that hardware
support is present.

> + * disabled by a quirk that has not been overridden * by a command

(stray *)

> + * line option.
> + */
> +bool rdt_is_software_feature_enabled(char *name)
> +{

Reinette
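The following is a user-space model of the enable logic that the changelog describes. It assumes that a quirk disables a feature by setting force_off and that "rdt=name" sets force_on, so a command line enable overrides a quirk disable. The rdt_is_feature_enabled() name follows the rename suggested above. The struct is a simplified, hypothetical stand-in for the kernel's struct rdt_options.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct rdt_options. */
struct rdt_options {
	const char *name;
	bool force_off;	/* "rdt=!name", or (in this sketch) a quirk */
	bool force_on;	/* "rdt=name" */
};

static struct rdt_options rdt_options[] = {
	{ .name = "energy" },
	{ .name = "perf" },
};

#define NUM_RDT_OPTIONS (sizeof(rdt_options) / sizeof(rdt_options[0]))

/*
 * No X86_FEATURE_* bit to consult: the caller has already found hardware
 * support, so the feature is enabled unless disabled, and force_on wins.
 */
static bool rdt_is_feature_enabled(const char *name)
{
	bool ret = true;

	for (size_t i = 0; i < NUM_RDT_OPTIONS; i++) {
		struct rdt_options *o = &rdt_options[i];

		if (strcmp(name, o->name))
			continue;
		if (o->force_off)
			ret = false;
		if (o->force_on)
			ret = true;
		break;
	}
	return ret;
}
```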



* Re: [PATCH v6 25/30] x86/resctrl: Handle number of RMIDs supported by telemetry resources
  2025-06-26 16:49 ` [PATCH v6 25/30] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
@ 2025-07-09 22:17   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:17 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> There are now three meanings for "number of RMIDs":
> 
> 1) The number for legacy features enumerated by CPUID leaf 0xF. This
> is the maximum number of distinct values that can be loaded into the
> IA32_PQR_ASSOC MSR. Note that systems with Sub-NUMA Cluster mode enabled
> will force scaling down the CPUID enumerated value by the number of SNC
> nodes per L3-cache.
> 
> 2) The number of registers in MMIO space for each event. This
> is enumerated in the XML files and is the value initialized into
> event_group::num_rmids. This will be overwritten with a lower
> value if hardware does not support all these registers at the
> same time (see next case).
> 
> 3) The number of "h/w counters" (this isn't a strictly accurate
> description of how things work, but serves as a useful analogy that
> does describe the limitations) feeding to those MMIO registers. This
> is enumerated in telemetry_region::num_rmids returned from the call to
> intel_pmt_get_regions_by_feature()
> 
> Event groups with insufficient "h/w counter" to track all RMIDs are
> difficult for users to use, since the system may reassign "h/w counters"
> as any time. This means that users cannot reliably collect two consecutive

"as any time" -> "at any time"?

> event counts to compute the rate at which events are occurring.
> 
> Ignore such under-resourced event groups unless the user explicitly
> requests to enable them using the "rdt=" Linux boot argument.
> 
> Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG
> resource "num_rmids" value to the smallest of these values to ensure
> that all resctrl groups have equal monitor capabilities.

The "to ensure that all resctrl groups ..." seems to describe the next patch?

> 
> N.B. Changed type of rdt_resource::num_rmids to u32 to match.

rdt_resource::num_rmids -> rdt_resource::num_rmid 

Please also check that existing code accommodates this changed type.
See for example,
	rdt_num_rmids_show() {
		...
		seq_printf(seq, "%d\n", r->num_rmid);
		...
	}

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h                 |  2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h  |  4 ++++
>  arch/x86/kernel/cpu/resctrl/core.c      | 20 +++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 29 +++++++++++++++++++++++++
>  arch/x86/kernel/cpu/resctrl/monitor.c   |  2 ++
>  5 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index b9f2690bee1e..35ae24822493 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -288,7 +288,7 @@ struct rdt_resource {
>  	int			rid;
>  	bool			alloc_capable;
>  	bool			mon_capable;
> -	int			num_rmid;
> +	u32			num_rmid;
>  	enum resctrl_scope	ctrl_scope;
>  	enum resctrl_scope	mon_scope;
>  	struct resctrl_cache	cache;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index ee1c6204722e..11f25c225837 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -18,6 +18,8 @@
>  
>  #define RMID_VAL_UNAVAIL		BIT_ULL(62)
>  
> +extern int rdt_num_system_rmids;
> +
>  /*
>   * With the above fields in use 62 bits remain in MSR_IA32_QM_CTR for
>   * data to be returned. The counter width is discovered from the hardware
> @@ -171,6 +173,8 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  
>  bool rdt_is_software_feature_enabled(char *option);
>  
> +bool rdt_is_software_feature_force_enabled(char *name);
> +
>  bool intel_aet_get_events(void);
>  void __exit intel_aet_exit(void);
>  int intel_aet_read_event(int domid, int rmid, enum resctrl_event_id evtid,
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index f9f3bc58290e..7fe4e8111773 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -895,6 +895,26 @@ bool rdt_is_software_feature_enabled(char *name)
>  	return ret;
>  }
>  
> +/*
> + * Similar to rdt_is_software_feature_enabled() but the test is whether

This is just too similar and makes this code quite confusing ... (more below)

> + * the user has force enabled the feature on the kernel command line.
> + */
> +bool rdt_is_software_feature_force_enabled(char *name)
> +{
> +	struct rdt_options *o;
> +	bool ret = false;
> +
> +	for (o = rdt_options; o < &rdt_options[NUM_RDT_OPTIONS]; o++) {
> +		if (!strcmp(name, o->name)) {
> +			if (o->force_on)
> +				ret = true;
> +			break;
> +		}
> +	}
> +
> +	return ret;
> +}
> +
>  bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt)
>  {
>  	if (!rdt_cpu_has(X86_FEATURE_BMEC))
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 1d2511984156..1d9edd409883 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -15,6 +15,7 @@
>  #include <linux/cpu.h>
>  #include <linux/intel_vsec.h>
>  #include <linux/io.h>
> +#include <linux/minmax.h>
>  #include <linux/resctrl.h>
>  #include <linux/slab.h>
>  
> @@ -55,6 +56,9 @@ struct pmt_event {
>   *			telemetry regions.
>   * @pkginfo:		Per-package MMIO addresses of telemetry regions belonging to this group.
>   * @guid:		Unique number per XML description file.
> + * @num_rmids:		Number of RMIDS supported by this group. Adjusted downwards
> + *			if enumeration from intel_pmt_get_regions_by_feature() indicates
> + *			fewer RMIDs can be tracked simultaneously.
>   * @mmio_size:		Number of bytes of MMIO registers for this group.
>   * @num_events:		Number of events in this group.
>   * @evts:		Array of event descriptors.
> @@ -67,6 +71,7 @@ struct event_group {
>  
>  	/* Remaining fields initialized from XML file. */
>  	u32				guid;
> +	u32				num_rmids;
>  	size_t				mmio_size;
>  	int				num_events;
>  	struct pmt_event		evts[] __counted_by(num_events);
> @@ -82,6 +87,7 @@ struct event_group {
>  static struct event_group energy_0x26696143 = {
>  	.name		= "energy",
>  	.guid		= 0x26696143,
> +	.num_rmids	= 576,
>  	.mmio_size	= XML_MMIO_SIZE(576, 2, 3),
>  	.num_events	= 2,
>  	.evts				= {
> @@ -97,6 +103,7 @@ static struct event_group energy_0x26696143 = {
>  static struct event_group perf_0x26557651 = {
>  	.name		= "perf",
>  	.guid		= 0x26557651,
> +	.num_rmids	= 576,
>  	.mmio_size	= XML_MMIO_SIZE(576, 7, 3),
>  	.num_events	= 7,
>  	.evts				= {
> @@ -177,6 +184,17 @@ static int configure_events(struct event_group *e, struct pmt_feature_group *p)
>  			return -EINVAL;
>  		}
>  
> +		/*
> +		 * Ignore event group with fewer RMIDs than can be loaded
> +		 * into the IA32_PQR_ASSOC MSR unless the user used
> +		 * the rdt= boot option to specifically ask for it to
> +		 * be enabled.
> +		 */
> +		if (tr->num_rmids < rdt_num_system_rmids &&

This check comes as a surprise after thinking that I understood the changelog
and function comments. I was expecting a check against e->num_rmids instead?


The changelog states:

> +		    !rdt_is_software_feature_force_enabled(e->name))

Having this call here is unexpected. The way resctrl has handled quirks thus far
is to disable a particular feature explicitly based on some external knowledge (e.g. errata).
This is different in that resctrl does not hardcode that a feature is disabled but attempts
to determine this from something that is discoverable from the hardware itself. Integrating the
feature enable/disable into the flow that actually initializes the feature looks
complicated to me, and the strange rdt_is_software_feature_force_enabled() supports this.

Could you please consider adding a new function call at the beginning of
configure_events()/discover_events() (before calling rdt_is_feature_enabled()) that
peeks into the struct pmt_feature_group to verify that all parameters are "sane" and
then explicitly disables the feature (for example, set_rdt_options("!perf"))
if any parameter is not "sane"? This makes it clear which requirements come from hardware
and what is considered a "quirk", and avoids even attempting to enable such a feature by default.

Calling rdt_is_feature_enabled() after this function still lets the user override
such a disable.
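The suggested flow can be roughly sketched in user space as follows, under stated assumptions. A hypothetical sanity check on the enumerated regions marks the feature off, which stands in for set_rdt_options("!perf"). The enable check that follows still honors a user force_on. All types and helpers here are illustrative stand-ins, not resctrl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a telemetry region and one rdt= option. */
struct telemetry_region {
	uint32_t num_rmids;	/* RMIDs the hardware can track at once */
};

struct feature_option {
	bool force_off;
	bool force_on;
};

/* Hardware-derived check: any region with too few RMIDs is "not sane". */
static bool regions_are_sane(const struct telemetry_region *tr, size_t n,
			     uint32_t system_rmids)
{
	for (size_t i = 0; i < n; i++)
		if (tr[i].num_rmids < system_rmids)
			return false;
	return true;
}

/* Disable by default when not sane; a user force_on still overrides. */
static bool feature_should_enable(struct feature_option *opt,
				  const struct telemetry_region *tr, size_t n,
				  uint32_t system_rmids)
{
	if (!regions_are_sane(tr, n, system_rmids))
		opt->force_off = true;	/* like set_rdt_options("!perf") */

	/* Same precedence as the existing options: force_on wins. */
	if (opt->force_on)
		return true;
	return !opt->force_off;
}
```

Keeping the sanity check separate from the enable check means configure_events() itself needs no special "force enabled" query.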



> +			return -EINVAL;
> +		e->num_rmids = min(e->num_rmids, tr->num_rmids);
> +
>  		if (!pkgcounts) {
>  			pkgcounts = kcalloc(num_pkgs, sizeof(*pkgcounts), GFP_KERNEL);
>  			if (!pkgcounts)
> @@ -263,11 +281,22 @@ static bool get_pmt_feature(enum pmt_feature_id feature)
>   */
>  bool intel_aet_get_events(void)
>  {
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> +	struct event_group **eg;
>  	bool ret1, ret2;
>  
>  	ret1 = get_pmt_feature(FEATURE_PER_RMID_ENERGY_TELEM);
>  	ret2 = get_pmt_feature(FEATURE_PER_RMID_PERF_TELEM);
>  
> +	for (eg = &known_event_groups[0]; eg < &known_event_groups[NUM_KNOWN_GROUPS]; eg++) {
> +		if (!(*eg)->pfg)
> +			continue;
> +		if (r->num_rmid)
> +			r->num_rmid = min(r->num_rmid, (*eg)->num_rmids);
> +		else
> +			r->num_rmid = (*eg)->num_rmids;
> +	}
> +
>  	return ret1 || ret2;
>  }
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 51d7d99336c6..b36634f1439b 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -33,6 +33,7 @@ bool rdt_mon_capable;
>  
>  #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
>  
> +int rdt_num_system_rmids;

Is this necessary? If I understand correctly the next patch will ensure that
resctrl will not use fewer RMIDs than what can be loaded into IA32_PQR_ASSOC MSR.

If it is required, considering that this patch goes through effort to change type
of rdt_resource::num_rmid to u32, should this also be u32?

>  static int snc_nodes_per_l3_cache = 1;
>  
>  /*
> @@ -358,6 +359,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>  	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
>  	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_nodes_per_l3_cache;
>  	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_nodes_per_l3_cache;
> +	rdt_num_system_rmids = r->num_rmid;
>  	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
>  
>  	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)

Reinette
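For reference, the minimum computation in the intel_aet_get_events() hunk reduces to the user-space sketch below. Here 0 means "not yet set", and the enabled flag stands in for a non-NULL (*eg)->pfg; both are assumptions of this sketch. Once num_rmid is u32, printing it as rdt_num_rmids_show() does would also more naturally use "%u" rather than "%d".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: enabled is nonzero when the group was discovered. */
struct event_group {
	int enabled;
	uint32_t num_rmids;
};

/* Smallest num_rmids across enabled groups, or 0 if none is enabled. */
static uint32_t perf_pkg_num_rmid(const struct event_group *groups, size_t n)
{
	uint32_t num_rmid = 0;

	for (size_t i = 0; i < n; i++) {
		if (!groups[i].enabled)
			continue;
		/* First enabled group sets the value; later ones can only lower it. */
		if (!num_rmid || groups[i].num_rmids < num_rmid)
			num_rmid = groups[i].num_rmids;
	}
	return num_rmid;
}
```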


* Re: [PATCH v6 26/30] x86,fs/resctrl: Move RMID initialization to first mount
  2025-06-26 16:49 ` [PATCH v6 26/30] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
@ 2025-07-09 22:18   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:18 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> The resctrl file system code assumed that the only monitor events were
> tied to the RDT_RESOURCE_L3 resource. Also that the number of supported
> RMIDs was enumerated during early initialization.
> 
> RDT_RESOURCE_PERF_PKG breaks both of those assumptions.
> 
> Delay the final enumeration of the number of RMIDs and subsequent
> allocation of structures until first mount of the resctrl file system.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

...

> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index c4b092aec9f8..e877f5b97d18 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -796,15 +796,27 @@ void mbm_setup_overflow_handler(struct rdt_l3_mon_domain *dom, unsigned long del
>  		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
>  }
>  
> -static int dom_data_init(struct rdt_resource *r)
> +/*
> + * resctrl_dom_data_init() - Initialise global monitoring structures.
> + *
> + * Allocate and initialise global monitor resources that do not belong to a
> + * specific domain. i.e. the rmid_ptrs[] used for the limbo and free lists.
> + * Called once during boot after the struct rdt_resource's have been configured
> + * but before the filesystem is mounted.
> + * Resctrl's cpuhp callbacks may be called before this point to bring a domain
> + * online.
> + *
> + * Returns 0 for success, or -ENOMEM.
> + */

As per changelog and the goal the intention of this change is to move RMID
related allocation to first mount ...

> +int resctrl_mon_dom_data_init(void)
>  {
> +	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);

... this is a red flag ...

>  	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>  	u32 num_closid = resctrl_arch_get_num_closid(r);
>  	struct rmid_entry *entry = NULL;
> -	int err = 0, i;
>  	u32 idx;
> +	int i;
>  
> -	mutex_lock(&rdtgroup_mutex);
>  	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID)) {
>  		u32 *tmp;
>  
> @@ -815,10 +827,8 @@ static int dom_data_init(struct rdt_resource *r)
>  		 * use.
>  		 */
>  		tmp = kcalloc(num_closid, sizeof(*tmp), GFP_KERNEL);
> -		if (!tmp) {
> -			err = -ENOMEM;
> -			goto out_unlock;
> -		}
> +		if (!tmp)
> +			return -ENOMEM;
>  
>  		closid_num_dirty_rmid = tmp;

... looks like this L3 specific initialization got caught up in this move, resulting
in L3 specific monitoring initialization unnecessarily split between
resctrl initialization and first mount. 
I think this can be simplified by moving closid_num_dirty_rmid initialization to
resctrl_mon_l3_resource_init(), where it seems to belong alongside the other L3 initialization.
Doing so will make resctrl_mon_dom_data_init() dedicated to the RMID related
allocation that the changelog describes. As part of this the function could
also receive a more specific name.

...

> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index d9bb01edd582..3d87e6c4c600 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -2585,6 +2585,7 @@ static int rdt_get_tree(struct fs_context *fc)
>  	unsigned long flags = RFTYPE_CTRL_BASE;
>  	struct rdt_l3_mon_domain *dom;
>  	struct rdt_resource *r;
> +	static bool once;
>  	int ret;
>  
>  	resctrl_arch_pre_mount();
> @@ -2599,6 +2600,13 @@ static int rdt_get_tree(struct fs_context *fc)
>  		goto out;
>  	}
>  
> +	if (resctrl_arch_mon_capable() && !once) {
> +		ret = resctrl_mon_dom_data_init();
> +		if (ret)
> +			goto out;
> +		once = true;

Instead of the caller needing to keep track of this, the function
itself can simply skip allocating the RMID structures if they already exist.

> +	}
> +
>  	ret = rdtgroup_setup_root(ctx);
>  	if (ret)
>  		goto out;
> @@ -4298,9 +4306,7 @@ int resctrl_init(void)
>  
>  	thread_throttle_mode_init();
>  
> -	ret = resctrl_mon_resource_init();
> -	if (ret)
> -		return ret;
> +	resctrl_mon_l3_resource_init();
>  
>  	ret = sysfs_create_mount_point(fs_kobj, "resctrl");
>  	if (ret) {

Reinette
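This is a minimal user-space sketch of the self-guarding allocation suggested above. The function returns early when the RMID array already exists, so the caller in rdt_get_tree() would not need its own static flag. The rmid_entry layout and the mon_rmid_data_init() name are hypothetical. Locking, limbo/free list setup and the kernel allocators are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RMID tracking entries. */
struct rmid_entry {
	uint32_t closid;
	uint32_t rmid;
};

static struct rmid_entry *rmid_ptrs;

/*
 * Allocate the RMID structures on the first call only; later calls (for
 * example on every mount) see rmid_ptrs already set and return success.
 */
static int mon_rmid_data_init(uint32_t idx_limit)
{
	if (rmid_ptrs)
		return 0;

	rmid_ptrs = calloc(idx_limit, sizeof(*rmid_ptrs));
	if (!rmid_ptrs)
		return -ENOMEM;

	for (uint32_t i = 0; i < idx_limit; i++)
		rmid_ptrs[i].rmid = i;
	return 0;
}
```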



* Re: [PATCH v6 28/30] fs/resctrl: Provide interface to create a debugfs info directory
  2025-06-26 16:49 ` [PATCH v6 28/30] fs/resctrl: Provide interface to create a debugfs info directory Tony Luck
@ 2025-07-09 22:19   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:19 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

How about:
 fs/resctrl: Provide interface to create architecture specific debugfs area

On 6/26/25 9:49 AM, Tony Luck wrote:
> Architectures are constrained to just the file interfaces provided by
> the file system for each resource. This does not allow for architecture
> specific debug interfaces.
> 
> Add resctrl_debugfs_mon_info_mkdir() which creates a directory in the

Patch calls it resctrl_debugfs_mon_info_arch_mkdir().

> debugfs file system for a resource. Naming follows the layout of the
> main resctrl hierarchy:
> 
> 	/sys/kernel/debug/resctrl/info/{resource}_MON

Patch creates 
	/sys/kernel/debug/resctrl/info/{resource}_MON/{arch}

Accompanying this change, it will be useful to describe how {arch} is
initialized. As a user interface, I think it is helpful to point out
that the directory name will match what the user can query via "uname -m".

Could you please add a snippet here on how the architecture is
expected to use this? There may be some discussion about how archs
could differ in their usage of this, and mentioning it explicitly in the changelog
will help folks see what this enables instead of appearing to sneak
this new feature in.

> 
> Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h |  6 ++++++
>  fs/resctrl/rdtgroup.c   | 24 ++++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 35ae24822493..a8ffd9f61c46 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -569,6 +569,12 @@ void resctrl_arch_reset_all_ctrls(struct rdt_resource *r);
>  extern unsigned int resctrl_rmid_realloc_threshold;
>  extern unsigned int resctrl_rmid_realloc_limit;
>  
> +/**
> + * resctrl_debugfs_mon_info_arch_mkdir() - Create a debugfs info directory.
> + * @r:	Resource (must be mon_capable).

Could you please add a snippet here or in the function comments of
resctrl_debugfs_mon_info_arch_mkdir() on how the architecture is expected to use this?
That is, make it "official" that the arch is free to create debugfs files
(and directories) to support its architecture specific debugging
associated with resource @r.

With this being an arch API in include/linux/resctrl.h, I think a note on
its lifecycle would be useful, considering there is no partner
"resctrl_debugfs_mon_info_arch_rmdir()".

As an API for the arch, it will also be useful to describe what this function returns.
For example, it will be helpful to know that this passes through the
return value of debugfs_create_dir(), which the arch can use to determine
whether debugfs is enabled in the kernel.

> + */
> +struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r);
> +
>  int resctrl_init(void);
>  void resctrl_exit(void);
>  
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index 3d87e6c4c600..511362a67532 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -24,6 +24,7 @@
>  #include <linux/sched/task.h>
>  #include <linux/slab.h>
>  #include <linux/user_namespace.h>
> +#include <linux/utsname.h>
>  
>  #include <uapi/linux/magic.h>
>  
> @@ -4350,6 +4351,29 @@ int resctrl_init(void)
>  	return ret;
>  }
>  
> +/*
> + * Create /sys/kernel/debug/resctrl/info/{r->name}_MON/arch directory

To help make clear that this is not the actual "arch" string: 
arch -> {arch}

> + * by request for architecture to use.
> + */
> +struct dentry *resctrl_debugfs_mon_info_arch_mkdir(struct rdt_resource *r)
> +{
> +	static struct dentry *debugfs_resctrl_info;
> +	struct dentry *moninfodir;
> +	char name[32];
> +
> +	if (!r->mon_capable)
> +		return NULL;
> +
> +	if (!debugfs_resctrl_info)
> +		debugfs_resctrl_info = debugfs_create_dir("info", debugfs_resctrl);
> +
> +	sprintf(name, "%s_MON", r->name);
> +
> +	moninfodir =  debugfs_create_dir(name, debugfs_resctrl_info);

(extra spaces)

> +
> +	return debugfs_create_dir(utsname()->machine, moninfodir);
> +}
> +
>  static bool resctrl_online_domains_exist(void)
>  {
>  	struct rdt_resource *r;

Reinette
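To illustrate the naming described above, this user-space sketch builds the path that the patch creates. It takes {arch} from uname(2), which returns the same machine string that "uname -m" prints and that the kernel reads via utsname()->machine. The helper name is hypothetical. The bounded snprintf() is a choice of this sketch, not what the patch does.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Build /sys/kernel/debug/resctrl/info/{name}_MON/{arch}, where {arch} is
 * the machine field from uname(2). Returns the path length, or -1 on error
 * or truncation.
 */
static int mon_info_arch_path(char *buf, size_t len, const char *res_name)
{
	struct utsname u;
	int n;

	if (uname(&u))
		return -1;

	n = snprintf(buf, len, "/sys/kernel/debug/resctrl/info/%s_MON/%s",
		     res_name, u.machine);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

For example, on an x86_64 kernel, "PERF_PKG" would be expected to yield /sys/kernel/debug/resctrl/info/PERF_PKG_MON/x86_64.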


* Re: [PATCH v6 29/30] x86/resctrl: Add debug info/PERF_PKG_MON/status files
  2025-06-26 16:49 ` [PATCH v6 29/30] x86/resctrl: Add debug info/PERF_PKG_MON/status files Tony Luck
@ 2025-07-09 22:22   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:22 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

Subject needs an update?

On 6/26/25 9:49 AM, Tony Luck wrote:
> Each telemetry aggregator provides three status registers at the top
> end of MMIO space after all the per-RMID per-event counters:
> 
>   agg_data_loss_count: This counts the number of times that this aggregator
>   failed to accumulate a counter value supplied by a CPU core.
> 
>   agg_data_loss_timestamp: This is a "timestamp" from a free running
>   25MHz uncore timer indicating when the most recent data loss occurred.
> 
>   last_update_timestamp: Another 25MHz timestamp indicating when the
>   most recent counter update was successfully applied.
> 
> Create files in /sys/kernel/debug/resctrl/info/PERF_PKG_MON/arch/

"/sys/kernel/debug/resctrl/info/PERF_PKG_MON/{arch}/" or
"/sys/kernel/debug/resctrl/info/PERF_PKG_MON/x86_64/"?

> to display the value of each of these status registers for each aggregator
> in each enabled event group.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 56 +++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 090e7b35c3e2..422e3e126255 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -13,6 +13,7 @@
>  
>  #include <linux/cleanup.h>
>  #include <linux/cpu.h>
> +#include <linux/debugfs.h>
>  #include <linux/intel_vsec.h>
>  #include <linux/io.h>
>  #include <linux/minmax.h>
> @@ -275,6 +276,58 @@ static bool get_pmt_feature(enum pmt_feature_id feature)
>  	return false;
>  }
>  
> +static ssize_t status_read(struct file *f, char __user *buf, size_t count, loff_t *off)
> +{
> +	void __iomem *info = (void __iomem *)f->f_inode->i_private;
> +	char status[32];
> +	int len;
> +
> +	len = sprintf(status, "%llu\n", readq(info));
> +
> +	return simple_read_from_buffer(buf, count, off, status, len);
> +}
> +
> +static const struct file_operations status_fops = {
> +	.read = status_read
> +};

The convention seems to be to use DEFINE_SIMPLE_ATTRIBUTE(), which can handle concurrent
reads on an open file descriptor.

> +
> +static void make_status_files(struct dentry *dir, struct event_group *e, int pkg, int instance)
> +{
> +	void *info = (void __force *)e->pkginfo[pkg]->addrs[instance] + e->mmio_size;
> +	char name[64];
> +
> +	sprintf(name, "%s_pkg%d_agg%d_data_loss_count", e->name, pkg, instance);
> +	debugfs_create_file(name, 0400, dir, info - 24, &status_fops);
> +
> +	sprintf(name, "%s_pkg%d_agg%d_data_loss_timestamp", e->name, pkg, instance);
> +	debugfs_create_file(name, 0400, dir, info - 16, &status_fops);
> +
> +	sprintf(name, "%s_pkg%d_agg%d_last_update_timestamp", e->name, pkg, instance);
> +	debugfs_create_file(name, 0400, dir, info - 8, &status_fops);
> +}
> +
> +static void create_debug_event_status_files(struct dentry *dir, struct event_group *e)
> +{
> +	int num_pkgs = topology_max_packages();
> +
> +	for (int i = 0; i < num_pkgs; i++)
> +		for (int j = 0; j < e->pkginfo[i]->num_regions; j++)
> +			make_status_files(dir, e, i, j);
> +}
> +
> +static void create_debugfs_status_file(struct rdt_resource *r)
> +{
> +	struct event_group **eg;
> +	struct dentry *infodir;
> +
> +	infodir = resctrl_debugfs_mon_info_arch_mkdir(r);

I understand that debugfs guidance is that callers should ignore errors returned by
debugfs_create_dir(). Even so, I think ignoring the error is unnecessary here, since a
lot of pointless work can be avoided when, for example, debugfs is disabled. I see no
harm in bailing out here if "infodir" cannot be created. If it can be created, the other
debugfs calls are likely to succeed, so further checking should not be necessary.
No strong objection from my side if you prefer to keep it like this, considering that
it does follow debugfs guidance.
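
Below is a minimal userspace sketch of the early bail-out. It assumes that a failed directory creation returns an ERR_PTR() value, as the !CONFIG_DEBUG_FS stubs do. ERR_PTR()/IS_ERR() are local copies of the kernel conventions. fake_mkdir(), fake_create_file() and the counters are hypothetical stand-ins for resctrl_debugfs_mon_info_arch_mkdir() and debugfs_create_file().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace copies of the kernel's ERR_PTR()/IS_ERR() conventions. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-ins for the debugfs calls. */
static bool debugfs_available;
static int dir_token;
static int files_created;

static void *fake_mkdir(void)
{
	return debugfs_available ? (void *)&dir_token : ERR_PTR(-ENODEV);
}

static void fake_create_file(void *dir)
{
	(void)dir;
	files_created++;
}

/* Bail out once if the directory could not be created, instead of walking
 * every package/aggregator and making calls that will all fail. */
static void create_status_files(int num_pkgs, int regions_per_pkg)
{
	void *infodir = fake_mkdir();

	if (IS_ERR(infodir))
		return;
	for (int pkg = 0; pkg < num_pkgs; pkg++)
		for (int agg = 0; agg < regions_per_pkg; agg++)
			/* data_loss_count, data_loss_timestamp, last_update_timestamp */
			for (int f = 0; f < 3; f++)
				fake_create_file(infodir);
}
```

With the check in place, a system without debugfs skips the three-files-per-aggregator walk entirely.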

> +	for (eg = &known_event_groups[0]; eg < &known_event_groups[NUM_KNOWN_GROUPS]; eg++) {
> +		if (!(*eg)->pfg)
> +			continue;
> +		create_debug_event_status_files(infodir, *eg);
> +	}
> +}
> +
>  /*
>   * Ask OOBMSM discovery driver for all the RMID based telemetry groups
>   * that it supports.
> @@ -300,6 +353,9 @@ bool intel_aet_get_events(void)
>  		r->mon_capable = true;
>  	}
>  
> +	if (ret1 || ret2)
> +		create_debugfs_status_file(r);
> +
>  	return ret1 || ret2;
>  }
>  

Reinette


* Re: [PATCH v6 30/30] x86,fs/resctrl: Update Documentation for package events
  2025-06-26 16:49 ` [PATCH v6 30/30] x86,fs/resctrl: Update Documentation for package events Tony Luck
@ 2025-07-09 22:24   ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:24 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin,
	Anil Keshavamurthy, Chen Yu
  Cc: x86, linux-kernel, patches

Hi Tony,

On 6/26/25 9:49 AM, Tony Luck wrote:
> Each "mon_data" directory is now divided between L3 events and package
> events.
> 
> The "info/PERF_PKG_MON" directory contains parameters for perf events.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  Documentation/filesystems/resctrl.rst | 53 ++++++++++++++++++++++-----
>  1 file changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index c7949dd44f2f..a452fd54b3ae 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -167,7 +167,7 @@ with respect to allocation:
>  			bandwidth percentages are directly applied to
>  			the threads running on the core
>  
> -If RDT monitoring is available there will be an "L3_MON" directory
> +If L3 monitoring is available there will be an "L3_MON" directory
>  with the following files:
>  
>  "num_rmids":
> @@ -261,6 +261,23 @@ with the following files:
>  		bytes) at which a previously used LLC_occupancy
>  		counter can be considered for re-use.
>  
> +If telemetry monitoring is available there will be an "PERF_PKG_MON" directory
> +with the following files:
> +
> +"num_rmids":
> +		The number of telemetry RMIDs supported. If this is different
> +		from the number reported in the L3_MON directory the limit
> +		on the number of "CTRL_MON" + "MON" directories is the
> +		minimum of the values.
> +
> +"mon_features":
> +		Lists the telemetry monitoring events that are enabled on this system.
> +
> +When the filesystem is mounted with the debug option each subdirectory
> +for a monitor resource of the "info" directory will contain a "status"
> +file. Resources may use this to supply debug information about the status
> +of the hardware implementing the resource.

The above needs an update to match the new architecture-specific debug files. When
updating, please consider that this is Intel architecture-specific information mixed
in with the resctrl fs documentation, so it needs to be highlighted as such to avoid
creating the expectation that this debugging will be available on all architectures
that support this style of event.
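
The "minimum of the values" rule for "num_rmids" can be stated as a small helper. This is only an illustration. effective_num_rmids() is hypothetical, and treating 0 as "resource not present" is an assumption that the documentation does not make.

```c
#include <assert.h>

/*
 * Limit on CTRL_MON + MON directories when both L3_MON and PERF_PKG_MON
 * report "num_rmids": the smaller of the two. A value of 0 is used here
 * (hypothetically) to mean the resource is not present.
 */
static unsigned int effective_num_rmids(unsigned int l3_rmids,
					unsigned int pkg_rmids)
{
	if (!l3_rmids)
		return pkg_rmids;
	if (!pkg_rmids)
		return l3_rmids;
	return l3_rmids < pkg_rmids ? l3_rmids : pkg_rmids;
}
```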

> +
>  Finally, in the top level of the "info" directory there is a file
>  named "last_cmd_status". This is reset with every "command" issued
>  via the file system (making new directories or writing to any of the
> @@ -366,15 +383,31 @@ When control is enabled all CTRL_MON groups will also contain:
>  When monitoring is enabled all MON groups will also contain:
>  
>  "mon_data":
> -	This contains a set of files organized by L3 domain and by
> -	RDT event. E.g. on a system with two L3 domains there will
> -	be subdirectories "mon_L3_00" and "mon_L3_01".	Each of these
> -	directories have one file per event (e.g. "llc_occupancy",
> -	"mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
> -	files provide a read out of the current value of the event for
> -	all tasks in the group. In CTRL_MON groups these files provide
> -	the sum for all tasks in the CTRL_MON group and all tasks in
> -	MON groups. Please see example section for more details on usage.
> +	This contains a set of directories, one for each instance
> +	of an L3 cache, or of a processor package. The L3 cache
> +	directories are named "mon_L3_00", "mon_L3_01" etc. The
> +	package directories "mon_PERF_PKG_00", "mon_PERF_PKG_01" etc.
> +
> +	Within each directory there is one file per event. In
> +	the L3 directories: "llc_occupancy", "mbm_total_bytes",
> +	and "mbm_local_bytes". In the PERF_PKG directories: "core_energy",
> +	"activity", etc.
> +
> +	"core_energy" reports a floating point number for the energy
> +	(in Joules) used by cores for each RMID.
> +
> +	"activity" also reports a floating point value (in Farads).
> +	This provides an estimate of work done independent of the
> +	frequency that the cores used for execution.

Can this get the same treatment as the cover letter with respect to "core" vs "CPU"?
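
"core_energy" and "activity" are presented as floating point numbers. A sketch of turning a binary fixed-point counter (see patch 14 of this series, "Support binary fixed point event counters") into a decimal string may therefore help when reviewing this wording. The 18 fractional bits and six decimal places used below are hypothetical. format_fixed() is not the series' implementation, and it truncates rather than rounds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an unsigned binary fixed-point value with frac_bits fractional
 * bits as a decimal string with `places` digits after the point.
 * Requires 1 <= frac_bits <= 63 and frac_bits + log2(10^places) < 64 so
 * that frac * scale cannot overflow. The fraction is truncated.
 */
static int format_fixed(char *buf, size_t len, uint64_t raw,
			unsigned int frac_bits, unsigned int places)
{
	uint64_t whole = raw >> frac_bits;
	uint64_t frac = raw & ((UINT64_C(1) << frac_bits) - 1);
	uint64_t scale = 1;

	for (unsigned int i = 0; i < places; i++)
		scale *= 10;

	/* Convert the binary fraction into units of 10^-places. */
	frac = (frac * scale) >> frac_bits;

	return snprintf(buf, len, "%llu.%0*llu", (unsigned long long)whole,
			(int)places, (unsigned long long)frac);
}
```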

> +
> +	All other events report decimal integer values.
> +
> +	In a MON group these files provide a read out of the current
> +	value of the event for all tasks in the group. In CTRL_MON groups
> +	these files provide the sum for all tasks in the CTRL_MON group
> +	and all tasks in MON groups. Please see example section for more
> +	details on usage.
> +
>  	On systems with Sub-NUMA Cluster (SNC) enabled there are extra
>  	directories for each node (located within the "mon_L3_XX" directory
>  	for the L3 cache they occupy). These are named "mon_sub_L3_YY"

Reinette


* Re: [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package
  2025-07-09 22:13       ` Reinette Chatre
@ 2025-07-09 22:48         ` Luck, Tony
  2025-07-09 22:59           ` Reinette Chatre
  0 siblings, 1 reply; 89+ messages in thread
From: Luck, Tony @ 2025-07-09 22:48 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

On Wed, Jul 09, 2025 at 03:13:20PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 7/9/25 11:12 AM, Luck, Tony wrote:
> > On Tue, Jul 08, 2025 at 07:20:35PM -0700, Reinette Chatre wrote:
> >> As I understand there is 1:1 relationship between struct event_group and struct pmt_feature_group.
> >> It thus seems unnecessary to loop through all the telemetry regions of a struct pmt_feature_group
> >> if it is known to not be associated with the "event group"?
> >> Could it be helpful to add a new (hardcoded) event_group::id that is of type enum pmt_feature_id
> >> that can be used to ensure that only relevant struct pmt_feature_group is used to discover events
> >> for a particular struct event_group?
> >>
> >> Another consideration is that this implementation seems to require that guids are unique across
> >> all telemetry regions of all RMID telemetry features, is this guaranteed?
> > 
> > The guids are unique. The XML file tags them like this:
> > 
> > 	<TELEM:uniqueid>26557651</TELEM:uniqueid>
> 
> I interpret above that guid is expected to be unique for one
> telemetry feature. It is not clear to me that it implies that the guid
> is unique across all telemetry features. For example, what prevents
> a platform from using the same guid for all the telemetry features it
> supports?

There are several non-RMID based telemetry MMIO regions in addition to
the two used by this patch series.

Think of the uniqueid as a signature for the format of the region.
Which event counters are present, in which order? How many total
counters? What is the binary format of each counter?

Or think of it as a key. Usermode telemetry tools that access these MMIO
regions use the uniqueid to choose which XML file to use to interpret
the data. I'm effectively doing the same, but without including an XML
parser in the kernel: each XML file is distilled down to the basic
essence described in the event_group structure.

It would be a catastrophic failure if Intel assigned the same uniqueid
to regions that had different formats.

> 
> > 
> > the "guid" naming of the value comes from the Intel PMT_DISCOVERY driver.
> > 
> > An alternative to adding the new event_group::id field would be to
> > separate the arrays of known event groups. I.e. change from:
> > 
> > static struct event_group *known_event_groups[] = {
> >         &energy_0x26696143,
> >         &perf_0x26557651,
> > };
> > 
> > to
> > 
> > static struct event_group *known_energy_event_groups[] = {
> >         &energy_0x26696143,
> > };
> > 
> > static struct event_group *known_perf_event_groups[] = {
> >         &perf_0x26557651,
> > };
> > 
> > then only scan the appropriate array that matches the
> > enum pmt_feature_id passed to get_pmt_feature().
> > 
> > 
> > With only one option in each array today this looks
> > like extra infrastructure. But I already have a patch
> > for the next generation system that adds another guid.
> 
> This also sounds good. Thank you.

Ok. Thanks. I'll put this idea into code in the next version.
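
Below is a minimal sketch of the split-array lookup described above. Only the array for the requested feature is scanned, so a guid only has to be unique within one feature. The enum values FEAT_RMID_ENERGY/FEAT_RMID_PERF, the trimmed struct event_group and find_event_group() are hypothetical stand-ins. They are not the real enum pmt_feature_id or the next version's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum pmt_feature_id values. */
enum feature_id { FEAT_RMID_ENERGY, FEAT_RMID_PERF, NUM_FEATURES };

/* Trimmed stand-in for struct event_group. */
struct event_group {
	const char *name;
	uint32_t guid;
};

static struct event_group energy_0x26696143 = { "energy", 0x26696143 };
static struct event_group perf_0x26557651 = { "perf", 0x26557651 };

static struct event_group *known_energy_event_groups[] = { &energy_0x26696143 };
static struct event_group *known_perf_event_groups[] = { &perf_0x26557651 };

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Return the known event group for (feature, guid), or NULL. */
static struct event_group *find_event_group(enum feature_id feature,
					    uint32_t guid)
{
	struct event_group **groups;
	size_t n;

	switch (feature) {
	case FEAT_RMID_ENERGY:
		groups = known_energy_event_groups;
		n = ARRAY_SIZE(known_energy_event_groups);
		break;
	case FEAT_RMID_PERF:
		groups = known_perf_event_groups;
		n = ARRAY_SIZE(known_perf_event_groups);
		break;
	default:
		return NULL;
	}
	for (size_t i = 0; i < n; i++)
		if (groups[i]->guid == guid)
			return groups[i];
	return NULL;
}
```

Adding the next-generation guid then only means appending one entry to the matching array.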

> 
> Reinette
> 

-Tony


* Re: [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package
  2025-07-09 22:48         ` Luck, Tony
@ 2025-07-09 22:59           ` Reinette Chatre
  0 siblings, 0 replies; 89+ messages in thread
From: Reinette Chatre @ 2025-07-09 22:59 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Anil Keshavamurthy,
	Chen Yu, x86, linux-kernel, patches

Hi Tony,

On 7/9/25 3:48 PM, Luck, Tony wrote:
> On Wed, Jul 09, 2025 at 03:13:20PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 7/9/25 11:12 AM, Luck, Tony wrote:
>>> On Tue, Jul 08, 2025 at 07:20:35PM -0700, Reinette Chatre wrote:
>>>> As I understand there is 1:1 relationship between struct event_group and struct pmt_feature_group.
>>>> It thus seems unnecessary to loop through all the telemetry regions of a struct pmt_feature_group
>>>> if it is known to not be associated with the "event group"?
>>>> Could it be helpful to add a new (hardcoded) event_group::id that is of type enum pmt_feature_id
>>>> that can be used to ensure that only relevant struct pmt_feature_group is used to discover events
>>>> for a particular struct event_group?
>>>>
>>>> Another consideration is that this implementation seems to require that guids are unique across
>>>> all telemetry regions of all RMID telemetry features, is this guaranteed?
>>>
>>> The guids are unique. The XML file tags them like this:
>>>
>>> 	<TELEM:uniqueid>26557651</TELEM:uniqueid>
>>
>> I interpret above that guid is expected to be unique for one
>> telemetry feature. It is not clear to me that it implies that the guid
>> is unique across all telemetry features. For example, what prevents
>> a platform from using the same guid for all the telemetry features it
>> supports?
> 
> There are several non-RMID based telemetry MMIO regions in addition to
> the two used by this patch series.
> 
> Think of the uniqueid as a signature for the format of the region.
> Which event counters are present, in which order? How many total
> counters? What is the binary format of each counter?

I understand this.

> 
> Or think of it as a key. Usermode telemetry tools that access these MMIO
> regions use the uniqueid to choose which XML file to use to interpret
> the data. I'm effectively doing this, but without including an XML
> parser in the kernel. Just distilling each XML file to the basic
> essence described in the event_group structure.

Right. If the analogy is about the ID used to pick the right XML file, then the
question is whether the ID is unique per directory (i.e. all XML files in the
RMID-ENERGY directory can be expected to have unique IDs) or across all directories
(i.e. XML files across RMID-ENERGY and RMID-PERF can be expected to have unique IDs).
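
To make the two readings of "unique" concrete, here is a hypothetical check over the known formats. It uses one entry per XML file, not per MMIO region, since many regions share a format. With per_feature == true, uniqueness is required only within each directory. With per_feature == false, it is required across RMID-ENERGY and RMID-PERF. format_desc and guids_unique() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per known format (XML file): owning feature and its guid. */
struct format_desc {
	int feature;
	uint32_t guid;
};

/*
 * per_feature == true:  guids may repeat across features, not within one.
 * per_feature == false: every guid must be unique across all features.
 */
static bool guids_unique(const struct format_desc *d, size_t n,
			 bool per_feature)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (d[i].guid == d[j].guid &&
			    (!per_feature || d[i].feature == d[j].feature))
				return false;
	return true;
}
```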

> It would be a catastrophic failure if Intel assigned the same uniqueid
> to regions that had different formats.

I think we may be speaking past each other. The new code will address the
concern anyway, though.

Reinette




end of thread, other threads:[~2025-07-09 23:00 UTC | newest]

Thread overview: 89+ messages
2025-06-26 16:49 [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Tony Luck
2025-06-26 16:49 ` [PATCH v6 01/30] x86,fs/resctrl: Consolidate monitor event descriptions Tony Luck
2025-06-27 21:55   ` Fenghua Yu
2025-07-08 20:52   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 02/30] x86,fs/resctrl: Replace architecture event enabled checks Tony Luck
2025-06-27 22:15   ` Fenghua Yu
2025-07-08 20:52   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 03/30] x86/resctrl: Remove 'rdt_mon_features' global variable Tony Luck
2025-07-08 20:53   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 04/30] x86,fs/resctrl: Prepare for more monitor events Tony Luck
2025-07-08 20:55   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 05/30] x86,fs/resctrl: Improve domain type checking Tony Luck
2025-07-08 21:01   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 06/30] x86/resctrl: Move L3 initialization out of domain_add_cpu_mon() Tony Luck
2025-07-08 20:56   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 07/30] x86,fs/resctrl: Refactor domain_remove_cpu_mon() ready for new domain types Tony Luck
2025-07-08 20:57   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 08/30] x86/resctrl: Clean up domain_remove_cpu_ctrl() Tony Luck
2025-06-26 16:49 ` [PATCH v6 09/30] x86,fs/resctrl: Use struct rdt_domain_hdr instead of struct rdt_mon_domain Tony Luck
2025-07-08 21:04   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 10/30] x86,fs/resctrl: Rename struct rdt_mon_domain and rdt_hw_mon_domain Tony Luck
2025-07-08 21:06   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 11/30] x86,fs/resctrl: Rename some L3 specific functions Tony Luck
2025-07-08 21:08   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 12/30] fs/resctrl: Make event details accessible to functions when reading events Tony Luck
2025-07-09 22:12   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 13/30] x86,fs/resctrl: Handle events that can be read from any CPU Tony Luck
2025-07-08 21:15   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 14/30] x86,fs/resctrl: Support binary fixed point event counters Tony Luck
2025-06-27 21:22   ` Fenghua Yu
2025-06-27 22:28     ` Luck, Tony
2025-06-27 21:49   ` Fenghua Yu
2025-07-08 21:46   ` Reinette Chatre
2025-07-09 16:52     ` Luck, Tony
2025-06-26 16:49 ` [PATCH v6 15/30] x86,fs/resctrl: Add an architectural hook called for each mount Tony Luck
2025-06-26 16:49 ` [PATCH v6 16/30] x86,fs/resctrl: Add and initialize rdt_resource for package scope core monitor Tony Luck
2025-07-08 22:05   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 17/30] x86/resctrl: Discover hardware telemetry events Tony Luck
2025-06-27 18:06   ` Luck, Tony
2025-07-03 18:27     ` Reinette Chatre
2025-07-03 20:17       ` Luck, Tony
2025-07-03 20:31         ` Reinette Chatre
2025-07-03 21:11           ` Luck, Tony
2025-07-03 22:00             ` Reinette Chatre
2025-07-03 23:29               ` Luck, Tony
2025-07-08 23:51   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 18/30] x86/resctrl: Count valid telemetry aggregators per package Tony Luck
2025-07-09  2:20   ` Reinette Chatre
2025-07-09 18:12     ` Luck, Tony
2025-07-09 22:13       ` Reinette Chatre
2025-07-09 22:48         ` Luck, Tony
2025-07-09 22:59           ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 19/30] x86/resctrl: Complete telemetry event enumeration Tony Luck
2025-07-09  2:38   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 20/30] x86,fs/resctrl: Fill in details of Clearwater Forest events Tony Luck
2025-07-09  3:00   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 21/30] x86,fs/resctrl: Add architectural event pointer Tony Luck
2025-07-09  3:21   ` Reinette Chatre
2025-07-09 21:16     ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 22/30] x86/resctrl: Read core telemetry events Tony Luck
2025-07-09 15:48   ` Reinette Chatre
2025-07-09 21:57     ` Luck, Tony
2025-07-09 22:13       ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 23/30] x86/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG Tony Luck
2025-07-09 22:13   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 24/30] x86/resctrl: Add energy/perf choices to rdt boot option Tony Luck
2025-07-09 22:14   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 25/30] x86/resctrl: Handle number of RMIDs supported by telemetry resources Tony Luck
2025-07-09 22:17   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 26/30] x86,fs/resctrl: Move RMID initialization to first mount Tony Luck
2025-07-09 22:18   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 27/30] x86/resctrl: Enable RDT_RESOURCE_PERF_PKG Tony Luck
2025-06-26 16:49 ` [PATCH v6 28/30] fs/resctrl: Provide interface to create a debugfs info directory Tony Luck
2025-07-09 22:19   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 29/30] x86/resctrl: Add debug info/PERF_PKG_MON/status files Tony Luck
2025-07-09 22:22   ` Reinette Chatre
2025-06-26 16:49 ` [PATCH v6 30/30] x86,fs/resctrl: Update Documentation for package events Tony Luck
2025-07-09 22:24   ` Reinette Chatre
2025-06-27  0:26 ` [PATCH v6 00/30] x86,fs/resctrl telemetry monitoring Luck, Tony
2025-06-27 18:09   ` Luck, Tony
2025-06-30 17:51 ` Reinette Chatre
2025-06-30 22:46   ` Luck, Tony
2025-07-08 20:50     ` Reinette Chatre
2025-07-03 16:45 ` Reinette Chatre
2025-07-03 17:22   ` Luck, Tony
2025-07-08 19:08     ` Luck, Tony
2025-07-08 20:49       ` Reinette Chatre
2025-07-08 22:43         ` Luck, Tony
2025-07-08 23:26           ` Reinette Chatre
