[PATCH v7 00/14] Allow AET to use PMT as loadable module

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v7 00/14] Allow AET to use PMT as loadable module
@ 2026-06-01 19:56 Tony Luck
  2026-06-01 19:56 ` [PATCH v7 01/14] fs/resctrl: Move functions to avoid forward references in subsequent fixes Tony Luck
                   ` (13 more replies)
  0 siblings, 14 replies; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

Requiring INTEL_PMT_TELEMETRY=y to enable AET is a functional workaround
to enable enumeration of Application Energy Telemetry (AET) events, but
unacceptable to many users. It results in increased configration complexity,
increased kernel memory footprint and inability to patch problems by unloading
a module and loading an updated version.

Add a registration function to the AET code that can be used by
INTEL_PMT_TELEMETRY to provide the enumeration functions.

INTEL_PMT_TELEMETRY can be loaded/unloaded independently of
resctrl file system mount/unmount. Perform enumeration on
every mount and cleanup on every unmount.

Signed-off-by: Tony Luck <tony.luck@intel.com>

---

Changes since v6:
Link: https://lore.kernel.org/all/20260429184858.36423-1-tony.luck@intel.com/

Sashiko found some unrelated issues during review of v6 which have
been resolved here:
Link: https://lore.kernel.org/all/cover.1779476724.git.reinette.chatre@intel.com/

The first four patches from that series are included as patches 1-4 here
because I need the re-worked file system mount/unmount code as a base.

Patches 5-14 are the real meat of this series. Changes since v6:

1) Now pass a pointer to THIS_MODULE from pmt_telemetry via the
registration call so that AET can add/remove references instead
of this being done inside the pmt_module.

2) Limit exposure of the registration functions using EXPORT_SYMBOL_NS_GPL()

3) Fix a bug that could leak holds on the pmt_module

4) Drop the call to request_module("pmt_telemetry"). Platforms that
support AET will auto-load this based on the PCIe ID of the OOBMSM
device. Forcing a load on other platforms is a waste of time.

5) Move the unregistration to the start of pmt_telem_exit()

6) Add resctrl_arch_unmount() stub to MPAM code.

7) Use resctrl_arch_rmid_idx_encode() in limbo_release_entry() when
deciding whether to add an RMID to the free list.

8) Kconfig: drop dependency on INTEL_PMT_TELEMETRY. If a kernel is being
built for a system that supports Intel telemetry, then the user will
select it themself. Including the dependency may hide the option to
select X86_CPU_RESCTRL.

Reinette Chatre (1):
  fs/resctrl: Fix deadlock for errors during mount

Tony Luck (13):
  fs/resctrl: Move functions to avoid forward references in subsequent
    fixes
  fs/resctrl: Free mon_data structures on rdt_get_tree() failure
  fs/resctrl: Fix use-after-free during unmount
  x86/resctrl: Stop setting event_group::force_off on RMID shortage
  fs/resctrl: Add interface to disable a monitor event
  x86/resctrl: Maintain a count of enabled monitor features
  fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount
  x86/resctrl: Add PMT registration API for AET enumeration callbacks
  platform/x86/intel/pmt: Register enumeration functions with resctrl
  mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  fs/resctrl: Call architecture hooks for every mount/unmount
  x86/resctrl: Simplify Kconfig options for resctrl
  Documentation/filesystems/resctrl: Add footnote for telemetry fstab
    mount caveat

 Documentation/filesystems/resctrl.rst      |   7 +-
 include/linux/resctrl.h                    |  43 +-
 arch/x86/include/asm/resctrl.h             |  19 +-
 arch/x86/kernel/cpu/resctrl/internal.h     |  19 +-
 arch/x86/kernel/cpu/resctrl/core.c         |  77 +++-
 arch/x86/kernel/cpu/resctrl/intel_aet.c    |  88 +++-
 arch/x86/kernel/cpu/resctrl/monitor.c      |  11 +-
 drivers/platform/x86/intel/pmt/telemetry.c |  13 +-
 drivers/resctrl/mpam_resctrl.c             |   9 +
 fs/resctrl/monitor.c                       |  59 ++-
 fs/resctrl/rdtgroup.c                      | 458 ++++++++++++---------
 arch/x86/Kconfig                           |  15 +-
 arch/x86/kernel/cpu/resctrl/Makefile       |   2 +-
 13 files changed, 531 insertions(+), 289 deletions(-)

base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
-- 
2.54.0

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v7 01/14] fs/resctrl: Move functions to avoid forward references in subsequent fixes
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-01 19:56 ` [PATCH v7 02/14] fs/resctrl: Free mon_data structures on rdt_get_tree() failure Tony Luck
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck, Ben Horgan

rdt_get_tree() manages resctrl fs mount and rdt_kill_sb() manages resctrl
fs unmount.

There is significant overlap between error cleanup during resctrl mount
failure and cleanup on resctrl unmount yet the cleanup is not done
consistently in these two flows.

Pull some cleanup functions before rdt_get_tree() in preparation for
a new helper that can be shared between mount and unmount.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
 fs/resctrl/rdtgroup.c | 376 +++++++++++++++++++++---------------------
 1 file changed, 188 insertions(+), 188 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 5dfdaa6f9d8f..a6376a3fc4c3 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2782,6 +2782,194 @@ static void schemata_list_destroy(void)
 	}
 }
 
+/*
+ * Move tasks from one to the other group. If @from is NULL, then all tasks
+ * in the systems are moved unconditionally (used for teardown).
+ *
+ * If @mask is not NULL the cpus on which moved tasks are running are set
+ * in that mask so the update smp function call is restricted to affected
+ * cpus.
+ */
+static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
+				 struct cpumask *mask)
+{
+	struct task_struct *p, *t;
+
+	read_lock(&tasklist_lock);
+	for_each_process_thread(p, t) {
+		if (!from || is_closid_match(t, from) ||
+		    is_rmid_match(t, from)) {
+			resctrl_arch_set_closid_rmid(t, to->closid,
+						     to->mon.rmid);
+
+			/*
+			 * Order the closid/rmid stores above before the loads
+			 * in task_curr(). This pairs with the full barrier
+			 * between the rq->curr update and
+			 * resctrl_arch_sched_in() during context switch.
+			 */
+			smp_mb();
+
+			/*
+			 * If the task is on a CPU, set the CPU in the mask.
+			 * The detection is inaccurate as tasks might move or
+			 * schedule before the smp function call takes place.
+			 * In such a case the function call is pointless, but
+			 * there is no other side effect.
+			 */
+			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
+				cpumask_set_cpu(task_cpu(t), mask);
+		}
+	}
+	read_unlock(&tasklist_lock);
+}
+
+static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
+{
+	struct rdtgroup *sentry, *stmp;
+	struct list_head *head;
+
+	head = &rdtgrp->mon.crdtgrp_list;
+	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
+		rdtgroup_unassign_cntrs(sentry);
+		free_rmid(sentry->closid, sentry->mon.rmid);
+		list_del(&sentry->mon.crdtgrp_list);
+
+		if (atomic_read(&sentry->waitcount) != 0)
+			sentry->flags = RDT_DELETED;
+		else
+			rdtgroup_remove(sentry);
+	}
+}
+
+/*
+ * Forcibly remove all of subdirectories under root.
+ */
+static void rmdir_all_sub(void)
+{
+	struct rdtgroup *rdtgrp, *tmp;
+
+	/* Move all tasks to the default resource group */
+	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
+
+	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
+		/* Free any child rmids */
+		free_all_child_rdtgrp(rdtgrp);
+
+		/* Remove each rdtgroup other than root */
+		if (rdtgrp == &rdtgroup_default)
+			continue;
+
+		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
+		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
+			rdtgroup_pseudo_lock_remove(rdtgrp);
+
+		/*
+		 * Give any CPUs back to the default group. We cannot copy
+		 * cpu_online_mask because a CPU might have executed the
+		 * offline callback already, but is still marked online.
+		 */
+		cpumask_or(&rdtgroup_default.cpu_mask,
+			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
+
+		rdtgroup_unassign_cntrs(rdtgrp);
+
+		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
+
+		kernfs_remove(rdtgrp->kn);
+		list_del(&rdtgrp->rdtgroup_list);
+
+		if (atomic_read(&rdtgrp->waitcount) != 0)
+			rdtgrp->flags = RDT_DELETED;
+		else
+			rdtgroup_remove(rdtgrp);
+	}
+	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
+	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
+
+	kernfs_remove(kn_info);
+	kernfs_remove(kn_mongrp);
+	kernfs_remove(kn_mondata);
+}
+
+/**
+ * mon_get_kn_priv() - Get the mon_data priv data for this event.
+ *
+ * The same values are used across the mon_data directories of all control and
+ * monitor groups for the same event in the same domain. Keep a list of
+ * allocated structures and re-use an existing one with the same values for
+ * @rid, @domid, etc.
+ *
+ * @rid:    The resource id for the event file being created.
+ * @domid:  The domain id for the event file being created.
+ * @mevt:   The type of event file being created.
+ * @do_sum: Whether SNC summing monitors are being created. Only set
+ *	    when @rid == RDT_RESOURCE_L3.
+ *
+ * Return: Pointer to mon_data private data of the event, NULL on failure.
+ */
+static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
+					struct mon_evt *mevt,
+					bool do_sum)
+{
+	struct mon_data *priv;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
+		if (priv->rid == rid && priv->domid == domid &&
+		    priv->sum == do_sum && priv->evt == mevt)
+			return priv;
+	}
+
+	priv = kzalloc_obj(*priv);
+	if (!priv)
+		return NULL;
+
+	priv->rid = rid;
+	priv->domid = domid;
+	priv->sum = do_sum;
+	priv->evt = mevt;
+	list_add_tail(&priv->list, &mon_data_kn_priv_list);
+
+	return priv;
+}
+
+/**
+ * mon_put_kn_priv() - Free all allocated mon_data structures.
+ *
+ * Called when resctrl file system is unmounted.
+ */
+static void mon_put_kn_priv(void)
+{
+	struct mon_data *priv, *tmp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
+		list_del(&priv->list);
+		kfree(priv);
+	}
+}
+
+static void resctrl_fs_teardown(void)
+{
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	/* Cleared by rdtgroup_destroy_root() */
+	if (!rdtgroup_default.kn)
+		return;
+
+	rmdir_all_sub();
+	rdtgroup_unassign_cntrs(&rdtgroup_default);
+	mon_put_kn_priv();
+	rdt_pseudo_lock_release();
+	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	closid_exit();
+	schemata_list_destroy();
+	rdtgroup_destroy_root();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2981,194 +3169,6 @@ static int rdt_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
-/*
- * Move tasks from one to the other group. If @from is NULL, then all tasks
- * in the systems are moved unconditionally (used for teardown).
- *
- * If @mask is not NULL the cpus on which moved tasks are running are set
- * in that mask so the update smp function call is restricted to affected
- * cpus.
- */
-static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
-				 struct cpumask *mask)
-{
-	struct task_struct *p, *t;
-
-	read_lock(&tasklist_lock);
-	for_each_process_thread(p, t) {
-		if (!from || is_closid_match(t, from) ||
-		    is_rmid_match(t, from)) {
-			resctrl_arch_set_closid_rmid(t, to->closid,
-						     to->mon.rmid);
-
-			/*
-			 * Order the closid/rmid stores above before the loads
-			 * in task_curr(). This pairs with the full barrier
-			 * between the rq->curr update and
-			 * resctrl_arch_sched_in() during context switch.
-			 */
-			smp_mb();
-
-			/*
-			 * If the task is on a CPU, set the CPU in the mask.
-			 * The detection is inaccurate as tasks might move or
-			 * schedule before the smp function call takes place.
-			 * In such a case the function call is pointless, but
-			 * there is no other side effect.
-			 */
-			if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
-				cpumask_set_cpu(task_cpu(t), mask);
-		}
-	}
-	read_unlock(&tasklist_lock);
-}
-
-static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
-{
-	struct rdtgroup *sentry, *stmp;
-	struct list_head *head;
-
-	head = &rdtgrp->mon.crdtgrp_list;
-	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
-		rdtgroup_unassign_cntrs(sentry);
-		free_rmid(sentry->closid, sentry->mon.rmid);
-		list_del(&sentry->mon.crdtgrp_list);
-
-		if (atomic_read(&sentry->waitcount) != 0)
-			sentry->flags = RDT_DELETED;
-		else
-			rdtgroup_remove(sentry);
-	}
-}
-
-/*
- * Forcibly remove all of subdirectories under root.
- */
-static void rmdir_all_sub(void)
-{
-	struct rdtgroup *rdtgrp, *tmp;
-
-	/* Move all tasks to the default resource group */
-	rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
-
-	list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
-		/* Free any child rmids */
-		free_all_child_rdtgrp(rdtgrp);
-
-		/* Remove each rdtgroup other than root */
-		if (rdtgrp == &rdtgroup_default)
-			continue;
-
-		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
-		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
-			rdtgroup_pseudo_lock_remove(rdtgrp);
-
-		/*
-		 * Give any CPUs back to the default group. We cannot copy
-		 * cpu_online_mask because a CPU might have executed the
-		 * offline callback already, but is still marked online.
-		 */
-		cpumask_or(&rdtgroup_default.cpu_mask,
-			   &rdtgroup_default.cpu_mask, &rdtgrp->cpu_mask);
-
-		rdtgroup_unassign_cntrs(rdtgrp);
-
-		free_rmid(rdtgrp->closid, rdtgrp->mon.rmid);
-
-		kernfs_remove(rdtgrp->kn);
-		list_del(&rdtgrp->rdtgroup_list);
-
-		if (atomic_read(&rdtgrp->waitcount) != 0)
-			rdtgrp->flags = RDT_DELETED;
-		else
-			rdtgroup_remove(rdtgrp);
-	}
-	/* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */
-	update_closid_rmid(cpu_online_mask, &rdtgroup_default);
-
-	kernfs_remove(kn_info);
-	kernfs_remove(kn_mongrp);
-	kernfs_remove(kn_mondata);
-}
-
-/**
- * mon_get_kn_priv() - Get the mon_data priv data for this event.
- *
- * The same values are used across the mon_data directories of all control and
- * monitor groups for the same event in the same domain. Keep a list of
- * allocated structures and re-use an existing one with the same values for
- * @rid, @domid, etc.
- *
- * @rid:    The resource id for the event file being created.
- * @domid:  The domain id for the event file being created.
- * @mevt:   The type of event file being created.
- * @do_sum: Whether SNC summing monitors are being created. Only set
- *	    when @rid == RDT_RESOURCE_L3.
- *
- * Return: Pointer to mon_data private data of the event, NULL on failure.
- */
-static struct mon_data *mon_get_kn_priv(enum resctrl_res_level rid, int domid,
-					struct mon_evt *mevt,
-					bool do_sum)
-{
-	struct mon_data *priv;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry(priv, &mon_data_kn_priv_list, list) {
-		if (priv->rid == rid && priv->domid == domid &&
-		    priv->sum == do_sum && priv->evt == mevt)
-			return priv;
-	}
-
-	priv = kzalloc_obj(*priv);
-	if (!priv)
-		return NULL;
-
-	priv->rid = rid;
-	priv->domid = domid;
-	priv->sum = do_sum;
-	priv->evt = mevt;
-	list_add_tail(&priv->list, &mon_data_kn_priv_list);
-
-	return priv;
-}
-
-/**
- * mon_put_kn_priv() - Free all allocated mon_data structures.
- *
- * Called when resctrl file system is unmounted.
- */
-static void mon_put_kn_priv(void)
-{
-	struct mon_data *priv, *tmp;
-
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	list_for_each_entry_safe(priv, tmp, &mon_data_kn_priv_list, list) {
-		list_del(&priv->list);
-		kfree(priv);
-	}
-}
-
-static void resctrl_fs_teardown(void)
-{
-	lockdep_assert_held(&rdtgroup_mutex);
-
-	/* Cleared by rdtgroup_destroy_root() */
-	if (!rdtgroup_default.kn)
-		return;
-
-	rmdir_all_sub();
-	rdtgroup_unassign_cntrs(&rdtgroup_default);
-	mon_put_kn_priv();
-	rdt_pseudo_lock_release();
-	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
-	closid_exit();
-	schemata_list_destroy();
-	rdtgroup_destroy_root();
-}
-
 static void rdt_kill_sb(struct super_block *sb)
 {
 	struct rdt_resource *r;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 02/14] fs/resctrl: Free mon_data structures on rdt_get_tree() failure
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
  2026-06-01 19:56 ` [PATCH v7 01/14] fs/resctrl: Move functions to avoid forward references in subsequent fixes Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-01 19:56 ` [PATCH v7 03/14] fs/resctrl: Fix use-after-free during unmount Tony Luck
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck, Ben Horgan

If mkdir_mondata_all() or a subsequent call in rdt_get_tree() fails, the
mon_data structures allocated by mon_get_kn_priv() are leaked.

Add mon_put_kn_priv() to the out_mongrp error path to free the mon_data
structures.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing mon_data_bits")
Reported-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
---
 fs/resctrl/rdtgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index a6376a3fc4c3..506b40dc9430 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -3071,6 +3071,7 @@ static int rdt_get_tree(struct fs_context *fc)
 		kernfs_remove(kn_mondata);
 out_mongrp:
 	if (resctrl_arch_mon_capable()) {
+		mon_put_kn_priv();
 		rdtgroup_unassign_cntrs(&rdtgroup_default);
 		kernfs_remove(kn_mongrp);
 	}
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 03/14] fs/resctrl: Fix use-after-free during unmount
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
  2026-06-01 19:56 ` [PATCH v7 01/14] fs/resctrl: Move functions to avoid forward references in subsequent fixes Tony Luck
  2026-06-01 19:56 ` [PATCH v7 02/14] fs/resctrl: Free mon_data structures on rdt_get_tree() failure Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-01 19:56 ` [PATCH v7 04/14] fs/resctrl: Fix deadlock for errors during mount Tony Luck
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck, Sashiko

During unmount or failure teardown all mon_data structures that contain
monitoring event file private data are freed after which kernfs nodes are
removed. However, the RDT_DELETED flag is never set for the statically
allocated default resource group.

A concurrent reader of an event file associated with the default resource
group may, after dropping kernfs active protection, block on rdtgroup_mutex
while unmount proceeds to free the file private data and destroy the kernfs
node without waiting for the reader.

When the mutex is released, the reader wakes up, observes that RDT_DELETED
is not set for the default group, and dereferences the already-freed
file private data.

Set RDT_DELETED for the default group unconditionally since the flag does
not lead to free of this statically allocated group.

Do not allow a new resctrl mount if there are any waiters on default group
of previous mount. A new mount will re-initialize the default group that
would appear to waiters from previous mount as though the default group is
accessible causing them to access the mon_data structures from the previous
mount that have been removed.

Fixes: 2a6566038544 ("x86/resctrl: Expand the width of domid by replacing mon_data_bits")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
---
 fs/resctrl/rdtgroup.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 506b40dc9430..ac3285ba8775 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -585,14 +585,20 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
  *
  * On resource group creation via a mkdir, an extra kernfs_node reference is
  * taken to ensure that the rdtgroup structure remains accessible for the
- * rdtgroup_kn_unlock() calls where it is removed.
+ * rdtgroup_kn_unlock() calls where it is removed. The default group is
+ * statically allocated: it does not have an extra reference but will have
+ * RDT_DELETED set on unmount to support safe access to its associated files
+ * via rdtgroup_kn_lock_live/rdtgroup_kn_unlock().
  *
- * Drop the extra reference here, then free the rdtgroup structure.
+ * For all but the default group: drop the extra reference, then free the
+ * rdtgroup structure.
  *
  * Return: void
  */
 static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 {
+	if (rdtgrp == &rdtgroup_default)
+		return;
 	kernfs_put(rdtgrp->kn);
 	kfree(rdtgrp);
 }
@@ -2965,6 +2971,7 @@ static void resctrl_fs_teardown(void)
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	rdtgroup_default.flags = RDT_DELETED;
 	closid_exit();
 	schemata_list_destroy();
 	rdtgroup_destroy_root();
@@ -2990,6 +2997,12 @@ static int rdt_get_tree(struct fs_context *fc)
 		goto out;
 	}
 
+	/* Avoid races from pending operations from a previous mount */
+	if (atomic_read(&rdtgroup_default.waitcount) != 0) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	ret = setup_rmid_lru_list();
 	if (ret)
 		goto out;
@@ -4265,6 +4278,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
 
 	ctx->kfc.root = rdt_root;
 	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
+	rdtgroup_default.flags = 0;
 
 	return 0;
 }
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 04/14] fs/resctrl: Fix deadlock for errors during mount
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (2 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 03/14] fs/resctrl: Fix use-after-free during unmount Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-01 19:56 ` [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage Tony Luck
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Sashiko, Tony Luck,
	Ben Horgan

From: Reinette Chatre <reinette.chatre@intel.com>

rdt_get_tree() acquires rdtgroup_mutex before calling kernfs_get_tree(). If
superblock setup fails inside kernfs_get_tree(), the VFS calls kill_sb on
the same thread before the call returns.  rdt_kill_sb() unconditionally
attempts to acquire rdtgroup_mutex and deadlock occurs.

Move the call to kernfs_get_tree() outside of locks. If kernfs_get_tree()
fails and ctx->kfc.new_sb_created is set, then rdt_kill_sb() has already
been called and no further cleanup is needed.

Add an extra hold in this error path on rdtgroup_default.kn to defend against
other races destroying the root which is then dereferenced in kernfs_kill_sb()

Add resctrl_unmount() helper to keep code consistent between the
rdt_get_tree() failure path and a normal unmount.

Fixes: 5ff193fbde20 ("x86/intel_rdt: Add basic resctrl filesystem support")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
---
 fs/resctrl/rdtgroup.c | 82 +++++++++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 27 deletions(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index ac3285ba8775..d2a1f88d8782 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -2977,10 +2977,34 @@ static void resctrl_fs_teardown(void)
 	rdtgroup_destroy_root();
 }
 
+static void resctrl_unmount(void)
+{
+	struct rdt_resource *r;
+
+	cpus_read_lock();
+	mutex_lock(&rdtgroup_mutex);
+
+	rdt_disable_ctx();
+
+	/* Put everything back to default values. */
+	for_each_alloc_capable_rdt_resource(r)
+		resctrl_arch_reset_all_ctrls(r);
+
+	resctrl_fs_teardown();
+	if (resctrl_arch_alloc_capable())
+		resctrl_arch_disable_alloc();
+	if (resctrl_arch_mon_capable())
+		resctrl_arch_disable_mon();
+	resctrl_mounted = false;
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
 	unsigned long flags = RFTYPE_CTRL_BASE;
+	struct kernfs_node *rdt_root_kn;
 	struct rdt_l3_mon_domain *dom;
 	struct rdt_resource *r;
 	int ret;
@@ -3056,10 +3080,6 @@ static int rdt_get_tree(struct fs_context *fc)
 	if (ret)
 		goto out_mondata;
 
-	ret = kernfs_get_tree(fc);
-	if (ret < 0)
-		goto out_psl;
-
 	if (resctrl_arch_alloc_capable())
 		resctrl_arch_enable_alloc();
 	if (resctrl_arch_mon_capable())
@@ -3075,10 +3095,37 @@ static int rdt_get_tree(struct fs_context *fc)
 						   RESCTRL_PICK_ANY_CPU);
 	}
 
-	goto out;
+	/*
+	 * Ensure root kn remains accessible after mutex is unlocked so that
+	 * kernfs_kill_sb() can run safely if called by kernfs_get_tree()'s
+	 * failure path after creating a superblock but before taking reference
+	 * on root kn.
+	 */
+	kernfs_get(rdtgroup_default.kn);
+
+	/*
+	 * Make backup of the current root kn being created to be used in kernfs_put().
+	 * The additional reference taken above will prevent the kn from being freed
+	 * before kernfs_kill_sb() can run but rdtgroup_default.kn may be set to NULL
+	 * via rdtgroup_destroy_root() and its backing root (rdt_root) could be overwritten
+	 * before kernfs_put() can run.
+	 */
+	rdt_root_kn = rdtgroup_default.kn;
+
+	rdt_last_cmd_clear();
+	mutex_unlock(&rdtgroup_mutex);
+	cpus_read_unlock();
+
+	ret = kernfs_get_tree(fc);
+	/*
+	 * resctrl can only be mounted once, new superblock only expected
+	 * to be created once.
+	 */
+	if (!ctx->kfc.new_sb_created)
+		resctrl_unmount();
+	kernfs_put(rdt_root_kn);
+	return ret;
 
-out_psl:
-	rdt_pseudo_lock_release();
 out_mondata:
 	if (resctrl_arch_mon_capable())
 		kernfs_remove(kn_mondata);
@@ -3098,7 +3145,6 @@ static int rdt_get_tree(struct fs_context *fc)
 out_root:
 	rdtgroup_destroy_root();
 out:
-	rdt_last_cmd_clear();
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
 	return ret;
@@ -3185,26 +3231,8 @@ static int rdt_init_fs_context(struct fs_context *fc)
 
 static void rdt_kill_sb(struct super_block *sb)
 {
-	struct rdt_resource *r;
-
-	cpus_read_lock();
-	mutex_lock(&rdtgroup_mutex);
-
-	rdt_disable_ctx();
-
-	/* Put everything back to default values. */
-	for_each_alloc_capable_rdt_resource(r)
-		resctrl_arch_reset_all_ctrls(r);
-
-	resctrl_fs_teardown();
-	if (resctrl_arch_alloc_capable())
-		resctrl_arch_disable_alloc();
-	if (resctrl_arch_mon_capable())
-		resctrl_arch_disable_mon();
-	resctrl_mounted = false;
+	resctrl_unmount();
 	kernfs_kill_sb(sb);
-	mutex_unlock(&rdtgroup_mutex);
-	cpus_read_unlock();
 }
 
 static struct file_system_type rdt_fs_type = {
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (3 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 04/14] fs/resctrl: Fix deadlock for errors during mount Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:16   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event Tony Luck
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

Drop the force_off assignment from all_regions_have_sufficient_rmid().
This preserves current single-enumeration behaviour while preparing for
the upcoming per-mount enumeration, where latching force_off would
incorrectly suppress re-enumeration on subsequent mounts - even when the
user explicitly requested the feature via "rdt={feature}".

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index 89b8b619d5d5..e2af700bca04 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -60,8 +60,8 @@ struct pmt_event {
  *			data for all telemetry regions of type @pfname.
  *			Valid if the system supports the event group,
  *			NULL otherwise.
- * @force_off:		True when "rdt" command line or architecture code disables
- *			this event group due to insufficient RMIDs.
+ * @force_off:		True when "rdt" command line disables this event group
+ *			to avoid system limitations due to insufficient RMIDs.
  * @force_on:		True when "rdt" command line overrides disable of this
  *			event group.
  * @guid:		Unique number per XML description file.
@@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
 		if (!p->regions[i].addr)
 			continue;
 		tr = &p->regions[i];
-		if (tr->num_rmids < e->num_rmid) {
-			e->force_off = true;
+		if (tr->num_rmids < e->num_rmid)
 			return false;
-		}
 	}
 
 	return true;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (4 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:18   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features Tony Luck
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

In preparation for re-running AET enumeration on every mount, AET code must be
able to disable events on unmount so the next mount starts from a clean slate.

Add a file system interface for architecture to clear the enabled flag for
a given event.

Add kerneldoc comments to describe limitations on when events may be enabled
or disabled.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h | 34 ++++++++++++++++++++++++++++++++++
 fs/resctrl/monitor.c    | 12 ++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..a8338656f836 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -414,9 +414,43 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
+/**
+ * resctrl_enable_mon_event() - Enable monitoring event
+ * @eventid:	ID of the event
+ * @any_cpu:	True if event data can be read from any CPU.
+ * @binary_bits:Number of binary places of the fixed-point value expected to
+ *		back a floating point event. Can only be set for floating point
+ *		events.
+ * @arch_priv:  Architecture private data associated with event. Passed back to
+ *		architecture when reading the event via resctrl_arch_rmid_read().
+ *
+ * The file system must not be mounted when enabling an event.
+ *
+ * Events that require per-domain (architectural and/or filesystem) state must
+ * be enabled before the domain structures are allocated. For example before
+ * CPU hotplug callbacks that allocate domain structures are registered. If the
+ * architecture discovers a resource after initialization it should enable
+ * events needing per-domain state before any domain structure allocation which
+ * should be coordinated with the CPU hotplug callbacks.
+ *
+ * Return:
+ * true if event was successfully enabled, false otherwise.
+ */
 bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
 			      unsigned int binary_bits, void *arch_priv);
 
+/**
+ * resctrl_disable_mon_event() - Disable monitoring event
+ * @eventid:	ID of the event
+ *
+ * The file system must not be mounted when disabling an event.
+ *
+ * Events that require per-domain (architectural and/or filesystem) state
+ * will require additional cleanup which should be coordinated with the CPU
+ * hotplug callbacks.
+ */
+void resctrl_disable_mon_event(enum resctrl_event_id eventid);
+
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid);
 
 bool resctrl_arch_is_evt_configurable(enum resctrl_event_id evt);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..327e7a863614 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -1012,6 +1012,18 @@ bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
 	return true;
 }
 
+void resctrl_disable_mon_event(enum resctrl_event_id eventid)
+{
+	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
+		return;
+	if (!mon_event_all[eventid].enabled) {
+		pr_warn("Repeat disable for event %d\n", eventid);
+		return;
+	}
+
+	mon_event_all[eventid].enabled = false;
+}
+
 bool resctrl_is_mon_event_enabled(enum resctrl_event_id eventid)
 {
 	return eventid >= QOS_FIRST_EVENT && eventid < QOS_NUM_EVENTS &&
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (5 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:18   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount Tony Luck
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

AET (Application Energy Telemetry) may be enabled/disabled from one mount
to the next depending on whether the pmt_telemetry module is loaded. If
AET is the only monitoring feature supported on a system and it is enabled
in one mount, but disabled in a subsequent mount this will result in empty
mon_data directories.

Change from a boolean to a count of enabled monitor features inside
architecture code. File system code only needs to know if any monitor
features are enabled so resctrl_arch_mon_capable() can still return
boolean.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/resctrl.h         |  4 ++--
 arch/x86/kernel/cpu/resctrl/internal.h |  2 +-
 arch/x86/kernel/cpu/resctrl/core.c     | 24 +++++++++++++-----------
 arch/x86/kernel/cpu/resctrl/monitor.c  | 11 +++--------
 4 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 575f8408a9e7..1e50c7dc3fe3 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -43,7 +43,7 @@ struct resctrl_pqr_state {
 DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
 extern bool rdt_alloc_capable;
-extern bool rdt_mon_capable;
+extern int rdt_mon_feature_count;
 
 DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
@@ -68,7 +68,7 @@ static inline void resctrl_arch_disable_alloc(void)
 
 static inline bool resctrl_arch_mon_capable(void)
 {
-	return rdt_mon_capable;
+	return !!rdt_mon_feature_count;
 }
 
 static inline void resctrl_arch_enable_mon(void)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index e3cfa0c10e92..3b09cfe9a046 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -224,7 +224,7 @@ union l3_qos_abmc_cfg {
 
 void rdt_ctrl_update(void *arg);
 
-int rdt_get_l3_mon_config(struct rdt_resource *r);
+void rdt_get_l3_mon_config(struct rdt_resource *r);
 
 bool rdt_cpu_has(int flag);
 
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7667cf7c4e94..1af8f965fdd0 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -779,7 +779,7 @@ void resctrl_arch_pre_mount(void)
 	cpus_read_lock();
 	mutex_lock(&domain_list_lock);
 	r->mon_capable = true;
-	rdt_mon_capable = true;
+	rdt_mon_feature_count++;
 	for_each_online_cpu(cpu)
 		domain_add_cpu_mon(cpu, r);
 	mutex_unlock(&domain_list_lock);
@@ -959,30 +959,32 @@ static __init bool get_rdt_alloc_resources(void)
 	return ret;
 }
 
-static __init bool get_rdt_mon_resources(void)
+static __init int get_rdt_mon_resources(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-	bool ret = false;
+	int ret = 0;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC)) {
 		resctrl_enable_mon_event(QOS_L3_OCCUP_EVENT_ID, false, 0, NULL);
-		ret = true;
+		ret++;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)) {
 		resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID, false, 0, NULL);
-		ret = true;
+		ret++;
 	}
 	if (rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL)) {
 		resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID, false, 0, NULL);
-		ret = true;
+		ret++;
 	}
 	if (rdt_cpu_has(X86_FEATURE_ABMC))
-		ret = true;
+		ret++;
 
 	if (!ret)
-		return false;
+		return 0;
 
-	return !rdt_get_l3_mon_config(r);
+	rdt_get_l3_mon_config(r);
+
+	return ret;
 }
 
 static __init void __check_quirks_intel(void)
@@ -1013,9 +1015,9 @@ static __init void check_quirks(void)
 static __init bool get_rdt_resources(void)
 {
 	rdt_alloc_capable = get_rdt_alloc_resources();
-	rdt_mon_capable = get_rdt_mon_resources();
+	rdt_mon_feature_count = get_rdt_mon_resources();
 
-	return (rdt_mon_capable || rdt_alloc_capable);
+	return (rdt_mon_feature_count || rdt_alloc_capable);
 }
 
 static __init void rdt_init_res_defs_intel(void)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9bd87bae4983..497cc57ac135 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -25,11 +25,8 @@
 
 #include "internal.h"
 
-/*
- * Global boolean for rdt_monitor which is true if any
- * resource monitoring is enabled.
- */
-bool rdt_mon_capable;
+/* Global count of number of resource monitor functions that are enabled. */
+int rdt_mon_feature_count;
 
 #define CF(cf)	((unsigned long)(1048576 * (cf) + 0.5))
 
@@ -402,7 +399,7 @@ static __init int snc_get_config(void)
 	return ret;
 }
 
-int __init rdt_get_l3_mon_config(struct rdt_resource *r)
+void __init rdt_get_l3_mon_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -460,8 +457,6 @@ int __init rdt_get_l3_mon_config(struct rdt_resource *r)
 	}
 
 	r->mon_capable = true;
-
-	return 0;
 }
 
 void __init intel_rdt_mbm_apply_quirk(void)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (6 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:21   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 09/14] x86/resctrl: Add PMT registration API for AET enumeration callbacks Tony Luck
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

Application Energy Telemetry (AET) event enumeration takes place
asynchronously. Linux builds the pmt_telemetry module into the kernel to
kick of enumeration early enough that it completes before first mount of
the resctrl file system.

Allowing pmt_telemetry to be a loadable module means that it is possible
for different numbers of RMIDs to be supported on each mount, depending
on whether pmt_telemetry module is loaded.

Add resctrl_arch_system_max_rmid_idx() interface to provide the maximum
supported number of RMIDs on a system. For x86 this is RDT_RESOURCE_L3
rdt_resource::mon.num_rmid (if L3 monitoring is enabled).

Allocate the rmid_ptrs, rdt_l3_mon_domain::rmid_busy_llc, and
rdt_l3_mon_domain::mbm_states based on the maximum possible.

Initialize rmid_free_lru based on the number of RMIDs available for
this mount.

Note that some RMIDs may still be marked busy from a previous mount.
Don't add these to the free list. Check current RMID limit in
limbo_release_entry() and do not add out of range RMIDs to the
free list.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h            |  1 +
 arch/x86/kernel/cpu/resctrl/core.c | 12 ++++++++
 drivers/resctrl/mpam_resctrl.c     |  5 ++++
 fs/resctrl/monitor.c               | 47 ++++++++++++++++++++----------
 fs/resctrl/rdtgroup.c              |  2 +-
 5 files changed, 50 insertions(+), 17 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index a8338656f836..3705f0214fa6 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -412,6 +412,7 @@ static inline u32 resctrl_get_default_ctrl(struct rdt_resource *r)
 /* The number of closid supported by this resource regardless of CDP */
 u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 u32 resctrl_arch_system_num_rmid_idx(void);
+u32 resctrl_arch_system_max_rmid_idx(void);
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
 /**
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1af8f965fdd0..934492c7e643 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -129,6 +129,18 @@ u32 resctrl_arch_system_num_rmid_idx(void)
 	return num_rmids == U32_MAX ? 0 : num_rmids;
 }
 
+/**
+ * resctrl_arch_system_max_rmid_idx - Largest possible number of RMIDs
+ *
+ * Return: If L3 monitoring is supported, largest possible comes from L3.
+ */
+u32 resctrl_arch_system_max_rmid_idx(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+	return r->mon_capable ? r->mon.num_rmid : resctrl_arch_system_num_rmid_idx();
+}
+
 struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
 {
 	if (l >= RDT_NUM_RESOURCES)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 226ff6f532fa..7079870ca894 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -272,6 +272,11 @@ u32 resctrl_arch_system_num_rmid_idx(void)
 	return (mpam_pmg_max + 1) * (mpam_partid_max + 1);
 }
 
+u32 resctrl_arch_system_max_rmid_idx(void)
+{
+	return resctrl_arch_system_num_rmid_idx();
+}
+
 u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
 {
 	return closid * (mpam_pmg_max + 1) + rmid;
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 327e7a863614..b374e2f84a75 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -115,10 +115,17 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
 
 static void limbo_release_entry(struct rmid_entry *entry)
 {
+	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
 	lockdep_assert_held(&rdtgroup_mutex);
 
 	rmid_limbo_count--;
-	list_add_tail(&entry->list, &rmid_free_lru);
+
+	/*
+	 * Limbo may be freeing an RMID from a previous mount where there
+	 * were more RMIDs available.
+	 */
+	if (resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid) < idx_limit)
+		list_add_tail(&entry->list, &rmid_free_lru);
 
 	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
 		closid_num_dirty_rmid[entry->closid]--;
@@ -133,7 +140,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
 void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 {
 	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
-	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
 	struct rmid_entry *entry;
 	u32 idx, cur_idx = 1;
 	void *arch_mon_ctx;
@@ -192,7 +199,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
 
 bool has_busy_rmid(struct rdt_l3_mon_domain *d)
 {
-	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
 
 	return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
 }
@@ -916,24 +923,32 @@ int setup_rmid_lru_list(void)
 		return 0;
 
 	/*
-	 * Called on every mount, but the number of RMIDs cannot change
-	 * after the first mount, so keep using the same set of rmid_ptrs[]
-	 * until resctrl_exit(). Note that the limbo handler continues to
-	 * access rmid_ptrs[] after resctrl is unmounted.
+	 * Allocate the largest number of RMIDs that this system will ever
+	 * need. These cannot be freed until resctrl_exit() because the limbo
+	 * handler  continues to access rmid_ptrs[] after resctrl is unmounted.
 	 */
-	if (rmid_ptrs)
-		return 0;
+	if (!rmid_ptrs) {
+		idx_limit = resctrl_arch_system_max_rmid_idx();
+		rmid_ptrs = kzalloc_objs(struct rmid_entry, idx_limit);
+		if (!rmid_ptrs)
+			return -ENOMEM;
+
+		for (i = 0; i < idx_limit; i++) {
+			entry = &rmid_ptrs[i];
+			INIT_LIST_HEAD(&entry->list);
+
+			resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
+		}
+	}
 
+	/* Find how many RMIDs are needed for this mount */
 	idx_limit = resctrl_arch_system_num_rmid_idx();
-	rmid_ptrs = kzalloc_objs(struct rmid_entry, idx_limit);
-	if (!rmid_ptrs)
-		return -ENOMEM;
 
+	INIT_LIST_HEAD(&rmid_free_lru);
 	for (i = 0; i < idx_limit; i++) {
 		entry = &rmid_ptrs[i];
-		INIT_LIST_HEAD(&entry->list);
-
-		resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
+		if (i && entry->busy)
+			continue;
 		list_add_tail(&entry->list, &rmid_free_lru);
 	}
 
@@ -1156,7 +1171,7 @@ static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_l3_mon_domain *
  */
 static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
-	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
 	enum resctrl_event_id evt;
 	int idx;
 
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index d2a1f88d8782..6d647b71c5db 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4416,7 +4416,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
  */
 static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
 {
-	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
+	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
 	size_t tsize = sizeof(*d->mbm_states[0]);
 	enum resctrl_event_id eventid;
 	int idx;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 09/14] x86/resctrl: Add PMT registration API for AET enumeration callbacks
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (7 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:21   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl Tony Luck
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

resctrl is always built-in; INTEL_PMT_TELEMETRY may be a module.  Add, and
export, register/unregister functions so the PMT module can supply/clear
enumeration callback functions when loaded/unloaded.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/resctrl.h          | 22 ++++++++++++++++++
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 30 +++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 1e50c7dc3fe3..c23d9c56a337 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -4,6 +4,8 @@
 
 #ifdef CONFIG_X86_CPU_RESCTRL
 
+#include <linux/intel_pmt_features.h>
+#include <linux/intel_vsec.h>
 #include <linux/jump_label.h>
 #include <linux/percpu.h>
 #include <linux/resctrl_types.h>
@@ -193,11 +195,31 @@ static inline void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
 
 void resctrl_cpu_detect(struct cpuinfo_x86 *c);
 
+#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
+void intel_aet_register_enumeration(struct module *module,
+				    struct pmt_feature_group *(*get)(enum pmt_feature_id id),
+				    void (*put)(struct pmt_feature_group *p));
+void intel_aet_unregister_enumeration(void);
 #else
+static inline void intel_aet_register_enumeration(struct module *module,
+						  struct pmt_feature_group *(*get)(enum pmt_feature_id id),
+						  void (*put)(struct pmt_feature_group *p)) { }
+static inline void intel_aet_unregister_enumeration(void) { }
+#endif /* CONFIG_X86_CPU_RESCTRL_INTEL_AET */
+
+#else
+
+#include <linux/intel_pmt_features.h>
+#include <linux/intel_vsec.h>
 
 static inline void resctrl_arch_sched_in(struct task_struct *tsk) {}
 static inline void resctrl_cpu_detect(struct cpuinfo_x86 *c) {}
 
+static inline void intel_aet_register_enumeration(struct module *module,
+						  struct pmt_feature_group *(*get)(enum pmt_feature_id id),
+						  void (*put)(struct pmt_feature_group *p)) { }
+static inline void intel_aet_unregister_enumeration(void) { }
+
 #endif /* CONFIG_X86_CPU_RESCTRL */
 
 #endif /* _ASM_X86_RESCTRL_H */
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index e2af700bca04..db671725acbb 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -12,17 +12,21 @@
 #define pr_fmt(fmt)   "resctrl: " fmt
 
 #include <linux/bits.h>
+#include <linux/cleanup.h>
 #include <linux/compiler_types.h>
 #include <linux/container_of.h>
 #include <linux/cpumask.h>
 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/export.h>
 #include <linux/gfp_types.h>
 #include <linux/init.h>
 #include <linux/intel_pmt_features.h>
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
 #include <linux/minmax.h>
+#include <linux/mutex.h>
+#include <linux/module.h>
 #include <linux/printk.h>
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
@@ -289,6 +293,10 @@ static enum pmt_feature_id lookup_pfid(const char *pfname)
 	return FEATURE_INVALID;
 }
 
+static struct module *pmt_module;
+static struct pmt_feature_group *(*get_feature)(enum pmt_feature_id id);
+static void (*put_feature)(struct pmt_feature_group *p);
+
 /*
  * Request a copy of struct pmt_feature_group for each event group. If there is
  * one, the returned structure has an array of telemetry_region structures,
@@ -323,6 +331,28 @@ bool intel_aet_get_events(void)
 	return ret;
 }
 
+static DEFINE_MUTEX(aet_register_lock);
+
+void intel_aet_register_enumeration(struct module *module,
+				    struct pmt_feature_group *(*get)(enum pmt_feature_id id),
+				    void (*put)(struct pmt_feature_group *p))
+{
+	guard(mutex)(&aet_register_lock);
+	get_feature = get;
+	put_feature = put;
+	pmt_module = module;
+}
+EXPORT_SYMBOL_NS_GPL(intel_aet_register_enumeration, "INTEL_PMT");
+
+void intel_aet_unregister_enumeration(void)
+{
+	guard(mutex)(&aet_register_lock);
+	pmt_module = NULL;
+	get_feature = NULL;
+	put_feature = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(intel_aet_unregister_enumeration, "INTEL_PMT");
+
 void __exit intel_aet_exit(void)
 {
 	struct event_group **peg;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (8 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 09/14] x86/resctrl: Add PMT registration API for AET enumeration callbacks Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:22   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime Tony Luck
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

INTEL_PMT_TELEMETRY is a loadable module, but resctrl is built-in and cannot
call PMT functions directly.  Register the telemetry enumeration function
pointers at pmt_telemetry module init, and unregister them at module exit.

Note that checkpatch complains about the #include of <asm/resctrl.h>.
This is needed rather than <linux/resctrl.h> to get the function stub
definitions when CONFIG_X86_CPU_RESCTRL=n.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/platform/x86/intel/pmt/telemetry.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/intel/pmt/telemetry.c b/drivers/platform/x86/intel/pmt/telemetry.c
index bdc7c24a3678..33c1f8e7022f 100644
--- a/drivers/platform/x86/intel/pmt/telemetry.c
+++ b/drivers/platform/x86/intel/pmt/telemetry.c
@@ -25,6 +25,8 @@
 #include <linux/uaccess.h>
 #include <linux/xarray.h>
 
+#include <asm/resctrl.h>
+
 #include "class.h"
 
 #define TELEM_SIZE_OFFSET	0x0
@@ -425,12 +427,21 @@ static struct auxiliary_driver pmt_telem_aux_driver = {
 
 static int __init pmt_telem_init(void)
 {
-	return auxiliary_driver_register(&pmt_telem_aux_driver);
+	int ret;
+
+	ret = auxiliary_driver_register(&pmt_telem_aux_driver);
+	if (ret)
+		return ret;
+
+	intel_aet_register_enumeration(THIS_MODULE, intel_pmt_get_regions_by_feature,
+				       intel_pmt_put_feature_group);
+	return 0;
 }
 module_init(pmt_telem_init);
 
 static void __exit pmt_telem_exit(void)
 {
+	intel_aet_unregister_enumeration();
 	auxiliary_driver_unregister(&pmt_telem_aux_driver);
 	xa_destroy(&telem_array);
 }
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (9 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:25   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount Tony Luck
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

resctrl is always built-in, but INTEL_PMT_TELEMETRY and INTEL_TPMI are
logically independent and should be loadable modules.  Switch AET to use
the function-pointer registration API (introduced in the preceding patch)
instead of direct link-time references to PMT symbols.

Prepare for the file system to call resctrl_arch_pre_mount() on every mount
by moving AET enumeration into resctrl_arch_pre_mount() and cleanup into
resctrl_arch_unmount(). This allows the PMT module to be unloaded whenever
the filesystem is not mounted.

Remove intel_aet_exit because all cleanup now happens in the unmount path.

Note that the Linux file system code does not serialize calls to
fs_context_operations::get_tree(), so there may be arbitrarily many
parallel calls if users invoke mount(2) multiple times.

Add locking and state (resctrl_arch_mount_entries) to avoid repeated
enumeration on nested mount requests from file system code (which will be
failed with -EBUSY status).

event_group::num_rmid may be reset (reduced) during enumeration. This is
not worth resetting on unmount because the same reduction would occur on
each subsequent mount.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h                 |  3 ++
 arch/x86/kernel/cpu/resctrl/internal.h  |  8 ++--
 arch/x86/kernel/cpu/resctrl/core.c      | 41 +++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/intel_aet.c | 52 ++++++++++++++++++++++---
 drivers/resctrl/mpam_resctrl.c          |  4 ++
 5 files changed, 95 insertions(+), 13 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 3705f0214fa6..9534d42e0c57 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -557,6 +557,9 @@ void resctrl_offline_cpu(unsigned int cpu);
  */
 void resctrl_arch_pre_mount(void);
 
+/* Called to report unmount. */
+void resctrl_arch_unmount(void);
+
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
  *			      for this resource and domain.
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 3b09cfe9a046..017a19143ec9 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -234,15 +234,15 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 
 #ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
-bool intel_aet_get_events(void);
-void __exit intel_aet_exit(void);
+bool intel_aet_pre_mount(void);
+void intel_aet_unmount(void);
 int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
 void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 				struct list_head *add_pos);
 bool intel_handle_aet_option(bool force_off, char *tok);
 #else
-static inline bool intel_aet_get_events(void) { return false; }
-static inline void __exit intel_aet_exit(void) { }
+static inline bool intel_aet_pre_mount(void) { return false; }
+static inline void intel_aet_unmount(void) { }
 static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
 {
 	return -EINVAL;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 934492c7e643..9336299b9647 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -16,10 +16,12 @@
 
 #define pr_fmt(fmt)	"resctrl: " fmt
 
+#include <linux/cleanup.h>
 #include <linux/cpu.h>
 #include <linux/slab.h>
 #include <linux/err.h>
 #include <linux/cpuhotplug.h>
+#include <linux/mutex.h>
 
 #include <asm/cpu_device_id.h>
 #include <asm/msr.h>
@@ -776,12 +778,20 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
+static DEFINE_MUTEX(resctrl_arch_mount_lock);
+static int resctrl_arch_mount_entries;
+
 void resctrl_arch_pre_mount(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
 	int cpu;
 
-	if (!intel_aet_get_events())
+	guard(mutex)(&resctrl_arch_mount_lock);
+
+	if (++resctrl_arch_mount_entries > 1)
+		return;
+
+	if (!intel_aet_pre_mount())
 		return;
 
 	/*
@@ -798,6 +808,33 @@ void resctrl_arch_pre_mount(void)
 	cpus_read_unlock();
 }
 
+void resctrl_arch_unmount(void)
+{
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
+	int cpu;
+
+	guard(mutex)(&resctrl_arch_mount_lock);
+
+	if (--resctrl_arch_mount_entries > 0)
+		return;
+
+	WARN_ON(resctrl_arch_mount_entries < 0);
+
+	intel_aet_unmount();
+
+	if (!r->mon_capable)
+		return;
+
+	cpus_read_lock();
+	mutex_lock(&domain_list_lock);
+	for_each_online_cpu(cpu)
+		domain_remove_cpu_mon(cpu, r);
+	r->mon_capable = false;
+	rdt_mon_feature_count--;
+	mutex_unlock(&domain_list_lock);
+	cpus_read_unlock();
+}
+
 enum {
 	RDT_FLAG_CMT,
 	RDT_FLAG_MBM_TOTAL,
@@ -1174,8 +1211,6 @@ late_initcall(resctrl_arch_late_init);
 
 static void __exit resctrl_arch_exit(void)
 {
-	intel_aet_exit();
-
 	cpuhp_remove_state(rdt_online);
 
 	resctrl_exit();
diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
index db671725acbb..74c34593876b 100644
--- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
+++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
@@ -25,8 +25,8 @@
 #include <linux/intel_vsec.h>
 #include <linux/io.h>
 #include <linux/minmax.h>
-#include <linux/mutex.h>
 #include <linux/module.h>
+#include <linux/mutex.h>
 #include <linux/printk.h>
 #include <linux/rculist.h>
 #include <linux/rcupdate.h>
@@ -308,7 +308,7 @@ static void (*put_feature)(struct pmt_feature_group *p);
  * struct pmt_feature_group to indicate that its events are successfully
  * enabled.
  */
-bool intel_aet_get_events(void)
+static bool aet_get_events(void)
 {
 	struct pmt_feature_group *p;
 	enum pmt_feature_id pfid;
@@ -317,14 +317,14 @@ bool intel_aet_get_events(void)
 
 	for_each_event_group(peg) {
 		pfid = lookup_pfid((*peg)->pfname);
-		p = intel_pmt_get_regions_by_feature(pfid);
+		p = get_feature(pfid);
 		if (IS_ERR_OR_NULL(p))
 			continue;
 		if (enable_events(*peg, p)) {
 			(*peg)->pfg = p;
 			ret = true;
 		} else {
-			intel_pmt_put_feature_group(p);
+			put_feature(p);
 		}
 	}
 
@@ -353,16 +353,56 @@ void intel_aet_unregister_enumeration(void)
 }
 EXPORT_SYMBOL_NS_GPL(intel_aet_unregister_enumeration, "INTEL_PMT");
 
-void __exit intel_aet_exit(void)
+/*
+ * The pmt_telemetry module may not be loaded when the resctrl file system
+ * is mounted. Keep track of whether a hold was placed on the module to know
+ * whether to release it during unmount.
+ */
+static bool have_pmt_hold;
+
+bool intel_aet_pre_mount(void)
+{
+	bool ret;
+
+	guard(mutex)(&aet_register_lock);
+	if (!get_feature || !put_feature)
+		return false;
+
+	if (pmt_module) {
+		if (!try_module_get(pmt_module))
+			return false;
+		have_pmt_hold = true;
+	}
+
+	ret = aet_get_events();
+
+	if (!ret && pmt_module && have_pmt_hold) {
+		module_put(pmt_module);
+		have_pmt_hold = false;
+	}
+
+	return ret;
+}
+
+void intel_aet_unmount(void)
 {
 	struct event_group **peg;
 
+	guard(mutex)(&aet_register_lock);
 	for_each_event_group(peg) {
 		if ((*peg)->pfg) {
-			intel_pmt_put_feature_group((*peg)->pfg);
+			struct event_group *e = *peg;
+
+			for (int j = 0; j < e->num_events; j++)
+				resctrl_disable_mon_event(e->evts[j].id);
+			put_feature((*peg)->pfg);
 			(*peg)->pfg = NULL;
 		}
 	}
+	if (have_pmt_hold && pmt_module) {
+		module_put(pmt_module);
+		have_pmt_hold = false;
+	}
 }
 
 #define DATA_VALID	BIT_ULL(63)
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index 7079870ca894..5859cc0f5e37 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -162,6 +162,10 @@ void resctrl_arch_pre_mount(void)
 {
 }
 
+void resctrl_arch_unmount(void)
+{
+}
+
 bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
 {
 	return mpam_resctrl_controls[rid].cdp_enabled;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (10 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:26   ` Reinette Chatre
  2026-06-01 19:56 ` [PATCH v7 13/14] x86/resctrl: Simplify Kconfig options for resctrl Tony Luck
  2026-06-01 19:56 ` [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat Tony Luck
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

The intel_pmt module should only be pinned while the resctrl file system
is mounted.

Add hooks for every mount/unmount of the resctrl file system so that
architecture code can increment the intel_pmt module reference count on
mount and decrement on unmount.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 include/linux/resctrl.h | 7 +++++--
 fs/resctrl/rdtgroup.c   | 9 +++++++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 9534d42e0c57..e8a87c3664db 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -552,12 +552,15 @@ void resctrl_online_cpu(unsigned int cpu);
 void resctrl_offline_cpu(unsigned int cpu);
 
 /*
- * Architecture hook called at beginning of first file system mount attempt.
+ * Architecture hook called before attempting to mount the file system.
  * No locks are held.
  */
 void resctrl_arch_pre_mount(void);
 
-/* Called to report unmount. */
+/*
+ * Architecture hook called when mount fails, or on unmount.
+ * No locks are held.
+ */
 void resctrl_arch_unmount(void);
 
 /**
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 6d647b71c5db..195ebb15c5f0 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -18,7 +18,6 @@
 #include <linux/fs_parser.h>
 #include <linux/sysfs.h>
 #include <linux/kernfs.h>
-#include <linux/once.h>
 #include <linux/resctrl.h>
 #include <linux/seq_buf.h>
 #include <linux/seq_file.h>
@@ -2998,6 +2997,8 @@ static void resctrl_unmount(void)
 	resctrl_mounted = false;
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
+
+	resctrl_arch_unmount();
 }
 
 static int rdt_get_tree(struct fs_context *fc)
@@ -3009,7 +3010,7 @@ static int rdt_get_tree(struct fs_context *fc)
 	struct rdt_resource *r;
 	int ret;
 
-	DO_ONCE_SLEEPABLE(resctrl_arch_pre_mount);
+	resctrl_arch_pre_mount();
 
 	cpus_read_lock();
 	mutex_lock(&rdtgroup_mutex);
@@ -3147,6 +3148,10 @@ static int rdt_get_tree(struct fs_context *fc)
 out:
 	mutex_unlock(&rdtgroup_mutex);
 	cpus_read_unlock();
+
+	if (ret)
+		resctrl_arch_unmount();
+
 	return ret;
 }
 
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 13/14] x86/resctrl: Simplify Kconfig options for resctrl
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (11 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-01 19:56 ` [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat Tony Luck
  13 siblings, 0 replies; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

Linus Torvalds complained[1] about Kconfig complexity making it too
hard for "random people to build their own kernels".

CONFIG_X86_CPU_RESCTRL_INTEL_AET has been causing problems since
it was first added as it unnaturally required other config options
to be set to "built-in".

AET now resolves PMT symbols at runtime via the registration API,
so INTEL_PMT_TELEMETRY no longer needs to be built-in. This means
that AET can be included unconditionally as part of X86_CPU_RESCTRL.

CONFIG_X86_CPU_RESCTRL now depends on X86_64 because reading telemetry
data needs readq().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/all/CAHk-=whigg3hvOy7c1j1MXFy6o6CHp0g4Tc3Y-MAk+XDssHU0A@mail.gmail.com # 1
---
 arch/x86/include/asm/resctrl.h         |  7 -------
 arch/x86/kernel/cpu/resctrl/internal.h | 13 -------------
 arch/x86/Kconfig                       | 15 +--------------
 arch/x86/kernel/cpu/resctrl/Makefile   |  2 +-
 4 files changed, 2 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index c23d9c56a337..5a61d0b0955d 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -195,17 +195,10 @@ static inline void resctrl_arch_mon_ctx_free(struct rdt_resource *r,
 
 void resctrl_cpu_detect(struct cpuinfo_x86 *c);
 
-#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
 void intel_aet_register_enumeration(struct module *module,
 				    struct pmt_feature_group *(*get)(enum pmt_feature_id id),
 				    void (*put)(struct pmt_feature_group *p));
 void intel_aet_unregister_enumeration(void);
-#else
-static inline void intel_aet_register_enumeration(struct module *module,
-						  struct pmt_feature_group *(*get)(enum pmt_feature_id id),
-						  void (*put)(struct pmt_feature_group *p)) { }
-static inline void intel_aet_unregister_enumeration(void) { }
-#endif /* CONFIG_X86_CPU_RESCTRL_INTEL_AET */
 
 #else
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 017a19143ec9..4a89f00aaa01 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -233,24 +233,11 @@ void __init intel_rdt_mbm_apply_quirk(void);
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
 void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
 
-#ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
 bool intel_aet_pre_mount(void);
 void intel_aet_unmount(void);
 int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
 void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
 				struct list_head *add_pos);
 bool intel_handle_aet_option(bool force_off, char *tok);
-#else
-static inline bool intel_aet_pre_mount(void) { return false; }
-static inline void intel_aet_unmount(void) { }
-static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
-{
-	return -EINVAL;
-}
-
-static inline void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
-					      struct list_head *add_pos) { }
-static inline bool intel_handle_aet_option(bool force_off, char *tok) { return false; }
-#endif
 
 #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f3f7cb01d69d..77fd5529542a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -515,7 +515,7 @@ config X86_MPPARSE
 
 config X86_CPU_RESCTRL
 	bool "x86 CPU resource control support"
-	depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
+	depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD)
 	depends on MISC_FILESYSTEMS
 	select ARCH_HAS_CPU_RESCTRL
 	select RESCTRL_FS
@@ -536,19 +536,6 @@ config X86_CPU_RESCTRL
 
 	  Say N if unsure.
 
-config X86_CPU_RESCTRL_INTEL_AET
-	bool "Intel Application Energy Telemetry"
-	depends on X86_64 && X86_CPU_RESCTRL && CPU_SUP_INTEL && INTEL_PMT_TELEMETRY=y && INTEL_TPMI=y
-	help
-	  Enable per-RMID telemetry events in resctrl.
-
-	  Intel feature that collects per-RMID execution data
-	  about energy consumption, measure of frequency independent
-	  activity and other performance metrics. Data is aggregated
-	  per package.
-
-	  Say N if unsure.
-
 config X86_FRED
 	bool "Flexible Return and Event Delivery"
 	depends on X86_64
diff --git a/arch/x86/kernel/cpu/resctrl/Makefile b/arch/x86/kernel/cpu/resctrl/Makefile
index 273ddfa30836..97ceb4e44dfa 100644
--- a/arch/x86/kernel/cpu/resctrl/Makefile
+++ b/arch/x86/kernel/cpu/resctrl/Makefile
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= core.o rdtgroup.o monitor.o
 obj-$(CONFIG_X86_CPU_RESCTRL)		+= ctrlmondata.o
-obj-$(CONFIG_X86_CPU_RESCTRL_INTEL_AET)	+= intel_aet.o
+obj-$(CONFIG_X86_CPU_RESCTRL)		+= intel_aet.o
 obj-$(CONFIG_RESCTRL_FS_PSEUDO_LOCK)	+= pseudo_lock.o
 
 # To allow define_trace.h's recursive include:
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat
  2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
                   ` (12 preceding siblings ...)
  2026-06-01 19:56 ` [PATCH v7 13/14] x86/resctrl: Simplify Kconfig options for resctrl Tony Luck
@ 2026-06-01 19:56 ` Tony Luck
  2026-06-08 23:26   ` Reinette Chatre
  13 siblings, 1 reply; 44+ messages in thread
From: Tony Luck @ 2026-06-01 19:56 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches, Tony Luck

Add a footnote to the 'If telemetry monitoring is enabled' sentence noting
that because PMT driver enumerates telemetry features asynchronously, an
automatic mount of resctrl from /etc/fstab at boot may occur before those
features are available, resulting in them not being enabled.

Assisted-by: Claude:Sonnet_4.6
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/filesystems/resctrl.rst | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
index b003bed339fd..4e62b8c5e0d6 100644
--- a/Documentation/filesystems/resctrl.rst
+++ b/Documentation/filesystems/resctrl.rst
@@ -620,7 +620,7 @@ When monitoring is enabled all MON groups will also contain:
 	each instance of an L3 cache. Each directory contains files for the enabled
 	L3 events (e.g. "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes").
 
-	If telemetry monitoring is enabled, there will be a "mon_PERF_PKG_YY"
+	If telemetry monitoring is enabled [#]_, there will be a "mon_PERF_PKG_YY"
 	directory for each physical processor package. Each directory contains
 	files for the enabled telemetry events (e.g. "core_energy". "activity",
 	"uops_retired", etc.)
@@ -659,6 +659,11 @@ When monitoring is enabled all MON groups will also contain:
 	returned if the MBM event does not have an assigned counter in the
 	CTRL_MON group nor in any of its associated MON groups.
 
+.. [#] Telemetry features are enumerated asynchronously by the PMT driver. If
+   resctrl is automatically mounted from ``/etc/fstab`` at boot, the telemetry
+   features may not yet be available at mount time and will therefore not be
+   enabled.
+
 "mon_hw_id":
 	Available only with debug option. The identifier used by hardware
 	for the monitor group. On x86 this is the RMID.
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
  2026-06-01 19:56 ` [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage Tony Luck
@ 2026-06-08 23:16   ` Reinette Chatre
  2026-06-09 16:51     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:16 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> Drop the force_off assignment from all_regions_have_sufficient_rmid().

Apart from this being obvious from the patch, this is not how an x86 changelog
should be structured. Please see "Changelog" in
Documentation/process/maintainer-tip.rst

Please note that, after the preparatory fixes that are expected to land separately,
this will be the first patch of this series ... having this first patch just
kick off with "what changes" without any context is a difficult way for a
reviewer to start considering this piece of work.

> This preserves current single-enumeration behaviour while preparing for
> the upcoming per-mount enumeration, where latching force_off would
> incorrectly suppress re-enumeration on subsequent mounts - even when the
> user explicitly requested the feature via "rdt={feature}".

I believe that user space should still expect that rdt= options behave
consistently. So if user space for some reason provides: rdt=!mbm,mbm,!energy,energy
on a system that supports both then it should not be the case that one is enabled and the
other not.

I this think that, for example, enable_events() should start with:

	if (e->force_off && !e->force_on)
		return false;

> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index 89b8b619d5d5..e2af700bca04 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -60,8 +60,8 @@ struct pmt_event {
>   *			data for all telemetry regions of type @pfname.
>   *			Valid if the system supports the event group,
>   *			NULL otherwise.
> - * @force_off:		True when "rdt" command line or architecture code disables
> - *			this event group due to insufficient RMIDs.
> + * @force_off:		True when "rdt" command line disables this event group
> + *			to avoid system limitations due to insufficient RMIDs.

From what I understand it is only architecture that can disable an event group
"to avoid system limitations due to insufficient RMIDs" and this capability is removed
in this patch. Above implies that this capability now needs to be implemented
by userspace, which I do not think is accurate?

>   * @force_on:		True when "rdt" command line overrides disable of this
>   *			event group.
>   * @guid:		Unique number per XML description file.
> @@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
>  		if (!p->regions[i].addr)
>  			continue;
>  		tr = &p->regions[i];
> -		if (tr->num_rmids < e->num_rmid) {
> -			e->force_off = true;
> +		if (tr->num_rmids < e->num_rmid)
>  			return false;
> -		}
>  	}
>  
>  	return true;

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
  2026-06-01 19:56 ` [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event Tony Luck
@ 2026-06-08 23:18   ` Reinette Chatre
  2026-06-09 17:21     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:18 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> In preparation for re-running AET enumeration on every mount, AET code must be
> able to disable events on unmount so the next mount starts from a clean slate.
> 
> Add a file system interface for architecture to clear the enabled flag for
> a given event.

This just verbatim describes what the patch does. It would be helpful to describe
how the flag is used by resctrl to support the first paragraph's implicit claim
that the enabled flag's value is not relevant when resctrl is unmounted.

> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 9fd901c78dc6..327e7a863614 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1012,6 +1012,18 @@ bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
>  	return true;
>  }
>  
> +void resctrl_disable_mon_event(enum resctrl_event_id eventid)
> +{
> +	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
> +		return;
> +	if (!mon_event_all[eventid].enabled) {
> +		pr_warn("Repeat disable for event %d\n", eventid);
> +		return;
> +	}
> +
> +	mon_event_all[eventid].enabled = false;
> +}

It seems reasonable for architecture to expect that "disable" of event undoes all
settings from the earlier "enable" of event. The new function only sets the "enabled"
flag to false though. This is potentially confusing since it leaves the event with
some lingering state, some of which is a pointer to state that an architecture may
be reasonable to remove after disabling this event leaving a dangling pointer. 

Reinette


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features
  2026-06-01 19:56 ` [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features Tony Luck
@ 2026-06-08 23:18   ` Reinette Chatre
  2026-06-09 18:46     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:18 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> AET (Application Energy Telemetry) may be enabled/disabled from one mount
> to the next depending on whether the pmt_telemetry module is loaded. If
> AET is the only monitoring feature supported on a system and it is enabled
> in one mount, but disabled in a subsequent mount this will result in empty
> mon_data directories.
> 
> Change from a boolean to a count of enabled monitor features inside
> architecture code. File system code only needs to know if any monitor

First sentence seems to be missing what is being changed.

> features are enabled so resctrl_arch_mon_capable() can still return
> boolean.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/resctrl.h         |  4 ++--
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 +-
>  arch/x86/kernel/cpu/resctrl/core.c     | 24 +++++++++++++-----------
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 11 +++--------
>  4 files changed, 19 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
> index 575f8408a9e7..1e50c7dc3fe3 100644
> --- a/arch/x86/include/asm/resctrl.h
> +++ b/arch/x86/include/asm/resctrl.h
> @@ -43,7 +43,7 @@ struct resctrl_pqr_state {
>  DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
>  
>  extern bool rdt_alloc_capable;
> -extern bool rdt_mon_capable;
> +extern int rdt_mon_feature_count;
>  
>  DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
>  DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
> @@ -68,7 +68,7 @@ static inline void resctrl_arch_disable_alloc(void)
>  
>  static inline bool resctrl_arch_mon_capable(void)
>  {
> -	return rdt_mon_capable;
> +	return !!rdt_mon_feature_count;
>  }
>  

This seem unnecessarily complicated to me. Can global "rdt_mon_capable" instead be dropped
and let resctrl_arch_mon_capable() just return "true" if any of the resources are
"mon_capable"?

Reinette


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount
  2026-06-01 19:56 ` [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount Tony Luck
@ 2026-06-08 23:21   ` Reinette Chatre
  2026-06-09 21:58     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:21 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> Application Energy Telemetry (AET) event enumeration takes place
> asynchronously. Linux builds the pmt_telemetry module into the kernel to
> kick of enumeration early enough that it completes before first mount of
> the resctrl file system.
> 
> Allowing pmt_telemetry to be a loadable module means that it is possible
> for different numbers of RMIDs to be supported on each mount, depending
> on whether pmt_telemetry module is loaded.
> 
> Add resctrl_arch_system_max_rmid_idx() interface to provide the maximum
> supported number of RMIDs on a system. For x86 this is RDT_RESOURCE_L3
> rdt_resource::mon.num_rmid (if L3 monitoring is enabled).
> 
> Allocate the rmid_ptrs, rdt_l3_mon_domain::rmid_busy_llc, and
> rdt_l3_mon_domain::mbm_states based on the maximum possible.

This is a significant change to data structures expected to be
indexed by RMID. Could you please update kernel-doc of these members to reflect
the new usage?

> 
> Initialize rmid_free_lru based on the number of RMIDs available for
> this mount.
> 

Above three paragraps writes verbatim what can be seen in the patch self.
Could you please expand the changelog to help reader understand why these
changes are made?

This patch is quite difficult to decipher when one only has the code to consider.

> Note that some RMIDs may still be marked busy from a previous mount.
> Don't add these to the free list. Check current RMID limit in
> limbo_release_entry() and do not add out of range RMIDs to the
> free list.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---

...

> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 1af8f965fdd0..934492c7e643 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -129,6 +129,18 @@ u32 resctrl_arch_system_num_rmid_idx(void)
>  	return num_rmids == U32_MAX ? 0 : num_rmids;
>  }
>  
> +/**
> + * resctrl_arch_system_max_rmid_idx - Largest possible number of RMIDs
> + *
> + * Return: If L3 monitoring is supported, largest possible comes from L3.
> + */
> +u32 resctrl_arch_system_max_rmid_idx(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +
> +	return r->mon_capable ? r->mon.num_rmid : resctrl_arch_system_num_rmid_idx();
> +}
> +
>  struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
>  {
>  	if (l >= RDT_NUM_RESOURCES)
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 226ff6f532fa..7079870ca894 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -272,6 +272,11 @@ u32 resctrl_arch_system_num_rmid_idx(void)
>  	return (mpam_pmg_max + 1) * (mpam_partid_max + 1);
>  }
>  
> +u32 resctrl_arch_system_max_rmid_idx(void)
> +{
> +	return resctrl_arch_system_num_rmid_idx();
> +}
> +
>  u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
>  {
>  	return closid * (mpam_pmg_max + 1) + rmid;
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 327e7a863614..b374e2f84a75 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -115,10 +115,17 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
>  
>  static void limbo_release_entry(struct rmid_entry *entry)
>  {
> +	u32 idx_limit = resctrl_arch_system_num_rmid_idx();

Having the code switch idx_limit to sometimes be resctrl_arch_system_num_rmid_idx()
and other times resctrl_arch_system_max_rmid_idx() makes this change difficult to
follow. I think it will help to use different variable names to differentiate 
the context in which it is being used and not leave reader trying to understand
why there are *two* limits.

>  	lockdep_assert_held(&rdtgroup_mutex);
>  
>  	rmid_limbo_count--;
> -	list_add_tail(&entry->list, &rmid_free_lru);
> +
> +	/*
> +	 * Limbo may be freeing an RMID from a previous mount where there
> +	 * were more RMIDs available.
> +	 */
> +	if (resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid) < idx_limit)
> +		list_add_tail(&entry->list, &rmid_free_lru);
>  
>  	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>  		closid_num_dirty_rmid[entry->closid]--;
> @@ -133,7 +140,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
>  void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
>  {
>  	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
>  	struct rmid_entry *entry;
>  	u32 idx, cur_idx = 1;
>  	void *arch_mon_ctx;
> @@ -192,7 +199,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
>  
>  bool has_busy_rmid(struct rdt_l3_mon_domain *d)
>  {
> -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
>  
>  	return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
>  }
> @@ -916,24 +923,32 @@ int setup_rmid_lru_list(void)
>  		return 0;
>  
>  	/*
> -	 * Called on every mount, but the number of RMIDs cannot change
> -	 * after the first mount, so keep using the same set of rmid_ptrs[]
> -	 * until resctrl_exit(). Note that the limbo handler continues to
> -	 * access rmid_ptrs[] after resctrl is unmounted.
> +	 * Allocate the largest number of RMIDs that this system will ever
> +	 * need. These cannot be freed until resctrl_exit() because the limbo
> +	 * handler  continues to access rmid_ptrs[] after resctrl is unmounted.
>  	 */
> -	if (rmid_ptrs)
> -		return 0;
> +	if (!rmid_ptrs) {
> +		idx_limit = resctrl_arch_system_max_rmid_idx();
> +		rmid_ptrs = kzalloc_objs(struct rmid_entry, idx_limit);
> +		if (!rmid_ptrs)
> +			return -ENOMEM;
> +
> +		for (i = 0; i < idx_limit; i++) {
> +			entry = &rmid_ptrs[i];
> +			INIT_LIST_HEAD(&entry->list);
> +
> +			resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
> +		}
> +	}
>  
> +	/* Find how many RMIDs are needed for this mount */
>  	idx_limit = resctrl_arch_system_num_rmid_idx();
> -	rmid_ptrs = kzalloc_objs(struct rmid_entry, idx_limit);
> -	if (!rmid_ptrs)
> -		return -ENOMEM;
>  
> +	INIT_LIST_HEAD(&rmid_free_lru);
>  	for (i = 0; i < idx_limit; i++) {
>  		entry = &rmid_ptrs[i];
> -		INIT_LIST_HEAD(&entry->list);
> -
> -		resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
> +		if (i && entry->busy)
> +			continue;
>  		list_add_tail(&entry->list, &rmid_free_lru);
>  	}
>  
> @@ -1156,7 +1171,7 @@ static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_l3_mon_domain *
>   */
>  static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
>  {
> -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
>  	enum resctrl_event_id evt;
>  	int idx;
>  
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index d2a1f88d8782..6d647b71c5db 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -4416,7 +4416,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
>   */
>  static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
>  {
> -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
>  	size_t tsize = sizeof(*d->mbm_states[0]);
>  	enum resctrl_event_id eventid;
>  	int idx;

Reinette


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 09/14] x86/resctrl: Add PMT registration API for AET enumeration callbacks
  2026-06-01 19:56 ` [PATCH v7 09/14] x86/resctrl: Add PMT registration API for AET enumeration callbacks Tony Luck
@ 2026-06-08 23:21   ` Reinette Chatre
  0 siblings, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:21 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> @@ -289,6 +293,10 @@ static enum pmt_feature_id lookup_pfid(const char *pfname)
>  	return FEATURE_INVALID;
>  }
>  
> +static struct module *pmt_module;
> +static struct pmt_feature_group *(*get_feature)(enum pmt_feature_id id);
> +static void (*put_feature)(struct pmt_feature_group *p);
> +
>  /*
>   * Request a copy of struct pmt_feature_group for each event group. If there is
>   * one, the returned structure has an array of telemetry_region structures,
> @@ -323,6 +331,28 @@ bool intel_aet_get_events(void)
>  	return ret;
>  }
>  
> +static DEFINE_MUTEX(aet_register_lock);

Could you please add documentation describing what this mutex protects?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl
  2026-06-01 19:56 ` [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl Tony Luck
@ 2026-06-08 23:22   ` Reinette Chatre
  2026-06-09 22:11     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:22 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> INTEL_PMT_TELEMETRY is a loadable module, but resctrl is built-in and cannot
> call PMT functions directly.  Register the telemetry enumeration function
> pointers at pmt_telemetry module init, and unregister them at module exit.

To ensure intel_pmt_get_regions_by_feature() has access to complete data, could
it be more accurate to register at the end of PMT's .probe() and similarly
unregister during .remove()?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-01 19:56 ` [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime Tony Luck
@ 2026-06-08 23:25   ` Reinette Chatre
  2026-06-10  0:08     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:25 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

Regarding a subject prefix ... We do not yet have a precedent for prefix
when MPAM, x86 and fs is changed so this series may indeed have the
opportunity to set that precedent. When doing so, please be consistent.

On 6/1/26 12:56 PM, Tony Luck wrote:
> resctrl is always built-in, but INTEL_PMT_TELEMETRY and INTEL_TPMI are
> logically independent and should be loadable modules.  Switch AET to use
> the function-pointer registration API (introduced in the preceding patch)

Please drop "(introduced in the preceding patch)" since patches may not
always keep this series's order once merged.

> instead of direct link-time references to PMT symbols.
> 
> Prepare for the file system to call resctrl_arch_pre_mount() on every mount
> by moving AET enumeration into resctrl_arch_pre_mount() and cleanup into
> resctrl_arch_unmount(). This allows the PMT module to be unloaded whenever
> the filesystem is not mounted.
> 
> Remove intel_aet_exit because all cleanup now happens in the unmount path.
> 
> Note that the Linux file system code does not serialize calls to
> fs_context_operations::get_tree(), so there may be arbitrarily many
> parallel calls if users invoke mount(2) multiple times.
> 
> Add locking and state (resctrl_arch_mount_entries) to avoid repeated
> enumeration on nested mount requests from file system code (which will be
> failed with -EBUSY status).
> 
> event_group::num_rmid may be reset (reduced) during enumeration. This is
> not worth resetting on unmount because the same reduction would occur on
> each subsequent mount.

Even more important to ensure that PMT is done with its enumeration when
AET enumeration starts?

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  include/linux/resctrl.h                 |  3 ++
>  arch/x86/kernel/cpu/resctrl/internal.h  |  8 ++--
>  arch/x86/kernel/cpu/resctrl/core.c      | 41 +++++++++++++++++--
>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 52 ++++++++++++++++++++++---
>  drivers/resctrl/mpam_resctrl.c          |  4 ++
>  5 files changed, 95 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 3705f0214fa6..9534d42e0c57 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -557,6 +557,9 @@ void resctrl_offline_cpu(unsigned int cpu);
>   */
>  void resctrl_arch_pre_mount(void);
>  
> +/* Called to report unmount. */
> +void resctrl_arch_unmount(void);
> +
>  /**
>   * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
>   *			      for this resource and domain.
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 3b09cfe9a046..017a19143ec9 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -234,15 +234,15 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
>  void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
>  
>  #ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
> -bool intel_aet_get_events(void);
> -void __exit intel_aet_exit(void);
> +bool intel_aet_pre_mount(void);
> +void intel_aet_unmount(void);
>  int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
>  void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
>  				struct list_head *add_pos);
>  bool intel_handle_aet_option(bool force_off, char *tok);
>  #else
> -static inline bool intel_aet_get_events(void) { return false; }
> -static inline void __exit intel_aet_exit(void) { }
> +static inline bool intel_aet_pre_mount(void) { return false; }
> +static inline void intel_aet_unmount(void) { }
>  static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
>  {
>  	return -EINVAL;
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 934492c7e643..9336299b9647 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -16,10 +16,12 @@
>  
>  #define pr_fmt(fmt)	"resctrl: " fmt
>  
> +#include <linux/cleanup.h>
>  #include <linux/cpu.h>
>  #include <linux/slab.h>
>  #include <linux/err.h>
>  #include <linux/cpuhotplug.h>
> +#include <linux/mutex.h>
>  
>  #include <asm/cpu_device_id.h>
>  #include <asm/msr.h>
> @@ -776,12 +778,20 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
>  	return 0;
>  }
>  
> +static DEFINE_MUTEX(resctrl_arch_mount_lock);

Please document what this mutex protects.

> +static int resctrl_arch_mount_entries;
> +
>  void resctrl_arch_pre_mount(void)
>  {
>  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
>  	int cpu;
>  
> -	if (!intel_aet_get_events())
> +	guard(mutex)(&resctrl_arch_mount_lock);
> +
> +	if (++resctrl_arch_mount_entries > 1)
> +		return;
> +
> +	if (!intel_aet_pre_mount())
>  		return;
>  
>  	/*
> @@ -798,6 +808,33 @@ void resctrl_arch_pre_mount(void)
>  	cpus_read_unlock();
>  }
>  
> +void resctrl_arch_unmount(void)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> +	int cpu;
> +
> +	guard(mutex)(&resctrl_arch_mount_lock);
> +
> +	if (--resctrl_arch_mount_entries > 0)
> +		return;
> +
> +	WARN_ON(resctrl_arch_mount_entries < 0);
> +
> +	intel_aet_unmount();
> +
> +	if (!r->mon_capable)
> +		return;
> +
> +	cpus_read_lock();
> +	mutex_lock(&domain_list_lock);
> +	for_each_online_cpu(cpu)
> +		domain_remove_cpu_mon(cpu, r);
> +	r->mon_capable = false;
> +	rdt_mon_feature_count--;
> +	mutex_unlock(&domain_list_lock);
> +	cpus_read_unlock();
> +}
> +
>  enum {
>  	RDT_FLAG_CMT,
>  	RDT_FLAG_MBM_TOTAL,
> @@ -1174,8 +1211,6 @@ late_initcall(resctrl_arch_late_init);
>  
>  static void __exit resctrl_arch_exit(void)
>  {
> -	intel_aet_exit();
> -
>  	cpuhp_remove_state(rdt_online);
>  
>  	resctrl_exit();
> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> index db671725acbb..74c34593876b 100644
> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> @@ -25,8 +25,8 @@
>  #include <linux/intel_vsec.h>
>  #include <linux/io.h>
>  #include <linux/minmax.h>
> -#include <linux/mutex.h>
>  #include <linux/module.h>
> +#include <linux/mutex.h>
>  #include <linux/printk.h>
>  #include <linux/rculist.h>
>  #include <linux/rcupdate.h>
> @@ -308,7 +308,7 @@ static void (*put_feature)(struct pmt_feature_group *p);
>   * struct pmt_feature_group to indicate that its events are successfully
>   * enabled.
>   */
> -bool intel_aet_get_events(void)
> +static bool aet_get_events(void)
>  {
>  	struct pmt_feature_group *p;
>  	enum pmt_feature_id pfid;
> @@ -317,14 +317,14 @@ bool intel_aet_get_events(void)
>  
>  	for_each_event_group(peg) {
>  		pfid = lookup_pfid((*peg)->pfname);
> -		p = intel_pmt_get_regions_by_feature(pfid);
> +		p = get_feature(pfid);
>  		if (IS_ERR_OR_NULL(p))
>  			continue;
>  		if (enable_events(*peg, p)) {
>  			(*peg)->pfg = p;
>  			ret = true;
>  		} else {
> -			intel_pmt_put_feature_group(p);
> +			put_feature(p);
>  		}
>  	}
>  
> @@ -353,16 +353,56 @@ void intel_aet_unregister_enumeration(void)
>  }
>  EXPORT_SYMBOL_NS_GPL(intel_aet_unregister_enumeration, "INTEL_PMT");
>  
> -void __exit intel_aet_exit(void)
> +/*
> + * The pmt_telemetry module may not be loaded when the resctrl file system

Is this accurate? What prevents pmt_telemetry module from being loaded when
resctrl file system is mounted?

> + * is mounted. Keep track of whether a hold was placed on the module to know
> + * whether to release it during unmount.
> + */
> +static bool have_pmt_hold;
> +
> +bool intel_aet_pre_mount(void)
> +{
> +	bool ret;
> +
> +	guard(mutex)(&aet_register_lock);
> +	if (!get_feature || !put_feature)
> +		return false;
> +
> +	if (pmt_module) {
> +		if (!try_module_get(pmt_module))
> +			return false;
> +		have_pmt_hold = true;
> +	}
> +
> +	ret = aet_get_events();
> +
> +	if (!ret && pmt_module && have_pmt_hold) {
> +		module_put(pmt_module);
> +		have_pmt_hold = false;
> +	}
> +
> +	return ret;
> +}
> +
> +void intel_aet_unmount(void)
>  {
>  	struct event_group **peg;
>  
> +	guard(mutex)(&aet_register_lock);

Could this get a short-circuit to make behavior on AMD obvious?

>  	for_each_event_group(peg) {
>  		if ((*peg)->pfg) {
> -			intel_pmt_put_feature_group((*peg)->pfg);
> +			struct event_group *e = *peg;
> +
> +			for (int j = 0; j < e->num_events; j++)
> +				resctrl_disable_mon_event(e->evts[j].id);
> +			put_feature((*peg)->pfg);
>  			(*peg)->pfg = NULL;
>  		}
>  	}
> +	if (have_pmt_hold && pmt_module) {
> +		module_put(pmt_module);
> +		have_pmt_hold = false;
> +	}
>  }
>  
>  #define DATA_VALID	BIT_ULL(63)
> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> index 7079870ca894..5859cc0f5e37 100644
> --- a/drivers/resctrl/mpam_resctrl.c
> +++ b/drivers/resctrl/mpam_resctrl.c
> @@ -162,6 +162,10 @@ void resctrl_arch_pre_mount(void)
>  {
>  }
>  
> +void resctrl_arch_unmount(void)
> +{
> +}
> +
>  bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
>  {
>  	return mpam_resctrl_controls[rid].cdp_enabled;

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount
  2026-06-01 19:56 ` [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount Tony Luck
@ 2026-06-08 23:26   ` Reinette Chatre
  2026-06-10 16:16     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:26 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> @@ -3147,6 +3148,10 @@ static int rdt_get_tree(struct fs_context *fc)
>  out:
>  	mutex_unlock(&rdtgroup_mutex);
>  	cpus_read_unlock();
> +
> +	if (ret)

Can ret be zero here?

> +		resctrl_arch_unmount();
> +
>  	return ret;
>  }
>  

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat
  2026-06-01 19:56 ` [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat Tony Luck
@ 2026-06-08 23:26   ` Reinette Chatre
  2026-06-10 16:19     ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-08 23:26 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu,
	David E Box, x86
  Cc: Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/1/26 12:56 PM, Tony Luck wrote:
> Add a footnote to the 'If telemetry monitoring is enabled' sentence noting
> that because PMT driver enumerates telemetry features asynchronously, an
> automatic mount of resctrl from /etc/fstab at boot may occur before those
> features are available, resulting in them not being enabled.
> 
> Assisted-by: Claude:Sonnet_4.6
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  Documentation/filesystems/resctrl.rst | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index b003bed339fd..4e62b8c5e0d6 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -620,7 +620,7 @@ When monitoring is enabled all MON groups will also contain:
>  	each instance of an L3 cache. Each directory contains files for the enabled
>  	L3 events (e.g. "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes").
>  
> -	If telemetry monitoring is enabled, there will be a "mon_PERF_PKG_YY"
> +	If telemetry monitoring is enabled [#]_, there will be a "mon_PERF_PKG_YY"
>  	directory for each physical processor package. Each directory contains
>  	files for the enabled telemetry events (e.g. "core_energy". "activity",
>  	"uops_retired", etc.)
> @@ -659,6 +659,11 @@ When monitoring is enabled all MON groups will also contain:
>  	returned if the MBM event does not have an assigned counter in the
>  	CTRL_MON group nor in any of its associated MON groups.
>  
> +.. [#] Telemetry features are enumerated asynchronously by the PMT driver. If
> +   resctrl is automatically mounted from ``/etc/fstab`` at boot, the telemetry
> +   features may not yet be available at mount time and will therefore not be
> +   enabled.

Would it be helpful to add a snippet that explains that in this scenario telemetry
features could be enabled on resctrl remount?

> +
>  "mon_hw_id":
>  	Available only with debug option. The identifier used by hardware
>  	for the monitor group. On x86 this is the RMID.

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
  2026-06-08 23:16   ` Reinette Chatre
@ 2026-06-09 16:51     ` Luck, Tony
  2026-06-09 23:02       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-09 16:51 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:16:58PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > Drop the force_off assignment from all_regions_have_sufficient_rmid().
> 
> Apart from this being obvious from the patch, this is not how an x86 changelog
> should be structured. Please see "Changelog" in
> Documentation/process/maintainer-tip.rst

Revised version:
---
Subject: x86/resctrl: Fix usage of event_group::force_off in AET

The kernel command line option "rdt=" is used to force enable or disable
individual resctrl features.

There are two problems with usage in the AET (Application Energy
Telemetry) feature:
1) If the user specifies both enabling and disabling of the same feature
   (e.g. "rdt=energy,!energy") the request to enable should override the
   request to disable (for consistency with other features).
2) event_group::force_off is set true in all_regions_have_sufficient_rmid()
   which will cause problems when AET features are enumerated on each mount
   of the resctrl file system

Fix the first issue by checking event_group::force_on in enable_events().

Fix the other issue by dropping the assignment to event_group::force_off.

---

> 
> Please note that, after the preparatory fixes that are expected to land separately,
> this will be the first patch of this series ... having this first patch just
> kick off with "what changes" without any context is a difficult way for a
> reviewer to start considering this piece of work.
> 
> > This preserves current single-enumeration behaviour while preparing for
> > the upcoming per-mount enumeration, where latching force_off would
> > incorrectly suppress re-enumeration on subsequent mounts - even when the
> > user explicitly requested the feature via "rdt={feature}".
> 
> I believe that user space should still expect that rdt= options behave
> consistently. So if user space for some reason provides: rdt=!mbm,mbm,!energy,energy
> on a system that supports both then it should not be the case that one is enabled and the
> other not.
> 
> I this think that, for example, enable_events() should start with:
> 
> 	if (e->force_off && !e->force_on)
> 		return false;

OK.

>  
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > index 89b8b619d5d5..e2af700bca04 100644
> > --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > @@ -60,8 +60,8 @@ struct pmt_event {
> >   *			data for all telemetry regions of type @pfname.
> >   *			Valid if the system supports the event group,
> >   *			NULL otherwise.
> > - * @force_off:		True when "rdt" command line or architecture code disables
> > - *			this event group due to insufficient RMIDs.
> > + * @force_off:		True when "rdt" command line disables this event group
> > + *			to avoid system limitations due to insufficient RMIDs.
> 
> >From what I understand it is only architecture that can disable an event group
> "to avoid system limitations due to insufficient RMIDs" and this capability is removed
> in this patch. Above implies that this capability now needs to be implemented
> by userspace, which I do not think is accurate?

Should I just drop the second line? The user may have various other reasons
why they want to disable this event group.

Combined with the code change you suggest above, this would describe the
use of these two fields. "force_off" disables, "force_on" enables (and
overrides any "force_off").

> 
> >   * @force_on:		True when "rdt" command line overrides disable of this
> >   *			event group.
> >   * @guid:		Unique number per XML description file.
> > @@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
> >  		if (!p->regions[i].addr)
> >  			continue;
> >  		tr = &p->regions[i];
> > -		if (tr->num_rmids < e->num_rmid) {
> > -			e->force_off = true;
> > +		if (tr->num_rmids < e->num_rmid)
> >  			return false;
> > -		}
> >  	}
> >  
> >  	return true;
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
  2026-06-08 23:18   ` Reinette Chatre
@ 2026-06-09 17:21     ` Luck, Tony
  2026-06-09 23:02       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-09 17:21 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:18:23PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > In preparation for re-running AET enumeration on every mount, AET code must be
> > able to disable events on unmount so the next mount starts from a clean slate.
> > 
> > Add a file system interface for architecture to clear the enabled flag for
> > a given event.
> 
> This just verbatim describes what the patch does. It would be helpful to describe
> how the flag is used by resctrl to support the first paragraph's implicit claim
> that the enabled flag's value is not relevant when resctrl is unmounted.

Revised commit:
---
Subject: fs/resctrl: Add interface to disable a monitor event

In preparation for re-running AET enumeration on every mount, AET code must be
able to disable events on unmount so the next mount starts from a clean slate.

Add a file system interface for architecture to reset architecture
controlled fields of the given event. mon_event::enabled is only used
during mount and at run time to check which events to include in file
system objects. It is not used during unmount, so it is safe to clear
it as part of the unmount flow.

Add kerneldoc comments to describe limitations on when events may be enabled
or disabled.
---

> 
> > diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> > index 9fd901c78dc6..327e7a863614 100644
> > --- a/fs/resctrl/monitor.c
> > +++ b/fs/resctrl/monitor.c
> > @@ -1012,6 +1012,18 @@ bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
> >  	return true;
> >  }
> >  
> > +void resctrl_disable_mon_event(enum resctrl_event_id eventid)
> > +{
> > +	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
> > +		return;
> > +	if (!mon_event_all[eventid].enabled) {
> > +		pr_warn("Repeat disable for event %d\n", eventid);
> > +		return;
> > +	}
> > +
> > +	mon_event_all[eventid].enabled = false;
> > +}
> 
> It seems reasonable for architecture to expect that "disable" of event undoes all
> settings from the earlier "enable" of event. The new function only sets the "enabled"
> flag to false though. This is potentially confusing since it leaves the event with
> some lingering state, some of which is a pointer to state that an architecture may
> be reasonable to remove after disabling this event leaving a dangling pointer. 

Ok. I'll add:

+	mon_event_all[eventid].any_cpu = false;
+	mon_event_all[eventid].binary_bits = 0;
+	mon_event_all[eventid].arch_priv = NULL;

> 
> Reinette

-Tony
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features
  2026-06-08 23:18   ` Reinette Chatre
@ 2026-06-09 18:46     ` Luck, Tony
  2026-06-09 23:03       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-09 18:46 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:18:54PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > AET (Application Energy Telemetry) may be enabled/disabled from one mount
> > to the next depending on whether the pmt_telemetry module is loaded. If
> > AET is the only monitoring feature supported on a system and it is enabled
> > in one mount, but disabled in a subsequent mount this will result in empty
> > mon_data directories.
> > 
> > Change from a boolean to a count of enabled monitor features inside
> > architecture code. File system code only needs to know if any monitor
> 
> First sentence seems to be missing what is being changed.
> 
> > features are enabled so resctrl_arch_mon_capable() can still return
> > boolean.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  arch/x86/include/asm/resctrl.h         |  4 ++--
> >  arch/x86/kernel/cpu/resctrl/internal.h |  2 +-
> >  arch/x86/kernel/cpu/resctrl/core.c     | 24 +++++++++++++-----------
> >  arch/x86/kernel/cpu/resctrl/monitor.c  | 11 +++--------
> >  4 files changed, 19 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
> > index 575f8408a9e7..1e50c7dc3fe3 100644
> > --- a/arch/x86/include/asm/resctrl.h
> > +++ b/arch/x86/include/asm/resctrl.h
> > @@ -43,7 +43,7 @@ struct resctrl_pqr_state {
> >  DECLARE_PER_CPU(struct resctrl_pqr_state, pqr_state);
> >  
> >  extern bool rdt_alloc_capable;
> > -extern bool rdt_mon_capable;
> > +extern int rdt_mon_feature_count;
> >  
> >  DECLARE_STATIC_KEY_FALSE(rdt_enable_key);
> >  DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
> > @@ -68,7 +68,7 @@ static inline void resctrl_arch_disable_alloc(void)
> >  
> >  static inline bool resctrl_arch_mon_capable(void)
> >  {
> > -	return rdt_mon_capable;
> > +	return !!rdt_mon_feature_count;
> >  }
> >  
> 
> This seem unnecessarily complicated to me. Can global "rdt_mon_capable" instead be dropped
> and let resctrl_arch_mon_capable() just return "true" if any of the resources are
> "mon_capable"?

Conceptually this is much simpler. But I have questions about the
implementation. The new function is trivial:

bool resctrl_arch_mon_capable(void)
{
        struct rdt_resource *r;

        for_each_mon_capable_rdt_resource(r)
                return true;

        return false;
}

But that led me to #include hell when I tried to keep it as an inline
function in <asm/resctrl.h> because for_each_mon_capable_rdt_resource()
is defined in <linux/resctrl.h> after the #include <asm/resctrl.h>

So I've moved it out-of-line into arch/x86/kernel/cpu/resctrl/core.c.

The MPAM implementation is also out-of-line.

But then I wondered about performance. This change on x86 goes from an
inline function that simply returns the value of a global variable to an
out-of-line function that scans the array of rdt resources. The common
case will be a hit on the first element, so not awful. But still worse
that before I touched it.

So I looked for places where resctrl_arch_mon_capable() is called in
"hot" code paths. There's a bunch in mount and mkdir, but those aren't
very hot.

My list (check to see if I missed any others):

1) Recurring call once per second in mbm_handle_overflow()

Seems redundant. There is a check to only start the overflow handler
on mon_capable systems (only with enabled MBM events!)

2) Call for potentially every task when reading tasks files in is_rmid_match()

Also seems redundant. Next part of that "if" looks at "r->type == RDTMON_GROUP"
which can only be true on mon_capable systems.

Should I clean these up in this series? As part of this patch which
exacerbates the performance impact, or as a separate cleanup patch?
> 
> Reinette
> 
-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount
  2026-06-08 23:21   ` Reinette Chatre
@ 2026-06-09 21:58     ` Luck, Tony
  2026-06-09 23:35       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-09 21:58 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:21:10PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
Changed the Subject: tag to "mpam,x86,fs/resctrl" to match what you used
for the RFC discussion. I think the implied rule is "architectures first
in alphabetical order, filesystem last".

> On 6/1/26 12:56 PM, Tony Luck wrote:
> > Application Energy Telemetry (AET) event enumeration takes place
> > asynchronously. Linux builds the pmt_telemetry module into the kernel to
> > kick of enumeration early enough that it completes before first mount of
> > the resctrl file system.
> > 
> > Allowing pmt_telemetry to be a loadable module means that it is possible
> > for different numbers of RMIDs to be supported on each mount, depending
> > on whether pmt_telemetry module is loaded.
> > 
> > Add resctrl_arch_system_max_rmid_idx() interface to provide the maximum
> > supported number of RMIDs on a system. For x86 this is RDT_RESOURCE_L3
> > rdt_resource::mon.num_rmid (if L3 monitoring is enabled).
> > 
> > Allocate the rmid_ptrs, rdt_l3_mon_domain::rmid_busy_llc, and
> > rdt_l3_mon_domain::mbm_states based on the maximum possible.
> 
> This is a significant change to data structures expected to be
> indexed by RMID. Could you please update kernel-doc of these members to reflect
> the new usage?

Will update kerneldoc.

> 
> > 
> > Initialize rmid_free_lru based on the number of RMIDs available for
> > this mount.
> > 
> 
> Above three paragraps writes verbatim what can be seen in the patch self.
> Could you please expand the changelog to help reader understand why these
> changes are made?

Will rewrite.

> This patch is quite difficult to decipher when one only has the code to consider.
> 
> > Note that some RMIDs may still be marked busy from a previous mount.
> > Don't add these to the free list. Check current RMID limit in
> > limbo_release_entry() and do not add out of range RMIDs to the
> > free list.
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 1af8f965fdd0..934492c7e643 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -129,6 +129,18 @@ u32 resctrl_arch_system_num_rmid_idx(void)
> >  	return num_rmids == U32_MAX ? 0 : num_rmids;
> >  }
> >  
> > +/**
> > + * resctrl_arch_system_max_rmid_idx - Largest possible number of RMIDs
> > + *
> > + * Return: If L3 monitoring is supported, largest possible comes from L3.
> > + */
> > +u32 resctrl_arch_system_max_rmid_idx(void)
> > +{
> > +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> > +
> > +	return r->mon_capable ? r->mon.num_rmid : resctrl_arch_system_num_rmid_idx();
> > +}
> > +
> >  struct rdt_resource *resctrl_arch_get_resource(enum resctrl_res_level l)
> >  {
> >  	if (l >= RDT_NUM_RESOURCES)
> > diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> > index 226ff6f532fa..7079870ca894 100644
> > --- a/drivers/resctrl/mpam_resctrl.c
> > +++ b/drivers/resctrl/mpam_resctrl.c
> > @@ -272,6 +272,11 @@ u32 resctrl_arch_system_num_rmid_idx(void)
> >  	return (mpam_pmg_max + 1) * (mpam_partid_max + 1);
> >  }
> >  
> > +u32 resctrl_arch_system_max_rmid_idx(void)
> > +{
> > +	return resctrl_arch_system_num_rmid_idx();
> > +}
> > +
> >  u32 resctrl_arch_rmid_idx_encode(u32 closid, u32 rmid)
> >  {
> >  	return closid * (mpam_pmg_max + 1) + rmid;
> > diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> > index 327e7a863614..b374e2f84a75 100644
> > --- a/fs/resctrl/monitor.c
> > +++ b/fs/resctrl/monitor.c
> > @@ -115,10 +115,17 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
> >  
> >  static void limbo_release_entry(struct rmid_entry *entry)
> >  {
> > +	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> 
> Having the code switch idx_limit to sometimes be resctrl_arch_system_num_rmid_idx()
> and other times resctrl_arch_system_max_rmid_idx() makes this change difficult to
> follow. I think it will help to use different variable names to differentiate 
> the context in which it is being used and not leave reader trying to understand
> why there are *two* limits.

I'll change the local variable name to max_idx_limit when dealing with
the max value rather than the current. setup_rmid_lru_list() will have
both variables since max_idx_limit is used for allocation, and idx_limit
to add the right number to rmid_free_lru list. Added a comment to
__check_limbo() explaining why max RMID is used there. I think the other
spots are easy to see why.

> >  	lockdep_assert_held(&rdtgroup_mutex);
> >  
> >  	rmid_limbo_count--;
> > -	list_add_tail(&entry->list, &rmid_free_lru);
> > +
> > +	/*
> > +	 * Limbo may be freeing an RMID from a previous mount where there
> > +	 * were more RMIDs available.
> > +	 */
> > +	if (resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid) < idx_limit)
> > +		list_add_tail(&entry->list, &rmid_free_lru);
> >  
> >  	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
> >  		closid_num_dirty_rmid[entry->closid]--;
> > @@ -133,7 +140,7 @@ static void limbo_release_entry(struct rmid_entry *entry)
> >  void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
> >  {
> >  	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
> > -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> > +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
> >  	struct rmid_entry *entry;
> >  	u32 idx, cur_idx = 1;
> >  	void *arch_mon_ctx;
> > @@ -192,7 +199,7 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
> >  
> >  bool has_busy_rmid(struct rdt_l3_mon_domain *d)
> >  {
> > -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> > +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
> >  
> >  	return find_first_bit(d->rmid_busy_llc, idx_limit) != idx_limit;
> >  }
> > @@ -916,24 +923,32 @@ int setup_rmid_lru_list(void)
> >  		return 0;
> >  
> >  	/*
> > -	 * Called on every mount, but the number of RMIDs cannot change
> > -	 * after the first mount, so keep using the same set of rmid_ptrs[]
> > -	 * until resctrl_exit(). Note that the limbo handler continues to
> > -	 * access rmid_ptrs[] after resctrl is unmounted.
> > +	 * Allocate the largest number of RMIDs that this system will ever
> > +	 * need. These cannot be freed until resctrl_exit() because the limbo
> > +	 * handler  continues to access rmid_ptrs[] after resctrl is unmounted.
> >  	 */
> > -	if (rmid_ptrs)
> > -		return 0;
> > +	if (!rmid_ptrs) {
> > +		idx_limit = resctrl_arch_system_max_rmid_idx();
> > +		rmid_ptrs = kzalloc_objs(struct rmid_entry, idx_limit);
> > +		if (!rmid_ptrs)
> > +			return -ENOMEM;
> > +
> > +		for (i = 0; i < idx_limit; i++) {
> > +			entry = &rmid_ptrs[i];
> > +			INIT_LIST_HEAD(&entry->list);
> > +
> > +			resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
> > +		}
> > +	}
> >  
> > +	/* Find how many RMIDs are needed for this mount */
> >  	idx_limit = resctrl_arch_system_num_rmid_idx();
> > -	rmid_ptrs = kzalloc_objs(struct rmid_entry, idx_limit);
> > -	if (!rmid_ptrs)
> > -		return -ENOMEM;
> >  
> > +	INIT_LIST_HEAD(&rmid_free_lru);
> >  	for (i = 0; i < idx_limit; i++) {
> >  		entry = &rmid_ptrs[i];
> > -		INIT_LIST_HEAD(&entry->list);
> > -
> > -		resctrl_arch_rmid_idx_decode(i, &entry->closid, &entry->rmid);
> > +		if (i && entry->busy)
> > +			continue;
> >  		list_add_tail(&entry->list, &rmid_free_lru);
> >  	}
> >  
> > @@ -1156,7 +1171,7 @@ static void mbm_cntr_free_all(struct rdt_resource *r, struct rdt_l3_mon_domain *
> >   */
> >  static void resctrl_reset_rmid_all(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
> >  {
> > -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> > +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
> >  	enum resctrl_event_id evt;
> >  	int idx;
> >  
> > diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> > index d2a1f88d8782..6d647b71c5db 100644
> > --- a/fs/resctrl/rdtgroup.c
> > +++ b/fs/resctrl/rdtgroup.c
> > @@ -4416,7 +4416,7 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
> >   */
> >  static int domain_setup_l3_mon_state(struct rdt_resource *r, struct rdt_l3_mon_domain *d)
> >  {
> > -	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
> > +	u32 idx_limit = resctrl_arch_system_max_rmid_idx();
> >  	size_t tsize = sizeof(*d->mbm_states[0]);
> >  	enum resctrl_event_id eventid;
> >  	int idx;
> 
> Reinette
>
-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl
  2026-06-08 23:22   ` Reinette Chatre
@ 2026-06-09 22:11     ` Luck, Tony
  0 siblings, 0 replies; 44+ messages in thread
From: Luck, Tony @ 2026-06-09 22:11 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:22:27PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > INTEL_PMT_TELEMETRY is a loadable module, but resctrl is built-in and cannot
> > call PMT functions directly.  Register the telemetry enumeration function
> > pointers at pmt_telemetry module init, and unregister them at module exit.
> 
> To ensure intel_pmt_get_regions_by_feature() has access to complete data, could
> it be more accurate to register at the end of PMT's .probe() and similarly
> unregister during .remove()?

Will do.

> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
  2026-06-09 16:51     ` Luck, Tony
@ 2026-06-09 23:02       ` Reinette Chatre
  2026-06-10 20:01         ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-09 23:02 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/9/26 9:51 AM, Luck, Tony wrote:
> On Mon, Jun 08, 2026 at 04:16:58PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 6/1/26 12:56 PM, Tony Luck wrote:
>>> Drop the force_off assignment from all_regions_have_sufficient_rmid().
>>
>> Apart from this being obvious from the patch, this is not how an x86 changelog
>> should be structured. Please see "Changelog" in
>> Documentation/process/maintainer-tip.rst
> 
> Revised version:
> ---
> Subject: x86/resctrl: Fix usage of event_group::force_off in AET
> 
> The kernel command line option "rdt=" is used to force enable or disable
> individual resctrl features.
> 
> There are two problems with usage in the AET (Application Energy
> Telemetry) feature:

Above seems incomplete. Could it be "two problems with usage of the rdt= kernel
command line ..." or "two problems with the "rdt=" usage ..." or ...? 

> 1) If the user specifies both enabling and disabling of the same feature
>    (e.g. "rdt=energy,!energy") the request to enable should override the
>    request to disable (for consistency with other features).
> 2) event_group::force_off is set true in all_regions_have_sufficient_rmid()
>    which will cause problems when AET features are enumerated on each mount
>    of the resctrl file system

Two comments:
- In general we should aim to make the changelog as specific as possible to
  prevent reader needing to do any deciphering. This makes the review easier
  and faster. So, please avoid general things like "will cause problems" that
  requires reader to figure out the problems (plural!) on their own. Instead
  just be specific about what the problems are.
- Is (2) still a problem when (1) is fixed? An underlying question is: if an
  event group is successfully enumerated during resctrl mount, will it always
  enumerate with identical properties on every subsequent mount?

> 
> Fix the first issue by checking event_group::force_on in enable_events().

"the first issue" seems vague to me while the rest describes the code that can be
seen from the patch. How about something like below to help describe what the code
change accomplishes:
	Enumerate the event group if user space overrides any disable of the event group.

> 
> Fix the other issue by dropping the assignment to event_group::force_off.

"the other issue" is very vague. Here it will also help to be specific. Although
per earlier comment the need for this fix is no longer clear to me?

> 
> ---
> 
>>
>> Please note that, after the preparatory fixes that are expected to land separately,
>> this will be the first patch of this series ... having this first patch just
>> kick off with "what changes" without any context is a difficult way for a
>> reviewer to start considering this piece of work.
>>
>>> This preserves current single-enumeration behaviour while preparing for
>>> the upcoming per-mount enumeration, where latching force_off would
>>> incorrectly suppress re-enumeration on subsequent mounts - even when the
>>> user explicitly requested the feature via "rdt={feature}".
>>
>> I believe that user space should still expect that rdt= options behave
>> consistently. So if user space for some reason provides: rdt=!mbm,mbm,!energy,energy
>> on a system that supports both then it should not be the case that one is enabled and the
>> other not.
>>
>> I this think that, for example, enable_events() should start with:
>>
>> 	if (e->force_off && !e->force_on)
>> 		return false;
> 
> OK.
> 
>>  
>>> Signed-off-by: Tony Luck <tony.luck@intel.com>
>>> ---
>>>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
>>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
>>> index 89b8b619d5d5..e2af700bca04 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
>>> @@ -60,8 +60,8 @@ struct pmt_event {
>>>   *			data for all telemetry regions of type @pfname.
>>>   *			Valid if the system supports the event group,
>>>   *			NULL otherwise.
>>> - * @force_off:		True when "rdt" command line or architecture code disables
>>> - *			this event group due to insufficient RMIDs.
>>> + * @force_off:		True when "rdt" command line disables this event group
>>> + *			to avoid system limitations due to insufficient RMIDs.
>>
>> >From what I understand it is only architecture that can disable an event group
>> "to avoid system limitations due to insufficient RMIDs" and this capability is removed
>> in this patch. Above implies that this capability now needs to be implemented
>> by userspace, which I do not think is accurate?
> 
> Should I just drop the second line? The user may have various other reasons
> why they want to disable this event group.

Now that enable_events() starts with a check of event_group::force_on it seems that
the existing behavior of event_group::force_off to also reflect architecture 
disable can be maintained?
 
> Combined with the code change you suggest above, this would describe the
> use of these two fields. "force_off" disables, "force_on" enables (and
> overrides any "force_off").

I believe the "force_on" description below already accurately describes how it
overrides an earlier disable.

> 
>>
>>>   * @force_on:		True when "rdt" command line overrides disable of this
>>>   *			event group.
>>>   * @guid:		Unique number per XML description file.
>>> @@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
>>>  		if (!p->regions[i].addr)
>>>  			continue;
>>>  		tr = &p->regions[i];
>>> -		if (tr->num_rmids < e->num_rmid) {
>>> -			e->force_off = true;
>>> +		if (tr->num_rmids < e->num_rmid)
>>>  			return false;
>>> -		}
>>>  	}
>>>  
>>>  	return true;

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event
  2026-06-09 17:21     ` Luck, Tony
@ 2026-06-09 23:02       ` Reinette Chatre
  0 siblings, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2026-06-09 23:02 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/9/26 10:21 AM, Luck, Tony wrote:
> On Mon, Jun 08, 2026 at 04:18:23PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 6/1/26 12:56 PM, Tony Luck wrote:
>>> In preparation for re-running AET enumeration on every mount, AET code must be
>>> able to disable events on unmount so the next mount starts from a clean slate.
>>>
>>> Add a file system interface for architecture to clear the enabled flag for
>>> a given event.
>>
>> This just verbatim describes what the patch does. It would be helpful to describe
>> how the flag is used by resctrl to support the first paragraph's implicit claim
>> that the enabled flag's value is not relevant when resctrl is unmounted.
> 
> Revised commit:
> ---
> Subject: fs/resctrl: Add interface to disable a monitor event
> 
> In preparation for re-running AET enumeration on every mount, AET code must be
> able to disable events on unmount so the next mount starts from a clean slate.
> 
> Add a file system interface for architecture to reset architecture
> controlled fields of the given event. mon_event::enabled is only used
> during mount and at run time to check which events to include in file
> system objects. It is not used during unmount, so it is safe to clear
> it as part of the unmount flow.

This introduces a general resctrl fs interface and makes some powerful
generalized statements in support of the interface but these statements
are only true for the AET events. Surely mon_event::enabled is used
during unmount since domains can come and go while resctrl is not mounted
and as the new comments explain there is significant state coordination that
needs to be done between these event callbacks and hotplug handlers.

Similarly, the resctrl LLC occupancy worker keeps running while resctrl
is unmounted and depends on LLC occupancy event being enabled.

Creating a resctrl fs generalized interface but motivating it with a
highly customized lens of usage without making that clear in the changelog 
but instead just making grand claims of how safe this is seems underhanded.

> 
> Add kerneldoc comments to describe limitations on when events may be enabled
> or disabled.
> ---
> 
>>
>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>> index 9fd901c78dc6..327e7a863614 100644
>>> --- a/fs/resctrl/monitor.c
>>> +++ b/fs/resctrl/monitor.c
>>> @@ -1012,6 +1012,18 @@ bool resctrl_enable_mon_event(enum resctrl_event_id eventid, bool any_cpu,
>>>  	return true;
>>>  }
>>>  
>>> +void resctrl_disable_mon_event(enum resctrl_event_id eventid)
>>> +{
>>> +	if (WARN_ON_ONCE(eventid < QOS_FIRST_EVENT || eventid >= QOS_NUM_EVENTS))
>>> +		return;
>>> +	if (!mon_event_all[eventid].enabled) {
>>> +		pr_warn("Repeat disable for event %d\n", eventid);
>>> +		return;
>>> +	}
>>> +
>>> +	mon_event_all[eventid].enabled = false;
>>> +}
>>
>> It seems reasonable for architecture to expect that "disable" of event undoes all
>> settings from the earlier "enable" of event. The new function only sets the "enabled"
>> flag to false though. This is potentially confusing since it leaves the event with
>> some lingering state, some of which is a pointer to state that an architecture may
>> be reasonable to remove after disabling this event leaving a dangling pointer. 
> 
> Ok. I'll add:
> 
> +	mon_event_all[eventid].any_cpu = false;
> +	mon_event_all[eventid].binary_bits = 0;
> +	mon_event_all[eventid].arch_priv = NULL;

Thank you.

Reinette


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features
  2026-06-09 18:46     ` Luck, Tony
@ 2026-06-09 23:03       ` Reinette Chatre
  0 siblings, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2026-06-09 23:03 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/9/26 11:46 AM, Luck, Tony wrote:

...

> Conceptually this is much simpler. But I have questions about the
> implementation. The new function is trivial:
> 
> bool resctrl_arch_mon_capable(void)
> {
>         struct rdt_resource *r;
> 
>         for_each_mon_capable_rdt_resource(r)
>                 return true;
> 
>         return false;
> }
> 
> But that led me to #include hell when I tried to keep it as an inline
> function in <asm/resctrl.h> because for_each_mon_capable_rdt_resource()
> is defined in <linux/resctrl.h> after the #include <asm/resctrl.h>
> 
> So I've moved it out-of-line into arch/x86/kernel/cpu/resctrl/core.c.
> 
> The MPAM implementation is also out-of-line.
> 
> But then I wondered about performance. This change on x86 goes from an
> inline function that simply returns the value of a global variable to an
> out-of-line function that scans the array of rdt resources. The common
> case will be a hit on the first element, so not awful. But still worse
> that before I touched it.
> 
> So I looked for places where resctrl_arch_mon_capable() is called in
> "hot" code paths. There's a bunch in mount and mkdir, but those aren't
> very hot.
> 
> My list (check to see if I missed any others):
> 
> 1) Recurring call once per second in mbm_handle_overflow()
> 
> Seems redundant. There is a check to only start the overflow handler
> on mon_capable systems (only with enabled MBM events!)
> 
> 2) Call for potentially every task when reading tasks files in is_rmid_match()
> 
> Also seems redundant. Next part of that "if" looks at "r->type == RDTMON_GROUP"
> which can only be true on mon_capable systems.
> 
> Should I clean these up in this series? As part of this patch which
> exacerbates the performance impact, or as a separate cleanup patch?

Thanks for catching this. I think it is reasonable to include it in this series
as a preparatory patch with this patch helping to motivate its inclusion.

Even so, I am now a bit confused and concerned about the PMT dependencies. I am
not very familiar with module_get()/module_put() capabilities - can it be
guaranteed that PMT remains accessible between those two calls? I peeked at the
sashiko review and it mentioned the usage of "unbind" triggered from user space.
It sounds to me as though PMT's .probe() and .remove() can be triggered at
various times from user space irrespective of resctrl being mounted or not and not
prevented by a module_get()?

So, consider scenario where PMT is loaded and then resctrl is mounted and user space
creates a couple of monitor groups that contains the AET event files. What will
happen if user space then unbinds PMT? From what I can tell this will not
trigger AET unmount like resctrl unmount would and thus leave a lot of dangling state?

Back to the above ... if AET needs handle a PMT unbind, do you think that, for
example like in is_rmid_match(), that resctrl_arch_mon_capable() could return
different values during a single resctrl mount?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount
  2026-06-09 21:58     ` Luck, Tony
@ 2026-06-09 23:35       ` Reinette Chatre
  0 siblings, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2026-06-09 23:35 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/9/26 2:58 PM, Luck, Tony wrote:
> On Mon, Jun 08, 2026 at 04:21:10PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
> Changed the Subject: tag to "mpam,x86,fs/resctrl" to match what you used
> for the RFC discussion. I think the implied rule is "architectures first
> in alphabetical order, filesystem last".

After seeing the RFC discussions I do not think what I did in PoC is ideal.
MPAM is the name of the Arm feature and soon there will also be RISC-V with its
"Ssqosid" and "QBQRI" features. Since we currently have x86 as established
prefix for the PQoS and RDT features it may be more appropriate and simpler
to instead use something like "arm,riscv,x86,fs/resctrl" if such global
change is ever needed? It still follows your implied rule of "architectures
first in alphabetical order" ... but it instead actually uses the architecture
names and not a mix of feature and architecture names. I am not dictating here
and open to suggestions.

> 
>> On 6/1/26 12:56 PM, Tony Luck wrote:
...
>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>> index 327e7a863614..b374e2f84a75 100644
>>> --- a/fs/resctrl/monitor.c
>>> +++ b/fs/resctrl/monitor.c
>>> @@ -115,10 +115,17 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
>>>  
>>>  static void limbo_release_entry(struct rmid_entry *entry)
>>>  {
>>> +	u32 idx_limit = resctrl_arch_system_num_rmid_idx();
>>
>> Having the code switch idx_limit to sometimes be resctrl_arch_system_num_rmid_idx()
>> and other times resctrl_arch_system_max_rmid_idx() makes this change difficult to
>> follow. I think it will help to use different variable names to differentiate 
>> the context in which it is being used and not leave reader trying to understand
>> why there are *two* limits.
> 
> I'll change the local variable name to max_idx_limit when dealing with
> the max value rather than the current. setup_rmid_lru_list() will have
> both variables since max_idx_limit is used for allocation, and idx_limit
> to add the right number to rmid_free_lru list. Added a comment to
> __check_limbo() explaining why max RMID is used there. I think the other
> spots are easy to see why.
What do you think of "min_idx_limit" instead of "idx_limit" to complement
the "max_idx_limit" while reflecting that it is the limit that is common 
(hence "minimum") among all monitoring resources?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-08 23:25   ` Reinette Chatre
@ 2026-06-10  0:08     ` Luck, Tony
  2026-06-10 15:27       ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-10  0:08 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:25:51PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> Regarding a subject prefix ... We do not yet have a precedent for prefix
> when MPAM, x86 and fs is changed so this series may indeed have the
> opportunity to set that precedent. When doing so, please be consistent.
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > resctrl is always built-in, but INTEL_PMT_TELEMETRY and INTEL_TPMI are
> > logically independent and should be loadable modules.  Switch AET to use
> > the function-pointer registration API (introduced in the preceding patch)
> 
> Please drop "(introduced in the preceding patch)" since patches may not
> always keep this series's order once merged.

OK

> 
> > instead of direct link-time references to PMT symbols.
> > 
> > Prepare for the file system to call resctrl_arch_pre_mount() on every mount
> > by moving AET enumeration into resctrl_arch_pre_mount() and cleanup into
> > resctrl_arch_unmount(). This allows the PMT module to be unloaded whenever
> > the filesystem is not mounted.
> > 
> > Remove intel_aet_exit because all cleanup now happens in the unmount path.
> > 
> > Note that the Linux file system code does not serialize calls to
> > fs_context_operations::get_tree(), so there may be arbitrarily many
> > parallel calls if users invoke mount(2) multiple times.
> > 
> > Add locking and state (resctrl_arch_mount_entries) to avoid repeated
> > enumeration on nested mount requests from file system code (which will be
> > failed with -EBUSY status).
> > 
> > event_group::num_rmid may be reset (reduced) during enumeration. This is
> > not worth resetting on unmount because the same reduction would occur on
> > each subsequent mount.
> 
> Even more important to ensure that PMT is done with its enumeration when
> AET enumeration starts?

Would be nice, but I still don't know a way to make that happen.

> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  include/linux/resctrl.h                 |  3 ++
> >  arch/x86/kernel/cpu/resctrl/internal.h  |  8 ++--
> >  arch/x86/kernel/cpu/resctrl/core.c      | 41 +++++++++++++++++--
> >  arch/x86/kernel/cpu/resctrl/intel_aet.c | 52 ++++++++++++++++++++++---
> >  drivers/resctrl/mpam_resctrl.c          |  4 ++
> >  5 files changed, 95 insertions(+), 13 deletions(-)
> > 
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> > index 3705f0214fa6..9534d42e0c57 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -557,6 +557,9 @@ void resctrl_offline_cpu(unsigned int cpu);
> >   */
> >  void resctrl_arch_pre_mount(void);
> >  
> > +/* Called to report unmount. */
> > +void resctrl_arch_unmount(void);
> > +
> >  /**
> >   * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
> >   *			      for this resource and domain.
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index 3b09cfe9a046..017a19143ec9 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -234,15 +234,15 @@ void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> >  void resctrl_arch_mbm_cntr_assign_set_one(struct rdt_resource *r);
> >  
> >  #ifdef CONFIG_X86_CPU_RESCTRL_INTEL_AET
> > -bool intel_aet_get_events(void);
> > -void __exit intel_aet_exit(void);
> > +bool intel_aet_pre_mount(void);
> > +void intel_aet_unmount(void);
> >  int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val);
> >  void intel_aet_mon_domain_setup(int cpu, int id, struct rdt_resource *r,
> >  				struct list_head *add_pos);
> >  bool intel_handle_aet_option(bool force_off, char *tok);
> >  #else
> > -static inline bool intel_aet_get_events(void) { return false; }
> > -static inline void __exit intel_aet_exit(void) { }
> > +static inline bool intel_aet_pre_mount(void) { return false; }
> > +static inline void intel_aet_unmount(void) { }
> >  static inline int intel_aet_read_event(int domid, u32 rmid, void *arch_priv, u64 *val)
> >  {
> >  	return -EINVAL;
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 934492c7e643..9336299b9647 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -16,10 +16,12 @@
> >  
> >  #define pr_fmt(fmt)	"resctrl: " fmt
> >  
> > +#include <linux/cleanup.h>
> >  #include <linux/cpu.h>
> >  #include <linux/slab.h>
> >  #include <linux/err.h>
> >  #include <linux/cpuhotplug.h>
> > +#include <linux/mutex.h>
> >  
> >  #include <asm/cpu_device_id.h>
> >  #include <asm/msr.h>
> > @@ -776,12 +778,20 @@ static int resctrl_arch_offline_cpu(unsigned int cpu)
> >  	return 0;
> >  }
> >  
> > +static DEFINE_MUTEX(resctrl_arch_mount_lock);
> 
> Please document what this mutex protects.

It looks like I'm only protecting resctrl_arch_mount_entries, but I hold
the mutex for much longer. Perhaps this could just be some atomic
operations on resctrl_arch_mount_entries? I'll think about this some
more.

> 
> > +static int resctrl_arch_mount_entries;
> > +
> >  void resctrl_arch_pre_mount(void)
> >  {
> >  	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> >  	int cpu;
> >  
> > -	if (!intel_aet_get_events())
> > +	guard(mutex)(&resctrl_arch_mount_lock);
> > +
> > +	if (++resctrl_arch_mount_entries > 1)
> > +		return;
> > +
> > +	if (!intel_aet_pre_mount())
> >  		return;
> >  
> >  	/*
> > @@ -798,6 +808,33 @@ void resctrl_arch_pre_mount(void)
> >  	cpus_read_unlock();
> >  }
> >  
> > +void resctrl_arch_unmount(void)
> > +{
> > +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_PERF_PKG].r_resctrl;
> > +	int cpu;
> > +
> > +	guard(mutex)(&resctrl_arch_mount_lock);
> > +
> > +	if (--resctrl_arch_mount_entries > 0)
> > +		return;
> > +
> > +	WARN_ON(resctrl_arch_mount_entries < 0);
> > +
> > +	intel_aet_unmount();
> > +
> > +	if (!r->mon_capable)
> > +		return;
> > +
> > +	cpus_read_lock();
> > +	mutex_lock(&domain_list_lock);
> > +	for_each_online_cpu(cpu)
> > +		domain_remove_cpu_mon(cpu, r);
> > +	r->mon_capable = false;
> > +	rdt_mon_feature_count--;
> > +	mutex_unlock(&domain_list_lock);
> > +	cpus_read_unlock();
> > +}
> > +
> >  enum {
> >  	RDT_FLAG_CMT,
> >  	RDT_FLAG_MBM_TOTAL,
> > @@ -1174,8 +1211,6 @@ late_initcall(resctrl_arch_late_init);
> >  
> >  static void __exit resctrl_arch_exit(void)
> >  {
> > -	intel_aet_exit();
> > -
> >  	cpuhp_remove_state(rdt_online);
> >  
> >  	resctrl_exit();
> > diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > index db671725acbb..74c34593876b 100644
> > --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> > @@ -25,8 +25,8 @@
> >  #include <linux/intel_vsec.h>
> >  #include <linux/io.h>
> >  #include <linux/minmax.h>
> > -#include <linux/mutex.h>
> >  #include <linux/module.h>
> > +#include <linux/mutex.h>
> >  #include <linux/printk.h>
> >  #include <linux/rculist.h>
> >  #include <linux/rcupdate.h>
> > @@ -308,7 +308,7 @@ static void (*put_feature)(struct pmt_feature_group *p);
> >   * struct pmt_feature_group to indicate that its events are successfully
> >   * enabled.
> >   */
> > -bool intel_aet_get_events(void)
> > +static bool aet_get_events(void)
> >  {
> >  	struct pmt_feature_group *p;
> >  	enum pmt_feature_id pfid;
> > @@ -317,14 +317,14 @@ bool intel_aet_get_events(void)
> >  
> >  	for_each_event_group(peg) {
> >  		pfid = lookup_pfid((*peg)->pfname);
> > -		p = intel_pmt_get_regions_by_feature(pfid);
> > +		p = get_feature(pfid);
> >  		if (IS_ERR_OR_NULL(p))
> >  			continue;
> >  		if (enable_events(*peg, p)) {
> >  			(*peg)->pfg = p;
> >  			ret = true;
> >  		} else {
> > -			intel_pmt_put_feature_group(p);
> > +			put_feature(p);
> >  		}
> >  	}
> >  
> > @@ -353,16 +353,56 @@ void intel_aet_unregister_enumeration(void)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(intel_aet_unregister_enumeration, "INTEL_PMT");
> >  
> > -void __exit intel_aet_exit(void)
> > +/*
> > + * The pmt_telemetry module may not be loaded when the resctrl file system
> 
> Is this accurate? What prevents pmt_telemetry module from being loaded when
> resctrl file system is mounted?

Comment isn't clear. The issue is when pmt_telemetry isn't loaded during
the mount operation, but is loaded later while the file system is
mounted. On unmount "pmt_module" is set, but should not get a
module_put() because no "get" was done at mount time.

I'll rewrite to make this clear.
> 
> > + * is mounted. Keep track of whether a hold was placed on the module to know
> > + * whether to release it during unmount.
> > + */
> > +static bool have_pmt_hold;
> > +
> > +bool intel_aet_pre_mount(void)
> > +{
> > +	bool ret;
> > +
> > +	guard(mutex)(&aet_register_lock);
> > +	if (!get_feature || !put_feature)
> > +		return false;
> > +
> > +	if (pmt_module) {
> > +		if (!try_module_get(pmt_module))
> > +			return false;
> > +		have_pmt_hold = true;
> > +	}
> > +
> > +	ret = aet_get_events();
> > +
> > +	if (!ret && pmt_module && have_pmt_hold) {
> > +		module_put(pmt_module);
> > +		have_pmt_hold = false;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +void intel_aet_unmount(void)
> >  {
> >  	struct event_group **peg;
> >  
> > +	guard(mutex)(&aet_register_lock);
> 
> Could this get a short-circuit to make behavior on AMD obvious?

Like this?

	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
		return;

Same in intel_aet_pre_mount() for symmetry??

> 
> >  	for_each_event_group(peg) {
> >  		if ((*peg)->pfg) {
> > -			intel_pmt_put_feature_group((*peg)->pfg);
> > +			struct event_group *e = *peg;
> > +
> > +			for (int j = 0; j < e->num_events; j++)
> > +				resctrl_disable_mon_event(e->evts[j].id);
> > +			put_feature((*peg)->pfg);
> >  			(*peg)->pfg = NULL;
> >  		}
> >  	}
> > +	if (have_pmt_hold && pmt_module) {
> > +		module_put(pmt_module);
> > +		have_pmt_hold = false;
> > +	}
> >  }
> >  
> >  #define DATA_VALID	BIT_ULL(63)
> > diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
> > index 7079870ca894..5859cc0f5e37 100644
> > --- a/drivers/resctrl/mpam_resctrl.c
> > +++ b/drivers/resctrl/mpam_resctrl.c
> > @@ -162,6 +162,10 @@ void resctrl_arch_pre_mount(void)
> >  {
> >  }
> >  
> > +void resctrl_arch_unmount(void)
> > +{
> > +}
> > +
> >  bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level rid)
> >  {
> >  	return mpam_resctrl_controls[rid].cdp_enabled;
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10  0:08     ` Luck, Tony
@ 2026-06-10 15:27       ` Reinette Chatre
  2026-06-10 15:49         ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-10 15:27 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/9/26 5:08 PM, Luck, Tony wrote:
> On Mon, Jun 08, 2026 at 04:25:51PM -0700, Reinette Chatre wrote:


>>> instead of direct link-time references to PMT symbols.
>>>
>>> Prepare for the file system to call resctrl_arch_pre_mount() on every mount
>>> by moving AET enumeration into resctrl_arch_pre_mount() and cleanup into
>>> resctrl_arch_unmount(). This allows the PMT module to be unloaded whenever
>>> the filesystem is not mounted.
>>>
>>> Remove intel_aet_exit because all cleanup now happens in the unmount path.
>>>
>>> Note that the Linux file system code does not serialize calls to
>>> fs_context_operations::get_tree(), so there may be arbitrarily many
>>> parallel calls if users invoke mount(2) multiple times.
>>>
>>> Add locking and state (resctrl_arch_mount_entries) to avoid repeated
>>> enumeration on nested mount requests from file system code (which will be
>>> failed with -EBUSY status).
>>>
>>> event_group::num_rmid may be reset (reduced) during enumeration. This is
>>> not worth resetting on unmount because the same reduction would occur on
>>> each subsequent mount.
>>
>> Even more important to ensure that PMT is done with its enumeration when
>> AET enumeration starts?
> 
> Would be nice, but I still don't know a way to make that happen.

Could it help to have AET depends on PMT probe being complete?

...

>>> +void intel_aet_unmount(void)
>>>  {
>>>  	struct event_group **peg;
>>>  
>>> +	guard(mutex)(&aet_register_lock);
>>
>> Could this get a short-circuit to make behavior on AMD obvious?
> 
> Like this?
> 
> 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> 		return;
> 
> Same in intel_aet_pre_mount() for symmetry??

I do not see that this requires to be architecture specific. This capability
introduces AET register()/unregister() and pre_mount()/unmount().
pre_mount() already does no-op if (essentially) nothing is registered. Could
unmount() similarly be a no-op if (essentially) nothing is registered?

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10 15:27       ` Reinette Chatre
@ 2026-06-10 15:49         ` Luck, Tony
  2026-06-10 16:21           ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-10 15:49 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Wed, Jun 10, 2026 at 08:27:15AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/9/26 5:08 PM, Luck, Tony wrote:
> > On Mon, Jun 08, 2026 at 04:25:51PM -0700, Reinette Chatre wrote:
> 
> 
> >>> instead of direct link-time references to PMT symbols.
> >>>
> >>> Prepare for the file system to call resctrl_arch_pre_mount() on every mount
> >>> by moving AET enumeration into resctrl_arch_pre_mount() and cleanup into
> >>> resctrl_arch_unmount(). This allows the PMT module to be unloaded whenever
> >>> the filesystem is not mounted.
> >>>
> >>> Remove intel_aet_exit because all cleanup now happens in the unmount path.
> >>>
> >>> Note that the Linux file system code does not serialize calls to
> >>> fs_context_operations::get_tree(), so there may be arbitrarily many
> >>> parallel calls if users invoke mount(2) multiple times.
> >>>
> >>> Add locking and state (resctrl_arch_mount_entries) to avoid repeated
> >>> enumeration on nested mount requests from file system code (which will be
> >>> failed with -EBUSY status).
> >>>
> >>> event_group::num_rmid may be reset (reduced) during enumeration. This is
> >>> not worth resetting on unmount because the same reduction would occur on
> >>> each subsequent mount.
> >>
> >> Even more important to ensure that PMT is done with its enumeration when
> >> AET enumeration starts?
> > 
> > Would be nice, but I still don't know a way to make that happen.
> 
> Could it help to have AET depends on PMT probe being complete?

I think I looked at this before, and it didn't help. IIRC the PMT probe
kicks off enumeration of the auxiliary bus, but doesn't wait. I'll check
again with David to see if there is anything that can be done here.

> ...
> 
> >>> +void intel_aet_unmount(void)
> >>>  {
> >>>  	struct event_group **peg;
> >>>  
> >>> +	guard(mutex)(&aet_register_lock);
> >>
> >> Could this get a short-circuit to make behavior on AMD obvious?
> > 
> > Like this?
> > 
> > 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > 		return;
> > 
> > Same in intel_aet_pre_mount() for symmetry??
> 
> I do not see that this requires to be architecture specific. This capability
> introduces AET register()/unregister() and pre_mount()/unmount().
> pre_mount() already does no-op if (essentially) nothing is registered. Could
> unmount() similarly be a no-op if (essentially) nothing is registered?

Ah, I see. Maybe I just need a better name for my "have_pmt_hold" flag
so it can be used as the fast exit indicator in intel_aet_unmount().

Perhaps "pmt_in_use"?

Then:

	if (!pmt_in_use)
		return;
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount
  2026-06-08 23:26   ` Reinette Chatre
@ 2026-06-10 16:16     ` Luck, Tony
  0 siblings, 0 replies; 44+ messages in thread
From: Luck, Tony @ 2026-06-10 16:16 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:26:06PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > @@ -3147,6 +3148,10 @@ static int rdt_get_tree(struct fs_context *fc)
> >  out:
> >  	mutex_unlock(&rdtgroup_mutex);
> >  	cpus_read_unlock();
> > +
> > +	if (ret)
> 
> Can ret be zero here?

No. I'll remove the "if" check.
> 
> > +		resctrl_arch_unmount();
> > +
> >  	return ret;
> >  }
> >  
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat
  2026-06-08 23:26   ` Reinette Chatre
@ 2026-06-10 16:19     ` Luck, Tony
  0 siblings, 0 replies; 44+ messages in thread
From: Luck, Tony @ 2026-06-10 16:19 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Mon, Jun 08, 2026 at 04:26:36PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/1/26 12:56 PM, Tony Luck wrote:
> > Add a footnote to the 'If telemetry monitoring is enabled' sentence noting
> > that because PMT driver enumerates telemetry features asynchronously, an
> > automatic mount of resctrl from /etc/fstab at boot may occur before those
> > features are available, resulting in them not being enabled.
> > 
> > Assisted-by: Claude:Sonnet_4.6
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  Documentation/filesystems/resctrl.rst | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> > index b003bed339fd..4e62b8c5e0d6 100644
> > --- a/Documentation/filesystems/resctrl.rst
> > +++ b/Documentation/filesystems/resctrl.rst
> > @@ -620,7 +620,7 @@ When monitoring is enabled all MON groups will also contain:
> >  	each instance of an L3 cache. Each directory contains files for the enabled
> >  	L3 events (e.g. "llc_occupancy", "mbm_total_bytes", and "mbm_local_bytes").
> >  
> > -	If telemetry monitoring is enabled, there will be a "mon_PERF_PKG_YY"
> > +	If telemetry monitoring is enabled [#]_, there will be a "mon_PERF_PKG_YY"
> >  	directory for each physical processor package. Each directory contains
> >  	files for the enabled telemetry events (e.g. "core_energy". "activity",
> >  	"uops_retired", etc.)
> > @@ -659,6 +659,11 @@ When monitoring is enabled all MON groups will also contain:
> >  	returned if the MBM event does not have an assigned counter in the
> >  	CTRL_MON group nor in any of its associated MON groups.
> >  
> > +.. [#] Telemetry features are enumerated asynchronously by the PMT driver. If
> > +   resctrl is automatically mounted from ``/etc/fstab`` at boot, the telemetry
> > +   features may not yet be available at mount time and will therefore not be
> > +   enabled.
> 
> Would it be helpful to add a snippet that explains that in this scenario telemetry
> features could be enabled on resctrl remount?

Yes. I will add.

> 
> > +
> >  "mon_hw_id":
> >  	Available only with debug option. The identifier used by hardware
> >  	for the monitor group. On x86 this is the RMID.
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10 15:49         ` Luck, Tony
@ 2026-06-10 16:21           ` Reinette Chatre
  2026-06-10 16:34             ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-10 16:21 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/10/26 8:49 AM, Luck, Tony wrote:
> On Wed, Jun 10, 2026 at 08:27:15AM -0700, Reinette Chatre wrote:
>> On 6/9/26 5:08 PM, Luck, Tony wrote:
>>> On Mon, Jun 08, 2026 at 04:25:51PM -0700, Reinette Chatre wrote:

>>>>> +void intel_aet_unmount(void)
>>>>>  {
>>>>>  	struct event_group **peg;
>>>>>  
>>>>> +	guard(mutex)(&aet_register_lock);
>>>>
>>>> Could this get a short-circuit to make behavior on AMD obvious?
>>>
>>> Like this?
>>>
>>> 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>>> 		return;
>>>
>>> Same in intel_aet_pre_mount() for symmetry??
>>
>> I do not see that this requires to be architecture specific. This capability
>> introduces AET register()/unregister() and pre_mount()/unmount().
>> pre_mount() already does no-op if (essentially) nothing is registered. Could
>> unmount() similarly be a no-op if (essentially) nothing is registered?
> 
> Ah, I see. Maybe I just need a better name for my "have_pmt_hold" flag
> so it can be used as the fast exit indicator in intel_aet_unmount().
> 
> Perhaps "pmt_in_use"?
> 
> Then:
> 
> 	if (!pmt_in_use)
> 		return;

Something like this. It is not obvious to me how the rest of unmount will look
with this.
Ideally it will be something symmetrical to pre_mount() to make the
flow and short-circuit of the flow obvious.

Reinette





^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10 16:21           ` Reinette Chatre
@ 2026-06-10 16:34             ` Luck, Tony
  2026-06-10 16:46               ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-10 16:34 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Wed, Jun 10, 2026 at 09:21:04AM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/10/26 8:49 AM, Luck, Tony wrote:
> > On Wed, Jun 10, 2026 at 08:27:15AM -0700, Reinette Chatre wrote:
> >> On 6/9/26 5:08 PM, Luck, Tony wrote:
> >>> On Mon, Jun 08, 2026 at 04:25:51PM -0700, Reinette Chatre wrote:
> 
> >>>>> +void intel_aet_unmount(void)
> >>>>>  {
> >>>>>  	struct event_group **peg;
> >>>>>  
> >>>>> +	guard(mutex)(&aet_register_lock);
> >>>>
> >>>> Could this get a short-circuit to make behavior on AMD obvious?
> >>>
> >>> Like this?
> >>>
> >>> 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> >>> 		return;
> >>>
> >>> Same in intel_aet_pre_mount() for symmetry??
> >>
> >> I do not see that this requires to be architecture specific. This capability
> >> introduces AET register()/unregister() and pre_mount()/unmount().
> >> pre_mount() already does no-op if (essentially) nothing is registered. Could
> >> unmount() similarly be a no-op if (essentially) nothing is registered?
> > 
> > Ah, I see. Maybe I just need a better name for my "have_pmt_hold" flag
> > so it can be used as the fast exit indicator in intel_aet_unmount().
> > 
> > Perhaps "pmt_in_use"?
> > 
> > Then:
> > 
> > 	if (!pmt_in_use)
> > 		return;
> 
> Something like this. It is not obvious to me how the rest of unmount will look
> with this.
> Ideally it will be something symmetrical to pre_mount() to make the
> flow and short-circuit of the flow obvious.

With updates applied, mount/umount look like this:

/*
 * Track whether pmt_telemetry enumeration succeeded during mount for use
 * during unmount.
 */
static bool pmt_in_use;

bool intel_aet_pre_mount(void)
{
	bool ret;

	guard(mutex)(&aet_register_lock);
	if (!get_feature || !put_feature)
		return false;

	if (pmt_module) {
		if (!try_module_get(pmt_module))
			return false;
	}

	ret = aet_get_events();

	if (!ret) {
		if (pmt_module)
			module_put(pmt_module);
	} else {
		pmt_in_use = true;
	}

	return ret;
}

void intel_aet_unmount(void)
{
	struct event_group **peg;

	guard(mutex)(&aet_register_lock);
	if (!pmt_in_use)
		return;

	for_each_event_group(peg) {
		if ((*peg)->pfg) {
			struct event_group *e = *peg;

			for (int j = 0; j < e->num_events; j++)
				resctrl_disable_mon_event(e->evts[j].id);
			put_feature((*peg)->pfg);
			(*peg)->pfg = NULL;
		}
	}
	if (pmt_module)
		module_put(pmt_module);
	pmt_in_use = false;
}

}
> 
> Reinette

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10 16:34             ` Luck, Tony
@ 2026-06-10 16:46               ` Reinette Chatre
  2026-06-10 17:24                 ` Luck, Tony
  0 siblings, 1 reply; 44+ messages in thread
From: Reinette Chatre @ 2026-06-10 16:46 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

Hi Tony,

On 6/10/26 9:34 AM, Luck, Tony wrote:
> On Wed, Jun 10, 2026 at 09:21:04AM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 6/10/26 8:49 AM, Luck, Tony wrote:
>>> On Wed, Jun 10, 2026 at 08:27:15AM -0700, Reinette Chatre wrote:
>>>> On 6/9/26 5:08 PM, Luck, Tony wrote:
>>>>> On Mon, Jun 08, 2026 at 04:25:51PM -0700, Reinette Chatre wrote:
>>
>>>>>>> +void intel_aet_unmount(void)
>>>>>>>  {
>>>>>>>  	struct event_group **peg;
>>>>>>>  
>>>>>>> +	guard(mutex)(&aet_register_lock);
>>>>>>
>>>>>> Could this get a short-circuit to make behavior on AMD obvious?
>>>>>
>>>>> Like this?
>>>>>
>>>>> 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>>>>> 		return;
>>>>>
>>>>> Same in intel_aet_pre_mount() for symmetry??
>>>>
>>>> I do not see that this requires to be architecture specific. This capability
>>>> introduces AET register()/unregister() and pre_mount()/unmount().
>>>> pre_mount() already does no-op if (essentially) nothing is registered. Could
>>>> unmount() similarly be a no-op if (essentially) nothing is registered?
>>>
>>> Ah, I see. Maybe I just need a better name for my "have_pmt_hold" flag
>>> so it can be used as the fast exit indicator in intel_aet_unmount().
>>>
>>> Perhaps "pmt_in_use"?
>>>
>>> Then:
>>>
>>> 	if (!pmt_in_use)
>>> 		return;
>>
>> Something like this. It is not obvious to me how the rest of unmount will look
>> with this.
>> Ideally it will be something symmetrical to pre_mount() to make the
>> flow and short-circuit of the flow obvious.
> 
> With updates applied, mount/umount look like this:
> 
> /*
>  * Track whether pmt_telemetry enumeration succeeded during mount for use
>  * during unmount.
>  */
> static bool pmt_in_use;
> 
> bool intel_aet_pre_mount(void)
> {
> 	bool ret;
> 
> 	guard(mutex)(&aet_register_lock);
> 	if (!get_feature || !put_feature)
> 		return false;
> 
> 	if (pmt_module) {
> 		if (!try_module_get(pmt_module))
> 			return false;
> 	}
> 
> 	ret = aet_get_events();
> 
> 	if (!ret) {
> 		if (pmt_module)
> 			module_put(pmt_module);
> 	} else {
> 		pmt_in_use = true;
> 	}
> 
> 	return ret;
> }
> 
> void intel_aet_unmount(void)
> {
> 	struct event_group **peg;
> 
> 	guard(mutex)(&aet_register_lock);
> 	if (!pmt_in_use)
> 		return;
> 
> 	for_each_event_group(peg) {
> 		if ((*peg)->pfg) {
> 			struct event_group *e = *peg;
> 
> 			for (int j = 0; j < e->num_events; j++)
> 				resctrl_disable_mon_event(e->evts[j].id);
> 			put_feature((*peg)->pfg);
> 			(*peg)->pfg = NULL;
> 		}
> 	}
> 	if (pmt_module)
> 		module_put(pmt_module);
> 	pmt_in_use = false;
> }

I see many "if (pmt_module)" checks ... intel_aet_pre_mount() even calls it twice. Are they all necessary?
It creates the impression that the PMT module can be yanked from AET at any time, something which
intel_aet_unregister_enumeration() seems to allow. I was hoping that there can be some guarantee
that if PMT is available during pre_mount() it will continue to be available at least until
unmount() completes.

Reinette



^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10 16:46               ` Reinette Chatre
@ 2026-06-10 17:24                 ` Luck, Tony
  2026-06-10 17:58                   ` Reinette Chatre
  0 siblings, 1 reply; 44+ messages in thread
From: Luck, Tony @ 2026-06-10 17:24 UTC (permalink / raw)
  To: Chatre, Reinette
  Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
	x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

> I see many "if (pmt_module)" checks ... intel_aet_pre_mount() even calls it twice. Are they all necessary?
> It creates the impression that the PMT module can be yanked from AET at any time, something which
> intel_aet_unregister_enumeration() seems to allow. I was hoping that there can be some guarantee
> that if PMT is available during pre_mount() it will continue to be available at least until
> unmount() completes.

pmt_module is NULL if CONFIG_INTEL_PMT_TELEMETRY=y ... i.e. built-in to the kernel.

In that case it obviously can't go away, and doesn't need module_get()/module_put().
There's no special case for this. try_module_get() takes a fault on NULL dereference.

When CONFIG_INTEL_PMT_TELEMETRY=m then the get/put should stop it going away
(I tried rmmod while resctrl mounted and it fails to remove as expected).

-Tony

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime
  2026-06-10 17:24                 ` Luck, Tony
@ 2026-06-10 17:58                   ` Reinette Chatre
  0 siblings, 0 replies; 44+ messages in thread
From: Reinette Chatre @ 2026-06-10 17:58 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Wieczor-Retman, Maciej, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen, Yu C, Box, David E,
	x86@kernel.org, Christoph Hellwig, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev

Hi Tony,

On 6/10/26 10:24 AM, Luck, Tony wrote:
>> I see many "if (pmt_module)" checks ... intel_aet_pre_mount() even calls it twice. Are they all necessary?
>> It creates the impression that the PMT module can be yanked from AET at any time, something which
>> intel_aet_unregister_enumeration() seems to allow. I was hoping that there can be some guarantee
>> that if PMT is available during pre_mount() it will continue to be available at least until
>> unmount() completes.
> 
> pmt_module is NULL if CONFIG_INTEL_PMT_TELEMETRY=y ... i.e. built-in to the kernel.
> 
> In that case it obviously can't go away, and doesn't need module_get()/module_put().
> There's no special case for this. try_module_get() takes a fault on NULL dereference.
> 
> When CONFIG_INTEL_PMT_TELEMETRY=m then the get/put should stop it going away
> (I tried rmmod while resctrl mounted and it fails to remove as expected).

Have you tried "unbind" via sysfs?

Even so, my comment was in response to the code you shared. Let me paste it
back and highlight what I meant with the "many pmt_module checks":

> /*
>  * Track whether pmt_telemetry enumeration succeeded during mount for use
>  * during unmount.
>  */
> static bool pmt_in_use;
> 
> bool intel_aet_pre_mount(void)
> {
> 	bool ret;
> 
> 	guard(mutex)(&aet_register_lock);
> 	if (!get_feature || !put_feature)
> 		return false;
> 
> 	if (pmt_module) {

Here is an "if (pmt_module)" check ... can it ever be false? If so
then the rest of this function becomes very confusing (more below ...)

> 		if (!try_module_get(pmt_module))
> 			return false;
> 	}
> 
> 	ret = aet_get_events();

aet_get_events() can thus seemingly be called when pmt_module is unset?

> 
> 	if (!ret) {
> 		if (pmt_module)

Can pmt_module be unset here?

> 			module_put(pmt_module);
> 	} else {
> 		pmt_in_use = true;

So pmt_in_use could be true if pmt_module is unset? Confusing, no?

> 	}
> 
> 	return ret;
> }
> 
> void intel_aet_unmount(void)
> {
> 	struct event_group **peg;
> 
> 	guard(mutex)(&aet_register_lock);
> 	if (!pmt_in_use)
> 		return;
> 
> 	for_each_event_group(peg) {
> 		if ((*peg)->pfg) {
> 			struct event_group *e = *peg;
> 
> 			for (int j = 0; j < e->num_events; j++)
> 				resctrl_disable_mon_event(e->evts[j].id);
> 			put_feature((*peg)->pfg);
> 			(*peg)->pfg = NULL;
> 		}
> 	}
> 	if (pmt_module)

So this implies that pmt_module could be unset here ... if that is the
case then that means that the PMT module disappeared while resctrl was
mounted which is exactly what this work aims to prevent, no?

> 		module_put(pmt_module);
> 	pmt_in_use = false;

... if pmt_module was unset then how could pmt_in_use ever be true?

> }

Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage
  2026-06-09 23:02       ` Reinette Chatre
@ 2026-06-10 20:01         ` Luck, Tony
  0 siblings, 0 replies; 44+ messages in thread
From: Luck, Tony @ 2026-06-10 20:01 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, David E Box, x86,
	Christoph Hellwig, linux-kernel, patches

On Tue, Jun 09, 2026 at 04:02:11PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 6/9/26 9:51 AM, Luck, Tony wrote:
> > On Mon, Jun 08, 2026 at 04:16:58PM -0700, Reinette Chatre wrote:
> >> Hi Tony,
> >>
> >> On 6/1/26 12:56 PM, Tony Luck wrote:
> >>> Drop the force_off assignment from all_regions_have_sufficient_rmid().
> >>
> >> Apart from this being obvious from the patch, this is not how an x86 changelog
> >> should be structured. Please see "Changelog" in
> >> Documentation/process/maintainer-tip.rst
> > 
> > Revised version:
> > ---
> > Subject: x86/resctrl: Fix usage of event_group::force_off in AET
> > 
> > The kernel command line option "rdt=" is used to force enable or disable
> > individual resctrl features.
> > 
> > There are two problems with usage in the AET (Application Energy
> > Telemetry) feature:
> 
> Above seems incomplete. Could it be "two problems with usage of the rdt= kernel
> command line ..." or "two problems with the "rdt=" usage ..." or ...? 
> 
> > 1) If the user specifies both enabling and disabling of the same feature
> >    (e.g. "rdt=energy,!energy") the request to enable should override the
> >    request to disable (for consistency with other features).
> > 2) event_group::force_off is set true in all_regions_have_sufficient_rmid()
> >    which will cause problems when AET features are enumerated on each mount
> >    of the resctrl file system
> 
> Two comments:
> - In general we should aim to make the changelog as specific as possible to
>   prevent reader needing to do any deciphering. This makes the review easier
>   and faster. So, please avoid general things like "will cause problems" that
>   requires reader to figure out the problems (plural!) on their own. Instead
>   just be specific about what the problems are.
> - Is (2) still a problem when (1) is fixed? An underlying question is: if an
>   event group is successfully enumerated during resctrl mount, will it always
>   enumerate with identical properties on every subsequent mount?
> 
> > 
> > Fix the first issue by checking event_group::force_on in enable_events().
> 
> "the first issue" seems vague to me while the rest describes the code that can be
> seen from the patch. How about something like below to help describe what the code
> change accomplishes:
> 	Enumerate the event group if user space overrides any disable of the event group.
> 
> > 
> > Fix the other issue by dropping the assignment to event_group::force_off.
> 
> "the other issue" is very vague. Here it will also help to be specific. Although
> per earlier comment the need for this fix is no longer clear to me?
> 
> > 
> > ---
> > 
> >>
> >> Please note that, after the preparatory fixes that are expected to land separately,
> >> this will be the first patch of this series ... having this first patch just
> >> kick off with "what changes" without any context is a difficult way for a
> >> reviewer to start considering this piece of work.
> >>
> >>> This preserves current single-enumeration behaviour while preparing for
> >>> the upcoming per-mount enumeration, where latching force_off would
> >>> incorrectly suppress re-enumeration on subsequent mounts - even when the
> >>> user explicitly requested the feature via "rdt={feature}".
> >>
> >> I believe that user space should still expect that rdt= options behave
> >> consistently. So if user space for some reason provides: rdt=!mbm,mbm,!energy,energy
> >> on a system that supports both then it should not be the case that one is enabled and the
> >> other not.
> >>
> >> I this think that, for example, enable_events() should start with:
> >>
> >> 	if (e->force_off && !e->force_on)
> >> 		return false;
> > 
> > OK.
> > 
> >>  
> >>> Signed-off-by: Tony Luck <tony.luck@intel.com>
> >>> ---
> >>>  arch/x86/kernel/cpu/resctrl/intel_aet.c | 8 +++-----
> >>>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kernel/cpu/resctrl/intel_aet.c b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> >>> index 89b8b619d5d5..e2af700bca04 100644
> >>> --- a/arch/x86/kernel/cpu/resctrl/intel_aet.c
> >>> +++ b/arch/x86/kernel/cpu/resctrl/intel_aet.c
> >>> @@ -60,8 +60,8 @@ struct pmt_event {
> >>>   *			data for all telemetry regions of type @pfname.
> >>>   *			Valid if the system supports the event group,
> >>>   *			NULL otherwise.
> >>> - * @force_off:		True when "rdt" command line or architecture code disables
> >>> - *			this event group due to insufficient RMIDs.
> >>> + * @force_off:		True when "rdt" command line disables this event group
> >>> + *			to avoid system limitations due to insufficient RMIDs.
> >>
> >> >From what I understand it is only architecture that can disable an event group
> >> "to avoid system limitations due to insufficient RMIDs" and this capability is removed
> >> in this patch. Above implies that this capability now needs to be implemented
> >> by userspace, which I do not think is accurate?
> > 
> > Should I just drop the second line? The user may have various other reasons
> > why they want to disable this event group.
> 
> Now that enable_events() starts with a check of event_group::force_on it seems that
> the existing behavior of event_group::force_off to also reflect architecture 
> disable can be maintained?

OK. I'll drop this kerneldoc change and keep the setting of force_off = true
when there are insufficient RMIDs.

>  
> > Combined with the code change you suggest above, this would describe the
> > use of these two fields. "force_off" disables, "force_on" enables (and
> > overrides any "force_off").
> 
> I believe the "force_on" description below already accurately describes how it
> overrides an earlier disable.
> 
> > 
> >>
> >>>   * @force_on:		True when "rdt" command line overrides disable of this
> >>>   *			event group.
> >>>   * @guid:		Unique number per XML description file.
> >>> @@ -214,10 +214,8 @@ static bool all_regions_have_sufficient_rmid(struct event_group *e, struct pmt_f
> >>>  		if (!p->regions[i].addr)
> >>>  			continue;
> >>>  		tr = &p->regions[i];
> >>> -		if (tr->num_rmids < e->num_rmid) {
> >>> -			e->force_off = true;
> >>> +		if (tr->num_rmids < e->num_rmid)
> >>>  			return false;
> >>> -		}
> >>>  	}
> >>>  
> >>>  	return true;
> 
> Reinette

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2026-06-10 20:01 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01 19:56 [PATCH v7 00/14] Allow AET to use PMT as loadable module Tony Luck
2026-06-01 19:56 ` [PATCH v7 01/14] fs/resctrl: Move functions to avoid forward references in subsequent fixes Tony Luck
2026-06-01 19:56 ` [PATCH v7 02/14] fs/resctrl: Free mon_data structures on rdt_get_tree() failure Tony Luck
2026-06-01 19:56 ` [PATCH v7 03/14] fs/resctrl: Fix use-after-free during unmount Tony Luck
2026-06-01 19:56 ` [PATCH v7 04/14] fs/resctrl: Fix deadlock for errors during mount Tony Luck
2026-06-01 19:56 ` [PATCH v7 05/14] x86/resctrl: Stop setting event_group::force_off on RMID shortage Tony Luck
2026-06-08 23:16   ` Reinette Chatre
2026-06-09 16:51     ` Luck, Tony
2026-06-09 23:02       ` Reinette Chatre
2026-06-10 20:01         ` Luck, Tony
2026-06-01 19:56 ` [PATCH v7 06/14] fs/resctrl: Add interface to disable a monitor event Tony Luck
2026-06-08 23:18   ` Reinette Chatre
2026-06-09 17:21     ` Luck, Tony
2026-06-09 23:02       ` Reinette Chatre
2026-06-01 19:56 ` [PATCH v7 07/14] x86/resctrl: Maintain a count of enabled monitor features Tony Luck
2026-06-08 23:18   ` Reinette Chatre
2026-06-09 18:46     ` Luck, Tony
2026-06-09 23:03       ` Reinette Chatre
2026-06-01 19:56 ` [PATCH v7 08/14] fs,x86,mpam/resctrl: Handle change in number of RMIDs on each mount Tony Luck
2026-06-08 23:21   ` Reinette Chatre
2026-06-09 21:58     ` Luck, Tony
2026-06-09 23:35       ` Reinette Chatre
2026-06-01 19:56 ` [PATCH v7 09/14] x86/resctrl: Add PMT registration API for AET enumeration callbacks Tony Luck
2026-06-08 23:21   ` Reinette Chatre
2026-06-01 19:56 ` [PATCH v7 10/14] platform/x86/intel/pmt: Register enumeration functions with resctrl Tony Luck
2026-06-08 23:22   ` Reinette Chatre
2026-06-09 22:11     ` Luck, Tony
2026-06-01 19:56 ` [PATCH v7 11/14] mpam,x86/resctrl: Resolve INTEL_PMT_TELEMETRY symbols at runtime Tony Luck
2026-06-08 23:25   ` Reinette Chatre
2026-06-10  0:08     ` Luck, Tony
2026-06-10 15:27       ` Reinette Chatre
2026-06-10 15:49         ` Luck, Tony
2026-06-10 16:21           ` Reinette Chatre
2026-06-10 16:34             ` Luck, Tony
2026-06-10 16:46               ` Reinette Chatre
2026-06-10 17:24                 ` Luck, Tony
2026-06-10 17:58                   ` Reinette Chatre
2026-06-01 19:56 ` [PATCH v7 12/14] fs/resctrl: Call architecture hooks for every mount/unmount Tony Luck
2026-06-08 23:26   ` Reinette Chatre
2026-06-10 16:16     ` Luck, Tony
2026-06-01 19:56 ` [PATCH v7 13/14] x86/resctrl: Simplify Kconfig options for resctrl Tony Luck
2026-06-01 19:56 ` [PATCH v7 14/14] Documentation/filesystems/resctrl: Add footnote for telemetry fstab mount caveat Tony Luck
2026-06-08 23:26   ` Reinette Chatre
2026-06-10 16:19     ` Luck, Tony

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.