From: Tony Luck <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>, x86@kernel.org
Cc: Fenghua Yu <fenghuay@nvidia.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
Drew Fustini <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
linux-kernel@vger.kernel.org, patches@lists.linux.dev,
Tony Luck <tony.luck@intel.com>
Subject: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Fri, 1 May 2026 14:36:11 -0700
Message-ID: <20260501213611.25600-1-tony.luck@intel.com>

Sashiko noticed[1] a use-after-free in the resctrl worker thread code.

resctrl_offline_mon_domain() acquires rdtgroup_mutex and calls
cancel_delayed_work() (non-synchronous) on the per-domain mbm_over and
cqm_limbo delayed_work items, then calls domain_destroy_l3_mon_state()
which frees d->rmid_busy_llc and d->mbm_states[]. After it returns, the
caller (e.g. domain_remove_cpu_mon() in arch/x86 or the mpam equivalent)
deletes the domain from its list and frees the domain itself.

cancel_delayed_work() does not wait for a handler that is already
running. mbm_handle_overflow() and cqm_handle_limbo() each acquire
rdtgroup_mutex before touching the domain, so a handler that started
just before resctrl_offline_mon_domain() runs will block on the mutex.
When resctrl_offline_mon_domain() drops the mutex, the handler wakes
up with a stale 'd' obtained via container_of() and dereferences memory
that has just been freed.

Drain the handlers with cancel_delayed_work_sync() so no handler can be
running or pending against the domain when its state is freed:

- Add an 'offlining' flag to struct rdt_l3_mon_domain. Under
rdtgroup_mutex, resctrl_offline_mon_domain() sets it before
dropping the mutex; the handlers test it after acquiring the
mutex and exit without rescheduling. This guarantees that
cancel_delayed_work_sync() does not race with the handler
re-arming itself.

- Drop cpus_read_lock() from mbm_handle_overflow() and
cqm_handle_limbo(). resctrl_offline_mon_domain() can be invoked
from a CPU hotplug callback that holds the hotplug write lock;
a handler blocked on cpus_read_lock() in that window would
deadlock cancel_delayed_work_sync(). The data the handlers
examine is protected by rdtgroup_mutex, and
schedule_delayed_work_on() copes with a target CPU that is going
offline by migrating the work, so the cpus_read_lock() was not
required for correctness.

- Restructure resctrl_offline_mon_domain() to: set ->offlining and
remove the mondata directories under rdtgroup_mutex; drop the
mutex; cancel_delayed_work_sync() both handlers; reacquire the
mutex to do the final force __check_limbo() and free the
per-domain monitor state. The cancel must run with the mutex
released because the handlers acquire it. Cancel both handlers
unconditionally on the L3 path (subject to the feature being
enabled) rather than gating cqm_limbo on has_busy_rmid(): a
handler may already be executing __check_limbo() with no busy
RMIDs left, and that invocation must be drained before its 'd'
is freed.

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Assisted-by: Copilot:claude-opus-4.7
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
---
include/linux/resctrl.h | 1 +
fs/resctrl/monitor.c | 18 ++++++++++--------
fs/resctrl/rdtgroup.c | 38 ++++++++++++++++++++++++++++++++++----
3 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..73f2638b96ad 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -203,6 +203,7 @@ struct rdt_l3_mon_domain {
int mbm_work_cpu;
int cqm_work_cpu;
struct mbm_cntr_cfg *cntr_cfg;
+ bool offlining;
};
/**
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..e68eec83306e 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -794,11 +794,14 @@ void cqm_handle_limbo(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
struct rdt_l3_mon_domain *d;
- cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
+ /* If this domain is being deleted this work no longer needs to run. */
+ if (d->offlining)
+ goto out_unlock;
+
__check_limbo(d, false);
if (has_busy_rmid(d)) {
@@ -808,8 +811,8 @@ void cqm_handle_limbo(struct work_struct *work)
delay);
}
+out_unlock:
mutex_unlock(&rdtgroup_mutex);
- cpus_read_unlock();
}
/**
@@ -841,18 +844,18 @@ void mbm_handle_overflow(struct work_struct *work)
struct list_head *head;
struct rdt_resource *r;
- cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
+ d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
+
/*
- * If the filesystem has been unmounted this work no longer needs to
- * run.
+ * If this domain is being deleted, or the filesystem has been
+ * unmounted this work no longer needs to run.
*/
- if (!resctrl_mounted || !resctrl_arch_mon_capable())
+ if (d->offlining || !resctrl_mounted || !resctrl_arch_mon_capable())
goto out_unlock;
r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
- d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp);
@@ -875,7 +878,6 @@ void mbm_handle_overflow(struct work_struct *work)
out_unlock:
mutex_unlock(&rdtgroup_mutex);
- cpus_read_unlock();
}
/**
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 8544020ef420..c883149fa373 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4323,7 +4323,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
{
- struct rdt_l3_mon_domain *d;
+ struct rdt_l3_mon_domain *d = NULL;
mutex_lock(&rdtgroup_mutex);
@@ -4341,8 +4341,39 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
goto out_unlock;
d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
+
+ /*
+ * Tell mbm_handle_overflow() and cqm_handle_limbo() that this
+ * domain is going away.
+ */
+ d->offlining = true;
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
+
+ if (!d)
+ return;
+
+ /*
+ * Drain any pending or in-flight overflow / limbo handlers before
+ * freeing per-domain monitor state (and before the caller frees the
+ * domain itself). cancel_delayed_work_sync() must be called with
+ * rdtgroup_mutex released because the handlers acquire it; the
+ * handlers no longer take cpus_read_lock(), so this is safe to call
+ * from a CPU hotplug callback that holds the hotplug write lock.
+ *
+ * Without the synchronous cancel, a handler that was already running
+ * and blocked on rdtgroup_mutex when this function was entered could
+ * wake after the mutex is dropped and dereference d->rmid_busy_llc,
+ * d->mbm_states[] or the domain itself after they have been freed.
+ */
if (resctrl_is_mbm_enabled())
- cancel_delayed_work(&d->mbm_over);
+ cancel_delayed_work_sync(&d->mbm_over);
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
+ cancel_delayed_work_sync(&d->cqm_limbo);
+
+ mutex_lock(&rdtgroup_mutex);
+
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
/*
* When a package is going down, forcefully
@@ -4353,11 +4384,10 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
* package never comes back.
*/
__check_limbo(d, true);
- cancel_delayed_work(&d->cqm_limbo);
}
domain_destroy_l3_mon_state(d);
-out_unlock:
+
mutex_unlock(&rdtgroup_mutex);
}
--
2.54.0
Thread overview: 4+ messages
2026-05-01 21:36 Tony Luck
2026-05-04 15:11 ` [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain() Reinette Chatre
2026-05-04 22:50 ` Luck, Tony
2026-05-05 4:39 ` Reinette Chatre