From: Tony Luck <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>, x86@kernel.org
Cc: Fenghua Yu <fenghuay@nvidia.com>,
Reinette Chatre <reinette.chatre@intel.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
Drew Fustini <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
linux-kernel@vger.kernel.org, patches@lists.linux.dev,
Tony Luck <tony.luck@intel.com>
Subject: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Fri, 1 May 2026 14:36:11 -0700
Message-ID: <20260501213611.25600-1-tony.luck@intel.com>

Sashiko noticed[1] a use-after-free in the resctrl worker thread code.

resctrl_offline_mon_domain() acquires rdtgroup_mutex and calls
cancel_delayed_work() (non-synchronous) on the per-domain mbm_over and
cqm_limbo delayed_work items, then calls domain_destroy_l3_mon_state()
which frees d->rmid_busy_llc and d->mbm_states[]. After it returns, the
caller (e.g. domain_remove_cpu_mon() in arch/x86 or the mpam equivalent)
deletes the domain from its list and frees the domain itself.

cancel_delayed_work() does not wait for a handler that is already
running. mbm_handle_overflow() and cqm_handle_limbo() each acquire
rdtgroup_mutex before touching the domain, so a handler that started
just before resctrl_offline_mon_domain() runs will block on the mutex.
When resctrl_offline_mon_domain() drops the mutex, the handler wakes
up with a stale 'd' obtained via container_of() and dereferences memory
that has just been freed.

Drain the handlers with cancel_delayed_work_sync() so no handler can be
running or pending against the domain when its state is freed:

- Add an 'offlining' flag to struct rdt_l3_mon_domain. Under
rdtgroup_mutex, resctrl_offline_mon_domain() sets it before
dropping the mutex; the handlers test it after acquiring the
mutex and exit without rescheduling. This guarantees that
cancel_delayed_work_sync() does not race with the handler
re-arming itself.

- Drop cpus_read_lock() from mbm_handle_overflow() and
cqm_handle_limbo(). resctrl_offline_mon_domain() can be invoked
from a CPU hotplug callback that holds the hotplug write lock;
a handler blocked on cpus_read_lock() in that window would
deadlock cancel_delayed_work_sync(). The data the handlers
examine is protected by rdtgroup_mutex, and
schedule_delayed_work_on() copes with a target CPU that is going
offline by migrating the work, so the cpus_read_lock() was not
required for correctness.

- Restructure resctrl_offline_mon_domain() to: set ->offlining and
remove the mondata directories under rdtgroup_mutex; drop the
mutex; cancel_delayed_work_sync() both handlers; reacquire the
mutex to do the final force __check_limbo() and free the
per-domain monitor state. The cancel must run with the mutex
released because the handlers acquire it. Cancel both handlers
unconditionally on the L3 path (subject to the feature being
enabled) rather than gating cqm_limbo on has_busy_rmid(): a
handler may already be executing __check_limbo() with no busy
RMIDs left, and that invocation must be drained before its 'd'
is freed.

Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
Assisted-by: Copilot:claude-opus-4.7
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
---
include/linux/resctrl.h | 1 +
fs/resctrl/monitor.c | 18 ++++++++++--------
fs/resctrl/rdtgroup.c | 38 ++++++++++++++++++++++++++++++++++----
3 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 006e57fd7ca5..73f2638b96ad 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -203,6 +203,7 @@ struct rdt_l3_mon_domain {
int mbm_work_cpu;
int cqm_work_cpu;
struct mbm_cntr_cfg *cntr_cfg;
+ bool offlining;
};
/**
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 9fd901c78dc6..e68eec83306e 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -794,11 +794,14 @@ void cqm_handle_limbo(struct work_struct *work)
unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
struct rdt_l3_mon_domain *d;
- cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
+ /* If this domain is being deleted this work no longer needs to run. */
+ if (d->offlining)
+ goto out_unlock;
+
__check_limbo(d, false);
if (has_busy_rmid(d)) {
@@ -808,8 +811,8 @@ void cqm_handle_limbo(struct work_struct *work)
delay);
}
+out_unlock:
mutex_unlock(&rdtgroup_mutex);
- cpus_read_unlock();
}
/**
@@ -841,18 +844,18 @@ void mbm_handle_overflow(struct work_struct *work)
struct list_head *head;
struct rdt_resource *r;
- cpus_read_lock();
mutex_lock(&rdtgroup_mutex);
+ d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
+
/*
- * If the filesystem has been unmounted this work no longer needs to
- * run.
+ * If this domain is being deleted, or the filesystem has been
+ * unmounted this work no longer needs to run.
*/
- if (!resctrl_mounted || !resctrl_arch_mon_capable())
+ if (d->offlining || !resctrl_mounted || !resctrl_arch_mon_capable())
goto out_unlock;
r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
- d = container_of(work, struct rdt_l3_mon_domain, mbm_over.work);
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp);
@@ -875,7 +878,6 @@ void mbm_handle_overflow(struct work_struct *work)
out_unlock:
mutex_unlock(&rdtgroup_mutex);
- cpus_read_unlock();
}
/**
diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index 8544020ef420..c883149fa373 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -4323,7 +4323,7 @@ void resctrl_offline_ctrl_domain(struct rdt_resource *r, struct rdt_ctrl_domain
void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *hdr)
{
- struct rdt_l3_mon_domain *d;
+ struct rdt_l3_mon_domain *d = NULL;
mutex_lock(&rdtgroup_mutex);
@@ -4341,8 +4341,39 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
goto out_unlock;
d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
+
+ /*
+ * Tell mbm_handle_overflow() and cqm_handle_limbo() that this
+ * domain is going away.
+ */
+ d->offlining = true;
+
+out_unlock:
+ mutex_unlock(&rdtgroup_mutex);
+
+ if (!d)
+ return;
+
+ /*
+ * Drain any pending or in-flight overflow / limbo handlers before
+ * freeing per-domain monitor state (and before the caller frees the
+ * domain itself). cancel_delayed_work_sync() must be called with
+ * rdtgroup_mutex released because the handlers acquire it; the
+ * handlers no longer take cpus_read_lock(), so this is safe to call
+ * from a CPU hotplug callback that holds the hotplug write lock.
+ *
+ * Without the synchronous cancel, a handler that was already running
+ * and blocked on rdtgroup_mutex when this function was entered could
+ * wake after the mutex is dropped and dereference d->rmid_busy_llc,
+ * d->mbm_states[] or the domain itself after they have been freed.
+ */
if (resctrl_is_mbm_enabled())
- cancel_delayed_work(&d->mbm_over);
+ cancel_delayed_work_sync(&d->mbm_over);
+ if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID))
+ cancel_delayed_work_sync(&d->cqm_limbo);
+
+ mutex_lock(&rdtgroup_mutex);
+
if (resctrl_is_mon_event_enabled(QOS_L3_OCCUP_EVENT_ID) && has_busy_rmid(d)) {
/*
* When a package is going down, forcefully
@@ -4353,11 +4384,10 @@ void resctrl_offline_mon_domain(struct rdt_resource *r, struct rdt_domain_hdr *h
* package never comes back.
*/
__check_limbo(d, true);
- cancel_delayed_work(&d->cqm_limbo);
}
domain_destroy_l3_mon_state(d);
-out_unlock:
+
mutex_unlock(&rdtgroup_mutex);
}
--
2.54.0
Thread overview: 4+ messages
2026-05-01 21:36 Tony Luck
2026-05-04 15:11 ` [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain() Reinette Chatre
2026-05-04 22:50 ` Luck, Tony
2026-05-05 4:39 ` Reinette Chatre