From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
"Drew Fustini" <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Mon, 4 May 2026 15:50:54 -0700 [thread overview]
Message-ID: <afkizhrSwJkiLD3j@agluck-desk3> (raw)
In-Reply-To: <2236fae5-7e66-43fb-ba05-76fd4434e2c9@intel.com>
On Mon, May 04, 2026 at 08:11:48AM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 5/1/26 2:36 PM, Tony Luck wrote:
> > Sashiko noticed[1] a use-after-free in the resctrl worker thread code.
> >
> > resctrl_offline_mon_domain() acquires rdtgroup_mutex and calls
> > cancel_delayed_work() (non-synchronous) on the per-domain mbm_over and
> > cqm_limbo delayed_work items, then calls domain_destroy_l3_mon_state()
> > which frees d->rmid_busy_llc and d->mbm_states[]. After it returns, the
> > caller (e.g. domain_remove_cpu_mon() in arch/x86 or the mpam equivalent)
> > deletes the domain from its list and frees the domain itself.
> >
> > cancel_delayed_work() does not wait for a handler that is already
> > running. mbm_handle_overflow() and cqm_handle_limbo() each acquire
> > rdtgroup_mutex before touching the domain, so a handler that started
> > just before resctrl_offline_mon_domain() runs will block on the mutex.
> > When resctrl_offline_mon_domain() drops the mutex, the handler wakes
> > up with a stale 'd' obtained via container_of() and dereferences memory
> > that has just been freed.
> >
> > Drain the handlers with cancel_delayed_work_sync() so no handler can be
> > running or pending against the domain when its state is freed:
> >
> > - Add an 'offlining' flag to struct rdt_l3_mon_domain. Under
> > rdtgroup_mutex, resctrl_offline_mon_domain() sets it before
> > dropping the mutex; the handlers test it after acquiring the
> > mutex and exit without rescheduling. This guarantees that
> > cancel_delayed_work_sync() does not race with the handler
> > re-arming itself.
> >
> > - Drop cpus_read_lock() from mbm_handle_overflow() and
> > cqm_handle_limbo(). resctrl_offline_mon_domain() can be invoked
> > from a CPU hotplug callback that holds the hotplug write lock;
> > a handler blocked on cpus_read_lock() in that window would
> > deadlock cancel_delayed_work_sync(). The data the handlers
> > examine is protected by rdtgroup_mutex, and
> > schedule_delayed_work_on() copes with a target CPU that is going
> > offline by migrating the work, so the cpus_read_lock() was not
> > required for correctness.
> >
> > - Restructure resctrl_offline_mon_domain() to: set ->offlining and
> > remove the mondata directories under rdtgroup_mutex; drop the
> > mutex; cancel_delayed_work_sync() both handlers; reacquire the
> > mutex to do the final force __check_limbo() and free the
> > per-domain monitor state. The cancel must run with the mutex
> > released because the handlers acquire it. Cancel both handlers
> > unconditionally on the L3 path (subject to the feature being
> > enabled) rather than gating cqm_limbo on has_busy_rmid(): a
> > handler may already be executing __check_limbo() with no busy
> > RMIDs left, and that invocation must be drained before its 'd'
> > is freed.
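
The flag/drain ordering described in those three steps can be sketched in
userspace (all names are illustrative: pthreads stand in for the kernel
workqueue, the mutex for rdtgroup_mutex, and pthread_join() for
cancel_delayed_work_sync()):

```c
/* Userspace sketch of the ->offlining flag + drain ordering. */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct domain {
	bool offlining;
	int *state;	/* stands in for rmid_busy_llc / mbm_states[] */
};

/* The "handler": touches domain state under the mutex, but only after
 * checking that the domain is not being torn down. */
static void *handler(void *arg)
{
	struct domain *d = arg;

	pthread_mutex_lock(&lock);
	if (d->offlining) {
		/* Exit without re-arming, so the drain cannot race
		 * with the work rescheduling itself. */
		pthread_mutex_unlock(&lock);
		return NULL;
	}
	d->state[0]++;	/* safe: state is never freed while the flag is unset */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void offline_domain(struct domain *d, pthread_t t)
{
	/* 1. Set ->offlining under the mutex, then drop the mutex. */
	pthread_mutex_lock(&lock);
	d->offlining = true;
	pthread_mutex_unlock(&lock);

	/* 2. Drain with the mutex released: the handler may be blocked
	 * on the mutex, so holding it here would deadlock. */
	pthread_join(t, NULL);

	/* 3. Only now is it safe to free the per-domain state. */
	free(d->state);
	d->state = NULL;
}
```

The essential point the sketch captures is step 2: the drain must run
with the mutex dropped, because the handler takes the same mutex before
it checks the flag.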
> >
> > Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
> > Assisted-by: Copilot:claude-opus-4.7
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
> > ---
> > include/linux/resctrl.h | 1 +
> > fs/resctrl/monitor.c | 18 ++++++++++--------
> > fs/resctrl/rdtgroup.c | 38 ++++++++++++++++++++++++++++++++++----
> > 3 files changed, 45 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> > index 006e57fd7ca5..73f2638b96ad 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -203,6 +203,7 @@ struct rdt_l3_mon_domain {
> > int mbm_work_cpu;
> > int cqm_work_cpu;
> > struct mbm_cntr_cfg *cntr_cfg;
> > + bool offlining;
> > };
> >
> > /**
> > diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> > index 9fd901c78dc6..e68eec83306e 100644
> > --- a/fs/resctrl/monitor.c
> > +++ b/fs/resctrl/monitor.c
> > @@ -794,11 +794,14 @@ void cqm_handle_limbo(struct work_struct *work)
> > unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
> > struct rdt_l3_mon_domain *d;
> >
> > - cpus_read_lock();
> > mutex_lock(&rdtgroup_mutex);
> >
> > d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
>
> Since work always runs on a CPU belonging to the domain, could it be simpler to use
> get_mon_domain_from_cpu() using the CPU running the work to obtain the domain here
> instead of the work contained in the domain struct?
Is this true? When a CPU is taken offline, Linux picks another CPU to
run any unexpired queued work, and there is no guarantee that the new
CPU is in the same domain. I think a robust solution is going to need a
check that the delayed work handlers are running on the right domain.
The existing code doesn't appear to handle this well.
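
Such a check might look something like the sketch below. cpu_in_domain()
and the flat 64-bit mask are illustrative only; in the kernel this would
be closer to cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask):

```c
/* Sketch of an "is this work running on a CPU that still belongs to
 * my domain?" check. Illustrative names; not the kernel API. */
#include <stdbool.h>
#include <stdint.h>

struct mon_domain {
	uint64_t cpu_mask;	/* bit n set => CPU n is in this domain */
};

static bool cpu_in_domain(const struct mon_domain *d, int cpu)
{
	return (d->cpu_mask >> cpu) & 1;
}
```

A handler that finds itself migrated to a foreign CPU after hotplug
would bail out (or re-queue on a CPU still in the domain) instead of
blindly trusting the container_of() result.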
>
> This seems to more closely match the pattern used in rdtgroup_mondata_show() that
> stores the domain ID in its state instead of a pointer to the domain and then uses
> resctrl_find_domain() to find domain.
>
> >
> > + /* If this domain is being deleted this work no longer needs to run. */
> > + if (d->offlining)
> > + goto out_unlock;
> > +
Claude seemed quite confident about removing cpus_read_lock() in these
functions, but Sashiko is confident that doing so has opened up several
new race conditions:
https://sashiko.dev/#/patchset/20260501213611.25600-1-tony.luck%40intel.com
Maybe we need to add a reference count to the rdt_l3_mon_domain
structure and delay freeing it until the last user is gone?
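
A minimal userspace sketch of that refcounting idea, modeled loosely on
the kernel's kref pattern (all names and the layout are illustrative):

```c
/* The domain is freed only when the last user drops its reference. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct mon_domain {
	atomic_int refcount;
	int *state;
};

static struct mon_domain *domain_alloc(void)
{
	struct mon_domain *d = calloc(1, sizeof(*d));

	d->state = calloc(1, sizeof(int));
	atomic_init(&d->refcount, 1);	/* the domain list holds one ref */
	return d;
}

static struct mon_domain *domain_get(struct mon_domain *d)
{
	atomic_fetch_add(&d->refcount, 1);
	return d;
}

/* Returns true when this put dropped the last reference and freed d. */
static bool domain_put(struct mon_domain *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1) {
		free(d->state);
		free(d);
		return true;
	}
	return false;
}
```

Each queued delayed-work item would hold a reference, so an in-flight
handler keeps the structure alive until its final domain_put(), even
after resctrl_offline_mon_domain() has dropped the list's reference.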
>
> Reinette
-Tony
Thread overview: 4+ messages
2026-05-01 21:36 [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain() Tony Luck
2026-05-04 15:11 ` Reinette Chatre
2026-05-04 22:50 ` Luck, Tony [this message]
2026-05-05 4:39 ` Reinette Chatre