From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
"Drew Fustini" <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain()
Date: Mon, 4 May 2026 15:50:54 -0700 [thread overview]
Message-ID: <afkizhrSwJkiLD3j@agluck-desk3> (raw)
In-Reply-To: <2236fae5-7e66-43fb-ba05-76fd4434e2c9@intel.com>
On Mon, May 04, 2026 at 08:11:48AM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 5/1/26 2:36 PM, Tony Luck wrote:
> > Sashiko noticed[1] a use-after-free in the resctrl worker thread code.
> >
> > resctrl_offline_mon_domain() acquires rdtgroup_mutex and calls
> > cancel_delayed_work() (non-synchronous) on the per-domain mbm_over and
> > cqm_limbo delayed_work items, then calls domain_destroy_l3_mon_state()
> > which frees d->rmid_busy_llc and d->mbm_states[]. After it returns, the
> > caller (e.g. domain_remove_cpu_mon() in arch/x86 or the mpam equivalent)
> > deletes the domain from its list and frees the domain itself.
> >
> > cancel_delayed_work() does not wait for a handler that is already
> > running. mbm_handle_overflow() and cqm_handle_limbo() each acquire
> > rdtgroup_mutex before touching the domain, so a handler that started
> > just before resctrl_offline_mon_domain() runs will block on the mutex.
> > When resctrl_offline_mon_domain() drops the mutex, the handler wakes
> > up with a stale 'd' obtained via container_of() and dereferences memory
> > that has just been freed.
> >
> > Drain the handlers with cancel_delayed_work_sync() so no handler can be
> > running or pending against the domain when its state is freed:
> >
> > - Add an 'offlining' flag to struct rdt_l3_mon_domain. Under
> > rdtgroup_mutex, resctrl_offline_mon_domain() sets it before
> > dropping the mutex; the handlers test it after acquiring the
> > mutex and exit without rescheduling. This guarantees that
> > cancel_delayed_work_sync() does not race with the handler
> > re-arming itself.
> >
> > - Drop cpus_read_lock() from mbm_handle_overflow() and
> > cqm_handle_limbo(). resctrl_offline_mon_domain() can be invoked
> > from a CPU hotplug callback that holds the hotplug write lock;
> > a handler blocked on cpus_read_lock() in that window would
> > deadlock cancel_delayed_work_sync(). The data the handlers
> > examine is protected by rdtgroup_mutex, and
> > schedule_delayed_work_on() copes with a target CPU that is going
> > offline by migrating the work, so the cpus_read_lock() was not
> > required for correctness.
> >
> > - Restructure resctrl_offline_mon_domain() to: set ->offlining and
> > remove the mondata directories under rdtgroup_mutex; drop the
> > mutex; cancel_delayed_work_sync() both handlers; reacquire the
> > mutex to do the final force __check_limbo() and free the
> > per-domain monitor state. The cancel must run with the mutex
> > released because the handlers acquire it. Cancel both handlers
> > unconditionally on the L3 path (subject to the feature being
> > enabled) rather than gating cqm_limbo on has_busy_rmid(): a
> > handler may already be executing __check_limbo() with no busy
> > RMIDs left, and that invocation must be drained before its 'd'
> > is freed.
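
The flag/drain ordering described in those three steps can be sketched in
userspace (all names are illustrative: pthreads stand in for the kernel
workqueue, the mutex for rdtgroup_mutex, and pthread_join() for
cancel_delayed_work_sync()):

```c
/* Userspace sketch of the ->offlining flag + drain ordering. */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

struct domain {
	bool offlining;
	int *state;	/* stands in for rmid_busy_llc / mbm_states[] */
};

/* The "handler": touches domain state under the mutex, but only after
 * checking that the domain is not being torn down. */
static void *handler(void *arg)
{
	struct domain *d = arg;

	pthread_mutex_lock(&lock);
	if (d->offlining) {
		/* Exit without re-arming, so the drain cannot race
		 * with the work rescheduling itself. */
		pthread_mutex_unlock(&lock);
		return NULL;
	}
	d->state[0]++;	/* safe: state is never freed while the flag is unset */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static void offline_domain(struct domain *d, pthread_t t)
{
	/* 1. Set ->offlining under the mutex, then drop the mutex. */
	pthread_mutex_lock(&lock);
	d->offlining = true;
	pthread_mutex_unlock(&lock);

	/* 2. Drain with the mutex released: the handler may be blocked
	 * on the mutex, so holding it here would deadlock. */
	pthread_join(t, NULL);

	/* 3. Only now is it safe to free the per-domain state. */
	free(d->state);
	d->state = NULL;
}
```

The essential point the sketch captures is step 2: the drain must run
with the mutex dropped, because the handler takes the same mutex before
it checks the flag.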
> >
> > Fixes: 24247aeeabe9 ("x86/intel_rdt/cqm: Improve limbo list processing")
> > Assisted-by: Copilot:claude-opus-4.7
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Link: https://sashiko.dev/#/patchset/20260429184858.36423-1-tony.luck%40intel.com [1]
> > ---
> > include/linux/resctrl.h | 1 +
> > fs/resctrl/monitor.c | 18 ++++++++++--------
> > fs/resctrl/rdtgroup.c | 38 ++++++++++++++++++++++++++++++++++----
> > 3 files changed, 45 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> > index 006e57fd7ca5..73f2638b96ad 100644
> > --- a/include/linux/resctrl.h
> > +++ b/include/linux/resctrl.h
> > @@ -203,6 +203,7 @@ struct rdt_l3_mon_domain {
> > int mbm_work_cpu;
> > int cqm_work_cpu;
> > struct mbm_cntr_cfg *cntr_cfg;
> > + bool offlining;
> > };
> >
> > /**
> > diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> > index 9fd901c78dc6..e68eec83306e 100644
> > --- a/fs/resctrl/monitor.c
> > +++ b/fs/resctrl/monitor.c
> > @@ -794,11 +794,14 @@ void cqm_handle_limbo(struct work_struct *work)
> > unsigned long delay = msecs_to_jiffies(CQM_LIMBOCHECK_INTERVAL);
> > struct rdt_l3_mon_domain *d;
> >
> > - cpus_read_lock();
> > mutex_lock(&rdtgroup_mutex);
> >
> > d = container_of(work, struct rdt_l3_mon_domain, cqm_limbo.work);
>
> Since work always runs on a CPU belonging to the domain, could it be simpler to use
> get_mon_domain_from_cpu() using the CPU running the work to obtain the domain here
> instead of the work contained in the domain struct?
Is this true? When a CPU is taken offline, Linux picks another CPU to
run any unexpired queued work, and there is no guarantee that the new
CPU is in the same domain. I think a robust solution is going to need a
check that the delayed work handlers are running on the right domain.
The existing code doesn't appear to handle this well.
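
Such a check might look something like the sketch below. cpu_in_domain()
and the flat 64-bit mask are illustrative only; in the kernel this would
be closer to cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask):

```c
/* Sketch of an "is this work running on a CPU that still belongs to
 * my domain?" check. Illustrative names; not the kernel API. */
#include <stdbool.h>
#include <stdint.h>

struct mon_domain {
	uint64_t cpu_mask;	/* bit n set => CPU n is in this domain */
};

static bool cpu_in_domain(const struct mon_domain *d, int cpu)
{
	return (d->cpu_mask >> cpu) & 1;
}
```

A handler that finds itself migrated to a foreign CPU after hotplug
would bail out (or re-queue on a CPU still in the domain) instead of
blindly trusting the container_of() result.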
>
> This seems to more closely match the pattern used in rdtgroup_mondata_show() that
> stores the domain ID in its state instead of a pointer to the domain and then uses
> resctrl_find_domain() to find domain.
>
> >
> > + /* If this domain is being deleted this work no longer needs to run. */
> > + if (d->offlining)
> > + goto out_unlock;
> > +
Claude seemed quite confident about removing cpus_read_lock() in these
functions, but Sashiko is confident that doing so has opened up several
new race conditions:
https://sashiko.dev/#/patchset/20260501213611.25600-1-tony.luck%40intel.com
Maybe we need to add a reference count to the rdt_l3_mon_domain
structure and delay freeing it until the last user is gone?
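
A minimal userspace sketch of that refcounting idea, modeled loosely on
the kernel's kref pattern (all names and the layout are illustrative):

```c
/* The domain is freed only when the last user drops its reference. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct mon_domain {
	atomic_int refcount;
	int *state;
};

static struct mon_domain *domain_alloc(void)
{
	struct mon_domain *d = calloc(1, sizeof(*d));

	d->state = calloc(1, sizeof(int));
	atomic_init(&d->refcount, 1);	/* the domain list holds one ref */
	return d;
}

static struct mon_domain *domain_get(struct mon_domain *d)
{
	atomic_fetch_add(&d->refcount, 1);
	return d;
}

/* Returns true when this put dropped the last reference and freed d. */
static bool domain_put(struct mon_domain *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1) {
		free(d->state);
		free(d);
		return true;
	}
	return false;
}
```

Each queued delayed-work item would hold a reference, so an in-flight
handler keeps the structure alive until its final domain_put(), even
after resctrl_offline_mon_domain() has dropped the list's reference.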
>
> Reinette
-Tony
Thread overview: 4+ messages
2026-05-01 21:36 [PATCH] fs/resctrl: Fix use-after-free in resctrl_offline_mon_domain() Tony Luck
2026-05-04 15:11 ` Reinette Chatre
2026-05-04 22:50 ` Luck, Tony [this message]
2026-05-05 4:39 ` Reinette Chatre