Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghuay@nvidia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	"Drew Fustini" <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
	Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
Date: Thu, 14 May 2026 15:23:11 -0700	[thread overview]
Message-ID: <agZLT1hIPrDIrN4C@agluck-desk3> (raw)
In-Reply-To: <28f663b7-de8c-4d10-b750-edd8ab9bceb7@intel.com>

On Thu, May 14, 2026 at 02:45:10PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 5/13/26 3:40 PM, Tony Luck wrote:
> > Sashiko reported[1] this issue:
> > 
> >   During unmount or failure teardown, resctrl_fs_teardown() calls
> >   mon_put_kn_priv() (which frees all mon_data structures) followed
> >   by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
> >   RDT_DELETED flag is never set for rdtgroup_default.
> > 
> >   If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
> >   rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
> >   rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
> >   the private data and destroy the nodes without waiting for the reader.
> > 
> >   When the mutex is released, the reader wakes up, observes that RDT_DELETED is
> >   not set for the default group, and dereferences the already-freed of->kn->priv
> >   pointer.
> > 
> > Set RDT_DELETED for the default group (if there are any tasks waiting).
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
> > ---
> > 
> > Yet another side-quest from Sashiko. RFC for some human eyes before I
> > add to my series and post a new version;
> 
> sashiko also reviewed it and found a few issues that seem legitimate:
> https://sashiko.dev/#/patchset/20260513224044.17167-1-tony.luck%40intel.com
> 
> > 
> > 1) Is this real? It looks like it is to me.
> 
> Looks like it to me also.

Thanks for confirming.

> > 2) Is my fix reasonable?
> > 3) Better way to fix it?
> > 
> >  fs/resctrl/rdtgroup.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> > index eac7e4f8574d..668ebe0b0ec6 100644
> > --- a/fs/resctrl/rdtgroup.c
> > +++ b/fs/resctrl/rdtgroup.c
> > @@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
> >  static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> >  {
> >  	kernfs_put(rdtgrp->kn);
> > -	kfree(rdtgrp);
> > +	if (rdtgrp != &rdtgroup_default)
> > +		kfree(rdtgrp);
> 
> Issue described by sashiko with new kernfs_put() is real here. rdtgroup_remove()
> was not called on default group before this change and it assumes that there is
> an additional reference that it can release here. See comment above the
> kernfs_get() found in mkdir_rdt_prepare, copied here for convenience:
> 
> 	/*                                                                      
> 	 * kernfs_remove() will drop the reference count on "kn" which          
> 	 * will free it. But we still need it to stick around for the           
> 	 * rdtgroup_kn_unlock(kn) call. Take one extra reference here,          
> 	 * which will be dropped by kernfs_put() in rdtgroup_remove().          
> 	 */                                               
> 
> We could aim to balance the references with a kernfs_get() during rdtgroup_setup_root()
> but then we need to take care to ensure rdtgroup_remove() is called on exit for
> the default group which may be confusing since it is not actually removed. How about
> instead just don't drop the reference that we do not have? What do you think of below?

This looks good. I can include a comment on why nothing needs to be done
for the default group.

> 	/* If doing below the function comments need to be updated */
> 	static void rdtgroup_remove(struct rdtgroup *rdtgrp)                            
> 	{                                                                               
> 		if (rdtgrp == &rdtgroup_default)                                        
> 			return;                                                         
> 		kernfs_put(rdtgrp->kn);                                                 
> 		kfree(rdtgrp);                                                          
> 	}                            
> 
> When considering the races with a concurrent mount mentioned by sashiko I wonder
> if resctrl should not also use waiters on default group to gate any new mounts.
> I was tempted to place such check in rdtgroup_setup_root() but it seems to bury
> something gating the mount so perhaps better at beginning of rdt_get_tree()?
> 
> What do you think of something like below?

Also good. Eliminates concerns about a new mount messing with things
pending from the previous mount.

> 	rdt_get_tree()
> 	{
> 		...
> 		if (resctrl_mounted) {
> 			ret = -EBUSY;
> 			goto out;
> 		}
> 	
> 		if (atomic_read(&rdtgroup_default.waitcount) != 0) {
> 			ret = -EBUSY;
> 			goto out;
> 		}
> 		...
> 	}
> 
> Another alternative to consider is to not call mon_put_kn_priv() on unmount but
> instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
> This may be more complicated in the long term since it needs more care to ensure
> needed state is still available a resctrl file reader that was blocked because of
> unmount or failure (via resctrl_exit()).

Pushing the resctrl_exit() is currently saying we don't care about the
leaked allocation (since resctrl_exit() is never called - actually
discarded). Cleaning up on unmount now means one less thing to do if we
ever make resctrl a loadable module.

> >  }
> >  
> >  static void _update_task_closid_rmid(void *task)
> > @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
> >  	mon_put_kn_priv();
> >  	rdt_pseudo_lock_release();
> >  	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> > +	if (atomic_read(&rdtgroup_default.waitcount) != 0)
> > +		rdtgroup_default.flags = RDT_DELETED;
> 
> sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
> help.

Yes - as long as you are OK with the asymmetry between the default group
and regular groups. I think it is OK because there are already many
special cases for the default group.

> >  	closid_exit();
> >  	schemata_list_destroy();
> >  	rdtgroup_destroy_root();
> > @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
> >  
> >  	ctx->kfc.root = rdt_root;
> >  	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
> > +	rdtgroup_default.flags = 0;
> >  
> >  	return 0;
> >  }
> 
> The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
> 	
> ---8<---
> 	In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
> 	error path jumps to the out label:
> 	    out:
> 	        if (rdtgrp)
> 	            rdtgroup_kn_unlock(of->kn);
> 	Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
> 	held.
> ---8<---
> 
> Comparing the claim to actual code the snippet looks like a mismatch since
> rdtgroup_mondata_show() actually looks like:
> 	out:
> 		rdtgroup_kn_unlock(of->kn);

Yes. Looks like a problem in hallucinated code.

> Reinette

-Tony

next prev parent reply	other threads:[~2026-05-14 22:23 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-13 22:40 [RFC PATCH] fs/resctrl: Fix use-after-free during unmount Tony Luck
2026-05-14 21:45 ` Reinette Chatre
2026-05-14 22:23   ` Luck, Tony [this message]
2026-05-14 22:44     ` Reinette Chatre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agZLT1hIPrDIrN4C@agluck-desk3 \
    --to=tony.luck@intel.com \
    --cc=Dave.Martin@arm.com \
    --cc=babu.moger@amd.com \
    --cc=bp@alien8.de \
    --cc=dfustini@baylibre.com \
    --cc=fenghuay@nvidia.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=patches@lists.linux.dev \
    --cc=peternewman@google.com \
    --cc=reinette.chatre@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.