From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghuay@nvidia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	"Drew Fustini" <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
	Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
Date: Thu, 14 May 2026 15:23:11 -0700
Message-ID: <agZLT1hIPrDIrN4C@agluck-desk3>
In-Reply-To: <28f663b7-de8c-4d10-b750-edd8ab9bceb7@intel.com>

On Thu, May 14, 2026 at 02:45:10PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 5/13/26 3:40 PM, Tony Luck wrote:
> > Sashiko reported[1] this issue:
> > 
> >   During unmount or failure teardown, resctrl_fs_teardown() calls
> >   mon_put_kn_priv() (which frees all mon_data structures) followed
> >   by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
> >   RDT_DELETED flag is never set for rdtgroup_default.
> > 
> >   If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
> >   rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
> >   rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
> >   the private data and destroy the nodes without waiting for the reader.
> > 
> >   When the mutex is released, the reader wakes up, observes that RDT_DELETED is
> >   not set for the default group, and dereferences the already-freed of->kn->priv
> >   pointer.
> > 
> > Set RDT_DELETED for the default group (if there are any tasks waiting).
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
> > ---
> > 
> > Yet another side-quest from Sashiko. RFC for some human eyes before I
> > add it to my series and post a new version:
> 
> sashiko also reviewed it and found a few issues that seem legitimate:
> https://sashiko.dev/#/patchset/20260513224044.17167-1-tony.luck%40intel.com
> 
> > 
> > 1) Is this real? It looks like it is to me.
> 
> Looks like it to me also.

Thanks for confirming.
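
To make the ordering concrete, here is a minimal single-threaded model of the interleaving described in the report. It is a userspace sketch, not kernel code. model_group, reader_begin(), teardown(), reader_resume_would_uaf() and run_scenario() are hypothetical names, and the steps are serialized by hand instead of using real threads and rdtgroup_mutex. The model only tracks whether the woken reader would go on to use freed mon_data, with and without RDT_DELETED set on the default group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for RDT_DELETED. */
#define MODEL_RDT_DELETED 0x1u

/* Hypothetical model of the parts of rdtgroup_default involved here. */
struct model_group {
	unsigned int flags;
	int waitcount;   /* readers that bumped waitcount and wait for the mutex */
	bool priv_freed; /* true once mon_put_kn_priv() has run */
};

/* Reader: bump waitcount, drop kernfs active protection, block on the mutex. */
static void reader_begin(struct model_group *g)
{
	g->waitcount++;
}

/* Teardown, running with the mutex held. */
static void teardown(struct model_group *g, bool set_deleted)
{
	g->priv_freed = true;            /* models mon_put_kn_priv() */
	if (set_deleted)
		g->flags |= MODEL_RDT_DELETED;
	/* rdtgroup_destroy_root() would follow here */
}

/*
 * Mutex released: the reader wakes and checks the flag, as
 * rdtgroup_kn_lock_live() does. Returns true if it would go on to
 * dereference of->kn->priv after it was freed.
 */
static bool reader_resume_would_uaf(struct model_group *g)
{
	bool deleted = g->flags & MODEL_RDT_DELETED;

	g->waitcount--;                  /* eventually dropped on unlock */
	return !deleted && g->priv_freed;
}

static bool run_scenario(bool set_deleted)
{
	struct model_group g = { 0 };

	reader_begin(&g);
	teardown(&g, set_deleted);
	return reader_resume_would_uaf(&g);
}
```

In this model run_scenario(false) corresponds to the current code and reports the use-after-free; run_scenario(true) corresponds to a teardown that marks the default group deleted.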

> > 2) Is my fix reasonable?
> > 3) Better way to fix it?
> > 
> >  fs/resctrl/rdtgroup.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> > index eac7e4f8574d..668ebe0b0ec6 100644
> > --- a/fs/resctrl/rdtgroup.c
> > +++ b/fs/resctrl/rdtgroup.c
> > @@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
> >  static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> >  {
> >  	kernfs_put(rdtgrp->kn);
> > -	kfree(rdtgrp);
> > +	if (rdtgrp != &rdtgroup_default)
> > +		kfree(rdtgrp);
> 
> Issue described by sashiko with new kernfs_put() is real here. rdtgroup_remove()
> was not called on default group before this change and it assumes that there is
> an additional reference that it can release here. See comment above the
> kernfs_get() found in mkdir_rdt_prepare, copied here for convenience:
> 
> 	/*
> 	 * kernfs_remove() will drop the reference count on "kn" which
> 	 * will free it. But we still need it to stick around for the
> 	 * rdtgroup_kn_unlock(kn) call. Take one extra reference here,
> 	 * which will be dropped by kernfs_put() in rdtgroup_remove().
> 	 */
> 
> We could aim to balance the references with a kernfs_get() during rdtgroup_setup_root()
> but then we need to take care to ensure rdtgroup_remove() is called on exit for
> the default group which may be confusing since it is not actually removed. How about
> instead just don't drop the reference that we do not have? What do you think of below?

This looks good. I can include a comment on why nothing needs to be done
for the default group.

> 	/* If going with the below, the function's comments need updating */
> 	static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> 	{
> 		if (rdtgrp == &rdtgroup_default)
> 			return;
> 		kernfs_put(rdtgrp->kn);
> 		kfree(rdtgrp);
> 	}
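
A rough way to see why the early return keeps the count balanced: the sketch below models a kernfs node's reference count as a plain int. This is a simplification and an assumption, since real kernfs refcounting has more holders. model_node, model_remove_proposed(), model_remove_rfc() and the lifecycle helpers are hypothetical, and the final model_kernfs_put() in the default-group path stands in for the root's own reference being dropped by rdtgroup_destroy_root().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; real kernfs refcounting is more involved. */
struct model_node {
	int refcount;
};

struct model_rdtgroup {
	struct model_node *kn;
	bool is_default; /* models rdtgrp == &rdtgroup_default */
	bool freed;      /* models kfree(rdtgrp) having run */
};

static void model_kernfs_get(struct model_node *kn) { kn->refcount++; }
static void model_kernfs_put(struct model_node *kn) { kn->refcount--; }

/* Shape proposed in the review: the default group owns no extra reference. */
static void model_remove_proposed(struct model_rdtgroup *grp)
{
	if (grp->is_default)
		return;
	model_kernfs_put(grp->kn);
	grp->freed = true;
}

/* Shape from the RFC: drops a reference even for the default group. */
static void model_remove_rfc(struct model_rdtgroup *grp)
{
	model_kernfs_put(grp->kn);
	if (!grp->is_default)
		grp->freed = true;
}

/* Regular group: kernfs tree reference plus the extra one from mkdir. */
static int regular_group_lifecycle(void)
{
	struct model_node kn = { .refcount = 1 };    /* created by kernfs */
	struct model_rdtgroup grp = { .kn = &kn };

	model_kernfs_get(&kn);                       /* mkdir_rdt_prepare() */
	model_kernfs_put(&kn);                       /* kernfs_remove() */
	model_remove_proposed(&grp);                 /* rdtgroup_remove() */
	assert(grp.freed);
	return kn.refcount;
}

/* Default group: only the root's own reference exists. */
static int default_group_lifecycle(bool use_rfc)
{
	struct model_node root = { .refcount = 1 };
	struct model_rdtgroup grp = { .kn = &root, .is_default = true };

	if (use_rfc)
		model_remove_rfc(&grp);
	else
		model_remove_proposed(&grp);
	model_kernfs_put(&root);                     /* rdtgroup_destroy_root() */
	assert(!grp.freed);
	return root.refcount;
}
```

A final count of -1 is the model's version of the unbalanced kernfs_put() that sashiko flagged; the early return keeps both lifecycles at 0.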
> 
> When considering the races with a concurrent mount mentioned by sashiko I wonder
> if resctrl should not also use waiters on default group to gate any new mounts.
> I was tempted to place such check in rdtgroup_setup_root() but it seems to bury
> something gating the mount so perhaps better at beginning of rdt_get_tree()?
> 
> What do you think of something like below?

Also good. Eliminates concerns about a new mount messing with things
pending from the previous mount.

> 	rdt_get_tree()
> 	{
> 		...
> 		if (resctrl_mounted) {
> 			ret = -EBUSY;
> 			goto out;
> 		}
> 	
> 		if (atomic_read(&rdtgroup_default.waitcount) != 0) {
> 			ret = -EBUSY;
> 			goto out;
> 		}
> 		...
> 	}
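
The gate above can be modelled in userspace to check the intended behaviour: a remount is refused while any reader from the previous mount still holds a waitcount reference on the default group. This sketch assumes, as with the existing resctrl_mounted check, that the real check runs with rdtgroup_mutex held; model_mounted, model_default_waitcount and the model_* functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for resctrl_mounted and rdtgroup_default.waitcount. */
static bool model_mounted;
static atomic_int model_default_waitcount;

/* Models the two -EBUSY checks at the start of rdt_get_tree(). */
static int model_get_tree(void)
{
	if (model_mounted)
		return -EBUSY;
	/* A reader from the previous mount has not finished yet. */
	if (atomic_load(&model_default_waitcount) != 0)
		return -EBUSY;
	model_mounted = true;
	return 0;
}

static void model_unmount(void)
{
	model_mounted = false;
}

/* Reader of the old mount: waitcount is held across the mutex wait. */
static void model_reader_begin(void)
{
	atomic_fetch_add(&model_default_waitcount, 1);
}

static void model_reader_end(void)
{
	int old = atomic_fetch_sub(&model_default_waitcount, 1);

	assert(old > 0);
}
```

In this model a mount attempted while a straggling reader is still draining fails with -EBUSY and succeeds once that reader drops its reference.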
> 
> Another alternative to consider is to not call mon_put_kn_priv() on unmount but
> instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
> This may be more complicated in the long term since it needs more care to ensure
> needed state is still available to a resctrl file reader that was blocked because of
> unmount or failure (via resctrl_exit()).

Deferring the cleanup to resctrl_exit() currently amounts to saying we
don't care about the leaked allocation (since resctrl_exit() is never
called; it is actually discarded). Cleaning up on unmount now means one
less thing to do if we ever make resctrl a loadable module.

> >  }
> >  
> >  static void _update_task_closid_rmid(void *task)
> > @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
> >  	mon_put_kn_priv();
> >  	rdt_pseudo_lock_release();
> >  	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> > +	if (atomic_read(&rdtgroup_default.waitcount) != 0)
> > +		rdtgroup_default.flags = RDT_DELETED;
> 
> sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
> help.

Yes - as long as you are OK with the asymmetry between the default group
and regular groups. I think it is OK because there are already many
special cases for the default group.
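
For illustration, here is one interleaving (not necessarily the exact one sashiko described) that defeats a conditional set: teardown samples waitcount before a late reader has bumped it. The steps are serialized by hand and all names are hypothetical. The sketch also models rdtgroup_setup_root() clearing the flag (and mon_data being allocated again) on the next mount, which is why leaving RDT_DELETED set after an unconditional teardown is harmless.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_RDT_DELETED 0x1u  /* hypothetical stand-in for RDT_DELETED */

struct model_state {
	unsigned int flags; /* rdtgroup_default.flags */
	int waitcount;      /* rdtgroup_default.waitcount */
	bool priv_freed;    /* mon_put_kn_priv() has run */
};

/* Teardown with the mutex held, using either form of the flag update. */
static void model_teardown(struct model_state *s, bool unconditional)
{
	s->priv_freed = true;                 /* mon_put_kn_priv() */
	if (unconditional || s->waitcount != 0)
		s->flags = MODEL_RDT_DELETED;
}

/* Models rdtgroup_setup_root() resetting the flag on the next mount. */
static void model_setup_root(struct model_state *s)
{
	s->flags = 0;
	s->priv_freed = false;                /* mon_data allocated again */
}

/*
 * The reader bumps waitcount only after teardown has sampled it, then
 * checks the flag once the mutex is released. Returns true if the
 * reader would use freed mon_data.
 */
static bool late_reader_would_uaf(bool unconditional)
{
	struct model_state s = { 0 };
	bool deleted;

	model_teardown(&s, unconditional);
	s.waitcount++;                        /* reader arrives late */
	deleted = s.flags & MODEL_RDT_DELETED;
	s.waitcount--;
	return !deleted && s.priv_freed;
}

static unsigned int flags_after_remount(void)
{
	struct model_state s = { 0 };

	model_teardown(&s, true);
	model_setup_root(&s);
	return s.flags;
}
```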

> >  	closid_exit();
> >  	schemata_list_destroy();
> >  	rdtgroup_destroy_root();
> > @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
> >  
> >  	ctx->kfc.root = rdt_root;
> >  	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
> > +	rdtgroup_default.flags = 0;
> >  
> >  	return 0;
> >  }
> 
> The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
> 	
> ---8<---
> 	In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
> 	error path jumps to the out label:
> 	    out:
> 	        if (rdtgrp)
> 	            rdtgroup_kn_unlock(of->kn);
> 	Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
> 	held.
> ---8<---
> 
> Comparing the claim to the actual code, the snippet looks like a mismatch since
> rdtgroup_mondata_show() actually looks like:
> 	out:
> 		rdtgroup_kn_unlock(of->kn);

Yes. Looks like a problem in hallucinated code.

> Reinette

-Tony


Thread overview: 4+ messages
2026-05-13 22:40 [RFC PATCH] fs/resctrl: Fix use-after-free during unmount Tony Luck
2026-05-14 21:45 ` Reinette Chatre
2026-05-14 22:23   ` Luck, Tony [this message]
2026-05-14 22:44     ` Reinette Chatre
