[RFC PATCH] fs/resctrl: Fix use-after-free during unmount

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
@ 2026-05-13 22:40 Tony Luck
  2026-05-14 21:45 ` Reinette Chatre
  0 siblings, 1 reply; 4+ messages in thread
From: Tony Luck @ 2026-05-13 22:40 UTC (permalink / raw)
  To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: Borislav Petkov, x86, linux-kernel, patches, Tony Luck

Sashiko reported[1] this issue:

  During unmount or failure teardown, resctrl_fs_teardown() calls
  mon_put_kn_priv() (which frees all mon_data structures) followed
  by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
  RDT_DELETED flag is never set for rdtgroup_default.

  If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
  rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
  rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
  the private data and destroy the nodes without waiting for the reader.

  When the mutex is released, the reader wakes up, observes that RDT_DELETED is
  not set for the default group, and dereferences the already-freed of->kn->priv
  pointer.

Set RDT_DELETED for the default group (if there are any tasks waiting).

Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
---

Yet another side-quest from Sashiko. RFC for some human eyes before I
add to my series and post a new version;

1) Is this real? It looks like it is to me.
2) Is my fix reasonable?
3) Better way to fix it?

 fs/resctrl/rdtgroup.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
index eac7e4f8574d..668ebe0b0ec6 100644
--- a/fs/resctrl/rdtgroup.c
+++ b/fs/resctrl/rdtgroup.c
@@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
 static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 {
 	kernfs_put(rdtgrp->kn);
-	kfree(rdtgrp);
+	if (rdtgrp != &rdtgroup_default)
+		kfree(rdtgrp);
 }

 static void _update_task_closid_rmid(void *task)
@@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
 	mon_put_kn_priv();
 	rdt_pseudo_lock_release();
 	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+	if (atomic_read(&rdtgroup_default.waitcount) != 0)
+		rdtgroup_default.flags = RDT_DELETED;
 	closid_exit();
 	schemata_list_destroy();
 	rdtgroup_destroy_root();
@@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)

 	ctx->kfc.root = rdt_root;
 	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
+	rdtgroup_default.flags = 0;

 	return 0;
 }
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
  2026-05-13 22:40 [RFC PATCH] fs/resctrl: Fix use-after-free during unmount Tony Luck
@ 2026-05-14 21:45 ` Reinette Chatre
  2026-05-14 22:23   ` Luck, Tony
  0 siblings, 1 reply; 4+ messages in thread
From: Reinette Chatre @ 2026-05-14 21:45 UTC (permalink / raw)
  To: Tony Luck, Fenghua Yu, Maciej Wieczor-Retman, Peter Newman,
	James Morse, Babu Moger, Drew Fustini, Dave Martin, Chen Yu
  Cc: Borislav Petkov, x86, linux-kernel, patches

Hi Tony,

On 5/13/26 3:40 PM, Tony Luck wrote:
> Sashiko reported[1] this issue:
> 
>   During unmount or failure teardown, resctrl_fs_teardown() calls
>   mon_put_kn_priv() (which frees all mon_data structures) followed
>   by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
>   RDT_DELETED flag is never set for rdtgroup_default.
> 
>   If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
>   rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
>   rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
>   the private data and destroy the nodes without waiting for the reader.
> 
>   When the mutex is released, the reader wakes up, observes that RDT_DELETED is
>   not set for the default group, and dereferences the already-freed of->kn->priv
>   pointer.
> 
> Set RDT_DELETED for the default group (if there are any tasks waiting).
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
> ---
> 
> Yet another side-quest from Sashiko. RFC for some human eyes before I
> add to my series and post a new version;

sashiko also reviewed it and found a few issues that seem legitimate:
https://sashiko.dev/#/patchset/20260513224044.17167-1-tony.luck%40intel.com

> 
> 1) Is this real? It looks like it is to me.

Looks like it to me also.

> 2) Is my fix reasonable?
> 3) Better way to fix it?
> 
>  fs/resctrl/rdtgroup.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index eac7e4f8574d..668ebe0b0ec6 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
>  static void rdtgroup_remove(struct rdtgroup *rdtgrp)
>  {
>  	kernfs_put(rdtgrp->kn);
> -	kfree(rdtgrp);
> +	if (rdtgrp != &rdtgroup_default)
> +		kfree(rdtgrp);

Issue described by sashiko with new kernfs_put() is real here. rdtgroup_remove()
was not called on default group before this change and it assumes that there is
an additional reference that it can release here. See comment above the
kernfs_get() found in mkdir_rdt_prepare, copied here for convenience:

	/*                                                                      
	 * kernfs_remove() will drop the reference count on "kn" which          
	 * will free it. But we still need it to stick around for the           
	 * rdtgroup_kn_unlock(kn) call. Take one extra reference here,          
	 * which will be dropped by kernfs_put() in rdtgroup_remove().          
	 */                                               

We could aim to balance the references with a kernfs_get() during rdtgroup_setup_root()
but then we need to take care to ensure rdtgroup_remove() is called on exit for
the default group which may be confusing since it is not actually removed. How about
instead just don't drop the reference that we do not have? What do you think of below?

	/* If doing below the function comments need to be updated */
	static void rdtgroup_remove(struct rdtgroup *rdtgrp)                            
	{                                                                               
		if (rdtgrp == &rdtgroup_default)                                        
			return;                                                         
		kernfs_put(rdtgrp->kn);                                                 
		kfree(rdtgrp);                                                          
	}                            

When considering the races with a concurrent mount mentioned by sashiko I wonder
if resctrl should not also use waiters on default group to gate any new mounts.
I was tempted to place such check in rdtgroup_setup_root() but it seems to bury
something gating the mount so perhaps better at beginning of rdt_get_tree()?

What do you think of something like below?

	rdt_get_tree()
	{
		...
		if (resctrl_mounted) {
			ret = -EBUSY;
			goto out;
		}
	
		if (atomic_read(&rdtgroup_default.waitcount) != 0) {
			ret = -EBUSY;
			goto out;
		}
		...
	}

Another alternative to consider is to not call mon_put_kn_priv() on unmount but
instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
This may be more complicated in the long term since it needs more care to ensure
needed state is still available a resctrl file reader that was blocked because of
unmount or failure (via resctrl_exit()).

>  }
>  
>  static void _update_task_closid_rmid(void *task)
> @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
>  	mon_put_kn_priv();
>  	rdt_pseudo_lock_release();
>  	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> +	if (atomic_read(&rdtgroup_default.waitcount) != 0)
> +		rdtgroup_default.flags = RDT_DELETED;

sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
help.

>  	closid_exit();
>  	schemata_list_destroy();
>  	rdtgroup_destroy_root();
> @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
>  
>  	ctx->kfc.root = rdt_root;
>  	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
> +	rdtgroup_default.flags = 0;
>  
>  	return 0;
>  }

The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
	
---8<---
	In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
	error path jumps to the out label:
	    out:
	        if (rdtgrp)
	            rdtgroup_kn_unlock(of->kn);
	Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
	held.
---8<---

Comparing the claim to actual code the snippet looks like a mismatch since
rdtgroup_mondata_show() actually looks like:
	out:
		rdtgroup_kn_unlock(of->kn);

Reinette

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
  2026-05-14 21:45 ` Reinette Chatre
@ 2026-05-14 22:23   ` Luck, Tony
  2026-05-14 22:44     ` Reinette Chatre
  0 siblings, 1 reply; 4+ messages in thread
From: Luck, Tony @ 2026-05-14 22:23 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, Borislav Petkov,
	x86, linux-kernel, patches

On Thu, May 14, 2026 at 02:45:10PM -0700, Reinette Chatre wrote:
> Hi Tony,
> 
> On 5/13/26 3:40 PM, Tony Luck wrote:
> > Sashiko reported[1] this issue:
> > 
> >   During unmount or failure teardown, resctrl_fs_teardown() calls
> >   mon_put_kn_priv() (which frees all mon_data structures) followed
> >   by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
> >   RDT_DELETED flag is never set for rdtgroup_default.
> > 
> >   If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
> >   rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
> >   rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
> >   the private data and destroy the nodes without waiting for the reader.
> > 
> >   When the mutex is released, the reader wakes up, observes that RDT_DELETED is
> >   not set for the default group, and dereferences the already-freed of->kn->priv
> >   pointer.
> > 
> > Set RDT_DELETED for the default group (if there are any tasks waiting).
> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
> > ---
> > 
> > Yet another side-quest from Sashiko. RFC for some human eyes before I
> > add to my series and post a new version;
> 
> sashiko also reviewed it and found a few issues that seem legitimate:
> https://sashiko.dev/#/patchset/20260513224044.17167-1-tony.luck%40intel.com
> 
> > 
> > 1) Is this real? It looks like it is to me.
> 
> Looks like it to me also.

Thanks for confirming.

> > 2) Is my fix reasonable?
> > 3) Better way to fix it?
> > 
> >  fs/resctrl/rdtgroup.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> > index eac7e4f8574d..668ebe0b0ec6 100644
> > --- a/fs/resctrl/rdtgroup.c
> > +++ b/fs/resctrl/rdtgroup.c
> > @@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
> >  static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> >  {
> >  	kernfs_put(rdtgrp->kn);
> > -	kfree(rdtgrp);
> > +	if (rdtgrp != &rdtgroup_default)
> > +		kfree(rdtgrp);
> 
> Issue described by sashiko with new kernfs_put() is real here. rdtgroup_remove()
> was not called on default group before this change and it assumes that there is
> an additional reference that it can release here. See comment above the
> kernfs_get() found in mkdir_rdt_prepare, copied here for convenience:
> 
> 	/*                                                                      
> 	 * kernfs_remove() will drop the reference count on "kn" which          
> 	 * will free it. But we still need it to stick around for the           
> 	 * rdtgroup_kn_unlock(kn) call. Take one extra reference here,          
> 	 * which will be dropped by kernfs_put() in rdtgroup_remove().          
> 	 */                                               
> 
> We could aim to balance the references with a kernfs_get() during rdtgroup_setup_root()
> but then we need to take care to ensure rdtgroup_remove() is called on exit for
> the default group which may be confusing since it is not actually removed. How about
> instead just don't drop the reference that we do not have? What do you think of below?

This looks good. I can include a comment on why nothing needs to be done
for the default group.

> 	/* If doing below the function comments need to be updated */
> 	static void rdtgroup_remove(struct rdtgroup *rdtgrp)                            
> 	{                                                                               
> 		if (rdtgrp == &rdtgroup_default)                                        
> 			return;                                                         
> 		kernfs_put(rdtgrp->kn);                                                 
> 		kfree(rdtgrp);                                                          
> 	}                            
> 
> When considering the races with a concurrent mount mentioned by sashiko I wonder
> if resctrl should not also use waiters on default group to gate any new mounts.
> I was tempted to place such check in rdtgroup_setup_root() but it seems to bury
> something gating the mount so perhaps better at beginning of rdt_get_tree()?
> 
> What do you think of something like below?

Also good. Eliminates concerns about a new mount messing with things
pending from the previous mount.

> 	rdt_get_tree()
> 	{
> 		...
> 		if (resctrl_mounted) {
> 			ret = -EBUSY;
> 			goto out;
> 		}
> 	
> 		if (atomic_read(&rdtgroup_default.waitcount) != 0) {
> 			ret = -EBUSY;
> 			goto out;
> 		}
> 		...
> 	}
> 
> Another alternative to consider is to not call mon_put_kn_priv() on unmount but
> instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
> This may be more complicated in the long term since it needs more care to ensure
> needed state is still available a resctrl file reader that was blocked because of
> unmount or failure (via resctrl_exit()).

Pushing the resctrl_exit() is currently saying we don't care about the
leaked allocation (since resctrl_exit() is never called - actually
discarded). Cleaning up on unmount now means one less thing to do if we
ever make resctrl a loadable module.

> >  }
> >  
> >  static void _update_task_closid_rmid(void *task)
> > @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
> >  	mon_put_kn_priv();
> >  	rdt_pseudo_lock_release();
> >  	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> > +	if (atomic_read(&rdtgroup_default.waitcount) != 0)
> > +		rdtgroup_default.flags = RDT_DELETED;
> 
> sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
> help.

Yes - as long as you are OK with the asymmetry between the default group
and regular groups. I think it is OK because there are already many
special cases for the default group.

> >  	closid_exit();
> >  	schemata_list_destroy();
> >  	rdtgroup_destroy_root();
> > @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
> >  
> >  	ctx->kfc.root = rdt_root;
> >  	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
> > +	rdtgroup_default.flags = 0;
> >  
> >  	return 0;
> >  }
> 
> The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
> 	
> ---8<---
> 	In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
> 	error path jumps to the out label:
> 	    out:
> 	        if (rdtgrp)
> 	            rdtgroup_kn_unlock(of->kn);
> 	Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
> 	held.
> ---8<---
> 
> Comparing the claim to actual code the snippet looks like a mismatch since
> rdtgroup_mondata_show() actually looks like:
> 	out:
> 		rdtgroup_kn_unlock(of->kn);

Yes. Looks like a problem in hallucinated code.

> Reinette

-Tony

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
  2026-05-14 22:23   ` Luck, Tony
@ 2026-05-14 22:44     ` Reinette Chatre
  0 siblings, 0 replies; 4+ messages in thread
From: Reinette Chatre @ 2026-05-14 22:44 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Fenghua Yu, Maciej Wieczor-Retman, Peter Newman, James Morse,
	Babu Moger, Drew Fustini, Dave Martin, Chen Yu, Borislav Petkov,
	x86, linux-kernel, patches

Hi Tony,

On 5/14/26 3:23 PM, Luck, Tony wrote:
> On Thu, May 14, 2026 at 02:45:10PM -0700, Reinette Chatre wrote:
>> On 5/13/26 3:40 PM, Tony Luck wrote:

...

>> Another alternative to consider is to not call mon_put_kn_priv() on unmount but
>> instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
>> This may be more complicated in the long term since it needs more care to ensure
>> needed state is still available a resctrl file reader that was blocked because of
>> unmount or failure (via resctrl_exit()).
> 
> Pushing the resctrl_exit() is currently saying we don't care about the
> leaked allocation (since resctrl_exit() is never called - actually
> discarded). Cleaning up on unmount now means one less thing to do if we
> ever make resctrl a loadable module.

ack. One correction is that resctrl_exit() is now called by MPAM as part of its
error handling on receipt of a special error IRQ.

> 
>>>  }
>>>  
>>>  static void _update_task_closid_rmid(void *task)
>>> @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
>>>  	mon_put_kn_priv();
>>>  	rdt_pseudo_lock_release();
>>>  	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
>>> +	if (atomic_read(&rdtgroup_default.waitcount) != 0)
>>> +		rdtgroup_default.flags = RDT_DELETED;
>>
>> sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
>> help.
> 
> Yes - as long as you are OK with the asymmetry between the default group
> and regular groups. I think it is OK because there are already many
> special cases for the default group.

I assume that you mean that in equivalent scenario a dynamically allocated regular group
is freed when it has no waiters. Since the default group is not dynamically allocated
it cannot be freed so it cannot be fully symmetrical here. I think an unconditional
RDT_DELETED is appropriate and matches how the default group is never freed.

> 
>>>  	closid_exit();
>>>  	schemata_list_destroy();
>>>  	rdtgroup_destroy_root();
>>> @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
>>>  
>>>  	ctx->kfc.root = rdt_root;
>>>  	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
>>> +	rdtgroup_default.flags = 0;
>>>  
>>>  	return 0;
>>>  }
>>
>> The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
>> 	
>> ---8<---
>> 	In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
>> 	error path jumps to the out label:
>> 	    out:
>> 	        if (rdtgrp)
>> 	            rdtgroup_kn_unlock(of->kn);
>> 	Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
>> 	held.
>> ---8<---
>>
>> Comparing the claim to actual code the snippet looks like a mismatch since
>> rdtgroup_mondata_show() actually looks like:
>> 	out:
>> 		rdtgroup_kn_unlock(of->kn);
> 
> Yes. Looks like a problem in hallucinated code.

Thank you very much for the sanity check.

Reinette


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-14 22:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-13 22:40 [RFC PATCH] fs/resctrl: Fix use-after-free during unmount Tony Luck
2026-05-14 21:45 ` Reinette Chatre
2026-05-14 22:23   ` Luck, Tony
2026-05-14 22:44     ` Reinette Chatre

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.