Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Reinette Chatre <reinette.chatre@intel.com>
To: Tony Luck <tony.luck@intel.com>, Fenghua Yu <fenghuay@nvidia.com>,
	"Maciej Wieczor-Retman" <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	Drew Fustini <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
Date: Thu, 14 May 2026 14:45:10 -0700	[thread overview]
Message-ID: <28f663b7-de8c-4d10-b750-edd8ab9bceb7@intel.com> (raw)
In-Reply-To: <20260513224044.17167-1-tony.luck@intel.com>

Hi Tony,

On 5/13/26 3:40 PM, Tony Luck wrote:
> Sashiko reported[1] this issue:
> 
>   During unmount or failure teardown, resctrl_fs_teardown() calls
>   mon_put_kn_priv() (which frees all mon_data structures) followed
>   by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
>   RDT_DELETED flag is never set for rdtgroup_default.
> 
>   If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
>   rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
>   rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
>   the private data and destroy the nodes without waiting for the reader.
> 
>   When the mutex is released, the reader wakes up, observes that RDT_DELETED is
>   not set for the default group, and dereferences the already-freed of->kn->priv
>   pointer.
> 
> Set RDT_DELETED for the default group (if there are any tasks waiting).
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
> ---
> 
> Yet another side-quest from Sashiko. RFC for some human eyes before I
> add to my series and post a new version;

sashiko also reviewed it and found a few issues that seem legitimate:
https://sashiko.dev/#/patchset/20260513224044.17167-1-tony.luck%40intel.com

> 
> 1) Is this real? It looks like it is to me.

Looks like it to me also.

> 2) Is my fix reasonable?
> 3) Better way to fix it?
> 
>  fs/resctrl/rdtgroup.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> index eac7e4f8574d..668ebe0b0ec6 100644
> --- a/fs/resctrl/rdtgroup.c
> +++ b/fs/resctrl/rdtgroup.c
> @@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
>  static void rdtgroup_remove(struct rdtgroup *rdtgrp)
>  {
>  	kernfs_put(rdtgrp->kn);
> -	kfree(rdtgrp);
> +	if (rdtgrp != &rdtgroup_default)
> +		kfree(rdtgrp);

Issue described by sashiko with new kernfs_put() is real here. rdtgroup_remove()
was not called on default group before this change and it assumes that there is
an additional reference that it can release here. See comment above the
kernfs_get() found in mkdir_rdt_prepare, copied here for convenience:

	/*                                                                      
	 * kernfs_remove() will drop the reference count on "kn" which          
	 * will free it. But we still need it to stick around for the           
	 * rdtgroup_kn_unlock(kn) call. Take one extra reference here,          
	 * which will be dropped by kernfs_put() in rdtgroup_remove().          
	 */                                               

We could aim to balance the references with a kernfs_get() during rdtgroup_setup_root()
but then we need to take care to ensure rdtgroup_remove() is called on exit for
the default group which may be confusing since it is not actually removed. How about
instead just don't drop the reference that we do not have? What do you think of below?

	/* If doing below the function comments need to be updated */
	static void rdtgroup_remove(struct rdtgroup *rdtgrp)                            
	{                                                                               
		if (rdtgrp == &rdtgroup_default)                                        
			return;                                                         
		kernfs_put(rdtgrp->kn);                                                 
		kfree(rdtgrp);                                                          
	}                            

When considering the races with a concurrent mount mentioned by sashiko I wonder
if resctrl should not also use waiters on default group to gate any new mounts.
I was tempted to place such check in rdtgroup_setup_root() but it seems to bury
something gating the mount so perhaps better at beginning of rdt_get_tree()?

What do you think of something like below?

	rdt_get_tree()
	{
		...
		if (resctrl_mounted) {
			ret = -EBUSY;
			goto out;
		}
	
		if (atomic_read(&rdtgroup_default.waitcount) != 0) {
			ret = -EBUSY;
			goto out;
		}
		...
	}

Another alternative to consider is to not call mon_put_kn_priv() on unmount but
instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
This may be more complicated in the long term since it needs more care to ensure
needed state is still available a resctrl file reader that was blocked because of
unmount or failure (via resctrl_exit()).

>  }
>  
>  static void _update_task_closid_rmid(void *task)
> @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
>  	mon_put_kn_priv();
>  	rdt_pseudo_lock_release();
>  	rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> +	if (atomic_read(&rdtgroup_default.waitcount) != 0)
> +		rdtgroup_default.flags = RDT_DELETED;

sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
help.

>  	closid_exit();
>  	schemata_list_destroy();
>  	rdtgroup_destroy_root();
> @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
>  
>  	ctx->kfc.root = rdt_root;
>  	rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
> +	rdtgroup_default.flags = 0;
>  
>  	return 0;
>  }

The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
	
---8<---
	In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
	error path jumps to the out label:
	    out:
	        if (rdtgrp)
	            rdtgroup_kn_unlock(of->kn);
	Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
	held.
---8<---

Comparing the claim to actual code the snippet looks like a mismatch since
rdtgroup_mondata_show() actually looks like:
	out:
		rdtgroup_kn_unlock(of->kn);

Reinette

next prev parent reply	other threads:[~2026-05-14 21:45 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-13 22:40 [RFC PATCH] fs/resctrl: Fix use-after-free during unmount Tony Luck
2026-05-14 21:45 ` Reinette Chatre [this message]
2026-05-14 22:23   ` Luck, Tony
2026-05-14 22:44     ` Reinette Chatre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28f663b7-de8c-4d10-b750-edd8ab9bceb7@intel.com \
    --to=reinette.chatre@intel.com \
    --cc=Dave.Martin@arm.com \
    --cc=babu.moger@amd.com \
    --cc=bp@alien8.de \
    --cc=dfustini@baylibre.com \
    --cc=fenghuay@nvidia.com \
    --cc=james.morse@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maciej.wieczor-retman@intel.com \
    --cc=patches@lists.linux.dev \
    --cc=peternewman@google.com \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox