From: "Luck, Tony" <tony.luck@intel.com>
To: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghuay@nvidia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
Peter Newman <peternewman@google.com>,
James Morse <james.morse@arm.com>,
Babu Moger <babu.moger@amd.com>,
"Drew Fustini" <dfustini@baylibre.com>,
Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [RFC PATCH] fs/resctrl: Fix use-after-free during unmount
Date: Thu, 14 May 2026 15:23:11 -0700 [thread overview]
Message-ID: <agZLT1hIPrDIrN4C@agluck-desk3> (raw)
In-Reply-To: <28f663b7-de8c-4d10-b750-edd8ab9bceb7@intel.com>
On Thu, May 14, 2026 at 02:45:10PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 5/13/26 3:40 PM, Tony Luck wrote:
> > Sashiko reported[1] this issue:
> >
> > During unmount or failure teardown, resctrl_fs_teardown() calls
> > mon_put_kn_priv() (which frees all mon_data structures) followed
> > by rdtgroup_destroy_root() (which destroys kernfs nodes). However, the
> > RDT_DELETED flag is never set for rdtgroup_default.
> >
> > If a concurrent reader (e.g., rdtgroup_mondata_show()) invokes
> > rdtgroup_kn_lock_live(), it drops kernfs active protection and blocks on
> > rdtgroup_mutex. resctrl_fs_teardown() (holding the mutex) proceeds to free
> > the private data and destroy the nodes without waiting for the reader.
> >
> > When the mutex is released, the reader wakes up, observes that RDT_DELETED is
> > not set for the default group, and dereferences the already-freed of->kn->priv
> > pointer.
> >
> > Set RDT_DELETED for the default group (if there are any tasks waiting).
> >
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > Link: https://sashiko.dev/#/patchset/20260508182143.14592-1-tony.luck%40intel.com?part=2 [1]
> > ---
> >
> > Yet another side-quest from Sashiko. RFC for some human eyes before I
> > add to my series and post a new version;
>
> sashiko also reviewed it and found a few issues that seem legitimate:
> https://sashiko.dev/#/patchset/20260513224044.17167-1-tony.luck%40intel.com
>
> >
> > 1) Is this real? It looks like it is to me.
>
> Looks like it to me also.
Thanks for confirming.
> > 2) Is my fix reasonable?
> > 3) Better way to fix it?
> >
> > fs/resctrl/rdtgroup.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
> > index eac7e4f8574d..668ebe0b0ec6 100644
> > --- a/fs/resctrl/rdtgroup.c
> > +++ b/fs/resctrl/rdtgroup.c
> > @@ -594,7 +594,8 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
> > static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> > {
> > kernfs_put(rdtgrp->kn);
> > - kfree(rdtgrp);
> > + if (rdtgrp != &rdtgroup_default)
> > + kfree(rdtgrp);
>
> Issue described by sashiko with new kernfs_put() is real here. rdtgroup_remove()
> was not called on default group before this change and it assumes that there is
> an additional reference that it can release here. See comment above the
> kernfs_get() found in mkdir_rdt_prepare, copied here for convenience:
>
> /*
> * kernfs_remove() will drop the reference count on "kn" which
> * will free it. But we still need it to stick around for the
> * rdtgroup_kn_unlock(kn) call. Take one extra reference here,
> * which will be dropped by kernfs_put() in rdtgroup_remove().
> */
>
> We could aim to balance the references with a kernfs_get() during rdtgroup_setup_root()
> but then we need to take care to ensure rdtgroup_remove() is called on exit for
> the default group which may be confusing since it is not actually removed. How about
> instead just don't drop the reference that we do not have? What do you think of below?
This looks good. I can include a comment on why nothing needs to be done
for the default group.
> /* If doing below the function comments need to be updated */
> static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> {
> if (rdtgrp == &rdtgroup_default)
> return;
> kernfs_put(rdtgrp->kn);
> kfree(rdtgrp);
> }
>
> When considering the races with a concurrent mount mentioned by sashiko I wonder
> if resctrl should not also use waiters on default group to gate any new mounts.
> I was tempted to place such check in rdtgroup_setup_root() but it seems to bury
> something gating the mount so perhaps better at beginning of rdt_get_tree()?
>
> What do you think of something like below?
Also good. Eliminates concerns about a new mount messing with things
pending from the previous mount.
> rdt_get_tree()
> {
> ...
> if (resctrl_mounted) {
> ret = -EBUSY;
> goto out;
> }
>
> if (atomic_read(&rdtgroup_default.waitcount) != 0) {
> ret = -EBUSY;
> goto out;
> }
> ...
> }
>
> Another alternative to consider is to not call mon_put_kn_priv() on unmount but
> instead on resctrl_exit()? Thus treating it similar to the RMID LRU list.
> This may be more complicated in the long term since it needs more care to ensure
> needed state is still available a resctrl file reader that was blocked because of
> unmount or failure (via resctrl_exit()).
Pushing the resctrl_exit() is currently saying we don't care about the
leaked allocation (since resctrl_exit() is never called - actually
discarded). Cleaning up on unmount now means one less thing to do if we
ever make resctrl a loadable module.
> > }
> >
> > static void _update_task_closid_rmid(void *task)
> > @@ -2965,6 +2966,8 @@ static void resctrl_fs_teardown(void)
> > mon_put_kn_priv();
> > rdt_pseudo_lock_release();
> > rdtgroup_default.mode = RDT_MODE_SHAREABLE;
> > + if (atomic_read(&rdtgroup_default.waitcount) != 0)
> > + rdtgroup_default.flags = RDT_DELETED;
>
> sashiko found a race here ... looks like setting RDT_DELETED unconditionally would
> help.
Yes - as long as you are OK with the asymmetry between the default group
and regular groups. I think it is OK because there are already many
special cases for the default group.
> > closid_exit();
> > schemata_list_destroy();
> > rdtgroup_destroy_root();
> > @@ -4291,6 +4294,7 @@ static int rdtgroup_setup_root(struct rdt_fs_context *ctx)
> >
> > ctx->kfc.root = rdt_root;
> > rdtgroup_default.kn = kernfs_root_to_node(rdt_root);
> > + rdtgroup_default.flags = 0;
> >
> > return 0;
> > }
>
> The "permanent lock leak" issue reported by sashiko is not clear to me. It claims:
>
> ---8<---
> In rdtgroup_mondata_show(), if rdtgroup_kn_lock_live() returns NULL, the
> error path jumps to the out label:
> out:
> if (rdtgrp)
> rdtgroup_kn_unlock(of->kn);
> Because rdtgrp is NULL, the unlock is skipped, leaving the locks permanently
> held.
> ---8<---
>
> Comparing the claim to actual code the snippet looks like a mismatch since
> rdtgroup_mondata_show() actually looks like:
> out:
> rdtgroup_kn_unlock(of->kn);
Yes. Looks like a problem in hallucinated code.
> Reinette
-Tony
next prev parent reply other threads:[~2026-05-14 22:23 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-13 22:40 [RFC PATCH] fs/resctrl: Fix use-after-free during unmount Tony Luck
2026-05-14 21:45 ` Reinette Chatre
2026-05-14 22:23 ` Luck, Tony [this message]
2026-05-14 22:44 ` Reinette Chatre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agZLT1hIPrDIrN4C@agluck-desk3 \
--to=tony.luck@intel.com \
--cc=Dave.Martin@arm.com \
--cc=babu.moger@amd.com \
--cc=bp@alien8.de \
--cc=dfustini@baylibre.com \
--cc=fenghuay@nvidia.com \
--cc=james.morse@arm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maciej.wieczor-retman@intel.com \
--cc=patches@lists.linux.dev \
--cc=peternewman@google.com \
--cc=reinette.chatre@intel.com \
--cc=x86@kernel.org \
--cc=yu.c.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox