Re: [PATCH 3/4] fs/resctrl: Fix deadlock for errors during mount

The Linux Kernel Mailing List

From: "Luck, Tony" <tony.luck@intel.com>
To: "Chen, Yu C" <yu.c.chen@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>,
	Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Fenghua Yu <fenghuay@nvidia.com>,
	"James Morse" <james.morse@arm.com>,
	Drew Fustini <dfustini@baylibre.com>,
	Babu Moger <babu.moger@amd.com>,
	Peter Newman <peternewman@google.com>,
	Dave Martin <Dave.Martin@arm.com>
Subject: Re: [PATCH 3/4] fs/resctrl: Fix deadlock for errors during mount
Date: Wed, 13 May 2026 12:51:49 -0700
Message-ID: <agTWVa1NYNhA5wUp@agluck-desk3>
In-Reply-To: <8ee967df-329d-441d-9635-47f48b5e7b8f@intel.com>

On Wed, May 13, 2026 at 11:24:47AM +0800, Chen, Yu C wrote:
> Hi Reinette,
> On 5/12/2026 10:34 PM, Reinette Chatre wrote:
> > Hi Chenyu,
> > 
> > On 5/12/26 12:28 AM, Chen, Yu C wrote:
> > > On 5/12/2026 6:53 AM, Reinette Chatre wrote:
> > > > Hi Tony,
> > > > 
> > > > On 5/8/26 11:21 AM, Tony Luck wrote:
> > > 
> > > > +     * Obtain reference with locks held to protect against interference
> > > > +     * from resctrl_exit().
> > > > +    */
> > > > +    kernfs_get(rdt_root_kn);
> > > 
> > > [ ... ]
> > > 
> > > > @@ -3130,6 +3144,7 @@ static int rdt_get_tree(struct fs_context *fc)
> > > >         */
> > > >        if (!ctx->kfc.new_sb_created)
> > > >            resctrl_unmount();
> > > > +    kernfs_put(rdt_root_kn);
> > > 
> > > I wonder if above should be protected against
> > >      cpus_read_lock();
> > >      mutex_lock(&rdtgroup_mutex);
> > > like kernfs_get()?
> > 
> > It is not obvious to me what this protection would be needed for.
> > Do you have a troublesome scenario in mind?
> > 
> > rdt_root_kn is a local copy of rdtgroup_default.kn. The latter is indeed
> > protected by the mutex. The reason why the kernfs_get() is protected
> > by the mutex is to ensure what rdt_root_kn points to, rdtgroup_default.kn, remains
> > accessible after the mutex is dropped. Nothing else modifies rdt_root_kn. I
> > understand the appeal of symmetry but it is not clear to me what the extra
> > locking is needed for here?
> > 
> 
> Thanks for the detailed explanation. I now agree there is no need to
> protect kernfs_put() with a lock here purely for the sake of symmetry. I
> previously thought race conditions would occur if two code paths
> concurrently enter kernfs_put() and target the same data area.
> However, since kernfs_put() contains an atomic compare, only one
> code path can proceed, making the operation safe.
> 
> > Could it perhaps make this flow easier to understand if the kernfs_get() is
> > of the mutex protected rdtgroup_default.kn while the kernfs_put() is
> > of the local backup copy? For example:
> > 
> > 	/* Ensure root kn remains accessible after mutex is unlocked */
> > 	kernfs_get(rdtgroup_default.kn);
> > 	/*
> > 	 * Make backup of rdtgroup_default.kn just in case one of the
> > 	 * following flows (that set rdtgroup_default.kn to NULL) runs after
> > 	 * the mutex is unlocked:
> > 	 * resctrl_exit()->resctrl_fs_teardown()->rdtgroup_destroy_root()
> > 	 * kernfs_get_tree()->deactivate_locked_super()->rdt_kill_sb()->resctrl_unmount()->resctrl_fs_teardown()->rdtgroup_destroy_root()
> > 	 * These flows would not actually result in rdtgroup_default.kn
> > 	 * being removed thanks to the additional reference.
> > 	 */
> 
> Yes, this comment is very clear and helpful.
> 
> thanks,
> Chenyu

Are we out of the woods yet? Applying these suggestions I now have:

	/* Ensure root kn remains accessible after mutex is unlocked */
	kernfs_get(rdtgroup_default.kn);

	/*
	 * Make backup of rdtgroup_default.kn just in case one of the
	 * following flows (that set rdtgroup_default.kn to NULL) runs after
	 * the mutex is unlocked:
	 * resctrl_exit()->resctrl_fs_teardown()->rdtgroup_destroy_root()
	 * kernfs_get_tree()->deactivate_locked_super()->rdt_kill_sb()->
	 *	resctrl_unmount()->resctrl_fs_teardown()->rdtgroup_destroy_root()
	 * These flows would not actually result in rdtgroup_default.kn
	 * being removed thanks to the additional reference.
	 */
	rdt_root_kn = rdtgroup_default.kn;

	rdt_last_cmd_clear();
	mutex_unlock(&rdtgroup_mutex);
	cpus_read_unlock();

	ret = kernfs_get_tree(fc);
	/*
	 * resctrl can only be mounted once, new superblock only expected
	 * to be created once.
	 */
	if (!ctx->kfc.new_sb_created)
		resctrl_unmount();

resctrl_unmount() clears resctrl_mounted, so as soon as the locks are released
a new mount attempt (maybe started a while ago, but blocked waiting for
the mutex) can begin. I just want to confirm that won't stomp on
anything left over from this failed mount that was waiting for this
kernfs_put() to happen.  I think it is OK, because the new mount is
going to allocate all new structures. But there's been enough layers
to this onion that I'd like to confirm.

	kernfs_put(rdt_root_kn);
	return ret;
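
To make the question concrete, here is a minimal userspace sketch of the
pattern under discussion. It is not the kernel code: struct node,
node_get(), node_put(), big_lock, default_kn, teardown() and
simulate_failed_mount() are all hypothetical stand-ins, and the "new mount
allocates a fresh root" step is exactly the assumption I am asking about,
not something this sketch proves for resctrl.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernfs node: a refcount plus a "freed" flag. */
struct node {
	atomic_int count;
	int *freed;		/* set to 1 when the node is released */
};

/* big_lock plays the role of rdtgroup_mutex. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
/* default_kn plays the role of rdtgroup_default.kn, guarded by big_lock. */
static struct node *default_kn;

static struct node *node_new(int *freed)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	atomic_init(&n->count, 1);	/* the root's own reference */
	n->freed = freed;
	*freed = 0;
	return n;
}

static void node_get(struct node *n)
{
	atomic_fetch_add(&n->count, 1);
}

/* Only the caller that drops the last reference frees; no lock required. */
static void node_put(struct node *n)
{
	if (atomic_fetch_sub(&n->count, 1) == 1) {
		*n->freed = 1;
		free(n);
	}
}

/* Teardown stand-in: drop the root's reference and clear the global. */
static void teardown(void)
{
	pthread_mutex_lock(&big_lock);
	if (default_kn) {
		node_put(default_kn);
		default_kn = NULL;
	}
	pthread_mutex_unlock(&big_lock);
}

/* Single-threaded walk through the failed-mount path; returns 0 on success. */
static int simulate_failed_mount(void)
{
	int old_freed, new_freed, ok;
	struct node *backup;

	default_kn = node_new(&old_freed);

	/* Take the extra reference and the backup pointer under the lock. */
	pthread_mutex_lock(&big_lock);
	node_get(default_kn);
	backup = default_kn;
	pthread_mutex_unlock(&big_lock);

	/* Failure path runs teardown: global is cleared, node survives. */
	teardown();
	ok = (default_kn == NULL) && !old_freed;

	/* Assumed behavior: a new mount allocates a brand new root. */
	pthread_mutex_lock(&big_lock);
	default_kn = node_new(&new_freed);
	pthread_mutex_unlock(&big_lock);
	ok = ok && (default_kn != backup);

	/* The late put releases only the old node, without the lock. */
	node_put(backup);
	ok = ok && old_freed && !new_freed;

	teardown();			/* clean up the new root */
	return (ok && new_freed) ? 0 : -1;
}
```

Calling simulate_failed_mount() from a small main() walks the sequence in
order: the backup pointer stays valid across teardown because of the extra
reference, and the late put touches only the old allocation. Whether the
real mount path is equally clean depends on every structure being freshly
allocated, which is the part I'd like confirmed.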

-Tony
