Re: [PATCH] fs/resctrl: Fix deadlock for errors during mount


From: Reinette Chatre <reinette.chatre@intel.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>, <x86@kernel.org>,
	Fenghua Yu <fenghuay@nvidia.com>,
	Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com>,
	Peter Newman <peternewman@google.com>,
	James Morse <james.morse@arm.com>,
	Babu Moger <babu.moger@amd.com>,
	"Drew Fustini" <dfustini@baylibre.com>,
	Dave Martin <Dave.Martin@arm.com>, Chen Yu <yu.c.chen@intel.com>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>
Subject: Re: [PATCH] fs/resctrl: Fix deadlock for errors during mount
Date: Mon, 4 May 2026 10:43:53 -0700
Message-ID: <51bc4f90-2979-4102-b503-930dc69ca517@intel.com>
In-Reply-To: <afjIXKwVpSAh5kAA@agluck-desk3>

Hi Tony,

On 5/4/26 9:25 AM, Luck, Tony wrote:
> On Fri, May 01, 2026 at 04:17:18PM -0700, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 5/1/26 11:56 AM, Tony Luck wrote:
>>> Sashiko noticed[1] a deadlock in the resctrl mount code.
>>>
>>> rdt_get_tree() acquires rdtgroup_mutex before calling kernfs_get_tree(). If
>>> superblock setup fails inside kernfs_get_tree(), the VFS calls kill_sb on
>>> the same thread before the call returns.  rdt_kill_sb() unconditionally
>>> attempts to acquire rdtgroup_mutex and deadlock occurs.
>>
>> Thank you for addressing this.
>>
>>>
>>> Add a boolean rdt_kill_sb_locked flag. Set it for the duration of
>>> kernfs_get_tree() and check in rdt_kill_sb() to determine if locks
>>> are already held.
>>>
>>
>> ...
>>
>>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>>> index 5dfdaa6f9d8f..8544020ef420 100644
>>> --- a/fs/resctrl/rdtgroup.c
>>> +++ b/fs/resctrl/rdtgroup.c
>>> @@ -2782,6 +2782,9 @@ static void schemata_list_destroy(void)
>>>  	}
>>>  }
>>>  
>>> +/* Protected by the serialized mount path (rdtgroup_mutex + resctrl_mounted). */
>>
>> I interpret above to mean that every access to rdt_kill_sb_locked can be expected to
>> be done with rdtgroup_mutex held ...
> 
> The comment could be much more descriptive about locking and limited use
> case.
> 
>>> +static bool rdt_kill_sb_locked;
>>> +
>>>  static int rdt_get_tree(struct fs_context *fc)
>>>  {
>>>  	struct rdt_fs_context *ctx = rdt_fc2context(fc);
>>> @@ -2855,7 +2858,9 @@ static int rdt_get_tree(struct fs_context *fc)
>>>  	if (ret)
>>>  		goto out_mondata;
>>>  
>>> +	rdt_kill_sb_locked = true;
>>>  	ret = kernfs_get_tree(fc);
>>> +	rdt_kill_sb_locked = false;
>>>  	if (ret < 0)
>>>  		goto out_psl;
>>>  
>>> @@ -3173,8 +3178,10 @@ static void rdt_kill_sb(struct super_block *sb)
>>>  {
>>>  	struct rdt_resource *r;
>>>  
>>> -	cpus_read_lock();
>>> -	mutex_lock(&rdtgroup_mutex);
>>> +	if (!rdt_kill_sb_locked) {
>>> +		cpus_read_lock();
>>> +		mutex_lock(&rdtgroup_mutex);
>>
>> ... but here clearly rdt_kill_sb_locked can be accessed without rdtgroup_mutex held.
> 
> A much better name for this flag would be "resctrl_mount_in_progress", with
> the header comment noting that it is set and cleared inside
> rdtgroup_mutex protected code and that it is used only in rdt_kill_sb().
> This specific use case seems safe as there are only two call chains leading
> to rdt_kill_sb():
> 	1) Error cleanup from failure of kernfs_fill_super() within the
> 	   call to kernfs_get_tree() [rdtgroup_mutex still held in this
> 	   case]
> 	2) From user call to unmount the filesystem. In which case
> 	   rdt_get_tree() must have completed successfully. Any new
> 	   calls are blocked from changing this flag by the early exit
> 	   based on resctrl_mounted.
>>
>> It appears that while this change claims that rdt_kill_sb_locked is protected the
>> implementation instead seems to actually be "this works for the scenarios cared
>> about here" which I understand to be based on considerations of how the filesystem
>> code interacts with resctrl callbacks _today_.
>>
>>> +	}
>>>  
>>>  	rdt_disable_ctx();
>>>  
>>> @@ -3189,8 +3196,10 @@ static void rdt_kill_sb(struct super_block *sb)
>>>  		resctrl_arch_disable_mon();
>>>  	resctrl_mounted = false;
>>>  	kernfs_kill_sb(sb);
>>> -	mutex_unlock(&rdtgroup_mutex);
>>> -	cpus_read_unlock();
>>> +	if (!rdt_kill_sb_locked) {
>>> +		mutex_unlock(&rdtgroup_mutex);
>>> +		cpus_read_unlock();
>>> +	}
>>>  }
>>>  
>>>  static struct file_system_type rdt_fs_type = {
>>
>> Did you or your AI assistant consider running kernfs_get_tree() without rdtgroup_mutex
>> and CPU hotplug lock held? Consider, for example:
> 
> Not considered. Thanks for the suggestion ... But, see below.
> 
>> diff --git a/fs/resctrl/rdtgroup.c b/fs/resctrl/rdtgroup.c
>> index 36d21652616e..9ee6295d6521 100644
>> --- a/fs/resctrl/rdtgroup.c
>> +++ b/fs/resctrl/rdtgroup.c
>> @@ -2892,10 +2892,6 @@ static int rdt_get_tree(struct fs_context *fc)
>>  	if (ret)
>>  		goto out_mondata;
>>  
>> -	ret = kernfs_get_tree(fc);
>> -	if (ret < 0)
>> -		goto out_psl;
>> -
>>  	if (resctrl_arch_alloc_capable())
>>  		resctrl_arch_enable_alloc();
>>  	if (resctrl_arch_mon_capable())
>> @@ -2911,10 +2907,10 @@ static int rdt_get_tree(struct fs_context *fc)
>>  						   RESCTRL_PICK_ANY_CPU);
>>  	}
>>  
>> -	goto out;
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +	return kernfs_get_tree(fc);
>>  
>> -out_psl:
>> -	rdt_pseudo_lock_release();
>>  out_mondata:
>>  	if (resctrl_arch_mon_capable())
>>  		kernfs_remove(kn_mondata);
>>
>>
>> This seems simpler by:
>> * avoiding introduction of additional state (rdt_kill_sb_locked) with unclear protection,
>> * avoiding double-cleanup on failure (rdt_kill_sb() called, followed by all of
>>   rdt_get_tree()'s failure path),
>> * maintaining symmetry with rdt_kill_sb() by providing it the state it is
>>   expected to be called with (i.e. resctrl_mounted = true).
> 
> All these are excellent points in favor of this approach.
>>
>> From what I can tell it is safe to call kernfs_kill_sb() on failure of kernfs_get_tree(),
>> but this needs to have been considered as part of this submission anyway.
> 
> Looks OK to me too.
> 
>> Oh, maybe there is a new lock ordering issue with this that I am missing?
> 
> I can't see any lock issues.
> 
> But ... there is a problem. kernfs_get_tree() can fail for many reasons.
> Only the specific case of failure in kernfs_fill_super() makes the cleanup
> call to rdt_kill_sb(). rdt_get_tree() has no way to tell from the error
> code from kernfs_get_tree() whether cleanup has been done.

Thanks for highlighting this.

From what I can tell, kernfs_get_tree() can fail in two places: allocation of the superblock
fails, in which case rdt_kill_sb() is not called, or allocation of the superblock succeeds but its
initialization fails, in which case rdt_kill_sb() is called.

It seems reasonable to me to expect that rdt_kill_sb() was called if the superblock was
allocated. In this case kernfs_fs_context::new_sb_created is set.

Could kernfs_fs_context::new_sb_created be used instead of kernfs_get_tree() error code to
determine if cleanup has been done?
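
To make the question concrete, here is a minimal user-space sketch of the idea.
It is not kernel code and has not been checked against the resctrl sources.
Every model_* name is hypothetical; only the new_sb_created field name comes from
kernfs_fs_context. The sketch assumes, as described above, that the kill_sb
callback runs exactly when the superblock was allocated and its initialization
then failed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for kernfs_fs_context: only the field discussed here. */
struct model_kfc {
	bool new_sb_created;
};

/* Hypothetical knob selecting which of the two failure points to simulate. */
enum model_failure { MODEL_OK, MODEL_FAIL_ALLOC, MODEL_FAIL_INIT };

/* Counts how often the modeled ->kill_sb() callback ran. */
static int model_kill_sb_calls;

static void model_kill_sb(void)
{
	model_kill_sb_calls++;
}

/* Models the two failure points of kernfs_get_tree() described above. */
static int model_get_tree(struct model_kfc *kfc, enum model_failure f)
{
	if (f == MODEL_FAIL_ALLOC)
		return -ENOMEM;		/* no superblock, so no kill_sb call */

	kfc->new_sb_created = true;	/* superblock exists from here on */

	if (f == MODEL_FAIL_INIT) {
		model_kill_sb();	/* assumed: teardown on the same thread */
		return -EINVAL;
	}
	return 0;
}

/*
 * True when the caller must still undo its own setup, i.e. the call failed
 * and the modeled kill_sb callback cannot have run.
 */
static bool model_needs_own_cleanup(int ret, const struct model_kfc *kfc)
{
	assert(kfc);
	return ret < 0 && !kfc->new_sb_created;
}
```

In this model the caller looks at the flag rather than the error code: after a
failed model_get_tree(), model_needs_own_cleanup() is true only for the
allocation failure, so the cleanup done by the callback is never repeated.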

> 
> Plausibly I could do some surgery on the kernfs subsystem to make kernfs_get_tree()
> take a second argument "bool *did_i_call_kill_sb". Only other user is
> the cgroup code. So this might not be too invasive.

It is not clear to me yet that additional flags are needed to support this.

> 
> Or, I could fix up the comments to justify use of "resctrl_mount_in_progress"
> Also fix up rdt_kill_sb() to look like this:
> 
> static void rdt_kill_sb(struct super_block *sb)
> {
> 	if (resctrl_mount_in_progress) {
> 		resctrl_clean_up_failed_mount();
> 		return;
> 	}
> 
> 	... existing unmount path code here ...
> }

I find the reasoning about safe access to resctrl_mount_in_progress to be very
complicated. It is not clear to me that the flag is required, given the existing
kernfs_fs_context::new_sb_created.

Reinette

Thread overview: 5+ messages
2026-05-01 18:56 [PATCH] fs/resctrl: Fix deadlock for errors during mount Tony Luck
2026-05-01 23:17 ` Reinette Chatre
2026-05-04 16:25   ` Luck, Tony
2026-05-04 17:43     ` Reinette Chatre [this message]
2026-05-04 17:52       ` Luck, Tony
