Re: [PATCH v2] nfsd: drop st_mutex and rp_mutex before calling move_to_close_lru()

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: dai.ngo@oracle.com
To: NeilBrown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Olga Kornievskaia <kolga@netapp.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2] nfsd: drop st_mutex and rp_mutex before calling move_to_close_lru()
Date: Sat, 23 Dec 2023 10:07:30 -0800	[thread overview]
Message-ID: <620255dc-f16b-400f-bc53-cb7298d1d0e9@oracle.com> (raw)
In-Reply-To: <170328606817.11005.12343520715219354379@noble.neil.brown.name>


On 12/22/23 3:01 PM, NeilBrown wrote:
> On Sat, 23 Dec 2023, dai.ngo@oracle.com wrote:
>> On 12/21/23 6:12 PM, NeilBrown wrote:
>>> move_to_close_lru() is currently called with ->st_mutex and .rp_mutex held.
>>> This can lead to a deadlock as move_to_close_lru() waits for sc_count to
>>> drop to 2, and some threads holding a reference might be waiting for either
>>> mutex.  These references will never be dropped so sc_count will never
>>> reach 2.
>> Yes, I think there is potential deadlock here since both nfs4_preprocess_seqid_op
>> and nfsd4_find_and_lock_existing_open can increment the sc_count but then
>> have to wait for the st_mutex which being held by move_to_close_lru.
>>
>>> There can be no harm in dropping ->st_mutex to before
>>> move_to_close_lru() because the only place that takes the mutex is
>>> nfsd4_lock_ol_stateid(), and it quickly aborts if sc_type is
>>> NFS4_CLOSED_STID, which it will be before move_to_close_lru() is called.
>>>
>>> Similarly dropping .rp_mutex is safe after the state is closed and so
>>> no longer usable.  Another way to look at this is that nothing
>>> significant happens between when nfsd4_close() now calls
>>> nfsd4_cstate_clear_replay(), and where nfsd4_proc_compound calls
>>> nfsd4_cstate_clear_replay() a little later.
>>>
>>> See also
>>>    https://lore.kernel.org/lkml/4dd1fe21e11344e5969bb112e954affb@jd.com/T/
>>> where this problem was raised but not successfully resolved.
>>>
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>>
>>> Sorry - I posted v1 a little hastily.  I need to drop rp_mutex as well
>>> to avoid the deadlock.  This also is safe.
>>>
>>>    fs/nfsd/nfs4state.c | 12 ++++++++----
>>>    1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index 40415929e2ae..453714fbcd66 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -7055,7 +7055,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
>>>    	return status;
>>>    }
>>>    
>>> -static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
>>> +static bool nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
>>>    {
>>>    	struct nfs4_client *clp = s->st_stid.sc_client;
>>>    	bool unhashed;
>>> @@ -7072,11 +7072,11 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
>>>    		list_for_each_entry(stp, &reaplist, st_locks)
>>>    			nfs4_free_cpntf_statelist(clp->net, &stp->st_stid);
>>>    		free_ol_stateid_reaplist(&reaplist);
>>> +		return false;
>>>    	} else {
>>>    		spin_unlock(&clp->cl_lock);
>>>    		free_ol_stateid_reaplist(&reaplist);
>>> -		if (unhashed)
>>> -			move_to_close_lru(s, clp->net);
>>> +		return unhashed;
>>>    	}
>>>    }
>>>    
>>> @@ -7092,6 +7092,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>    	struct nfs4_ol_stateid *stp;
>>>    	struct net *net = SVC_NET(rqstp);
>>>    	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>> +	bool need_move_to_close_list;
>>>    
>>>    	dprintk("NFSD: nfsd4_close on file %pd\n",
>>>    			cstate->current_fh.fh_dentry);
>>> @@ -7114,8 +7115,11 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>>    	 */
>>>    	nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid);
>>>    
>>> -	nfsd4_close_open_stateid(stp);
>>> +	need_move_to_close_list = nfsd4_close_open_stateid(stp);
>>>    	mutex_unlock(&stp->st_mutex);
>>> +	nfsd4_cstate_clear_replay(cstate);
>> should nfsd4_cstate_clear_replay be called only if need_move_to_close_list
>> is true?
> It certain could be done like that.
>
>     if (need_move_to_close_list) {
>           nfsd4_cstate_clear_replay(cstate);
>           move_to_close_lru(stp, net);
>     }
>
> It would make almost no behavioural difference as
> need_to_move_close_list is never true for v4.1 and later and almost
> always true for v4.0, and nfsd4_cstate_clear_replay() does nothing for
> v4.1 and later.
> The only time behaviour would interrestingly different is when
> nfsd4_close_open_stateid() found the state was already unlocked.  Then
> need_move_to_close_list would be false, but nfsd4_cstate_clear_replay()
> wouldn't be a no-op.
>
> I thought the code was a little simpler the way I wrote it.  We don't
> need the need_move_to_close_list guard on nfsd4_cstate_clear_replay(),
> so I left it unguarded.
> But I'm happy to change it if you can give a good reason - or even if
> you just think it is clearer the other way.

My thinking is that if move_to_close_lru is not called then why bother
to do nfsd4_cstate_clear_replay. It's just easier to understand and
safer (against future changes) than having to go through all possible
scenarios to make sure it's safe to call nfsd4_cstate_clear_replay
regardless.

-Dai

>
> Thanks,
> NeilBrown
>
>> -Dai
>>
>>> +	if (need_move_to_close_list)
>>> +		move_to_close_lru(stp, net);
>>>    
>>>    	/* v4.1+ suggests that we send a special stateid in here, since the
>>>    	 * clients should just ignore this anyway. Since this is not useful

next prev parent reply	other threads:[~2023-12-23 18:07 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-22  1:41 [PATCH] nfsd: drop st_mutex before calling move_to_close_lru() NeilBrown
2023-12-22  2:12 ` [PATCH v2] nfsd: drop st_mutex and rp_mutex " NeilBrown
2023-12-22 14:39   ` Chuck Lever
2023-12-22 20:15   ` dai.ngo
2023-12-22 23:01     ` NeilBrown
2023-12-23 18:07       ` dai.ngo [this message]
  -- strict thread matches above, loose matches on Subject: below --
2024-01-17  3:48 NeilBrown
2024-01-17 18:19 ` Jeff Layton
2024-02-28 17:40 ` Jeff Layton
2024-02-28 19:30   ` Jeff Layton
2024-02-29 14:00   ` Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=620255dc-f16b-400f-bc53-cb7298d1d0e9@oracle.com \
    --to=dai.ngo@oracle.com \
    --cc=bfields@fieldses.org \
    --cc=chuck.lever@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=kolga@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).