From: dai.ngo@oracle.com
To: NeilBrown <neilb@suse.de>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Jeff Layton <jlayton@kernel.org>,
"J. Bruce Fields" <bfields@fieldses.org>,
Olga Kornievskaia <kolga@netapp.com>, Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2] nfsd: drop st_mutex and rp_mutex before calling move_to_close_lru()
Date: Sat, 23 Dec 2023 10:07:30 -0800 [thread overview]
Message-ID: <620255dc-f16b-400f-bc53-cb7298d1d0e9@oracle.com> (raw)
In-Reply-To: <170328606817.11005.12343520715219354379@noble.neil.brown.name>
On 12/22/23 3:01 PM, NeilBrown wrote:
> On Sat, 23 Dec 2023, dai.ngo@oracle.com wrote:
>> On 12/21/23 6:12 PM, NeilBrown wrote:
>>> move_to_close_lru() is currently called with ->st_mutex and .rp_mutex held.
>>> This can lead to a deadlock as move_to_close_lru() waits for sc_count to
>>> drop to 2, and some threads holding a reference might be waiting for either
>>> mutex. These references will never be dropped so sc_count will never
>>> reach 2.
>> Yes, I think there is potential deadlock here since both nfs4_preprocess_seqid_op
>> and nfsd4_find_and_lock_existing_open can increment the sc_count but then
>> have to wait for the st_mutex which being held by move_to_close_lru.
>>
>>> There can be no harm in dropping ->st_mutex to before
>>> move_to_close_lru() because the only place that takes the mutex is
>>> nfsd4_lock_ol_stateid(), and it quickly aborts if sc_type is
>>> NFS4_CLOSED_STID, which it will be before move_to_close_lru() is called.
>>>
>>> Similarly dropping .rp_mutex is safe after the state is closed and so
>>> no longer usable. Another way to look at this is that nothing
>>> significant happens between when nfsd4_close() now calls
>>> nfsd4_cstate_clear_replay(), and where nfsd4_proc_compound calls
>>> nfsd4_cstate_clear_replay() a little later.
>>>
>>> See also
>>> https://lore.kernel.org/lkml/4dd1fe21e11344e5969bb112e954affb@jd.com/T/
>>> where this problem was raised but not successfully resolved.
>>>
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>> ---
>>>
>>> Sorry - I posted v1 a little hastily. I need to drop rp_mutex as well
>>> to avoid the deadlock. This also is safe.
>>>
>>> fs/nfsd/nfs4state.c | 12 ++++++++----
>>> 1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>> index 40415929e2ae..453714fbcd66 100644
>>> --- a/fs/nfsd/nfs4state.c
>>> +++ b/fs/nfsd/nfs4state.c
>>> @@ -7055,7 +7055,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
>>> return status;
>>> }
>>>
>>> -static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
>>> +static bool nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
>>> {
>>> struct nfs4_client *clp = s->st_stid.sc_client;
>>> bool unhashed;
>>> @@ -7072,11 +7072,11 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
>>> list_for_each_entry(stp, &reaplist, st_locks)
>>> nfs4_free_cpntf_statelist(clp->net, &stp->st_stid);
>>> free_ol_stateid_reaplist(&reaplist);
>>> + return false;
>>> } else {
>>> spin_unlock(&clp->cl_lock);
>>> free_ol_stateid_reaplist(&reaplist);
>>> - if (unhashed)
>>> - move_to_close_lru(s, clp->net);
>>> + return unhashed;
>>> }
>>> }
>>>
>>> @@ -7092,6 +7092,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> struct nfs4_ol_stateid *stp;
>>> struct net *net = SVC_NET(rqstp);
>>> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>>> + bool need_move_to_close_list;
>>>
>>> dprintk("NFSD: nfsd4_close on file %pd\n",
>>> cstate->current_fh.fh_dentry);
>>> @@ -7114,8 +7115,11 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>>> */
>>> nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid);
>>>
>>> - nfsd4_close_open_stateid(stp);
>>> + need_move_to_close_list = nfsd4_close_open_stateid(stp);
>>> mutex_unlock(&stp->st_mutex);
>>> + nfsd4_cstate_clear_replay(cstate);
>> should nfsd4_cstate_clear_replay be called only if need_move_to_close_list
>> is true?
> It certain could be done like that.
>
> if (need_move_to_close_list) {
> nfsd4_cstate_clear_replay(cstate);
> move_to_close_lru(stp, net);
> }
>
> It would make almost no behavioural difference as
> need_to_move_close_list is never true for v4.1 and later and almost
> always true for v4.0, and nfsd4_cstate_clear_replay() does nothing for
> v4.1 and later.
> The only time behaviour would interrestingly different is when
> nfsd4_close_open_stateid() found the state was already unlocked. Then
> need_move_to_close_list would be false, but nfsd4_cstate_clear_replay()
> wouldn't be a no-op.
>
> I thought the code was a little simpler the way I wrote it. We don't
> need the need_move_to_close_list guard on nfsd4_cstate_clear_replay(),
> so I left it unguarded.
> But I'm happy to change it if you can give a good reason - or even if
> you just think it is clearer the other way.
My thinking is that if move_to_close_lru is not called then why bother
to do nfsd4_cstate_clear_replay. It's just easier to understand and
safer (against future changes) than having to go through all possible
scenarios to make sure it's safe to call nfsd4_cstate_clear_replay
regardless.
-Dai
>
> Thanks,
> NeilBrown
>
>> -Dai
>>
>>> + if (need_move_to_close_list)
>>> + move_to_close_lru(stp, net);
>>>
>>> /* v4.1+ suggests that we send a special stateid in here, since the
>>> * clients should just ignore this anyway. Since this is not useful
next prev parent reply other threads:[~2023-12-23 18:07 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-22 1:41 [PATCH] nfsd: drop st_mutex before calling move_to_close_lru() NeilBrown
2023-12-22 2:12 ` [PATCH v2] nfsd: drop st_mutex and rp_mutex " NeilBrown
2023-12-22 14:39 ` Chuck Lever
2023-12-22 20:15 ` dai.ngo
2023-12-22 23:01 ` NeilBrown
2023-12-23 18:07 ` dai.ngo [this message]
-- strict thread matches above, loose matches on Subject: below --
2024-01-17 3:48 NeilBrown
2024-01-17 18:19 ` Jeff Layton
2024-02-28 17:40 ` Jeff Layton
2024-02-28 19:30 ` Jeff Layton
2024-02-29 14:00 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=620255dc-f16b-400f-bc53-cb7298d1d0e9@oracle.com \
--to=dai.ngo@oracle.com \
--cc=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=jlayton@kernel.org \
--cc=kolga@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).