linux-nfs.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: NeilBrown <neilb@suse.de>,
	linux-nfs@vger.kernel.org, Olga Kornievskaia <kolga@netapp.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Subject: Re: [PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()
Date: Mon, 04 Mar 2024 19:11:35 -0500	[thread overview]
Message-ID: <4000f063a5c4e41d863b179801682d796f0f64fe.camel@kernel.org> (raw)
In-Reply-To: <ZeZZru5hL_77IRP_@manet.1015granger.net>

On Mon, 2024-03-04 at 18:30 -0500, Chuck Lever wrote:
> On Mon, Mar 04, 2024 at 06:11:24PM -0500, Jeff Layton wrote:
> > On Mon, 2024-03-04 at 17:54 -0500, Chuck Lever wrote:
> > > On Tue, Mar 05, 2024 at 09:36:29AM +1100, NeilBrown wrote:
> > > > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > > > On Tue, Mar 05, 2024 at 08:45:45AM +1100, NeilBrown wrote:
> > > > > > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > > > > > On Mon, Mar 04, 2024 at 03:40:21PM +1100, NeilBrown wrote:
> > > > > > > > move_to_close_lru() waits for sc_count to become zero while holding
> > > > > > > > rp_mutex.  This can deadlock if another thread holds a reference and is
> > > > > > > > waiting for rp_mutex.
> > > > > > > > 
> > > > > > > > By the time we get to move_to_close_lru() the openowner is unhashed and
> > > > > > > > cannot be found any more.  So code waiting for the mutex can safely
> > > > > > > > retry the lookup if move_to_close_lru() has started.
> > > > > > > > 
> > > > > > > > So change rp_mutex to an atomic_t with three states:
> > > > > > > > 
> > > > > > > >  RP_UNLOCKED - state is still hashed, not locked for reply
> > > > > > > >  RP_LOCKED   - state is still hashed, is locked for reply
> > > > > > > >  RP_UNHASHED - state is not hashed, no code can get a lock.
> > > > > > > > 
> > > > > > > > Use wait_var_event() to wait for either a lock, or for the owner to be
> > > > > > > > unhashed.  In the latter case, retry the lookup.
> > > > > > > > 
> > > > > > > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > > > > > > ---
> > > > > > > >  fs/nfsd/nfs4state.c | 38 +++++++++++++++++++++++++++++++-------
> > > > > > > >  fs/nfsd/state.h     |  2 +-
> > > > > > > >  2 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > > > index 690d0e697320..47e879d5d68a 100644
> > > > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > > > @@ -4430,21 +4430,32 @@ nfsd4_init_leases_net(struct nfsd_net *nn)
> > > > > > > >  	atomic_set(&nn->nfsd_courtesy_clients, 0);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > +enum rp_lock {
> > > > > > > > +	RP_UNLOCKED,
> > > > > > > > +	RP_LOCKED,
> > > > > > > > +	RP_UNHASHED,
> > > > > > > > +};
> > > > > > > > +
> > > > > > > >  static void init_nfs4_replay(struct nfs4_replay *rp)
> > > > > > > >  {
> > > > > > > >  	rp->rp_status = nfserr_serverfault;
> > > > > > > >  	rp->rp_buflen = 0;
> > > > > > > >  	rp->rp_buf = rp->rp_ibuf;
> > > > > > > > -	mutex_init(&rp->rp_mutex);
> > > > > > > > +	atomic_set(&rp->rp_locked, RP_UNLOCKED);
> > > > > > > >  }
> > > > > > > >  
> > > > > > > > -static void nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > > > -		struct nfs4_stateowner *so)
> > > > > > > > +static int nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > > > +				      struct nfs4_stateowner *so)
> > > > > > > >  {
> > > > > > > >  	if (!nfsd4_has_session(cstate)) {
> > > > > > > > -		mutex_lock(&so->so_replay.rp_mutex);
> > > > > > > > +		wait_var_event(&so->so_replay.rp_locked,
> > > > > > > > +			       atomic_cmpxchg(&so->so_replay.rp_locked,
> > > > > > > > +					      RP_UNLOCKED, RP_LOCKED) != RP_LOCKED);
> > > > > > > 
> > > > > > > What reliably prevents this from being a "wait forever" ?
> > > > > > 
> > > > > > That same thing that reliably prevented the original mutex_lock from
> > > > > > waiting forever.
> 
> Note that this patch fixes a deadlock here. So clearly, there /were/
> situations where "waiting forever" was possible with the mutex version
> of this code.
> 
> 
> > > > > > It waits until rp_locked is set to RP_UNLOCKED, which is precisely when
> > > > > > we previously called mutex_unlock.  But it *also* aborts the wait if
> > > > > > rp_locked is set to RP_UNHASHED - so it is now more reliable.
> > > > > > 
> > > > > > Does that answer the question?
> > > > > 
> > > > > Hm. I guess then we are no worse off with wait_var_event().
> > > > > 
> > > > > I'm not as familiar with this logic as perhaps I should be. How long
> > > > > does it take for the wake-up to occur, typically?
> > > > 
> > > > wait_var_event() is paired with wake_up_var().
> > > > The wake up happens when wake_up_var() is called, which in this code is
> > > > always immediately after atomic_set() updates the variable.
> > > 
> > > I'm trying to ascertain the actual wall-clock time that the nfsd thread
> > > is sleeping, at most. Is this going to be a possible DoS vector? Can
> > > it impact the ability for the server to shut down without hanging?
> > 
> > Prior to this patch, there was a mutex in play here and we just released
> > it to wake up the waiters. This is more or less doing the same thing, it
> > just indicates the resulting state better.
> 
> Well, it adds a third state so that a recovery action can be taken
> on wake-up in some cases. That avoids a deadlock, so this does count
> as a bug fix.
> 
> 
> > I doubt this will materially change how long the tasks are waiting.
> 
> It might not be a longer wait, but it still seems difficult to prove
> that the wait_var_event() will /always/ be awoken somehow.
> 

I'm not sure what other guarantees we can reasonably offer. The v4.0
replay handling requires an exclusive lock of some sort, so you can
always get stuck if the wakeup never comes.

We typically hold this mutex today over seqid morphing operations in
v4.0, so if those get stuck doing something (a vfs_open or something?)
you'll get stuck here too.
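
For anyone following along, the tri-state scheme under discussion can be
sketched in plain C. This is a userland analogue, not the kernel code: a
pthread mutex/condvar stands in for wait_var_event()/wake_up_var(), the
kernel patch does the UNLOCKED->LOCKED transition with atomic_cmpxchg()
rather than under a mutex, and the struct and helper names here are
illustrative only:

```c
#include <pthread.h>
#include <stdbool.h>

enum rp_lock { RP_UNLOCKED, RP_LOCKED, RP_UNHASHED };

struct replay {
	enum rp_lock state;
	pthread_mutex_t mtx;
	pthread_cond_t cond;
};

/* Take the reply lock. Returns false if the owner was unhashed,
 * in which case the caller must retry its stateowner lookup. */
static bool replay_lock(struct replay *rp)
{
	bool locked = false;

	pthread_mutex_lock(&rp->mtx);
	for (;;) {
		if (rp->state == RP_UNLOCKED) {
			/* the kernel patch does this with atomic_cmpxchg() */
			rp->state = RP_LOCKED;
			locked = true;
			break;
		}
		if (rp->state == RP_UNHASHED)
			break;	/* abort the wait instead of deadlocking */
		pthread_cond_wait(&rp->cond, &rp->mtx);
	}
	pthread_mutex_unlock(&rp->mtx);
	return locked;
}

/* Unlock (RP_UNLOCKED) or unhash (RP_UNHASHED) and wake all waiters;
 * mirrors atomic_set() followed immediately by wake_up_var(). */
static void replay_set_and_wake(struct replay *rp, enum rp_lock state)
{
	pthread_mutex_lock(&rp->mtx);
	rp->state = state;
	pthread_cond_broadcast(&rp->cond);
	pthread_mutex_unlock(&rp->mtx);
}
```

The point of the third state shows up in replay_lock(): a waiter woken
after move_to_close_lru() has set RP_UNHASHED returns and retries the
lookup instead of sleeping again, which is what breaks the deadlock.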
-- 
Jeff Layton <jlayton@kernel.org>
