linux-nfs.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org, Olga Kornievskaia <kolga@netapp.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Subject: Re: [PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()
Date: Mon, 04 Mar 2024 18:11:24 -0500	[thread overview]
Message-ID: <477e896777379d9b060a735a3873e2ea3096f76f.camel@kernel.org> (raw)
In-Reply-To: <ZeZRC21DdOkuKroo@manet.1015granger.net>

On Mon, 2024-03-04 at 17:54 -0500, Chuck Lever wrote:
> On Tue, Mar 05, 2024 at 09:36:29AM +1100, NeilBrown wrote:
> > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > On Tue, Mar 05, 2024 at 08:45:45AM +1100, NeilBrown wrote:
> > > > On Tue, 05 Mar 2024, Chuck Lever wrote:
> > > > > On Mon, Mar 04, 2024 at 03:40:21PM +1100, NeilBrown wrote:
> > > > > > move_to_close_lru() waits for sc_count to become zero while holding
> > > > > > rp_mutex.  This can deadlock if another thread holds a reference and is
> > > > > > waiting for rp_mutex.
> > > > > > 
> > > > > > By the time we get to move_to_close_lru() the openowner is unhashed and
> > > > > > cannot be found any more.  So code waiting for the mutex can safely
> > > > > > retry the lookup if move_to_close_lru() has started.
> > > > > > 
> > > > > > So change rp_mutex to an atomic_t with three states:
> > > > > > 
> > > > > >  RP_UNLOCK   - state is still hashed, not locked for reply
> > > > > >  RP_LOCKED   - state is still hashed, is locked for reply
> > > > > >  RP_UNHASHED - state is not hashed, no code can get a lock.
> > > > > > 
> > > > > > Use wait_var_event() to wait for either a lock, or for the owner to be
> > > > > > unhashed.  In the latter case, retry the lookup.
> > > > > > 
> > > > > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > > > > ---
> > > > > >  fs/nfsd/nfs4state.c | 38 +++++++++++++++++++++++++++++++-------
> > > > > >  fs/nfsd/state.h     |  2 +-
> > > > > >  2 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index 690d0e697320..47e879d5d68a 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > @@ -4430,21 +4430,32 @@ nfsd4_init_leases_net(struct nfsd_net *nn)
> > > > > >  	atomic_set(&nn->nfsd_courtesy_clients, 0);
> > > > > >  }
> > > > > >  
> > > > > > +enum rp_lock {
> > > > > > +	RP_UNLOCKED,
> > > > > > +	RP_LOCKED,
> > > > > > +	RP_UNHASHED,
> > > > > > +};
> > > > > > +
> > > > > >  static void init_nfs4_replay(struct nfs4_replay *rp)
> > > > > >  {
> > > > > >  	rp->rp_status = nfserr_serverfault;
> > > > > >  	rp->rp_buflen = 0;
> > > > > >  	rp->rp_buf = rp->rp_ibuf;
> > > > > > -	mutex_init(&rp->rp_mutex);
> > > > > > +	atomic_set(&rp->rp_locked, RP_UNLOCKED);
> > > > > >  }
> > > > > >  
> > > > > > -static void nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > -		struct nfs4_stateowner *so)
> > > > > > +static int nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
> > > > > > +				      struct nfs4_stateowner *so)
> > > > > >  {
> > > > > >  	if (!nfsd4_has_session(cstate)) {
> > > > > > -		mutex_lock(&so->so_replay.rp_mutex);
> > > > > > +		wait_var_event(&so->so_replay.rp_locked,
> > > > > > +			       atomic_cmpxchg(&so->so_replay.rp_locked,
> > > > > > +					      RP_UNLOCKED, RP_LOCKED) != RP_LOCKED);
> > > > > 
> > > > > What reliably prevents this from being a "wait forever" ?
> > > > 
> > > > That same thing that reliably prevented the original mutex_lock from
> > > > waiting forever.
> > > > It waits until rp_locked is set to RP_UNLOCKED, which is precisely when
> > > > we previously called mutex_unlock.  But it *also* aborts the wait if
> > > > rp_locked is set to RP_UNHASHED - so it is now more reliable.
> > > > 
> > > > Does that answer the question?
> > > 
> > > Hm. I guess then we are no worse off with wait_var_event().
> > > 
> > > I'm not as familiar with this logic as perhaps I should be. How long
> > > does it take for the wake-up to occur, typically?
> > 
> > wait_var_event() is paired with wake_up_var().
> > The wake up happens when wake_up_var() is called, which in this code is
> > always immediately after atomic_set() updates the variable.
> 
> I'm trying to ascertain the actual wall-clock time that the nfsd thread
> is sleeping, at most. Is this going to be a possible DoS vector? Can
> it impact the ability for the server to shut down without hanging?
> 
> 

Prior to this patch, there was a mutex in play here, and we simply
released it to wake up the waiters. This does more or less the same
thing; it just indicates the resulting state more clearly.

I doubt this will materially change how long the tasks are waiting.
-- 
Jeff Layton <jlayton@kernel.org>


Thread overview: 23+ messages
2024-03-04  4:40 [PATCH 0/4 v2] nfsd: fix deadlock in move_to_close_lru() NeilBrown
2024-03-04  4:40 ` [PATCH 1/4] nfsd: move nfsd4_cstate_assign_replay() earlier in open handling NeilBrown
2024-03-04 12:42   ` Jeff Layton
2024-03-04  4:40 ` [PATCH 2/4] nfsd: perform all find_openstateowner_str calls in the one place NeilBrown
2024-03-04 12:46   ` Jeff Layton
2024-03-04 21:41     ` NeilBrown
2024-03-04 23:04       ` Jeff Layton
2024-03-04  4:40 ` [PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru() NeilBrown
2024-03-04 12:50   ` Jeff Layton
2024-03-04 14:09   ` Chuck Lever
2024-03-04 21:45     ` NeilBrown
2024-03-04 22:03       ` Chuck Lever
2024-03-04 22:36         ` NeilBrown
2024-03-04 22:54           ` Chuck Lever
2024-03-04 23:04             ` NeilBrown
2024-03-04 23:11             ` Jeff Layton [this message]
2024-03-04 23:30               ` Chuck Lever
2024-03-05  0:05                 ` NeilBrown
2024-03-05  0:11                 ` Jeff Layton
2024-03-04  4:40 ` [PATCH 4/4] nfsd: drop st_mutex_mutex before calling move_to_close_lru() NeilBrown
2024-03-04 12:52   ` Jeff Layton
  -- strict thread matches above, loose matches on Subject: below --
2024-03-04 22:45 [PATCH 0/4 v3] nfsd: fix deadlock in move_to_close_lru() NeilBrown
2024-03-04 22:45 ` [PATCH 3/4] nfsd: replace rp_mutex to avoid " NeilBrown
2024-04-08  2:09 [PATCH 0/4 v4] nfsd: fix " NeilBrown
2024-04-08  2:09 ` [PATCH 3/4] nfsd: replace rp_mutex to avoid " NeilBrown
