public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] SUNRPC: Fix a potential race in xprt_connect()
Date: Mon, 3 Dec 2018 17:15:11 +0000	[thread overview]
Message-ID: <3ee22a93e7ac2853605ff6833af76a59bb5516c3.camel@hammerspace.com> (raw)
In-Reply-To: <8D79F3BF-74BE-418A-85A8-7E8AF508B0AE@oracle.com>

On Mon, 2018-12-03 at 10:56 -0500, Chuck Lever wrote:
> > On Dec 2, 2018, at 12:26 PM, Trond Myklebust <trondmy@gmail.com>
> > wrote:
> > 
> > On Sun, 2018-12-02 at 12:07 -0500, Chuck Lever wrote:
> > > Test report.
> > > 
> > > I applied this on top of the three v4 patches from Friday.
> > > 
> > > With TCP, I no longer see the soft IRQ warnings or bdev leaks.
> > > However, at some point during my test (sec=krb5p,vers=3) the
> > > mount stops responding and the client becomes idle. ^C leaves
> > > the iozone processes in D state.
> > 
> > Does /proc/*/stack show any tasks with any interesting stack
> > behaviour?
> > 
> > > With RDMA, I can see a few issues:
> > > - A lot of -74 and server disconnects. I suspect this is due
> > >   to GSS sequence window overruns
> > > - xprt_connect puts the calling task to sleep if the credit
> > >   limit has been reached, and that can result in a deadlock
> > > - The reconnect logic appears to busy-wait until something
> > >   else (eg, an RTO) forces a reconnect
> > > - A connect or disconnect wake-up that occurs while an RPC
> > >   reply is being handled (ie, is still on xprt->pending)
> > >   results in that RPC being retried unnecessarily.
> > > 
> > > I'm not sure how to address that last one. Sometimes RDMA has
> > > to invalidate MRs, and that involves a wait/context swith which
> > > opens the race window significantly.
> > 
> > In principle the latter issue is supposed to be handled by the
> > connect
> > cookies. Are the updates perhaps being ordered incorrectly w.r.t.
> > the
> > xprt->pending wakeup? If so, then that could cause races.
> > 
> > I'd expect the cookie would need to be updated before the call to
> > xprt-
> > pending both on connect and disconnect.
> 
> I've found one peculiar cookie-related behavior.
> 
> At some point, xprt_request_retransmit_after_disconnect() starts
> to claim that all requests have to be retransmitted and that the
> connection is closed. Closer inspection shows that the cookie
> check here is failing because it seems to be testing a request
> that hasn't been transmitted yet.

Right, but as far as the transport layer is concerned, there should be
no difference between 'need transmit' and 'need retransmit'. It is up
to the higher RPC client layer to decide whether or not there is a
policy difference in behaviour, and AFAICS rpc_task_need_encode()
should be doing the right thing.

> > > > On Dec 2, 2018, at 9:42 AM, Trond Myklebust <trondmy@gmail.com>
> > > > wrote:
> > > > 
> > > > If an asynchronous connection attempt completes while another
> > > > task
> > > > is
> > > > in xprt_connect(), then the call to rpc_sleep_on() could end up
> > > > racing with the call to xprt_wake_pending_tasks().
> > > > So add a second test of the connection state after we've put
> > > > the
> > > > task to sleep and set the XPRT_CONNECTING flag, when we know
> > > > that
> > > > there
> > > > can be no asynchronous connection attempts still in progress.
> > > > 
> > > > Fixes: 0b9e79431377d ("SUNRPC: Move the test for
> > > > XPRT_CONNECTING
> > > > into...")
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com
> > > > >
> > > > ---
> > > > net/sunrpc/xprt.c | 11 +++++++++--
> > > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> > > > index 122c91c28e7c..ce927002862a 100644
> > > > --- a/net/sunrpc/xprt.c
> > > > +++ b/net/sunrpc/xprt.c
> > > > @@ -826,8 +826,15 @@ void xprt_connect(struct rpc_task *task)
> > > > 			return;
> > > > 		if (xprt_test_and_set_connecting(xprt))
> > > > 			return;
> > > > -		xprt->stat.connect_start = jiffies;
> > > > -		xprt->ops->connect(xprt, task);
> > > > +		/* Race breaker */
> > > > +		if (!xprt_connected(xprt)) {
> > > > +			xprt->stat.connect_start = jiffies;
> > > > +			xprt->ops->connect(xprt, task);
> > > > +		} else {
> > > > +			xprt_clear_connecting(xprt);
> > > > +			task->tk_status = 0;
> > > > +			rpc_wake_up_queued_task(&xprt->pending, 
> > > > task);
> > > > +		}
> > > > 	}
> > > > 	xprt_release_write(xprt, task);
> > > > }
> > > > -- 
> > > > 2.19.2
> > > > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2018-12-03 17:15 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-02 14:42 [PATCH] SUNRPC: Fix a potential race in xprt_connect() Trond Myklebust
2018-12-02 17:07 ` Chuck Lever
2018-12-02 17:26   ` Trond Myklebust
2018-12-03 15:56     ` Chuck Lever
2018-12-03 17:15       ` Trond Myklebust [this message]
2018-12-03 19:00         ` Chuck Lever
2018-12-04 20:04     ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3ee22a93e7ac2853605ff6833af76a59bb5516c3.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox