Linux NFS development
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Michel Lespinasse <walken-Y93EPB1FQwg@public.gmane.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Fwd: NFS 5-minute hangs upon S3 resume using 2.6.27 client
Date: Thu, 23 Oct 2008 19:17:59 -0400
Message-ID: <1224803879.7625.79.camel@localhost>
In-Reply-To: <20081023195231.GA2090-Y93EPB1FQwg@public.gmane.org>

On Thu, 2008-10-23 at 12:52 -0700, Michel Lespinasse wrote:
> Hi,
> 
> On Thu, Oct 23, 2008 at 11:36:47AM -0400, Trond Myklebust wrote:
> > Does the appended patch make a difference?
> > 
> > From: Trond Myklebust <Trond.Myklebust@netapp.com>
> > Date: Thu, 23 Oct 2008 11:33:59 -0400
> > SUNRPC: Respond promptly to server TCP resets
> 
> I applied it over a 2.6.27.3 base, suspended the client for 40 minutes
> and resumed it, logging what happens from the server side. The resume
> went like this:
> 
> 12:38:53.692785 IP client.329262748 > server.nfs: 100 getattr [|nfs]
> 12:38:53.699885 arp who-has client tell server
> 12:38:54.123793 IP client.329262748 > server.nfs: 100 getattr [|nfs]
> 12:38:54.695888 arp who-has client tell server
> 12:38:54.696011 arp reply client is-at 00:19:d1:54:0e:39 (oui Unknown)
> 12:38:54.696020 IP server.nfs > client.882: R 2944642919:2944642919(0) win 0
> 12:38:54.696024 IP server.nfs > client.882: R 2944642919:2944642919(0) win 0
> 
> (I'm still concerned about the 3 second delay here...)
> 
> 12:38:57.695956 IP client.2 > server.nfs: 0 null

Does this patch fix that delay?

Cheers
  Trond

-------------------------------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 23 Oct 2008 19:14:55 -0400
SUNRPC: Fix the setting of xprt->reestablish_timeout when reconnecting

If the server aborts an established connection, then we should retry
connecting immediately. Since xprt->reestablish_timeout is not reset unless
we go through a TCP_FIN_WAIT1 state, we may end up waiting.
The fix is to reset xprt->reestablish_timeout in TCP_ESTABLISHED, and then
rely on the fact that we set it to non-zero values in all other cases when
the server closes the connection.

Also fix a race between xs_connect() and xs_tcp_state_change(). The
value of xprt->reestablish_timeout should be updated before we actually
attempt the connection.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/xprtsock.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 0a50361..ac2aa52 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1144,6 +1144,8 @@ static void xs_tcp_state_change(struct sock *sk)
 			struct sock_xprt *transport = container_of(xprt,
 					struct sock_xprt, xprt);
 
+			xprt->reestablish_timeout = 0;
+
 			/* Reset TCP record info */
 			transport->tcp_offset = 0;
 			transport->tcp_reclen = 0;
@@ -1158,7 +1160,6 @@ static void xs_tcp_state_change(struct sock *sk)
 	case TCP_FIN_WAIT1:
 		/* The client initiated a shutdown of the socket */
 		xprt->connect_cookie++;
-		xprt->reestablish_timeout = 0;
 		set_bit(XPRT_CLOSING, &xprt->state);
 		smp_mb__before_clear_bit();
 		clear_bit(XPRT_CONNECTED, &xprt->state);
@@ -1793,6 +1794,7 @@ static void xs_connect(struct rpc_task *task)
 {
 	struct rpc_xprt *xprt = task->tk_xprt;
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+	unsigned long timeout;
 
 	if (xprt_test_and_set_connecting(xprt))
 		return;
@@ -1801,12 +1803,12 @@ static void xs_connect(struct rpc_task *task)
 		dprintk("RPC:       xs_connect delayed xprt %p for %lu "
 				"seconds\n",
 				xprt, xprt->reestablish_timeout / HZ);
-		queue_delayed_work(rpciod_workqueue,
-				   &transport->connect_worker,
-				   xprt->reestablish_timeout);
+		timeout = xprt->reestablish_timeout;
 		xprt->reestablish_timeout <<= 1;
 		if (xprt->reestablish_timeout > XS_TCP_MAX_REEST_TO)
 			xprt->reestablish_timeout = XS_TCP_MAX_REEST_TO;
+		queue_delayed_work(rpciod_workqueue,
+				   &transport->connect_worker, timeout);
 	} else {
 		dprintk("RPC:       xs_connect scheduled xprt %p\n", xprt);
 		queue_delayed_work(rpciod_workqueue,
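
The ordering change in xs_connect() and the reset in TCP_ESTABLISHED can be
modelled in user space. What follows is only a rough sketch, not kernel code:
struct model_xprt, the model_* helpers and the constants 3 and 300 are
hypothetical names and values chosen for illustration (the real limits are
expressed in jiffies and are not shown in this patch). It snapshots the delay
first, updates the backoff next, and only then hands the snapshot to the
scheduler, so a concurrent state change can no longer be overwritten by a
late update.

```c
#include <assert.h>

/* Illustrative limits in "model seconds"; assumptions, not kernel values. */
#define MODEL_INIT_TIMEOUT 3UL
#define MODEL_MAX_TIMEOUT  300UL

/* Hypothetical stand-in for the one rpc_xprt field the patch touches. */
struct model_xprt {
	unsigned long reestablish_timeout; /* 0 means reconnect immediately */
};

/*
 * Same ordering as the patched xs_connect(): take a snapshot of the
 * current delay, double and clamp the stored value, then return the
 * snapshot as the delay to schedule the connect worker with.
 */
static unsigned long model_next_delay(struct model_xprt *x)
{
	unsigned long timeout = x->reestablish_timeout;

	if (timeout != 0) {
		x->reestablish_timeout <<= 1;
		if (x->reestablish_timeout > MODEL_MAX_TIMEOUT)
			x->reestablish_timeout = MODEL_MAX_TIMEOUT;
	}
	return timeout;
}

/* Same effect as the line added under TCP_ESTABLISHED: clear the backoff. */
static void model_established(struct model_xprt *x)
{
	x->reestablish_timeout = 0;
}

/* Walks through a few failed attempts, then a successful connection. */
static void model_selfcheck(void)
{
	struct model_xprt x = { MODEL_INIT_TIMEOUT };

	assert(model_next_delay(&x) == 3);  /* stored value becomes 6 */
	assert(model_next_delay(&x) == 6);  /* stored value becomes 12 */
	assert(model_next_delay(&x) == 12); /* stored value becomes 24 */

	model_established(&x);              /* connection came up */
	assert(model_next_delay(&x) == 0);  /* next reconnect is immediate */
}
```

Calling model_selfcheck() from a small main() exercises the sequence. The
point of the model is the last assertion: once a connection has been
established, a later server reset starts from a zero delay instead of
inheriting the backoff left over from earlier attempts.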


