From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Michel Lespinasse <walken-Y93EPB1FQwg@public.gmane.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: Fwd: NFS 5-minute hangs upon S3 resume using 2.6.27 client
Date: Thu, 23 Oct 2008 11:36:47 -0400 [thread overview]
Message-ID: <1224776207.7625.7.camel@localhost> (raw)
In-Reply-To: <20081023040231.GA13512-Y93EPB1FQwg@public.gmane.org>
On Wed, 2008-10-22 at 21:02 -0700, Michel Lespinasse wrote:
> I sent this out to LKML earlier but was told this would be a better list.
> This has been mentionned in bugzilla already, but I'd like to draw attention
> before it gets too late for 2.6.28 (or is it already too late ???)
>
> The following is a common cause of 5-minute NFS hangs here:
>
> * Client has TCP connections to the NFS server, goes to S3 sleep for few hours.
> * TCP connections die on the server side.
> (not 100% sure why, do they use some kind of keepalive ???)
> * Client resumes from S3.
> * Client sends NFS requests down its TCP connections, gets back RST packet.
> * [Client hangs for exactly 300 seconds here]
> * Client establishes new TCP connections to the NFS server,
> and recovers from the hang.
>
> A tcpdump trace is attached at the end of bugzilla bug 11154:
> http://bugzilla.kernel.org/show_bug.cgi?id=11154
>
> Should the client immediately try to reconnect when its existing connection
> receives an RST packet ? (the 5 minute delay would make sense to me if
> RST was received in reply to a SYN, but I'm not sure about it in the case
> of an existing open TCP connection).
>
> If the 5 minute delay after an RST is necessary, could the client avoid it
> by explicitly closing/reopening its connections using suspend/resume hooks ?
>
> (I can not work around the issue locally by mounting/unmounting my NFS
> shares around the suspend/resume because rootfs also on NFS...)
>
> This NFS setup was working fine in 2.6.24. There has been issues with
> 2.6.25 and 2.6.26, but I did not confirm if they are the same bug.
> 2.6.25 usualy recovers after some variable delay and 2.6.26 usualy
> does not recover. Bugs 11154 and 11061 have more details about this,
> also Ian Campbell has been tracking an NFS issue under load that
> appeared at around the same time.
>
> Hope this helps,
Does the appended patch make a difference?
Cheers
Trond
--------------------------------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 23 Oct 2008 11:33:59 -0400
SUNRPC: Respond promptly to server TCP resets
If the server sends us an RST error while we're in the TCP_ESTABLISHED
state, then that will not result in a state change, and so the RPC client
ends up hanging forever (see
http://bugzilla.kernel.org/show_bug.cgi?id=11154)
We can intercept the reset by setting up an sk->sk_error_report callback,
which will then allow us to initiate a proper shutdown and retry...
We also make sure that if the send request receives an ECONNRESET, then we
shutdown too...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
net/sunrpc/xprtsock.c | 58 +++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 48 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9a288d5..0a50361 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -249,6 +249,7 @@ struct sock_xprt {
void (*old_data_ready)(struct sock *, int);
void (*old_state_change)(struct sock *);
void (*old_write_space)(struct sock *);
+ void (*old_error_report)(struct sock *);
};
/*
@@ -698,8 +699,9 @@ static int xs_tcp_send_request(struct rpc_task *task)
case -EAGAIN:
xs_nospace(task);
break;
- case -ECONNREFUSED:
case -ECONNRESET:
+ xs_tcp_shutdown(xprt);
+ case -ECONNREFUSED:
case -ENOTCONN:
case -EPIPE:
status = -ENOTCONN;
@@ -742,6 +744,22 @@ out_release:
xprt_release_xprt(xprt, task);
}
+static void xs_save_old_callbacks(struct sock_xprt *transport, struct sock *sk)
+{
+ transport->old_data_ready = sk->sk_data_ready;
+ transport->old_state_change = sk->sk_state_change;
+ transport->old_write_space = sk->sk_write_space;
+ transport->old_error_report = sk->sk_error_report;
+}
+
+static void xs_restore_old_callbacks(struct sock_xprt *transport, struct sock *sk)
+{
+ sk->sk_data_ready = transport->old_data_ready;
+ sk->sk_state_change = transport->old_state_change;
+ sk->sk_write_space = transport->old_write_space;
+ sk->sk_error_report = transport->old_error_report;
+}
+
/**
* xs_close - close a socket
* @xprt: transport
@@ -765,9 +783,8 @@ static void xs_close(struct rpc_xprt *xprt)
transport->sock = NULL;
sk->sk_user_data = NULL;
- sk->sk_data_ready = transport->old_data_ready;
- sk->sk_state_change = transport->old_state_change;
- sk->sk_write_space = transport->old_write_space;
+
+ xs_restore_old_callbacks(transport, sk);
write_unlock_bh(&sk->sk_callback_lock);
sk->sk_no_check = 0;
@@ -1180,6 +1197,28 @@ static void xs_tcp_state_change(struct sock *sk)
}
/**
+ * xs_tcp_error_report - callback mainly for catching RST events
+ * @sk: socket
+ */
+static void xs_tcp_error_report(struct sock *sk)
+{
+ struct rpc_xprt *xprt;
+
+ read_lock(&sk->sk_callback_lock);
+ if (sk->sk_err != ECONNRESET || sk->sk_state != TCP_ESTABLISHED)
+ goto out;
+ if (!(xprt = xprt_from_sock(sk)))
+ goto out;
+ dprintk("RPC: %s client %p...\n"
+ "RPC: error %d\n",
+ __func__, xprt, sk->sk_err);
+
+ xprt_force_disconnect(xprt);
+out:
+ read_unlock(&sk->sk_callback_lock);
+}
+
+/**
* xs_udp_write_space - callback invoked when socket buffer space
* becomes available
* @sk: socket whose state has changed
@@ -1454,10 +1493,9 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
write_lock_bh(&sk->sk_callback_lock);
+ xs_save_old_callbacks(transport, sk);
+
sk->sk_user_data = xprt;
- transport->old_data_ready = sk->sk_data_ready;
- transport->old_state_change = sk->sk_state_change;
- transport->old_write_space = sk->sk_write_space;
sk->sk_data_ready = xs_udp_data_ready;
sk->sk_write_space = xs_udp_write_space;
sk->sk_no_check = UDP_CSUM_NORCV;
@@ -1589,13 +1627,13 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
write_lock_bh(&sk->sk_callback_lock);
+ xs_save_old_callbacks(transport, sk);
+
sk->sk_user_data = xprt;
- transport->old_data_ready = sk->sk_data_ready;
- transport->old_state_change = sk->sk_state_change;
- transport->old_write_space = sk->sk_write_space;
sk->sk_data_ready = xs_tcp_data_ready;
sk->sk_state_change = xs_tcp_state_change;
sk->sk_write_space = xs_tcp_write_space;
+ sk->sk_error_report = xs_tcp_error_report;
sk->sk_allocation = GFP_ATOMIC;
/* socket options */
next prev parent reply other threads:[~2008-10-23 15:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-23 4:02 Fwd: NFS 5-minute hangs upon S3 resume using 2.6.27 client Michel Lespinasse
[not found] ` <20081023040231.GA13512-Y93EPB1FQwg@public.gmane.org>
2008-10-23 15:36 ` Trond Myklebust [this message]
2008-10-23 19:52 ` Michel Lespinasse
[not found] ` <20081023195231.GA2090-Y93EPB1FQwg@public.gmane.org>
2008-10-23 23:17 ` Trond Myklebust
2008-10-24 6:57 ` Michel Lespinasse
[not found] ` <20081024065759.GA2401-Y93EPB1FQwg@public.gmane.org>
2008-10-24 12:29 ` Trond Myklebust
2008-10-24 21:02 ` Michel Lespinasse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1224776207.7625.7.camel@localhost \
--to=trond.myklebust@fys.uio.no \
--cc=linux-nfs@vger.kernel.org \
--cc=walken-Y93EPB1FQwg@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox