All of lore.kernel.org
 help / color / mirror / Atom feed
From: Simon Kirby <sim@hostway.ca>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Yan-Pai Chen <yanpai.chen@gmail.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
Date: Wed, 19 Sep 2012 15:01:48 -0700	[thread overview]
Message-ID: <20120919220148.GA8249@hostway.ca> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA908F9F734@SACEXCMBX04-PRD.hq.netapp.com>

On Tue, Sep 11, 2012 at 10:17:25PM +0000, Myklebust, Trond wrote:

> Does the "if (test_and_set_bit(XPRT_LOCK) == 0)" condition immediately
> following that succeed so that queue_work() is called?

Yes, it seems to:

[146957.793093] RPC: server initated shutdown -- state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
[146957.793418] xprt_force_disconnect(): setting XPRT_CLOSE_WAIT
[146957.799288] xprt_force_disconnect(): setting XPRT_LOCKED worked, calling queue_work()

On Wed, Sep 12, 2012 at 08:54:14PM +0000, Myklebust, Trond wrote:

> On Tue, 2012-09-11 at 18:17 -0400, Trond Myklebust wrote:
> 
> Hi Simon,
> 
> Can you try the following patch, and see if it addresses the TCP "server
> hangs" case?
> 
> Cheers
>   Trond
> 8<----------------------------------------------------------------------
> From 99330d09cc1074fbdc64089fa0a3f8dbdc74daaf Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Wed, 12 Sep 2012 16:49:15 -0400
> Subject: [PATCH] SUNRPC: Ensure that the TCP socket is closed when in
>  CLOSE_WAIT
> 
> Instead of doing a shutdown() call, we need to do an actual close().
> Ditto if/when the server is sending us junk RPC headers.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>  net/sunrpc/xprtsock.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index a35b8e5..d1988cf 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1025,6 +1025,16 @@ static void xs_udp_data_ready(struct sock *sk, int len)
>  	read_unlock_bh(&sk->sk_callback_lock);
>  }
>  
> +/*
> + * Helper function to force a TCP close if the server is sending
> + * junk and/or it has put us in CLOSE_WAIT
> + */
> +static void xs_tcp_force_close(struct rpc_xprt *xprt)
> +{
> +	set_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
> +	xprt_force_disconnect(xprt);
> +}
> +
>  static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc)
>  {
>  	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
> @@ -1051,7 +1061,7 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
>  	/* Sanity check of the record length */
>  	if (unlikely(transport->tcp_reclen < 8)) {
>  		dprintk("RPC:       invalid TCP record fragment length\n");
> -		xprt_force_disconnect(xprt);
> +		xs_tcp_force_close(xprt);
>  		return;
>  	}
>  	dprintk("RPC:       reading TCP record fragment of length %d\n",
> @@ -1132,7 +1142,7 @@ static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
>  		break;
>  	default:
>  		dprintk("RPC:       invalid request message type\n");
> -		xprt_force_disconnect(&transport->xprt);
> +		xs_tcp_force_close(&transport->xprt);
>  	}
>  	xs_tcp_check_fraghdr(transport);
>  }
> @@ -1455,6 +1465,8 @@ static void xs_tcp_cancel_linger_timeout(struct rpc_xprt *xprt)
>  static void xs_sock_mark_closed(struct rpc_xprt *xprt)
>  {
>  	smp_mb__before_clear_bit();
> +	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
> +	clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
>  	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
>  	clear_bit(XPRT_CLOSING, &xprt->state);
>  	smp_mb__after_clear_bit();
> @@ -1512,8 +1524,8 @@ static void xs_tcp_state_change(struct sock *sk)
>  		break;
>  	case TCP_CLOSE_WAIT:
>  		/* The server initiated a shutdown of the socket */
> -		xprt_force_disconnect(xprt);
>  		xprt->connect_cookie++;
> +		xs_tcp_force_close(xprt);
>  	case TCP_CLOSING:
>  		/*
>  		 * If the server closed down the connection, make sure that
> @@ -2199,8 +2211,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
>  		/* We're probably in TIME_WAIT. Get rid of existing socket,
>  		 * and retry
>  		 */
> -		set_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
> -		xprt_force_disconnect(xprt);
> +		xs_tcp_force_close(xprt);
>  		break;
>  	case -ECONNREFUSED:
>  	case -ECONNRESET:
> -- 
> 1.7.11.4

Yes, based on data collected today, this seems to fix the problem!
Awesome! :)

Simon-

  reply	other threads:[~2012-09-19 22:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-09 19:45 [3.1-rc4] NFSv3 client hang Simon Kirby
2011-09-09 23:18 ` Trond Myklebust
2011-09-09 23:18   ` Trond Myklebust
2011-10-20 19:03   ` Simon Kirby
2012-03-01 22:55     ` Simon Kirby
2012-03-02  0:25       ` Simon Kirby
2012-03-02 18:49         ` [3.2.5] NFSv3 CLOSE_WAIT hang Simon Kirby
2012-09-05  7:49           ` Yan-Pai Chen
2012-09-05 15:09             ` Myklebust, Trond
2012-09-07 13:57               ` Dick Streefland, rnews
2012-09-07 14:13                 ` Myklebust, Trond
2012-09-07 14:33                   ` Dick Streefland, rnews
2012-09-07 15:46                     ` Myklebust, Trond
2012-09-08 19:32                       ` Dick Streefland, rnews
2012-09-10  9:00                         ` Yan-Pai Chen
2012-09-11 19:40                           ` Simon Kirby
2012-09-11 22:17                             ` Myklebust, Trond
2012-09-13  5:22                               ` Yan-Pai Chen
2012-09-13 13:32                                 ` Myklebust, Trond
2012-09-21  7:30                                   ` Yan-Pai Chen
     [not found]                             ` <1347401844.15208.17.camel@lade.trondhjem.org>
2012-09-12 20:54                               ` Myklebust, Trond
2012-09-19 22:01                                 ` Simon Kirby [this message]
2012-09-19 22:11                                   ` Myklebust, Trond
2012-10-12  8:15                                     ` Simon Kirby

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120919220148.GA8249@hostway.ca \
    --to=sim@hostway.ca \
    --cc=Trond.Myklebust@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=yanpai.chen@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.