netdev.vger.kernel.org archive mirror
* [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
@ 2010-08-03  8:14 Andy Chittenden
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Chittenden @ 2010-08-03  8:14 UTC (permalink / raw)
  To: 'David S. Miller', 'Alexey Kuznetsov',
	'Pekka Savola (ipv6', 'James Morris',
	"'Hideaki Y
  Cc: akpm

I don't know whether this patch is the correct fix or not but it enables the
NFS client to recover.

Kernel version: 2.6.34.1 and 2.6.32.

Fixes <https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears down
any previous shutdown attempts so that reconnects on a socket that's been
shut down leave the socket in a usable state (otherwise tcp_sendmsg()
returns -EPIPE).

# diff -up /home/company/software/src/linux-2.6.34.1/net/ipv4/tcp_output.c net/ipv4
--- /home/company/software/src/linux-2.6.34.1/net/ipv4/tcp_output.c     2010-07-27 08:46:46.917000000 +0100
+++ net/ipv4/tcp_output.c       2010-07-27 09:19:16.000000000 +0100
@@ -2522,6 +2522,13 @@ static void tcp_connect_init(struct sock
        struct tcp_sock *tp = tcp_sk(sk);
        __u8 rcv_wscale;

+       /* clear down any previous shutdown attempts so that
+        * reconnects on a socket that's been shutdown leave the
+        * socket in a usable state (otherwise tcp_sendmsg() returns
+        * -EPIPE).
+        */
+       sk->sk_shutdown = 0;
+
        /* We'll fix this up when we get a response from the other end.
         * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
         */

Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>





* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found] <4c57cfe8.887b0e0a.2f79.4772@mx.google.com>
@ 2010-08-03  8:21 ` David Miller
       [not found]   ` <20100803.012144.267950450.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2010-08-03  8:21 UTC (permalink / raw)
  To: andyc.bluearc
  Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, eric.dumazet,
	William.Allen.Simpson, gilad, ilpo.jarvinen, netdev, linux-kernel,
	akpm

From: "Andy Chittenden" <andyc.bluearc@gmail.com>
Date: Tue, 3 Aug 2010 09:14:31 +0100

> I don't know whether this patch is the correct fix or not but it enables the
> NFS client to recover.
> 
> Kernel version: 2.6.34.1 and 2.6.32.
> 
> Fixes <https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears down
> any previous shutdown attempts so that reconnects on a socket that's been
> shutdown leave the socket in a usable state (otherwise tcp_sendmsg() returns
> -EPIPE).

If the SunRPC code wants to close a TCP socket and then use it again,
it should disconnect by doing a connect() with sa_family == AF_UNSPEC.
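
As a userspace illustration of the disconnect-then-reuse sequence described
above (not the SunRPC code itself), the sketch below assumes Linux semantics
and a working loopback interface. The helper names open_listener() and
demo_disconnect_and_reuse() are hypothetical. It checks that send() fails
with EPIPE after shutdown(SHUT_WR), and that the same socket can send again
once it has been disconnected with an AF_UNSPEC connect() and reconnected.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: loopback listener on an ephemeral port. */
static int open_listener(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    listen(fd, 4) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Returns 0 if the socket was reusable after an AF_UNSPEC disconnect,
 * -1 if the test setup itself failed (e.g. no loopback), and a positive
 * step number if the observed behaviour differs from the assumption.
 */
static int demo_disconnect_and_reuse(void)
{
	struct sockaddr_in srv;
	struct sockaddr unspec;
	int lfd, fd, rc = -1;

	lfd = open_listener(&srv);
	if (lfd < 0)
		return -1;
	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		goto out_listener;
	if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
		goto out;
	if (shutdown(fd, SHUT_WR) < 0)
		goto out;

	/* Step 1: the send side is shut down, so send() should give EPIPE. */
	rc = 1;
	if (send(fd, "x", 1, MSG_NOSIGNAL) != -1 || errno != EPIPE)
		goto out;

	/* Step 2: disconnect by connecting to an AF_UNSPEC address. */
	rc = 2;
	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;
	if (connect(fd, &unspec, sizeof(unspec)) < 0)
		goto out;

	/* Step 3: reconnect the same socket. */
	rc = 3;
	if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
		goto out;

	/* Step 4: the reconnected socket should accept data again. */
	rc = 4;
	if (send(fd, "x", 1, MSG_NOSIGNAL) != 1)
		goto out;
	rc = 0;
out:
	close(fd);
out_listener:
	close(lfd);
	return rc;
}
```

MSG_NOSIGNAL is used so that the failed send() reports EPIPE through errno
instead of raising SIGPIPE.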


* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found]   ` <20100803.012144.267950450.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2010-08-03  9:11     ` Andrew Morton
       [not found]       ` <20100803021110.f0b3877b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2010-08-03  9:11 UTC (permalink / raw)
  To: David Miller
  Cc: andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w,
	kuznet-v/Mj1YrvjDBInbfyfbPRSQ, pekkas-UjJjq++bwZ7HOG6cAo2yLw,
	jmorris-gx6/JNMH7DfYtjvyW6yDsg, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
	kaber-dcUjhNyLwpNeoWH0uzbU5w, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	William.Allen.Simpson-Re5JQEeQqe8AvxtiuMwx3w,
	gilad-f4XOiQkOAtcdH0auuBZGHA,
	ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

(cc linux-nfs)

On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: "Andy Chittenden" <andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date: Tue, 3 Aug 2010 09:14:31 +0100
> 
> > I don't know whether this patch is the correct fix or not but it enables the
> > NFS client to recover.
> > 
> > Kernel version: 2.6.34.1 and 2.6.32.
> > 
> > Fixes <https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears down
> > any previous shutdown attempts so that reconnects on a socket that's been
> > shutdown leave the socket in a usable state (otherwise tcp_sendmsg() returns
> > -EPIPE).
> 
> If the SunRPC code wants to close a TCP socket then use it again,
> it should disconnect by doing a connect() with sa_family == AF_UNSPEC
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found]       ` <20100803021110.f0b3877b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
@ 2010-08-03 10:25         ` Andy Chittenden
       [not found]           ` <4C57EE9A.7040308-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Chittenden @ 2010-08-03 10:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, kuznet-v/Mj1YrvjDBInbfyfbPRSQ,
	pekkas-UjJjq++bwZ7HOG6cAo2yLw, jmorris-gx6/JNMH7DfYtjvyW6yDsg,
	yoshfuji-VfPWfsRibaP+Ru+s062T9g, kaber-dcUjhNyLwpNeoWH0uzbU5w,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	William.Allen.Simpson-Re5JQEeQqe8AvxtiuMwx3w,
	gilad-f4XOiQkOAtcdH0auuBZGHA,
	ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

On 2010-08-03 10:11, Andrew Morton wrote:
> (cc linux-nfs)
>
> On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David Miller<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>  wrote:
>
>> From: "Andy Chittenden"<andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> Date: Tue, 3 Aug 2010 09:14:31 +0100
>>
>>> I don't know whether this patch is the correct fix or not but it enables the
>>> NFS client to recover.
>>>
>>> Kernel version: 2.6.34.1 and 2.6.32.
>>>
>>> Fixes<https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears down
>>> any previous shutdown attempts so that reconnects on a socket that's been
>>> shutdown leave the socket in a usable state (otherwise tcp_sendmsg() returns
>>> -EPIPE).
>>
>> If the SunRPC code wants to close a TCP socket then use it again,
>> it should disconnect by doing a connect() with sa_family == AF_UNSPEC

The SunRPC code already does that in xs_abort_connection(), but that
function is only called conditionally from xs_tcp_reuse_connection():

static void xs_tcp_reuse_connection(struct rpc_xprt *xprt,
				    struct sock_xprt *transport)
{
	unsigned int state = transport->inet->sk_state;

	if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
		return;
	if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
		return;
	xs_abort_connection(xprt, transport);
}

That has changed since 2.6.26, which unconditionally did the connect()
with sa_family == AF_UNSPEC. FWIW, we cannot reproduce this problem with
2.6.26.




* RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found]           ` <4C57EE9A.7040308-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-08-05 14:55             ` Andy Chittenden
       [not found]               ` <4c5ad0d6.42ecd80a.47d7.0dfc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Chittenden @ 2010-08-05 14:55 UTC (permalink / raw)
  To: 'Andy Chittenden', 'Andrew Morton'
  Cc: 'David Miller', kuznet-v/Mj1YrvjDBInbfyfbPRSQ,
	pekkas-UjJjq++bwZ7HOG6cAo2yLw, jmorris-gx6/JNMH7DfYtjvyW6yDsg,
	yoshfuji-VfPWfsRibaP+Ru+s062T9g, kaber-dcUjhNyLwpNeoWH0uzbU5w,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	William.Allen.Simpson-Re5JQEeQqe8AvxtiuMwx3w,
	gilad-f4XOiQkOAtcdH0auuBZGHA,
	ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, 'Trond Myklebust',
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', 'Joe Perches'

> On 2010-08-03 10:11, Andrew Morton wrote:
> > (cc linux-nfs)
> >
> > On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David
> Miller<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>  wrote:
> >
> >> From: "Andy Chittenden"<andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >> Date: Tue, 3 Aug 2010 09:14:31 +0100
> >>
> >>> I don't know whether this patch is the correct fix or not but it
> enables the
> >>> NFS client to recover.
> >>>
> >>> Kernel version: 2.6.34.1 and 2.6.32.
> >>>
> >>> Fixes<https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears
> down
> >>> any previous shutdown attempts so that reconnects on a socket
> that's been
> >>> shutdown leave the socket in a usable state (otherwise
> tcp_sendmsg() returns
> >>> -EPIPE).
> >>
> >> If the SunRPC code wants to close a TCP socket then use it again,
> >> it should disconnect by doing a connect() with sa_family ==
> AF_UNSPEC
> 
> There is code to do that in the SunRPC code in xs_abort_connection()
> but
> that's conditionally called from xs_tcp_reuse_connection():
> 
> static void xs_tcp_reuse_connection(struct rpc_xprt *xprt, struct
> sock_xprt *transport)
> {
> 	unsigned int state = transport->inet->sk_state;
> 
> 	if (state == TCP_CLOSE && transport->sock->state ==
> SS_UNCONNECTED)
> 		return;
> 	if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> 		return;
> 	xs_abort_connection(xprt, transport);
> }
> 
> That's changed since 2.6.26 where it unconditionally did the connect()
> with sa_family == AF_UNSPEC. FWIW we cannot reproduce this problem with
> 2.6.26.

The problem is fixed with this patch which also prints out that sk_shutdown
can be non-zero on entry to xs_tcp_reuse_connection:

# diff -up /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c net/sunrpc/xprtsock.c
--- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c     2010-08-02 18:30:51.000000000 +0100
+++ net/sunrpc/xprtsock.c       2010-08-05 12:21:11.000000000 +0100
@@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
        if (!(xprt = xprt_from_sock(sk)))
                goto out;
        dprintk("RPC:       xs_tcp_state_change client %p...\n", xprt);
-       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
+       dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
                        sk->sk_state, xprt_connected(xprt),
                        sock_flag(sk, SOCK_DEAD),
-                       sock_flag(sk, SOCK_ZAPPED));
+                       sock_flag(sk, SOCK_ZAPPED),
+                       sk->sk_shutdown);
 
        switch (sk->sk_state) {
        case TCP_ESTABLISHED:
@@ -1796,10 +1797,18 @@ static void xs_tcp_reuse_connection(stru
 {
        unsigned int state = transport->inet->sk_state;
 
-       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
-               return;
-       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
-               return;
+       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED) {
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               printk("%s: TCP_CLOSEd and sk_shutdown set to %d\n",
+                       __func__, transport->inet->sk_shutdown);
+       }
+       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               printk("%s: sk_shutdown set to %d\n",
+                       __func__, transport->inet->sk_shutdown);
+       }
        xs_abort_connection(xprt, transport);
 }

Signed-off-by: Andy Chittenden <andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

dmesg displays:

[ 2840.896043] xs_tcp_reuse_connection: TCP_CLOSEd and sk_shutdown set to 2

So previously the code attempted to reuse the connection without aborting
it, and therefore never cleared sk_shutdown.
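
The value 2 in that dmesg line can be decoded against the shutdown flag
bits. The constants below mirror include/net/sock.h (RCV_SHUTDOWN = 1,
SEND_SHUTDOWN = 2) but are redefined locally so that the sketch builds in
userspace. describe_shutdown() and send_would_fail() are hypothetical
helpers, and the latter is only a simplified model of the send-side check,
not the actual tcp_sendmsg() code.

```c
#include <stdio.h>
#include <string.h>

/* Local copies of the kernel's shutdown flag bits (include/net/sock.h). */
#define RCV_SHUTDOWN  1
#define SEND_SHUTDOWN 2
#define SHUTDOWN_MASK 3

/* Hypothetical helper: name the bits set in an sk_shutdown value. */
static const char *describe_shutdown(unsigned int sk_shutdown)
{
	switch (sk_shutdown & SHUTDOWN_MASK) {
	case 0:
		return "none";
	case RCV_SHUTDOWN:
		return "RCV_SHUTDOWN";
	case SEND_SHUTDOWN:
		return "SEND_SHUTDOWN";
	default:
		return "RCV_SHUTDOWN|SEND_SHUTDOWN";
	}
}

/*
 * Simplified model (an assumption, not kernel code): a send is refused
 * once the send side of the socket has been marked as shut down.
 */
static int send_would_fail(unsigned int sk_shutdown)
{
	return (sk_shutdown & SEND_SHUTDOWN) != 0;
}
```

With sk_shutdown == 2 only the send side is marked as shut down, which is
consistent with tcp_sendmsg() returning -EPIPE on the reused socket.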




* RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found]               ` <4c5ad0d6.42ecd80a.47d7.0dfc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
@ 2010-08-05 19:50                 ` Trond Myklebust
       [not found]                   ` <1281037822.2948.49.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-08-05 19:50 UTC (permalink / raw)
  To: Andy Chittenden
  Cc: 'Andrew Morton', 'David Miller',
	kuznet-v/Mj1YrvjDBInbfyfbPRSQ, pekkas-UjJjq++bwZ7HOG6cAo2yLw,
	jmorris-gx6/JNMH7DfYtjvyW6yDsg, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
	kaber-dcUjhNyLwpNeoWH0uzbU5w, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	William.Allen.Simpson-Re5JQEeQqe8AvxtiuMwx3w,
	gilad-f4XOiQkOAtcdH0auuBZGHA,
	ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, 'J. Bruce Fields',
	'Neil Brown', 'Chuck Lever',
	'Benny Halevy', 'Alexandros Batsakis',
	'Joe Perches'

On Thu, 2010-08-05 at 15:55 +0100, Andy Chittenden wrote:
> > On 2010-08-03 10:11, Andrew Morton wrote:
> > > (cc linux-nfs)
> > >
> > > On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David
> > Miller<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>  wrote:
> > >
> > >> From: "Andy Chittenden"<andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > >> Date: Tue, 3 Aug 2010 09:14:31 +0100
> > >>
> > >>> I don't know whether this patch is the correct fix or not but it
> > enables the
> > >>> NFS client to recover.
> > >>>
> > >>> Kernel version: 2.6.34.1 and 2.6.32.
> > >>>
> > >>> Fixes<https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It clears
> > down
> > >>> any previous shutdown attempts so that reconnects on a socket
> > that's been
> > >>> shutdown leave the socket in a usable state (otherwise
> > tcp_sendmsg() returns
> > >>> -EPIPE).
> > >>
> > >> If the SunRPC code wants to close a TCP socket then use it again,
> > >> it should disconnect by doing a connect() with sa_family ==
> > AF_UNSPEC
> > 
> > There is code to do that in the SunRPC code in xs_abort_connection()
> > but
> > that's conditionally called from xs_tcp_reuse_connection():
> > 
> > static void xs_tcp_reuse_connection(struct rpc_xprt *xprt, struct
> > sock_xprt *transport)
> > {
> > 	unsigned int state = transport->inet->sk_state;
> > 
> > 	if (state == TCP_CLOSE && transport->sock->state ==
> > SS_UNCONNECTED)
> > 		return;
> > 	if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> > 		return;
> > 	xs_abort_connection(xprt, transport);
> > }
> > 
> > That's changed since 2.6.26 where it unconditionally did the connect()
> > with sa_family == AF_UNSPEC. FWIW we cannot reproduce this problem with
> > 2.6.26.
> 
> The problem is fixed with this patch which also prints out that sk_shutdown
> can be non-zero on entry to xs_tcp_reuse_connection:
> 
> # diff -up /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c
> net/sunrpc/xprtsock.c 
> --- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c
> 2010-08-02 18:30:51.000000000 +0100
> +++ net/sunrpc/xprtsock.c       2010-08-05 12:21:11.000000000 +0100
> @@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
>         if (!(xprt = xprt_from_sock(sk)))
>                 goto out;
>         dprintk("RPC:       xs_tcp_state_change client %p...\n", xprt);
> -       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
> +       dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown
> %d\n",
>                         sk->sk_state, xprt_connected(xprt),
>                         sock_flag(sk, SOCK_DEAD),
> -                       sock_flag(sk, SOCK_ZAPPED));
> +                       sock_flag(sk, SOCK_ZAPPED),
> +                       sk->sk_shutdown);
>  
>         switch (sk->sk_state) {
>         case TCP_ESTABLISHED:
> @@ -1796,10 +1797,18 @@ static void xs_tcp_reuse_connection(stru
>  {
>         unsigned int state = transport->inet->sk_state;
>  
> -       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
> -               return;
> -       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> -               return;
> +       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
> {
> +               if (transport->inet->sk_shutdown == 0)
> +                       return;
> +               printk("%s: TCP_CLOSEd and sk_shutdown set to %d\n",
> +                       __func__, transport->inet->sk_shutdown);
> +       }
> +       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
> +               if (transport->inet->sk_shutdown == 0)
> +                       return;
> +               printk("%s: sk_shutdown set to %d\n",
> +                       __func__, transport->inet->sk_shutdown);
> +       }
>         xs_abort_connection(xprt, transport);
>  }
> 
> Signed-off-by: Andy Chittenden <andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> 
> dmesg displays:
> 
> [ 2840.896043] xs_tcp_reuse_connection: TCP_CLOSEd and sk_shutdown set to 2
> 
> so previously the code was attempting to reuse the connection but wasn't
> aborting it and thus didn't clear down sk_shutdown.

Hi Andy,

I note that you are adding in two new printk()s. Why should they be
printk(), and not dprintk()? Are you trying to report an exception that
the user needs to be aware of, or is this only debugging info that we'll
want to turn off under normal operation?

Also, it might be useful to add a comment to the code here to remind us
what the 'sk_shutdown == 0' case corresponds to as far as the socket
state is concerned, so that the casual reader can see why we shouldn't
reset the connection.

Cheers
  Trond


* RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found]                   ` <1281037822.2948.49.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2010-08-06  9:30                     ` Andy Chittenden
  2010-08-09  9:27                       ` Andy Chittenden
  0 siblings, 1 reply; 12+ messages in thread
From: Andy Chittenden @ 2010-08-06  9:30 UTC (permalink / raw)
  To: 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller',
	kuznet-v/Mj1YrvjDBInbfyfbPRSQ, pekkas-UjJjq++bwZ7HOG6cAo2yLw,
	jmorris-gx6/JNMH7DfYtjvyW6yDsg, yoshfuji-VfPWfsRibaP+Ru+s062T9g,
	kaber-dcUjhNyLwpNeoWH0uzbU5w, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	William.Allen.Simpson-Re5JQEeQqe8AvxtiuMwx3w,
	gilad-f4XOiQkOAtcdH0auuBZGHA,
	ilpo.jarvinen-pxSi+dnQzZMxHbG02/KK1g,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, 'J. Bruce Fields',
	'Neil Brown', 'Chuck Lever',
	'Benny Halevy', 'Alexandros Batsakis',
	'Joe Perches', Andy Chittenden

> On Thu, 2010-08-05 at 15:55 +0100, Andy Chittenden wrote:
> > > On 2010-08-03 10:11, Andrew Morton wrote:
> > > > (cc linux-nfs)
> > > >
> > > > On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David
> > > Miller<davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>  wrote:
> > > >
> > > >> From: "Andy Chittenden"<andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > > >> Date: Tue, 3 Aug 2010 09:14:31 +0100
> > > >>
> > > >>> I don't know whether this patch is the correct fix or not but
> it
> > > enables the
> > > >>> NFS client to recover.
> > > >>>
> > > >>> Kernel version: 2.6.34.1 and 2.6.32.
> > > >>>
> > > >>> Fixes<https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It
> clears
> > > down
> > > >>> any previous shutdown attempts so that reconnects on a socket
> > > that's been
> > > >>> shutdown leave the socket in a usable state (otherwise
> > > tcp_sendmsg() returns
> > > >>> -EPIPE).
> > > >>
> > > >> If the SunRPC code wants to close a TCP socket then use it
> again,
> > > >> it should disconnect by doing a connect() with sa_family ==
> > > AF_UNSPEC
> > >
> > > There is code to do that in the SunRPC code in
> xs_abort_connection()
> > > but
> > > that's conditionally called from xs_tcp_reuse_connection():
> > >
> > > static void xs_tcp_reuse_connection(struct rpc_xprt *xprt, struct
> > > sock_xprt *transport)
> > > {
> > > 	unsigned int state = transport->inet->sk_state;
> > >
> > > 	if (state == TCP_CLOSE && transport->sock->state ==
> > > SS_UNCONNECTED)
> > > 		return;
> > > 	if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> > > 		return;
> > > 	xs_abort_connection(xprt, transport);
> > > }
> > >
> > > That's changed since 2.6.26 where it unconditionally did the
> connect()
> > > with sa_family == AF_UNSPEC. FWIW we cannot reproduce this problem
> with
> > > 2.6.26.
> >
> > The problem is fixed with this patch which also prints out that
> sk_shutdown
> > can be non-zero on entry to xs_tcp_reuse_connection:
> >
> > # diff -up /home/company/software/src/linux-
> 2.6.34.2/net/sunrpc/xprtsock.c
> > net/sunrpc/xprtsock.c
> > --- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c
> > 2010-08-02 18:30:51.000000000 +0100
> > +++ net/sunrpc/xprtsock.c       2010-08-05 12:21:11.000000000 +0100
> > @@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
> >         if (!(xprt = xprt_from_sock(sk)))
> >                 goto out;
> >         dprintk("RPC:       xs_tcp_state_change client %p...\n",
> xprt);
> > -       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
> > +       dprintk("RPC:       state %x conn %d dead %d zapped %d
> sk_shutdown
> > %d\n",
> >                         sk->sk_state, xprt_connected(xprt),
> >                         sock_flag(sk, SOCK_DEAD),
> > -                       sock_flag(sk, SOCK_ZAPPED));
> > +                       sock_flag(sk, SOCK_ZAPPED),
> > +                       sk->sk_shutdown);
> >
> >         switch (sk->sk_state) {
> >         case TCP_ESTABLISHED:
> > @@ -1796,10 +1797,18 @@ static void xs_tcp_reuse_connection(stru
> >  {
> >         unsigned int state = transport->inet->sk_state;
> >
> > -       if (state == TCP_CLOSE && transport->sock->state ==
> SS_UNCONNECTED)
> > -               return;
> > -       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> > -               return;
> > +       if (state == TCP_CLOSE && transport->sock->state ==
> SS_UNCONNECTED)
> > {
> > +               if (transport->inet->sk_shutdown == 0)
> > +                       return;
> > +               printk("%s: TCP_CLOSEd and sk_shutdown set to %d\n",
> > +                       __func__, transport->inet->sk_shutdown);
> > +       }
> > +       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
> > +               if (transport->inet->sk_shutdown == 0)
> > +                       return;
> > +               printk("%s: sk_shutdown set to %d\n",
> > +                       __func__, transport->inet->sk_shutdown);
> > +       }
> >         xs_abort_connection(xprt, transport);
> >  }
> >
> > Signed-off-by: Andy Chittenden <andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >
> > dmesg displays:
> >
> > [ 2840.896043] xs_tcp_reuse_connection: TCP_CLOSEd and sk_shutdown
> set to 2
> >
> > so previously the code was attempting to reuse the connection but
> wasn't
> > aborting it and thus didn't clear down sk_shutdown.
> 
> Hi Andy,
> 
> I note that you are adding in two new printk()s. Why should they be
> printk(), and not dprintk()? Are you trying to report an exception that
> the user needs to be aware of, or is this only debugging info that
> we'll
> want to turn off under normal operation?

Hi Trond

Thanks for replying. It was debugging info: the printk()s show what the
problem is. I was half expecting someone to pipe up "that isn't the correct
way to fix this" and suggest another avenue to look at, or even, hopefully,
come up with an alternative appropriate patch as I'm not an expert in this
code: I don't know whether the sk_shutdown field being left set has
implications elsewhere in the sunrpc code. FWIW I left the test case running
overnight and have had only 50 such messages logged so it's not a heavy
printk() load.

> 
> Also, it might be useful to add a comment to the code here to remind us
> what the 'sk_shutdown == 0' case corresponds to as far as the socket
> state is concerned, so that the casual reader can see why we shouldn't
> reset the connection.

If I knew what sk_shutdown == 0 really corresponded to, I could well add a
comment! :-). I just knew that in 2.6.26 we didn't see this problem and that
in later kernels the connection abort sequence was being done conditionally
and that the sk_shutdown flag being left set was making tcp_sendmsg return
an error. So, putting two and two together, I've effectively just added
another condition in which to abort the connection.
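
To make the added condition concrete, here is a small userspace model of
the decision before and after the patch. The state constants are copied
from the kernel headers so that it builds standalone. struct model_sock,
old_aborts() and patched_aborts() are hypothetical names, and the model
deliberately ignores everything except the three fields that
xs_tcp_reuse_connection() tests.

```c
#include <stdbool.h>

/* Local copies of kernel values (include/net/tcp_states.h, linux/net.h). */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_FIN_WAIT1 = 4, TCP_CLOSE = 7 };
enum { SS_UNCONNECTED = 1, SS_CONNECTED = 3 };
#define TCPF_ESTABLISHED (1 << TCP_ESTABLISHED)
#define TCPF_SYN_SENT    (1 << TCP_SYN_SENT)

/* Hypothetical stand-in for the fields the function looks at. */
struct model_sock {
	unsigned int sk_state;    /* transport->inet->sk_state */
	unsigned int sock_state;  /* transport->sock->state */
	unsigned int sk_shutdown; /* transport->inet->sk_shutdown */
};

/* 2.6.34 behaviour: true if xs_abort_connection() would be reached. */
static bool old_aborts(const struct model_sock *s)
{
	if (s->sk_state == TCP_CLOSE && s->sock_state == SS_UNCONNECTED)
		return false;
	if ((1 << s->sk_state) & (TCPF_ESTABLISHED | TCPF_SYN_SENT))
		return false;
	return true;
}

/* Patched behaviour: the early returns apply only if sk_shutdown == 0. */
static bool patched_aborts(const struct model_sock *s)
{
	if (s->sk_state == TCP_CLOSE && s->sock_state == SS_UNCONNECTED) {
		if (s->sk_shutdown == 0)
			return false;
	}
	if ((1 << s->sk_state) & (TCPF_ESTABLISHED | TCPF_SYN_SENT)) {
		if (s->sk_shutdown == 0)
			return false;
	}
	return true;
}
```

For the case logged in dmesg (TCP_CLOSE, SS_UNCONNECTED, sk_shutdown == 2)
the old predicate skips the abort and the patched one performs it. Sockets
with sk_shutdown == 0 are treated identically by both.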

As nobody has objected to the essence of my patch, I'll attempt a new patch
that changes those printk()s into dprintk()s and adds what I think are
appropriate comments. So here's a revised patch:

# diff -up /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c net/sunrpc/xprtsock.c 
--- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c     2010-08-02 18:30:51.000000000 +0100
+++ net/sunrpc/xprtsock.c       2010-08-06 08:09:08.000000000 +0100
@@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
        if (!(xprt = xprt_from_sock(sk)))
                goto out;
        dprintk("RPC:       xs_tcp_state_change client %p...\n", xprt);
-       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
+       dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
                        sk->sk_state, xprt_connected(xprt),
                        sock_flag(sk, SOCK_DEAD),
-                       sock_flag(sk, SOCK_ZAPPED));
+                       sock_flag(sk, SOCK_ZAPPED),
+                       sk->sk_shutdown);
 
        switch (sk->sk_state) {
        case TCP_ESTABLISHED:
@@ -1796,10 +1797,25 @@ static void xs_tcp_reuse_connection(stru
 {
        unsigned int state = transport->inet->sk_state;
 
-       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
-               return;
-       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
-               return;
+       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED) {
+               /* we don't need to abort the connection if the socket
+                * hasn't undergone a shutdown
+                */
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               dprintk("RPC:       %s: TCP_CLOSEd and sk_shutdown set to %d\n",
+                       __func__, transport->inet->sk_shutdown);
+       }
+       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
+               /* we don't need to abort the connection if the socket
+                * hasn't undergone a shutdown
+                */
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               dprintk("RPC:       %s: ESTABLISHED/SYN_SENT "
+                               "sk_shutdown set to %d\n",
+                               __func__, transport->inet->sk_shutdown);
+       }
        xs_abort_connection(xprt, transport);
 }

Signed-off-by: Andy Chittenden <andyc.bluearc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>




* RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
  2010-08-06  9:30                     ` Andy Chittenden
@ 2010-08-09  9:27                       ` Andy Chittenden
  2010-08-09 16:55                         ` Trond Myklebust
  2018-06-19 21:56                         ` Joe Perches
  0 siblings, 2 replies; 12+ messages in thread
From: Andy Chittenden @ 2010-08-09  9:27 UTC (permalink / raw)
  To: 'Andy Chittenden', 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', 'Joe Perches',
	Andy Chittenden

> > On Thu, 2010-08-05 at 15:55 +0100, Andy Chittenden wrote:
> > > > On 2010-08-03 10:11, Andrew Morton wrote:
> > > > > (cc linux-nfs)
> > > > >
> > > > > On Tue, 03 Aug 2010 01:21:44 -0700 (PDT) David
> > > > Miller<davem@davemloft.net>  wrote:
> > > > >
> > > > >> From: "Andy Chittenden"<andyc.bluearc@gmail.com>
> > > > >> Date: Tue, 3 Aug 2010 09:14:31 +0100
> > > > >>
> > > > >>> I don't know whether this patch is the correct fix or not but
> > it
> > > > enables the
> > > > >>> NFS client to recover.
> > > > >>>
> > > > >>> Kernel version: 2.6.34.1 and 2.6.32.
> > > > >>>
> > > > >>> Fixes<https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It
> > clears
> > > > down
> > > > >>> any previous shutdown attempts so that reconnects on a socket
> > > > that's been
> > > > >>> shutdown leave the socket in a usable state (otherwise
> > > > tcp_sendmsg() returns
> > > > >>> -EPIPE).
> > > > >>
> > > > >> If the SunRPC code wants to close a TCP socket then use it
> > again,
> > > > >> it should disconnect by doing a connect() with sa_family ==
> > > > AF_UNSPEC
> > > >
> > > > There is code to do that in the SunRPC code in
> > xs_abort_connection()
> > > > but
> > > > that's conditionally called from xs_tcp_reuse_connection():
> > > >
> > > > static void xs_tcp_reuse_connection(struct rpc_xprt *xprt, struct
> > > > sock_xprt *transport)
> > > > {
> > > > 	unsigned int state = transport->inet->sk_state;
> > > >
> > > > 	if (state == TCP_CLOSE && transport->sock->state ==
> > > > SS_UNCONNECTED)
> > > > 		return;
> > > > 	if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> > > > 		return;
> > > > 	xs_abort_connection(xprt, transport);
> > > > }
> > > >
> > > > That's changed since 2.6.26 where it unconditionally did the
> > connect()
> > > > with sa_family == AF_UNSPEC. FWIW we cannot reproduce this
> problem
> > with
> > > > 2.6.26.
> > >
> > > The problem is fixed with this patch which also prints out that
> > sk_shutdown
> > > can be non-zero on entry to xs_tcp_reuse_connection:
> > >
> > > # diff -up /home/company/software/src/linux-
> > 2.6.34.2/net/sunrpc/xprtsock.c
> > > net/sunrpc/xprtsock.c
> > > --- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c
> > > 2010-08-02 18:30:51.000000000 +0100
> > > +++ net/sunrpc/xprtsock.c       2010-08-05 12:21:11.000000000 +0100
> > > @@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
> > >         if (!(xprt = xprt_from_sock(sk)))
> > >                 goto out;
> > >         dprintk("RPC:       xs_tcp_state_change client %p...\n",
> > xprt);
> > > -       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
> > > +       dprintk("RPC:       state %x conn %d dead %d zapped %d
> > sk_shutdown
> > > %d\n",
> > >                         sk->sk_state, xprt_connected(xprt),
> > >                         sock_flag(sk, SOCK_DEAD),
> > > -                       sock_flag(sk, SOCK_ZAPPED));
> > > +                       sock_flag(sk, SOCK_ZAPPED),
> > > +                       sk->sk_shutdown);
> > >
> > >         switch (sk->sk_state) {
> > >         case TCP_ESTABLISHED:
> > > @@ -1796,10 +1797,18 @@ static void xs_tcp_reuse_connection(stru
> > >  {
> > >         unsigned int state = transport->inet->sk_state;
> > >
> > > -       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
> > > -               return;
> > > -       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> > > -               return;
> > > +       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED) {
> > > +               if (transport->inet->sk_shutdown == 0)
> > > +                       return;
> > > +               printk("%s: TCP_CLOSEd and sk_shutdown set to %d\n",
> > > +                       __func__, transport->inet->sk_shutdown);
> > > +       }
> > > +       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
> > > +               if (transport->inet->sk_shutdown == 0)
> > > +                       return;
> > > +               printk("%s: sk_shutdown set to %d\n",
> > > +                       __func__, transport->inet->sk_shutdown);
> > > +       }
> > >         xs_abort_connection(xprt, transport);
> > >  }
> > >
> > > Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>
> > >
> > > dmesg displays:
> > >
> > > [ 2840.896043] xs_tcp_reuse_connection: TCP_CLOSEd and sk_shutdown set to 2
> > >
> > > so previously the code was attempting to reuse the connection but wasn't
> > > aborting it and thus didn't clear down sk_shutdown.
> >
> > Hi Andy,
> >
> > I note that you are adding in two new printk()s. Why should they be
> > printk(), and not dprintk()? Are you trying to report an exception that
> > the user needs to be aware of, or is this only debugging info that we'll
> > want to turn off under normal operation?
> 
> Hi Trond
> 
> Thanks for replying. It was debugging info: the printk()s show what the
> problem is. I was half expecting someone to pipe up "that isn't the
> correct way to fix this" and suggest another avenue to look at, or
> even, hopefully, come up with an alternative appropriate patch as I'm
> not an expert in this code: I don't know whether the sk_shutdown field
> being left set has implications elsewhere in the sunrpc code. FWIW I
> left the test case running overnight and have had only 50 such messages
> logged so it's not a heavy printk() load.
> 
> >
> > Also, it might be useful to add a comment to the code here to remind us
> > what the 'sk_shutdown == 0' case corresponds to as far as the socket
> > state is concerned, so that the casual reader can see why we shouldn't
> > reset the connection.
> 
> If I knew what sk_shutdown == 0 really corresponded to, I could well
> add a comment! :-). I just knew that in 2.6.26 we didn't see this
> problem and that in later kernels the connection abort sequence was
> being done conditionally and that the sk_shutdown flag being left set
> was making tcp_sendmsg return an error. So, putting two and two
> together, I've effectively just added another condition in which to
> abort the connection.
> 
> As nobody has objected to the essence of my patch, I'll attempt a new
> patch that changes those printk()s into dprintk() and drop in what I
> think are appropriate comments. So here's a revised patch:
> 
> # diff -up /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c net/sunrpc/xprtsock.c
> --- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c     2010-08-02 18:30:51.000000000 +0100
> +++ net/sunrpc/xprtsock.c       2010-08-06 08:09:08.000000000 +0100
> @@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
>         if (!(xprt = xprt_from_sock(sk)))
>                 goto out;
>         dprintk("RPC:       xs_tcp_state_change client %p...\n", xprt);
> -       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
> +       dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
>                         sk->sk_state, xprt_connected(xprt),
>                         sock_flag(sk, SOCK_DEAD),
> -                       sock_flag(sk, SOCK_ZAPPED));
> +                       sock_flag(sk, SOCK_ZAPPED),
> +                       sk->sk_shutdown);
> 
>         switch (sk->sk_state) {
>         case TCP_ESTABLISHED:
> @@ -1796,10 +1797,25 @@ static void xs_tcp_reuse_connection(stru
>  {
>         unsigned int state = transport->inet->sk_state;
> 
> -       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
> -               return;
> -       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
> -               return;
> +       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED) {
> +               /* we don't need to abort the connection if the socket
> +                * hasn't undergone a shutdown
> +                */
> +               if (transport->inet->sk_shutdown == 0)
> +                       return;
> +               dprintk("RPC:       %s: TCP_CLOSEd and sk_shutdown set to %d\n",
> +                       __func__, transport->inet->sk_shutdown);
> +       }
> +       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
> +               /* we don't need to abort the connection if the socket
> +                * hasn't undergone a shutdown
> +                */
> +               if (transport->inet->sk_shutdown == 0)
> +                       return;
> +               dprintk("RPC:       %s: ESTABLISHED/SYN_SENT "
> +                               "sk_shutdown set to %d\n",
> +                               __func__, transport->inet->sk_shutdown);
> +       }
>         xs_abort_connection(xprt, transport);
>  }
> 
> Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>
> 

A weekend run with that patch applied to 2.6.34.2 was successful. As nobody has objected, what's the next step to getting it applied to the official source trees?
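
The two socket behaviours discussed in the quoted text above (sends failing with EPIPE once the send side has been shut down, and connect() with sa_family == AF_UNSPEC returning the socket to a usable state) can be observed from userspace. The following is only a minimal sketch, assuming a Linux host with a working loopback interface; it exercises the ordinary socket API rather than the SunRPC code, and shutdown_then_reconnect() is a hypothetical helper name that does not appear anywhere in the thread.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the thread.
 * Returns 0 if every setup step worked, -1 otherwise.
 *   *err_after_shutdown   errno from send() after shutdown(SHUT_WR),
 *                         or 0 if that send unexpectedly succeeded
 *   *sent_after_reconnect return value of send() after the socket was
 *                         disconnected with AF_UNSPEC and reconnected
 */
int shutdown_then_reconnect(int *err_after_shutdown, long *sent_after_reconnect)
{
	struct sockaddr_in addr;
	struct sockaddr unspec;
	socklen_t len = sizeof(addr);
	int lfd, cfd, rc = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0 || cfd < 0)
		goto out;

	/* Listen on an ephemeral loopback port and connect to it. */
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 8) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
	    connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* Shut down the send side, then try to send anyway. */
	if (shutdown(cfd, SHUT_WR) < 0)
		goto out;
	if (send(cfd, "x", 1, MSG_NOSIGNAL) < 0)
		*err_after_shutdown = errno;
	else
		*err_after_shutdown = 0;

	/* Disconnect with AF_UNSPEC, then reconnect the same socket. */
	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;
	if (connect(cfd, &unspec, sizeof(unspec)) < 0 ||
	    connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	*sent_after_reconnect = (long)send(cfd, "x", 1, MSG_NOSIGNAL);
	rc = 0;
out:
	if (cfd >= 0)
		close(cfd);
	if (lfd >= 0)
		close(lfd);
	return rc;
}
```

Calling shutdown_then_reconnect() from a small main() and printing the two values is enough to see the effect; MSG_NOSIGNAL is used so that the failed send reports an error instead of raising SIGPIPE.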


* RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
  2010-08-09  9:27                       ` Andy Chittenden
@ 2010-08-09 16:55                         ` Trond Myklebust
       [not found]                           ` <1281372927.8950.3.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  2018-06-19 21:56                         ` Joe Perches
  1 sibling, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2010-08-09 16:55 UTC (permalink / raw)
  To: Andy Chittenden
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', 'Joe Perches',
	Andy Chittenden

On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:
> A weekend run with that patch applied to 2.6.34.2 was successful. As nobody has objected, what's the next step to getting it applied to the official source trees?

Please resend me a version with a cleaned up changelog entry. I can then
push it as a bugfix.

Cheers
  Trond


* RE: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
       [not found]                           ` <1281372927.8950.3.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2010-08-10  8:40                             ` Andy Chittenden
  0 siblings, 0 replies; 12+ messages in thread
From: Andy Chittenden @ 2010-08-10  8:40 UTC (permalink / raw)
  To: 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', 'Joe Perches',
	Andy Chittenden

> On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:
> > A weekend run with that patch applied to 2.6.34.2 was successful. As
> > nobody has objected, what's the next step to getting it applied to the
> > official source trees?
> 
> Please resend me a version with a cleaned up changelog entry. I can then
> push it as a bugfix.
> 
> Cheers
>   Trond

Thanks. I think this sums it up:

SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)

When reusing a TCP connection, ensure that it's aborted if a previous shutdown attempt has been made on that connection so that the RPC over TCP recovery mechanism succeeds.

# diff -up /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c net/sunrpc/xprtsock.c 
--- /home/company/software/src/linux-2.6.34.2/net/sunrpc/xprtsock.c     2010-08-02 18:30:51.000000000 +0100
+++ net/sunrpc/xprtsock.c       2010-08-06 08:09:08.000000000 +0100
@@ -1322,10 +1322,11 @@ static void xs_tcp_state_change(struct s
        if (!(xprt = xprt_from_sock(sk)))
                goto out;
        dprintk("RPC:       xs_tcp_state_change client %p...\n", xprt);
-       dprintk("RPC:       state %x conn %d dead %d zapped %d\n",
+       dprintk("RPC:       state %x conn %d dead %d zapped %d sk_shutdown %d\n",
                        sk->sk_state, xprt_connected(xprt),
                        sock_flag(sk, SOCK_DEAD),
-                       sock_flag(sk, SOCK_ZAPPED));
+                       sock_flag(sk, SOCK_ZAPPED),
+                       sk->sk_shutdown);
 
        switch (sk->sk_state) {
        case TCP_ESTABLISHED:
@@ -1796,10 +1797,25 @@ static void xs_tcp_reuse_connection(stru
 {
        unsigned int state = transport->inet->sk_state;
 
-       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED)
-               return;
-       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT))
-               return;
+       if (state == TCP_CLOSE && transport->sock->state == SS_UNCONNECTED) {
+               /* we don't need to abort the connection if the socket
+                * hasn't undergone a shutdown
+                */
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               dprintk("RPC:       %s: TCP_CLOSEd and sk_shutdown set to %d\n",
+                       __func__, transport->inet->sk_shutdown);
+       }
+       if ((1 << state) & (TCPF_ESTABLISHED|TCPF_SYN_SENT)) {
+               /* we don't need to abort the connection if the socket
+                * hasn't undergone a shutdown
+                */
+               if (transport->inet->sk_shutdown == 0)
+                       return;
+               dprintk("RPC:       %s: ESTABLISHED/SYN_SENT "
+                               "sk_shutdown set to %d\n",
+                               __func__, transport->inet->sk_shutdown);
+       }
        xs_abort_connection(xprt, transport);
 }

Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>
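
For readers who want to check which combinations now reach xs_abort_connection(), the control flow of the patched xs_tcp_reuse_connection() can be modelled as a pure function. This is only an illustrative userspace sketch: reuse_needs_abort() is a hypothetical name, and the numeric constants are assumed to match the kernel's TCP state, socket state and shutdown flag values (SEND_SHUTDOWN being the value 2 seen in the dmesg output quoted earlier in the thread).

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's values; redefined here so the sketch
 * builds outside the kernel tree. */
enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT = 2, TCP_FIN_WAIT1 = 4, TCP_CLOSE = 7 };
enum {
	TCPF_ESTABLISHED = 1 << TCP_ESTABLISHED,
	TCPF_SYN_SENT    = 1 << TCP_SYN_SENT
};
enum { SS_UNCONNECTED = 1, SS_CONNECTED = 3 };
enum { RCV_SHUTDOWN = 1, SEND_SHUTDOWN = 2 };

/*
 * Hypothetical model of the patched decision: returns true when the
 * patched function would fall through to xs_abort_connection().
 */
bool reuse_needs_abort(unsigned int state, int sock_state,
		       unsigned int sk_shutdown)
{
	/* Socket closed and never connected, or already reset. */
	bool idle = (state == TCP_CLOSE && sock_state == SS_UNCONNECTED);
	/* Connection established or a connect attempt in flight. */
	bool live = ((1u << state) & (TCPF_ESTABLISHED | TCPF_SYN_SENT)) != 0;

	/* Before the patch, idle or live alone was enough to skip the
	 * abort. The patch additionally requires that no shutdown has
	 * been recorded on the socket. */
	if ((idle || live) && sk_shutdown == 0)
		return false;
	return true;
}
```

With these assumptions, reuse_needs_abort(TCP_CLOSE, SS_UNCONNECTED, SEND_SHUTDOWN) is true, which corresponds to the case logged in dmesg, while the same state with sk_shutdown == 0 still returns early as it did before the patch.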


* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
  2010-08-09  9:27                       ` Andy Chittenden
  2010-08-09 16:55                         ` Trond Myklebust
@ 2018-06-19 21:56                         ` Joe Perches
  2018-06-20 16:40                           ` Andy C
  1 sibling, 1 reply; 12+ messages in thread
From: Joe Perches @ 2018-06-19 21:56 UTC (permalink / raw)
  To: Andy Chittenden, 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', Andy Chittenden

On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:

You really need to check your clock.
Mail sent in the year 2010?


* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
  2018-06-19 21:56                         ` Joe Perches
@ 2018-06-20 16:40                           ` Andy C
  0 siblings, 0 replies; 12+ messages in thread
From: Andy C @ 2018-06-20 16:40 UTC (permalink / raw)
  To: Joe Perches, 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', Andy Chittenden

On 2018-06-19 22:56, Joe Perches wrote:
> On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:
>
> You really need to check your clock.
> Mail sent in the year 2010?
>
I didn't send it just now. I sent it on 2010-08-09 at 10:27. And I did 
receive a copy at that time. If I'm reading the headers correctly, 
something happened here:

Received: from mchehab by bombadil.infradead.org with local (Exim 4.90_1 #2 (Red Hat Linux))
	id 1fVNCq-0005da-3g; Tue, 19 Jun 2018 20:26:24 +0000
Received: from vger.kernel.org ([209.132.180.67])
	by bombadil.infradead.org with esmtp (Exim 4.72 #1 (Red Hat Linux))
	id 1OiOdm-0004za-Ks; Mon, 09 Aug 2010 09:27:30 +0000
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755981Ab0HIJ1U (ORCPT <rfc822;tgraf@infradead.org> + 8 others);
	Mon, 9 Aug 2010 05:27:20 -0400

-- 
Andy


end of thread, other threads:[~2018-06-20 16:40 UTC | newest]

Thread overview: 12+ messages
     [not found] <4c57cfe8.887b0e0a.2f79.4772@mx.google.com>
2010-08-03  8:21 ` [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss David Miller
     [not found]   ` <20100803.012144.267950450.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2010-08-03  9:11     ` Andrew Morton
     [not found]       ` <20100803021110.f0b3877b.akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2010-08-03 10:25         ` Andy Chittenden
     [not found]           ` <4C57EE9A.7040308-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2010-08-05 14:55             ` Andy Chittenden
     [not found]               ` <4c5ad0d6.42ecd80a.47d7.0dfc-ATjtLOhZ0NVl57MIdRCFDg@public.gmane.org>
2010-08-05 19:50                 ` Trond Myklebust
     [not found]                   ` <1281037822.2948.49.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2010-08-06  9:30                     ` Andy Chittenden
2010-08-09  9:27                       ` Andy Chittenden
2010-08-09 16:55                         ` Trond Myklebust
     [not found]                           ` <1281372927.8950.3.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2010-08-10  8:40                             ` Andy Chittenden
2018-06-19 21:56                         ` Joe Perches
2018-06-20 16:40                           ` Andy C
2010-08-03  8:14 Andy Chittenden
