netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Russell King - ARM Linux <linux-lFZ/pmaqli7XmaaqVzeoHQ@public.gmane.org>
To: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Trond Myklebust
	<trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>,
	Anna Schumaker
	<anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2
Date: Fri, 11 Sep 2015 17:49:38 +0100	[thread overview]
Message-ID: <20150911164937.GW21084@n2100.arm.linux.org.uk> (raw)
In-Reply-To: <20150911162416.GV21084-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>

On Fri, Sep 11, 2015 at 05:24:17PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 08:18:43AM -0700, Eric Dumazet wrote:
> > On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> > > On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > > > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > > > too, but I don't see that solving the multiple _concurrent_ connection
> > > > attempts with the same port number - presumably it's somehow trying to
> > > > make the same socket repeatedly connect despite a previous connection
> > > > being in progress, which would have nothing to do with cleaning up a
> > > > previous attempt.
> > > 
> > > As I suspected, applying the above commit in addition does not solve the
> > > problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> > > SYNACK SYN RSTACK, and eventual SYN storms.
> > > 
> > > I do have this captured as well:
> > > 
> > > 2558   0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> > > 2559   0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> > > 2560   0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> > > 2561   0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
> > > 
> > > The packet at 2561 looks wrong to me - this doesn't follow what I know
> > > would be the standard TCP setup of syn, synack, ack, because that final
> > > ack is in the wrong direction.
> > > 
> > 
> > This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
> > created by packet 2558.
> > 
> > It receives a SYN packet (2560) that it interprets as a packet slightly
> > out of sequence (1054228544 being above 1053655487) for this SYN_RECV
> > 
> > The wrong packet is 2560, not 2561
> 
> Ok.
> 
> Looking deeper at the XPRT sunrpc code, I have to wonder about the
> sanity of this:
> 
> void xprt_connect(struct rpc_task *task)
> {
> ...
>         if (!xprt_connected(xprt)) {
> ...
>                 if (test_bit(XPRT_CLOSING, &xprt->state))
>                         return;
>                 if (xprt_test_and_set_connecting(xprt))
>                         return;
>                 xprt->stat.connect_start = jiffies;
>                 xprt->ops->connect(xprt, task);
> 
> That calls into xs_connect(), which schedules a workqueue to do the
> connection.  The workqueue will call xs_tcp_setup_socket().
> 
> xs_tcp_setup_socket() creates a socket if one didn't exist, otherwise
> re-using the previously obtained socket (which'll be why its using the
> same socket) and then goes on to call xs_tcp_finish_connecting().
> 
> xs_tcp_finish_connecting() calls kernel_connect(), which will return
> -EINPROGRESS.  We seem to treat EINPROGRESS as if the connection was
> successful:
> 
>         case 0:
>         case -EINPROGRESS:
>         case -EALREADY:
>                 xprt_unlock_connect(xprt, transport);
>                 xprt_clear_connecting(xprt);
>                 return;
> 
> and the xprt_clear_connecting() results in this whole path being
> re-opened: the socket is not yet connected, so xprt_connected() will
> return false, and despite the socket connection still being mid-way
> through being connected, we clear the "connecting" status, causing
> xprt_test_and_set_connecting() to return false.
> 
> That allows us to re-call xprt->ops->connect, re-queue the connect
> worker, and re-run the call to kernel_connect() for a socket which is
> already mid-way through being connected.
> 
> Shouldn't the "connecting" status only be cleared when either the socket
> has _finished_ connecting, or when the connection has _failed_ to connect,
> and not when it's mid-way through connecting?
> 
> I've not been able to prove this: I've set rpc_debug to 129 to log
> just xprt and trans RPC facilities, and that's sufficient to change
> the timing such that this doesn't happen.

Following that idea, I just tried the patch below, and it seems to work.
I don't know whether it handles all cases after a call to kernel_connect(),
but it stops the multiple connection attempts:

  1   0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128
  2   0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 TSecr=15712 WS=64
  3   0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
  4   0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
  5   0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
  6   0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
  7   0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
  8   0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: RD LU MD XT DL]
  9   0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
 10   0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
 11   0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: RD LU MD XT DL]
 12   0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
 13   0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: RD LU MD XT DL]
 14   0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9
...

 net/sunrpc/xprtsock.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ff5b6a2e62c3..c456d6e51c56 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1450,6 +1450,7 @@ static void xs_tcp_state_change(struct sock *sk)
 	switch (sk->sk_state) {
 	case TCP_ESTABLISHED:
 		spin_lock(&xprt->transport_lock);
+		xprt_clear_connecting(xprt);
 		if (!xprt_test_and_set_connected(xprt)) {
 			struct sock_xprt *transport = container_of(xprt,
 					struct sock_xprt, xprt);
@@ -1474,12 +1475,14 @@ static void xs_tcp_state_change(struct sock *sk)
 		smp_mb__before_atomic();
 		clear_bit(XPRT_CONNECTED, &xprt->state);
 		clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+		clear_bit(XPRT_CONNECTING, &xprt->state);		
 		smp_mb__after_atomic();
 		break;
 	case TCP_CLOSE_WAIT:
 		/* The server initiated a shutdown of the socket */
 		xprt->connect_cookie++;
 		clear_bit(XPRT_CONNECTED, &xprt->state);
+		clear_bit(XPRT_CONNECTING, &xprt->state);
 		xs_tcp_force_close(xprt);
 	case TCP_CLOSING:
 		/*
@@ -1493,6 +1496,7 @@ static void xs_tcp_state_change(struct sock *sk)
 		set_bit(XPRT_CLOSING, &xprt->state);
 		smp_mb__before_atomic();
 		clear_bit(XPRT_CONNECTED, &xprt->state);
+		clear_bit(XPRT_CONNECTING, &xprt->state);
 		smp_mb__after_atomic();
 		break;
 	case TCP_CLOSE:
@@ -2237,11 +2241,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		xs_tcp_force_close(xprt);
 		break;
 	case 0:
-	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_unlock_connect(xprt, transport);
 		xprt_clear_connecting(xprt);
 		return;
+	case -EINPROGRESS:
+		xprt_unlock_connect(xprt, transport);
+		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
 		 * local IPv6 address without a scope-id.


-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2015-09-11 16:49 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-11 11:38 NFS/TCP/IPv6 acting strangely in 4.2 Russell King - ARM Linux
     [not found] ` <20150911113839.GO21084-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-11 13:04   ` Eric Dumazet
     [not found]     ` <1441976691.4619.58.camel-XN9IlZ5yJG9HTL0Zs8A6p/gx64E7kk8eUsxypvmhUTTZJqsBc5GL+g@public.gmane.org>
2015-09-11 13:18       ` Russell King - ARM Linux
2015-09-11 14:33     ` Russell King - ARM Linux
2015-09-11 15:06       ` Russell King - ARM Linux
2015-09-11 15:18         ` Eric Dumazet
2015-09-11 16:24           ` Russell King - ARM Linux
     [not found]             ` <20150911162416.GV21084-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-11 16:49               ` Russell King - ARM Linux [this message]
     [not found]                 ` <20150911164937.GW21084-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-17 13:57                   ` Russell King - ARM Linux
     [not found]                     ` <20150917135705.GQ21084-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-17 14:18                       ` Trond Myklebust
     [not found]                         ` <1442499509.2865.16.camel-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
2015-09-17 16:27                           ` Benjamin Coddington
2015-09-17 22:03                             ` Trond Myklebust
2015-09-17 21:47                           ` Russell King - ARM Linux
     [not found]                             ` <20150917214758.GW21084-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-17 22:03                               ` Trond Myklebust
2015-09-16  6:53 ` Damien Thébault
     [not found]   ` <1442386435.3756.282.camel-RsgQwIhcE7EAvxtiuMwx3w@public.gmane.org>
2015-09-16  7:15     ` Gregory CLEMENT
     [not found]       ` <87zj0mbzsj.fsf-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8@public.gmane.org>
2015-09-16  7:39         ` Damien Thébault
2015-09-16  7:31   ` Willy Tarreau
2015-09-17 14:06   ` Russell King - ARM Linux

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150911164937.GW21084@n2100.arm.linux.org.uk \
    --to=linux-lfz/pmaqli7xmaaqvzeohq@public.gmane.org \
    --cc=anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org \
    --cc=eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).