netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: soukjin.bae@samsung.com,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [Android][Kernel][TCP/IP] report of packet discarding during tcp handshaking
Date: Tue, 06 Feb 2018 20:31:14 -0800	[thread overview]
Message-ID: <1517977874.3715.153.camel@gmail.com> (raw)
In-Reply-To: <20180207021647epcms1p128701b4f7e63a88838dd132221119571@epcms1p1>

On Wed, 2018-02-07 at 11:16 +0900, 배석진 wrote:
> Hello, 
> this is bae working on samsung elec. 
> 

Hi Bae,  thanks for this detailed report and analysis.

> we have a problem that packet discarded during 3-way handshaking on TCP. 
> already looks like that Mr Dumazet try to fix the similar issue on this patch, https://android.googlesource.com/kernel/common/+/5e0724d027f0548511a2165a209572d48fe7a4c8 

This patch was fixing a more serious bug, since this was possibly
causing corruptions and kernel crashes.

What you describe is simply a packet being dropped, on a very low
probability.

Which is not a huge deal for TCP since this packet will eventually be
re-transmitted.

Also most TCP stacks/flows use DF bit set to not allow packets being
fragmented...

Anyway see my answer at the end of this mail.


> but we are still facing the another corner case.
> 
> it needs preconditions for this problem.
> (1) last ack packet of 3-way handshaking and next packet have been arrived at almost same time 
> (2) next packet, the first data packet was fragmented 
> (3) enable rps
> 
> 
> [tcp dump]
> No.     A-Time         Source     Destination  Len   Seq  Info 
>  1  08:35:18.115259  193.81.6.70  10.217.0.47  84     0   [SYN] Seq=0 Win=21504 Len=0 MSS=1460 
>  2  08:35:18.115888  10.217.0.47  193.81.6.70  84     0   6100 → 5063 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 
>  3  08:35:18.142385  193.81.6.70  10.217.0.47  80     1   5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=0 
>  4  08:35:18.142425  193.81.6.70  10.217.0.47  1516       Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6e24) [Reassembled in #5] 
>  5  08:35:18.142449  193.81.6.70  10.217.0.47  60     1   5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460 [TCP segment of a reassembled PDU] 
>  6  08:35:21.227070  193.81.6.70  10.217.0.47  1516       Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=71e9) [Reassembled in #7] 
>  7  08:35:21.227191  193.81.6.70  10.217.0.47  60     1   [TCP Retransmission] 5063 → 6100 [ACK] Seq=1 Ack=1 Win=21504 Len=1460 
>  8  08:35:21.228822  10.217.0.47  193.81.6.70  80     1   6100 → 5063 [ACK] Seq=1 Ack=1461 Win=32120 Len=0
> 
> - last ack packet of handshaking(No.3) and next data packet(No4,5) were arrived with just 40us time gap.
> 
> 
> [kernel log]
> - stage 1 
> <3>[ 1037.669229] I[0:  system_server: 3778] get_rps_cpu: skb(64), check hash value:3412396090 
> <3>[ 1037.669261] I[0:  system_server: 3778] get_rps_cpu: skb(1500), check hash value:158575680 
> <3>[ 1037.669285] I[0:  system_server: 3778] get_rps_cpu: skb(44), check hash value:158575680 
> - stage 2 
> <3>[ 1037.669541] I[1: Binder:3778_13: 8391] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:1480) 
> <3>[ 1037.669552] I[2:Jit thread pool:12990] tcp_v4_rcv: Enter! skb(seq:A93E087B, len:20) 
> <3>[ 1037.669564] I[2:Jit thread pool:12990] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:20) 
> <3>[ 1037.669585] I[2:Jit thread pool:12990] tcp_check_req, Enter!: skb(seq:A93E087B, len:20) 
> <3>[ 1037.669612] I[1: Binder:3778_13: 8391] tcp_v4_rcv: check sk_state:12 skb(seq:A93E087B, len:1480) 
> <3>[ 1037.669625] I[1: Binder:3778_13: 8391] tcp_check_req, Enter!: skb(seq:A93E087B, len:1480) 
> <3>[ 1037.669653] I[2:Jit thread pool:12990] tcp_check_req, skb(seq:A93E087B, len:20), own_req:1 
> <3>[ 1037.669668] I[1: Binder:3778_13: 8391] tcp_check_req, skb(seq:A93E087B, len:1480), own_req:0 
> <3>[ 1037.669708] I[2:Jit thread pool:12990] tcp_rcv_state_process, Established: skb(seq:A93E087B, len:20) 
> <3>[ 1037.669724] I[1: Binder:3778_13: 8391] tcp_v4_rcv: discard_relse skb(seq:A93E087B, len:1480)
> 
> - stage 1 
> because of the data packet has been fragmented(No.4 & 5), 
> it was hashed to another core(cpu1) which was differnet with last ack packet(cpu2), by rps. 
> so last ack and data packet handled in different core almost simultaniously, at NEW_SYN_RECV state.
> 
> - stage 2, cpu2 
> one of them will be treated in tcp_check_req() function a little more earlier, 
> then it got the true value for own_req from tcp_v4_syn_recv_sock(), and return valid nsk. 
> finally going to ESTABLISHED state.
> 
> - stage 2, cpu1 
> but another, later one is got the false value for own_req, 
> and return null for nsk, because of own_req value is false in inet_csk_complete_hashdance(). 
> so earlier packet was handled successfully but later one has gone to discard.
> 
> at this time, one of the ack or data packet could be discarded, by schedule timing. (we saw both of them) 
> if the ack was discarded, that's ok. 
> tcp state goes to ESTABLISHED by piggyback on data packet, and payload will be deliverd to upper layer. 
> but if the data packet was discarded, client can't receive the payload it have to. 
> this is the problem we faced.
> 
> 
> although server retransmitted the dropped packet(No6,7), but it takes few seconds delay. 
> since of this problem occured in IMS-Call setup, this is appeared to call connection delay. 
> these situation is serious problem in call service.
> 
> do you have any report about this or plan to fix it?

I guess the following patch should help, can you test it ?

Thanks !

 include/net/tcp.h        |    3 ++-
 net/ipv4/tcp_input.c     |    4 +++-
 net/ipv4/tcp_ipv4.c      |    9 ++++++++-
 net/ipv4/tcp_minisocks.c |    3 ++-
 net/ipv6/tcp_ipv6.c      |    9 ++++++++-
 5 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 58278669cc5572ec397f028b5888f574764e298a..fe4616978ccf2b048ca1b7ad8976558563798785 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -374,7 +374,8 @@ enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
 					      struct sk_buff *skb,
 					      const struct tcphdr *th);
 struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
-			   struct request_sock *req, bool fastopen);
+			   struct request_sock *req, bool fastopen,
+			   bool *lost_race);
 int tcp_child_process(struct sock *parent, struct sock *child,
 		      struct sk_buff *skb);
 void tcp_enter_loss(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cfa51cfd2d999342018aee3511d5760a17b1493b..20c68a4765235daec01a020913d4779d6926d203 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5870,10 +5870,12 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 	tp->rx_opt.saw_tstamp = 0;
 	req = tp->fastopen_rsk;
 	if (req) {
+		bool lost_race;
+
 		WARN_ON_ONCE(sk->sk_state != TCP_SYN_RECV &&
 		    sk->sk_state != TCP_FIN_WAIT1);
 
-		if (!tcp_check_req(sk, skb, req, true))
+		if (!tcp_check_req(sk, skb, req, true, &lost_race))
 			goto discard;
 	}
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 95738aa0d8a60697053d763ba7023be2c62d7a90..d72de3d12b4c960c3bd1ad4c45fc493ee919201c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1672,6 +1672,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	if (sk->sk_state == TCP_NEW_SYN_RECV) {
 		struct request_sock *req = inet_reqsk(sk);
 		struct sock *nsk;
+		bool lost_race;
 
 		sk = req->rsk_listener;
 		if (unlikely(tcp_v4_inbound_md5_hash(sk, skb))) {
@@ -1688,15 +1689,21 @@ int tcp_v4_rcv(struct sk_buff *skb)
 		 */
 		sock_hold(sk);
 		refcounted = true;
+		lost_race = false;
 		nsk = NULL;
 		if (!tcp_filter(sk, skb)) {
 			th = (const struct tcphdr *)skb->data;
 			iph = ip_hdr(skb);
 			tcp_v4_fill_cb(skb, iph, th);
-			nsk = tcp_check_req(sk, skb, req, false);
+			nsk = tcp_check_req(sk, skb, req, false, &lost_race);
 		}
 		if (!nsk) {
 			reqsk_put(req);
+			if (lost_race) {
+				tcp_v4_restore_cb(skb);
+				sock_put(sk);
+				goto lookup;
+			}
 			goto discard_and_relse;
 		}
 		if (nsk == sk) {
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index a8384b0c11f8fa589e2ed5311899b62c80a269f8..7ebc222854f7d7215850f86456aeb6298659b6fa 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -578,7 +578,7 @@ EXPORT_SYMBOL(tcp_create_openreq_child);
 
 struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 			   struct request_sock *req,
-			   bool fastopen)
+			   bool fastopen, bool *lost_race)
 {
 	struct tcp_options_received tmp_opt;
 	struct sock *child;
@@ -785,6 +785,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 
 	sock_rps_save_rxhash(child, skb);
 	tcp_synack_rtt_meas(child, req);
+	*lost_race = !own_req;
 	return inet_csk_complete_hashdance(sk, child, req, own_req);
 
 listen_overflow:
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index a1ab29e2ab3bb99e120920186be46d2dd4bbb71e..211570bd524ddc3eddadec0b58c587935ad9a42c 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1451,6 +1451,7 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 	if (sk->sk_state == TCP_NEW_SYN_RECV) {
 		struct request_sock *req = inet_reqsk(sk);
 		struct sock *nsk;
+		bool lost_race;
 
 		sk = req->rsk_listener;
 		if (tcp_v6_inbound_md5_hash(sk, skb)) {
@@ -1464,15 +1465,21 @@ static int tcp_v6_rcv(struct sk_buff *skb)
 		}
 		sock_hold(sk);
 		refcounted = true;
+		lost_race = false;
 		nsk = NULL;
 		if (!tcp_filter(sk, skb)) {
 			th = (const struct tcphdr *)skb->data;
 			hdr = ipv6_hdr(skb);
 			tcp_v6_fill_cb(skb, hdr, th);
-			nsk = tcp_check_req(sk, skb, req, false);
+			nsk = tcp_check_req(sk, skb, req, false, &lost_race);
 		}
 		if (!nsk) {
 			reqsk_put(req);
+			if (lost_race) {
+				tcp_v6_restore_cb(skb);
+				sock_put(sk);
+				goto lookup;
+			}
 			goto discard_and_relse;
 		}
 		if (nsk == sk) {

  reply	other threads:[~2018-02-07  4:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20180207021647epcms1p128701b4f7e63a88838dd132221119571@epcms1p1>
2018-02-07  2:16 ` [Android][Kernel][TCP/IP] report of packet discarding during tcp handshaking 배석진
2018-02-07  4:31   ` Eric Dumazet [this message]
     [not found]   ` <CGME20180207021647epcms1p128701b4f7e63a88838dd132221119571@epcms1p8>
2018-02-07  7:19     ` 배석진
2018-02-07 14:44       ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1517977874.3715.153.camel@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=soukjin.bae@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).