netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
@ 2024-11-05  2:55 Jason Xing
  2024-11-05  7:49 ` Kuniyuki Iwashima
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-05  2:55 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, dsahern, horms; +Cc: netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

We found there are rare chances that some RST packets appear during
the shakehands because the timewait socket cannot accept the SYN and
doesn't return TCP_TW_SYN in tcp_timewait_state_process().

Here is how things happen in production:
Time        Client(A)        Server(B)
0s          SYN-->
...
132s                         <-- FIN
...
169s        FIN-->
169s                         <-- ACK
169s        SYN-->
169s                         <-- ACK
169s        RST-->
As above picture shows, the two flows have a start time difference
of 169 seconds. B starts to send FIN so it will finally enter into
TIMEWAIT state. Nearly at the same time A launches a new connection
that soon is reset by itself due to receiving a ACK.

There are two key checks in tcp_timewait_state_process() when timewait
socket in B receives the SYN packet:
1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)

Regarding the first rule, it fails as expected because in the first
connection the seq of SYN sent from A is 1892994276, then 169s have
passed, the second SYN has 239034613 (caused by overflow of s32).

Then how about the second rule?
It fails again!
Let's take a look at how the tsval comes out:
__tcp_transmit_skb()
    -> tcp_syn_options()
        -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
The timestamp depends on two things, one is skb->skb_mstamp_ns, the
other is tp->tsoffset. The latter value is fixed, so we don't need
to care about it. If both operations (sending FIN and then starting
sending SYN) from A happen in 1ms, then the tsval would be the same.
It can be clearly seen in the tcpdump log. Notice that the tsval is
with millisecond precision.

Based on the above analysis, I decided to make a small change to
the check in tcp_timewait_state_process() so that the second flow
would not fail.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/ipv4/tcp_minisocks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bb1fe1ba867a..2b29d1bf5ca0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	if (th->syn && !th->rst && !th->ack && !paws_reject &&
 	    (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
 	     (tmp_opt.saw_tstamp &&
-	      (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
+	      (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) {
 		u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
 		if (isn == 0)
 			isn++;
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2024-11-07  9:57 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-05  2:55 [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process Jason Xing
2024-11-05  7:49 ` Kuniyuki Iwashima
2024-11-05  9:08   ` Jason Xing
2024-11-05  9:35     ` Eric Dumazet
2024-11-05  9:41       ` Jason Xing
2024-11-05  9:50         ` Eric Dumazet
2024-11-05  9:56           ` Jason Xing
2024-11-05 11:48           ` Jason Xing
2024-11-07  3:16 ` Jason Xing
2024-11-07  4:15   ` Kuniyuki Iwashima
2024-11-07  4:21     ` Kuniyuki Iwashima
2024-11-07  5:23     ` Jason Xing
2024-11-07  5:43       ` Kuniyuki Iwashima
2024-11-07  6:51         ` Jason Xing
2024-11-07  7:11           ` Kuniyuki Iwashima
2024-11-07  7:44             ` Jason Xing
2024-11-07  7:51 ` Philo Lu
2024-11-07  8:01   ` Jason Xing
2024-11-07  8:22     ` Philo Lu
2024-11-07  8:26       ` Jason Xing
2024-11-07  8:37         ` Eric Dumazet
2024-11-07  9:00           ` Jason Xing
2024-11-07  9:16             ` Eric Dumazet
2024-11-07  9:18               ` Jason Xing
2024-11-07  9:30             ` Jason Xing
2024-11-07  9:45               ` Eric Dumazet
2024-11-07  9:48                 ` Eric Dumazet
2024-11-07  9:57                 ` Jason Xing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).