* [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
@ 2024-11-05 2:55 Jason Xing
2024-11-05 7:49 ` Kuniyuki Iwashima
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-05 2:55 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, horms; +Cc: netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
We found there are rare chances that some RST packets appear during
the shakehands because the timewait socket cannot accept the SYN and
doesn't return TCP_TW_SYN in tcp_timewait_state_process().
Here is how things happen in production:
Time Client(A) Server(B)
0s SYN-->
...
132s <-- FIN
...
169s FIN-->
169s <-- ACK
169s SYN-->
169s <-- ACK
169s RST-->
As above picture shows, the two flows have a start time difference
of 169 seconds. B starts to send FIN so it will finally enter into
TIMEWAIT state. Nearly at the same time A launches a new connection
that soon is reset by itself due to receiving a ACK.
There are two key checks in tcp_timewait_state_process() when timewait
socket in B receives the SYN packet:
1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
Regarding the first rule, it fails as expected because in the first
connection the seq of SYN sent from A is 1892994276, then 169s have
passed, the second SYN has 239034613 (caused by overflow of s32).
Then how about the second rule?
It fails again!
Let's take a look at how the tsval comes out:
__tcp_transmit_skb()
-> tcp_syn_options()
-> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
The timestamp depends on two things, one is skb->skb_mstamp_ns, the
other is tp->tsoffset. The latter value is fixed, so we don't need
to care about it. If both operations (sending FIN and then starting
sending SYN) from A happen in 1ms, then the tsval would be the same.
It can be clearly seen in the tcpdump log. Notice that the tsval is
with millisecond precision.
Based on the above analysis, I decided to make a small change to
the check in tcp_timewait_state_process() so that the second flow
would not fail.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/ipv4/tcp_minisocks.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bb1fe1ba867a..2b29d1bf5ca0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
if (th->syn && !th->rst && !th->ack && !paws_reject &&
(after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
(tmp_opt.saw_tstamp &&
- (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
+ (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) {
u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
if (isn == 0)
isn++;
--
2.37.3
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 2:55 [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process Jason Xing
@ 2024-11-05 7:49 ` Kuniyuki Iwashima
2024-11-05 9:08 ` Jason Xing
2024-11-07 3:16 ` Jason Xing
2024-11-07 7:51 ` Philo Lu
2 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-05 7:49 UTC (permalink / raw)
To: kerneljasonxing
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, netdev, pabeni,
kuniyu
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Tue, 5 Nov 2024 10:55:11 +0800
> From: Jason Xing <kernelxing@tencent.com>
>
> We found there are rare chances that some RST packets appear during
> the shakehands because the timewait socket cannot accept the SYN and
s/shakehands/handshake/
same in the subject.
> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
>
> Here is how things happen in production:
> Time Client(A) Server(B)
> 0s SYN-->
> ...
> 132s <-- FIN
> ...
> 169s FIN-->
> 169s <-- ACK
> 169s SYN-->
> 169s <-- ACK
> 169s RST-->
> As above picture shows, the two flows have a start time difference
> of 169 seconds. B starts to send FIN so it will finally enter into
> TIMEWAIT state. Nearly at the same time A launches a new connection
> that soon is reset by itself due to receiving a ACK.
>
> There are two key checks in tcp_timewait_state_process() when timewait
> socket in B receives the SYN packet:
> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
>
> Regarding the first rule, it fails as expected because in the first
> connection the seq of SYN sent from A is 1892994276, then 169s have
> passed, the second SYN has 239034613 (caused by overflow of s32).
>
> Then how about the second rule?
> It fails again!
> Let's take a look at how the tsval comes out:
> __tcp_transmit_skb()
> -> tcp_syn_options()
> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> other is tp->tsoffset. The latter value is fixed, so we don't need
> to care about it. If both operations (sending FIN and then starting
> sending SYN) from A happen in 1ms, then the tsval would be the same.
> It can be clearly seen in the tcpdump log. Notice that the tsval is
> with millisecond precision.
>
> Based on the above analysis, I decided to make a small change to
> the check in tcp_timewait_state_process() so that the second flow
> would not fail.
>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/ipv4/tcp_minisocks.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index bb1fe1ba867a..2b29d1bf5ca0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> if (th->syn && !th->rst && !th->ack && !paws_reject &&
> (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> (tmp_opt.saw_tstamp &&
> - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
I think this follows RFC 6191 and such a change requires a formal
discussion at IETF.
https://datatracker.ietf.org/doc/html/rfc6191#section-2
---8<---
* If TCP Timestamps would be enabled for the new incarnation of
the connection, and the timestamp contained in the incoming SYN
segment is greater than the last timestamp seen on the previous
incarnation of the connection (for that direction of the data
transfer), honor the connection request (creating a connection
in the SYN-RECEIVED state).
* If TCP Timestamps would be enabled for the new incarnation of
the connection, the timestamp contained in the incoming SYN
segment is equal to the last timestamp seen on the previous
incarnation of the connection (for that direction of the data
transfer), and the Sequence Number of the incoming SYN segment
is greater than the last sequence number seen on the previous
incarnation of the connection (for that direction of the data
transfer), honor the connection request (creating a connection
in the SYN-RECEIVED state).
---8<---
> + (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) {
> u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
> if (isn == 0)
> isn++;
> --
> 2.37.3
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 7:49 ` Kuniyuki Iwashima
@ 2024-11-05 9:08 ` Jason Xing
2024-11-05 9:35 ` Eric Dumazet
0 siblings, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-05 9:08 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, netdev, pabeni
On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Tue, 5 Nov 2024 10:55:11 +0800
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > We found there are rare chances that some RST packets appear during
> > the shakehands because the timewait socket cannot accept the SYN and
>
> s/shakehands/handshake/
>
> same in the subject.
>
> > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> >
> > Here is how things happen in production:
> > Time Client(A) Server(B)
> > 0s SYN-->
> > ...
> > 132s <-- FIN
> > ...
> > 169s FIN-->
> > 169s <-- ACK
> > 169s SYN-->
> > 169s <-- ACK
> > 169s RST-->
> > As above picture shows, the two flows have a start time difference
> > of 169 seconds. B starts to send FIN so it will finally enter into
> > TIMEWAIT state. Nearly at the same time A launches a new connection
> > that soon is reset by itself due to receiving a ACK.
> >
> > There are two key checks in tcp_timewait_state_process() when timewait
> > socket in B receives the SYN packet:
> > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> >
> > Regarding the first rule, it fails as expected because in the first
> > connection the seq of SYN sent from A is 1892994276, then 169s have
> > passed, the second SYN has 239034613 (caused by overflow of s32).
> >
> > Then how about the second rule?
> > It fails again!
> > Let's take a look at how the tsval comes out:
> > __tcp_transmit_skb()
> > -> tcp_syn_options()
> > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > other is tp->tsoffset. The latter value is fixed, so we don't need
> > to care about it. If both operations (sending FIN and then starting
> > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > with millisecond precision.
> >
> > Based on the above analysis, I decided to make a small change to
> > the check in tcp_timewait_state_process() so that the second flow
> > would not fail.
> >
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > net/ipv4/tcp_minisocks.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > --- a/net/ipv4/tcp_minisocks.c
> > +++ b/net/ipv4/tcp_minisocks.c
> > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > (tmp_opt.saw_tstamp &&
> > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
>
> I think this follows RFC 6191 and such a change requires a formal
> discussion at IETF.
>
> https://datatracker.ietf.org/doc/html/rfc6191#section-2
>
> ---8<---
> * If TCP Timestamps would be enabled for the new incarnation of
> the connection, and the timestamp contained in the incoming SYN
> segment is greater than the last timestamp seen on the previous
The true thing is that the timestamp of the SYN packet is greater than
that of the last packet, but the kernel implementation uses ms
precision (please see tcp_skb_timestamp_ts()). That function makes
those two timestamps the same.
This case happens as expected, so the second connection should be
established. My confusion just popped out of my mind: what rules
should we follow to stop the second flow?
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 9:08 ` Jason Xing
@ 2024-11-05 9:35 ` Eric Dumazet
2024-11-05 9:41 ` Jason Xing
0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2024-11-05 9:35 UTC (permalink / raw)
To: Jason Xing
Cc: Kuniyuki Iwashima, davem, dsahern, horms, kernelxing, kuba,
netdev, pabeni
On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Jason Xing <kerneljasonxing@gmail.com>
> > Date: Tue, 5 Nov 2024 10:55:11 +0800
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > We found there are rare chances that some RST packets appear during
> > > the shakehands because the timewait socket cannot accept the SYN and
> >
> > s/shakehands/handshake/
> >
> > same in the subject.
> >
> > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > >
> > > Here is how things happen in production:
> > > Time Client(A) Server(B)
> > > 0s SYN-->
> > > ...
> > > 132s <-- FIN
> > > ...
> > > 169s FIN-->
> > > 169s <-- ACK
> > > 169s SYN-->
> > > 169s <-- ACK
> > > 169s RST-->
> > > As above picture shows, the two flows have a start time difference
> > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > that soon is reset by itself due to receiving a ACK.
> > >
> > > There are two key checks in tcp_timewait_state_process() when timewait
> > > socket in B receives the SYN packet:
> > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > >
> > > Regarding the first rule, it fails as expected because in the first
> > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > >
> > > Then how about the second rule?
> > > It fails again!
> > > Let's take a look at how the tsval comes out:
> > > __tcp_transmit_skb()
> > > -> tcp_syn_options()
> > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > to care about it. If both operations (sending FIN and then starting
> > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > with millisecond precision.
> > >
> > > Based on the above analysis, I decided to make a small change to
> > > the check in tcp_timewait_state_process() so that the second flow
> > > would not fail.
> > >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > net/ipv4/tcp_minisocks.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > --- a/net/ipv4/tcp_minisocks.c
> > > +++ b/net/ipv4/tcp_minisocks.c
> > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > (tmp_opt.saw_tstamp &&
> > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> >
> > I think this follows RFC 6191 and such a change requires a formal
> > discussion at IETF.
> >
> > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> >
> > ---8<---
> > * If TCP Timestamps would be enabled for the new incarnation of
> > the connection, and the timestamp contained in the incoming SYN
> > segment is greater than the last timestamp seen on the previous
>
> The true thing is that the timestamp of the SYN packet is greater than
> that of the last packet, but the kernel implementation uses ms
> precision (please see tcp_skb_timestamp_ts()). That function makes
> those two timestamps the same.
>
> This case happens as expected, so the second connection should be
> established. My confusion just popped out of my mind: what rules
> should we follow to stop the second flow?
Note that linux TCP stack can use usec resolution for TCP TS values.
You might adopt it, and no longer care about this ms granularity.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 9:35 ` Eric Dumazet
@ 2024-11-05 9:41 ` Jason Xing
2024-11-05 9:50 ` Eric Dumazet
0 siblings, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-05 9:41 UTC (permalink / raw)
To: Eric Dumazet
Cc: Kuniyuki Iwashima, davem, dsahern, horms, kernelxing, kuba,
netdev, pabeni
On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > Date: Tue, 5 Nov 2024 10:55:11 +0800
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > We found there are rare chances that some RST packets appear during
> > > > the shakehands because the timewait socket cannot accept the SYN and
> > >
> > > s/shakehands/handshake/
> > >
> > > same in the subject.
> > >
> > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > >
> > > > Here is how things happen in production:
> > > > Time Client(A) Server(B)
> > > > 0s SYN-->
> > > > ...
> > > > 132s <-- FIN
> > > > ...
> > > > 169s FIN-->
> > > > 169s <-- ACK
> > > > 169s SYN-->
> > > > 169s <-- ACK
> > > > 169s RST-->
> > > > As above picture shows, the two flows have a start time difference
> > > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > that soon is reset by itself due to receiving a ACK.
> > > >
> > > > There are two key checks in tcp_timewait_state_process() when timewait
> > > > socket in B receives the SYN packet:
> > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > >
> > > > Regarding the first rule, it fails as expected because in the first
> > > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > > >
> > > > Then how about the second rule?
> > > > It fails again!
> > > > Let's take a look at how the tsval comes out:
> > > > __tcp_transmit_skb()
> > > > -> tcp_syn_options()
> > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > to care about it. If both operations (sending FIN and then starting
> > > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > with millisecond precision.
> > > >
> > > > Based on the above analysis, I decided to make a small change to
> > > > the check in tcp_timewait_state_process() so that the second flow
> > > > would not fail.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > > net/ipv4/tcp_minisocks.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > > --- a/net/ipv4/tcp_minisocks.c
> > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > > if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > > (tmp_opt.saw_tstamp &&
> > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> > >
> > > I think this follows RFC 6191 and such a change requires a formal
> > > discussion at IETF.
> > >
> > > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> > >
> > > ---8<---
> > > * If TCP Timestamps would be enabled for the new incarnation of
> > > the connection, and the timestamp contained in the incoming SYN
> > > segment is greater than the last timestamp seen on the previous
> >
> > The true thing is that the timestamp of the SYN packet is greater than
> > that of the last packet, but the kernel implementation uses ms
> > precision (please see tcp_skb_timestamp_ts()). That function makes
> > those two timestamps the same.
> >
> > This case happens as expected, so the second connection should be
> > established. My confusion just popped out of my mind: what rules
> > should we follow to stop the second flow?
>
> Note that linux TCP stack can use usec resolution for TCP TS values.
>
> You might adopt it, and no longer care about this ms granularity.
Right, I noticed this feature. I wonder if we can change the check in
tcp_timewait_state_process() like this patch if it has no side
effects? I'm worried that some programs don't use this feature. It's
the reason why I try to propose this patch to you.
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 9:41 ` Jason Xing
@ 2024-11-05 9:50 ` Eric Dumazet
2024-11-05 9:56 ` Jason Xing
2024-11-05 11:48 ` Jason Xing
0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2024-11-05 9:50 UTC (permalink / raw)
To: Jason Xing
Cc: Kuniyuki Iwashima, davem, dsahern, horms, kernelxing, kuba,
netdev, pabeni
On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > Date: Tue, 5 Nov 2024 10:55:11 +0800
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > We found there are rare chances that some RST packets appear during
> > > > > the shakehands because the timewait socket cannot accept the SYN and
> > > >
> > > > s/shakehands/handshake/
> > > >
> > > > same in the subject.
> > > >
> > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > >
> > > > > Here is how things happen in production:
> > > > > Time Client(A) Server(B)
> > > > > 0s SYN-->
> > > > > ...
> > > > > 132s <-- FIN
> > > > > ...
> > > > > 169s FIN-->
> > > > > 169s <-- ACK
> > > > > 169s SYN-->
> > > > > 169s <-- ACK
> > > > > 169s RST-->
> > > > > As above picture shows, the two flows have a start time difference
> > > > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > that soon is reset by itself due to receiving a ACK.
> > > > >
> > > > > There are two key checks in tcp_timewait_state_process() when timewait
> > > > > socket in B receives the SYN packet:
> > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > >
> > > > > Regarding the first rule, it fails as expected because in the first
> > > > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > >
> > > > > Then how about the second rule?
> > > > > It fails again!
> > > > > Let's take a look at how the tsval comes out:
> > > > > __tcp_transmit_skb()
> > > > > -> tcp_syn_options()
> > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > to care about it. If both operations (sending FIN and then starting
> > > > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > with millisecond precision.
> > > > >
> > > > > Based on the above analysis, I decided to make a small change to
> > > > > the check in tcp_timewait_state_process() so that the second flow
> > > > > would not fail.
> > > > >
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > > net/ipv4/tcp_minisocks.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > > > --- a/net/ipv4/tcp_minisocks.c
> > > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > > > if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > > > (tmp_opt.saw_tstamp &&
> > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> > > >
> > > > I think this follows RFC 6191 and such a change requires a formal
> > > > discussion at IETF.
> > > >
> > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> > > >
> > > > ---8<---
> > > > * If TCP Timestamps would be enabled for the new incarnation of
> > > > the connection, and the timestamp contained in the incoming SYN
> > > > segment is greater than the last timestamp seen on the previous
> > >
> > > The true thing is that the timestamp of the SYN packet is greater than
> > > that of the last packet, but the kernel implementation uses ms
> > > precision (please see tcp_skb_timestamp_ts()). That function makes
> > > those two timestamps the same.
> > >
> > > This case happens as expected, so the second connection should be
> > > established. My confusion just popped out of my mind: what rules
> > > should we follow to stop the second flow?
> >
> > Note that linux TCP stack can use usec resolution for TCP TS values.
> >
> > You might adopt it, and no longer care about this ms granularity.
>
> Right, I noticed this feature. I wonder if we can change the check in
> tcp_timewait_state_process() like this patch if it has no side
> effects? I'm worried that some programs don't use this feature. It's
> the reason why I try to propose this patch to you.
Breaking RFC ? I do not think so.
Instead, use a usec clock and voila, the problem is solved.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 9:50 ` Eric Dumazet
@ 2024-11-05 9:56 ` Jason Xing
2024-11-05 11:48 ` Jason Xing
1 sibling, 0 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-05 9:56 UTC (permalink / raw)
To: Eric Dumazet
Cc: Kuniyuki Iwashima, davem, dsahern, horms, kernelxing, kuba,
netdev, pabeni
On Tue, Nov 5, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > > Date: Tue, 5 Nov 2024 10:55:11 +0800
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > We found there are rare chances that some RST packets appear during
> > > > > > the shakehands because the timewait socket cannot accept the SYN and
> > > > >
> > > > > s/shakehands/handshake/
> > > > >
> > > > > same in the subject.
> > > > >
> > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > >
> > > > > > Here is how things happen in production:
> > > > > > Time Client(A) Server(B)
> > > > > > 0s SYN-->
> > > > > > ...
> > > > > > 132s <-- FIN
> > > > > > ...
> > > > > > 169s FIN-->
> > > > > > 169s <-- ACK
> > > > > > 169s SYN-->
> > > > > > 169s <-- ACK
> > > > > > 169s RST-->
> > > > > > As above picture shows, the two flows have a start time difference
> > > > > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > > that soon is reset by itself due to receiving a ACK.
> > > > > >
> > > > > > There are two key checks in tcp_timewait_state_process() when timewait
> > > > > > socket in B receives the SYN packet:
> > > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > >
> > > > > > Regarding the first rule, it fails as expected because in the first
> > > > > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > >
> > > > > > Then how about the second rule?
> > > > > > It fails again!
> > > > > > Let's take a look at how the tsval comes out:
> > > > > > __tcp_transmit_skb()
> > > > > > -> tcp_syn_options()
> > > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > > to care about it. If both operations (sending FIN and then starting
> > > > > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > > with millisecond precision.
> > > > > >
> > > > > > Based on the above analysis, I decided to make a small change to
> > > > > > the check in tcp_timewait_state_process() so that the second flow
> > > > > > would not fail.
> > > > > >
> > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > > ---
> > > > > > net/ipv4/tcp_minisocks.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > > > > --- a/net/ipv4/tcp_minisocks.c
> > > > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > > > > if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > > > > (tmp_opt.saw_tstamp &&
> > > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> > > > >
> > > > > I think this follows RFC 6191 and such a change requires a formal
> > > > > discussion at IETF.
> > > > >
> > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> > > > >
> > > > > ---8<---
> > > > > * If TCP Timestamps would be enabled for the new incarnation of
> > > > > the connection, and the timestamp contained in the incoming SYN
> > > > > segment is greater than the last timestamp seen on the previous
> > > >
> > > > The true thing is that the timestamp of the SYN packet is greater than
> > > > that of the last packet, but the kernel implementation uses ms
> > > > precision (please see tcp_skb_timestamp_ts()). That function makes
> > > > those two timestamps the same.
> > > >
> > > > This case happens as expected, so the second connection should be
> > > > established. My confusion just popped out of my mind: what rules
> > > > should we follow to stop the second flow?
> > >
> > > Note that linux TCP stack can use usec resolution for TCP TS values.
> > >
> > > You might adopt it, and no longer care about this ms granularity.
> >
> > Right, I noticed this feature. I wonder if we can change the check in
> > tcp_timewait_state_process() like this patch if it has no side
> > effects? I'm worried that some programs don't use this feature. It's
> > the reason why I try to propose this patch to you.
>
> Breaking RFC ? I do not think so.
Oh right, I just can't figure it out why since we've already lost the
fine-grained timestamp in each skb. I spent a few days investigating
the bad cases this patch may bring, but I failed.
> Instead, use a usec clock and voila, the problem is solved.
Sure, it can work :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 9:50 ` Eric Dumazet
2024-11-05 9:56 ` Jason Xing
@ 2024-11-05 11:48 ` Jason Xing
1 sibling, 0 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-05 11:48 UTC (permalink / raw)
To: Eric Dumazet, Martin KaFai Lau
Cc: Kuniyuki Iwashima, davem, dsahern, horms, kernelxing, kuba,
netdev, pabeni
Hello Eric, Martin
On Tue, Nov 5, 2024 at 5:50 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Nov 5, 2024 at 10:42 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Tue, Nov 5, 2024 at 5:35 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Nov 5, 2024 at 10:08 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 5, 2024 at 3:49 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > > Date: Tue, 5 Nov 2024 10:55:11 +0800
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > We found there are rare chances that some RST packets appear during
> > > > > > the shakehands because the timewait socket cannot accept the SYN and
> > > > >
> > > > > s/shakehands/handshake/
> > > > >
> > > > > same in the subject.
> > > > >
> > > > > > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > >
> > > > > > Here is how things happen in production:
> > > > > > Time Client(A) Server(B)
> > > > > > 0s SYN-->
> > > > > > ...
> > > > > > 132s <-- FIN
> > > > > > ...
> > > > > > 169s FIN-->
> > > > > > 169s <-- ACK
> > > > > > 169s SYN-->
> > > > > > 169s <-- ACK
> > > > > > 169s RST-->
> > > > > > As above picture shows, the two flows have a start time difference
> > > > > > of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > > TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > > that soon is reset by itself due to receiving a ACK.
> > > > > >
> > > > > > There are two key checks in tcp_timewait_state_process() when timewait
> > > > > > socket in B receives the SYN packet:
> > > > > > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > >
> > > > > > Regarding the first rule, it fails as expected because in the first
> > > > > > connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > > passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > >
> > > > > > Then how about the second rule?
> > > > > > It fails again!
> > > > > > Let's take a look at how the tsval comes out:
> > > > > > __tcp_transmit_skb()
> > > > > > -> tcp_syn_options()
> > > > > > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > > other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > > to care about it. If both operations (sending FIN and then starting
> > > > > > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > > with millisecond precision.
> > > > > >
> > > > > > Based on the above analysis, I decided to make a small change to
> > > > > > the check in tcp_timewait_state_process() so that the second flow
> > > > > > would not fail.
> > > > > >
> > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > > ---
> > > > > > net/ipv4/tcp_minisocks.c | 2 +-
> > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > > > > > index bb1fe1ba867a..2b29d1bf5ca0 100644
> > > > > > --- a/net/ipv4/tcp_minisocks.c
> > > > > > +++ b/net/ipv4/tcp_minisocks.c
> > > > > > @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> > > > > > if (th->syn && !th->rst && !th->ack && !paws_reject &&
> > > > > > (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> > > > > > (tmp_opt.saw_tstamp &&
> > > > > > - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> > > > >
> > > > > I think this follows RFC 6191 and such a change requires a formal
> > > > > discussion at IETF.
> > > > >
> > > > > https://datatracker.ietf.org/doc/html/rfc6191#section-2
> > > > >
> > > > > ---8<---
> > > > > * If TCP Timestamps would be enabled for the new incarnation of
> > > > > the connection, and the timestamp contained in the incoming SYN
> > > > > segment is greater than the last timestamp seen on the previous
> > > >
> > > > The true thing is that the timestamp of the SYN packet is greater than
> > > > that of the last packet, but the kernel implementation uses ms
> > > > precision (please see tcp_skb_timestamp_ts()). That function makes
> > > > those two timestamps the same.
> > > >
> > > > This case happens as expected, so the second connection should be
> > > > established. My confusion just popped out of my mind: what rules
> > > > should we follow to stop the second flow?
> > >
> > > Note that linux TCP stack can use usec resolution for TCP TS values.
> > >
> > > You might adopt it, and no longer care about this ms granularity.
> >
> > Right, I noticed this feature. I wonder if we can change the check in
> > tcp_timewait_state_process() like this patch if it has no side
> > effects? I'm worried that some programs don't use this feature. It's
> > the reason why I try to propose this patch to you.
>
> Breaking RFC ? I do not think so.
>
> Instead, use a usec clock and voila, the problem is solved.
I wonder if it is necessary to use BPF to extend the usec clock. Since
I started investigating the BPF part, I found it can help us extend
many useful features in TCP to the most extent.
Using it with the BPF program can make the feature take effect and
widely used in future.
I'm asking because I somehow have a feeling that someday the majority
of traditional sockopts could be replaced by BPF. After all, requiring
application modification for those kernel features is a bit heavy. I'm
not sure if I'm right about it :S
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 2:55 [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process Jason Xing
2024-11-05 7:49 ` Kuniyuki Iwashima
@ 2024-11-07 3:16 ` Jason Xing
2024-11-07 4:15 ` Kuniyuki Iwashima
2024-11-07 7:51 ` Philo Lu
2 siblings, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-07 3:16 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, horms; +Cc: netdev, Jason Xing
Hello Eric,
On Tue, Nov 5, 2024 at 10:55 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> We found there are rare chances that some RST packets appear during
> the shakehands because the timewait socket cannot accept the SYN and
> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
>
> Here is how things happen in production:
> Time Client(A) Server(B)
> 0s SYN-->
> ...
> 132s <-- FIN
> ...
> 169s FIN-->
> 169s <-- ACK
> 169s SYN-->
> 169s <-- ACK
I noticed the above ACK doesn't adhere to RFC 6191. It says:
"If the previous incarnation of the connection used Timestamps, then:
if ...
...
* Otherwise, silently drop the incoming SYN segment, thus leaving
the previous incarnation of the connection in the TIME-WAIT
state.
"
But the timewait socket sends an ACK because of this code snippet:
tcp_timewait_state_process()
-> // the checks of SYN packet failed.
-> if (!th->rst) {
-> return TCP_TW_ACK; // this line can be traced back to 2005
I think the following patch follows the RFC:
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bb1fe1ba867a..cc22f0412f98 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -231,15 +231,17 @@ tcp_timewait_state_process(struct
inet_timewait_sock *tw, struct sk_buff *skb,
but not fatal yet.
*/
- if (th->syn && !th->rst && !th->ack && !paws_reject &&
- (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
- (tmp_opt.saw_tstamp &&
- (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
- u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
- if (isn == 0)
- isn++;
- *tw_isn = isn;
- return TCP_TW_SYN;
+ if (th->syn && !th->rst && !th->ack && !paws_reject) {
+ if (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
+ (tmp_opt.saw_tstamp &&
+ (s32)(READ_ONCE(tcptw->tw_ts_recent) -
tmp_opt.rcv_tsval) < 0)) {
+ u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
+ if (isn == 0)
+ isn++;
+ *tw_isn = isn;
+ return TCP_TW_SYN;
+ }
+ return TCP_TW_SUCCESS;
}
if (paws_reject)
Could you help me review this, Eric? Thanks in advance!
Thanks,
Jason
> 169s RST-->
> As above picture shows, the two flows have a start time difference
> of 169 seconds. B starts to send FIN so it will finally enter into
> TIMEWAIT state. Nearly at the same time A launches a new connection
> that soon is reset by itself due to receiving a ACK.
>
> There are two key checks in tcp_timewait_state_process() when timewait
> socket in B receives the SYN packet:
> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
>
> Regarding the first rule, it fails as expected because in the first
> connection the seq of SYN sent from A is 1892994276, then 169s have
> passed, the second SYN has 239034613 (caused by overflow of s32).
>
> Then how about the second rule?
> It fails again!
> Let's take a look at how the tsval comes out:
> __tcp_transmit_skb()
> -> tcp_syn_options()
> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> other is tp->tsoffset. The latter value is fixed, so we don't need
> to care about it. If both operations (sending FIN and then starting
> sending SYN) from A happen in 1ms, then the tsval would be the same.
> It can be clearly seen in the tcpdump log. Notice that the tsval is
> with millisecond precision.
>
> Based on the above analysis, I decided to make a small change to
> the check in tcp_timewait_state_process() so that the second flow
> would not fail.
>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/ipv4/tcp_minisocks.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index bb1fe1ba867a..2b29d1bf5ca0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -234,7 +234,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
> if (th->syn && !th->rst && !th->ack && !paws_reject &&
> (after(TCP_SKB_CB(skb)->seq, rcv_nxt) ||
> (tmp_opt.saw_tstamp &&
> - (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0))) {
> + (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) <= 0))) {
> u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
> if (isn == 0)
> isn++;
> --
> 2.37.3
>
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 3:16 ` Jason Xing
@ 2024-11-07 4:15 ` Kuniyuki Iwashima
2024-11-07 4:21 ` Kuniyuki Iwashima
2024-11-07 5:23 ` Jason Xing
0 siblings, 2 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-07 4:15 UTC (permalink / raw)
To: kerneljasonxing
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, netdev, pabeni,
kuniyu
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Thu, 7 Nov 2024 11:16:04 +0800
> > Here is how things happen in production:
> > Time Client(A) Server(B)
> > 0s SYN-->
> > ...
> > 132s <-- FIN
> > ...
> > 169s FIN-->
> > 169s <-- ACK
> > 169s SYN-->
> > 169s <-- ACK
>
> I noticed the above ACK doesn't adhere to RFC 6191. It says:
> "If the previous incarnation of the connection used Timestamps, then:
> if ...
> ...
> * Otherwise, silently drop the incoming SYN segment, thus leaving
> the previous incarnation of the connection in the TIME-WAIT
> state.
> "
> But the timewait socket sends an ACK because of this code snippet:
> tcp_timewait_state_process()
> -> // the checks of SYN packet failed.
> -> if (!th->rst) {
> -> return TCP_TW_ACK; // this line can be traced back to 2005
This is a challenge ACK following RFC 5961.
If SYN is returned here, the client may lose the chance to RST the
previous connection in TIME_WAIT.
https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1
---8<---
- TIME-WAIT STATE
o If the SYN bit is set in these synchronized states, it may
be either a legitimate new connection attempt (e.g., in the
case of TIME-WAIT), an error where the connection should be
reset, or the result of an attack attempt, as described in
RFC 5961 [9]. For the TIME-WAIT state, new connections can
be accepted if the Timestamp Option is used and meets
expectations (per [40]). For all other cases, RFC 5961
provides a mitigation with applicability to some situations,
though there are also alternatives that offer cryptographic
protection (see Section 7). RFC 5961 recommends that in
these synchronized states, if the SYN bit is set,
irrespective of the sequence number, TCP endpoints MUST send
a "challenge ACK" to the remote peer:
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
---8<---
https://datatracker.ietf.org/doc/html/rfc5961#section-4
---8<---
1) If the SYN bit is set, irrespective of the sequence number, TCP
MUST send an ACK (also referred to as challenge ACK) to the remote
peer:
<SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
After sending the acknowledgment, TCP MUST drop the unacceptable
segment and stop processing further.
---8<---
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 4:15 ` Kuniyuki Iwashima
@ 2024-11-07 4:21 ` Kuniyuki Iwashima
2024-11-07 5:23 ` Jason Xing
1 sibling, 0 replies; 28+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-07 4:21 UTC (permalink / raw)
To: kuniyu
Cc: davem, dsahern, edumazet, horms, kerneljasonxing, kernelxing,
kuba, netdev, pabeni
From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Wed, 6 Nov 2024 20:15:06 -0800
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Thu, 7 Nov 2024 11:16:04 +0800
> > > Here is how things happen in production:
> > > Time Client(A) Server(B)
> > > 0s SYN-->
> > > ...
> > > 132s <-- FIN
> > > ...
> > > 169s FIN-->
> > > 169s <-- ACK
> > > 169s SYN-->
> > > 169s <-- ACK
> >
> > I noticed the above ACK doesn't adhere to RFC 6191. It says:
> > "If the previous incarnation of the connection used Timestamps, then:
> > if ...
> > ...
> > * Otherwise, silently drop the incoming SYN segment, thus leaving
> > the previous incarnation of the connection in the TIME-WAIT
> > state.
> > "
> > But the timewait socket sends an ACK because of this code snippet:
> > tcp_timewait_state_process()
> > -> // the checks of SYN packet failed.
> > -> if (!th->rst) {
> > -> return TCP_TW_ACK; // this line can be traced back to 2005
>
> This is a challenge ACK following RFC 5961.
>
> If SYN is returned here, the client may lose the chance to RST the
> previous connection in TIME_WAIT.
s/returned/silently dropped/ :/
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 4:15 ` Kuniyuki Iwashima
2024-11-07 4:21 ` Kuniyuki Iwashima
@ 2024-11-07 5:23 ` Jason Xing
2024-11-07 5:43 ` Kuniyuki Iwashima
1 sibling, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-07 5:23 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, netdev, pabeni
On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Thu, 7 Nov 2024 11:16:04 +0800
> > > Here is how things happen in production:
> > > Time Client(A) Server(B)
> > > 0s SYN-->
> > > ...
> > > 132s <-- FIN
> > > ...
> > > 169s FIN-->
> > > 169s <-- ACK
> > > 169s SYN-->
> > > 169s <-- ACK
> >
> > I noticed the above ACK doesn't adhere to RFC 6191. It says:
> > "If the previous incarnation of the connection used Timestamps, then:
> > if ...
> > ...
> > * Otherwise, silently drop the incoming SYN segment, thus leaving
> > the previous incarnation of the connection in the TIME-WAIT
> > state.
> > "
> > But the timewait socket sends an ACK because of this code snippet:
> > tcp_timewait_state_process()
> > -> // the checks of SYN packet failed.
> > -> if (!th->rst) {
> > -> return TCP_TW_ACK; // this line can be traced back to 2005
>
> This is a challenge ACK following RFC 5961.
Please note the idea of challenge ack was proposed in 2010. But this
code snippet has already existed before 2005. If it is a challenge
ack, then at least we need to count it (by using NET_INC_STATS(net,
LINUX_MIB_TCPCHALLENGEACK);).
>
> If SYN is returned here, the client may lose the chance to RST the
> previous connection in TIME_WAIT.
>
> https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1
> ---8<---
> - TIME-WAIT STATE
>
> o If the SYN bit is set in these synchronized states, it may
> be either a legitimate new connection attempt (e.g., in the
> case of TIME-WAIT), an error where the connection should be
> reset, or the result of an attack attempt, as described in
> RFC 5961 [9]. For the TIME-WAIT state, new connections can
> be accepted if the Timestamp Option is used and meets
> expectations (per [40]). For all other cases, RFC 5961
> provides a mitigation with applicability to some situations,
> though there are also alternatives that offer cryptographic
> protection (see Section 7). RFC 5961 recommends that in
> these synchronized states, if the SYN bit is set,
> irrespective of the sequence number, TCP endpoints MUST send
> a "challenge ACK" to the remote peer:
>
> <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> ---8<---
>
> https://datatracker.ietf.org/doc/html/rfc5961#section-4
> ---8<---
> 1) If the SYN bit is set, irrespective of the sequence number, TCP
> MUST send an ACK (also referred to as challenge ACK) to the remote
> peer:
>
> <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
>
> After sending the acknowledgment, TCP MUST drop the unacceptable
> segment and stop processing further.
> ---8<---
The RFC 5961 4.2 was implemented in tcp_validate_incoming():
/* step 4: Check for a SYN
* RFC 5961 4.2 : Send a challenge ack
*/
if (th->syn) {
if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack &&
TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
goto pass;
syn_challenge:
if (syn_inerr)
TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPSYNCHALLENGE);
tcp_send_challenge_ack(sk);
SKB_DR_SET(reason, TCP_INVALID_SYN);
goto discard;
}
Also, this quotation you mentioned obviously doesn't match the kernel
implementation:
"If the SYN bit is set, irrespective of the sequence number, TCP MUST
send an ACK"
The tcp_timewait_state_process() does care about the seq number, or
else timewait socket would refuse every SYN packet.
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 5:23 ` Jason Xing
@ 2024-11-07 5:43 ` Kuniyuki Iwashima
2024-11-07 6:51 ` Jason Xing
0 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-07 5:43 UTC (permalink / raw)
To: kerneljasonxing
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, kuniyu, netdev,
pabeni
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Thu, 7 Nov 2024 13:23:50 +0800
> On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Jason Xing <kerneljasonxing@gmail.com>
> > Date: Thu, 7 Nov 2024 11:16:04 +0800
> > > > Here is how things happen in production:
> > > > Time Client(A) Server(B)
> > > > 0s SYN-->
> > > > ...
> > > > 132s <-- FIN
> > > > ...
> > > > 169s FIN-->
> > > > 169s <-- ACK
> > > > 169s SYN-->
> > > > 169s <-- ACK
> > >
> > > I noticed the above ACK doesn't adhere to RFC 6191. It says:
> > > "If the previous incarnation of the connection used Timestamps, then:
> > > if ...
> > > ...
> > > * Otherwise, silently drop the incoming SYN segment, thus leaving
> > > the previous incarnation of the connection in the TIME-WAIT
> > > state.
> > > "
> > > But the timewait socket sends an ACK because of this code snippet:
> > > tcp_timewait_state_process()
> > > -> // the checks of SYN packet failed.
> > > -> if (!th->rst) {
> > > -> return TCP_TW_ACK; // this line can be traced back to 2005
> >
> > This is a challenge ACK following RFC 5961.
>
> Please note the idea of challenge ack was proposed in 2010. But this
> code snippet has already existed before 2005. If it is a challenge
> ack, then at least we need to count it (by using NET_INC_STATS(net,
> LINUX_MIB_TCPCHALLENGEACK);).
The word was not accurate, the behaviour is compliant with RFC 5961.
RFC is often formalised based on real implementations.
Incrementing the count makes sense to me.
>
> >
> > If SYN is returned here, the client may lose the chance to RST the
> > previous connection in TIME_WAIT.
> >
> > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1
> > ---8<---
> > - TIME-WAIT STATE
> >
> > o If the SYN bit is set in these synchronized states, it may
> > be either a legitimate new connection attempt (e.g., in the
> > case of TIME-WAIT), an error where the connection should be
> > reset, or the result of an attack attempt, as described in
> > RFC 5961 [9]. For the TIME-WAIT state, new connections can
> > be accepted if the Timestamp Option is used and meets
> > expectations (per [40]). For all other cases, RFC 5961
> > provides a mitigation with applicability to some situations,
> > though there are also alternatives that offer cryptographic
> > protection (see Section 7). RFC 5961 recommends that in
> > these synchronized states, if the SYN bit is set,
> > irrespective of the sequence number, TCP endpoints MUST send
> > a "challenge ACK" to the remote peer:
> >
> > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > ---8<---
> >
> > https://datatracker.ietf.org/doc/html/rfc5961#section-4
> > ---8<---
> > 1) If the SYN bit is set, irrespective of the sequence number, TCP
> > MUST send an ACK (also referred to as challenge ACK) to the remote
> > peer:
> >
> > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> >
> > After sending the acknowledgment, TCP MUST drop the unacceptable
> > segment and stop processing further.
> > ---8<---
>
> The RFC 5961 4.2 was implemented in tcp_validate_incoming():
> /* step 4: Check for a SYN
> * RFC 5961 4.2 : Send a challenge ack
> */
> if (th->syn) {
> if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack &&
> TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> goto pass;
> syn_challenge:
> if (syn_inerr)
> TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> NET_INC_STATS(sock_net(sk),
> LINUX_MIB_TCPSYNCHALLENGE);
> tcp_send_challenge_ack(sk);
> SKB_DR_SET(reason, TCP_INVALID_SYN);
> goto discard;
> }
>
> Also, this quotation you mentioned obviously doesn't match the kernel
> implementation:
> "If the SYN bit is set, irrespective of the sequence number, TCP MUST
> send an ACK"
> The tcp_timewait_state_process() does care about the seq number, or
> else timewait socket would refuse every SYN packet.
That's why I pasted RFC 9293 first that clearly states that we
should check seq number and then return ACK for all other cases.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 5:43 ` Kuniyuki Iwashima
@ 2024-11-07 6:51 ` Jason Xing
2024-11-07 7:11 ` Kuniyuki Iwashima
0 siblings, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-07 6:51 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, netdev, pabeni
On Thu, Nov 7, 2024 at 1:43 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Thu, 7 Nov 2024 13:23:50 +0800
> > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > Date: Thu, 7 Nov 2024 11:16:04 +0800
> > > > > Here is how things happen in production:
> > > > > Time Client(A) Server(B)
> > > > > 0s SYN-->
> > > > > ...
> > > > > 132s <-- FIN
> > > > > ...
> > > > > 169s FIN-->
> > > > > 169s <-- ACK
> > > > > 169s SYN-->
> > > > > 169s <-- ACK
> > > >
> > > > I noticed the above ACK doesn't adhere to RFC 6191. It says:
> > > > "If the previous incarnation of the connection used Timestamps, then:
> > > > if ...
> > > > ...
> > > > * Otherwise, silently drop the incoming SYN segment, thus leaving
> > > > the previous incarnation of the connection in the TIME-WAIT
> > > > state.
> > > > "
> > > > But the timewait socket sends an ACK because of this code snippet:
> > > > tcp_timewait_state_process()
> > > > -> // the checks of SYN packet failed.
> > > > -> if (!th->rst) {
> > > > -> return TCP_TW_ACK; // this line can be traced back to 2005
> > >
> > > This is a challenge ACK following RFC 5961.
> >
> > Please note the idea of challenge ack was proposed in 2010. But this
> > code snippet has already existed before 2005. If it is a challenge
> > ack, then at least we need to count it (by using NET_INC_STATS(net,
> > LINUX_MIB_TCPCHALLENGEACK);).
>
> The word was not accurate, the behaviour is compliant with RFC 5961.
> RFC is often formalised based on real implementations.
>
> Incrementing the count makes sense to me.
>
> >
> > >
> > > If SYN is returned here, the client may lose the chance to RST the
> > > previous connection in TIME_WAIT.
> > >
> > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1
> > > ---8<---
> > > - TIME-WAIT STATE
> > >
> > > o If the SYN bit is set in these synchronized states, it may
> > > be either a legitimate new connection attempt (e.g., in the
> > > case of TIME-WAIT), an error where the connection should be
> > > reset, or the result of an attack attempt, as described in
> > > RFC 5961 [9]. For the TIME-WAIT state, new connections can
> > > be accepted if the Timestamp Option is used and meets
> > > expectations (per [40]). For all other cases, RFC 5961
> > > provides a mitigation with applicability to some situations,
> > > though there are also alternatives that offer cryptographic
> > > protection (see Section 7). RFC 5961 recommends that in
> > > these synchronized states, if the SYN bit is set,
> > > irrespective of the sequence number, TCP endpoints MUST send
> > > a "challenge ACK" to the remote peer:
> > >
> > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > > ---8<---
> > >
> > > https://datatracker.ietf.org/doc/html/rfc5961#section-4
> > > ---8<---
> > > 1) If the SYN bit is set, irrespective of the sequence number, TCP
> > > MUST send an ACK (also referred to as challenge ACK) to the remote
> > > peer:
> > >
> > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > >
> > > After sending the acknowledgment, TCP MUST drop the unacceptable
> > > segment and stop processing further.
> > > ---8<---
> >
> > The RFC 5961 4.2 was implemented in tcp_validate_incoming():
> > /* step 4: Check for a SYN
> > * RFC 5961 4.2 : Send a challenge ack
> > */
> > if (th->syn) {
> > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack &&
> > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> > goto pass;
> > syn_challenge:
> > if (syn_inerr)
> > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > NET_INC_STATS(sock_net(sk),
> > LINUX_MIB_TCPSYNCHALLENGE);
> > tcp_send_challenge_ack(sk);
> > SKB_DR_SET(reason, TCP_INVALID_SYN);
> > goto discard;
> > }
> >
> > Also, this quotation you mentioned obviously doesn't match the kernel
> > implementation:
> > "If the SYN bit is set, irrespective of the sequence number, TCP MUST
> > send an ACK"
> > The tcp_timewait_state_process() does care about the seq number, or
> > else timewait socket would refuse every SYN packet.
>
> That's why I pasted RFC 9293 first that clearly states that we
> should check seq number and then return ACK for all other cases.
I don't think so.
RFC 9293 only states that RFC 5691 provides an approach that mitigates
the risk by rejecting all the SYN packets if the socket stays in
synchronized state. It's "For all other cases" in RFC 9293.
Please loot at "irrespective of the sequence number" in RFC 5691 4.2
[1]. It means no matter what the seq is we MUST send back an ACK
instead of establishing a new connection.
Actually the tcp_timewait_state_process() checks the seq or timestamp
in the SYN packet.
[1]: https://datatracker.ietf.org/doc/html/rfc5961#section-4.2
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 6:51 ` Jason Xing
@ 2024-11-07 7:11 ` Kuniyuki Iwashima
2024-11-07 7:44 ` Jason Xing
0 siblings, 1 reply; 28+ messages in thread
From: Kuniyuki Iwashima @ 2024-11-07 7:11 UTC (permalink / raw)
To: kerneljasonxing
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, kuniyu, netdev,
pabeni
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Thu, 7 Nov 2024 14:51:35 +0800
> On Thu, Nov 7, 2024 at 1:43 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Jason Xing <kerneljasonxing@gmail.com>
> > Date: Thu, 7 Nov 2024 13:23:50 +0800
> > > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > Date: Thu, 7 Nov 2024 11:16:04 +0800
> > > > > > Here is how things happen in production:
> > > > > > Time Client(A) Server(B)
> > > > > > 0s SYN-->
> > > > > > ...
> > > > > > 132s <-- FIN
> > > > > > ...
> > > > > > 169s FIN-->
> > > > > > 169s <-- ACK
> > > > > > 169s SYN-->
> > > > > > 169s <-- ACK
> > > > >
> > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says:
> > > > > "If the previous incarnation of the connection used Timestamps, then:
> > > > > if ...
> > > > > ...
> > > > > * Otherwise, silently drop the incoming SYN segment, thus leaving
> > > > > the previous incarnation of the connection in the TIME-WAIT
> > > > > state.
> > > > > "
> > > > > But the timewait socket sends an ACK because of this code snippet:
> > > > > tcp_timewait_state_process()
> > > > > -> // the checks of SYN packet failed.
> > > > > -> if (!th->rst) {
> > > > > -> return TCP_TW_ACK; // this line can be traced back to 2005
> > > >
> > > > This is a challenge ACK following RFC 5961.
> > >
> > > Please note the idea of challenge ack was proposed in 2010. But this
> > > code snippet has already existed before 2005. If it is a challenge
> > > ack, then at least we need to count it (by using NET_INC_STATS(net,
> > > LINUX_MIB_TCPCHALLENGEACK);).
> >
> > The word was not accurate, the behaviour is compliant with RFC 5961.
> > RFC is often formalised based on real implementations.
> >
> > Incrementing the count makes sense to me.
> >
> > >
> > > >
> > > > If SYN is returned here, the client may lose the chance to RST the
> > > > previous connection in TIME_WAIT.
> > > >
> > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1
> > > > ---8<---
> > > > - TIME-WAIT STATE
> > > >
> > > > o If the SYN bit is set in these synchronized states, it may
> > > > be either a legitimate new connection attempt (e.g., in the
> > > > case of TIME-WAIT), an error where the connection should be
> > > > reset, or the result of an attack attempt, as described in
> > > > RFC 5961 [9]. For the TIME-WAIT state, new connections can
> > > > be accepted if the Timestamp Option is used and meets
> > > > expectations (per [40]). For all other cases, RFC 5961
> > > > provides a mitigation with applicability to some situations,
> > > > though there are also alternatives that offer cryptographic
> > > > protection (see Section 7). RFC 5961 recommends that in
> > > > these synchronized states, if the SYN bit is set,
> > > > irrespective of the sequence number, TCP endpoints MUST send
> > > > a "challenge ACK" to the remote peer:
> > > >
> > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > > > ---8<---
> > > >
> > > > https://datatracker.ietf.org/doc/html/rfc5961#section-4
> > > > ---8<---
> > > > 1) If the SYN bit is set, irrespective of the sequence number, TCP
> > > > MUST send an ACK (also referred to as challenge ACK) to the remote
> > > > peer:
> > > >
> > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > > >
> > > > After sending the acknowledgment, TCP MUST drop the unacceptable
> > > > segment and stop processing further.
> > > > ---8<---
> > >
> > > The RFC 5961 4.2 was implemented in tcp_validate_incoming():
> > > /* step 4: Check for a SYN
> > > * RFC 5961 4.2 : Send a challenge ack
> > > */
> > > if (th->syn) {
> > > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack &&
> > > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> > > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> > > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> > > goto pass;
> > > syn_challenge:
> > > if (syn_inerr)
> > > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > > NET_INC_STATS(sock_net(sk),
> > > LINUX_MIB_TCPSYNCHALLENGE);
> > > tcp_send_challenge_ack(sk);
> > > SKB_DR_SET(reason, TCP_INVALID_SYN);
> > > goto discard;
> > > }
> > >
> > > Also, this quotation you mentioned obviously doesn't match the kernel
> > > implementation:
> > > "If the SYN bit is set, irrespective of the sequence number, TCP MUST
> > > send an ACK"
> > > The tcp_timewait_state_process() does care about the seq number, or
> > > else timewait socket would refuse every SYN packet.
> >
> > That's why I pasted RFC 9293 first that clearly states that we
> > should check seq number and then return ACK for all other cases.
>
> I don't think so.
>
> RFC 9293 only states that RFC 5691 provides an approach that mitigates
> the risk by rejecting all the SYN packets if the socket stays in
> synchronized state. It's "For all other cases" in RFC 9293.
RFC 9293 states which RFC to prioritise. You will find the
link [40] is RFC 6191.
---8<---
For the TIME-WAIT state, new connections can
be accepted if the Timestamp Option is used and meets
expectations (per [40]). For all other cases, RFC 5961
...
---8<---
> Please loot at "irrespective of the sequence number" in RFC 5691 4.2
> [1]. It means no matter what the seq is we MUST send back an ACK
> instead of establishing a new connection.
RFC 9293 mentions accepatble cases first, so this is only applied
to "all other cases"
> Actually the tcp_timewait_state_process() checks the seq or timestamp
> in the SYN packet.
and this part takes precedence than "all other cases".
Also, you missed that the pasted part is the 4th step of incoming
segment processing.
https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4
---8<---
First, check sequence number: ...
Second, check the RST bit: ...
Third, check security: ...
Fourth, check the SYN bit:
...
TIME-WAIT STATE
If the SYN bit is set in these synchronized states...
---8<---
So, RFC 9293 says "check seq number, RST, security, then
if the connection is still accepatable for TIME_WAIT based on
RFC 6191, accept it, otherwise, return ACK based on RFC 5691".
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 7:11 ` Kuniyuki Iwashima
@ 2024-11-07 7:44 ` Jason Xing
0 siblings, 0 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-07 7:44 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: davem, dsahern, edumazet, horms, kernelxing, kuba, netdev, pabeni
On Thu, Nov 7, 2024 at 3:11 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Thu, 7 Nov 2024 14:51:35 +0800
> > On Thu, Nov 7, 2024 at 1:43 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > Date: Thu, 7 Nov 2024 13:23:50 +0800
> > > > On Thu, Nov 7, 2024 at 12:15 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From: Jason Xing <kerneljasonxing@gmail.com>
> > > > > Date: Thu, 7 Nov 2024 11:16:04 +0800
> > > > > > > Here is how things happen in production:
> > > > > > > Time Client(A) Server(B)
> > > > > > > 0s SYN-->
> > > > > > > ...
> > > > > > > 132s <-- FIN
> > > > > > > ...
> > > > > > > 169s FIN-->
> > > > > > > 169s <-- ACK
> > > > > > > 169s SYN-->
> > > > > > > 169s <-- ACK
> > > > > >
> > > > > > I noticed the above ACK doesn't adhere to RFC 6191. It says:
> > > > > > "If the previous incarnation of the connection used Timestamps, then:
> > > > > > if ...
> > > > > > ...
> > > > > > * Otherwise, silently drop the incoming SYN segment, thus leaving
> > > > > > the previous incarnation of the connection in the TIME-WAIT
> > > > > > state.
> > > > > > "
> > > > > > But the timewait socket sends an ACK because of this code snippet:
> > > > > > tcp_timewait_state_process()
> > > > > > -> // the checks of SYN packet failed.
> > > > > > -> if (!th->rst) {
> > > > > > -> return TCP_TW_ACK; // this line can be traced back to 2005
> > > > >
> > > > > This is a challenge ACK following RFC 5961.
> > > >
> > > > Please note the idea of challenge ack was proposed in 2010. But this
> > > > code snippet has already existed before 2005. If it is a challenge
> > > > ack, then at least we need to count it (by using NET_INC_STATS(net,
> > > > LINUX_MIB_TCPCHALLENGEACK);).
> > >
> > > The word was not accurate, the behaviour is compliant with RFC 5961.
> > > RFC is often formalised based on real implementations.
> > >
> > > Incrementing the count makes sense to me.
> > >
> > > >
> > > > >
> > > > > If SYN is returned here, the client may lose the chance to RST the
> > > > > previous connection in TIME_WAIT.
> > > > >
> > > > > https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4-2.4.1
> > > > > ---8<---
> > > > > - TIME-WAIT STATE
> > > > >
> > > > > o If the SYN bit is set in these synchronized states, it may
> > > > > be either a legitimate new connection attempt (e.g., in the
> > > > > case of TIME-WAIT), an error where the connection should be
> > > > > reset, or the result of an attack attempt, as described in
> > > > > RFC 5961 [9]. For the TIME-WAIT state, new connections can
> > > > > be accepted if the Timestamp Option is used and meets
> > > > > expectations (per [40]). For all other cases, RFC 5961
> > > > > provides a mitigation with applicability to some situations,
> > > > > though there are also alternatives that offer cryptographic
> > > > > protection (see Section 7). RFC 5961 recommends that in
> > > > > these synchronized states, if the SYN bit is set,
> > > > > irrespective of the sequence number, TCP endpoints MUST send
> > > > > a "challenge ACK" to the remote peer:
> > > > >
> > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > > > > ---8<---
> > > > >
> > > > > https://datatracker.ietf.org/doc/html/rfc5961#section-4
> > > > > ---8<---
> > > > > 1) If the SYN bit is set, irrespective of the sequence number, TCP
> > > > > MUST send an ACK (also referred to as challenge ACK) to the remote
> > > > > peer:
> > > > >
> > > > > <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
> > > > >
> > > > > After sending the acknowledgment, TCP MUST drop the unacceptable
> > > > > segment and stop processing further.
> > > > > ---8<---
> > > >
> > > > The RFC 5961 4.2 was implemented in tcp_validate_incoming():
> > > > /* step 4: Check for a SYN
> > > > * RFC 5961 4.2 : Send a challenge ack
> > > > */
> > > > if (th->syn) {
> > > > if (sk->sk_state == TCP_SYN_RECV && sk->sk_socket && th->ack &&
> > > > TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> > > > TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> > > > TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> > > > goto pass;
> > > > syn_challenge:
> > > > if (syn_inerr)
> > > > TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > > > NET_INC_STATS(sock_net(sk),
> > > > LINUX_MIB_TCPSYNCHALLENGE);
> > > > tcp_send_challenge_ack(sk);
> > > > SKB_DR_SET(reason, TCP_INVALID_SYN);
> > > > goto discard;
> > > > }
> > > >
> > > > Also, this quotation you mentioned obviously doesn't match the kernel
> > > > implementation:
> > > > "If the SYN bit is set, irrespective of the sequence number, TCP MUST
> > > > send an ACK"
> > > > The tcp_timewait_state_process() does care about the seq number, or
> > > > else timewait socket would refuse every SYN packet.
> > >
> > > That's why I pasted RFC 9293 first that clearly states that we
> > > should check seq number and then return ACK for all other cases.
> >
> > I don't think so.
> >
> > RFC 9293 only states that RFC 5691 provides an approach that mitigates
> > the risk by rejecting all the SYN packets if the socket stays in
> > synchronized state. It's "For all other cases" in RFC 9293.
>
> RFC 9293 states which RFC to prioritise. You will find the
> link [40] is RFC 6191.
>
> ---8<---
> For the TIME-WAIT state, new connections can
> be accepted if the Timestamp Option is used and meets
> expectations (per [40]). For all other cases, RFC 5961
> ...
> ---8<---
>
> > Please loot at "irrespective of the sequence number" in RFC 5691 4.2
> > [1]. It means no matter what the seq is we MUST send back an ACK
> > instead of establishing a new connection.
>
> RFC 9293 mentions accepatble cases first, so this is only applied
> to "all other cases"
>
>
> > Actually the tcp_timewait_state_process() checks the seq or timestamp
> > in the SYN packet.
>
> and this part takes precedence than "all other cases".
>
> Also, you missed that the pasted part is the 4th step of incoming
> segment processing.
>
> https://www.rfc-editor.org/rfc/rfc9293.html#section-3.10.7.4
> ---8<---
> First, check sequence number: ...
> Second, check the RST bit: ...
> Third, check security: ...
> Fourth, check the SYN bit:
> ...
> TIME-WAIT STATE
> If the SYN bit is set in these synchronized states...
> ---8<---
>
> So, RFC 9293 says "check seq number, RST, security, then
> if the connection is still accepatable for TIME_WAIT based on
> RFC 6191, accept it, otherwise, return ACK based on RFC 5691".
I see what you mean here. If that is so, tcp_timewait_state_process()
has already implemented the challenge ack even before the challenge
ack showed up in 2010.
Interesting. If this part refers to RFC 5691, then we need to copy
part of tcp_send_challenge_ack() in this case to send the challenge
ack, like testing sysctl_tcp_challenge_ack_limit, etc.
Thanks,
Jason
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-05 2:55 [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process Jason Xing
2024-11-05 7:49 ` Kuniyuki Iwashima
2024-11-07 3:16 ` Jason Xing
@ 2024-11-07 7:51 ` Philo Lu
2024-11-07 8:01 ` Jason Xing
2 siblings, 1 reply; 28+ messages in thread
From: Philo Lu @ 2024-11-07 7:51 UTC (permalink / raw)
To: Jason Xing, davem, edumazet, kuba, pabeni, dsahern, horms
Cc: netdev, Jason Xing
Hi Jason,
On 2024/11/5 10:55, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> We found there are rare chances that some RST packets appear during
> the shakehands because the timewait socket cannot accept the SYN and
> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
>
> Here is how things happen in production:
> Time Client(A) Server(B)
> 0s SYN-->
> ...
> 132s <-- FIN
> ...
> 169s FIN-->
> 169s <-- ACK
> 169s SYN-->
> 169s <-- ACK
> 169s RST-->
> As above picture shows, the two flows have a start time difference
> of 169 seconds. B starts to send FIN so it will finally enter into
> TIMEWAIT state. Nearly at the same time A launches a new connection
> that soon is reset by itself due to receiving a ACK.
>
> There are two key checks in tcp_timewait_state_process() when timewait
> socket in B receives the SYN packet:
> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
>
> Regarding the first rule, it fails as expected because in the first
> connection the seq of SYN sent from A is 1892994276, then 169s have
> passed, the second SYN has 239034613 (caused by overflow of s32).
>
> Then how about the second rule?
> It fails again!
> Let's take a look at how the tsval comes out:
> __tcp_transmit_skb()
> -> tcp_syn_options()
> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> other is tp->tsoffset. The latter value is fixed, so we don't need
> to care about it. If both operations (sending FIN and then starting
> sending SYN) from A happen in 1ms, then the tsval would be the same.
> It can be clearly seen in the tcpdump log. Notice that the tsval is
> with millisecond precision.
>
> Based on the above analysis, I decided to make a small change to
> the check in tcp_timewait_state_process() so that the second flow
> would not fail.
>
I wonder what a bad result the RST causes. As far as I know, the client
will not close the connect and return. Instead, it re-sends an SYN in
TCP_TIMEOUT_MIN(2) jiffies (implemented in
tcp_rcv_synsent_state_process). So the second connection could still be
established successfully, at the cost of a bit more delay. Like:
Time Client(A) Server(B)
0s SYN-->
...
132s <-- FIN
...
169s FIN-->
169s <-- ACK
169s SYN-->
169s <-- ACK
169s RST-->
~2jiffies SYN-->
<-- SYN,ACK
Thanks.
--
Philo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 7:51 ` Philo Lu
@ 2024-11-07 8:01 ` Jason Xing
2024-11-07 8:22 ` Philo Lu
0 siblings, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-07 8:01 UTC (permalink / raw)
To: Philo Lu
Cc: davem, edumazet, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
>
> Hi Jason,
>
> On 2024/11/5 10:55, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > We found there are rare chances that some RST packets appear during
> > the shakehands because the timewait socket cannot accept the SYN and
> > doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> >
> > Here is how things happen in production:
> > Time Client(A) Server(B)
> > 0s SYN-->
> > ...
> > 132s <-- FIN
> > ...
> > 169s FIN-->
> > 169s <-- ACK
> > 169s SYN-->
> > 169s <-- ACK
> > 169s RST-->
> > As above picture shows, the two flows have a start time difference
> > of 169 seconds. B starts to send FIN so it will finally enter into
> > TIMEWAIT state. Nearly at the same time A launches a new connection
> > that soon is reset by itself due to receiving a ACK.
> >
> > There are two key checks in tcp_timewait_state_process() when timewait
> > socket in B receives the SYN packet:
> > 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> >
> > Regarding the first rule, it fails as expected because in the first
> > connection the seq of SYN sent from A is 1892994276, then 169s have
> > passed, the second SYN has 239034613 (caused by overflow of s32).
> >
> > Then how about the second rule?
> > It fails again!
> > Let's take a look at how the tsval comes out:
> > __tcp_transmit_skb()
> > -> tcp_syn_options()
> > -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > other is tp->tsoffset. The latter value is fixed, so we don't need
> > to care about it. If both operations (sending FIN and then starting
> > sending SYN) from A happen in 1ms, then the tsval would be the same.
> > It can be clearly seen in the tcpdump log. Notice that the tsval is
> > with millisecond precision.
> >
> > Based on the above analysis, I decided to make a small change to
> > the check in tcp_timewait_state_process() so that the second flow
> > would not fail.
> >
>
> I wonder what a bad result the RST causes. As far as I know, the client
> will not close the connect and return. Instead, it re-sends an SYN in
> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> tcp_rcv_synsent_state_process). So the second connection could still be
> established successfully, at the cost of a bit more delay. Like:
>
> Time Client(A) Server(B)
> 0s SYN-->
> ...
> 132s <-- FIN
> ...
> 169s FIN-->
> 169s <-- ACK
> 169s SYN-->
> 169s <-- ACK
> 169s RST-->
> ~2jiffies SYN-->
> <-- SYN,ACK
That's exactly what I meant here :) Originally I didn't expect the
application to relaunch a connection in this case.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 8:01 ` Jason Xing
@ 2024-11-07 8:22 ` Philo Lu
2024-11-07 8:26 ` Jason Xing
0 siblings, 1 reply; 28+ messages in thread
From: Philo Lu @ 2024-11-07 8:22 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On 2024/11/7 16:01, Jason Xing wrote:
> On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
>>
>> Hi Jason,
>>
>> On 2024/11/5 10:55, Jason Xing wrote:
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> We found there are rare chances that some RST packets appear during
>>> the shakehands because the timewait socket cannot accept the SYN and
>>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
>>>
>>> Here is how things happen in production:
>>> Time Client(A) Server(B)
>>> 0s SYN-->
>>> ...
>>> 132s <-- FIN
>>> ...
>>> 169s FIN-->
>>> 169s <-- ACK
>>> 169s SYN-->
>>> 169s <-- ACK
>>> 169s RST-->
>>> As above picture shows, the two flows have a start time difference
>>> of 169 seconds. B starts to send FIN so it will finally enter into
>>> TIMEWAIT state. Nearly at the same time A launches a new connection
>>> that soon is reset by itself due to receiving a ACK.
>>>
>>> There are two key checks in tcp_timewait_state_process() when timewait
>>> socket in B receives the SYN packet:
>>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
>>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
>>>
>>> Regarding the first rule, it fails as expected because in the first
>>> connection the seq of SYN sent from A is 1892994276, then 169s have
>>> passed, the second SYN has 239034613 (caused by overflow of s32).
>>>
>>> Then how about the second rule?
>>> It fails again!
>>> Let's take a look at how the tsval comes out:
>>> __tcp_transmit_skb()
>>> -> tcp_syn_options()
>>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
>>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
>>> other is tp->tsoffset. The latter value is fixed, so we don't need
>>> to care about it. If both operations (sending FIN and then starting
>>> sending SYN) from A happen in 1ms, then the tsval would be the same.
>>> It can be clearly seen in the tcpdump log. Notice that the tsval is
>>> with millisecond precision.
>>>
>>> Based on the above analysis, I decided to make a small change to
>>> the check in tcp_timewait_state_process() so that the second flow
>>> would not fail.
>>>
>>
>> I wonder what a bad result the RST causes. As far as I know, the client
>> will not close the connect and return. Instead, it re-sends an SYN in
>> TCP_TIMEOUT_MIN(2) jiffies (implemented in
>> tcp_rcv_synsent_state_process). So the second connection could still be
>> established successfully, at the cost of a bit more delay. Like:
>>
>> Time Client(A) Server(B)
>> 0s SYN-->
>> ...
>> 132s <-- FIN
>> ...
>> 169s FIN-->
>> 169s <-- ACK
>> 169s SYN-->
>> 169s <-- ACK
>> 169s RST-->
>> ~2jiffies SYN-->
>> <-- SYN,ACK
>
> That's exactly what I meant here :) Originally I didn't expect the
> application to relaunch a connection in this case.
s/application/kernel/, right? Because the retry is transparent to user
applications except the additional latency. I think all of these are
finished in a single connect() :)
--
Philo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 8:22 ` Philo Lu
@ 2024-11-07 8:26 ` Jason Xing
2024-11-07 8:37 ` Eric Dumazet
0 siblings, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-07 8:26 UTC (permalink / raw)
To: Philo Lu
Cc: davem, edumazet, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
>
>
>
> On 2024/11/7 16:01, Jason Xing wrote:
> > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> >>
> >> Hi Jason,
> >>
> >> On 2024/11/5 10:55, Jason Xing wrote:
> >>> From: Jason Xing <kernelxing@tencent.com>
> >>>
> >>> We found there are rare chances that some RST packets appear during
> >>> the shakehands because the timewait socket cannot accept the SYN and
> >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> >>>
> >>> Here is how things happen in production:
> >>> Time Client(A) Server(B)
> >>> 0s SYN-->
> >>> ...
> >>> 132s <-- FIN
> >>> ...
> >>> 169s FIN-->
> >>> 169s <-- ACK
> >>> 169s SYN-->
> >>> 169s <-- ACK
> >>> 169s RST-->
> >>> As above picture shows, the two flows have a start time difference
> >>> of 169 seconds. B starts to send FIN so it will finally enter into
> >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> >>> that soon is reset by itself due to receiving a ACK.
> >>>
> >>> There are two key checks in tcp_timewait_state_process() when timewait
> >>> socket in B receives the SYN packet:
> >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> >>>
> >>> Regarding the first rule, it fails as expected because in the first
> >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> >>>
> >>> Then how about the second rule?
> >>> It fails again!
> >>> Let's take a look at how the tsval comes out:
> >>> __tcp_transmit_skb()
> >>> -> tcp_syn_options()
> >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> >>> to care about it. If both operations (sending FIN and then starting
> >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> >>> with millisecond precision.
> >>>
> >>> Based on the above analysis, I decided to make a small change to
> >>> the check in tcp_timewait_state_process() so that the second flow
> >>> would not fail.
> >>>
> >>
> >> I wonder what a bad result the RST causes. As far as I know, the client
> >> will not close the connect and return. Instead, it re-sends an SYN in
> >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> >> tcp_rcv_synsent_state_process). So the second connection could still be
> >> established successfully, at the cost of a bit more delay. Like:
> >>
> >> Time Client(A) Server(B)
> >> 0s SYN-->
> >> ...
> >> 132s <-- FIN
> >> ...
> >> 169s FIN-->
> >> 169s <-- ACK
> >> 169s SYN-->
> >> 169s <-- ACK
> >> 169s RST-->
> >> ~2jiffies SYN-->
> >> <-- SYN,ACK
> >
> > That's exactly what I meant here :) Originally I didn't expect the
> > application to relaunch a connection in this case.
>
> s/application/kernel/, right?
No. Perhaps I didn't make myself clear. If the kernel doesn't silently
drop the SYN and then send back an ACK, the application has to call
the connect() syscall again.
> Because the retry is transparent to user
> applications except the additional latency. I think all of these are
> finished in a single connect() :)
Right.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 8:26 ` Jason Xing
@ 2024-11-07 8:37 ` Eric Dumazet
2024-11-07 9:00 ` Jason Xing
0 siblings, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2024-11-07 8:37 UTC (permalink / raw)
To: Jason Xing
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> >
> >
> >
> > On 2024/11/7 16:01, Jason Xing wrote:
> > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > >>
> > >> Hi Jason,
> > >>
> > >> On 2024/11/5 10:55, Jason Xing wrote:
> > >>> From: Jason Xing <kernelxing@tencent.com>
> > >>>
> > >>> We found there are rare chances that some RST packets appear during
> > >>> the shakehands because the timewait socket cannot accept the SYN and
> > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > >>>
> > >>> Here is how things happen in production:
> > >>> Time Client(A) Server(B)
> > >>> 0s SYN-->
> > >>> ...
> > >>> 132s <-- FIN
> > >>> ...
> > >>> 169s FIN-->
> > >>> 169s <-- ACK
> > >>> 169s SYN-->
> > >>> 169s <-- ACK
> > >>> 169s RST-->
> > >>> As above picture shows, the two flows have a start time difference
> > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > >>> that soon is reset by itself due to receiving a ACK.
> > >>>
> > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > >>> socket in B receives the SYN packet:
> > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > >>>
> > >>> Regarding the first rule, it fails as expected because in the first
> > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > >>>
> > >>> Then how about the second rule?
> > >>> It fails again!
> > >>> Let's take a look at how the tsval comes out:
> > >>> __tcp_transmit_skb()
> > >>> -> tcp_syn_options()
> > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > >>> to care about it. If both operations (sending FIN and then starting
> > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > >>> with millisecond precision.
> > >>>
> > >>> Based on the above analysis, I decided to make a small change to
> > >>> the check in tcp_timewait_state_process() so that the second flow
> > >>> would not fail.
> > >>>
> > >>
> > >> I wonder what a bad result the RST causes. As far as I know, the client
> > >> will not close the connect and return. Instead, it re-sends an SYN in
> > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > >> established successfully, at the cost of a bit more delay. Like:
> > >>
> > >> Time Client(A) Server(B)
> > >> 0s SYN-->
> > >> ...
> > >> 132s <-- FIN
> > >> ...
> > >> 169s FIN-->
> > >> 169s <-- ACK
> > >> 169s SYN-->
> > >> 169s <-- ACK
> > >> 169s RST-->
> > >> ~2jiffies SYN-->
> > >> <-- SYN,ACK
> > >
> > > That's exactly what I meant here :) Originally I didn't expect the
> > > application to relaunch a connection in this case.
> >
> > s/application/kernel/, right?
>
> No. Perhaps I didn't make myself clear. If the kernel doesn't silently
> drop the SYN and then send back an ACK, the application has to call
> the connect() syscall again.
My suggestion to stop the confusion:
Provide a packetdrill test.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 8:37 ` Eric Dumazet
@ 2024-11-07 9:00 ` Jason Xing
2024-11-07 9:16 ` Eric Dumazet
2024-11-07 9:30 ` Jason Xing
0 siblings, 2 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-07 9:00 UTC (permalink / raw)
To: Eric Dumazet
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > >>
> > > >> Hi Jason,
> > > >>
> > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > >>>
> > > >>> We found there are rare chances that some RST packets appear during
> > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > >>>
> > > >>> Here is how things happen in production:
> > > >>> Time Client(A) Server(B)
> > > >>> 0s SYN-->
> > > >>> ...
> > > >>> 132s <-- FIN
> > > >>> ...
> > > >>> 169s FIN-->
> > > >>> 169s <-- ACK
> > > >>> 169s SYN-->
> > > >>> 169s <-- ACK
> > > >>> 169s RST-->
> > > >>> As above picture shows, the two flows have a start time difference
> > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > >>> that soon is reset by itself due to receiving a ACK.
> > > >>>
> > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > >>> socket in B receives the SYN packet:
> > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > >>>
> > > >>> Regarding the first rule, it fails as expected because in the first
> > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > >>>
> > > >>> Then how about the second rule?
> > > >>> It fails again!
> > > >>> Let's take a look at how the tsval comes out:
> > > >>> __tcp_transmit_skb()
> > > >>> -> tcp_syn_options()
> > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > >>> to care about it. If both operations (sending FIN and then starting
> > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > >>> with millisecond precision.
> > > >>>
> > > >>> Based on the above analysis, I decided to make a small change to
> > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > >>> would not fail.
> > > >>>
> > > >>
> > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > >> established successfully, at the cost of a bit more delay. Like:
> > > >>
> > > >> Time Client(A) Server(B)
> > > >> 0s SYN-->
> > > >> ...
> > > >> 132s <-- FIN
> > > >> ...
> > > >> 169s FIN-->
> > > >> 169s <-- ACK
> > > >> 169s SYN-->
> > > >> 169s <-- ACK
> > > >> 169s RST-->
> > > >> ~2jiffies SYN-->
It doesn't happen. I dare to say the SYN will not be retransmitted
soon which I can tell from many tcpdump logs I captured.
> > > >> <-- SYN,ACK
> > > >
> > > > That's exactly what I meant here :) Originally I didn't expect the
> > > > application to relaunch a connection in this case.
> > >
> > > s/application/kernel/, right?
> >
> > No. Perhaps I didn't make myself clear. If the kernel doesn't silently
> > drop the SYN and then send back an ACK, the application has to call
> > the connect() syscall again.
>
> My suggestion to stop the confusion:
Oh, right now, I realized that Philo and I are not on the same page :(
Please forget that conversation.
My points are:
1) If B silently drops the SYN packet, A will retransmit an SYN packet
and then the connection will be established. It's what I tried to
propose and would like to see. It also adheres to the RFC 6191.
2) As kuniyuki pointed out on the contrary, sending an ACK (like the
current implementation) instead of silently dropping the SYN packet is
actually a challenge ack. If so, I think we need to consider this ACK
as a challenge ack like what tcp_send_challenge_ack() does.
I'd like to hear more about your opinions on the above conclusion.
Thanks!
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 9:00 ` Jason Xing
@ 2024-11-07 9:16 ` Eric Dumazet
2024-11-07 9:18 ` Jason Xing
2024-11-07 9:30 ` Jason Xing
1 sibling, 1 reply; 28+ messages in thread
From: Eric Dumazet @ 2024-11-07 9:16 UTC (permalink / raw)
To: Jason Xing
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 10:01 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > >
> > > >
> > > >
> > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > >>
> > > > >> Hi Jason,
> > > > >>
> > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > > >>>
> > > > >>> We found there are rare chances that some RST packets appear during
> > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > >>>
> > > > >>> Here is how things happen in production:
> > > > >>> Time Client(A) Server(B)
> > > > >>> 0s SYN-->
> > > > >>> ...
> > > > >>> 132s <-- FIN
> > > > >>> ...
> > > > >>> 169s FIN-->
> > > > >>> 169s <-- ACK
> > > > >>> 169s SYN-->
> > > > >>> 169s <-- ACK
> > > > >>> 169s RST-->
> > > > >>> As above picture shows, the two flows have a start time difference
> > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > >>>
> > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > >>> socket in B receives the SYN packet:
> > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > >>>
> > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > >>>
> > > > >>> Then how about the second rule?
> > > > >>> It fails again!
> > > > >>> Let's take a look at how the tsval comes out:
> > > > >>> __tcp_transmit_skb()
> > > > >>> -> tcp_syn_options()
> > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > >>> with millisecond precision.
> > > > >>>
> > > > >>> Based on the above analysis, I decided to make a small change to
> > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > >>> would not fail.
> > > > >>>
> > > > >>
> > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > >>
> > > > >> Time Client(A) Server(B)
> > > > >> 0s SYN-->
> > > > >> ...
> > > > >> 132s <-- FIN
> > > > >> ...
> > > > >> 169s FIN-->
> > > > >> 169s <-- ACK
> > > > >> 169s SYN-->
> > > > >> 169s <-- ACK
> > > > >> 169s RST-->
> > > > >> ~2jiffies SYN-->
>
> It doesn't happen. I dare to say the SYN will not be retransmitted
> soon which I can tell from many tcpdump logs I captured.
>
> > > > >> <-- SYN,ACK
> > > > >
> > > > > That's exactly what I meant here :) Originally I didn't expect the
> > > > > application to relaunch a connection in this case.
> > > >
> > > > s/application/kernel/, right?
> > >
> > > No. Perhaps I didn't make myself clear. If the kernel doesn't silently
> > > drop the SYN and then send back an ACK, the application has to call
> > > the connect() syscall again.
> >
> > My suggestion to stop the confusion:
>
> Oh, right now, I realized that Philo and I are not on the same page :(
> Please forget that conversation.
>
> My points are:
> 1) If B silently drops the SYN packet, A will retransmit an SYN packet
> and then the connection will be established. It's what I tried to
> propose and would like to see. It also adheres to the RFC 6191.
> 2) As kuniyuki pointed out on the contrary, sending an ACK (like the
> current implementation) instead of silently dropping the SYN packet is
> actually a challenge ack. If so, I think we need to consider this ACK
> as a challenge ack like what tcp_send_challenge_ack() does.
Like where ? Again, a packetdrill test will clarify all of this.
>
> I'd like to hear more about your opinions on the above conclusion.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 9:16 ` Eric Dumazet
@ 2024-11-07 9:18 ` Jason Xing
0 siblings, 0 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-07 9:18 UTC (permalink / raw)
To: Eric Dumazet
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 5:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 7, 2024 at 10:01 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > > >>
> > > > > >> Hi Jason,
> > > > > >>
> > > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > > > >>>
> > > > > >>> We found there are rare chances that some RST packets appear during
> > > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > >>>
> > > > > >>> Here is how things happen in production:
> > > > > >>> Time Client(A) Server(B)
> > > > > >>> 0s SYN-->
> > > > > >>> ...
> > > > > >>> 132s <-- FIN
> > > > > >>> ...
> > > > > >>> 169s FIN-->
> > > > > >>> 169s <-- ACK
> > > > > >>> 169s SYN-->
> > > > > >>> 169s <-- ACK
> > > > > >>> 169s RST-->
> > > > > >>> As above picture shows, the two flows have a start time difference
> > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > > >>>
> > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > > >>> socket in B receives the SYN packet:
> > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > >>>
> > > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > >>>
> > > > > >>> Then how about the second rule?
> > > > > >>> It fails again!
> > > > > >>> Let's take a look at how the tsval comes out:
> > > > > >>> __tcp_transmit_skb()
> > > > > >>> -> tcp_syn_options()
> > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > >>> with millisecond precision.
> > > > > >>>
> > > > > >>> Based on the above analysis, I decided to make a small change to
> > > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > > >>> would not fail.
> > > > > >>>
> > > > > >>
> > > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > > >>
> > > > > >> Time Client(A) Server(B)
> > > > > >> 0s SYN-->
> > > > > >> ...
> > > > > >> 132s <-- FIN
> > > > > >> ...
> > > > > >> 169s FIN-->
> > > > > >> 169s <-- ACK
> > > > > >> 169s SYN-->
> > > > > >> 169s <-- ACK
> > > > > >> 169s RST-->
> > > > > >> ~2jiffies SYN-->
> >
> > It doesn't happen. I dare to say the SYN will not be retransmitted
> > soon which I can tell from many tcpdump logs I captured.
> >
> > > > > >> <-- SYN,ACK
> > > > > >
> > > > > > That's exactly what I meant here :) Originally I didn't expect the
> > > > > > application to relaunch a connection in this case.
> > > > >
> > > > > s/application/kernel/, right?
> > > >
> > > > No. Perhaps I didn't make myself clear. If the kernel doesn't silently
> > > > drop the SYN and then send back an ACK, the application has to call
> > > > the connect() syscall again.
> > >
> > > My suggestion to stop the confusion:
> >
> > Oh, right now, I realized that Philo and I are not on the same page :(
> > Please forget that conversation.
> >
> > My points are:
> > 1) If B silently drops the SYN packet, A will retransmit an SYN packet
> > and then the connection will be established. It's what I tried to
> > propose and would like to see. It also adheres to the RFC 6191.
> > 2) As kuniyuki pointed out on the contrary, sending an ACK (like the
> > current implementation) instead of silently dropping the SYN packet is
> > actually a challenge ack. If so, I think we need to consider this ACK
> > as a challenge ack like what tcp_send_challenge_ack() does.
>
> Like where ? Again, a packetdrill test will clarify all of this.
Well, okay, Let me try.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 9:00 ` Jason Xing
2024-11-07 9:16 ` Eric Dumazet
@ 2024-11-07 9:30 ` Jason Xing
2024-11-07 9:45 ` Eric Dumazet
1 sibling, 1 reply; 28+ messages in thread
From: Jason Xing @ 2024-11-07 9:30 UTC (permalink / raw)
To: Eric Dumazet
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > >
> > > >
> > > >
> > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > >>
> > > > >> Hi Jason,
> > > > >>
> > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > > >>>
> > > > >>> We found there are rare chances that some RST packets appear during
> > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > >>>
> > > > >>> Here is how things happen in production:
> > > > >>> Time Client(A) Server(B)
> > > > >>> 0s SYN-->
> > > > >>> ...
> > > > >>> 132s <-- FIN
> > > > >>> ...
> > > > >>> 169s FIN-->
> > > > >>> 169s <-- ACK
> > > > >>> 169s SYN-->
> > > > >>> 169s <-- ACK
> > > > >>> 169s RST-->
> > > > >>> As above picture shows, the two flows have a start time difference
> > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > >>>
> > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > >>> socket in B receives the SYN packet:
> > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > >>>
> > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > >>>
> > > > >>> Then how about the second rule?
> > > > >>> It fails again!
> > > > >>> Let's take a look at how the tsval comes out:
> > > > >>> __tcp_transmit_skb()
> > > > >>> -> tcp_syn_options()
> > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > >>> with millisecond precision.
> > > > >>>
> > > > >>> Based on the above analysis, I decided to make a small change to
> > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > >>> would not fail.
> > > > >>>
> > > > >>
> > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > >>
> > > > >> Time Client(A) Server(B)
> > > > >> 0s SYN-->
> > > > >> ...
> > > > >> 132s <-- FIN
> > > > >> ...
> > > > >> 169s FIN-->
> > > > >> 169s <-- ACK
> > > > >> 169s SYN-->
> > > > >> 169s <-- ACK
> > > > >> 169s RST-->
> > > > >> ~2jiffies SYN-->
>
> It doesn't happen. I dare to say the SYN will not be retransmitted
> soon which I can tell from many tcpdump logs I captured.
My mind went blank probably because of getting sick. Sorry. Why the
tcpdump doesn't show the retransmitted SYN packet is our private
kernel missed this following commit:
commit 9603d47bad4642118fa19fd1562569663d9235f6
Author: SeongJae Park <sjpark@amazon.de>
Date: Sun Feb 2 03:38:26 2020 +0000
tcp: Reduce SYN resend delay if a suspicous ACK is received
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order. This is possible in RSS disabled environments such as a
connection inside a host.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 9:30 ` Jason Xing
@ 2024-11-07 9:45 ` Eric Dumazet
2024-11-07 9:48 ` Eric Dumazet
2024-11-07 9:57 ` Jason Xing
0 siblings, 2 replies; 28+ messages in thread
From: Eric Dumazet @ 2024-11-07 9:45 UTC (permalink / raw)
To: Jason Xing
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > > >>
> > > > > >> Hi Jason,
> > > > > >>
> > > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > > > >>>
> > > > > >>> We found there are rare chances that some RST packets appear during
> > > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > >>>
> > > > > >>> Here is how things happen in production:
> > > > > >>> Time Client(A) Server(B)
> > > > > >>> 0s SYN-->
> > > > > >>> ...
> > > > > >>> 132s <-- FIN
> > > > > >>> ...
> > > > > >>> 169s FIN-->
> > > > > >>> 169s <-- ACK
> > > > > >>> 169s SYN-->
> > > > > >>> 169s <-- ACK
> > > > > >>> 169s RST-->
> > > > > >>> As above picture shows, the two flows have a start time difference
> > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > > >>>
> > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > > >>> socket in B receives the SYN packet:
> > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > >>>
> > > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > >>>
> > > > > >>> Then how about the second rule?
> > > > > >>> It fails again!
> > > > > >>> Let's take a look at how the tsval comes out:
> > > > > >>> __tcp_transmit_skb()
> > > > > >>> -> tcp_syn_options()
> > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > >>> with millisecond precision.
> > > > > >>>
> > > > > >>> Based on the above analysis, I decided to make a small change to
> > > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > > >>> would not fail.
> > > > > >>>
> > > > > >>
> > > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > > >>
> > > > > >> Time Client(A) Server(B)
> > > > > >> 0s SYN-->
> > > > > >> ...
> > > > > >> 132s <-- FIN
> > > > > >> ...
> > > > > >> 169s FIN-->
> > > > > >> 169s <-- ACK
> > > > > >> 169s SYN-->
> > > > > >> 169s <-- ACK
> > > > > >> 169s RST-->
> > > > > >> ~2jiffies SYN-->
> >
> > It doesn't happen. I dare to say the SYN will not be retransmitted
> > soon which I can tell from many tcpdump logs I captured.
>
> My mind went blank probably because of getting sick. Sorry. Why the
> tcpdump doesn't show the retransmitted SYN packet is our private
> kernel missed this following commit:
>
> commit 9603d47bad4642118fa19fd1562569663d9235f6
> Author: SeongJae Park <sjpark@amazon.de>
> Date: Sun Feb 2 03:38:26 2020 +0000
>
> tcp: Reduce SYN resend delay if a suspicous ACK is received
>
> When closing a connection, the two acks that required to change closing
> socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> reverse order. This is possible in RSS disabled environments such as a
> connection inside a host.
Not having a patch from linux-5.6 does not really give a good impression,
please next time make sure your patches are needed for upstream trees,
net or net-next.
This also means you can write a packetdrill test to show the benefit
of this patch,
and send it as a new selftests ;)
Back to packetdrill, and everyone will be happy.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 9:45 ` Eric Dumazet
@ 2024-11-07 9:48 ` Eric Dumazet
2024-11-07 9:57 ` Jason Xing
1 sibling, 0 replies; 28+ messages in thread
From: Eric Dumazet @ 2024-11-07 9:48 UTC (permalink / raw)
To: Jason Xing
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 10:45 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > > > >>
> > > > > > >> Hi Jason,
> > > > > > >>
> > > > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > > > > >>>
> > > > > > >>> We found there are rare chances that some RST packets appear during
> > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > > >>>
> > > > > > >>> Here is how things happen in production:
> > > > > > >>> Time Client(A) Server(B)
> > > > > > >>> 0s SYN-->
> > > > > > >>> ...
> > > > > > >>> 132s <-- FIN
> > > > > > >>> ...
> > > > > > >>> 169s FIN-->
> > > > > > >>> 169s <-- ACK
> > > > > > >>> 169s SYN-->
> > > > > > >>> 169s <-- ACK
> > > > > > >>> 169s RST-->
> > > > > > >>> As above picture shows, the two flows have a start time difference
> > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > > > >>>
> > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > > > >>> socket in B receives the SYN packet:
> > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > > >>>
> > > > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > > >>>
> > > > > > >>> Then how about the second rule?
> > > > > > >>> It fails again!
> > > > > > >>> Let's take a look at how the tsval comes out:
> > > > > > >>> __tcp_transmit_skb()
> > > > > > >>> -> tcp_syn_options()
> > > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > > >>> with millisecond precision.
> > > > > > >>>
> > > > > > >>> Based on the above analysis, I decided to make a small change to
> > > > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > > > >>> would not fail.
> > > > > > >>>
> > > > > > >>
> > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > > > >>
> > > > > > >> Time Client(A) Server(B)
> > > > > > >> 0s SYN-->
> > > > > > >> ...
> > > > > > >> 132s <-- FIN
> > > > > > >> ...
> > > > > > >> 169s FIN-->
> > > > > > >> 169s <-- ACK
> > > > > > >> 169s SYN-->
> > > > > > >> 169s <-- ACK
> > > > > > >> 169s RST-->
> > > > > > >> ~2jiffies SYN-->
> > >
> > > It doesn't happen. I dare to say the SYN will not be retransmitted
> > > soon which I can tell from many tcpdump logs I captured.
> >
> > My mind went blank probably because of getting sick. Sorry. Why the
> > tcpdump doesn't show the retransmitted SYN packet is our private
> > kernel missed this following commit:
> >
> > commit 9603d47bad4642118fa19fd1562569663d9235f6
> > Author: SeongJae Park <sjpark@amazon.de>
> > Date: Sun Feb 2 03:38:26 2020 +0000
> >
> > tcp: Reduce SYN resend delay if a suspicous ACK is received
> >
> > When closing a connection, the two acks that required to change closing
> > socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> > reverse order. This is possible in RSS disabled environments such as a
> > connection inside a host.
>
> Not having a patch from linux-5.6 does not really give a good impression,
> please next time make sure your patches are needed for upstream trees,
> net or net-next.
>
> This also means you can write a packetdrill test to show the benefit
> of this patch,
> and send it as a new selftests ;)
>
> Back to packetdrill, and everyone will be happy.
Well, there is selftest already, not based on packetdrill because
we had no packetdrill support yet in selftests
commit af8c8a450bf4698a8a6a7c68956ea5ccafe4cc88
Author: SeongJae Park <sjpark@amazon.de>
Date: Sun Feb 2 03:38:27 2020 +0000
selftests: net: Add FIN_ACK processing order related latency spike test
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process
2024-11-07 9:45 ` Eric Dumazet
2024-11-07 9:48 ` Eric Dumazet
@ 2024-11-07 9:57 ` Jason Xing
1 sibling, 0 replies; 28+ messages in thread
From: Jason Xing @ 2024-11-07 9:57 UTC (permalink / raw)
To: Eric Dumazet
Cc: Philo Lu, davem, kuba, pabeni, dsahern, horms, netdev, Jason Xing
On Thu, Nov 7, 2024 at 5:45 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Nov 7, 2024 at 10:30 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 5:00 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Thu, Nov 7, 2024 at 4:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Thu, Nov 7, 2024 at 9:27 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > > >
> > > > > On Thu, Nov 7, 2024 at 4:22 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 2024/11/7 16:01, Jason Xing wrote:
> > > > > > > On Thu, Nov 7, 2024 at 3:51 PM Philo Lu <lulie@linux.alibaba.com> wrote:
> > > > > > >>
> > > > > > >> Hi Jason,
> > > > > > >>
> > > > > > >> On 2024/11/5 10:55, Jason Xing wrote:
> > > > > > >>> From: Jason Xing <kernelxing@tencent.com>
> > > > > > >>>
> > > > > > >>> We found there are rare chances that some RST packets appear during
> > > > > > >>> the shakehands because the timewait socket cannot accept the SYN and
> > > > > > >>> doesn't return TCP_TW_SYN in tcp_timewait_state_process().
> > > > > > >>>
> > > > > > >>> Here is how things happen in production:
> > > > > > >>> Time Client(A) Server(B)
> > > > > > >>> 0s SYN-->
> > > > > > >>> ...
> > > > > > >>> 132s <-- FIN
> > > > > > >>> ...
> > > > > > >>> 169s FIN-->
> > > > > > >>> 169s <-- ACK
> > > > > > >>> 169s SYN-->
> > > > > > >>> 169s <-- ACK
> > > > > > >>> 169s RST-->
> > > > > > >>> As above picture shows, the two flows have a start time difference
> > > > > > >>> of 169 seconds. B starts to send FIN so it will finally enter into
> > > > > > >>> TIMEWAIT state. Nearly at the same time A launches a new connection
> > > > > > >>> that soon is reset by itself due to receiving a ACK.
> > > > > > >>>
> > > > > > >>> There are two key checks in tcp_timewait_state_process() when timewait
> > > > > > >>> socket in B receives the SYN packet:
> > > > > > >>> 1) after(TCP_SKB_CB(skb)->seq, rcv_nxt)
> > > > > > >>> 2) (s32)(READ_ONCE(tcptw->tw_ts_recent) - tmp_opt.rcv_tsval) < 0)
> > > > > > >>>
> > > > > > >>> Regarding the first rule, it fails as expected because in the first
> > > > > > >>> connection the seq of SYN sent from A is 1892994276, then 169s have
> > > > > > >>> passed, the second SYN has 239034613 (caused by overflow of s32).
> > > > > > >>>
> > > > > > >>> Then how about the second rule?
> > > > > > >>> It fails again!
> > > > > > >>> Let's take a look at how the tsval comes out:
> > > > > > >>> __tcp_transmit_skb()
> > > > > > >>> -> tcp_syn_options()
> > > > > > >>> -> opts->tsval = tcp_skb_timestamp_ts(tp->tcp_usec_ts, skb) + tp->tsoffset;
> > > > > > >>> The timestamp depends on two things, one is skb->skb_mstamp_ns, the
> > > > > > >>> other is tp->tsoffset. The latter value is fixed, so we don't need
> > > > > > >>> to care about it. If both operations (sending FIN and then starting
> > > > > > >>> sending SYN) from A happen in 1ms, then the tsval would be the same.
> > > > > > >>> It can be clearly seen in the tcpdump log. Notice that the tsval is
> > > > > > >>> with millisecond precision.
> > > > > > >>>
> > > > > > >>> Based on the above analysis, I decided to make a small change to
> > > > > > >>> the check in tcp_timewait_state_process() so that the second flow
> > > > > > >>> would not fail.
> > > > > > >>>
> > > > > > >>
> > > > > > >> I wonder what a bad result the RST causes. As far as I know, the client
> > > > > > >> will not close the connect and return. Instead, it re-sends an SYN in
> > > > > > >> TCP_TIMEOUT_MIN(2) jiffies (implemented in
> > > > > > >> tcp_rcv_synsent_state_process). So the second connection could still be
> > > > > > >> established successfully, at the cost of a bit more delay. Like:
> > > > > > >>
> > > > > > >> Time Client(A) Server(B)
> > > > > > >> 0s SYN-->
> > > > > > >> ...
> > > > > > >> 132s <-- FIN
> > > > > > >> ...
> > > > > > >> 169s FIN-->
> > > > > > >> 169s <-- ACK
> > > > > > >> 169s SYN-->
> > > > > > >> 169s <-- ACK
> > > > > > >> 169s RST-->
> > > > > > >> ~2jiffies SYN-->
> > >
> > > It doesn't happen. I dare to say the SYN will not be retransmitted
> > > soon which I can tell from many tcpdump logs I captured.
> >
> > My mind went blank probably because of getting sick. Sorry. Why the
> > tcpdump doesn't show the retransmitted SYN packet is our private
> > kernel missed this following commit:
> >
> > commit 9603d47bad4642118fa19fd1562569663d9235f6
> > Author: SeongJae Park <sjpark@amazon.de>
> > Date: Sun Feb 2 03:38:26 2020 +0000
> >
> > tcp: Reduce SYN resend delay if a suspicous ACK is received
> >
> > When closing a connection, the two acks that required to change closing
> > socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
> > reverse order. This is possible in RSS disabled environments such as a
> > connection inside a host.
>
> Not having a patch from linux-5.6 does not really give a good impression,
> please next time make sure your patches are needed for upstream trees,
> net or net-next.
Sorry about this. Normally, I found an issue on those older kernels
and then I will test it on the net-next tree. The issue this patch
tried to fix can be reproduced, as we all see. But today regarding the
topic about re-sending an SYN quickly, I didn't conduct the test, only
analyzing the code :( Sorry for the noise.
>
> This also means you can write a packetdrill test to show the benefit
> of this patch,
> and send it as a new selftests ;)
>
> Back to packetdrill, and everyone will be happy.
Thanks for reminding me. It's really necessary for me to get started
to learn how to write various packetdrill tests.
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2024-11-07 9:57 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-05 2:55 [PATCH net-next] tcp: avoid RST in 3-way shakehands due to failure in tcp_timewait_state_process Jason Xing
2024-11-05 7:49 ` Kuniyuki Iwashima
2024-11-05 9:08 ` Jason Xing
2024-11-05 9:35 ` Eric Dumazet
2024-11-05 9:41 ` Jason Xing
2024-11-05 9:50 ` Eric Dumazet
2024-11-05 9:56 ` Jason Xing
2024-11-05 11:48 ` Jason Xing
2024-11-07 3:16 ` Jason Xing
2024-11-07 4:15 ` Kuniyuki Iwashima
2024-11-07 4:21 ` Kuniyuki Iwashima
2024-11-07 5:23 ` Jason Xing
2024-11-07 5:43 ` Kuniyuki Iwashima
2024-11-07 6:51 ` Jason Xing
2024-11-07 7:11 ` Kuniyuki Iwashima
2024-11-07 7:44 ` Jason Xing
2024-11-07 7:51 ` Philo Lu
2024-11-07 8:01 ` Jason Xing
2024-11-07 8:22 ` Philo Lu
2024-11-07 8:26 ` Jason Xing
2024-11-07 8:37 ` Eric Dumazet
2024-11-07 9:00 ` Jason Xing
2024-11-07 9:16 ` Eric Dumazet
2024-11-07 9:18 ` Jason Xing
2024-11-07 9:30 ` Jason Xing
2024-11-07 9:45 ` Eric Dumazet
2024-11-07 9:48 ` Eric Dumazet
2024-11-07 9:57 ` Jason Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).