* [PATCH v3 net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process
@ 2024-08-21 15:33 Jason Xing
2024-08-21 15:47 ` Eric Dumazet
0 siblings, 1 reply; 3+ messages in thread
From: Jason Xing @ 2024-08-21 15:33 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, dsahern, kuniyu
Cc: netdev, Jason Xing, Jade Dong
From: Jason Xing <kernelxing@tencent.com>
We found that one close-wait socket was reset by the other side
due to a new connection reusing the same port which is beyond our
expectation, so we have to investigate the underlying reason.
The following experiment is conducted in the test environment. We
limit the port range from 40000 to 40010 and delay the time to close()
after receiving a fin from the active close side, which can help us
easily reproduce like what happened in production.
Here are three connections captured by tcpdump:
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191
127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070
127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1
// a few seconds later, within 60 seconds
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2
127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193
// later, very quickly
127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805
127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
As we can see, the first flow is reset because:
1) client starts a new connection, I mean, the second one
2) client tries to find a suitable port which is a timewait socket
(its state is timewait, substate is fin_wait2)
3) client occupies that timewait port to send a SYN
4) server finds a corresponding close-wait socket in ehash table,
then replies with a challenge ack
5) client sends an RST to terminate this old close-wait socket.
I don't think the port selection algo can choose a FIN_WAIT2 socket
when we turn on tcp_tw_reuse because on the server side there
remain unread data. In some cases, if one side haven't call close() yet,
we should not consider it as expendable and treat it at will.
Even though, sometimes, the server isn't able to call close() as soon
as possible like what we expect, it can not be terminated easily,
especially due to a second unrelated connection happening.
After this patch, we can see the expected failure if we start a
connection when all the ports are occupied in fin_wait2 state:
"Ncat: Cannot assign requested address."
Reported-by: Jade Dong <jadedong@tencent.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
v3
Link: https://lore.kernel.org/all/20240815113745.6668-1-kerneljasonxing@gmail.com/
1. take the ipv6 case into consideration. (Eric)
v2
Link: https://lore.kernel.org/all/20240814035136.60796-1-kerneljasonxing@gmail.com/
1. change from fin_wait2 to timewait test statement, no functional
change (Kuniyuki)
---
net/ipv4/tcp_ipv4.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fd17f25ff288..b37c70d292bc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -144,6 +144,9 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
reuse = 0;
}
+ if (tw->tw_substate == TCP_FIN_WAIT2)
+ reuse = 0;
+
/* With PAWS, it is safe from the viewpoint
of data integrity. Even without PAWS it is safe provided sequence
spaces do not overlap i.e. at data rates <= 80Mbit/sec.
--
2.37.3
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v3 net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process
2024-08-21 15:33 [PATCH v3 net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process Jason Xing
@ 2024-08-21 15:47 ` Eric Dumazet
2024-08-21 15:51 ` Jason Xing
0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2024-08-21 15:47 UTC (permalink / raw)
To: Jason Xing
Cc: davem, kuba, pabeni, dsahern, kuniyu, netdev, Jason Xing,
Jade Dong
On Wed, Aug 21, 2024 at 5:33 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> We found that one close-wait socket was reset by the other side
> due to a new connection reusing the same port which is beyond our
> expectation, so we have to investigate the underlying reason.
>
> The following experiment is conducted in the test environment. We
> limit the port range from 40000 to 40010 and delay the time to close()
> after receiving a fin from the active close side, which can help us
> easily reproduce like what happened in production.
>
> Here are three connections captured by tcpdump:
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191
> 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1
> // a few seconds later, within 60 seconds
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
> 127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193
> // later, very quickly
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
> 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805
> 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
>
> As we can see, the first flow is reset because:
> 1) client starts a new connection, I mean, the second one
> 2) client tries to find a suitable port which is a timewait socket
> (its state is timewait, substate is fin_wait2)
> 3) client occupies that timewait port to send a SYN
> 4) server finds a corresponding close-wait socket in ehash table,
> then replies with a challenge ack
> 5) client sends an RST to terminate this old close-wait socket.
>
> I don't think the port selection algo can choose a FIN_WAIT2 socket
> when we turn on tcp_tw_reuse because on the server side there
> remain unread data. In some cases, if one side haven't call close() yet,
> we should not consider it as expendable and treat it at will.
>
> Even though, sometimes, the server isn't able to call close() as soon
> as possible like what we expect, it can not be terminated easily,
> especially due to a second unrelated connection happening.
>
> After this patch, we can see the expected failure if we start a
> connection when all the ports are occupied in fin_wait2 state:
> "Ncat: Cannot assign requested address."
>
> Reported-by: Jade Dong <jadedong@tencent.com>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> v3
> Link: https://lore.kernel.org/all/20240815113745.6668-1-kerneljasonxing@gmail.com/
> 1. take the ipv6 case into consideration. (Eric)
>
> v2
> Link: https://lore.kernel.org/all/20240814035136.60796-1-kerneljasonxing@gmail.com/
> 1. change from fin_wait2 to timewait test statement, no functional
> change (Kuniyuki)
> ---
> net/ipv4/tcp_ipv4.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index fd17f25ff288..b37c70d292bc 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -144,6 +144,9 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
> reuse = 0;
> }
>
> + if (tw->tw_substate == TCP_FIN_WAIT2)
> + reuse = 0;
> +
sysctl_tcp_tw_reuse default value being 2, I would suggest doing this
test earlier,
to avoid unneeded work.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c2860480099f216d69fc570efdb991d2304be785..9af18d0293cd6655faf4eeb60ff3d41ce94ae843
100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -118,6 +118,9 @@ int tcp_twsk_unique(struct sock *sk, struct sock
*sktw, void *twp)
struct tcp_sock *tp = tcp_sk(sk);
int ts_recent_stamp;
+ if (tw->tw_substate == TCP_FIN_WAIT2)
+ reuse = 0;
+
if (reuse == 2) {
/* Still does not detect *everything* that goes through
* lo, since we require a loopback src or dst address
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v3 net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process
2024-08-21 15:47 ` Eric Dumazet
@ 2024-08-21 15:51 ` Jason Xing
0 siblings, 0 replies; 3+ messages in thread
From: Jason Xing @ 2024-08-21 15:51 UTC (permalink / raw)
To: Eric Dumazet
Cc: davem, kuba, pabeni, dsahern, kuniyu, netdev, Jason Xing,
Jade Dong
On Wed, Aug 21, 2024 at 11:47 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Aug 21, 2024 at 5:33 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > We found that one close-wait socket was reset by the other side
> > due to a new connection reusing the same port which is beyond our
> > expectation, so we have to investigate the underlying reason.
> >
> > The following experiment is conducted in the test environment. We
> > limit the port range from 40000 to 40010 and delay the time to close()
> > after receiving a fin from the active close side, which can help us
> > easily reproduce like what happened in production.
> >
> > Here are three connections captured by tcpdump:
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965525191
> > 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 2769915070
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [F.], seq 1, ack 1
> > // a few seconds later, within 60 seconds
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
> > 127.0.0.1.9999 > 127.0.0.1.40002: Flags [.], ack 2
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [R], seq 2965525193
> > // later, very quickly
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [S], seq 2965590730
> > 127.0.0.1.9999 > 127.0.0.1.40002: Flags [S.], seq 3120990805
> > 127.0.0.1.40002 > 127.0.0.1.9999: Flags [.], ack 1
> >
> > As we can see, the first flow is reset because:
> > 1) client starts a new connection, I mean, the second one
> > 2) client tries to find a suitable port which is a timewait socket
> > (its state is timewait, substate is fin_wait2)
> > 3) client occupies that timewait port to send a SYN
> > 4) server finds a corresponding close-wait socket in ehash table,
> > then replies with a challenge ack
> > 5) client sends an RST to terminate this old close-wait socket.
> >
> > I don't think the port selection algo can choose a FIN_WAIT2 socket
> > when we turn on tcp_tw_reuse because on the server side there
> > remain unread data. In some cases, if one side haven't call close() yet,
> > we should not consider it as expendable and treat it at will.
> >
> > Even though, sometimes, the server isn't able to call close() as soon
> > as possible like what we expect, it can not be terminated easily,
> > especially due to a second unrelated connection happening.
> >
> > After this patch, we can see the expected failure if we start a
> > connection when all the ports are occupied in fin_wait2 state:
> > "Ncat: Cannot assign requested address."
> >
> > Reported-by: Jade Dong <jadedong@tencent.com>
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > v3
> > Link: https://lore.kernel.org/all/20240815113745.6668-1-kerneljasonxing@gmail.com/
> > 1. take the ipv6 case into consideration. (Eric)
> >
> > v2
> > Link: https://lore.kernel.org/all/20240814035136.60796-1-kerneljasonxing@gmail.com/
> > 1. change from fin_wait2 to timewait test statement, no functional
> > change (Kuniyuki)
> > ---
> > net/ipv4/tcp_ipv4.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fd17f25ff288..b37c70d292bc 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -144,6 +144,9 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
> > reuse = 0;
> > }
> >
> > + if (tw->tw_substate == TCP_FIN_WAIT2)
> > + reuse = 0;
> > +
>
> sysctl_tcp_tw_reuse default value being 2, I would suggest doing this
> test earlier,
> to avoid unneeded work.
Thanks. I should have thought of that. I will submit it ~24 hours later.
Thanks,
Jason
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index c2860480099f216d69fc570efdb991d2304be785..9af18d0293cd6655faf4eeb60ff3d41ce94ae843
> 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -118,6 +118,9 @@ int tcp_twsk_unique(struct sock *sk, struct sock
> *sktw, void *twp)
> struct tcp_sock *tp = tcp_sk(sk);
> int ts_recent_stamp;
>
> + if (tw->tw_substate == TCP_FIN_WAIT2)
> + reuse = 0;
> +
> if (reuse == 2) {
> /* Still does not detect *everything* that goes through
> * lo, since we require a loopback src or dst address
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-08-21 15:52 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-21 15:33 [PATCH v3 net-next] tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process Jason Xing
2024-08-21 15:47 ` Eric Dumazet
2024-08-21 15:51 ` Jason Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).