* [PATCH v2 net-next 1/2] tcp: Don't drop SYN+ACK for simultaneous connect().
2024-07-08 18:08 [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant Kuniyuki Iwashima
@ 2024-07-08 18:08 ` Kuniyuki Iwashima
2024-07-08 18:42 ` Eric Dumazet
2024-07-08 18:08 ` [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests Kuniyuki Iwashima
2024-07-09 19:52 ` [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant Jakub Kicinski
2 siblings, 1 reply; 9+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-08 18:08 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
David Ahern
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, Dmitry Safonov, netdev
RFC 9293 states that in the case of simultaneous connect(), the connection
gets established when SYN+ACK is received. [0]
TCP Peer A TCP Peer B
1. CLOSED CLOSED
2. SYN-SENT --> <SEQ=100><CTL=SYN> ...
3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT
4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED
5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
ACK.
For example, the write() syscall in the following packetdrill script fails
with -EAGAIN, and wrong SNMP stats get incremented.
0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
+0 < S 0:0(0) win 1000 <mss 1000>
+0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
+0 < S. 0:0(0) ack 1 win 1000
+0 write(3, ..., 100) = 100
+0 > P. 1:101(100) ack 1
--
# packetdrill cross-synack.pkt
cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
# nstat
...
TcpExtTCPChallengeACK 1 0.0
TcpExtTCPSYNChallenge 1 0.0
The problem is that bpf_skops_established() is triggered by the Challenge
ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to
check if the peer supports a TCP option that is expected to be exchanged
in SYN and SYN+ACK.
Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
a situation.
Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to
send an unnecessary ACK, but this could be a bit risky for net.git, so this
targets for net-next.
Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
net/ipv4/tcp_input.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 47dacb575f74..50984aedbc8b 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5989,6 +5989,11 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
* RFC 5961 4.2 : Send a challenge ack
*/
if (th->syn) {
+ if (sk->sk_state == TCP_SYN_RECV && !tp->syn_fastopen && th->ack &&
+ TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
+ TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
+ TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
+ goto pass;
syn_challenge:
if (syn_inerr)
TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
@@ -5998,6 +6003,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
goto discard;
}
+pass:
bpf_skops_parse_hdr(sk, skb);
return true;
@@ -6804,6 +6810,9 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
tcp_fast_path_on(tp);
if (sk->sk_shutdown & SEND_SHUTDOWN)
tcp_shutdown(sk, SEND_SHUTDOWN);
+
+ if (!req)
+ goto consume;
break;
case TCP_FIN_WAIT1: {
--
2.30.2
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v2 net-next 1/2] tcp: Don't drop SYN+ACK for simultaneous connect().
2024-07-08 18:08 ` [PATCH v2 net-next 1/2] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
@ 2024-07-08 18:42 ` Eric Dumazet
0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-07-08 18:42 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Dmitry Safonov, netdev
On Mon, Jul 8, 2024 at 11:09 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> RFC 9293 states that in the case of simultaneous connect(), the connection
> gets established when SYN+ACK is received. [0]
>
> TCP Peer A TCP Peer B
>
> 1. CLOSED CLOSED
> 2. SYN-SENT --> <SEQ=100><CTL=SYN> ...
> 3. SYN-RECEIVED <-- <SEQ=300><CTL=SYN> <-- SYN-SENT
> 4. ... <SEQ=100><CTL=SYN> --> SYN-RECEIVED
> 5. SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
> 6. ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
> 7. ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
>
> However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
> SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
> ACK.
>
> For example, the write() syscall in the following packetdrill script fails
> with -EAGAIN, and wrong SNMP stats get incremented.
>
> 0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
> +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
>
> +0 > S 0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
> +0 < S 0:0(0) win 1000 <mss 1000>
> +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
> +0 < S. 0:0(0) ack 1 win 1000
>
> +0 write(3, ..., 100) = 100
> +0 > P. 1:101(100) ack 1
>
> --
>
> # packetdrill cross-synack.pkt
> cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
> # nstat
> ...
> TcpExtTCPChallengeACK 1 0.0
> TcpExtTCPSYNChallenge 1 0.0
>
> The problem is that bpf_skops_established() is triggered by the Challenge
> ACK instead of SYN+ACK. This causes the bpf prog to miss the chance to
> check if the peer supports a TCP option that is expected to be exchanged
> in SYN and SYN+ACK.
>
> Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> a situation.
>
> Note that tcp_ack_snd_check() in tcp_rcv_state_process() is skipped not to
> send an unnecessary ACK, but this could be a bit risky for net.git, so this
> targets for net-next.
>
> Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests.
2024-07-08 18:08 [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant Kuniyuki Iwashima
2024-07-08 18:08 ` [PATCH v2 net-next 1/2] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
@ 2024-07-08 18:08 ` Kuniyuki Iwashima
2024-07-08 18:43 ` Eric Dumazet
2024-07-08 19:49 ` Dmitry Safonov
2024-07-09 19:52 ` [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant Jakub Kicinski
2 siblings, 2 replies; 9+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-08 18:08 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
David Ahern
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, Dmitry Safonov, netdev
tcp_ao/self-connect.c checked the following SNMP stats before/after
connect() to confirm that the test exercises the simultaneous connect()
path.
* TCPChallengeACK
* TCPSYNChallenge
But the stats should not be counted for self-connect in the first place,
and the assumption is no longer true.
Let's remove the check.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
.../selftests/net/tcp_ao/self-connect.c | 18 ------------------
1 file changed, 18 deletions(-)
diff --git a/tools/testing/selftests/net/tcp_ao/self-connect.c b/tools/testing/selftests/net/tcp_ao/self-connect.c
index e154d9e198a9..a5698b0a3718 100644
--- a/tools/testing/selftests/net/tcp_ao/self-connect.c
+++ b/tools/testing/selftests/net/tcp_ao/self-connect.c
@@ -30,8 +30,6 @@ static void setup_lo_intf(const char *lo_intf)
static void tcp_self_connect(const char *tst, unsigned int port,
bool different_keyids, bool check_restore)
{
- uint64_t before_challenge_ack, after_challenge_ack;
- uint64_t before_syn_challenge, after_syn_challenge;
struct tcp_ao_counters before_ao, after_ao;
uint64_t before_aogood, after_aogood;
struct netstat *ns_before, *ns_after;
@@ -62,8 +60,6 @@ static void tcp_self_connect(const char *tst, unsigned int port,
ns_before = netstat_read();
before_aogood = netstat_get(ns_before, "TCPAOGood", NULL);
- before_challenge_ack = netstat_get(ns_before, "TCPChallengeACK", NULL);
- before_syn_challenge = netstat_get(ns_before, "TCPSYNChallenge", NULL);
if (test_get_tcp_ao_counters(sk, &before_ao))
test_error("test_get_tcp_ao_counters()");
@@ -82,8 +78,6 @@ static void tcp_self_connect(const char *tst, unsigned int port,
ns_after = netstat_read();
after_aogood = netstat_get(ns_after, "TCPAOGood", NULL);
- after_challenge_ack = netstat_get(ns_after, "TCPChallengeACK", NULL);
- after_syn_challenge = netstat_get(ns_after, "TCPSYNChallenge", NULL);
if (test_get_tcp_ao_counters(sk, &after_ao))
test_error("test_get_tcp_ao_counters()");
if (!check_restore) {
@@ -98,18 +92,6 @@ static void tcp_self_connect(const char *tst, unsigned int port,
close(sk);
return;
}
- if (after_challenge_ack <= before_challenge_ack ||
- after_syn_challenge <= before_syn_challenge) {
- /*
- * It's also meant to test simultaneous open, so check
- * these counters as well.
- */
- test_fail("%s: Didn't challenge SYN or ACK: %zu <= %zu OR %zu <= %zu",
- tst, after_challenge_ack, before_challenge_ack,
- after_syn_challenge, before_syn_challenge);
- close(sk);
- return;
- }
if (test_tcp_ao_counters_cmp(tst, &before_ao, &after_ao, TEST_CNT_GOOD)) {
close(sk);
--
2.30.2
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests.
2024-07-08 18:08 ` [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests Kuniyuki Iwashima
@ 2024-07-08 18:43 ` Eric Dumazet
2024-07-08 19:49 ` Dmitry Safonov
1 sibling, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-07-08 18:43 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Dmitry Safonov, netdev
On Mon, Jul 8, 2024 at 11:10 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> tcp_ao/self-connect.c checked the following SNMP stats before/after
> connect() to confirm that the test exercises the simultaneous connect()
> path.
>
> * TCPChallengeACK
> * TCPSYNChallenge
>
> But the stats should not be counted for self-connect in the first place,
> and the assumption is no longer true.
>
> Let's remove the check.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests.
2024-07-08 18:08 ` [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests Kuniyuki Iwashima
2024-07-08 18:43 ` Eric Dumazet
@ 2024-07-08 19:49 ` Dmitry Safonov
1 sibling, 0 replies; 9+ messages in thread
From: Dmitry Safonov @ 2024-07-08 19:49 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
David Ahern, Kuniyuki Iwashima, netdev
On Mon, Jul 8, 2024 at 7:10 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> tcp_ao/self-connect.c checked the following SNMP stats before/after
> connect() to confirm that the test exercises the simultaneous connect()
> path.
>
> * TCPChallengeACK
> * TCPSYNChallenge
>
> But the stats should not be counted for self-connect in the first place,
> and the assumption is no longer true.
>
> Let's remove the check.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Thanks!
Reviewed-by: Dmitry Safonov <dima@arista.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant.
2024-07-08 18:08 [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant Kuniyuki Iwashima
2024-07-08 18:08 ` [PATCH v2 net-next 1/2] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
2024-07-08 18:08 ` [PATCH v2 net-next 2/2] selftests: tcp: Remove broken SNMP assumptions for TCP AO self-connect tests Kuniyuki Iwashima
@ 2024-07-09 19:52 ` Jakub Kicinski
2024-07-10 1:44 ` Kuniyuki Iwashima
2 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2024-07-09 19:52 UTC (permalink / raw)
To: Kuniyuki Iwashima
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, David Ahern,
Kuniyuki Iwashima, Dmitry Safonov, netdev
On Mon, 8 Jul 2024 11:08:50 -0700 Kuniyuki Iwashima wrote:
> * Add patch 2
Hi Kuniyuki!
Looks like it also makes BPF CI fail. All of these:
https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-07-09--15-00&executor=gh-bpf-ci&pw-n=0
But it builds down to the reuseport test on various platforms.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant.
2024-07-09 19:52 ` [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant Jakub Kicinski
@ 2024-07-10 1:44 ` Kuniyuki Iwashima
2024-07-10 16:47 ` Kuniyuki Iwashima
0 siblings, 1 reply; 9+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-10 1:44 UTC (permalink / raw)
To: kuba; +Cc: davem, dima, dsahern, edumazet, kuni1840, kuniyu, netdev, pabeni
From: Jakub Kicinski <kuba@kernel.org>
Date: Tue, 9 Jul 2024 12:52:09 -0700
> On Mon, 8 Jul 2024 11:08:50 -0700 Kuniyuki Iwashima wrote:
> > * Add patch 2
>
> Hi Kuniyuki!
>
> Looks like it also makes BPF CI fail. All of these:
> https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-07-09--15-00&executor=gh-bpf-ci&pw-n=0
> But it builds down to the reuseport test on various platforms.
Oh, thanks for catching!
It seems the test is using TFO, and somehow fastopen_rsk is cleared,
but a packet is processed later in SYN_RECV state...
Will look into it.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2 net-next 0/2] tcp: Make simultaneous connect() RFC-compliant.
2024-07-10 1:44 ` Kuniyuki Iwashima
@ 2024-07-10 16:47 ` Kuniyuki Iwashima
0 siblings, 0 replies; 9+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-10 16:47 UTC (permalink / raw)
To: kuniyu; +Cc: davem, dima, dsahern, edumazet, kuba, kuni1840, netdev, pabeni
From: Kuniyuki Iwashima <kuniyu@amazon.com>
Date: Tue, 9 Jul 2024 18:44:55 -0700
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Tue, 9 Jul 2024 12:52:09 -0700
> > On Mon, 8 Jul 2024 11:08:50 -0700 Kuniyuki Iwashima wrote:
> > > * Add patch 2
> >
> > Hi Kuniyuki!
> >
> > Looks like it also makes BPF CI fail. All of these:
> > https://netdev.bots.linux.dev/contest.html?branch=net-next-2024-07-09--15-00&executor=gh-bpf-ci&pw-n=0
> > But it builds down to the reuseport test on various platforms.
>
> Oh, thanks for catching!
>
> It seems the test is using TFO, and somehow fastopen_rsk is cleared,
> but a packet is processed later in SYN_RECV state...
The test used MSG_FASTOPEN but TFO always failed due to lack of
proper configuration, this should be fixed.
IP 127.0.0.1.36477 > 127.0.0.1.53357: Flags [S], seq 2263448885:2263448893, win 65495, options [mss 65495,sackOK,TS val 2871616407 ecr 0,nop,wscale 7], length 8
IP 127.0.0.1.53357 > 127.0.0.1.36477: Flags [S.], seq 3767023264, ack 2263448886, win 65483, options [mss 65495,sackOK,TS val 2871616407 ecr 2871616407,nop,wscale 7], length 0
But this wasn't related, just red-herring.
I just missed that the ACK of 3WHS was also processed by newly created
SYN_RECV sk in tcp_rcv_state_process() called from tcp_child_process().
So, (sk->sk_state == TCP_SYN_RECV && !tp->fastopen_rsk) cannot deduce
the cross SYN+ACK case.
We need to use (sk->sk_state == TCP_SYN_RECV && sk->sk_socket).
Will post v3.
Thanks!
^ permalink raw reply [flat|nested] 9+ messages in thread