netdev.vger.kernel.org archive mirror
* [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
@ 2024-07-04  3:57 Kuniyuki Iwashima
  2024-07-04  8:01 ` Paolo Abeni
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-04  3:57 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern
  Cc: Lawrence Brakmo, Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

RFC 9293 states that in the case of simultaneous connect(), the connection
gets established when SYN+ACK is received. [0]

      TCP Peer A                                       TCP Peer B

  1.  CLOSED                                           CLOSED
  2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
  3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
  4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
  5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
  6.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
  7.               ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED

However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
SYN+ACK is dropped in tcp_validate_incoming() and answered with a Challenge
ACK.

For example, the write() syscall in the following packetdrill script fails
with -EAGAIN, and wrong SNMP stats get incremented.

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
  +0 < S  0:0(0) win 1000 <mss 1000>
  +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
  +0 < S. 0:0(0) ack 1 win 1000

  +0 write(3, ..., 100) = 100
  +0 > P. 1:101(100) ack 1

  --

  # packetdrill cross-synack.pkt
  cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
  # nstat
  ...
  TcpExtTCPChallengeACK           1                  0.0
  TcpExtTCPSYNChallenge           1                  0.0

That said, this is no big deal because the Challenge ACK eventually lets the
connection state transition to TCP_ESTABLISHED in both directions.  If the
peer is not using Linux, there might be a small latency before the ACK though.

The problem is that bpf_skops_established() is triggered by the Challenge
ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
check if the peer supports a TCP option that is expected to be exchanged
in SYN and SYN+ACK.

Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
a situation.

Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
Fixes: 9872a4bde31b ("bpf: Add TCP connection BPF callbacks")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/ipv4/tcp_input.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 77294fd5fd3e..70595009bb58 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5980,6 +5980,11 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	 * RFC 5961 4.2 : Send a challenge ack
 	 */
 	if (th->syn) {
+		if (sk->sk_state == TCP_SYN_RECV && !tp->syn_fastopen && th->ack &&
+		    TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
+		    TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
+		    TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
+			goto pass;
 syn_challenge:
 		if (syn_inerr)
 			TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
@@ -5990,7 +5995,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
 	}
 
 	bpf_skops_parse_hdr(sk, skb);
-
+pass:
 	return true;
 
 discard:
-- 
2.30.2
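
Editor's sketch (not part of the patch): the condition added in the diff
above can be tried out in userspace with a small model.  Every name below
(model_sock, model_seg, crossed_synack_passes) is a hypothetical stand-in
for sk->sk_state, tp->syn_fastopen, tp->rcv_nxt, tp->snd_nxt and the
TCP_SKB_CB(skb) fields.  The model assumes end_seq counts the SYN flag as
one sequence number, and it ignores everything else that
tcp_validate_incoming() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not kernel definitions. */
enum model_state { MODEL_SYN_SENT, MODEL_SYN_RECV, MODEL_ESTABLISHED };

struct model_sock {
	enum model_state state;	/* stands in for sk->sk_state */
	bool syn_fastopen;	/* stands in for tp->syn_fastopen */
	uint32_t rcv_nxt;	/* stands in for tp->rcv_nxt */
	uint32_t snd_nxt;	/* stands in for tp->snd_nxt */
};

struct model_seg {
	bool syn, ack;		/* stand in for th->syn and th->ack */
	uint32_t seq;		/* stands in for TCP_SKB_CB(skb)->seq */
	uint32_t end_seq;	/* assumed: seq + SYN/FIN flags + payload length */
	uint32_t ack_seq;	/* stands in for TCP_SKB_CB(skb)->ack_seq */
};

/* Returns true when the segment would take the patch's "goto pass" branch. */
static bool crossed_synack_passes(const struct model_sock *sk,
				  const struct model_seg *seg)
{
	assert(sk && seg);

	if (!seg->syn)
		return false;	/* the new check sits inside if (th->syn) */

	return sk->state == MODEL_SYN_RECV && !sk->syn_fastopen && seg->ack &&
	       (uint32_t)(seg->seq + 1) == seg->end_seq &&	/* SYN only, no payload */
	       (uint32_t)(seg->seq + 1) == sk->rcv_nxt &&	/* same SYN we already saw */
	       seg->ack_seq == sk->snd_nxt;			/* acknowledges our SYN */
}
```

With the numbers from the RFC example as seen by Peer A (rcv_nxt 301,
snd_nxt 101), a segment with seq 300, end_seq 301 and ack_seq 101 satisfies
the predicate, while the same segment carrying payload does not.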



* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  3:57 [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
@ 2024-07-04  8:01 ` Paolo Abeni
  2024-07-04  8:03   ` Eric Dumazet
  2024-07-04  8:44 ` Eric Dumazet
  2024-07-04 11:16 ` Paolo Abeni
  2 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2024-07-04  8:01 UTC
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	David Ahern
  Cc: Lawrence Brakmo, Kuniyuki Iwashima, netdev

On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> That said, this is no big deal because the Challenge ACK finally let the
> connection state transition to TCP_ESTABLISHED in both directions.  If the
> peer is not using Linux, there might be a small latency before ACK though.

I'm curious to learn in which scenarios the peer is not running Linux:
out of sheer ignorance on my side, I thought simult-connect was
possible - or at least made sense - only on loopback.

Thanks,

Paolo



* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  8:01 ` Paolo Abeni
@ 2024-07-04  8:03   ` Eric Dumazet
  2024-07-04  8:14     ` Paolo Abeni
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-07-04  8:03 UTC
  To: Paolo Abeni
  Cc: Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, David Ahern,
	Lawrence Brakmo, Kuniyuki Iwashima, netdev

On Thu, Jul 4, 2024 at 10:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> > That said, this is no big deal because the Challenge ACK finally let the
> > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > peer is not using Linux, there might be a small latency before ACK though.
>
> I'm curious to learn in which scenarios the peer is not running Linux:
> out of sheer ignorance on my side I thought simult-connect was only
> possible - or at least made any sense - only on loopback.

This is the case in the scenario used in the packetdrill test included
in this changelog,
but in general simultaneous connect() can be attempted from two different hosts.
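
Editor's sketch (not from the thread): the crossed handshake between two
hosts can be modelled as a tiny state machine that follows the RFC 9293
figure quoted above.  The enum and function names are hypothetical.
Sequence-number validation is assumed to succeed and is not modelled, and a
plain ACK is treated as sufficient to leave SYN-RECEIVED, which is the path
the Challenge ACK takes without the patch.

```c
#include <stdbool.h>

/* Hypothetical names; only the states needed for the RFC figure. */
enum peer_state { ST_CLOSED, ST_SYN_SENT, ST_SYN_RECEIVED, ST_ESTABLISHED };
enum seg_kind { SEG_SYN, SEG_SYNACK, SEG_ACK };

/* Active open, i.e. connect(): CLOSED -> SYN-SENT. */
static enum peer_state active_open(enum peer_state s)
{
	return s == ST_CLOSED ? ST_SYN_SENT : s;
}

/* State after receiving one acceptable segment of the given kind. */
static enum peer_state on_segment(enum peer_state s, enum seg_kind k)
{
	switch (s) {
	case ST_SYN_SENT:
		if (k == SEG_SYN)
			return ST_SYN_RECEIVED;	/* crossed SYN */
		if (k == SEG_SYNACK)
			return ST_ESTABLISHED;	/* ordinary client path */
		return s;
	case ST_SYN_RECEIVED:
		if (k == SEG_SYNACK || k == SEG_ACK)
			return ST_ESTABLISHED;
		return s;
	default:
		return s;
	}
}

/* Replays the RFC figure; true if both peers end up ESTABLISHED. */
static bool simultaneous_open(enum peer_state *a, enum peer_state *b)
{
	*a = active_open(ST_CLOSED);		/* line 2 of the figure */
	*b = active_open(ST_CLOSED);
	*a = on_segment(*a, SEG_SYN);		/* line 3: A sees B's SYN */
	*b = on_segment(*b, SEG_SYN);		/* line 4: B sees A's SYN */
	*a = on_segment(*a, SEG_SYNACK);	/* line 6 */
	*b = on_segment(*b, SEG_SYNACK);	/* line 7 */
	return *a == ST_ESTABLISHED && *b == ST_ESTABLISHED;
}
```

Nothing in the model depends on both peers living on one host; only the
order in which the SYNs cross matters.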


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  8:03   ` Eric Dumazet
@ 2024-07-04  8:14     ` Paolo Abeni
  2024-07-04  8:30       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2024-07-04  8:14 UTC
  To: Eric Dumazet
  Cc: Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, David Ahern,
	Lawrence Brakmo, Kuniyuki Iwashima, netdev

On Thu, 2024-07-04 at 10:03 +0200, Eric Dumazet wrote:
> On Thu, Jul 4, 2024 at 10:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> > > That said, this is no big deal because the Challenge ACK finally let the
> > > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > > peer is not using Linux, there might be a small latency before ACK though.
> > 
> > I'm curious to learn in which scenarios the peer is not running Linux:
> > out of sheer ignorance on my side I thought simult-connect was only
> > possible - or at least made any sense - only on loopback.
> 
> This is the case in the scenario used in the packetdrill test included
> in this changelog,
> but in general simultaneous connect() can be attempted from two different hosts.

I understand that. I also thought such a thing belonged to the protocol's
edge cases that nobody would dare to really use. Why do that instead of
a more usual client-server connection?

Thanks,

Paolo



* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  8:14     ` Paolo Abeni
@ 2024-07-04  8:30       ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-07-04  8:30 UTC
  To: Paolo Abeni
  Cc: Kuniyuki Iwashima, David S. Miller, Jakub Kicinski, David Ahern,
	Lawrence Brakmo, Kuniyuki Iwashima, netdev

On Thu, Jul 4, 2024 at 10:14 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2024-07-04 at 10:03 +0200, Eric Dumazet wrote:
> > On Thu, Jul 4, 2024 at 10:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> > > > That said, this is no big deal because the Challenge ACK finally let the
> > > > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > > > peer is not using Linux, there might be a small latency before ACK though.
> > >
> > > I'm curious to learn in which scenarios the peer is not running Linux:
> > > out of sheer ignorance on my side I thought simult-connect was only
> > > possible - or at least made any sense - only on loopback.
> >
> > This is the case in the scenario used in the packetdrill test included
> > in this changelog,
> > but in general simultaneous connect() can be attempted from two different hosts.
>
> I understand that. I also thought such thing belonged to protocol's
> edge cases nobody would dare to really use. Why doing that instead of
> more usual client-server connection?

A long time ago, I heard that this could be used to establish p2p TCP flows
from hosts behind NAT and some firewalls.

Presumably Kuniyuki's company/customers rely on such RFC-compliant behavior.

Also, fuzzers definitely want to stress this part of our stack, and
this is a lot of fun.


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  3:57 [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
  2024-07-04  8:01 ` Paolo Abeni
@ 2024-07-04  8:44 ` Eric Dumazet
  2024-07-04 17:36   ` Kuniyuki Iwashima
  2024-07-04 11:16 ` Paolo Abeni
  2 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-07-04  8:44 UTC
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
	Lawrence Brakmo, Kuniyuki Iwashima, netdev

On Thu, Jul 4, 2024 at 5:57 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> That said, this is no big deal because the Challenge ACK finally let the
> connection state transition to TCP_ESTABLISHED in both directions.  If the
> peer is not using Linux, there might be a small latency before ACK though.

I suggest removing these 3 lines. Removing an unneeded challenge ACK is good
regardless of the 'other peer' behavior.

>
> The problem is that bpf_skops_established() is triggered by the Challenge
> ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> check if the peer supports a TCP option that is expected to be exchanged
> in SYN and SYN+ACK.
>
> Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> a situation.
>
> Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
> Fixes: 9872a4bde31b ("bpf: Add TCP connection BPF callbacks")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/ipv4/tcp_input.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 77294fd5fd3e..70595009bb58 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5980,6 +5980,11 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
>          * RFC 5961 4.2 : Send a challenge ack
>          */
>         if (th->syn) {
> +               if (sk->sk_state == TCP_SYN_RECV && !tp->syn_fastopen && th->ack &&
> +                   TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> +                   TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> +                   TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> +                       goto pass;
>  syn_challenge:
>                 if (syn_inerr)
>                         TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> @@ -5990,7 +5995,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
>         }
>
>         bpf_skops_parse_hdr(sk, skb);
> -
> +pass:

It is not clear to me why we do not call bpf_skops_parse_hdr(sk, skb)
in this case?


>         return true;
>
>  discard:
> --
> 2.30.2
>


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  3:57 [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
  2024-07-04  8:01 ` Paolo Abeni
  2024-07-04  8:44 ` Eric Dumazet
@ 2024-07-04 11:16 ` Paolo Abeni
  2024-07-04 12:23   ` Eric Dumazet
  2 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2024-07-04 11:16 UTC
  To: Kuniyuki Iwashima
  Cc: Lawrence Brakmo, Kuniyuki Iwashima, netdev, David S. Miller,
	Eric Dumazet, Jakub Kicinski, David Ahern

On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> That said, this is no big deal because the Challenge ACK finally let the
> connection state transition to TCP_ESTABLISHED in both directions.  If the
> peer is not using Linux, there might be a small latency before ACK though.
> 
> The problem is that bpf_skops_established() is triggered by the Challenge
> ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> check if the peer supports a TCP option that is expected to be exchanged
> in SYN and SYN+ACK.
> 
> Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> a situation.

Apparently this behavior change is causing TCP AO self-test failures:

https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2024-07-04--09-00
e.g.
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/668061/22-self-connect-ipv4/stdout

Could you please have a look?

Thanks!

Paolo



* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04 11:16 ` Paolo Abeni
@ 2024-07-04 12:23   ` Eric Dumazet
  2024-07-04 17:42     ` Kuniyuki Iwashima
  2024-07-04 20:47     ` Dmitry Safonov
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Dumazet @ 2024-07-04 12:23 UTC
  To: Paolo Abeni, Dmitry Safonov
  Cc: Kuniyuki Iwashima, Lawrence Brakmo, Kuniyuki Iwashima, netdev,
	David S. Miller, Jakub Kicinski, David Ahern

On Thu, Jul 4, 2024 at 1:16 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> > That said, this is no big deal because the Challenge ACK finally let the
> > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > peer is not using Linux, there might be a small latency before ACK though.
> >
> > The problem is that bpf_skops_established() is triggered by the Challenge
> > ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> > check if the peer supports a TCP option that is expected to be exchanged
> > in SYN and SYN+ACK.
> >
> > Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> > a situation.
>
> Apparently this behavior change is causing TCP AO self-tests failures:
>
> https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2024-07-04--09-00
> e.g.
> https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/668061/22-self-connect-ipv4/stdout
>

These tests seem to rely on broken assumptions about kernel behavior that
are orthogonal to TCP AO.

> Could you please have a look?
>
> Thanks!
>
> Paolo
>


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04  8:44 ` Eric Dumazet
@ 2024-07-04 17:36   ` Kuniyuki Iwashima
  2024-07-04 19:01     ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-04 17:36 UTC
  To: edumazet; +Cc: brakmo, davem, dsahern, kuba, kuni1840, kuniyu, netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 4 Jul 2024 10:44:55 +0200
> On Thu, Jul 4, 2024 at 5:57 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > That said, this is no big deal because the Challenge ACK finally let the
> > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > peer is not using Linux, there might be a small latency before ACK though.
> 
> I suggest removing these 3 lines. Removing a not needed challenge ACK is good
> regardless of the 'other peer' behavior.

I see, then should Fixes point to 0c24604b68fc?

Also, I noticed it still sends an ACK in tcp_ack_snd_check() as if it were a
response to the normal 3WHS, so we need:

---8<---
@@ -6788,6 +6793,9 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		tcp_fast_path_on(tp);
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
 			tcp_shutdown(sk, SEND_SHUTDOWN);
+
+		if (!req)
+			goto consume;
 		break;
 
 	case TCP_FIN_WAIT1: {
---8<---

I also have a question regarding the consume: label.  Why do we use
__kfree_skb() there instead of consume_skb()?  I guess it's because
skb_unref() is unnecessary and expensive, and tracing is also expensive?


> 
> >
> > The problem is that bpf_skops_established() is triggered by the Challenge
> > ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> > check if the peer supports a TCP option that is expected to be exchanged
> > in SYN and SYN+ACK.
> >
> > Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> > a situation.
> >
> > Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
> > Fixes: 9872a4bde31b ("bpf: Add TCP connection BPF callbacks")
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  net/ipv4/tcp_input.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 77294fd5fd3e..70595009bb58 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -5980,6 +5980,11 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> >          * RFC 5961 4.2 : Send a challenge ack
> >          */
> >         if (th->syn) {
> > +               if (sk->sk_state == TCP_SYN_RECV && !tp->syn_fastopen && th->ack &&
> > +                   TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> > +                   TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> > +                   TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> > +                       goto pass;
> >  syn_challenge:
> >                 if (syn_inerr)
> >                         TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > @@ -5990,7 +5995,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> >         }
> >
> >         bpf_skops_parse_hdr(sk, skb);
> > -
> > +pass:
> 
> It is not clear to me why we do not call bpf_skops_parse_hdr(sk, skb)
> in this case ?

I skipped bpf_skops_parse_hdr() as it had this check.

        switch (sk->sk_state) {
        case TCP_SYN_RECV:
        case TCP_SYN_SENT:
        case TCP_LISTEN:
                return;
        }
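
For readers following along, here is a minimal user-space sketch of that
check, assuming nothing beyond the switch quoted above.  The enum and both
function names are hypothetical stand-ins and do not exist in the kernel;
the real bpf_skops_parse_hdr() does more than this.  It only illustrates
that the hook returns early in TCP_SYN_RECV, which is the state the socket
is in when the crossed SYN+ACK arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's TCP states (not the real values). */
enum model_state {
	MODEL_ESTABLISHED,
	MODEL_SYN_SENT,
	MODEL_SYN_RECV,
	MODEL_LISTEN,
};

/* Returns true when the state check quoted above would let the hook proceed. */
static bool model_parse_hdr_proceeds(enum model_state state)
{
	switch (state) {
	case MODEL_SYN_RECV:
	case MODEL_SYN_SENT:
	case MODEL_LISTEN:
		return false;	/* early return: nothing is parsed */
	default:
		return true;
	}
}

/* The crossed SYN+ACK arrives in SYN_RECV, so the hook does nothing there. */
static void model_check_syn_recv_is_noop(void)
{
	assert(!model_parse_hdr_proceeds(MODEL_SYN_RECV));
	assert(model_parse_hdr_proceeds(MODEL_ESTABLISHED));
}
```

Under this model, jumping past the hook and calling it give the same result
for a TCP_SYN_RECV socket.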

Thanks!

> 
> 
> >         return true;
> >
> >  discard:
> > --
> > 2.30.2


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04 12:23   ` Eric Dumazet
@ 2024-07-04 17:42     ` Kuniyuki Iwashima
  2024-07-04 20:47     ` Dmitry Safonov
  1 sibling, 0 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-04 17:42 UTC (permalink / raw)
  To: edumazet
  Cc: 0x7f454c46, brakmo, davem, dsahern, kuba, kuni1840, kuniyu,
	netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 4 Jul 2024 14:23:11 +0200
> On Thu, Jul 4, 2024 at 1:16 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Wed, 2024-07-03 at 20:57 -0700, Kuniyuki Iwashima wrote:
> > > RFC 9293 states that in the case of simultaneous connect(), the connection
> > > gets established when SYN+ACK is received. [0]
> > >
> > >       TCP Peer A                                       TCP Peer B
> > >
> > >   1.  CLOSED                                           CLOSED
> > >   2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
> > >   3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
> > >   4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
> > >   5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
> > >   6.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
> > >   7.               ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
> > >
> > > However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
> > > SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
> > > ACK.
> > >
> > > For example, the write() syscall in the following packetdrill script fails
> > > with -EAGAIN, and wrong SNMP stats get incremented.
> > >
> > >    0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
> > >   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
> > >
> > >   +0 > S  0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
> > >   +0 < S  0:0(0) win 1000 <mss 1000>
> > >   +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
> > >   +0 < S. 0:0(0) ack 1 win 1000
> > >
> > >   +0 write(3, ..., 100) = 100
> > >   +0 > P. 1:101(100) ack 1
> > >
> > >   --
> > >
> > >   # packetdrill cross-synack.pkt
> > >   cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
> > >   # nstat
> > >   ...
> > >   TcpExtTCPChallengeACK           1                  0.0
> > >   TcpExtTCPSYNChallenge           1                  0.0
> > >
> > > That said, this is no big deal because the Challenge ACK finally let the
> > > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > > peer is not using Linux, there might be a small latency before ACK though.
> > >
> > > The problem is that bpf_skops_established() is triggered by the Challenge
> > > ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> > > check if the peer supports a TCP option that is expected to be exchanged
> > > in SYN and SYN+ACK.
> > >
> > > Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> > > a situation.
> >
> > Apparently this behavior change is causing TCP AO self-tests failures:
> >
> > https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2024-07-04--09-00
> > e.g.
> > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/668061/22-self-connect-ipv4/stdout
> >
> 
> These tests seem to have broken assumptions on a kernel behavior which
> are orthogonal to TCP AO.

Seems so...

> 
> > Could you please have a look?

Sure :)

Thanks!


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04 17:36   ` Kuniyuki Iwashima
@ 2024-07-04 19:01     ` Eric Dumazet
  2024-07-04 19:50       ` Kuniyuki Iwashima
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2024-07-04 19:01 UTC (permalink / raw)
  To: Kuniyuki Iwashima; +Cc: brakmo, davem, dsahern, kuba, kuni1840, netdev, pabeni

On Thu, Jul 4, 2024 at 7:36 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Thu, 4 Jul 2024 10:44:55 +0200
> > On Thu, Jul 4, 2024 at 5:57 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > RFC 9293 states that in the case of simultaneous connect(), the connection
> > > gets established when SYN+ACK is received. [0]
> > >
> > >       TCP Peer A                                       TCP Peer B
> > >
> > >   1.  CLOSED                                           CLOSED
> > >   2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
> > >   3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
> > >   4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
> > >   5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
> > >   6.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
> > >   7.               ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
> > >
> > > However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
> > > SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
> > > ACK.
> > >
> > > For example, the write() syscall in the following packetdrill script fails
> > > with -EAGAIN, and wrong SNMP stats get incremented.
> > >
> > >    0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
> > >   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
> > >
> > >   +0 > S  0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
> > >   +0 < S  0:0(0) win 1000 <mss 1000>
> > >   +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
> > >   +0 < S. 0:0(0) ack 1 win 1000
> > >
> > >   +0 write(3, ..., 100) = 100
> > >   +0 > P. 1:101(100) ack 1
> > >
> > >   --
> > >
> > >   # packetdrill cross-synack.pkt
> > >   cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
> > >   # nstat
> > >   ...
> > >   TcpExtTCPChallengeACK           1                  0.0
> > >   TcpExtTCPSYNChallenge           1                  0.0
> > >
> > > That said, this is no big deal because the Challenge ACK finally let the
> > > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > > peer is not using Linux, there might be a small latency before ACK though.
> >
> > I suggest removing these 3 lines. Removing a not needed challenge ACK is good
> > regardless of the 'other peer' behavior.
>
> I see, then should Fixes point to 0c24604b68fc ?

I would target net-next, unless you have a very convincing reason.

The bug might only be exposed by eBPF users, right ?



>
> Also I noticed it still sends ACK in tcp_ack_snd_check() as if it's a
> response to the normal 3WHS, so we need:
>
> ---8<---
> @@ -6788,6 +6793,9 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
>                 tcp_fast_path_on(tp);
>                 if (sk->sk_shutdown & SEND_SHUTDOWN)
>                         tcp_shutdown(sk, SEND_SHUTDOWN);
> +
> +               if (!req)
> +                       goto consume;

I guess this is becoming a bit risky for net tree ?

Given tcp cross syn is mostly used by fuzzers, I would advise doing
something very minimal.

>                 break;
>
>         case TCP_FIN_WAIT1: {
> ---8<---
>
> and I have a question regarding the consume: label.  Why do we use
> __kfree_skb() there instead of consume_skb() ?  I guess it's because
> skb_unref() is unnecessary and expensive and tracing is also expensive ?

For the same reason we do __kfree_skb()  in other places.

This predates consume_skb().

>
>
> >
> > >
> > > The problem is that bpf_skops_established() is triggered by the Challenge
> > > ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> > > check if the peer supports a TCP option that is expected to be exchanged
> > > in SYN and SYN+ACK.
> > >
> > > Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> > > a situation.
> > >
> > > Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
> > > Fixes: 9872a4bde31b ("bpf: Add TCP connection BPF callbacks")
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > >  net/ipv4/tcp_input.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 77294fd5fd3e..70595009bb58 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -5980,6 +5980,11 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> > >          * RFC 5961 4.2 : Send a challenge ack
> > >          */
> > >         if (th->syn) {
> > > +               if (sk->sk_state == TCP_SYN_RECV && !tp->syn_fastopen && th->ack &&
> > > +                   TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> > > +                   TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> > > +                   TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> > > +                       goto pass;
> > >  syn_challenge:
> > >                 if (syn_inerr)
> > >                         TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > > @@ -5990,7 +5995,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> > >         }
> > >
> > >         bpf_skops_parse_hdr(sk, skb);
> > > -
> > > +pass:
> >
> > It is not clear to me why we do not call bpf_skops_parse_hdr(sk, skb)
> > in this case ?
>
> I skipped bpf_skops_parse_hdr() as it had this check.
>
>         switch (sk->sk_state) {
>         case TCP_SYN_RECV:
>         case TCP_SYN_SENT:
>         case TCP_LISTEN:
>                 return;
>         }

I think I prefer these checks being clearly centralized there, instead
of trying to duplicate them earlier.

This is slow path anyway.

I am a bit like Paolo : why do we even care, adding more fuel for fuzzers...


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04 19:01     ` Eric Dumazet
@ 2024-07-04 19:50       ` Kuniyuki Iwashima
  0 siblings, 0 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2024-07-04 19:50 UTC (permalink / raw)
  To: edumazet; +Cc: brakmo, davem, dsahern, kuba, kuni1840, kuniyu, netdev, pabeni

From: Eric Dumazet <edumazet@google.com>
Date: Thu, 4 Jul 2024 21:01:52 +0200
> On Thu, Jul 4, 2024 at 7:36 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Eric Dumazet <edumazet@google.com>
> > Date: Thu, 4 Jul 2024 10:44:55 +0200
> > > On Thu, Jul 4, 2024 at 5:57 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > RFC 9293 states that in the case of simultaneous connect(), the connection
> > > > gets established when SYN+ACK is received. [0]
> > > >
> > > >       TCP Peer A                                       TCP Peer B
> > > >
> > > >   1.  CLOSED                                           CLOSED
> > > >   2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
> > > >   3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
> > > >   4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
> > > >   5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
> > > >   6.  ESTABLISHED  <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
> > > >   7.               ... <SEQ=100><ACK=301><CTL=SYN,ACK> --> ESTABLISHED
> > > >
> > > > However, since commit 0c24604b68fc ("tcp: implement RFC 5961 4.2"), such a
> > > > SYN+ACK is dropped in tcp_validate_incoming() and responded with Challenge
> > > > ACK.
> > > >
> > > > For example, the write() syscall in the following packetdrill script fails
> > > > with -EAGAIN, and wrong SNMP stats get incremented.
> > > >
> > > >    0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_TCP) = 3
> > > >   +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
> > > >
> > > >   +0 > S  0:0(0) <mss 1460,sackOK,TS val 1000 ecr 0,nop,wscale 8>
> > > >   +0 < S  0:0(0) win 1000 <mss 1000>
> > > >   +0 > S. 0:0(0) ack 1 <mss 1460,sackOK,TS val 3308134035 ecr 0,nop,wscale 8>
> > > >   +0 < S. 0:0(0) ack 1 win 1000
> > > >
> > > >   +0 write(3, ..., 100) = 100
> > > >   +0 > P. 1:101(100) ack 1
> > > >
> > > >   --
> > > >
> > > >   # packetdrill cross-synack.pkt
> > > >   cross-synack.pkt:13: runtime error in write call: Expected result 100 but got -1 with errno 11 (Resource temporarily unavailable)
> > > >   # nstat
> > > >   ...
> > > >   TcpExtTCPChallengeACK           1                  0.0
> > > >   TcpExtTCPSYNChallenge           1                  0.0
> > > >
> > > > That said, this is no big deal because the Challenge ACK finally let the
> > > > connection state transition to TCP_ESTABLISHED in both directions.  If the
> > > > peer is not using Linux, there might be a small latency before ACK though.
> > >
> > > I suggest removing these 3 lines. Removing a not needed challenge ACK is good
> > > regardless of the 'other peer' behavior.
> >
> > I see, then should Fixes point to 0c24604b68fc ?
> 
> I would target net-next, unless you have a very convincing reason.
> 
> The bug might only be exposed by eBPF users, right ?

Yes, and I'm ok with net-next.


> 
> >
> > Also I noticed it still sends ACK in tcp_ack_snd_check() as if it's a
> > response to the normal 3WHS, so we need:
> >
> > ---8<---
> > @@ -6788,6 +6793,9 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
> >                 tcp_fast_path_on(tp);
> >                 if (sk->sk_shutdown & SEND_SHUTDOWN)
> >                         tcp_shutdown(sk, SEND_SHUTDOWN);
> > +
> > +               if (!req)
> > +                       goto consume;
> 
> I guess this is becoming a bit risky for net tree ?
> 
> Given tcp cross syn is mostly used by fuzzers, I would advise doing
> something very minimal.

Does this still apply if I target net-next?  I'm a bit confused, given
that removing an unnecessary ACK is good.


> 
> >                 break;
> >
> >         case TCP_FIN_WAIT1: {
> > ---8<---
> >
> > and I have a question regarding the consume: label.  Why do we use
> > __kfree_skb() there instead of consume_skb() ?  I guess it's because
> > skb_unref() is unnecessary and expensive and tracing is also expensive ?
> 
> For the same reason we do __kfree_skb()  in other places.
> 
> This predates consume_skb().

That makes sense.


> > >
> > > >
> > > > The problem is that bpf_skops_established() is triggered by the Challenge
> > > > ACK instead of SYN+ACK.  This causes the bpf prog to miss the chance to
> > > > check if the peer supports a TCP option that is expected to be exchanged
> > > > in SYN and SYN+ACK.
> > > >
> > > > Let's accept a bare SYN+ACK for non-TFO TCP_SYN_RECV sockets to avoid such
> > > > a situation.
> > > >
> > > > Link: https://www.rfc-editor.org/rfc/rfc9293.html#section-3.5-7 [0]
> > > > Fixes: 9872a4bde31b ("bpf: Add TCP connection BPF callbacks")
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > ---
> > > >  net/ipv4/tcp_input.c | 7 ++++++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > > index 77294fd5fd3e..70595009bb58 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -5980,6 +5980,11 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> > > >          * RFC 5961 4.2 : Send a challenge ack
> > > >          */
> > > >         if (th->syn) {
> > > > +               if (sk->sk_state == TCP_SYN_RECV && !tp->syn_fastopen && th->ack &&
> > > > +                   TCP_SKB_CB(skb)->seq + 1 == TCP_SKB_CB(skb)->end_seq &&
> > > > +                   TCP_SKB_CB(skb)->seq + 1 == tp->rcv_nxt &&
> > > > +                   TCP_SKB_CB(skb)->ack_seq == tp->snd_nxt)
> > > > +                       goto pass;
> > > >  syn_challenge:
> > > >                 if (syn_inerr)
> > > >                         TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
> > > > @@ -5990,7 +5995,7 @@ static bool tcp_validate_incoming(struct sock *sk, struct sk_buff *skb,
> > > >         }
> > > >
> > > >         bpf_skops_parse_hdr(sk, skb);
> > > > -
> > > > +pass:
> > >
> > > It is not clear to me why we do not call bpf_skops_parse_hdr(sk, skb)
> > > in this case ?
> >
> > I skipped bpf_skops_parse_hdr() as it had this check.
> >
> >         switch (sk->sk_state) {
> >         case TCP_SYN_RECV:
> >         case TCP_SYN_SENT:
> >         case TCP_LISTEN:
> >                 return;
> >         }
> 
> I think I prefer these checks being clearly centralized there, instead
> of trying to duplicate them earlier.
> 
> This is slow path anyway.

Exactly, I'll move the label before bpf_skops_parse_hdr().
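
A rough user-space sketch of that control flow, under stated assumptions:
the acceptance condition is copied from the v1 hunk quoted above, the
label sits before the hook, and every model_* name is a hypothetical
stand-in rather than kernel code.  The challenge-ACK path is reduced to a
return value, and the hook only counts its calls.  The actual follow-up
patch may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
enum model_state { MODEL_ESTABLISHED, MODEL_SYN_SENT, MODEL_SYN_RECV };
enum model_verdict { MODEL_PASS, MODEL_CHALLENGE_ACK };

struct model_sock {
	enum model_state state;
	bool syn_fastopen;	/* models tp->syn_fastopen */
	uint32_t rcv_nxt;	/* models tp->rcv_nxt */
	uint32_t snd_nxt;	/* models tp->snd_nxt */
	int parse_hdr_calls;	/* how often the hook was entered */
};

struct model_seg {
	bool syn, ack;
	uint32_t seq, end_seq, ack_seq;	/* model the TCP_SKB_CB(skb) fields */
};

/* Stand-in for the hook: it keeps its own state check and only counts calls. */
static void model_parse_hdr(struct model_sock *sk)
{
	sk->parse_hdr_calls++;
	if (sk->state == MODEL_SYN_RECV || sk->state == MODEL_SYN_SENT)
		return;		/* the centralized early return */
	/* A real hook would run the BPF program here. */
}

static enum model_verdict model_validate_incoming(struct model_sock *sk,
						  const struct model_seg *seg)
{
	if (seg->syn) {
		/* v1 condition: bare, in-sequence SYN+ACK on a non-TFO
		 * socket in SYN_RECV.
		 */
		if (sk->state == MODEL_SYN_RECV && !sk->syn_fastopen && seg->ack &&
		    seg->seq + 1 == seg->end_seq &&
		    seg->seq + 1 == sk->rcv_nxt &&
		    seg->ack_seq == sk->snd_nxt)
			goto pass;
		return MODEL_CHALLENGE_ACK;	/* every other SYN is challenged */
	}
pass:
	/* Label placed before the hook, so the state check lives in one place. */
	model_parse_hdr(sk);
	return MODEL_PASS;
}

/* Replays step 6 of the RFC 9293 diagram quoted in the thread, from Peer A's side. */
static void model_check_rfc_example(void)
{
	struct model_sock a = { MODEL_SYN_RECV, false, 301, 101, 0 };
	struct model_seg synack = { true, true, 300, 301, 101 };

	assert(model_validate_incoming(&a, &synack) == MODEL_PASS);
	assert(a.parse_hdr_calls == 1);
}
```

With Peer A having sent SEQ=100 and received SEQ=300, the SYN+ACK carrying
SEQ=300 and ACK=101 satisfies all three sequence comparisons, so it passes
instead of triggering a Challenge ACK.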


> 
> I am a bit like Paolo : why do we even care, adding more fuel for fuzzers...

...for some BPF users wanting to cover all possible cases.. :S


* Re: [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect().
  2024-07-04 12:23   ` Eric Dumazet
  2024-07-04 17:42     ` Kuniyuki Iwashima
@ 2024-07-04 20:47     ` Dmitry Safonov
  1 sibling, 0 replies; 13+ messages in thread
From: Dmitry Safonov @ 2024-07-04 20:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, Kuniyuki Iwashima, Lawrence Brakmo,
	Kuniyuki Iwashima, netdev, David S. Miller, Jakub Kicinski,
	David Ahern

On Thu, 4 Jul 2024 at 13:23, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Jul 4, 2024 at 1:16 PM Paolo Abeni <pabeni@redhat.com> wrote:
[..]
> >
> > Apparently this behavior change is causing TCP AO self-tests failures:
> >
> > https://netdev.bots.linux.dev/contest.html?pw-n=0&branch=net-next-2024-07-04--09-00
> > e.g.
> > https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/668061/22-self-connect-ipv4/stdout
> >
>
> These tests seem to have broken assumptions on a kernel behavior which
> are orthogonal to TCP AO.

Yes, my intention here was to verify that the test exercises the
simultaneous-connect code path.  That is pretty much guaranteed in the
TCP self-connect case, I guess, but experience says that anything may
change over time in unexpected ways (aka paranoia), so I thought it
reasonable to check the TCPChallengeACK & TCPSYNChallenge counters to
verify.  It seems those checks can simply be dropped.

Thanks,
             Dmitry


end of thread, other threads:[~2024-07-04 20:47 UTC | newest]

Thread overview: 13+ messages
2024-07-04  3:57 [PATCH v1 net] tcp: Don't drop SYN+ACK for simultaneous connect() Kuniyuki Iwashima
2024-07-04  8:01 ` Paolo Abeni
2024-07-04  8:03   ` Eric Dumazet
2024-07-04  8:14     ` Paolo Abeni
2024-07-04  8:30       ` Eric Dumazet
2024-07-04  8:44 ` Eric Dumazet
2024-07-04 17:36   ` Kuniyuki Iwashima
2024-07-04 19:01     ` Eric Dumazet
2024-07-04 19:50       ` Kuniyuki Iwashima
2024-07-04 11:16 ` Paolo Abeni
2024-07-04 12:23   ` Eric Dumazet
2024-07-04 17:42     ` Kuniyuki Iwashima
2024-07-04 20:47     ` Dmitry Safonov
