netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV
@ 2024-04-07  9:33 Eric Dumazet
  2024-04-07  9:33 ` [PATCH net-next 1/2] tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Eric Dumazet @ 2024-04-07  9:33 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Neal Cardwell, netdev, eric.dumazet, Eric Dumazet

TCP can transform a TIMEWAIT socket into a SYN_RECV one from
a SYN packet, and the ISN of the SYNACK packet is normally
generated using TIMEWAIT tw_snd_nxt.

This SYN packet also bypasses normal checks against listen queue
being full or not.

Unfortunately this has been broken almost one decade ago.

This series fixes the issue, in two patches.

First patch refactors code to add tcp_tw_isn as a parameter
to ->route_req(), to make the second patch smaller.

Second patch fixes the issue, by no longer using TCP_SKB_CB(skb)
to store the tcp_tw_isn.

Following packetdrill test passes after this series:

// Set up a server listening socket.
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

// Establish connection
   +0 < S 0:0(0) win 32792 <mss 1460,nop,nop,sackOK>
   +0 > S. 0:0(0) ack 1    <mss 1460,nop,nop,sackOK>
 +.01 < . 1:1(0) ack 1 win 32792

   +0 accept(3, ..., ...) = 4

// We close(), send a FIN, and get an ACK and FIN, in order to get into TIME_WAIT.

 +.01 close(4) = 0
   +0 > F. 1:1(0) ack 1
 +.01 < F. 1:1(0) ack 2 win 32792
   +0 > . 2:2(0) ack 2

// SYN hitting a TIME_WAIT -> should use an ISN based on TIMEWAIT tw_snd_nxt

 +.01 < S 1000:1000(0) win 65535 <mss 1460,nop,nop,sackOK>
   +0 > S. 65539:65539(0) ack 1001 <mss 1460,nop,nop,sackOK>


Eric Dumazet (2):
  tcp: propagate tcp_tw_isn via an extra parameter to ->route_req()
  tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field

 include/net/tcp.h        | 13 +++++++------
 net/ipv4/tcp.c           |  3 +++
 net/ipv4/tcp_input.c     | 28 +++++++++++++++++-----------
 net/ipv4/tcp_ipv4.c      |  8 +++++---
 net/ipv4/tcp_minisocks.c |  4 ++--
 net/ipv6/tcp_ipv6.c      | 15 +++++++++------
 net/mptcp/subflow.c      | 10 ++++++----
 7 files changed, 49 insertions(+), 32 deletions(-)

-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next 1/2] tcp: propagate tcp_tw_isn via an extra parameter to ->route_req()
  2024-04-07  9:33 [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV Eric Dumazet
@ 2024-04-07  9:33 ` Eric Dumazet
  2024-04-07  9:33 ` [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field Eric Dumazet
  2024-04-09 11:20 ` [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2024-04-07  9:33 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Neal Cardwell, netdev, eric.dumazet, Eric Dumazet

tcp_v6_init_req() reads TCP_SKB_CB(skb)->tcp_tw_isn to find
out if the request socket is created by a SYN hitting a TIMEWAIT socket.

This has been buggy for a decade, lets directly pass the information
from tcp_conn_request().

This is a preparatory patch to make the following one easier to review.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h    |  3 ++-
 net/ipv4/tcp_input.c |  2 +-
 net/ipv4/tcp_ipv4.c  |  3 ++-
 net/ipv6/tcp_ipv6.c  | 10 ++++++----
 net/mptcp/subflow.c  | 10 ++++++----
 5 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9ab5b37e9d532cdf26dd423810b07912ef4abd75..fa0ab77acee23654b22e97615de983fc04eee319 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2284,7 +2284,8 @@ struct tcp_request_sock_ops {
 	struct dst_entry *(*route_req)(const struct sock *sk,
 				       struct sk_buff *skb,
 				       struct flowi *fl,
-				       struct request_sock *req);
+				       struct request_sock *req,
+				       u32 tw_isn);
 	u32 (*init_seq)(const struct sk_buff *skb);
 	u32 (*init_ts_off)(const struct net *net, const struct sk_buff *skb);
 	int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1f28a2561795cf48ee7dbf638c15c773c8b8c84c..48c275e6ef02bfc5dd98f0878c752841f949c714 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7160,7 +7160,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	/* Note: tcp_v6_init_req() might override ir_iif for link locals */
 	inet_rsk(req)->ir_iif = inet_request_bound_dev_if(sk, skb);
 
-	dst = af_ops->route_req(sk, skb, &fl, req);
+	dst = af_ops->route_req(sk, skb, &fl, req, isn);
 	if (!dst)
 		goto drop_and_free;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 52963c3bb8ca7380692f7be6e15d687c45e8673a..81e2f05c244d1671980a34bb756f528f3e6debcc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1666,7 +1666,8 @@ static void tcp_v4_init_req(struct request_sock *req,
 static struct dst_entry *tcp_v4_route_req(const struct sock *sk,
 					  struct sk_buff *skb,
 					  struct flowi *fl,
-					  struct request_sock *req)
+					  struct request_sock *req,
+					  u32 tw_isn)
 {
 	tcp_v4_init_req(req, sk, skb);
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index cffebaec66f1ab60f1dde00b8bd8cc7a595bdc91..5141f7033abd8bb03bc4e162066ca4befe343bdc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -793,7 +793,8 @@ static int tcp_v6_md5_hash_skb(char *md5_hash,
 
 static void tcp_v6_init_req(struct request_sock *req,
 			    const struct sock *sk_listener,
-			    struct sk_buff *skb)
+			    struct sk_buff *skb,
+			    u32 tw_isn)
 {
 	bool l3_slave = ipv6_l3mdev_skb(TCP_SKB_CB(skb)->header.h6.flags);
 	struct inet_request_sock *ireq = inet_rsk(req);
@@ -807,7 +808,7 @@ static void tcp_v6_init_req(struct request_sock *req,
 	    ipv6_addr_type(&ireq->ir_v6_rmt_addr) & IPV6_ADDR_LINKLOCAL)
 		ireq->ir_iif = tcp_v6_iif(skb);
 
-	if (!TCP_SKB_CB(skb)->tcp_tw_isn &&
+	if (!tw_isn &&
 	    (ipv6_opt_accepted(sk_listener, skb, &TCP_SKB_CB(skb)->header.h6) ||
 	     np->rxopt.bits.rxinfo ||
 	     np->rxopt.bits.rxoinfo || np->rxopt.bits.rxhlim ||
@@ -820,9 +821,10 @@ static void tcp_v6_init_req(struct request_sock *req,
 static struct dst_entry *tcp_v6_route_req(const struct sock *sk,
 					  struct sk_buff *skb,
 					  struct flowi *fl,
-					  struct request_sock *req)
+					  struct request_sock *req,
+					  u32 tw_isn)
 {
-	tcp_v6_init_req(req, sk, skb);
+	tcp_v6_init_req(req, sk, skb, tw_isn);
 
 	if (security_inet_conn_request(sk, skb, req))
 		return NULL;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 6042a47da61be8bc3000ab485fe6fbb7bff387b6..294390a9cd431024b84a56feae9b9c895111393e 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -284,7 +284,8 @@ EXPORT_SYMBOL_GPL(mptcp_subflow_init_cookie_req);
 static struct dst_entry *subflow_v4_route_req(const struct sock *sk,
 					      struct sk_buff *skb,
 					      struct flowi *fl,
-					      struct request_sock *req)
+					      struct request_sock *req,
+					      u32 tw_isn)
 {
 	struct dst_entry *dst;
 	int err;
@@ -292,7 +293,7 @@ static struct dst_entry *subflow_v4_route_req(const struct sock *sk,
 	tcp_rsk(req)->is_mptcp = 1;
 	subflow_init_req(req, sk);
 
-	dst = tcp_request_sock_ipv4_ops.route_req(sk, skb, fl, req);
+	dst = tcp_request_sock_ipv4_ops.route_req(sk, skb, fl, req, tw_isn);
 	if (!dst)
 		return NULL;
 
@@ -351,7 +352,8 @@ static int subflow_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 static struct dst_entry *subflow_v6_route_req(const struct sock *sk,
 					      struct sk_buff *skb,
 					      struct flowi *fl,
-					      struct request_sock *req)
+					      struct request_sock *req,
+					      u32 tw_isn)
 {
 	struct dst_entry *dst;
 	int err;
@@ -359,7 +361,7 @@ static struct dst_entry *subflow_v6_route_req(const struct sock *sk,
 	tcp_rsk(req)->is_mptcp = 1;
 	subflow_init_req(req, sk);
 
-	dst = tcp_request_sock_ipv6_ops.route_req(sk, skb, fl, req);
+	dst = tcp_request_sock_ipv6_ops.route_req(sk, skb, fl, req, tw_isn);
 	if (!dst)
 		return NULL;
 
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field
  2024-04-07  9:33 [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV Eric Dumazet
  2024-04-07  9:33 ` [PATCH net-next 1/2] tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() Eric Dumazet
@ 2024-04-07  9:33 ` Eric Dumazet
  2024-04-09  9:58   ` Paolo Abeni
  2024-04-09 11:20 ` [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV patchwork-bot+netdevbpf
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2024-04-07  9:33 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Neal Cardwell, netdev, eric.dumazet, Eric Dumazet

TCP can transform a TIMEWAIT socket into a SYN_RECV one from
a SYN packet, and the ISN of the SYNACK packet is normally
generated using TIMEWAIT tw_snd_nxt :

tcp_timewait_state_process()
...
    u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
    if (isn == 0)
        isn++;
    TCP_SKB_CB(skb)->tcp_tw_isn = isn;
    return TCP_TW_SYN;

This SYN packet also bypasses normal checks against listen queue
being full or not.

tcp_conn_request()
...
       __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
...
        /* TW buckets are converted to open requests without
         * limitations, they conserve resources and peer is
         * evidently real one.
         */
        if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
                want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
                if (!want_cookie)
                        goto drop;
        }

This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb.

Unfortunately this field has been accidentally cleared
after the call to tcp_timewait_state_process() returning
TCP_TW_SYN.

Using a field in TCP_SKB_CB(skb) for a temporary state
is overkill.

Switch instead to a per-cpu variable.

As a bonus, we do not have to clear tcp_tw_isn in TCP receive
fast path.
It is temporarily set then cleared only in the TCP_TW_SYN dance.

Fixes: 4ad19de8774e ("net: tcp6: fix double call of tcp_v6_fill_cb()")
Fixes: eeea10b83a13 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h        | 10 +++++-----
 net/ipv4/tcp.c           |  3 +++
 net/ipv4/tcp_input.c     | 26 ++++++++++++++++----------
 net/ipv4/tcp_ipv4.c      |  5 +++--
 net/ipv4/tcp_minisocks.c |  4 ++--
 net/ipv6/tcp_ipv6.c      |  5 +++--
 6 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fa0ab77acee23654b22e97615de983fc04eee319..ba6c5ae86e228a1633feaf39b0f1e053c3832b08 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -52,6 +52,8 @@ extern struct inet_hashinfo tcp_hashinfo;
 DECLARE_PER_CPU(unsigned int, tcp_orphan_count);
 int tcp_orphan_count_sum(void);
 
+DECLARE_PER_CPU(u32, tcp_tw_isn);
+
 void tcp_time_wait(struct sock *sk, int state, int timeo);
 
 #define MAX_TCP_HEADER	L1_CACHE_ALIGN(128 + MAX_HEADER)
@@ -392,7 +394,8 @@ enum tcp_tw_status {
 
 enum tcp_tw_status tcp_timewait_state_process(struct inet_timewait_sock *tw,
 					      struct sk_buff *skb,
-					      const struct tcphdr *th);
+					      const struct tcphdr *th,
+					      u32 *tw_isn);
 struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 			   struct request_sock *req, bool fastopen,
 			   bool *lost_race);
@@ -935,13 +938,10 @@ struct tcp_skb_cb {
 	__u32		seq;		/* Starting sequence number	*/
 	__u32		end_seq;	/* SEQ + FIN + SYN + datalen	*/
 	union {
-		/* Note : tcp_tw_isn is used in input path only
-		 *	  (isn chosen by tcp_timewait_state_process())
-		 *
+		/* Note :
 		 * 	  tcp_gso_segs/size are used in write queue only,
 		 *	  cf tcp_skb_pcount()/tcp_skb_mss()
 		 */
-		__u32		tcp_tw_isn;
 		struct {
 			u16	tcp_gso_segs;
 			u16	tcp_gso_size;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 92ee60492314a1483cfbfa2f73d32fcad5632773..6cb5b9f74c94b72fec1d6a3cc8e6dc75bd61ba2f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -290,6 +290,9 @@ enum {
 DEFINE_PER_CPU(unsigned int, tcp_orphan_count);
 EXPORT_PER_CPU_SYMBOL_GPL(tcp_orphan_count);
 
+DEFINE_PER_CPU(u32, tcp_tw_isn);
+EXPORT_PER_CPU_SYMBOL_GPL(tcp_tw_isn);
+
 long sysctl_tcp_mem[3] __read_mostly;
 EXPORT_SYMBOL(sysctl_tcp_mem);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 48c275e6ef02bfc5dd98f0878c752841f949c714..5a45a0923a1f058cdc80255be0f76a71fd102d4d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -7097,7 +7097,6 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		     struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_fastopen_cookie foc = { .len = -1 };
-	__u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
 	struct tcp_options_received tmp_opt;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
@@ -7107,21 +7106,28 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	struct dst_entry *dst;
 	struct flowi fl;
 	u8 syncookies;
+	u32 isn;
 
 #ifdef CONFIG_TCP_AO
 	const struct tcp_ao_hdr *aoh;
 #endif
 
-	syncookies = READ_ONCE(net->ipv4.sysctl_tcp_syncookies);
+	isn = __this_cpu_read(tcp_tw_isn);
+	if (isn) {
+		/* TW buckets are converted to open requests without
+		 * limitations, they conserve resources and peer is
+		 * evidently real one.
+		 */
+		__this_cpu_write(tcp_tw_isn, 0);
+	} else {
+		syncookies = READ_ONCE(net->ipv4.sysctl_tcp_syncookies);
 
-	/* TW buckets are converted to open requests without
-	 * limitations, they conserve resources and peer is
-	 * evidently real one.
-	 */
-	if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
-		want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
-		if (!want_cookie)
-			goto drop;
+		if (syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) {
+			want_cookie = tcp_syn_flood_action(sk,
+							   rsk_ops->slab_name);
+			if (!want_cookie)
+				goto drop;
+		}
 	}
 
 	if (sk_acceptq_is_full(sk)) {
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 81e2f05c244d1671980a34bb756f528f3e6debcc..1e650ec71d2fe5198b9dad9e6ea9c5eaf868277f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2146,7 +2146,6 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph,
 				    skb->len - th->doff * 4);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
 	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
-	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
 	TCP_SKB_CB(skb)->sacked	 = 0;
 	TCP_SKB_CB(skb)->has_rxtstamp =
@@ -2168,6 +2167,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	bool refcounted;
 	struct sock *sk;
 	int ret;
+	u32 isn;
 
 	drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
 	if (skb->pkt_type != PACKET_HOST)
@@ -2383,7 +2383,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 		inet_twsk_put(inet_twsk(sk));
 		goto csum_error;
 	}
-	switch (tcp_timewait_state_process(inet_twsk(sk), skb, th)) {
+	switch (tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn)) {
 	case TCP_TW_SYN: {
 		struct sock *sk2 = inet_lookup_listener(net,
 							net->ipv4.tcp_death_row.hashinfo,
@@ -2397,6 +2397,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
 			sk = sk2;
 			tcp_v4_restore_cb(skb);
 			refcounted = false;
+			__this_cpu_write(tcp_tw_isn, isn);
 			goto process;
 		}
 	}
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 5b21a07ddf9aa5593d21cb856f0e0ea2f45b1eef..f53c7ada2ace4219917e75f806f39a00d5ab0123 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -95,7 +95,7 @@ static void twsk_rcv_nxt_update(struct tcp_timewait_sock *tcptw, u32 seq)
  */
 enum tcp_tw_status
 tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
-			   const struct tcphdr *th)
+			   const struct tcphdr *th, u32 *tw_isn)
 {
 	struct tcp_options_received tmp_opt;
 	struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
@@ -228,7 +228,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 		u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
 		if (isn == 0)
 			isn++;
-		TCP_SKB_CB(skb)->tcp_tw_isn = isn;
+		*tw_isn = isn;
 		return TCP_TW_SYN;
 	}
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5141f7033abd8bb03bc4e162066ca4befe343bdc..3aa9da5c9a669d2754b421cfb704ad28def5a748 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1741,7 +1741,6 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
 				    skb->len - th->doff*4);
 	TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
 	TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th);
-	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr);
 	TCP_SKB_CB(skb)->sacked = 0;
 	TCP_SKB_CB(skb)->has_rxtstamp =
@@ -1758,6 +1757,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 	bool refcounted;
 	struct sock *sk;
 	int ret;
+	u32 isn;
 	struct net *net = dev_net(skb->dev);
 
 	drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
@@ -1967,7 +1967,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 		goto csum_error;
 	}
 
-	switch (tcp_timewait_state_process(inet_twsk(sk), skb, th)) {
+	switch (tcp_timewait_state_process(inet_twsk(sk), skb, th, &isn)) {
 	case TCP_TW_SYN:
 	{
 		struct sock *sk2;
@@ -1985,6 +1985,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
 			sk = sk2;
 			tcp_v6_restore_cb(skb);
 			refcounted = false;
+			__this_cpu_write(tcp_tw_isn, isn);
 			goto process;
 		}
 	}
-- 
2.44.0.478.gd926399ef9-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field
  2024-04-07  9:33 ` [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field Eric Dumazet
@ 2024-04-09  9:58   ` Paolo Abeni
  2024-04-09 10:11     ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2024-04-09  9:58 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: Neal Cardwell, netdev, eric.dumazet

On Sun, 2024-04-07 at 09:33 +0000, Eric Dumazet wrote:
> TCP can transform a TIMEWAIT socket into a SYN_RECV one from
> a SYN packet, and the ISN of the SYNACK packet is normally
> generated using TIMEWAIT tw_snd_nxt :
> 
> tcp_timewait_state_process()
> ...
>     u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
>     if (isn == 0)
>         isn++;
>     TCP_SKB_CB(skb)->tcp_tw_isn = isn;
>     return TCP_TW_SYN;
> 
> This SYN packet also bypasses normal checks against listen queue
> being full or not.
> 
> tcp_conn_request()
> ...
>        __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
> ...
>         /* TW buckets are converted to open requests without
>          * limitations, they conserve resources and peer is
>          * evidently real one.
>          */
>         if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
>                 want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
>                 if (!want_cookie)
>                         goto drop;
>         }
> 
> This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb.
> 
> Unfortunately this field has been accidentally cleared
> after the call to tcp_timewait_state_process() returning
> TCP_TW_SYN.
> 
> Using a field in TCP_SKB_CB(skb) for a temporary state
> is overkill.
> 
> Switch instead to a per-cpu variable.

I guess that pushing the info via a local variable all the way down to
tcp_conn_request would be cumbersome, and will prevent the fast path
optimization, right?

> As a bonus, we do not have to clear tcp_tw_isn in TCP receive
> fast path.
> It is temporarily set then cleared only in the TCP_TW_SYN dance.
> 
> Fixes: 4ad19de8774e ("net: tcp6: fix double call of tcp_v6_fill_cb()")
> Fixes: eeea10b83a13 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

[...]

> @@ -2397,6 +2397,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  			sk = sk2;
>  			tcp_v4_restore_cb(skb);
>  			refcounted = false;
> +			__this_cpu_write(tcp_tw_isn, isn);
>  			goto process;

Unrelated from this patch, but I think the 'process' label could be
moved down skipping a couple of conditionals. 'sk' is a listener
socket, checking for TW or SYN_RECV should not be needed, right?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field
  2024-04-09  9:58   ` Paolo Abeni
@ 2024-04-09 10:11     ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2024-04-09 10:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: David S . Miller, Jakub Kicinski, Neal Cardwell, netdev,
	eric.dumazet

On Tue, Apr 9, 2024 at 11:58 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Sun, 2024-04-07 at 09:33 +0000, Eric Dumazet wrote:
> > TCP can transform a TIMEWAIT socket into a SYN_RECV one from
> > a SYN packet, and the ISN of the SYNACK packet is normally
> > generated using TIMEWAIT tw_snd_nxt :
> >
> > tcp_timewait_state_process()
> > ...
> >     u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
> >     if (isn == 0)
> >         isn++;
> >     TCP_SKB_CB(skb)->tcp_tw_isn = isn;
> >     return TCP_TW_SYN;
> >
> > This SYN packet also bypasses normal checks against listen queue
> > being full or not.
> >
> > tcp_conn_request()
> > ...
> >        __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
> > ...
> >         /* TW buckets are converted to open requests without
> >          * limitations, they conserve resources and peer is
> >          * evidently real one.
> >          */
> >         if ((syncookies == 2 || inet_csk_reqsk_queue_is_full(sk)) && !isn) {
> >                 want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
> >                 if (!want_cookie)
> >                         goto drop;
> >         }
> >
> > This was using TCP_SKB_CB(skb)->tcp_tw_isn field in skb.
> >
> > Unfortunately this field has been accidentally cleared
> > after the call to tcp_timewait_state_process() returning
> > TCP_TW_SYN.
> >
> > Using a field in TCP_SKB_CB(skb) for a temporary state
> > is overkill.
> >
> > Switch instead to a per-cpu variable.
>
> I guess that pushing the info via a local variable all the way down to
> tcp_conn_request would be cumbersome, and will prevent the fast path
> optimization, right?

Right, I tried two other variants of the fix before coming to this.

One of the issues I had is that packets eventually going through the
socket backlog
can trigger this path, so I would have to carry a local variable in a
lot of paths.

References :

commit 449809a66c1d0b1563dee84493e14bf3104d2d7e    tcp/dccp: block BH
for SYN processing
commit 1ad98e9d1bdf4724c0a8532fabd84bf3c457c2bc    tcp/dccp: fix
lockdep issue when SYN is backlogged

I chose to not care about this tricky case (vs tcp_tw_syn), by making sure
to always leave per-cpu variable cleared after each tcp_conn_request()

This was also a nice way of not having to clear a field/variable for
each incoming
TCP packet :)

>
> > As a bonus, we do not have to clear tcp_tw_isn in TCP receive
> > fast path.
> > It is temporarily set then cleared only in the TCP_TW_SYN dance.
> >
> > Fixes: 4ad19de8774e ("net: tcp6: fix double call of tcp_v6_fill_cb()")
> > Fixes: eeea10b83a13 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> [...]
>
> > @@ -2397,6 +2397,7 @@ int tcp_v4_rcv(struct sk_buff *skb)
> >                       sk = sk2;
> >                       tcp_v4_restore_cb(skb);
> >                       refcounted = false;
> > +                     __this_cpu_write(tcp_tw_isn, isn);
> >                       goto process;
>
> Unrelated from this patch, but I think the 'process' label could be
> moved down skipping a couple of conditionals. 'sk' is a listener
> socket, checking for TW or SYN_RECV should not be needed, right?

Absolutely !

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV
  2024-04-07  9:33 [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV Eric Dumazet
  2024-04-07  9:33 ` [PATCH net-next 1/2] tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() Eric Dumazet
  2024-04-07  9:33 ` [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field Eric Dumazet
@ 2024-04-09 11:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-04-09 11:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, ncardwell, netdev, eric.dumazet

Hello:

This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sun,  7 Apr 2024 09:33:20 +0000 you wrote:
> TCP can transform a TIMEWAIT socket into a SYN_RECV one from
> a SYN packet, and the ISN of the SYNACK packet is normally
> generated using TIMEWAIT tw_snd_nxt.
> 
> This SYN packet also bypasses normal checks against listen queue
> being full or not.
> 
> [...]

Here is the summary with links:
  - [net-next,1/2] tcp: propagate tcp_tw_isn via an extra parameter to ->route_req()
    https://git.kernel.org/netdev/net-next/c/b9e810405880
  - [net-next,2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field
    https://git.kernel.org/netdev/net-next/c/41eecbd712b7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-09 11:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-07  9:33 [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV Eric Dumazet
2024-04-07  9:33 ` [PATCH net-next 1/2] tcp: propagate tcp_tw_isn via an extra parameter to ->route_req() Eric Dumazet
2024-04-07  9:33 ` [PATCH net-next 2/2] tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field Eric Dumazet
2024-04-09  9:58   ` Paolo Abeni
2024-04-09 10:11     ` Eric Dumazet
2024-04-09 11:20 ` [PATCH net-next 0/2] tcp: fix ISN selection in TIMEWAIT -> SYN_RECV patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).