* [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing
@ 2023-09-29 21:03 David Morley
  2023-09-29 21:03 ` [PATCH net-next v2 1/2] tcp: record last received ipv6 flowlabel David Morley
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: David Morley @ 2023-09-29 21:03 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, Jakub Kicinski; +Cc: netdev, David Morley

From: David Morley <morleyd@google.com>

This patch series stores the last received ipv6 flowlabel. This last
received flowlabel is then used to help decide whether a packet is
likely an RTO retransmit or the result of a TLP. This new information
is used to better inform the flowlabel change decision for data
receivers.

David Morley (2):
  tcp: record last received ipv6 flowlabel
  tcp: change data receiver flowlabel after one dup

 include/net/inet_connection_sock.h |  5 ++++-
 include/net/tcp.h                  |  2 ++
 net/dccp/timer.c                   |  4 ++--
 net/ipv4/tcp_input.c               | 29 ++++++++++++++++++++++++++---
 net/ipv4/tcp_timer.c               |  2 +-
 5 files changed, 35 insertions(+), 7 deletions(-)

-- 
2.42.0.582.g8ccd20d70d-goog



* [PATCH net-next v2 1/2] tcp: record last received ipv6 flowlabel
  2023-09-29 21:03 [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing David Morley
@ 2023-09-29 21:03 ` David Morley
  2023-09-29 21:03 ` [PATCH net-next v2 2/2] tcp: change data receiver flowlabel after one dup David Morley
  2023-10-04 23:32 ` [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing Jakub Kicinski
  2 siblings, 0 replies; 4+ messages in thread
From: David Morley @ 2023-09-29 21:03 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, Jakub Kicinski
  Cc: netdev, David Morley, Neal Cardwell, Yuchung Cheng

From: David Morley <morleyd@google.com>

To better estimate whether a received data packet is an RTO
retransmit or the result of a TLP, save the last received IPv6
flowlabel.
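
For readers less familiar with the IPv6 header layout, here is a
minimal userspace sketch of what "the flowlabel" of a packet is: the
low 20 bits of the first 32-bit word of the IPv6 header, read in
network byte order. The helper name flowlabel_from_ipv6_header() and
the FLOWLABEL_MASK macro are hypothetical and exist only for this
illustration; the patch itself uses the kernel's ip6_flowlabel() and
ipv6_hdr().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask for this sketch: the flow label is 20 bits wide. */
#define FLOWLABEL_MASK 0x000FFFFFu

static_assert(FLOWLABEL_MASK == (1u << 20) - 1,
	      "flow label mask must cover exactly 20 bits");

/* Return the 20-bit flow label, in host byte order, from the first
 * four bytes of an IPv6 header (version, traffic class, flow label).
 */
uint32_t flowlabel_from_ipv6_header(const uint8_t *hdr)
{
	/* Assemble the first header word from network byte order. */
	uint32_t first_word = ((uint32_t)hdr[0] << 24) |
			      ((uint32_t)hdr[1] << 16) |
			      ((uint32_t)hdr[2] << 8) |
			      (uint32_t)hdr[3];

	return first_word & FLOWLABEL_MASK;
}
```

A 20-bit result is why the new lrcv_flowlabel bitfield needs exactly
20 bits.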

To make space for this field, shrink the "ato" field in
inet_connection_sock to a bitfield of ATO_BITS (8) bits, since the
current value of TCP_DELACK_MAX fits in 8 bits, and add a
static_assert ensuring that the field stays wide enough.
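
The packing described above can be sketched in userspace as follows.
This is only an illustration under stated assumptions: HZ is fixed at
1000 here (in the kernel it is a build-time configuration value), and
struct ack_model and ack_model_set_ato() are hypothetical names that
do not appear in the patch.

```c
#include <assert.h>

/* Assumed tick rate for this sketch only. */
#define HZ 1000
#define TCP_DELACK_MAX ((unsigned)(HZ / 5))
#define ATO_BITS 8

/* Fails the build if TCP_DELACK_MAX no longer fits in the bitfield. */
static_assert((1u << ATO_BITS) > TCP_DELACK_MAX,
	      "ato bitfield too narrow for TCP_DELACK_MAX");

/* Hypothetical stand-in for the relevant part of icsk_ack. */
struct ack_model {
	unsigned int ato:ATO_BITS,	/* delayed-ACK timeout, in ticks */
		     lrcv_flowlabel:20,	/* last received IPv6 flowlabel */
		     unused:4;
};

/* The three bitfields share a single 32-bit storage unit. */
static_assert(sizeof(struct ack_model) == sizeof(unsigned int),
	      "bitfields should pack into one word");

/* Clamp to TCP_DELACK_MAX before storing so the value cannot be
 * silently truncated by the 8-bit field. Returns the stored value.
 */
unsigned int ack_model_set_ato(struct ack_model *a, unsigned int ticks)
{
	if (ticks > TCP_DELACK_MAX)
		ticks = TCP_DELACK_MAX;
	a->ato = ticks;
	return a->ato;
}
```

With HZ at 1000, TCP_DELACK_MAX is 200 ticks, which is below the
256 values an 8-bit field can hold.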

v2: addressed kernel bot feedback about dccp_delack_timer()

Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 include/net/inet_connection_sock.h |  5 ++++-
 include/net/tcp.h                  |  2 ++
 net/dccp/timer.c                   |  4 ++--
 net/ipv4/tcp_input.c               | 15 +++++++++++++++
 net/ipv4/tcp_timer.c               |  2 +-
 5 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 5d2fcc137b88..d6d9d1c1985a 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -114,7 +114,10 @@ struct inet_connection_sock {
 		__u8		  quick;	 /* Scheduled number of quick acks	   */
 		__u8		  pingpong;	 /* The session is interactive		   */
 		__u8		  retry;	 /* Number of attempts			   */
-		__u32		  ato;		 /* Predicted tick of soft clock	   */
+		#define ATO_BITS 8
+		__u32		  ato:ATO_BITS,	 /* Predicted tick of soft clock	   */
+				  lrcv_flowlabel:20, /* last received ipv6 flowlabel	   */
+				  unused:4;
 		unsigned long	  timeout;	 /* Currently scheduled timeout		   */
 		__u32		  lrcvtime;	 /* timestamp of last received data packet */
 		__u16		  last_seg_size; /* Size of last incoming segment	   */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 91688d0dadcd..61fe664d109b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -131,6 +131,8 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 #define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
 
 #define TCP_DELACK_MAX	((unsigned)(HZ/5))	/* maximal time to delay before sending an ACK */
+static_assert((1 << ATO_BITS) > TCP_DELACK_MAX);
+
 #if HZ >= 100
 #define TCP_DELACK_MIN	((unsigned)(HZ/25))	/* minimal time to delay before sending an ACK */
 #define TCP_ATO_MIN	((unsigned)(HZ/25))
diff --git a/net/dccp/timer.c b/net/dccp/timer.c
index b3255e87cc7e..a4cfb47b60e5 100644
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -196,8 +196,8 @@ static void dccp_delack_timer(struct timer_list *t)
 	if (inet_csk_ack_scheduled(sk)) {
 		if (!inet_csk_in_pingpong_mode(sk)) {
 			/* Delayed ACK missed: inflate ATO. */
-			icsk->icsk_ack.ato = min(icsk->icsk_ack.ato << 1,
-						 icsk->icsk_rto);
+			icsk->icsk_ack.ato = min_t(u32, icsk->icsk_ack.ato << 1,
+						   icsk->icsk_rto);
 		} else {
 			/* Delayed ACK missed: leave pingpong mode and
 			 * deflate ATO.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 584825ddd0a0..abe7494361c0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -765,6 +765,16 @@ void tcp_rcv_space_adjust(struct sock *sk)
 	tp->rcvq_space.time = tp->tcp_mstamp;
 }
 
+static void tcp_save_lrcv_flowlabel(struct sock *sk, const struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (skb->protocol == htons(ETH_P_IPV6))
+		icsk->icsk_ack.lrcv_flowlabel = ntohl(ip6_flowlabel(ipv6_hdr(skb)));
+#endif
+}
+
 /* There is something which you must keep in mind when you analyze the
  * behavior of the tp->ato delayed ack timeout interval.  When a
  * connection starts up, we want to ack as quickly as possible.  The
@@ -813,6 +823,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
 		}
 	}
 	icsk->icsk_ack.lrcvtime = now;
+	tcp_save_lrcv_flowlabel(sk, skb);
 
 	tcp_ecn_check_ce(sk, skb);
 
@@ -4506,6 +4517,9 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 	if (TCP_SKB_CB(skb)->seq == tcp_sk(sk)->duplicate_sack[0].start_seq &&
 	    sk_rethink_txhash(sk))
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);
+
+	/* Save last flowlabel after a spurious retrans. */
+	tcp_save_lrcv_flowlabel(sk, skb);
 }
 
 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
@@ -4822,6 +4836,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	u32 seq, end_seq;
 	bool fragstolen;
 
+	tcp_save_lrcv_flowlabel(sk, skb);
 	tcp_ecn_check_ce(sk, skb);
 
 	if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) {
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 3f61c6a70a1f..0862b73dd3b5 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -322,7 +322,7 @@ void tcp_delack_timer_handler(struct sock *sk)
 	if (inet_csk_ack_scheduled(sk)) {
 		if (!inet_csk_in_pingpong_mode(sk)) {
 			/* Delayed ACK missed: inflate ATO. */
-			icsk->icsk_ack.ato = min(icsk->icsk_ack.ato << 1, icsk->icsk_rto);
+			icsk->icsk_ack.ato = min_t(u32, icsk->icsk_ack.ato << 1, icsk->icsk_rto);
 		} else {
 			/* Delayed ACK missed: leave pingpong mode and
 			 * deflate ATO.
-- 
2.42.0.582.g8ccd20d70d-goog



* [PATCH net-next v2 2/2] tcp: change data receiver flowlabel after one dup
  2023-09-29 21:03 [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing David Morley
  2023-09-29 21:03 ` [PATCH net-next v2 1/2] tcp: record last received ipv6 flowlabel David Morley
@ 2023-09-29 21:03 ` David Morley
  2023-10-04 23:32 ` [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing Jakub Kicinski
  2 siblings, 0 replies; 4+ messages in thread
From: David Morley @ 2023-09-29 21:03 UTC (permalink / raw)
  To: David Miller, Eric Dumazet, Jakub Kicinski
  Cc: netdev, David Morley, Neal Cardwell, Yuchung Cheng

From: David Morley <morleyd@google.com>

This commit changes the data receiver repath behavior to occur after
receiving a single duplicate. This can help recover ACK connectivity
more quickly if a TLP was sent along a nonworking path.

For instance, consider the case where the forward path and the reverse
path are both initially nonworking, and every forward path chosen
afterwards works. Before this patch we would have the following
behavior.

+---------+--------+--------+----------+----------+-----------+
| Event   | Fwd    | Rev    | Fwd path | Rev path | Data      |
|         | label  | label  | works    | works    | delivered |
+---------+--------+--------+----------+----------+-----------+
| Initial | A      | 1      | N        | N        | 0         |
+---------+--------+--------+----------+----------+-----------+
| TLP     | A      | 1      | N        | N        | 0         |
+---------+--------+--------+----------+----------+-----------+
| RTO 1   | B      | 1      | Y        | N        | 1         |
+---------+--------+--------+----------+----------+-----------+
| RTO 2   | C      | 1      | Y        | N        | 2         |
+---------+--------+--------+----------+----------+-----------+
| RTO 3   | D      | 2      | Y        | Y        | 3         |
+---------+--------+--------+----------+----------+-----------+

This patch gets rid of at least RTO 3, avoiding additional unnecessary
repaths of a working forward path to a (potentially) nonworking one.

In addition, this commit avoids repathing upon receiving duplicate
data if the local endpoint is in CA_Loss (in which case the RTOs will
already be changing the outgoing flowlabel).
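
The receiver-side decision described above can be modeled in userspace
roughly as follows. This is a simplified sketch, not the kernel code:
struct rx_model and rx_duplicate_data() are hypothetical names, and
the model assumes a rehash always succeeds, whereas in the patch the
stat is only bumped when sk_rethink_txhash() returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the receiver state used here. */
struct rx_model {
	bool in_ca_loss;		/* local endpoint is in CA_Loss */
	uint32_t lrcv_flowlabel;	/* last received IPv6 flowlabel */
	unsigned int rehashes;		/* number of repaths triggered */
};

/* Handle one duplicate data segment. Returns true if the receiver
 * decided to repath (rehash) its outgoing packets.
 */
bool rx_duplicate_data(struct rx_model *r, bool pkt_is_ipv6,
		       uint32_t pkt_flowlabel)
{
	bool rehash = false;

	assert(pkt_flowlabel <= 0xFFFFFu);	/* flowlabels are 20 bits */

	/* A duplicate with a new flowlabel suggests the sender had an
	 * RTO and repathed; a TLP would reuse the previous flowlabel.
	 * Skip if our own RTOs are already changing our flowlabel.
	 */
	if (!r->in_ca_loss && pkt_is_ipv6 &&
	    r->lrcv_flowlabel != pkt_flowlabel) {
		rehash = true;
		r->rehashes++;
	}

	/* Save last flowlabel after a spurious retransmit. */
	if (pkt_is_ipv6)
		r->lrcv_flowlabel = pkt_flowlabel;

	return rehash;
}
```

In the table's scenario, the duplicate arriving at RTO 2 carries
flowlabel C while the saved one is B, so this model repaths the
reverse direction at that point rather than waiting for RTO 3.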

Signed-off-by: David Morley <morleyd@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Tested-by: David Morley <morleyd@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index abe7494361c0..f77fbdb3103d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4511,15 +4511,23 @@ static void tcp_rcv_spurious_retrans(struct sock *sk, const struct sk_buff *skb)
 {
 	/* When the ACK path fails or drops most ACKs, the sender would
 	 * timeout and spuriously retransmit the same segment repeatedly.
-	 * The receiver remembers and reflects via DSACKs. Leverage the
-	 * DSACK state and change the txhash to re-route speculatively.
+	 * If it seems our ACKs are not reaching the other side,
+	 * based on receiving a duplicate data segment with new flowlabel
+	 * (suggesting the sender suffered an RTO), and we are not already
+	 * repathing due to our own RTO, then rehash the socket to repath our
+	 * packets.
 	 */
-	if (TCP_SKB_CB(skb)->seq == tcp_sk(sk)->duplicate_sack[0].start_seq &&
+#if IS_ENABLED(CONFIG_IPV6)
+	if (inet_csk(sk)->icsk_ca_state != TCP_CA_Loss &&
+	    skb->protocol == htons(ETH_P_IPV6) &&
+	    (tcp_sk(sk)->inet_conn.icsk_ack.lrcv_flowlabel !=
+	     ntohl(ip6_flowlabel(ipv6_hdr(skb)))) &&
 	    sk_rethink_txhash(sk))
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);
 
 	/* Save last flowlabel after a spurious retrans. */
 	tcp_save_lrcv_flowlabel(sk, skb);
+#endif
 }
 
 static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
-- 
2.42.0.582.g8ccd20d70d-goog



* Re: [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing
  2023-09-29 21:03 [PATCH net-next v2 0/2] tcp: save flowlabel and use for receiver repathing David Morley
  2023-09-29 21:03 ` [PATCH net-next v2 1/2] tcp: record last received ipv6 flowlabel David Morley
  2023-09-29 21:03 ` [PATCH net-next v2 2/2] tcp: change data receiver flowlabel after one dup David Morley
@ 2023-10-04 23:32 ` Jakub Kicinski
  2 siblings, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2023-10-04 23:32 UTC (permalink / raw)
  To: David Morley; +Cc: David Miller, Eric Dumazet, netdev, David Morley

On Fri, 29 Sep 2023 21:03:50 +0000 David Morley wrote:
> This patch series stores the last received ipv6 flowlabel. This last
> received flowlabel is then used to help decide whether a packet is
> likely an RTO retransmit or the result of a TLP. This new information
> is used to better inform the flowlabel change decision for data
> receivers.

Minor conflict with Eric's changes to derive delack_max from rto_min.
This no longer builds, please rebase & resend, thanks!
-- 
pw-bot: cr
