From: Neil Spring <ntspring@meta.com>
To: netdev@vger.kernel.org
Cc: edumazet@google.com, davem@davemloft.net, kuba@kernel.org
Subject: [PATCH net-next v2 1/2] tcp: rehash onto different ECMP path on retransmit timeout
Date: Wed, 8 Apr 2026 00:05:13 -0700 [thread overview]
Message-ID: <20260408070514.1840227-2-ntspring@meta.com> (raw)
In-Reply-To: <20260408070514.1840227-1-ntspring@meta.com>
Add sk_dst_reset() alongside sk_rethink_txhash() in the RTO, PLB,
and spurious-retrans paths so that the next transmit triggers a fresh
route lookup. Propagate sk_txhash into fl6->mp_hash in
inet6_csk_route_req() and inet6_csk_route_socket() so
fib6_select_path() uses the socket's current hash for ECMP selection.
The ir_iif update in tcp_check_req() covers both IPv4 and IPv6
because it was cleaner than gating on address family; IPv4 is
otherwise unaltered, and not having autoflowlabel in IPv4 means
I wouldn't expect a new path on timeout.
It is possible that PLB does not need this (that there are other
methods of reacting to local congestion); I added the sk_dst_reset
for consistency.
Signed-off-by: Neil Spring <ntspring@meta.com>
---
net/ipv4/tcp_input.c | 4 +++-
net/ipv4/tcp_minisocks.c | 13 +++++++++++++
net/ipv4/tcp_plb.c | 1 +
net/ipv4/tcp_timer.c | 1 +
net/ipv6/inet6_connection_sock.c | 11 +++++++++++
5 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7171442c3ed7..3d42ab45066c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5014,8 +5014,10 @@ static void tcp_rcv_spurious_retrans(struct sock *sk,
skb->protocol == htons(ETH_P_IPV6) &&
(tcp_sk(sk)->inet_conn.icsk_ack.lrcv_flowlabel !=
ntohl(ip6_flowlabel(ipv6_hdr(skb)))) &&
- sk_rethink_txhash(sk))
+ sk_rethink_txhash(sk)) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPDUPLICATEDATAREHASH);
+ sk_dst_reset(sk);
+ }
/* Save last flowlabel after a spurious retrans. */
tcp_save_lrcv_flowlabel(sk, skb);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 199f0b579e89..27edf71effc2 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -750,6 +750,19 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
* Reset timer after retransmitting SYNACK, similar to
* the idea of fast retransmit in recovery.
*/
+
+ /* Update ir_iif to match the interface the retransmitted
+ * SYN arrived on; inet6_csk_route_req() uses this as
+ * flowi6_oif, constraining ECMP path for the SYN/ACK.
+ */
+#if IS_ENABLED(CONFIG_IPV6)
+ if (sk->sk_family == AF_INET6)
+ inet_rsk(req)->ir_iif = tcp_v6_iif(skb);
+ else
+#endif
+ inet_rsk(req)->ir_iif =
+ inet_request_bound_dev_if(sk, skb);
+
if (!tcp_oow_rate_limited(sock_net(sk), skb,
LINUX_MIB_TCPACKSKIPPEDSYNRECV,
&tcp_rsk(req)->last_oow_ack_time)) {
diff --git a/net/ipv4/tcp_plb.c b/net/ipv4/tcp_plb.c
index 68ccdb9a5412..d7cc00a58e53 100644
--- a/net/ipv4/tcp_plb.c
+++ b/net/ipv4/tcp_plb.c
@@ -79,6 +79,7 @@ void tcp_plb_check_rehash(struct sock *sk, struct tcp_plb_state *plb)
return;
sk_rethink_txhash(sk);
+ sk_dst_reset(sk);
plb->consec_cong_rounds = 0;
tcp_sk(sk)->plb_rehash++;
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPPLBREHASH);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index ea99988795e7..acc22fc532c2 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -299,6 +299,7 @@ static int tcp_write_timeout(struct sock *sk)
if (sk_rethink_txhash(sk)) {
tp->timeout_rehash++;
__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPTIMEOUTREHASH);
+ sk_dst_reset(sk);
}
return 0;
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 37534e116899..3fd7acbe2c49 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -14,6 +14,7 @@
#include <linux/ipv6.h>
#include <linux/jhash.h>
#include <linux/slab.h>
+#include <linux/tcp.h>
#include <net/addrconf.h>
#include <net/inet_connection_sock.h>
@@ -48,6 +49,12 @@ struct dst_entry *inet6_csk_route_req(const struct sock *sk,
fl6->flowi6_uid = sk_uid(sk);
security_req_classify_flow(req, flowi6_to_flowi_common(fl6));
+ /* Use the request socket's txhash (re-rolled by tcp_rtx_synack())
+ * for ECMP path selection; >> 1 for 31-bit mp_hash range.
+ */
+ if (tcp_rsk(req)->txhash)
+ fl6->mp_hash = tcp_rsk(req)->txhash >> 1;
+
if (!dst) {
dst = ip6_dst_lookup_flow(sock_net(sk), sk, fl6, final_p);
if (IS_ERR(dst))
@@ -70,6 +77,10 @@ struct dst_entry *inet6_csk_route_socket(struct sock *sk,
fl6->saddr = np->saddr;
fl6->flowlabel = np->flow_label;
IP6_ECN_flow_xmit(sk, fl6->flowlabel);
+
+ /* >> 1 for 31-bit mp_hash range matching nhc_upper_bound. */
+ if (sk->sk_txhash)
+ fl6->mp_hash = sk->sk_txhash >> 1;
fl6->flowi6_oif = sk->sk_bound_dev_if;
fl6->flowi6_mark = sk->sk_mark;
fl6->fl6_sport = inet->inet_sport;
--
2.52.0
next prev parent reply other threads:[~2026-04-08 7:05 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-08 0:28 [PATCH net-next 0/2] tcp: rehash onto different ECMP path on retransmit timeout Neil Spring
2026-04-08 0:28 ` [PATCH net-next 1/2] " Neil Spring
2026-04-08 1:09 ` Eric Dumazet
2026-04-08 6:59 ` Neil Spring
2026-04-08 0:28 ` [PATCH net-next 2/2] selftests: net: add ECMP rehash test Neil Spring
2026-04-08 7:05 ` [PATCH net-next v2 0/2] tcp: rehash onto different ECMP path on retransmit timeout Neil Spring
2026-04-08 7:05 ` Neil Spring [this message]
2026-04-08 7:12 ` [PATCH net-next v2 1/2] " Eric Dumazet
2026-04-08 7:05 ` [PATCH net-next v2 2/2] selftests: net: add ECMP rehash test Neil Spring
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260408070514.1840227-2-ntspring@meta.com \
--to=ntspring@meta.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox