netdev.vger.kernel.org archive mirror
* [PATCH net-next] net: ipv6: respect route prfsrc and fill empty saddr before ECMP hash
@ 2025-10-05 20:49 Dmitry Z via B4 Relay
  2025-10-06 13:30 ` Ido Schimmel
From: Dmitry Z via B4 Relay @ 2025-10-05 20:49 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: netdev, linux-kernel, Dmitry Z

From: Dmitry Z <demetriousz@proton.me>

In an IPv6 ECMP scenario, when a multi-homed host initiates a connection,
the flow's source address (`fl6->saddr`) may still be unspecified (`::`)
during the initial call to `rt6_multipath_hash()`. It is filled in later,
once the outgoing interface (OIF) has been determined and
`ipv6_dev_get_saddr()` selects the source address according to RFC 6724.

Because the hash input changes between the two lookups, the flow can
switch paths in some cases: the first packets go out via one link, while
the rest of the flow is routed over another.

A practical example is a Git-over-SSH session. When running `git fetch`,
the initial control traffic uses TOS 0x48, but data transfer switches to
TOS 0x20. This triggers a new hash computation, and at that time `saddr`
is already populated. As a result, packets with TOS 0x20 may be sent via
a different OIF, because `rt6_multipath_hash()` now produces a different
result.

This issue can happen even if the matched IPv6 route specifies a `src`
(preferred source) address. The actual impact depends on the network
topology. In my setup, the flow was redirected to a different switch and
reached another host. That host had never seen the session and answered
with TCP RSTs.

Possible workarounds:
1. Use netfilter to normalize the DSCP field before route lookup.
   (overrides the DSCP/TOS value set by the application on the socket)
2. Exclude the source address from the ECMP hash via sysctl knobs.
   (removes an important input from the hash computation)

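If `saddr` is still unspecified and the matched route has a preferred
source address, this patch copies the route's `fib6_prefsrc.addr` into
`saddr` before `fib6_select_path()` computes the ECMP hash. Every lookup
for the flow then hashes the same source address, keeping path selection
consistent across the flow.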

Signed-off-by: Dmitry Z <demetriousz@proton.me>
---
 net/ipv6/route.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3299cfa12e21c96ecb5c4dea5f305d5f7ce16084..d2ecf16417a6f0fc6956f0ebff3d8dea593da059 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2270,6 +2270,11 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 	if (res.f6i == net->ipv6.fib6_null_entry)
 		goto out;
 
+	if (ipv6_addr_any(&fl6->saddr) &&
+	    !ipv6_addr_any(&res.f6i->fib6_prefsrc.addr)) {
+		fl6->saddr = res.f6i->fib6_prefsrc.addr;
+	}
+
 	fib6_select_path(net, &res, fl6, oif, false, skb, strict);
 
 	/*Search through exception table */

---
base-commit: e5f0a698b34ed76002dc5cff3804a61c80233a7a
change-id: 20251005-ipv6-set-saddr-to-prefsrc-before-hash-to-stabilize-ecmp-6d646ec96ac4

Best regards,
-- 
Dmitry Z <demetriousz@proton.me>

Thread overview: 6+ messages
2025-10-05 20:49 [PATCH net-next] net: ipv6: respect route prfsrc and fill empty saddr before ECMP hash Dmitry Z via B4 Relay
2025-10-06 13:30 ` Ido Schimmel
2025-10-06 18:31   ` Dmitry
2025-10-07 17:04     ` Ido Schimmel
2025-10-07 22:25       ` Willem de Bruijn
2025-10-08  6:57         ` Ido Schimmel
