From: Ido Schimmel <idosch@idosch.org>
To: demetriousz@proton.me
Cc: "David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] net: ipv6: respect route prfsrc and fill empty saddr before ECMP hash
Date: Mon, 6 Oct 2025 16:30:11 +0300
Message-ID: <aOPEYwnyGnMQCp-f@shredder>
In-Reply-To: <20251005-ipv6-set-saddr-to-prefsrc-before-hash-to-stabilize-ecmp-v1-1-d43b6ef00035@proton.me>
On Sun, Oct 05, 2025 at 08:49:55PM +0000, Dmitry Z via B4 Relay wrote:
> From: Dmitry Z <demetriousz@proton.me>
>
> In an IPv6 ECMP scenario, if a multi-homed host initiates a connection,
> `saddr` may remain empty during the initial call to `rt6_multipath_hash()`.
> It gets filled later, once the outgoing interface (OIF) is determined and
> `ipv6_dev_get_saddr()` (RFC 6724) selects the proper source address.
>
> In some cases, this can cause the flow to switch paths: the first packets
> go via one link, while the rest of the flow is routed over another.
>
> A practical example is a Git-over-SSH session. When running `git fetch`,
> the initial control traffic uses TOS 0x48, but data transfer switches to
> TOS 0x20. This triggers a new hash computation, and at that time `saddr`
> is already populated. As a result, packets with TOS 0x20 may be sent via
> a different OIF, because `rt6_multipath_hash()` now produces a different
> result.
>
> This issue can happen even if the matched IPv6 route specifies a `src`
> (preferred source) address. The actual impact depends on the network
> topology. In my setup, the flow was redirected to a different switch and
> reached another host, leading to TCP RSTs from the host where the session
> was never established.
>
> Possible workarounds:
> 1. Use netfilter to normalize the DSCP field before route lookup.
> (breaks DSCP/TOS assignment set by the socket)
> 2. Exclude the source address from the ECMP hash via sysctl knobs.
> (excludes an important part from hash computation)
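Here is a toy Python model of the failure mode described in the quoted text. It is an illustration under stated assumptions, not kernel code. `flow_hash()` and `select_nexthop()` are hypothetical helpers. CRC32 stands in for the kernel's real multipath hash, and modulo selection stands in for the kernel's comparison against per-next-hop upper bounds. All addresses come from the 2001:db8::/32 documentation prefix. The model shows that when the same connection is hashed once with an unspecified source (`::`) and once with the selected source address, the hash receives two different inputs. When the source address is left out of the hashed fields (workaround 2), both lookups agree by construction.

```python
import ipaddress
import zlib

IPPROTO_TCP = 6
UNSPECIFIED = "::"  # source address not chosen yet


def flow_hash(saddr, daddr, proto, sport, dport, include_saddr=True):
    """Toy stand-in for an ECMP flow hash (NOT the kernel's algorithm).

    CRC32 over the packed header fields; include_saddr=False mimics
    excluding the source address from the hash via the sysctl knobs.
    """
    parts = []
    if include_saddr:
        parts.append(ipaddress.IPv6Address(saddr).packed)
    parts.append(ipaddress.IPv6Address(daddr).packed)
    parts.append(bytes([proto]))
    parts.append(sport.to_bytes(2, "big"))
    parts.append(dport.to_bytes(2, "big"))
    return zlib.crc32(b"".join(parts))


def select_nexthop(hash_value, nexthops):
    """Toy modulo selection; the kernel compares the hash against
    per-next-hop upper bounds instead."""
    return nexthops[hash_value % len(nexthops)]


NEXTHOPS = ["eth0", "eth1"]
# One SSH connection: only the source address differs between lookups.
FLOW = dict(daddr="2001:db8:2::20", proto=IPPROTO_TCP, sport=50000, dport=22)

# First lookup: saddr still unspecified. Later lookup: saddr already selected.
first = select_nexthop(flow_hash(UNSPECIFIED, **FLOW), NEXTHOPS)
later = select_nexthop(flow_hash("2001:db8:1::10", **FLOW), NEXTHOPS)
print("with saddr in hash:   ", first, later)  # may or may not match

# Workaround 2: leave saddr out of the hash so both lookups see the same input.
first_nosrc = select_nexthop(flow_hash(UNSPECIFIED, include_saddr=False, **FLOW), NEXTHOPS)
later_nosrc = select_nexthop(flow_hash("2001:db8:1::10", include_saddr=False, **FLOW), NEXTHOPS)
print("without saddr in hash:", first_nosrc, later_nosrc)  # always match
```

The trade-off noted in the quoted list is visible in the model. With the source address excluded, every flow to the same destination and ports hashes identically, no matter which local address it uses.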
Two more options (which I didn't test):
3. Setting "IPQoS" in SSH config to a single value. It should prevent
OpenSSH from switching DSCP while the connection is alive. Switching
DSCP triggers a route lookup since commit 305e95bb893c ("net-ipv6:
changes to ->tclass (via IPV6_TCLASS) should sk_dst_reset()"). To be
clear, I don't think this commit is problematic as there are other
events that can invalidate cached dst entries.
4. Setting "BindAddress" in SSH config. It should make sure that the
same source address is used for all route lookups.
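A toy model of the route lookups triggered by a single connection can show why options 3 and 4 should help. Everything here is a hypothetical illustration. `route_lookups()` is an illustrative helper. The model assumes a lookup happens at connect time and again whenever the traffic class changes, as described for the commit cited above. Following the patch description, the first lookup sees an unspecified source address. The TCLASS values are the 0x48/0x20 pair from the Git-over-SSH example. Other events that can invalidate the cached dst are ignored. Identical hash inputs always produce the same path, so the model only compares inputs.

```python
IPPROTO_TCP = 6
UNSPECIFIED = "::"


def route_lookups(tclass_sequence, bind_address=None,
                  selected_saddr="2001:db8:1::10",
                  daddr="2001:db8:2::20", sport=50000, dport=22):
    """Return the hash-input tuple seen by each route lookup (toy model).

    A lookup happens on connect and on every traffic-class change;
    nothing else invalidates the cached route in this model.
    """
    saddr = bind_address or UNSPECIFIED
    lookups = []
    previous = None
    for tclass in tclass_sequence:
        if tclass != previous:
            lookups.append((saddr, daddr, IPPROTO_TCP, sport, dport))
            # Model assumption: after the first lookup, source address
            # selection has picked an address and later lookups see it.
            if saddr == UNSPECIFIED:
                saddr = selected_saddr
        previous = tclass
    return lookups


# Default: DSCP switches from 0x48 to 0x20, no BindAddress.
print(route_lookups([0x48, 0x20]))  # two different inputs
# Option 4: BindAddress pins saddr from the first lookup on.
print(route_lookups([0x48, 0x20], bind_address="2001:db8:1::10"))  # identical inputs
# Option 3: a single IPQoS value means there is no second lookup at all.
print(route_lookups([0x48, 0x48, 0x48]))  # one lookup
```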
> This patch uses the `fib6_prefsrc.addr` value from the selected route to
> populate `saddr` before ECMP hash computation, ensuring consistent path
> selection across the flow.
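The proposal in the quoted paragraph can be reduced to one decision about which source address is fed into the hash. In this hypothetical sketch, `hash_saddr()` is illustrative and is not the patch code. An unspecified flow source is replaced by the matched route's preferred source before hashing. This only keeps the hash stable if the source address later selected for the flow is that same prefsrc, which is an assumption of the model.

```python
from typing import Optional

UNSPECIFIED = "::"


def hash_saddr(flow_saddr: str, route_prefsrc: Optional[str]) -> str:
    """Source address to feed into the ECMP hash (toy model of the proposal).

    If the flow has no source address yet and the matched route carries a
    preferred source, hash with the prefsrc instead of the unspecified address.
    """
    if flow_saddr == UNSPECIFIED and route_prefsrc is not None:
        return route_prefsrc
    return flow_saddr


PREFSRC = "2001:db8:1::10"
# First lookup (saddr unset) and a later lookup (saddr == prefsrc) now agree.
print(hash_saddr(UNSPECIFIED, PREFSRC), hash_saddr(PREFSRC, PREFSRC))
# Without a prefsrc on the route, the first lookup still hashes "::".
print(hash_saddr(UNSPECIFIED, None))
```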
I'm not convinced the problem is in the kernel. As long as all the
packets are sent with the same 5-tuple, it's up to the network to
deliver them correctly. I don't know what your topology looks like, but
in the general case packets belonging to the same flow can be routed via
different paths over time. If multiple servers can service incoming SSH
connections, then there should be a stateful load balancer between them
and the clients so that packets belonging to the same flow are always
delivered to the same server. ECMP cannot be relied on to do load
balancing alone as it's stateless.
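A hypothetical flow table can illustrate the difference between stateless ECMP and a stateful load balancer. `StatefulBalancer` and `stateless_pick()` are illustrative and do not model any particular product. Real load balancers, IPVS for example, also expire entries and handle failover. A stateless hash remaps a flow whenever the set of candidates changes. The flow table instead keeps sending an established 5-tuple to the server that received its first packet.

```python
import zlib


def stateless_pick(five_tuple, servers):
    """ECMP-like choice: a pure function of the tuple and the current server list."""
    return servers[zlib.crc32(repr(five_tuple).encode()) % len(servers)]


class StatefulBalancer:
    """Remembers which server each flow was assigned to (toy model)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.flows = {}  # 5-tuple -> server

    def pick(self, five_tuple):
        server = self.flows.get(five_tuple)
        if server is None:  # new flow: choose once, then stick to it
            server = stateless_pick(five_tuple, self.servers)
            self.flows[five_tuple] = server
        return server


flow = ("2001:db8:1::10", "2001:db8:2::20", 6, 50000, 22)
lb = StatefulBalancer(["srv-a", "srv-b"])
first = lb.pick(flow)

# The server set changes mid-connection (e.g. a new server is added).
lb.servers.append("srv-c")
print(first, lb.pick(flow))  # always the same server
print(stateless_pick(flow, ["srv-a", "srv-b", "srv-c"]))  # may differ from first
```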