* [PATCH] net/ipv6: repeat route lookup with saddr set for ECMP
@ 2026-03-29 9:12 Maximilian Moehl
2026-03-30 7:56 ` Paolo Abeni
0 siblings, 1 reply; 3+ messages in thread
From: Maximilian Moehl @ 2026-03-29 9:12 UTC (permalink / raw)
To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel, Maximilian Moehl
When the routing decision involves ECMP, the initial hash is calculated
with saddr being :: and the decision which interface to select from an
ECMP group is based on that hash. If the route lookup has to be
repeated, e.g. because a route was updated, the hash is calculated with
saddr set to the previously selected address. This can cause the
selected interface to change, breaking ongoing connections.
To ensure the initial interface selection is based on an actual saddr
the route lookup is repeated after the source address selection if a
hash was calculated.
Signed-off-by: Maximilian Moehl <maximilian@moehl.eu>
Link: https://lore.kernel.org/all/aOYLRyIlc7XU7-7n@shredder/
---
I've created a write-up of what I've done, including the steps taken
to test the patch: https://moehl.eu/blog/linux-ipv6-ecmp-instability.html
net/ipv6/ip6_output.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8e2a6b28cea7..465fce51d017 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
*dst = NULL;
}
+ /* If ECMP was involved the initial hash was calculted
+ * with saddr=:: which can result in instability
+ * when it is later re-calculated with the selected
+ * saddr. Lookup the route again with the chosen
+ * saddr to get a stable result.
+ */
+ if (fl6->mp_hash) {
+ fl6->mp_hash = 0;
+ dst_release(*dst);
+ *dst = NULL;
+ }
+
if (fl6->flowi6_oif)
flags |= RT6_LOOKUP_F_IFACE;
}
base-commit: dc9e9d61e301c087bcd990dbf2fa18ad3e2e1429
--
2.53.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] net/ipv6: repeat route lookup with saddr set for ECMP
2026-03-29 9:12 [PATCH] net/ipv6: repeat route lookup with saddr set for ECMP Maximilian Moehl
@ 2026-03-30 7:56 ` Paolo Abeni
2026-03-31 12:50 ` Maximilian Moehl
0 siblings, 1 reply; 3+ messages in thread
From: Paolo Abeni @ 2026-03-30 7:56 UTC (permalink / raw)
To: Maximilian Moehl, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Simon Horman
Cc: netdev, linux-kernel
On 3/29/26 11:12 AM, Maximilian Moehl wrote:
> When the routing decision involves ECMP, the initial hash is calculated
> with saddr being :: and the decision which interface to select from an
> ECMP group is based on that hash. If the route lookup has to be
> repeated, e.g. because a route was updated, the hash is calculated with
> saddr set to the previously selected address. This can cause the
> selected interface to change, breaking ongoing connections.
>
> To ensure the initial interface selection is based on an actual saddr
> the route lookup is repeated after the source address selection if a
> hash was calculated.
>
> Signed-off-by: Maximilian Moehl <maximilian@moehl.eu>
> Link: https://lore.kernel.org/all/aOYLRyIlc7XU7-7n@shredder/
> ---
> I've created a write-up of what I've done, including the steps taken
> to test the patch: https://moehl.eu/blog/linux-ipv6-ecmp-instability.html
>
> net/ipv6/ip6_output.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 8e2a6b28cea7..465fce51d017 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
> *dst = NULL;
> }
>
> + /* If ECMP was involved the initial hash was calculted
> + * with saddr=:: which can result in instability
> + * when it is later re-calculated with the selected
> + * saddr. Lookup the route again with the chosen
> + * saddr to get a stable result.
> + */
> + if (fl6->mp_hash) {
> + fl6->mp_hash = 0;
> + dst_release(*dst);
> + *dst = NULL;
> + }
> +
> if (fl6->flowi6_oif)
> flags |= RT6_LOOKUP_F_IFACE;
> }
This apparently breaks ipv6 fib tests (fib_tests.sh):
# IPv6 multipath load balance test
# TEST: IPv6 multipath loadbalance [FAIL]
see
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
on how to reproduce the tests.
Also this would deserve additional testcases.
Without diving much inside the code I have the feeling this change is
plugged into the wrong place: multipath selection logic should be
encapsulated by fib6_select_path().
/P
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] net/ipv6: repeat route lookup with saddr set for ECMP
2026-03-30 7:56 ` Paolo Abeni
@ 2026-03-31 12:50 ` Maximilian Moehl
0 siblings, 0 replies; 3+ messages in thread
From: Maximilian Moehl @ 2026-03-31 12:50 UTC (permalink / raw)
To: Paolo Abeni, Maximilian Moehl, David S. Miller, David Ahern,
Eric Dumazet, Jakub Kicinski, Simon Horman
Cc: netdev, linux-kernel
On Mon Mar 30, 2026 at 9:56 AM CEST, Paolo Abeni wrote:
> On 3/29/26 11:12 AM, Maximilian Moehl wrote:
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 8e2a6b28cea7..465fce51d017 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1148,6 +1148,18 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
>> *dst = NULL;
>> }
>>
>> + /* If ECMP was involved the initial hash was calculted
>> + * with saddr=:: which can result in instability
>> + * when it is later re-calculated with the selected
>> + * saddr. Lookup the route again with the chosen
>> + * saddr to get a stable result.
>> + */
>> + if (fl6->mp_hash) {
>> + fl6->mp_hash = 0;
>> + dst_release(*dst);
>> + *dst = NULL;
>> + }
>> +
>> if (fl6->flowi6_oif)
>> flags |= RT6_LOOKUP_F_IFACE;
>> }
>
> This apparently breaks ipv6 fib tests (fib_tests.sh):
>
> # IPv6 multipath load balance test
> # TEST: IPv6 multipath loadbalance [FAIL]
>
> see
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
> on how to reproduce the tests.
>
> Also this would deserve additional testcases.
Thank you for the pointer, I will look into the tests.
> Without diving much inside the code I have the feeling this change is
> plugged into the wrong place: multipath selection logic should be
> encapsulated by fib6_select_path().
To be able to move this logic into fib6_select_path(), the call to
ip6_route_get_saddr() would probably have to move as well as we need
to set the source address after the first lookup. This would then also
set the source address as a side-effect.
I can give this a try, just want to confirm that it's the correct way
before doing so or if I'm missing something.
--
Max
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-03-31 12:50 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-29 9:12 [PATCH] net/ipv6: repeat route lookup with saddr set for ECMP Maximilian Moehl
2026-03-30 7:56 ` Paolo Abeni
2026-03-31 12:50 ` Maximilian Moehl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox