* [PATCH net v1 0/2] ipv6: fix ECMP route failover on carrier loss
@ 2026-04-27 22:42 Sagarika Sharma
2026-04-27 22:42 ` [PATCH net v1 1/2] ipv6: update route serial number on NETDEV_CHANGE Sagarika Sharma
2026-04-27 22:42 ` [PATCH net v1 2/2] selftest: net: Add test for TCP flow failover with ECMP routes Sagarika Sharma
0 siblings, 2 replies; 6+ messages in thread
From: Sagarika Sharma @ 2026-04-27 22:42 UTC
To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Shuah Khan, Simon Horman, Kuniyuki Iwashima, netdev,
linux-kselftest, Sagarika Sharma
This patchset resolves an issue where established IPv6 connections are
unable to transition to alternative ECMP nexthops upon carrier loss.
Unlike IPv4, the IPv6 routing subsystem does not actively invalidate
cached destinations during a NETDEV_CHANGE event. Sockets persist
with dead routes, leading to stalled traffic or connection drops.
This series introduces a fix to trigger route invalidation by
updating the route serial number on link carrier loss and provides
a corresponding selftest to validate the failover behavior for IPv4
and IPv6.
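As a rough illustration of the mechanism described above, here is a toy
userspace model of serial-number-based cache invalidation. It is only a
sketch: every variable and function name in it is hypothetical, and it
does not reproduce the kernel's actual data structures or locking.

```shell
#!/bin/bash
# Toy model only. All names are hypothetical; this is not kernel code.

route_sernum=1      # stands in for the route's serial number
route_linkdown=0    # stands in for RTNH_F_LINKDOWN on the nexthop
sk_cookie=0         # serial number the socket saw when it cached the dst

# The socket caches a dst and remembers the current serial number.
cache_dst()
{
	sk_cookie=$route_sernum
}

# Succeeds while the cached dst still looks valid to the socket.
dst_check()
{
	[ "$sk_cookie" -eq "$route_sernum" ]
}

# Carrier loss without the fix: only the flag is set.
carrier_loss_before()
{
	route_linkdown=1
}

# Carrier loss with the fix: the flag is set and the serial number changes.
carrier_loss_after()
{
	route_linkdown=1
	route_sernum=$((route_sernum + 1))
}

cache_dst
carrier_loss_before
dst_check && echo "before fix: cached dst still looks valid"

cache_dst
carrier_loss_after
dst_check || echo "after fix: cached dst invalidated, new lookup needed"
```

In this model the link-down flag alone never causes the cached entry to
be rechecked; only the serial number change does, which is the effect the
series relies on.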
Kuniyuki Iwashima (1):
selftest: net: Add test for TCP flow failover with ECMP routes.
Sagarika Sharma (1):
ipv6: update route serial number on NETDEV_CHANGE
net/ipv6/route.c | 1 +
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/tcp_ecmp_failover.sh | 209 ++++++++++++++++++
3 files changed, 211 insertions(+)
create mode 100755 tools/testing/selftests/net/tcp_ecmp_failover.sh
--
2.54.0.545.g6539524ca2-goog
* [PATCH net v1 1/2] ipv6: update route serial number on NETDEV_CHANGE
2026-04-27 22:42 [PATCH net v1 0/2] ipv6: fix ECMP route failover on carrier loss Sagarika Sharma
@ 2026-04-27 22:42 ` Sagarika Sharma
2026-04-28 11:07 ` Ido Schimmel
2026-04-27 22:42 ` [PATCH net v1 2/2] selftest: net: Add test for TCP flow failover with ECMP routes Sagarika Sharma
1 sibling, 1 reply; 6+ messages in thread
From: Sagarika Sharma @ 2026-04-27 22:42 UTC
To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Shuah Khan, Simon Horman, Kuniyuki Iwashima, netdev,
linux-kselftest, Sagarika Sharma
When using IPv6 ECMP routes, if a netdev listed as a nexthop experiences
a carrier change event (e.g., a bond device generating a NETDEV_CHANGE
event after its slaves go linkdown), established connections utilizing
that nexthop fail to fail over to other available nexthops. Instead,
these connections stall or drop.
This happens because the IPv6 FIB code does not invalidate the socket's
cached destination when a NETDEV_CHANGE event occurs. While
fib6_ifdown() correctly marks the nexthop with RTNH_F_LINKDOWN, it
leaves the route's serial number unchanged. As a result, sockets with a
previously cached dst do not realize the route is no longer viable and
continue to try using the non-functional nexthop.
This behavior contrasts with IPv4, which actively flushes cached
destinations on a NETDEV_CHANGE event (see fib_netdev_event() in
net/ipv4/fib_frontend.c).
Fix this by updating the route serial number in fib6_ifdown() when
setting RTNH_F_LINKDOWN. This invalidates stale cached destinations,
forcing sockets to perform a new route lookup and fail over to a
functioning nexthop.
Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
---
net/ipv6/route.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 19eb6b702227..0dc0316530ca 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4995,6 +4995,7 @@ static int fib6_ifdown(struct fib6_info *rt, void *p_arg)
rt->fib6_flags & (RTF_LOCAL | RTF_ANYCAST))
break;
rt->fib6_nh->fib_nh_flags |= RTNH_F_LINKDOWN;
+ fib6_update_sernum(net, rt);
rt6_multipath_rebalance(rt);
break;
}
--
2.54.0.545.g6539524ca2-goog
* [PATCH net v1 2/2] selftest: net: Add test for TCP flow failover with ECMP routes.
2026-04-27 22:42 [PATCH net v1 0/2] ipv6: fix ECMP route failover on carrier loss Sagarika Sharma
2026-04-27 22:42 ` [PATCH net v1 1/2] ipv6: update route serial number on NETDEV_CHANGE Sagarika Sharma
@ 2026-04-27 22:42 ` Sagarika Sharma
2026-04-28 11:18 ` Ido Schimmel
1 sibling, 1 reply; 6+ messages in thread
From: Sagarika Sharma @ 2026-04-27 22:42 UTC
To: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Shuah Khan, Simon Horman, Kuniyuki Iwashima, netdev,
linux-kselftest, Sagarika Sharma
From: Kuniyuki Iwashima <kuniyu@google.com>
Without the previous commit, TCP failed to switch to alternative
IPv6 routes immediately upon carrier loss.
It would persist with the dead route until reaching the threshold
net.ipv4.tcp_retries1, leading to unnecessary delays in failover.
Let's add a selftest for this scenario to ensure TCP fails over
immediately upon a carrier loss event.
Before:
TEST: TCP IPv4 failover [ OK ]
TEST: TCP IPv6 failover [FAIL]
After:
TEST: TCP IPv4 failover [ OK ]
TEST: TCP IPv6 failover [ OK ]
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
---
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/tcp_ecmp_failover.sh | 209 ++++++++++++++++++
2 files changed, 210 insertions(+)
create mode 100755 tools/testing/selftests/net/tcp_ecmp_failover.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584026..f3da38c54d27 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -96,6 +96,7 @@ TEST_PROGS := \
srv6_hl2encap_red_l2vpn_test.sh \
srv6_iptunnel_cache.sh \
stress_reuseport_listen.sh \
+ tcp_ecmp_failover.sh \
tcp_fastopen_backup_key.sh \
test_bpf.sh \
test_bridge_backup_port.sh \
diff --git a/tools/testing/selftests/net/tcp_ecmp_failover.sh b/tools/testing/selftests/net/tcp_ecmp_failover.sh
new file mode 100755
index 000000000000..f857d5db84d8
--- /dev/null
+++ b/tools/testing/selftests/net/tcp_ecmp_failover.sh
@@ -0,0 +1,209 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2026 Google LLC.
+#
+# This test verifies TCP flow failover between ECMP routes
+# upon carrier loss on the active device.
+#
+# socat -----------------------------> socat
+# |
+# .-- veth-c1 -|- veth-s1 --.
+# dummy0 -| | |-- dummy0
+# '-- veth-c2 -|- veth-s2 --'
+# |
+#
+
+REQUIRE_JQ=no
+REQUIRE_MZ=no
+NUM_NETIFS=0
+
+source forwarding/lib.sh
+
+CLIENT_IP="10.0.59.1"
+SERVER_IP="10.0.92.1"
+CLIENT_IP6="2001:db8:5a9a::1"
+SERVER_IP6="2001:db8:9292::1"
+
+setup_server()
+{
+ IP="ip -n $server"
+ NS_EXEC="ip netns exec $server"
+
+ $IP link add dummy0 type dummy
+ $IP link set dummy0 up
+
+ $IP -4 addr add $SERVER_IP/32 dev dummy0
+ $IP -6 addr add $SERVER_IP6/128 dev dummy0 nodad
+
+ $IP link set veth-s1 up
+ $IP link set veth-s2 up
+
+ $IP -4 addr add 192.168.1.2/24 dev veth-s1
+ $IP -4 addr add 192.168.2.2/24 dev veth-s2
+
+ $IP -4 route add $CLIENT_IP/32 \
+ nexthop via 192.168.1.1 dev veth-s1 weight 1 \
+ nexthop via 192.168.2.1 dev veth-s2 weight 1
+
+ $IP -6 addr add 2001:db8:1::2/64 dev veth-s1 nodad
+ $IP -6 addr add 2001:db8:2::2/64 dev veth-s2 nodad
+
+ $IP -6 route add $CLIENT_IP6/128 \
+ nexthop via 2001:db8:1::1 dev veth-s1 weight 1 \
+ nexthop via 2001:db8:2::1 dev veth-s2 weight 1
+}
+
+setup_client()
+{
+ IP="ip -n $client"
+ NS_EXEC="ip netns exec $client"
+
+ $IP link add dummy0 type dummy
+ $IP link set dummy0 up
+
+ $IP -4 addr add $CLIENT_IP/32 dev dummy0
+ $IP -6 addr add $CLIENT_IP6/128 dev dummy0 nodad
+
+ $IP link set veth-c1 up
+ $IP link set veth-c2 up
+
+ $IP -4 addr add 192.168.1.1/24 dev veth-c1
+ $IP -4 addr add 192.168.2.1/24 dev veth-c2
+
+ $IP -4 route add $SERVER_IP/32 \
+ nexthop via 192.168.1.2 dev veth-c1 weight 1 \
+ nexthop via 192.168.2.2 dev veth-c2 weight 1
+
+ $IP -6 addr add 2001:db8:1::1/64 dev veth-c1 nodad
+ $IP -6 addr add 2001:db8:2::1/64 dev veth-c2 nodad
+
+ $IP -6 route add $SERVER_IP6/128 \
+ nexthop via 2001:db8:1::2 dev veth-c1 weight 1 \
+ nexthop via 2001:db8:2::2 dev veth-c2 weight 1
+
+ # By default, tcp_retries1=3 triggers a route refresh
+ # after 3 retransmits (~5s). Ensure this never occurs
+ # for test stability.
+ $NS_EXEC sysctl -qw net.ipv4.tcp_retries1=100
+
+ # When NETDEV_CHANGE is issued for a dev tied to an ECMP
+ # route, RTNH_F_LINKDOWN is flagged and the sernum is
+ # bumped to invalidate the route via sk_dst_check().
+ #
+ # Without ignore_routes_with_linkdown=1, subsequent
+ # lookups may still select the same RTNH_F_LINKDOWN route.
+ $NS_EXEC sysctl -qw net.ipv4.conf.veth-c1.ignore_routes_with_linkdown=1
+ $NS_EXEC sysctl -qw net.ipv4.conf.veth-c2.ignore_routes_with_linkdown=1
+
+ $NS_EXEC sysctl -qw net.ipv6.conf.veth-c1.ignore_routes_with_linkdown=1
+ $NS_EXEC sysctl -qw net.ipv6.conf.veth-c2.ignore_routes_with_linkdown=1
+}
+
+setup()
+{
+ setup_ns client server
+
+ ip -n $client link add veth-c1 type veth peer veth-s1 netns $server
+ ip -n $client link add veth-c2 type veth peer veth-s2 netns $server
+
+ setup_server
+ setup_client
+}
+
+cleanup()
+{
+ cleanup_all_ns
+}
+
+tcp_ecmp_failover()
+{
+ local pf=$1; shift
+ local server_ip=$1; shift
+ local client_ip=$1; shift
+
+ RET=0
+
+ tcpdump_start veth-s1 $server
+ tcpdump_start veth-s2 $server
+
+ ip netns exec $server \
+ socat -u TCP-LISTEN:8080,pf=$pf,bind=$server_ip,reuseaddr /dev/null &
+ server_pid=$!
+
+ # Wait for server to start listening.
+ # Sometimes client fails without this sleep.
+ sleep 1
+
+ ip netns exec $client \
+ socat -u /dev/zero TCP:$server_ip:8080,pf=$pf,bind=$client_ip &
+ client_pid=$!
+
+ # To capture enough packets.
+ sleep 3
+
+ tcpdump_stop veth-s1
+ tcpdump_stop veth-s2
+
+ pkts_s1=$(tcpdump_show veth-s1 | wc -l)
+ pkts_s2=$(tcpdump_show veth-s2 | wc -l)
+
+ tcpdump_cleanup veth-s1
+ tcpdump_cleanup veth-s2
+
+ # Detect the device chosen by the client
+ if [ $pkts_s1 -gt $pkts_s2 ]; then
+ veth_down=veth-s1
+ veth_up=veth-s2
+ else
+ veth_down=veth-s2
+ veth_up=veth-s1
+ fi
+
+ # Taking down $veth_down causes its peer to lose carrier,
+ # triggering NETDEV_CHANGE. This flags RTNH_F_LINKDOWN
+ # and bumps the sernum for the route associated with that
+ # peer, invalidating the cached dst in the TCP socket.
+ #
+ # Consequently, sk_dst_check() fails, forcing the subsequent
+ # lookup to select the remaining healthy route via $veth_up.
+ ip -n $server link set $veth_down down
+
+ tcpdump_start $veth_up $server
+
+ # To capture enough packets.
+ sleep 3
+
+ tcpdump_stop $veth_up
+
+ kill -9 $client_pid > /dev/null 2>&1
+ kill -9 $server_pid > /dev/null 2>&1
+ wait 2> /dev/null
+
+ pkts=$(tcpdump_show $veth_up | wc -l)
+
+ tcpdump_cleanup $veth_up
+
+ if [ $pkts -lt 10000 ]; then
+ RET=$ksft_fail
+ fi
+}
+
+test_ipv4()
+{
+ setup
+ tcp_ecmp_failover IPv4 $SERVER_IP $CLIENT_IP
+ log_test "TCP IPv4 failover"
+ cleanup
+}
+
+test_ipv6()
+{
+ setup
+ tcp_ecmp_failover IPv6 "[$SERVER_IP6]" "[$CLIENT_IP6]"
+ log_test "TCP IPv6 failover"
+ cleanup
+}
+
+test_ipv4
+test_ipv6
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH net v1 1/2] ipv6: update route serial number on NETDEV_CHANGE
2026-04-27 22:42 ` [PATCH net v1 1/2] ipv6: update route serial number on NETDEV_CHANGE Sagarika Sharma
@ 2026-04-28 11:07 ` Ido Schimmel
2026-04-28 11:21 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Ido Schimmel @ 2026-04-28 11:07 UTC
To: Sagarika Sharma
Cc: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Shuah Khan, Simon Horman, Kuniyuki Iwashima, netdev,
linux-kselftest
On Mon, Apr 27, 2026 at 10:42:22PM +0000, Sagarika Sharma wrote:
> When using IPv6 ECMP routes, if a netdev listed as a nexthop experiences
> a carrier change event (e.g., a bond device generating a NETDEV_CHANGE
> event after its slaves go linkdown), established connections utilizing
> that nexthop fail to fail over to other available nexthops. Instead,
> these connections stall or drop.
>
> This happens because the IPv6 FIB code does not invalidate the socket's
> cached destination when a NETDEV_CHANGE event occurs. While
> fib6_ifdown() correctly marks the nexthop with RTNH_F_LINKDOWN, it
> leaves the route's serial number unchanged. As a result, sockets with a
> previously cached dst do not realize the route is no longer viable and
> continue to try using the non-functional nexthop.
>
> This behavior contrasts with IPv4, which actively flushes cached
> destinations on a NETDEV_CHANGE event (see fib_netdev_event() in
> net/ipv4/fib_frontend.c).
>
> Fix this by updating the route serial number in fib6_ifdown() when
> setting RTNH_F_LINKDOWN. This invalidates stale cached destinations,
> forcing sockets to perform a new route lookup and fail over to a
> functioning nexthop.
>
> Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
* Re: [PATCH net v1 2/2] selftest: net: Add test for TCP flow failover with ECMP routes.
2026-04-27 22:42 ` [PATCH net v1 2/2] selftest: net: Add test for TCP flow failover with ECMP routes Sagarika Sharma
@ 2026-04-28 11:18 ` Ido Schimmel
0 siblings, 0 replies; 6+ messages in thread
From: Ido Schimmel @ 2026-04-28 11:18 UTC
To: Sagarika Sharma
Cc: David S . Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Shuah Khan, Simon Horman, Kuniyuki Iwashima, netdev,
linux-kselftest
On Mon, Apr 27, 2026 at 10:42:23PM +0000, Sagarika Sharma wrote:
> From: Kuniyuki Iwashima <kuniyu@google.com>
>
> Without the previous commit, TCP failed to switch to alternative
> IPv6 routes immediately upon carrier loss.
>
> It would persist with the dead route until reaching the threshold
> net.ipv4.tcp_retries1, leading to unnecessary delays in failover.
>
> Let's add a selftest for this scenario to ensure TCP fails over
> immediately upon a carrier loss event.
>
> Before:
> TEST: TCP IPv4 failover [ OK ]
> TEST: TCP IPv6 failover [FAIL]
>
> After:
> TEST: TCP IPv4 failover [ OK ]
> TEST: TCP IPv6 failover [ OK ]
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
> Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
Thanks for the test. LGTM. A couple of nits below.
[...]
> diff --git a/tools/testing/selftests/net/tcp_ecmp_failover.sh b/tools/testing/selftests/net/tcp_ecmp_failover.sh
> new file mode 100755
> index 000000000000..f857d5db84d8
> --- /dev/null
> +++ b/tools/testing/selftests/net/tcp_ecmp_failover.sh
[...]
> +
> +test_ipv4
> +test_ipv6
Maybe squash something like [1]? I ran the test without the first patch
and I get:
# ./tcp_ecmp_failover.sh
TEST: TCP IPv4 failover [ OK ]
TEST: TCP IPv6 failover [FAIL]
# echo $?
0
[1]
diff --git a/tools/testing/selftests/net/tcp_ecmp_failover.sh b/tools/testing/selftests/net/tcp_ecmp_failover.sh
index f857d5db84d8..8b7a2d82c442 100755
--- a/tools/testing/selftests/net/tcp_ecmp_failover.sh
+++ b/tools/testing/selftests/net/tcp_ecmp_failover.sh
@@ -205,5 +205,10 @@ test_ipv6()
cleanup
}
+require_command socat
+require_command tcpdump
+
test_ipv4
test_ipv6
+
+exit "$EXIT_STATUS"
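
The exit status problem shown above can be reproduced with a small
standalone sketch. The helper below is a hypothetical, simplified
stand-in for the selftest logging helper, not the real lib.sh code; only
the EXIT_STATUS and RET names are taken from this thread.

```shell
#!/bin/bash
# Hypothetical, simplified stand-ins for the selftest helpers (not lib.sh).
ksft_pass=0
ksft_fail=1
EXIT_STATUS=$ksft_pass

log_test_sketch()
{
	local name=$1

	if [ "$RET" -ne 0 ]; then
		# Remember that at least one test failed.
		EXIT_STATUS=$ksft_fail
		printf 'TEST: %-30s [FAIL]\n' "$name"
	else
		printf 'TEST: %-30s [ OK ]\n' "$name"
	fi
}

run_tests()
{
	RET=$ksft_pass; log_test_sketch "TCP IPv4 failover"
	RET=$ksft_fail; log_test_sketch "TCP IPv6 failover"
}

run_tests
# Without an explicit exit using the accumulated status, a script ends
# with the status of its last command, which is 0 here despite the FAIL.
echo "last command status: $?, accumulated status: $EXIT_STATUS"
```

Because the last command of the final test function succeeds, the shell
reports 0 unless the script ends by exiting with the accumulated status.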
* Re: [PATCH net v1 1/2] ipv6: update route serial number on NETDEV_CHANGE
2026-04-28 11:07 ` Ido Schimmel
@ 2026-04-28 11:21 ` Eric Dumazet
0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2026-04-28 11:21 UTC
To: Ido Schimmel
Cc: Sagarika Sharma, David S . Miller, David Ahern, Jakub Kicinski,
Paolo Abeni, Shuah Khan, Simon Horman, Kuniyuki Iwashima, netdev,
linux-kselftest
On Tue, Apr 28, 2026 at 4:07 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> On Mon, Apr 27, 2026 at 10:42:22PM +0000, Sagarika Sharma wrote:
> > When using IPv6 ECMP routes, if a netdev listed as a nexthop experiences
> > a carrier change event (e.g., a bond device generating a NETDEV_CHANGE
> > event after its slaves go linkdown), established connections utilizing
> > that nexthop fail to fail over to other available nexthops. Instead,
> > these connections stall or drop.
> >
> > This happens because the IPv6 FIB code does not invalidate the socket's
> > cached destination when a NETDEV_CHANGE event occurs. While
> > fib6_ifdown() correctly marks the nexthop with RTNH_F_LINKDOWN, it
> > leaves the route's serial number unchanged. As a result, sockets with a
> > previously cached dst do not realize the route is no longer viable and
> > continue to try using the non-functional nexthop.
> >
> > This behavior contrasts with IPv4, which actively flushes cached
> > destinations on a NETDEV_CHANGE event (see fib_netdev_event() in
> > net/ipv4/fib_frontend.c).
> >
> > Fix this by updating the route serial number in fib6_ifdown() when
> > setting RTNH_F_LINKDOWN. This invalidates stale cached destinations,
> > forcing sockets to perform a new route lookup and fail over to a
> > functioning nexthop.
> >
> > Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
> > Signed-off-by: Sagarika Sharma <sharmasagarika@google.com>
> > Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks!