* [PATCH net v3 1/2] net: fib: restore ECMP balance from loopback
@ 2025-12-21 19:26 Vadim Fedorenko
From: Vadim Fedorenko @ 2025-12-21 19:26 UTC
To: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Simon Horman, Willem de Bruijn, Jakub Kicinski
Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko
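Preferring the nexthop whose source address matches the flow's source
address broke ECMP for packets whose source address is not in any of
the nexthops' broadcast domains but is instead assigned to a
loopback/dummy interface. The original behaviour was to balance across
the nexthops, while now the last nexthop of the group is always used.
To fix the issue, introduce a nexthop scoring system in which nexthops
whose source address equals the requested one always have higher
priority: a source address match adds 2 to the score and a flow hash
within the nexthop's upper bound adds 1, so the hash still balances
flows when no nexthop matches the source address.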
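A minimal C model of the scoring rule, assuming a simplified nexthop that carries only a source address and a hash upper bound. `struct nh_model` and `select_nh()` are hypothetical names, not kernel API, and the kernel's `__be32` address is modeled as a host-order `uint32_t`:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop: source address and hash upper bound. */
struct nh_model {
	uint32_t saddr;		/* nexthop source address, host byte order */
	int upper_bound;	/* hash values <= this bound map to the nexthop */
};

/*
 * Model of the patched loop in fib_select_multipath(). All nexthops are
 * assumed usable (the carrier and neighbour checks are left out).
 * Returns the index of the selected nexthop.
 */
static int select_nh(const struct nh_model *nhs, int n, uint32_t saddr,
		     int hash)
{
	int best = -1, score = -1;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		int nh_score = 0;

		/* A source address match (2) outweighs the hash range (1). */
		if (saddr && nhs[i].saddr == saddr)
			nh_score += 2;
		if (hash <= nhs[i].upper_bound)
			nh_score++;
		/* Strict '<': the first nexthop wins among equal scores. */
		if (score < nh_score) {
			best = i;
			/* Highest reachable score: no later nexthop can beat it. */
			if (nh_score == 3 || (!saddr && nh_score == 1))
				return best;
			score = nh_score;
		}
	}
	return best;
}
```

Because upper bounds grow along the group, every nexthop from the hash-selected one onward satisfies `hash <= upper_bound`. The strict comparison keeps the first of them, which is the hash-selected one. With made-up ranges 0..99 and 100..199, a flow from a dummy address picks index 0 for hash 50 and index 1 for hash 150.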

For example, with 198.51.100.1/32 assigned to dummy0 and routed via the
192.0.2.0/24 and 203.0.113.0/24 networks:

2: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d6:54:8a:ff:78:f5 brd ff:ff:ff:ff:ff:ff
    inet 198.51.100.1/32 scope global dummy0
       valid_lft forever preferred_lft forever
7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:ed:98:87:6d:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.0.2.2/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4ed:98ff:fe87:6d8a/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
9: veth3@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:75:23:38:a0:d2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 203.0.113.2/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::ac75:23ff:fe38:a0d2/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever

~ ip ro list:
default
	nexthop via 192.0.2.1 dev veth1 weight 1
	nexthop via 203.0.113.1 dev veth3 weight 1
192.0.2.0/24 dev veth1 proto kernel scope link src 192.0.2.2
203.0.113.0/24 dev veth3 proto kernel scope link src 203.0.113.2

before:
for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    255 veth3

after:
for i in {1..255} ; do ip ro get 10.0.0.$i; done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    122 veth1
    133 veth3
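For comparison, here is a sketch of the pre-fix loop (the lines the diff removes), using the same kind of hypothetical `struct nh_model`. `select_nh_old()` is an illustrative name, not kernel code. It shows why the "before" run reported only veth3. When the source address matches no nexthop, `found` never becomes true, so the fallback keeps advancing and the loop ends on the last nexthop whatever the hash is:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop: source address and hash upper bound. */
struct nh_model {
	uint32_t saddr;		/* nexthop source address, host byte order */
	int upper_bound;	/* hash values <= this bound map to the nexthop */
};

/*
 * Model of the pre-fix loop: all nexthops are assumed usable, i.e. the
 * carrier and fib_multipath_use_neighbor checks are left out.
 */
static int select_nh_old(const struct nh_model *nhs, int n, uint32_t saddr,
			 int hash)
{
	bool found = false;
	int sel = -1;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		bool match = !saddr || nhs[i].saddr == saddr;

		/* Fallback choice: keeps moving until a matching nexthop is seen. */
		if (!found) {
			sel = i;
			found = match;
		}
		if (hash > nhs[i].upper_bound)
			continue;
		/* Hash lands here and the source matches (or is unset): take it. */
		if (match)
			return i;
		/* Hash lands here but an earlier nexthop matched the source. */
		if (found)
			return sel;
	}
	/* No return fired: if no nexthop matched, sel is the last one. */
	return sel;
}
```

With a source address from the dummy interface, every hash selects the last nexthop. With no source address, the hash alone still decides.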
Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
---
v2 -> v3:
- add an early break once a next hop with the highest possible score is
  found (Ido)
v1 -> v2:
- add a score calculation for nexthops to keep the original logic
- adjust the commit message to explain the config
- use a dummy device instead of loopback
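The v3 early break is safe because 3 is the highest score a nexthop can reach when the flow has a source address, and 1 is the highest when it does not. No later nexthop can displace the one just selected. The following is a small exhaustive check of that argument, written against a hypothetical model. `struct nh_facts`, `select_full_scan()` and `select_early_break()` are illustrative names, not kernel code. The check compares the early-return loop with a full scan that keeps the first top-scoring nexthop:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-nexthop facts that the score depends on. */
struct nh_facts {
	bool saddr_match;	/* flow saddr is set and equals the nexthop's */
	bool in_range;		/* flow hash <= nexthop upper bound */
};

static int score_of(const struct nh_facts *f)
{
	return (f->saddr_match ? 2 : 0) + (f->in_range ? 1 : 0);
}

/* Reference: scan all nexthops, keep the first one with the top score. */
static int select_full_scan(const struct nh_facts *f, int n)
{
	int best = -1, score = -1;

	for (int i = 0; i < n; i++) {
		int s = score_of(&f[i]);

		if (score < s) {
			best = i;
			score = s;
		}
	}
	return best;
}

/* v3 variant: return as soon as the highest reachable score is seen. */
static int select_early_break(const struct nh_facts *f, int n, bool saddr_set)
{
	int best = -1, score = -1;

	for (int i = 0; i < n; i++) {
		int s = score_of(&f[i]);

		if (score < s) {
			best = i;
			if (s == 3 || (!saddr_set && s == 1))
				return best;
			score = s;
		}
	}
	return best;
}

/*
 * Compare both variants over every combination of facts for 1..max_n
 * nexthops. Without a flow source address no nexthop can match, so
 * saddr_match is forced to false in that case.
 */
static bool early_break_is_equivalent(int max_n)
{
	assert(max_n >= 1 && max_n <= 8);
	for (int n = 1; n <= max_n; n++) {
		for (int saddr_set = 0; saddr_set <= 1; saddr_set++) {
			for (unsigned int bits = 0; bits < (1u << (2 * n)); bits++) {
				struct nh_facts f[8];

				for (int i = 0; i < n; i++) {
					f[i].saddr_match = saddr_set &&
							   ((bits >> (2 * i)) & 1);
					f[i].in_range = (bits >> (2 * i + 1)) & 1;
				}
				if (select_full_scan(f, n) !=
				    select_early_break(f, n, saddr_set))
					return false;
			}
		}
	}
	return true;
}
```

The enumeration also covers unrealistic inputs, such as in-range flags that are not monotonic along the group. Including them only makes the check stricter.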
---
net/ipv4/fib_semantics.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index a5f3c8459758..0caf38e44c73 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -2167,8 +2167,8 @@ void fib_select_multipath(struct fib_result *res, int hash,
 {
 	struct fib_info *fi = res->fi;
 	struct net *net = fi->fib_net;
-	bool found = false;
 	bool use_neigh;
+	int score = -1;
 	__be32 saddr;

 	if (unlikely(res->fi->nh)) {
@@ -2180,7 +2180,7 @@ void fib_select_multipath(struct fib_result *res, int hash,
 	saddr = fl4 ? fl4->saddr : 0;

 	change_nexthops(fi) {
-		int nh_upper_bound;
+		int nh_upper_bound, nh_score = 0;

 		/* Nexthops without a carrier are assigned an upper bound of
 		 * minus one when "ignore_routes_with_linkdown" is set.
@@ -2190,24 +2190,18 @@ void fib_select_multipath(struct fib_result *res, int hash,
 		    (use_neigh && !fib_good_nh(nexthop_nh)))
 			continue;

-		if (!found) {
+		if (saddr && nexthop_nh->nh_saddr == saddr)
+			nh_score += 2;
+		if (hash <= nh_upper_bound)
+			nh_score++;
+		if (score < nh_score) {
 			res->nh_sel = nhsel;
 			res->nhc = &nexthop_nh->nh_common;
-			found = !saddr || nexthop_nh->nh_saddr == saddr;
+			if (nh_score == 3 || (!saddr && nh_score == 1))
+				return;
+			score = nh_score;
 		}

-		if (hash > nh_upper_bound)
-			continue;
-
-		if (!saddr || nexthop_nh->nh_saddr == saddr) {
-			res->nh_sel = nhsel;
-			res->nhc = &nexthop_nh->nh_common;
-			return;
-		}
-
-		if (found)
-			return;
-
 	} endfor_nexthops(fi);
 }
 #endif
--
2.47.3
* [PATCH net v3 2/2] selftests: fib_test: Add test case for ipv4 multi nexthops
From: Vadim Fedorenko @ 2025-12-21 19:26 UTC
To: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Simon Horman, Willem de Bruijn, Jakub Kicinski
Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko
The test checks that, for a route with multiple nexthops, the preferred
nexthop is the one whose source IP matches the requested source. When
the source IP is on a dummy interface, it checks that the routes are
balanced across the nexthops.
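The pass criterion of `ipv4_mpath_preferred()` can be restated as a small C predicate. `struct route_counts` and `mpath_verdict()` are hypothetical names that mirror the shell counters `pref_route`, `route0` and `route1`. With a preferred device, every lookup must use it. Without one (the dummy source), every lookup must land on veth1 or veth3, and both must be used at least once:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the counters kept by ipv4_mpath_preferred(). */
struct route_counts {
	int pref_route;	/* lookups that returned the preferred device */
	int route0;	/* other lookups via veth1 */
	int route1;	/* other lookups via veth3 */
};

static bool mpath_verdict(const struct route_counts *c, bool have_pref_dev,
			  int num_routes)
{
	assert(num_routes > 0);
	/* Dummy source: all lookups hit a veth and both veths are used. */
	if (!have_pref_dev)
		return c->route0 + c->route1 >= num_routes &&
		       c->route0 > 0 && c->route1 > 0;
	/* Matching source: all lookups must use the preferred device. */
	return c->pref_route >= num_routes;
}
```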
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
---
tools/testing/selftests/net/fib_tests.sh | 70 +++++++++++++++++++++++-
1 file changed, 69 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index a88f797c549a..c5694cc4ddd2 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -12,7 +12,7 @@ TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify \
ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr \
ipv6_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh fib6_gc_test \
ipv4_mpath_list ipv6_mpath_list ipv4_mpath_balance ipv6_mpath_balance \
- fib6_ra_to_static"
+ ipv4_mpath_balance_preferred fib6_ra_to_static"
VERBOSE=0
PAUSE_ON_FAIL=no
@@ -2751,6 +2751,73 @@ ipv4_mpath_balance_test()
 	forwarding_cleanup
 }

+get_route_dev_src()
+{
+	local pfx="$1"
+	local src="$2"
+	local out
+
+	if out=$($IP -j route get "$pfx" from "$src" | jq -re ".[0].dev"); then
+		echo "$out"
+	fi
+}
+
+ipv4_mpath_preferred()
+{
+	local src_ip=$1
+	local pref_dev=$2
+	local dev routes
+	local route0=0
+	local route1=0
+	local pref_route=0
+	num_routes=254
+
+	for i in $(seq 1 $num_routes) ; do
+		dev=$(get_route_dev_src 172.16.105.$i $src_ip)
+		if [ "$dev" = "$pref_dev" ]; then
+			pref_route=$((pref_route+1))
+		elif [ "$dev" = "veth1" ]; then
+			route0=$((route0+1))
+		elif [ "$dev" = "veth3" ]; then
+			route1=$((route1+1))
+		fi
+	done
+
+	routes=$((route0+route1))
+
+	[ "$VERBOSE" = "1" ] && echo "multipath: routes seen: ($route0,$route1,$pref_route)"
+
+	if [ x"$pref_dev" = x"" ]; then
+		[[ $routes -ge $num_routes ]] && [[ $route0 -gt 0 ]] && [[ $route1 -gt 0 ]]
+	else
+		[[ $pref_route -ge $num_routes ]]
+	fi
+
+}
+
+ipv4_mpath_balance_preferred_test()
+{
+	echo
+	echo "IPv4 multipath load balance preferred route"
+
+	forwarding_setup
+
+	$IP route add 172.16.105.0/24 \
+		nexthop via 172.16.101.2 \
+		nexthop via 172.16.103.2
+
+	ipv4_mpath_preferred 172.16.101.1 veth1
+	log_test $? 0 "IPv4 multipath loadbalance from veth1"
+
+	ipv4_mpath_preferred 172.16.103.1 veth3
+	log_test $? 0 "IPv4 multipath loadbalance from veth3"
+
+	ipv4_mpath_preferred 198.51.100.1
+	log_test $? 0 "IPv4 multipath loadbalance from dummy"
+
+	forwarding_cleanup
+}
+
 ipv6_mpath_balance_test()
 {
 	echo
@@ -2861,6 +2928,7 @@ do
ipv6_mpath_list) ipv6_mpath_list_test;;
ipv4_mpath_balance) ipv4_mpath_balance_test;;
ipv6_mpath_balance) ipv6_mpath_balance_test;;
+ ipv4_mpath_balance_preferred) ipv4_mpath_balance_preferred_test;;
fib6_ra_to_static) fib6_ra_to_static;;
help) echo "Test names: $TESTS"; exit 0;;
--
2.47.3
* Re: [PATCH net v3 1/2] net: fib: restore ECMP balance from loopback
From: Ido Schimmel @ 2025-12-22 8:18 UTC
To: Vadim Fedorenko
Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
Simon Horman, Willem de Bruijn, Jakub Kicinski, Shuah Khan,
netdev
On Sun, Dec 21, 2025 at 07:26:38PM +0000, Vadim Fedorenko wrote:
> Preference of nexthop with source address broke ECMP for packets with
> source addresses which are not in the broadcast domain, but rather added
> to loopback/dummy interfaces. Original behaviour was to balance over
> nexthops while now it uses the latest nexthop from the group. To fix the
> issue introduce next hop scoring system where next hops with source
> address equal to requested will always have higher priority.
[...]
> Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
* Re: [PATCH net v3 1/2] net: fib: restore ECMP balance from loopback
From: Willem de Bruijn @ 2025-12-22 14:05 UTC
To: Vadim Fedorenko, David S. Miller, David Ahern, Eric Dumazet,
Paolo Abeni, Simon Horman, Willem de Bruijn, Jakub Kicinski
Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko
Vadim Fedorenko wrote:
> Preference of nexthop with source address broke ECMP for packets with
> source addresses which are not in the broadcast domain, but rather added
> to loopback/dummy interfaces. Original behaviour was to balance over
> nexthops while now it uses the latest nexthop from the group. To fix the
> issue introduce next hop scoring system where next hops with source
> address equal to requested will always have higher priority.
[...]
> Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Willem de Bruijn <willemb@google.com>
* Re: [PATCH net v3 1/2] net: fib: restore ECMP balance from loopback
From: patchwork-bot+netdevbpf @ 2025-12-30 10:10 UTC
To: Vadim Fedorenko
Cc: davem, dsahern, edumazet, pabeni, horms, willemb, kuba, shuah,
idosch, netdev
Hello:
This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Sun, 21 Dec 2025 19:26:38 +0000 you wrote:
> Preference of nexthop with source address broke ECMP for packets with
> source addresses which are not in the broadcast domain, but rather added
> to loopback/dummy interfaces. Original behaviour was to balance over
> nexthops while now it uses the latest nexthop from the group. To fix the
> issue introduce next hop scoring system where next hops with source
> address equal to requested will always have higher priority.
>
> [...]
Here is the summary with links:
- [net,v3,1/2] net: fib: restore ECMP balance from loopback
https://git.kernel.org/netdev/net/c/6e17474aa9fe
- [net,v3,2/2] selftests: fib_test: Add test case for ipv4 multi nexthops
https://git.kernel.org/netdev/net/c/3be42c3b3d43
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html