netdev.vger.kernel.org archive mirror
* [PATCH net 1/2] net: fib: restore ECMP balance from loopback
@ 2025-12-13 13:58 Vadim Fedorenko
  2025-12-13 13:58 ` [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops Vadim Fedorenko
  2025-12-13 20:54 ` [PATCH net 1/2] net: fib: restore ECMP balance from loopback Willem de Bruijn
  0 siblings, 2 replies; 11+ messages in thread
From: Vadim Fedorenko @ 2025-12-13 13:58 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko

Preferring the nexthop whose source address matches the flow broke ECMP
for packets whose source address is assigned to the loopback interface.
The original behaviour was to balance across the nexthops; now the last
nexthop in the group is always used.

For the case with 198.51.100.1/32 assigned to lo:

before:
   done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    255 veth3

after:
   done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
    122 veth1
    133 veth3

Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
---
 net/ipv4/fib_semantics.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index a5f3c8459758..c54b4ad9c280 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -2165,9 +2165,9 @@ static bool fib_good_nh(const struct fib_nh *nh)
 void fib_select_multipath(struct fib_result *res, int hash,
 			  const struct flowi4 *fl4)
 {
+	bool first = false, found = false;
 	struct fib_info *fi = res->fi;
 	struct net *net = fi->fib_net;
-	bool found = false;
 	bool use_neigh;
 	__be32 saddr;
 
@@ -2190,23 +2190,24 @@ void fib_select_multipath(struct fib_result *res, int hash,
 		    (use_neigh && !fib_good_nh(nexthop_nh)))
 			continue;
 
-		if (!found) {
+		if (saddr && nexthop_nh->nh_saddr == saddr) {
 			res->nh_sel = nhsel;
 			res->nhc = &nexthop_nh->nh_common;
-			found = !saddr || nexthop_nh->nh_saddr == saddr;
+			return;
 		}
 
-		if (hash > nh_upper_bound)
-			continue;
-
-		if (!saddr || nexthop_nh->nh_saddr == saddr) {
+		if (!first) {
 			res->nh_sel = nhsel;
 			res->nhc = &nexthop_nh->nh_common;
-			return;
+			first = true;
 		}
 
-		if (found)
-			return;
+		if (found || hash > nh_upper_bound)
+			continue;
+
+		res->nh_sel = nhsel;
+		res->nhc = &nexthop_nh->nh_common;
+		found = true;
 
 	} endfor_nexthops(fi);
 }
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-13 13:58 [PATCH net 1/2] net: fib: restore ECMP balance from loopback Vadim Fedorenko
@ 2025-12-13 13:58 ` Vadim Fedorenko
  2025-12-13 21:18   ` Willem de Bruijn
  2025-12-15  6:59   ` Ido Schimmel
  2025-12-13 20:54 ` [PATCH net 1/2] net: fib: restore ECMP balance from loopback Willem de Bruijn
  1 sibling, 2 replies; 11+ messages in thread
From: Vadim Fedorenko @ 2025-12-13 13:58 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko

The test checks that, for a route with multiple nexthops, the preferred
nexthop is the one whose source address matches the source IP. When the
source IP is assigned to the loopback device, it checks that traffic is
balanced across the nexthops.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
---
 tools/testing/selftests/net/fib_nexthops.sh | 85 +++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index 2b0a90581e2f..9d6f57399a73 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -31,6 +31,7 @@ IPV4_TESTS="
 	ipv4_compat_mode
 	ipv4_fdb_grp_fcnal
 	ipv4_mpath_select
+	ipv4_mpath_select_nogrp
 	ipv4_torture
 	ipv4_res_torture
 "
@@ -375,6 +376,17 @@ check_large_res_grp()
 	log_test $? 0 "Dump large (x$buckets) nexthop buckets"
 }
 
+get_route_dev_src()
+{
+	local pfx="$1"
+	local src="$2"
+	local out
+
+	if out=$($IP -j route get "$pfx" from "$src" | jq -re ".[0].dev"); then
+		echo "$out"
+	fi
+}
+
 get_route_dev()
 {
 	local pfx="$1"
@@ -641,6 +653,79 @@ ipv4_fdb_grp_fcnal()
 	$IP link del dev vx10
 }
 
+ipv4_mpath_select_nogrp()
+{
+	local rc dev match h addr
+
+	echo
+	echo "IPv4 multipath selection no group"
+	echo "------------------------"
+	if [ ! -x "$(command -v jq)" ]; then
+		echo "SKIP: Could not run test; need jq tool"
+		return $ksft_skip
+	fi
+
+	IP="ip -netns $peer"
+	# Use status of existing neighbor entry when determining nexthop for
+	# multipath routes.
+	local -A gws
+	gws=([veth2]=172.16.1.1 [veth4]=172.16.2.1)
+	local -A other_dev
+	other_dev=([veth2]=veth4 [veth4]=veth2)
+	local -A local_ips
+	local_ips=([veth2]=172.16.1.2 [veth4]=172.16.2.2 [veth5]=172.16.100.1)
+	local -A route_devs
+	route_devs=([veth2]=0 [veth4]=0)
+
+	run_cmd "$IP address add 172.16.100.1/32 dev lo"
+	run_cmd "$IP ro add 172.16.102.0/24 nexthop via ${gws['veth2']} dev veth2 nexthop via ${gws['veth4']} dev veth4"
+	rc=0
+	for dev in veth2 veth4; do
+		match=0
+		from_ip="${local_ips[$dev]}"
+		for h in {1..254}; do
+			addr="172.16.102.$h"
+			if [ "$(get_route_dev_src "$addr" "$from_ip")" = "$dev" ]; then
+				match=1
+				break
+			fi
+		done
+		if (( match == 0 )); then
+			echo "SKIP: Did not find a route using device $dev"
+			return $ksft_skip
+		fi
+		run_cmd "$IP neigh add ${gws[$dev]} dev $dev nud failed"
+		if ! check_route_dev "$addr" "${other_dev[$dev]}"; then
+			rc=1
+			break
+		fi
+		run_cmd "$IP neigh del ${gws[$dev]} dev $dev"
+	done
+
+	log_test $rc 0 "Use valid neighbor during multipath selection"
+
+	from_ip="${local_ips["veth5"]}"
+	for h in {1..254}; do
+		addr="172.16.102.$h"
+		route_dev=$(get_route_dev_src "$addr" "$from_ip")
+		route_devs[$route_dev]=1
+	done
+	for dev in veth2 veth4; do
+		if [ ${route_devs[$dev]} -eq 0 ]; then
+			rc=1
+			break;
+		fi
+	done
+
+	log_test $rc 0 "Use both neighbors during multipath selection"
+
+	run_cmd "$IP neigh add 172.16.1.2 dev veth1 nud incomplete"
+	run_cmd "$IP neigh add 172.16.2.2 dev veth3 nud incomplete"
+	run_cmd "$IP route get 172.16.101.1"
+	# if we did not crash, success
+	log_test $rc 0 "Multipath selection with no valid neighbor"
+}
+
 ipv4_mpath_select()
 {
 	local rc dev match h addr
-- 
2.47.3



* Re: [PATCH net 1/2] net: fib: restore ECMP balance from loopback
  2025-12-13 13:58 [PATCH net 1/2] net: fib: restore ECMP balance from loopback Vadim Fedorenko
  2025-12-13 13:58 ` [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops Vadim Fedorenko
@ 2025-12-13 20:54 ` Willem de Bruijn
  2025-12-13 21:22   ` Vadim Fedorenko
  1 sibling, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2025-12-13 20:54 UTC (permalink / raw)
  To: Vadim Fedorenko, David S. Miller, David Ahern, Eric Dumazet,
	Paolo Abeni, Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko

Vadim Fedorenko wrote:
> Preference of nexthop with source address broke ECMP for packets with
> source address from loopback interface. Original behaviour was to
> balance over nexthops while now it uses the latest nexthop from the
> group.

How does the loopback device specifically come into this?

> 
> For the case with 198.51.100.1/32 assigned to lo:
> 
> before:
>    done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>     255 veth3
> 
> after:
>    done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>     122 veth1
>     133 veth3
> 
> Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> ---
>  net/ipv4/fib_semantics.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index a5f3c8459758..c54b4ad9c280 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -2165,9 +2165,9 @@ static bool fib_good_nh(const struct fib_nh *nh)
>  void fib_select_multipath(struct fib_result *res, int hash,
>  			  const struct flowi4 *fl4)
>  {
> +	bool first = false, found = false;
>  	struct fib_info *fi = res->fi;
>  	struct net *net = fi->fib_net;
> -	bool found = false;
>  	bool use_neigh;
>  	__be32 saddr;
>  
> @@ -2190,23 +2190,24 @@ void fib_select_multipath(struct fib_result *res, int hash,
>  		    (use_neigh && !fib_good_nh(nexthop_nh)))
>  			continue;
>  
> -		if (!found) {
> +		if (saddr && nexthop_nh->nh_saddr == saddr) {
>  			res->nh_sel = nhsel;
>  			res->nhc = &nexthop_nh->nh_common;
> -			found = !saddr || nexthop_nh->nh_saddr == saddr;
> +			return;

This can return a match that exceeds the upper bound, while better
matches may exist.

Perhaps what we want is the following:

1. if there are matches that match saddr, prefer those above others
   - take the first match, as with hash input that results in load
     balancing across flows
      
2. else, take any match
   - again, first fit

If no match below fib_nh_upper_bound is found, fall back to the first
fit that exceeds nh_upper_bound. Again, prefer the first fit of 1 if it
exists, else the first fit of 2.

If so then we need up to two concurrent stored options,
first_match_saddr and first.

Or alternatively use a score similar to inet listener lookup.

Since a new variable is added, I would rename found with
first_match_saddr or similar to document the intent.

>  		}
>  
> -		if (hash > nh_upper_bound)
> -			continue;
> -
> -		if (!saddr || nexthop_nh->nh_saddr == saddr) {
> +		if (!first) {
>  			res->nh_sel = nhsel;
>  			res->nhc = &nexthop_nh->nh_common;
> -			return;
> +			first = true;
>  		}
>  
> -		if (found)
> -			return;
> +		if (found || hash > nh_upper_bound)
> +			continue;
> +
> +		res->nh_sel = nhsel;
> +		res->nhc = &nexthop_nh->nh_common;
> +		found = true;
>  
>  	} endfor_nexthops(fi);
>  }
> -- 
> 2.47.3
> 




* Re: [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-13 13:58 ` [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops Vadim Fedorenko
@ 2025-12-13 21:18   ` Willem de Bruijn
  2025-12-13 21:26     ` Vadim Fedorenko
  2025-12-15  6:59   ` Ido Schimmel
  1 sibling, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2025-12-13 21:18 UTC (permalink / raw)
  To: Vadim Fedorenko, David S. Miller, David Ahern, Eric Dumazet,
	Paolo Abeni, Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev, Vadim Fedorenko

Vadim Fedorenko wrote:
> The test checks that with multi nexthops route the preferred route is the
> one which matches source ip. In case when source ip is on loopback, it
> checks that the routes are balanced.

are balanced [across .. ]

> 
> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> ---
>  tools/testing/selftests/net/fib_nexthops.sh | 85 +++++++++++++++++++++
>  1 file changed, 85 insertions(+)
> 
> diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
> index 2b0a90581e2f..9d6f57399a73 100755
> --- a/tools/testing/selftests/net/fib_nexthops.sh
> +++ b/tools/testing/selftests/net/fib_nexthops.sh
> @@ -31,6 +31,7 @@ IPV4_TESTS="
>  	ipv4_compat_mode
>  	ipv4_fdb_grp_fcnal
>  	ipv4_mpath_select
> +	ipv4_mpath_select_nogrp
>  	ipv4_torture
>  	ipv4_res_torture
>  "
> @@ -375,6 +376,17 @@ check_large_res_grp()
>  	log_test $? 0 "Dump large (x$buckets) nexthop buckets"
>  }
>  
> +get_route_dev_src()
> +{
> +	local pfx="$1"
> +	local src="$2"
> +	local out
> +
> +	if out=$($IP -j route get "$pfx" from "$src" | jq -re ".[0].dev"); then
> +		echo "$out"
> +	fi
> +}
> +
>  get_route_dev()
>  {
>  	local pfx="$1"
> @@ -641,6 +653,79 @@ ipv4_fdb_grp_fcnal()
>  	$IP link del dev vx10
>  }
>  
> +ipv4_mpath_select_nogrp()

There is more going on than just not using the group feature.

Would it make sense to split this into two test patches, a base test
and a follow-on that extends with the loopback special case?

> +{
> +	local rc dev match h addr
> +
> +	echo
> +	echo "IPv4 multipath selection no group"
> +	echo "------------------------"
> +	if [ ! -x "$(command -v jq)" ]; then
> +		echo "SKIP: Could not run test; need jq tool"
> +		return $ksft_skip
> +	fi
> +
> +	IP="ip -netns $peer"
> +	# Use status of existing neighbor entry when determining nexthop for
> +	# multipath routes.
> +	local -A gws
> +	gws=([veth2]=172.16.1.1 [veth4]=172.16.2.1)
> +	local -A other_dev
> +	other_dev=([veth2]=veth4 [veth4]=veth2)
> +	local -A local_ips
> +	local_ips=([veth2]=172.16.1.2 [veth4]=172.16.2.2 [veth5]=172.16.100.1)

Why do both loopback and veth5 exist with the same local IP? Can this just be lo?
> +	local -A route_devs
> +	route_devs=([veth2]=0 [veth4]=0)
> +
> +	run_cmd "$IP address add 172.16.100.1/32 dev lo"
> +	run_cmd "$IP ro add 172.16.102.0/24 nexthop via ${gws['veth2']} dev veth2 nexthop via ${gws['veth4']} dev veth4"
> +	rc=0
> +	for dev in veth2 veth4; do
> +		match=0
> +		from_ip="${local_ips[$dev]}"
> +		for h in {1..254}; do
> +			addr="172.16.102.$h"
> +			if [ "$(get_route_dev_src "$addr" "$from_ip")" = "$dev" ]; then
> +				match=1
> +				break
> +			fi
> +		done
> +		if (( match == 0 )); then
> +			echo "SKIP: Did not find a route using device $dev"
> +			return $ksft_skip
> +		fi
> +		run_cmd "$IP neigh add ${gws[$dev]} dev $dev nud failed"
> +		if ! check_route_dev "$addr" "${other_dev[$dev]}"; then
> +			rc=1
> +			break
> +		fi
> +		run_cmd "$IP neigh del ${gws[$dev]} dev $dev"
> +	done
> +
> +	log_test $rc 0 "Use valid neighbor during multipath selection"
> +
> +	from_ip="${local_ips["veth5"]}"
> +	for h in {1..254}; do
> +		addr="172.16.102.$h"
> +		route_dev=$(get_route_dev_src "$addr" "$from_ip")
> +		route_devs[$route_dev]=1
> +	done
> +	for dev in veth2 veth4; do
> +		if [ ${route_devs[$dev]} -eq 0 ]; then
> +			rc=1
> +			break;
> +		fi
> +	done
> +
> +	log_test $rc 0 "Use both neighbors during multipath selection"
> +
> +	run_cmd "$IP neigh add 172.16.1.2 dev veth1 nud incomplete"
> +	run_cmd "$IP neigh add 172.16.2.2 dev veth3 nud incomplete"
> +	run_cmd "$IP route get 172.16.101.1"
> +	# if we did not crash, success
> +	log_test $rc 0 "Multipath selection with no valid neighbor"
> +}
> +
>  ipv4_mpath_select()
>  {
>  	local rc dev match h addr
> -- 
> 2.47.3
> 




* Re: [PATCH net 1/2] net: fib: restore ECMP balance from loopback
  2025-12-13 20:54 ` [PATCH net 1/2] net: fib: restore ECMP balance from loopback Willem de Bruijn
@ 2025-12-13 21:22   ` Vadim Fedorenko
  2025-12-15 21:45     ` Willem de Bruijn
  0 siblings, 1 reply; 11+ messages in thread
From: Vadim Fedorenko @ 2025-12-13 21:22 UTC (permalink / raw)
  To: Willem de Bruijn, David S. Miller, David Ahern, Eric Dumazet,
	Paolo Abeni, Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev

On 13/12/2025 20:54, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> Preference of nexthop with source address broke ECMP for packets with
>> source address from loopback interface. Original behaviour was to
>> balance over nexthops while now it uses the latest nexthop from the
>> group.
> 
> How does the loopback device specifically come into this?

It may be a dummy device as well. The use case is when there are 2
physical interfaces and 1 service IP address, announced by a routing
protocol. The socket is bound to the service address, so that address
is used in route selection.
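
For context, a hypothetical setup matching that description (device
names and addresses are illustrative, not taken from the thread) could
look like:

```sh
# Illustrative only: service address on a dummy device plus an ECMP
# route over two uplinks -- the setup this fix targets.
ip link add svc0 type dummy
ip link set svc0 up
ip addr add 198.51.100.1/32 dev svc0   # service IP from the commit message

ip route add 203.0.113.0/24 \
    nexthop via 192.0.2.1 dev eth0 \
    nexthop via 192.0.2.129 dev eth1

# A socket bound to 198.51.100.1 puts that saddr into the route lookup,
# which is where the saddr preference in fib_select_multipath() applies.
ip route get 203.0.113.10 from 198.51.100.1
```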

> 
>>
>> For the case with 198.51.100.1/32 assigned to lo:
>>
>> before:
>>     done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>>      255 veth3
>>
>> after:
>>     done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>>      122 veth1
>>      133 veth3
>>
>> Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
>> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>> ---
>>   net/ipv4/fib_semantics.c | 21 +++++++++++----------
>>   1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
>> index a5f3c8459758..c54b4ad9c280 100644
>> --- a/net/ipv4/fib_semantics.c
>> +++ b/net/ipv4/fib_semantics.c
>> @@ -2165,9 +2165,9 @@ static bool fib_good_nh(const struct fib_nh *nh)
>>   void fib_select_multipath(struct fib_result *res, int hash,
>>   			  const struct flowi4 *fl4)
>>   {
>> +	bool first = false, found = false;
>>   	struct fib_info *fi = res->fi;
>>   	struct net *net = fi->fib_net;
>> -	bool found = false;
>>   	bool use_neigh;
>>   	__be32 saddr;
>>   
>> @@ -2190,23 +2190,24 @@ void fib_select_multipath(struct fib_result *res, int hash,
>>   		    (use_neigh && !fib_good_nh(nexthop_nh)))
>>   			continue;
>>   
>> -		if (!found) {
>> +		if (saddr && nexthop_nh->nh_saddr == saddr) {
>>   			res->nh_sel = nhsel;
>>   			res->nhc = &nexthop_nh->nh_common;
>> -			found = !saddr || nexthop_nh->nh_saddr == saddr;
>> +			return;
> 
> This can return a match that exceeds the upper bound, while better
> matches may exist.
> 
> Perhaps what we want is the following:
> 
> 1. if there are matches that match saddr, prefer those above others
>     - take the first match, as with hash input that results in load
>       balancing across flows
>        
> 2. else, take any match
>     - again, first fit
> 
> If no match below fib_nh_upper_bound is found, fall back to the first
> fit above that exceeds nh_upper_bound. Again, prefer first fit of 1 if
> it exists, else first fit of 2.

Oh, I see... in case there are 2 different nexthops with the same
saddr, we have to balance as well, but with this code it will stick to
only the first nexthop.

> 
> If so then we need up to two concurrent stored options,
> first_match_saddr and first.

That will require a few more assignments.

> Or alternatively use a score similar to inet listener lookup.

I'll check this option

> Since a new variable is added, I would rename found with
> first_match_saddr or similar to document the intent.

Ok.




* Re: [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-13 21:18   ` Willem de Bruijn
@ 2025-12-13 21:26     ` Vadim Fedorenko
  2025-12-13 22:02       ` David Ahern
  0 siblings, 1 reply; 11+ messages in thread
From: Vadim Fedorenko @ 2025-12-13 21:26 UTC (permalink / raw)
  To: Willem de Bruijn, David S. Miller, David Ahern, Eric Dumazet,
	Paolo Abeni, Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev

On 13/12/2025 21:18, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> The test checks that with multi nexthops route the preferred route is the
>> one which matches source ip. In case when source ip is on loopback, it
>> checks that the routes are balanced.
> 
> are balanced [across .. ]

Got it.

> 
>>
>> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>> ---
>>   tools/testing/selftests/net/fib_nexthops.sh | 85 +++++++++++++++++++++
>>   1 file changed, 85 insertions(+)
>>
>> diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
>> index 2b0a90581e2f..9d6f57399a73 100755
>> --- a/tools/testing/selftests/net/fib_nexthops.sh
>> +++ b/tools/testing/selftests/net/fib_nexthops.sh
>> @@ -31,6 +31,7 @@ IPV4_TESTS="
>>   	ipv4_compat_mode
>>   	ipv4_fdb_grp_fcnal
>>   	ipv4_mpath_select
>> +	ipv4_mpath_select_nogrp
>>   	ipv4_torture
>>   	ipv4_res_torture
>>   "
>> @@ -375,6 +376,17 @@ check_large_res_grp()
>>   	log_test $? 0 "Dump large (x$buckets) nexthop buckets"
>>   }
>>   
>> +get_route_dev_src()
>> +{
>> +	local pfx="$1"
>> +	local src="$2"
>> +	local out
>> +
>> +	if out=$($IP -j route get "$pfx" from "$src" | jq -re ".[0].dev"); then
>> +		echo "$out"
>> +	fi
>> +}
>> +
>>   get_route_dev()
>>   {
>>   	local pfx="$1"
>> @@ -641,6 +653,79 @@ ipv4_fdb_grp_fcnal()
>>   	$IP link del dev vx10
>>   }
>>   
>> +ipv4_mpath_select_nogrp()
> 
> There is more going on than just not using the group feature.
> 
> Would it make sense to split this into two test patches, a base test
> and a follow-on that extends with the loopback special case?

Sounds reasonable, I'll split it in v2

> 
>> +{
>> +	local rc dev match h addr
>> +
>> +	echo
>> +	echo "IPv4 multipath selection no group"
>> +	echo "------------------------"
>> +	if [ ! -x "$(command -v jq)" ]; then
>> +		echo "SKIP: Could not run test; need jq tool"
>> +		return $ksft_skip
>> +	fi
>> +
>> +	IP="ip -netns $peer"
>> +	# Use status of existing neighbor entry when determining nexthop for
>> +	# multipath routes.
>> +	local -A gws
>> +	gws=([veth2]=172.16.1.1 [veth4]=172.16.2.1)
>> +	local -A other_dev
>> +	other_dev=([veth2]=veth4 [veth4]=veth2)
>> +	local -A local_ips
>> +	local_ips=([veth2]=172.16.1.2 [veth4]=172.16.2.2 [veth5]=172.16.100.1)
> 
> Why do both loopback and veth5 exist with the same local ip. Can this just be lo?

Ah, yeah, looks like a leftover from previous version, thanks for
catching!

>> +	local -A route_devs
>> +	route_devs=([veth2]=0 [veth4]=0)
>> +
>> +	run_cmd "$IP address add 172.16.100.1/32 dev lo"
>> +	run_cmd "$IP ro add 172.16.102.0/24 nexthop via ${gws['veth2']} dev veth2 nexthop via ${gws['veth4']} dev veth4"
>> +	rc=0
>> +	for dev in veth2 veth4; do
>> +		match=0
>> +		from_ip="${local_ips[$dev]}"
>> +		for h in {1..254}; do
>> +			addr="172.16.102.$h"
>> +			if [ "$(get_route_dev_src "$addr" "$from_ip")" = "$dev" ]; then
>> +				match=1
>> +				break
>> +			fi
>> +		done
>> +		if (( match == 0 )); then
>> +			echo "SKIP: Did not find a route using device $dev"
>> +			return $ksft_skip
>> +		fi
>> +		run_cmd "$IP neigh add ${gws[$dev]} dev $dev nud failed"
>> +		if ! check_route_dev "$addr" "${other_dev[$dev]}"; then
>> +			rc=1
>> +			break
>> +		fi
>> +		run_cmd "$IP neigh del ${gws[$dev]} dev $dev"
>> +	done
>> +
>> +	log_test $rc 0 "Use valid neighbor during multipath selection"
>> +
>> +	from_ip="${local_ips["veth5"]}"
>> +	for h in {1..254}; do
>> +		addr="172.16.102.$h"
>> +		route_dev=$(get_route_dev_src "$addr" "$from_ip")
>> +		route_devs[$route_dev]=1
>> +	done
>> +	for dev in veth2 veth4; do
>> +		if [ ${route_devs[$dev]} -eq 0 ]; then
>> +			rc=1
>> +			break;
>> +		fi
>> +	done
>> +
>> +	log_test $rc 0 "Use both neighbors during multipath selection"
>> +
>> +	run_cmd "$IP neigh add 172.16.1.2 dev veth1 nud incomplete"
>> +	run_cmd "$IP neigh add 172.16.2.2 dev veth3 nud incomplete"
>> +	run_cmd "$IP route get 172.16.101.1"
>> +	# if we did not crash, success
>> +	log_test $rc 0 "Multipath selection with no valid neighbor"
>> +}
>> +
>>   ipv4_mpath_select()
>>   {
>>   	local rc dev match h addr
>> -- 
>> 2.47.3
>>
> 
> 



* Re: [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-13 21:26     ` Vadim Fedorenko
@ 2025-12-13 22:02       ` David Ahern
  0 siblings, 0 replies; 11+ messages in thread
From: David Ahern @ 2025-12-13 22:02 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Willem de Bruijn, Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev

On 12/13/25 2:26 PM, Vadim Fedorenko wrote:
>> There is more going on than just not using the group feature.
>>
>> Would it make sense to split this into two test patches, a base test
>> and a follow-on that extends with the loopback special case?
> 
> Sounds reasonable, I'll split it in v2
> 

any test case for loopback needs another one with VRFs where the VRF
device is the loopback for that domain.
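
For reference, the VRF variant could be set up along these lines
(illustrative names, addresses and table number; not from the thread):

```sh
# Illustrative only: within a VRF, the VRF device itself plays the role
# of loopback for that domain, so the service /32 goes on the vrf
# device rather than on lo.
ip link add red type vrf table 100
ip link set red up
ip addr add 198.51.100.1/32 dev red

ip link set veth2 master red
ip link set veth4 master red
ip route add 172.16.102.0/24 table 100 \
    nexthop via 172.16.1.1 dev veth2 \
    nexthop via 172.16.2.1 dev veth4

# Lookup bound to the VRF should still balance across both nexthops.
ip vrf exec red ip route get 172.16.102.1 from 198.51.100.1
```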



* Re: [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-13 13:58 ` [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops Vadim Fedorenko
  2025-12-13 21:18   ` Willem de Bruijn
@ 2025-12-15  6:59   ` Ido Schimmel
  2025-12-15 16:13     ` David Ahern
  2025-12-15 19:01     ` Vadim Fedorenko
  1 sibling, 2 replies; 11+ messages in thread
From: Ido Schimmel @ 2025-12-15  6:59 UTC (permalink / raw)
  To: Vadim Fedorenko
  Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Jakub Kicinski, Shuah Khan,
	netdev

On Sat, Dec 13, 2025 at 01:58:49PM +0000, Vadim Fedorenko wrote:
> diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
> index 2b0a90581e2f..9d6f57399a73 100755
> --- a/tools/testing/selftests/net/fib_nexthops.sh
> +++ b/tools/testing/selftests/net/fib_nexthops.sh
> @@ -31,6 +31,7 @@ IPV4_TESTS="
>  	ipv4_compat_mode
>  	ipv4_fdb_grp_fcnal
>  	ipv4_mpath_select
> +	ipv4_mpath_select_nogrp
>  	ipv4_torture
>  	ipv4_res_torture
>  "
> @@ -375,6 +376,17 @@ check_large_res_grp()
>  	log_test $? 0 "Dump large (x$buckets) nexthop buckets"
>  }
>  
> +get_route_dev_src()
> +{
> +	local pfx="$1"
> +	local src="$2"
> +	local out
> +
> +	if out=$($IP -j route get "$pfx" from "$src" | jq -re ".[0].dev"); then
> +		echo "$out"
> +	fi
> +}
> +
>  get_route_dev()
>  {
>  	local pfx="$1"
> @@ -641,6 +653,79 @@ ipv4_fdb_grp_fcnal()
>  	$IP link del dev vx10
>  }
>  
> +ipv4_mpath_select_nogrp()
> +{
> +	local rc dev match h addr
> +
> +	echo
> +	echo "IPv4 multipath selection no group"
> +	echo "------------------------"
> +	if [ ! -x "$(command -v jq)" ]; then
> +		echo "SKIP: Could not run test; need jq tool"
> +		return $ksft_skip
> +	fi
> +
> +	IP="ip -netns $peer"
> +	# Use status of existing neighbor entry when determining nexthop for
> +	# multipath routes.
> +	local -A gws
> +	gws=([veth2]=172.16.1.1 [veth4]=172.16.2.1)
> +	local -A other_dev
> +	other_dev=([veth2]=veth4 [veth4]=veth2)
> +	local -A local_ips
> +	local_ips=([veth2]=172.16.1.2 [veth4]=172.16.2.2 [veth5]=172.16.100.1)
> +	local -A route_devs
> +	route_devs=([veth2]=0 [veth4]=0)
> +
> +	run_cmd "$IP address add 172.16.100.1/32 dev lo"
> +	run_cmd "$IP ro add 172.16.102.0/24 nexthop via ${gws['veth2']} dev veth2 nexthop via ${gws['veth4']} dev veth4"

fib_nexthops.sh is for tests using nexthop objects: "This test is for
checking IPv4 and IPv6 FIB behavior with nexthop objects".

I suggest moving this to fib_tests.sh. See commit 4d0dac499bf3
("selftests/net: test tcp connection load balancing") that was added as
a test for the blamed commit.


* Re: [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-15  6:59   ` Ido Schimmel
@ 2025-12-15 16:13     ` David Ahern
  2025-12-15 19:01     ` Vadim Fedorenko
  1 sibling, 0 replies; 11+ messages in thread
From: David Ahern @ 2025-12-15 16:13 UTC (permalink / raw)
  To: Ido Schimmel, Vadim Fedorenko
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Willem de Bruijn, Jakub Kicinski, Shuah Khan, netdev

On 12/14/25 11:59 PM, Ido Schimmel wrote:
> 
> fib_nexthops.sh is for tests using nexthop objects: "This test is for
> checking IPv4 and IPv6 FIB behavior with nexthop objects".
> 
> I suggest moving this to fib_tests.sh. See commit 4d0dac499bf3
> ("selftests/net: test tcp connection load balancing") that was added as
> a test for the blamed commit.

Good catch. Yes, these tests belong in fib_tests.sh, not fib_nexthops.sh


* Re: [PATCH net 2/2] selftests: fib_nexthops: Add test case for ipv4 multi nexthops
  2025-12-15  6:59   ` Ido Schimmel
  2025-12-15 16:13     ` David Ahern
@ 2025-12-15 19:01     ` Vadim Fedorenko
  1 sibling, 0 replies; 11+ messages in thread
From: Vadim Fedorenko @ 2025-12-15 19:01 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S. Miller, David Ahern, Eric Dumazet, Paolo Abeni,
	Simon Horman, Willem de Bruijn, Jakub Kicinski, Shuah Khan,
	netdev

On 15/12/2025 06:59, Ido Schimmel wrote:
> On Sat, Dec 13, 2025 at 01:58:49PM +0000, Vadim Fedorenko wrote:
>> diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
>> index 2b0a90581e2f..9d6f57399a73 100755
>> --- a/tools/testing/selftests/net/fib_nexthops.sh
>> +++ b/tools/testing/selftests/net/fib_nexthops.sh
>> @@ -31,6 +31,7 @@ IPV4_TESTS="
>>   	ipv4_compat_mode
>>   	ipv4_fdb_grp_fcnal
>>   	ipv4_mpath_select
>> +	ipv4_mpath_select_nogrp
>>   	ipv4_torture
>>   	ipv4_res_torture
>>   "
>> @@ -375,6 +376,17 @@ check_large_res_grp()
>>   	log_test $? 0 "Dump large (x$buckets) nexthop buckets"
>>   }
>>   
>> +get_route_dev_src()
>> +{
>> +	local pfx="$1"
>> +	local src="$2"
>> +	local out
>> +
>> +	if out=$($IP -j route get "$pfx" from "$src" | jq -re ".[0].dev"); then
>> +		echo "$out"
>> +	fi
>> +}
>> +
>>   get_route_dev()
>>   {
>>   	local pfx="$1"
>> @@ -641,6 +653,79 @@ ipv4_fdb_grp_fcnal()
>>   	$IP link del dev vx10
>>   }
>>   
>> +ipv4_mpath_select_nogrp()
>> +{
>> +	local rc dev match h addr
>> +
>> +	echo
>> +	echo "IPv4 multipath selection no group"
>> +	echo "------------------------"
>> +	if [ ! -x "$(command -v jq)" ]; then
>> +		echo "SKIP: Could not run test; need jq tool"
>> +		return $ksft_skip
>> +	fi
>> +
>> +	IP="ip -netns $peer"
>> +	# Use status of existing neighbor entry when determining nexthop for
>> +	# multipath routes.
>> +	local -A gws
>> +	gws=([veth2]=172.16.1.1 [veth4]=172.16.2.1)
>> +	local -A other_dev
>> +	other_dev=([veth2]=veth4 [veth4]=veth2)
>> +	local -A local_ips
>> +	local_ips=([veth2]=172.16.1.2 [veth4]=172.16.2.2 [veth5]=172.16.100.1)
>> +	local -A route_devs
>> +	route_devs=([veth2]=0 [veth4]=0)
>> +
>> +	run_cmd "$IP address add 172.16.100.1/32 dev lo"
>> +	run_cmd "$IP ro add 172.16.102.0/24 nexthop via ${gws['veth2']} dev veth2 nexthop via ${gws['veth4']} dev veth4"
> 
> fib_nexthops.sh is for tests using nexthop objects: "This test is for
> checking IPv4 and IPv6 FIB behavior with nexthop objects".
> 
> I suggest moving this to fib_tests.sh. See commit 4d0dac499bf3
> ("selftests/net: test tcp connection load balancing") that was added as
> a test for the blamed commit.

Yep, got it. Will move in v2



* Re: [PATCH net 1/2] net: fib: restore ECMP balance from loopback
  2025-12-13 21:22   ` Vadim Fedorenko
@ 2025-12-15 21:45     ` Willem de Bruijn
  0 siblings, 0 replies; 11+ messages in thread
From: Willem de Bruijn @ 2025-12-15 21:45 UTC (permalink / raw)
  To: Vadim Fedorenko, Willem de Bruijn, David S. Miller, David Ahern,
	Eric Dumazet, Paolo Abeni, Simon Horman, Willem de Bruijn,
	Jakub Kicinski
  Cc: Shuah Khan, Ido Schimmel, netdev

Vadim Fedorenko wrote:
> On 13/12/2025 20:54, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> Preference of nexthop with source address broke ECMP for packets with
> >> source address from loopback interface. Original behaviour was to
> >> balance over nexthops while now it uses the latest nexthop from the
> >> group.
> > 
> > How does the loopback device specifically come into this?
> 
> It may be a dummy device as well. The use case is when there are 2 
> physical interfaces and 1 service IP address, distributed by any
> routing protocol. The socket is bound to service, thus it's used in
> route selection.

Can you elaborate a bit more on this, maybe in the commit message
itself? How is loopback relevant to this service IP?

