From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	David Ahern <dsahern@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	Jakub Kicinski <kuba@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>, Ido Schimmel <idosch@nvidia.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net 1/2] net: fib: restore ECMP balance from loopback
Date: Sun, 14 Dec 2025 06:22:18 +0900
Message-ID: <d0b3f358-4e0a-42f3-84f0-cbcf19066d49@linux.dev>
In-Reply-To: <willemdebruijn.kernel.5c4c191262c5@gmail.com>

On 13/12/2025 20:54, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> Preference of nexthop with source address broke ECMP for packets with
>> source address from loopback interface. Original behaviour was to
>> balance over nexthops while now it uses the latest nexthop from the
>> group.
> 
> How does the loopback device specifically come into this?

It may be a dummy device as well. The use case is 2 physical
interfaces and 1 service IP address, which is advertised via any
routing protocol. The socket is bound to the service address, so that
address is used as the source address in route selection.
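
In userspace terms, the use case looks roughly like the sketch below. It
is only an illustration: bound_udp_socket() is a hypothetical helper, not
part of the patch or the selftest. The point is that bind() fixes the
source address before connect() triggers the route lookup, so the lookup
sees a non-zero saddr that belongs to lo (or a dummy device) rather than
to either physical interface.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to src_ip and connected
 * to dst_ip:dst_port. Returns the fd, or -1 on any error.
 */
int bound_udp_socket(const char *src_ip, const char *dst_ip, uint16_t dst_port)
{
	struct sockaddr_in src, dst;
	int fd;

	memset(&src, 0, sizeof(src));
	src.sin_family = AF_INET;
	src.sin_port = 0;		/* let the kernel pick the source port */
	if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1)
		return -1;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(dst_port);
	if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	/* Bind first, so the route lookup done by connect() carries the
	 * service address as the source address.
	 */
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

With 198.51.100.1/32 on lo, a caller would pass "198.51.100.1" as src_ip
and any destination reachable through the multipath route as dst_ip.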

> 
>>
>> For the case with 198.51.100.1/32 assigned to lo:
>>
>> before:
>>     done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>>      255 veth3
>>
>> after:
>>     done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>>      122 veth1
>>      133 veth3
>>
>> Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
>> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
>> ---
>>   net/ipv4/fib_semantics.c | 21 +++++++++++----------
>>   1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
>> index a5f3c8459758..c54b4ad9c280 100644
>> --- a/net/ipv4/fib_semantics.c
>> +++ b/net/ipv4/fib_semantics.c
>> @@ -2165,9 +2165,9 @@ static bool fib_good_nh(const struct fib_nh *nh)
>>   void fib_select_multipath(struct fib_result *res, int hash,
>>   			  const struct flowi4 *fl4)
>>   {
>> +	bool first = false, found = false;
>>   	struct fib_info *fi = res->fi;
>>   	struct net *net = fi->fib_net;
>> -	bool found = false;
>>   	bool use_neigh;
>>   	__be32 saddr;
>>   
>> @@ -2190,23 +2190,24 @@ void fib_select_multipath(struct fib_result *res, int hash,
>>   		    (use_neigh && !fib_good_nh(nexthop_nh)))
>>   			continue;
>>   
>> -		if (!found) {
>> +		if (saddr && nexthop_nh->nh_saddr == saddr) {
>>   			res->nh_sel = nhsel;
>>   			res->nhc = &nexthop_nh->nh_common;
>> -			found = !saddr || nexthop_nh->nh_saddr == saddr;
>> +			return;
> 
> This can return a match that exceeds the upper bound, while better
> matches may exist.
> 
> Perhaps what we want is the following:
> 
> 1. if there are matches that match saddr, prefer those above others
>     - take the first match, as with hash input that results in load
>       balancing across flows
>        
> 2. else, take any match
>     - again, first fit
> 
> If no match below fib_nh_upper_bound is found, fall back to the first
> fit above that exceeds nh_upper_bound. Again, prefer first fit of 1 if
> it exists, else first fit of 2.

Oh, I see... when there are 2 different nexthops with the same saddr,
we have to balance between them as well, but with this code it will
stick to only the first nexthop.
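
A rough userspace sketch of one possible reading of the scheme quoted
above. It is not kernel code: struct nh_sketch and select_nh_first_fit()
are illustrative names, only the upper bound and the source address of a
nexthop are modelled, and the fib_good_nh() check is left out. The
assumed priority order is: saddr match within the bound, any nexthop
within the bound, saddr match above the bound, any usable nexthop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the two struct fib_nh fields used here. */
struct nh_sketch {
	int upper_bound;	/* -1: nexthop must be skipped */
	uint32_t saddr;		/* source address tied to this nexthop */
};

/* Returns the index of the chosen nexthop, or -1 if none is usable. */
int select_nh_first_fit(const struct nh_sketch *nhs, size_t n, int hash,
			uint32_t saddr)
{
	int fit_any = -1;			/* first with hash <= bound */
	int first_saddr = -1, first_any = -1;	/* fallbacks above the bound */
	size_t i;

	for (i = 0; i < n; i++) {
		bool match, fits;

		if (nhs[i].upper_bound == -1)
			continue;

		match = saddr && nhs[i].saddr == saddr;
		fits = hash <= nhs[i].upper_bound;

		/* Best case: within the bound and matching saddr. */
		if (fits && match)
			return (int)i;

		if (fits && fit_any < 0)
			fit_any = (int)i;
		if (match && first_saddr < 0)
			first_saddr = (int)i;
		if (first_any < 0)
			first_any = (int)i;
	}

	if (fit_any >= 0)
		return fit_any;
	if (first_saddr >= 0)
		return first_saddr;
	return first_any;
}
```

In this sketch two nexthops that share the same saddr are still balanced
by hash, and a saddr that matches no nexthop (the lo case) falls through
to plain first fit within the bound.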

> 
> If so then we need up to two concurrent stored options,
> first_match_saddr and first.

That will need a few more assignments.

> Or alternatively use a score similar to inet listener lookup.

I'll check this option.
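
A score-based variant could look roughly like the userspace sketch
below. Everything in it is an assumption made for illustration: the
names struct nh_cand, nh_score() and select_nh_scored(), and the weights
(2 for being within the upper bound, 1 for a saddr match), which are
chosen only so that being within the bound outranks a saddr match above
it. The loop keeps the first nexthop with the highest score, which is
the general shape of a scored lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the two struct fib_nh fields used here. */
struct nh_cand {
	int upper_bound;	/* -1: nexthop must be skipped */
	uint32_t saddr;		/* source address tied to this nexthop */
};

/* Assumed weights: -1 unusable, +2 within the bound, +1 saddr match. */
static int nh_score(const struct nh_cand *nh, int hash, uint32_t saddr)
{
	int score = 0;

	if (nh->upper_bound == -1)
		return -1;
	if (hash <= nh->upper_bound)
		score += 2;
	if (saddr && nh->saddr == saddr)
		score += 1;
	return score;
}

/* Returns the index of the first best-scoring nexthop, or -1 if none. */
int select_nh_scored(const struct nh_cand *nhs, size_t n, int hash,
		     uint32_t saddr)
{
	int best = -1, best_score = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		int score = nh_score(&nhs[i], hash, saddr);

		/* Strictly greater keeps the first fit among equals. */
		if (score > best_score) {
			best = (int)i;
			best_score = score;
			if (score == 3)	/* cannot be beaten */
				break;
		}
	}
	return best;
}
```

This needs a single pair of stored values (best, best_score) instead of
several candidate slots, at the cost of computing a score per nexthop.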

> Since a new variable is added, I would rename found with
> first_match_saddr or similar to document the intent.

Ok.


