From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Vadim Fedorenko <vadim.fedorenko@linux.dev>,
	 "David S. Miller" <davem@davemloft.net>,
	 David Ahern <dsahern@kernel.org>,
	 Eric Dumazet <edumazet@google.com>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Simon Horman <horms@kernel.org>,
	 Willem de Bruijn <willemb@google.com>,
	 Jakub Kicinski <kuba@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>,  Ido Schimmel <idosch@nvidia.com>,
	 netdev@vger.kernel.org,
	 Vadim Fedorenko <vadim.fedorenko@linux.dev>
Subject: Re: [PATCH net 1/2] net: fib: restore ECMP balance from loopback
Date: Sat, 13 Dec 2025 15:54:18 -0500
Message-ID: <willemdebruijn.kernel.5c4c191262c5@gmail.com>
In-Reply-To: <20251213135849.2054677-1-vadim.fedorenko@linux.dev>

Vadim Fedorenko wrote:
> Preference of nexthop with source address broke ECMP for packets with
> source address from loopback interface. Original behaviour was to
> balance over nexthops while now it uses the latest nexthop from the
> group.

How does the loopback device specifically come into this?

> 
> For the case with 198.51.100.1/32 assigned to lo:
> 
> before:
>    done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>     255 veth3
> 
> after:
>    done | grep veth | awk ' {print $(NF-2)}' | sort | uniq -c:
>     122 veth1
>     133 veth3
> 
> Fixes: 32607a332cfe ("ipv4: prefer multipath nexthop that matches source address")
> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> ---
>  net/ipv4/fib_semantics.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index a5f3c8459758..c54b4ad9c280 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -2165,9 +2165,9 @@ static bool fib_good_nh(const struct fib_nh *nh)
>  void fib_select_multipath(struct fib_result *res, int hash,
>  			  const struct flowi4 *fl4)
>  {
> +	bool first = false, found = false;
>  	struct fib_info *fi = res->fi;
>  	struct net *net = fi->fib_net;
> -	bool found = false;
>  	bool use_neigh;
>  	__be32 saddr;
>  
> @@ -2190,23 +2190,24 @@ void fib_select_multipath(struct fib_result *res, int hash,
>  		    (use_neigh && !fib_good_nh(nexthop_nh)))
>  			continue;
>  
> -		if (!found) {
> +		if (saddr && nexthop_nh->nh_saddr == saddr) {
>  			res->nh_sel = nhsel;
>  			res->nhc = &nexthop_nh->nh_common;
> -			found = !saddr || nexthop_nh->nh_saddr == saddr;
> +			return;

This returns on the first saddr match even when hash exceeds that
nexthop's upper bound, while a better match (same saddr, within the
bound) may exist further down the list.

Perhaps what we want is the following:

1. if there are matches that match saddr, prefer those above others
   - take the first match, as with hash input that results in load
     balancing across flows
      
2. else, take any match
   - again, first fit

If no match with hash at or below nh_upper_bound is found, fall back
to the first fit for which hash exceeds nh_upper_bound. Again, prefer
the first fit of 1 if it exists, else the first fit of 2.

If so then we need up to two concurrent stored options,
first_match_saddr and first.
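
To make the ordering concrete, here is a rough userspace sketch of one
reading of the above. It is not kernel code: struct nh_model, its
fields and select_nh() are hypothetical stand-ins for struct fib_nh
and the loop in fib_select_multipath(), the usable flag models the
dead/linkdown/neigh checks, and the extra in_bound variable is my
assumption for tracking a within-bound candidate that does not match
saddr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct fib_nh. */
struct nh_model {
	uint32_t saddr;		/* source address associated with the nexthop */
	int upper_bound;	/* hash threshold, increasing across the group */
	bool usable;		/* models the dead/linkdown/neigh checks */
};

/*
 * Return the index of the selected nexthop, or -1 if none is usable.
 * saddr == 0 means the flow carries no source address preference.
 */
static int select_nh(const struct nh_model *nh, int n, int hash,
		     uint32_t saddr)
{
	int in_bound = -1;		/* first usable nexthop within bound */
	int first_match_saddr = -1;	/* first usable saddr match, any bound */
	int first = -1;			/* first usable nexthop, any bound */
	int i;

	for (i = 0; i < n; i++) {
		bool match = saddr && nh[i].saddr == saddr;

		if (!nh[i].usable)
			continue;

		/* Remember the fallbacks regardless of the bound. */
		if (first < 0)
			first = i;
		if (match && first_match_saddr < 0)
			first_match_saddr = i;

		if (hash > nh[i].upper_bound)
			continue;

		/* Within bound: a saddr match, or no preference, wins. */
		if (match || !saddr)
			return i;
		if (in_bound < 0)
			in_bound = i;
	}

	if (in_bound >= 0)
		return in_bound;
	if (first_match_saddr >= 0)
		return first_match_saddr;
	return first;
}
```

With a source address that matches none of the nexthops (the lo case
from the commit message), this degrades to plain first fit within the
bound, so flows are again spread by hash.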

Or alternatively use a score similar to inet listener lookup.
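
A rough userspace sketch of what a score could look like, under the
assumption that being within the hash bound outweighs a saddr match.
Nothing here is kernel code: struct nh_model, nh_score() and
select_nh_scored() are hypothetical names, and the usable flag stands
in for the dead/linkdown/neigh checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct fib_nh. */
struct nh_model {
	uint32_t saddr;		/* source address associated with the nexthop */
	int upper_bound;	/* hash threshold, increasing across the group */
	bool usable;		/* models the dead/linkdown/neigh checks */
};

/*
 * Score one nexthop: -1 if unusable, otherwise 2 points for being
 * within the hash bound plus 1 point for matching saddr.
 */
static int nh_score(const struct nh_model *nh, int hash, uint32_t saddr)
{
	int score = 0;

	if (!nh->usable)
		return -1;
	if (hash <= nh->upper_bound)
		score += 2;
	if (saddr && nh->saddr == saddr)
		score += 1;
	return score;
}

/* Return the index of the best scoring nexthop, or -1 if none is usable. */
static int select_nh_scored(const struct nh_model *nh, int n, int hash,
			    uint32_t saddr)
{
	int best = -1, best_score = -1;
	int i;

	for (i = 0; i < n; i++) {
		int score = nh_score(&nh[i], hash, saddr);

		/* Strictly greater keeps the first fit among equal scores. */
		if (score > best_score) {
			best = i;
			best_score = score;
		}
	}
	return best;
}
```

This always walks the whole group, which trades the early return for a
single piece of state (best and its score).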

Since a new variable is added, I would rename found to
first_match_saddr or similar to document the intent.

>  		}
>  
> -		if (hash > nh_upper_bound)
> -			continue;
> -
> -		if (!saddr || nexthop_nh->nh_saddr == saddr) {
> +		if (!first) {
>  			res->nh_sel = nhsel;
>  			res->nhc = &nexthop_nh->nh_common;
> -			return;
> +			first = true;
>  		}
>  
> -		if (found)
> -			return;
> +		if (found || hash > nh_upper_bound)
> +			continue;
> +
> +		res->nh_sel = nhsel;
> +		res->nhc = &nexthop_nh->nh_common;
> +		found = true;
>  
>  	} endfor_nexthops(fi);
>  }
> -- 
> 2.47.3
> 


