Re: [PATCH net 2/2] ipv6: Do not consider link down nexthops in path selection

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	 Ido Schimmel <idosch@nvidia.com>,
	 netdev@vger.kernel.org
Cc: davem@davemloft.net,  pabeni@redhat.com,  edumazet@google.com,
	 dsahern@kernel.org,  horms@kernel.org,  gnault@redhat.com,
	 stfomichev@gmail.com,  Ido Schimmel <idosch@nvidia.com>
Subject: Re: [PATCH net 2/2] ipv6: Do not consider link down nexthops in path selection
Date: Fri, 04 Apr 2025 10:07:54 -0400
Message-ID: <67efe7bab0b19_1dbb5d2944f@willemb.c.googlers.com.notmuch>
In-Reply-To: <67efe6a48e8d0_1d96d729444@willemb.c.googlers.com.notmuch>

Willem de Bruijn wrote:
> Willem de Bruijn wrote:
> > Ido Schimmel wrote:
> > > Nexthops whose link is down are not supposed to be considered during
> > > path selection when the "ignore_routes_with_linkdown" sysctl is set.
> > > This is done by assigning them a negative region boundary.
> > > 
> > > However, when comparing the computed hash (unsigned) with the region
> > > boundary (signed), the negative region boundary is treated as unsigned,
> > > resulting in incorrect nexthop selection.
> > > 
> > > Fix by treating the computed hash as signed. Note that the computed hash
> > > is always in range of [0, 2^31 - 1].
> > > 
> > > Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> > > Signed-off-by: Ido Schimmel <idosch@nvidia.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

> > > ---
> > >  net/ipv6/route.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > > index 864f0002034b..ab12b816ab94 100644
> > > --- a/net/ipv6/route.c
> > > +++ b/net/ipv6/route.c
> > > @@ -442,6 +442,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  {
> > >  	struct fib6_info *first, *match = res->f6i;
> > >  	struct fib6_info *sibling;
> > > +	int hash;
> > >  
> > >  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> > >  		goto out;
> > > @@ -468,7 +469,8 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  	if (!first)
> > >  		goto out;
> > >  
> > > -	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > > +	hash = fl6->mp_hash;
> > > +	if (hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> > 
> > The combined upper bounds add up to the total weights of the paths.
> > 
> > Should hash be scaled (using reciprocal_scale) to that bound to have
> > a uniform random distribution across all weights?
> > 
> > Else a hash in the range [0, 2^31 - 1] is unlikely to fall within the
> > total weights range.
> 
> Never mind, the scaling is handled in rt6_upper_bound_set, where
> weights are scaled to cover the [0, INT_MAX - 1] range.
> 
> I confused fib_nh_weight with fib_nh_upper_bound.
> 
> But should U32 hash then be truncated to the lower 31 bits, to
> drop the sign and negative half of the space when used as int?

And you document this in the commit message: "Note that the computed
hash is always in range of [0, 2^31 - 1]".

That is the `mhash >> 1` at the bottom of rt6_multipath_hash.

Sorry, I'm a bit slow in internalizing this code. And perhaps a bit
too fast at responding ;) But got it now!
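
For readers following along, here is a minimal C sketch of the
comparison issue discussed above. It is an illustration only, not the
kernel code: upper_bound(), select_unsigned() and select_signed() are
hypothetical helpers, and the rounding in upper_bound() only
approximates the scaling done in rt6_upper_bound_set. A link-down
nexthop is modeled with a region boundary of -1, and the hash is
assumed to have already been shifted right by one bit, so it lies in
[0, 2^31 - 1].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a cumulative weight into the 31-bit hash
 * space, rounding to the closest integer, then subtract one so the
 * last live nexthop ends at 2^31 - 1.
 */
static int32_t upper_bound(uint64_t cum_weight, uint64_t total)
{
	int64_t scaled = (int64_t)(((cum_weight << 31) + total / 2) / total);

	return (int32_t)(scaled - 1);
}

/*
 * Pre-fix behaviour: comparing an unsigned hash with a signed bound
 * converts the bound to unsigned. The cast below spells out what the
 * implicit conversion does: -1 becomes UINT32_MAX, so a link-down
 * nexthop matches every hash.
 */
static int select_unsigned(uint32_t hash, const int32_t *bounds, int n)
{
	for (int i = 0; i < n; i++)
		if (hash <= (uint32_t)bounds[i])
			return i;
	return -1;
}

/*
 * Post-fix behaviour: treat the hash as signed. This is only safe
 * because the hash is known to fit in 31 bits.
 */
static int select_signed(uint32_t mp_hash, const int32_t *bounds, int n)
{
	int32_t hash;

	assert(mp_hash <= INT32_MAX);
	hash = (int32_t)mp_hash;

	for (int i = 0; i < n; i++)
		if (hash <= bounds[i])
			return i;
	return -1;
}
```

With bounds { -1, upper_bound(1, 1) } (first nexthop link down, second
one carrying all the weight) and a hash of 0xdeadbeef >> 1, the
unsigned variant picks index 0, the link-down nexthop, while the signed
variant skips it and picks index 1.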

 
> > >  	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> > >  			    strict) >= 0) {
> > >  		match = first;
> > > @@ -481,7 +483,7 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> > >  		int nh_upper_bound;
> > >  
> > >  		nh_upper_bound = atomic_read(&nh->fib_nh_upper_bound);
> > > -		if (fl6->mp_hash > nh_upper_bound)
> > > +		if (hash > nh_upper_bound)
> > >  			continue;
> > >  		if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
> > >  			break;
> > > -- 
> > > 2.49.0
> > > 
> > 
> > 
> 
>

Thread overview: 15+ messages
2025-04-02 11:42 [PATCH net 0/2] ipv6: Multipath routing fixes Ido Schimmel
2025-04-02 11:42 ` [PATCH net 1/2] ipv6: Start path selection from the first nexthop Ido Schimmel
2025-04-04 14:40   ` Willem de Bruijn
2025-04-06 13:45     ` Ido Schimmel
2025-04-06 18:30       ` Willem de Bruijn
2025-04-07  6:38         ` Ido Schimmel
2025-04-07 14:31           ` Willem de Bruijn
2025-04-02 11:42 ` [PATCH net 2/2] ipv6: Do not consider link down nexthops in path selection Ido Schimmel
2025-04-04 13:22   ` Willem de Bruijn
2025-04-04 14:03     ` Willem de Bruijn
2025-04-04 14:07       ` Willem de Bruijn [this message]
2025-04-04 14:40 ` [PATCH net 0/2] ipv6: Multipath routing fixes patchwork-bot+netdevbpf
2025-04-04 14:49   ` Jakub Kicinski
2025-04-04 16:22     ` Willem de Bruijn
2025-04-07 15:12 ` Guillaume Nault