From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Ido Schimmel <idosch@nvidia.com>, netdev@vger.kernel.org
Cc: davem@davemloft.net, pabeni@redhat.com, edumazet@google.com,
dsahern@kernel.org, horms@kernel.org, gnault@redhat.com,
stfomichev@gmail.com, Ido Schimmel <idosch@nvidia.com>
Subject: Re: [PATCH net 1/2] ipv6: Start path selection from the first nexthop
Date: Fri, 04 Apr 2025 10:40:32 -0400 [thread overview]
Message-ID: <67efef607bc41_1ddca82948c@willemb.c.googlers.com.notmuch> (raw)
In-Reply-To: <20250402114224.293392-2-idosch@nvidia.com>
Ido Schimmel wrote:
> Cited commit transitioned IPv6 path selection to use hash-threshold
> instead of modulo-N. With hash-threshold, each nexthop is assigned a
> region boundary in the multipath hash function's output space and a
> nexthop is chosen if the calculated hash is smaller than the nexthop's
> region boundary.
>
> Hash-threshold does not work correctly if path selection does not start
> with the first nexthop. For example, if fib6_select_path() is always
> passed the last nexthop in the group, then it will always be chosen
> because its region boundary covers the entire hash function's output
> space.
>
> Fix this by starting the selection process from the first nexthop and do
> not consider nexthops for which rt6_score_route() provided a negative
> score.
>
> Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
> Reported-by: Stanislav Fomichev <stfomichev@gmail.com>
> Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> net/ipv6/route.c | 38 +++++++++++++++++++++++++++++++++++---
> 1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c3406a0d45bd..864f0002034b 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -412,11 +412,35 @@ static bool rt6_check_expired(const struct rt6_info *rt)
> return false;
> }
>
> +static struct fib6_info *
> +rt6_multipath_first_sibling_rcu(const struct fib6_info *rt)
> +{
> + struct fib6_info *iter;
> + struct fib6_node *fn;
> +
> + fn = rcu_dereference(rt->fib6_node);
> + if (!fn)
> + goto out;
> + iter = rcu_dereference(fn->leaf);
> + if (!iter)
> + goto out;
> +
> + while (iter) {
> + if (iter->fib6_metric == rt->fib6_metric &&
> + rt6_qualify_for_ecmp(iter))
> + return iter;
> + iter = rcu_dereference(iter->fib6_next);
> + }
> +
> +out:
> + return NULL;
> +}
The rcu counterpart to rt6_multipath_first_sibling, which is used when
computing the ranges in rt6_multipath_rebalance.
> +
> void fib6_select_path(const struct net *net, struct fib6_result *res,
> struct flowi6 *fl6, int oif, bool have_oif_match,
> const struct sk_buff *skb, int strict)
> {
> - struct fib6_info *match = res->f6i;
> + struct fib6_info *first, *match = res->f6i;
> struct fib6_info *sibling;
>
> if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> @@ -440,10 +464,18 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> return;
> }
>
> - if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound))
> + first = rt6_multipath_first_sibling_rcu(match);
> + if (!first)
> goto out;
>
> - list_for_each_entry_rcu(sibling, &match->fib6_siblings,
> + if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> + rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> + strict) >= 0) {
Does this fix address two issues in one patch: start from the first
sibling, and check validity of the sibling?
The behavior on negative score for the first_sibling appears
different from that on subsequent siblings in the for_each below:
in that case the loop breaks, while for the first it skips?
if (fl6->mp_hash > nh_upper_bound)
continue;
if (rt6_score_route(nh, sibling->fib6_flags, oif, strict) < 0)
break;
match = sibling;
break;
Am I reading that correct and is that intentional?
> + match = first;
> + goto out;
> + }
> +
> + list_for_each_entry_rcu(sibling, &first->fib6_siblings,
> fib6_siblings) {
> const struct fib6_nh *nh = sibling->fib6_nh;
> int nh_upper_bound;
> --
> 2.49.0
>
next prev parent reply other threads:[~2025-04-04 14:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-02 11:42 [PATCH net 0/2] ipv6: Multipath routing fixes Ido Schimmel
2025-04-02 11:42 ` [PATCH net 1/2] ipv6: Start path selection from the first nexthop Ido Schimmel
2025-04-04 14:40 ` Willem de Bruijn [this message]
2025-04-06 13:45 ` Ido Schimmel
2025-04-06 18:30 ` Willem de Bruijn
2025-04-07 6:38 ` Ido Schimmel
2025-04-07 14:31 ` Willem de Bruijn
2025-04-02 11:42 ` [PATCH net 2/2] ipv6: Do not consider link down nexthops in path selection Ido Schimmel
2025-04-04 13:22 ` Willem de Bruijn
2025-04-04 14:03 ` Willem de Bruijn
2025-04-04 14:07 ` Willem de Bruijn
2025-04-04 14:40 ` [PATCH net 0/2] ipv6: Multipath routing fixes patchwork-bot+netdevbpf
2025-04-04 14:49 ` Jakub Kicinski
2025-04-04 16:22 ` Willem de Bruijn
2025-04-07 15:12 ` Guillaume Nault
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=67efef607bc41_1ddca82948c@willemb.c.googlers.com.notmuch \
--to=willemdebruijn.kernel@gmail.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=gnault@redhat.com \
--cc=horms@kernel.org \
--cc=idosch@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox