From: Ido Schimmel <idosch@nvidia.com>
To: <netdev@vger.kernel.org>
Cc: <davem@davemloft.net>, <pabeni@redhat.com>, <edumazet@google.com>,
<dsahern@kernel.org>, <horms@kernel.org>, <gnault@redhat.com>,
<stfomichev@gmail.com>, Ido Schimmel <idosch@nvidia.com>
Subject: [PATCH net 1/2] ipv6: Start path selection from the first nexthop
Date: Wed, 2 Apr 2025 14:42:23 +0300 [thread overview]
Message-ID: <20250402114224.293392-2-idosch@nvidia.com> (raw)
In-Reply-To: <20250402114224.293392-1-idosch@nvidia.com>
Cited commit transitioned IPv6 path selection to use hash-threshold
instead of modulo-N. With hash-threshold, each nexthop is assigned a
region boundary in the multipath hash function's output space and a
nexthop is chosen if the calculated hash is smaller than the nexthop's
region boundary.
Hash-threshold does not work correctly if path selection does not start
with the first nexthop. For example, if fib6_select_path() is always
passed the last nexthop in the group, then it will always be chosen
because its region boundary covers the entire hash function's output
space.
Fix this by starting the selection process from the first nexthop and do
not consider nexthops for which rt6_score_route() provided a negative
score.
Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N")
Reported-by: Stanislav Fomichev <stfomichev@gmail.com>
Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
net/ipv6/route.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c3406a0d45bd..864f0002034b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -412,11 +412,35 @@ static bool rt6_check_expired(const struct rt6_info *rt)
return false;
}
+static struct fib6_info *
+rt6_multipath_first_sibling_rcu(const struct fib6_info *rt)
+{
+ struct fib6_info *iter;
+ struct fib6_node *fn;
+
+ fn = rcu_dereference(rt->fib6_node);
+ if (!fn)
+ goto out;
+ iter = rcu_dereference(fn->leaf);
+ if (!iter)
+ goto out;
+
+ while (iter) {
+ if (iter->fib6_metric == rt->fib6_metric &&
+ rt6_qualify_for_ecmp(iter))
+ return iter;
+ iter = rcu_dereference(iter->fib6_next);
+ }
+
+out:
+ return NULL;
+}
+
void fib6_select_path(const struct net *net, struct fib6_result *res,
struct flowi6 *fl6, int oif, bool have_oif_match,
const struct sk_buff *skb, int strict)
{
- struct fib6_info *match = res->f6i;
+ struct fib6_info *first, *match = res->f6i;
struct fib6_info *sibling;
if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
@@ -440,10 +464,18 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
return;
}
- if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound))
+ first = rt6_multipath_first_sibling_rcu(match);
+ if (!first)
goto out;
- list_for_each_entry_rcu(sibling, &match->fib6_siblings,
+ if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
+ rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
+ strict) >= 0) {
+ match = first;
+ goto out;
+ }
+
+ list_for_each_entry_rcu(sibling, &first->fib6_siblings,
fib6_siblings) {
const struct fib6_nh *nh = sibling->fib6_nh;
int nh_upper_bound;
--
2.49.0
next prev parent reply other threads:[~2025-04-02 11:43 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-02 11:42 [PATCH net 0/2] ipv6: Multipath routing fixes Ido Schimmel
2025-04-02 11:42 ` Ido Schimmel [this message]
2025-04-04 14:40 ` [PATCH net 1/2] ipv6: Start path selection from the first nexthop Willem de Bruijn
2025-04-06 13:45 ` Ido Schimmel
2025-04-06 18:30 ` Willem de Bruijn
2025-04-07 6:38 ` Ido Schimmel
2025-04-07 14:31 ` Willem de Bruijn
2025-04-02 11:42 ` [PATCH net 2/2] ipv6: Do not consider link down nexthops in path selection Ido Schimmel
2025-04-04 13:22 ` Willem de Bruijn
2025-04-04 14:03 ` Willem de Bruijn
2025-04-04 14:07 ` Willem de Bruijn
2025-04-04 14:40 ` [PATCH net 0/2] ipv6: Multipath routing fixes patchwork-bot+netdevbpf
2025-04-04 14:49 ` Jakub Kicinski
2025-04-04 16:22 ` Willem de Bruijn
2025-04-07 15:12 ` Guillaume Nault
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250402114224.293392-2-idosch@nvidia.com \
--to=idosch@nvidia.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=gnault@redhat.com \
--cc=horms@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).