From: Guillaume Nault <gnault@redhat.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Simon Horman <horms@kernel.org>,
Stanislav Fomichev <stfomichev@gmail.com>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Eric Dumazet <edumazet@google.com>,
netdev@vger.kernel.org, David Ahern <dsahern@kernel.org>,
Antonio Quartulli <antonio@mandelbit.com>,
Petr Machata <petrm@nvidia.com>
Subject: Re: [PATCH net v4 1/2] gre: Fix IPv6 link-local address generation.
Date: Mon, 24 Mar 2025 15:36:46 +0100 [thread overview]
Message-ID: <Z+Ft/jPlaoM+7IUU@debian> (raw)
In-Reply-To: <Z9xmvRX_g_ZifayA@shredder>
On Thu, Mar 20, 2025 at 09:04:29PM +0200, Ido Schimmel wrote:
> On Thu, Mar 20, 2025 at 04:26:46PM +0000, Simon Horman wrote:
> > On Mon, Mar 17, 2025 at 10:10:45PM +0100, Guillaume Nault wrote:
> > > On Sun, Mar 16, 2025 at 03:08:48PM +0200, Ido Schimmel wrote:
> > > > On Fri, Mar 14, 2025 at 01:18:21PM -0700, Stanislav Fomichev wrote:
> > > > > On 03/14, Guillaume Nault wrote:
> > > > > > On Fri, Mar 14, 2025 at 08:18:32AM -0700, Stanislav Fomichev wrote:
> > > > > > >
> > > > > > > Could you please double check net/forwarding/ip6gre_custom_multipath_hash.sh ?
> > > > > > > It seems like it started falling after this series has been pulled:
> > > > > > > https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/31301/2-ip6gre-custom-multipath-hash-sh/stdout
> > > > > >
> > > > > > Hum, net/forwarding/ip6gre_custom_multipath_hash.sh works for me on the
> > > > > > current net tree (I'm at commit 4003c9e78778). I have only one failure,
> > > > > > but it already happened before 183185a18ff9 ("gre: Fix IPv6 link-local
> > > > > > address generation.") was applied.
> > > > >
> > > > > On my side I see the following (ignore ping6 FAILs):
> > > > >
> > > > > bfc6c67ec2d6 - (net-next/main, net-next/HEAD) net/smc: use the correct ndev to find pnetid by pnetid table (7 hours ago) <Guangguan Wang>
> > > > >
> > > > > TAP version 13
> > > > > 1..1
> > > > > # timeout set to 0
> > > > > # selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
> > > > > [ 9.275735][ T167] ip (167) used greatest stack depth: 23536 bytes left
> > > > > [ 13.769300][ T255] gre: GRE over IPv4 demultiplexor driver
> > > > > [ 13.838185][ T255] ip6_gre: GRE over IPv6 tunneling driver
> > > > > [ 13.951780][ T12] ip6_tunnel: g1 xmit: Local address not yet configured!
> > > > > [ 14.038101][ T12] ip6_tunnel: g1 xmit: Local address not yet configured!
> > > > > [ 15.148469][ T281] 8021q: 802.1Q VLAN Support v1.8
> > > > > [ 17.559477][ T321] GACT probability NOT on
> > > > > [ 18.551876][ T12] ip6_tunnel: g2 xmit: Local address not yet configured!
> > > > > [ 18.633656][ T12] ip6_tunnel: g2 xmit: Local address not yet configured!
> > > > > # TEST: ping [ OK ]
> > > > > # TEST: ping6 [FAIL]
> > > > > # INFO: Running IPv4 overlay custom multipath hash tests
> > > > > # TEST: Multipath hash field: Inner source IP (balanced) [FAIL]
> > > > > # Expected traffic to be balanced, but it is not
> > > > > # INFO: Packets sent on path1 / path2: 1 / 12602
> > > > > # TEST: Multipath hash field: Inner source IP (unbalanced) [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 0 / 12601
> > > > > # TEST: Multipath hash field: Inner destination IP (balanced) [FAIL]
> > > > > # Expected traffic to be balanced, but it is not
> > > > > # INFO: Packets sent on path1 / path2: 1 / 12600
> > > > > # TEST: Multipath hash field: Inner destination IP (unbalanced) [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 0 / 12600
> > > > > ...
> > > > >
> > > > > 8ecea691e844 - (HEAD -> upstream/net-next/main) Revert "gre: Fix IPv6 link-local address generation." (2 minutes ago) <Stanislav Fomichev>
> > > > >
> > > > > TAP version 13
> > > > > 1..1
> > > > > # timeout set to 0
> > > > > # selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
> > > > > [ 13.863060][ T252] gre: GRE over IPv4 demultiplexor driver
> > > > > [ 13.911551][ T252] ip6_gre: GRE over IPv6 tunneling driver
> > > > > [ 15.226124][ T277] 8021q: 802.1Q VLAN Support v1.8
> > > > > [ 17.629460][ T317] GACT probability NOT on
> > > > > [ 17.645781][ T315] tc (315) used greatest stack depth: 23040 bytes left
> > > > > # TEST: ping [ OK ]
> > > > > # TEST: ping6 [FAIL]
> > > > > # INFO: Running IPv4 overlay custom multipath hash tests
> > > > > # TEST: Multipath hash field: Inner source IP (balanced) [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 5552 / 7052
> > > > > # TEST: Multipath hash field: Inner source IP (unbalanced) [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 12600 / 2
> > > > > [ 36.278056][ C2] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1078005296 wd_nsec: 1078004682
> > > > > # TEST: Multipath hash field: Inner destination IP (balanced) [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 6650 / 5950
> > > > > # TEST: Multipath hash field: Inner destination IP (unbalanced) [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 0 / 12600
> > > > > ...
> > > > >
> > > > > And I also see the failures on 4003c9e78778. Not sure why we see
> > > > > different results. And the NIPAs fails as well:
> > > > >
> > > > > https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/32922/1-ip6gre-custom-multipath-hash-sh/stdout
> > > >
> > > > I can reproduce this locally and I'm getting the exact same result as
> > > > the CI. All the balanced tests fail because the traffic is forwarded via
> > > > a single nexthop. No failures after reverting 183185a18ff9.
> > > >
> > > > I'm still not sure what happens, but for some reason a neighbour is not
> > > > created on one of the nexthop devices which causes rt6_check_neigh() to
> > > > skip over this path (returning RT6_NUD_FAIL_DO_RR). Enabling
> > > > CONFIG_IPV6_ROUTER_PREF fixes the issue because then RT6_NUD_SUCCEED is
> > > > returned.
> > > >
> > > > I can continue looking into this on Tuesday (mostly AFK tomorrow).
> > >
> > > I finally managed to reproduce the problem using vng. Still no problem
> > > on my regular VM, no matter if I enable CONFIG_IPV6_ROUTER_PREF or not.
> > > I'll continue investigating this problem...
> >
> > FWIIW, I have tried much, but am unable to _reliably_ reproduce this problem.
>
> Sorry for the delay. Busy with other tasks at the moment, but I found
> some time to look into this. I believe I understand the issue and have a
> fix. Guillaume's patch is fine. It simply exposed a bug elsewhere.
>
> The test is failing because all the packets are forwarded via a single
> path instead of being load balanced between both paths.
> fib6_select_path() chooses the path according to the hash-threshold
> algorithm. If the function is called with the last nexthop in a
> multipath route, it will always choose this nexthop because the
> calculated hash will always be smaller than the upper bound of this
> nexthop.
>
> Fix is to find the first nexthop (sibling route) and choose the first
> matching nexthop according to hash-threshold. Given Guillaume and you
> can reproduce the issue, can you please test the fix [1]?
Good catch!
I can confirm that your patch fixes the selftest for me.
> I think Guillaume's patch exposed the issue because it caused the ip6gre
> device to transmit a packet (Router Solicitation as part of the DAD
> process for the IPv6 link-local address) as soon as the device is
> brought up. With debug kernels this might happen while forwarding is
> still disabled as the test enables forwarding at the end of the setup.
>
> When forwarding is disabled the nexthop's neighbour state is taken into
> account when choosing a route in rt6_select() and round-robin will be
> performed between the two sibling routes. It is possible to end up in a
> situation where rt6_select() always returns the second sibling route
> which fib6_select_path() will then always select due to its upper bound.
Thanks a lot for your help Ido!
> [1]
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index fb2e99a56529..afcd66b73a92 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -412,11 +412,35 @@ static bool rt6_check_expired(const struct rt6_info *rt)
> return false;
> }
>
> +static struct fib6_info *
> +rt6_multipath_first_sibling_rcu(const struct fib6_info *rt)
> +{
> + struct fib6_info *iter;
> + struct fib6_node *fn;
> +
> + fn = rcu_dereference(rt->fib6_node);
> + if (!fn)
> + goto out;
> + iter = rcu_dereference(fn->leaf);
> + if (!iter)
> + goto out;
> +
> + while (iter) {
> + if (iter->fib6_metric == rt->fib6_metric &&
> + rt6_qualify_for_ecmp(iter))
> + return iter;
> + iter = rcu_dereference(iter->fib6_next);
> + }
> +
> +out:
> + return NULL;
> +}
> +
> void fib6_select_path(const struct net *net, struct fib6_result *res,
> struct flowi6 *fl6, int oif, bool have_oif_match,
> const struct sk_buff *skb, int strict)
> {
> - struct fib6_info *match = res->f6i;
> + struct fib6_info *first, *match = res->f6i;
> struct fib6_info *sibling;
>
> if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> @@ -440,10 +464,18 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
> return;
> }
>
> - if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound))
> + first = rt6_multipath_first_sibling_rcu(match);
> + if (!first)
> goto out;
>
> - list_for_each_entry_rcu(sibling, &match->fib6_siblings,
> + if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> + rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> + strict) >= 0) {
> + match = first;
> + goto out;
> + }
> +
> + list_for_each_entry_rcu(sibling, &first->fib6_siblings,
> fib6_siblings) {
> const struct fib6_nh *nh = sibling->fib6_nh;
> int nh_upper_bound;
>
next prev parent reply other threads:[~2025-03-24 14:36 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-07 19:28 [PATCH net v4 0/2] gre: Fix regressions in IPv6 link-local address generation Guillaume Nault
2025-03-07 19:28 ` [PATCH net v4 1/2] gre: Fix " Guillaume Nault
2025-03-09 18:03 ` Ido Schimmel
2025-03-14 15:18 ` Stanislav Fomichev
2025-03-14 19:22 ` Guillaume Nault
2025-03-14 20:18 ` Stanislav Fomichev
2025-03-16 13:08 ` Ido Schimmel
2025-03-17 21:10 ` Guillaume Nault
2025-03-20 16:26 ` Simon Horman
2025-03-20 19:04 ` Ido Schimmel
2025-03-24 14:36 ` Guillaume Nault [this message]
2025-03-07 19:28 ` [PATCH net v4 2/2] selftests: Add IPv6 link-local address generation tests for GRE devices Guillaume Nault
2025-03-09 18:04 ` Ido Schimmel
2025-03-10 9:44 ` Petr Machata
2025-03-13 9:30 ` [PATCH net v4 0/2] gre: Fix regressions in IPv6 link-local address generation patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z+Ft/jPlaoM+7IUU@debian \
--to=gnault@redhat.com \
--cc=antonio@mandelbit.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=idosch@idosch.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=petrm@nvidia.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.