netdev.vger.kernel.org archive mirror
From: Guillaume Nault <gnault@redhat.com>
To: Ido Schimmel <idosch@idosch.org>
Cc: Simon Horman <horms@kernel.org>,
	Stanislav Fomichev <stfomichev@gmail.com>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org, David Ahern <dsahern@kernel.org>,
	Antonio Quartulli <antonio@mandelbit.com>,
	Petr Machata <petrm@nvidia.com>
Subject: Re: [PATCH net v4 1/2] gre: Fix IPv6 link-local address generation.
Date: Mon, 24 Mar 2025 15:36:46 +0100	[thread overview]
Message-ID: <Z+Ft/jPlaoM+7IUU@debian> (raw)
In-Reply-To: <Z9xmvRX_g_ZifayA@shredder>

On Thu, Mar 20, 2025 at 09:04:29PM +0200, Ido Schimmel wrote:
> On Thu, Mar 20, 2025 at 04:26:46PM +0000, Simon Horman wrote:
> > On Mon, Mar 17, 2025 at 10:10:45PM +0100, Guillaume Nault wrote:
> > > On Sun, Mar 16, 2025 at 03:08:48PM +0200, Ido Schimmel wrote:
> > > > On Fri, Mar 14, 2025 at 01:18:21PM -0700, Stanislav Fomichev wrote:
> > > > > On 03/14, Guillaume Nault wrote:
> > > > > > On Fri, Mar 14, 2025 at 08:18:32AM -0700, Stanislav Fomichev wrote:
> > > > > > > 
> > > > > > > Could you please double check net/forwarding/ip6gre_custom_multipath_hash.sh?
> > > > > > > It seems like it started failing after this series was pulled:
> > > > > > > https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/31301/2-ip6gre-custom-multipath-hash-sh/stdout
> > > > > > 
> > > > > > Hum, net/forwarding/ip6gre_custom_multipath_hash.sh works for me on the
> > > > > > current net tree (I'm at commit 4003c9e78778). I have only one failure,
> > > > > > but it already happened before 183185a18ff9 ("gre: Fix IPv6 link-local
> > > > > > address generation.") was applied.
> > > > > 
> > > > > On my side I see the following (ignore ping6 FAILs):
> > > > > 
> > > > > bfc6c67ec2d6 - (net-next/main, net-next/HEAD) net/smc: use the correct ndev to find pnetid by pnetid table (7 hours ago) <Guangguan Wang>
> > > > > 
> > > > > TAP version 13
> > > > > 1..1
> > > > > # timeout set to 0
> > > > > # selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
> > > > > [    9.275735][  T167] ip (167) used greatest stack depth: 23536 bytes left
> > > > > [   13.769300][  T255] gre: GRE over IPv4 demultiplexor driver
> > > > > [   13.838185][  T255] ip6_gre: GRE over IPv6 tunneling driver
> > > > > [   13.951780][   T12] ip6_tunnel: g1 xmit: Local address not yet configured!
> > > > > [   14.038101][   T12] ip6_tunnel: g1 xmit: Local address not yet configured!
> > > > > [   15.148469][  T281] 8021q: 802.1Q VLAN Support v1.8
> > > > > [   17.559477][  T321] GACT probability NOT on
> > > > > [   18.551876][   T12] ip6_tunnel: g2 xmit: Local address not yet configured!
> > > > > [   18.633656][   T12] ip6_tunnel: g2 xmit: Local address not yet configured!
> > > > > # TEST: ping                                                          [ OK ]
> > > > > # TEST: ping6                                                         [FAIL]
> > > > > # INFO: Running IPv4 overlay custom multipath hash tests
> > > > > # TEST: Multipath hash field: Inner source IP (balanced)              [FAIL]
> > > > > #       Expected traffic to be balanced, but it is not
> > > > > # INFO: Packets sent on path1 / path2: 1 / 12602
> > > > > # TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 0 / 12601
> > > > > # TEST: Multipath hash field: Inner destination IP (balanced)         [FAIL]
> > > > > #       Expected traffic to be balanced, but it is not
> > > > > # INFO: Packets sent on path1 / path2: 1 / 12600
> > > > > # TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 0 / 12600
> > > > > ...
> > > > > 
> > > > > 8ecea691e844 - (HEAD -> upstream/net-next/main) Revert "gre: Fix IPv6 link-local address generation." (2 minutes ago) <Stanislav Fomichev>
> > > > > 
> > > > > TAP version 13
> > > > > 1..1
> > > > > # timeout set to 0
> > > > > # selftests: net/forwarding: ip6gre_custom_multipath_hash.sh
> > > > > [   13.863060][  T252] gre: GRE over IPv4 demultiplexor driver
> > > > > [   13.911551][  T252] ip6_gre: GRE over IPv6 tunneling driver
> > > > > [   15.226124][  T277] 8021q: 802.1Q VLAN Support v1.8
> > > > > [   17.629460][  T317] GACT probability NOT on
> > > > > [   17.645781][  T315] tc (315) used greatest stack depth: 23040 bytes left
> > > > > # TEST: ping                                                          [ OK ]
> > > > > # TEST: ping6                                                         [FAIL]
> > > > > # INFO: Running IPv4 overlay custom multipath hash tests
> > > > > # TEST: Multipath hash field: Inner source IP (balanced)              [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 5552 / 7052
> > > > > # TEST: Multipath hash field: Inner source IP (unbalanced)            [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 12600 / 2
> > > > > [   36.278056][    C2] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1078005296 wd_nsec: 1078004682
> > > > > # TEST: Multipath hash field: Inner destination IP (balanced)         [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 6650 / 5950
> > > > > # TEST: Multipath hash field: Inner destination IP (unbalanced)       [ OK ]
> > > > > # INFO: Packets sent on path1 / path2: 0 / 12600
> > > > > ...
> > > > > 
> > > > > And I also see the failures on 4003c9e78778. Not sure why we see
> > > > > different results. And the NIPA CI fails as well:
> > > > > 
> > > > > https://netdev-3.bots.linux.dev/vmksft-forwarding-dbg/results/32922/1-ip6gre-custom-multipath-hash-sh/stdout
> > > > 
> > > > I can reproduce this locally and I'm getting the exact same result as
> > > > the CI. All the balanced tests fail because the traffic is forwarded via
> > > > a single nexthop. No failures after reverting 183185a18ff9.
> > > > 
> > > > I'm still not sure what happens, but for some reason a neighbour is not
> > > > created on one of the nexthop devices which causes rt6_check_neigh() to
> > > > skip over this path (returning RT6_NUD_FAIL_DO_RR). Enabling
> > > > CONFIG_IPV6_ROUTER_PREF fixes the issue because then RT6_NUD_SUCCEED is
> > > > returned.
> > > > 
> > > > I can continue looking into this on Tuesday (mostly AFK tomorrow).
> > > 
> > > I finally managed to reproduce the problem using vng. Still no problem
> > > on my regular VM, no matter if I enable CONFIG_IPV6_ROUTER_PREF or not.
> > > I'll continue investigating this problem...
> > 
> > FWIW, I have tried a lot, but am unable to _reliably_ reproduce this problem.
> 
> Sorry for the delay. Busy with other tasks at the moment, but I found
> some time to look into this. I believe I understand the issue and have a
> fix. Guillaume's patch is fine. It simply exposed a bug elsewhere.
> 
> The test is failing because all the packets are forwarded via a single
> path instead of being load balanced between both paths.
> fib6_select_path() chooses the path according to the hash-threshold
> algorithm. If the function is called with the last nexthop in a
> multipath route, it will always choose this nexthop because the
> calculated hash can never exceed the upper bound of the last nexthop
> (which is the maximum hash value).
> 
> The fix is to find the first nexthop (sibling route) and choose the first
> matching nexthop according to hash-threshold. Given that Guillaume and you
> can reproduce the issue, can you please test the fix [1]?

Good catch!
I can confirm that your patch fixes the selftest for me.

> I think Guillaume's patch exposed the issue because it caused the ip6gre
> device to transmit a packet (Router Solicitation as part of the DAD
> process for the IPv6 link-local address) as soon as the device is
> brought up. With debug kernels this might happen while forwarding is
> still disabled as the test enables forwarding at the end of the setup.
> 
> When forwarding is disabled the nexthop's neighbour state is taken into
> account when choosing a route in rt6_select() and round-robin will be
> performed between the two sibling routes. It is possible to end up in a
> situation where rt6_select() always returns the second sibling route
> which fib6_select_path() will then always select due to its upper bound.

Thanks a lot for your help Ido!

> [1]
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index fb2e99a56529..afcd66b73a92 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -412,11 +412,35 @@ static bool rt6_check_expired(const struct rt6_info *rt)
>  	return false;
>  }
>  
> +static struct fib6_info *
> +rt6_multipath_first_sibling_rcu(const struct fib6_info *rt)
> +{
> +	struct fib6_info *iter;
> +	struct fib6_node *fn;
> +
> +	fn = rcu_dereference(rt->fib6_node);
> +	if (!fn)
> +		goto out;
> +	iter = rcu_dereference(fn->leaf);
> +	if (!iter)
> +		goto out;
> +
> +	while (iter) {
> +		if (iter->fib6_metric == rt->fib6_metric &&
> +		    rt6_qualify_for_ecmp(iter))
> +			return iter;
> +		iter = rcu_dereference(iter->fib6_next);
> +	}
> +
> +out:
> +	return NULL;
> +}
> +
>  void fib6_select_path(const struct net *net, struct fib6_result *res,
>  		      struct flowi6 *fl6, int oif, bool have_oif_match,
>  		      const struct sk_buff *skb, int strict)
>  {
> -	struct fib6_info *match = res->f6i;
> +	struct fib6_info *first, *match = res->f6i;
>  	struct fib6_info *sibling;
>  
>  	if (!match->nh && (!match->fib6_nsiblings || have_oif_match))
> @@ -440,10 +464,18 @@ void fib6_select_path(const struct net *net, struct fib6_result *res,
>  		return;
>  	}
>  
> -	if (fl6->mp_hash <= atomic_read(&match->fib6_nh->fib_nh_upper_bound))
> +	first = rt6_multipath_first_sibling_rcu(match);
> +	if (!first)
>  		goto out;
>  
> -	list_for_each_entry_rcu(sibling, &match->fib6_siblings,
> +	if (fl6->mp_hash <= atomic_read(&first->fib6_nh->fib_nh_upper_bound) &&
> +	    rt6_score_route(first->fib6_nh, first->fib6_flags, oif,
> +			    strict) >= 0) {
> +		match = first;
> +		goto out;
> +	}
> +
> +	list_for_each_entry_rcu(sibling, &first->fib6_siblings,
>  				fib6_siblings) {
>  		const struct fib6_nh *nh = sibling->fib6_nh;
>  		int nh_upper_bound;
> 



Thread overview: 15+ messages
2025-03-07 19:28 [PATCH net v4 0/2] gre: Fix regressions in IPv6 link-local address generation Guillaume Nault
2025-03-07 19:28 ` [PATCH net v4 1/2] gre: Fix " Guillaume Nault
2025-03-09 18:03   ` Ido Schimmel
2025-03-14 15:18   ` Stanislav Fomichev
2025-03-14 19:22     ` Guillaume Nault
2025-03-14 20:18       ` Stanislav Fomichev
2025-03-16 13:08         ` Ido Schimmel
2025-03-17 21:10           ` Guillaume Nault
2025-03-20 16:26             ` Simon Horman
2025-03-20 19:04               ` Ido Schimmel
2025-03-24 14:36                 ` Guillaume Nault [this message]
2025-03-07 19:28 ` [PATCH net v4 2/2] selftests: Add IPv6 link-local address generation tests for GRE devices Guillaume Nault
2025-03-09 18:04   ` Ido Schimmel
2025-03-10  9:44   ` Petr Machata
2025-03-13  9:30 ` [PATCH net v4 0/2] gre: Fix regressions in IPv6 link-local address generation patchwork-bot+netdevbpf
