public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Ido Schimmel <idosch@idosch.org>
To: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net,
	greearb@candelatech.com
Subject: Re: [PATCH net-next] net: Add l3mdev index to flow struct and avoid oif reset for port devices
Date: Tue, 22 Mar 2022 17:31:27 +0200	[thread overview]
Message-ID: <Yjnrz7vL9HqE5UBz@shredder> (raw)
In-Reply-To: <0b0b61a1-e46d-6134-0151-913b324f056a@kernel.org>

On Tue, Mar 22, 2022 at 08:26:48AM -0600, David Ahern wrote:
> On 3/22/22 3:22 AM, Ido Schimmel wrote:
> > On Mon, Mar 14, 2022 at 02:45:51PM -0600, David Ahern wrote:
> >> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> >> index 2af2b99e0bea..fb0e49c36c2e 100644
> >> --- a/net/ipv4/fib_trie.c
> >> +++ b/net/ipv4/fib_trie.c
> >> @@ -1429,11 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> >>  	    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> >>  		return false;
> >>  
> >> -	if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
> >> -		if (flp->flowi4_oif &&
> >> -		    flp->flowi4_oif != nhc->nhc_oif)
> >> -			return false;
> >> -	}
> >> +	if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> >> +		return false;
> > 
> > David, we have several test cases that are failing which I have tracked
> > down to this patch.
> > 
> > Before the patch, if the original output interface was enslaved to a
> > VRF, the output interface in the flow struct would be updated to the VRF
> > and the 'FLOWI_FLAG_SKIP_NH_OIF' flag would be set, causing the above
> > check to be skipped.
> > 
> > After the patch, the check is no longer skipped, as original output
> > interface is retained and the flag was removed.
> > 
> > This breaks scenarios where a GRE tunnel specifies a dummy device
> > enslaved to a VRF as its physical device. The purpose of this
> > configuration is to redirect the underlay lookup to the table associated
> > with the VRF to which the dummy device is enslaved to. The check fails
> > because 'flp->flowi4_oif' points to the dummy device, whereas
> > 'nhc->nhc_oif' points to the interface via which the encapsulated packet
> > should egress.
> > 
> > Skipping the check when an l3mdev was set seems to solve the problem:
> > 
> > diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> > index fb0e49c36c2e..cf1164e05d92 100644
> > --- a/net/ipv4/fib_trie.c
> > +++ b/net/ipv4/fib_trie.c
> > @@ -1429,7 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> >             !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> >                 return false;
> >  
> > -       if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> > +       if (!flp->flowi4_l3mdev &&
> > +           flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> >                 return false;
> >  
> >         return true;
> > 
> > AFAICT, this scenario does not break with ip6gre/ip6gretap tunnels
> > because 'RT6_LOOKUP_F_IFACE' is not set in
> > ip6_route_output_flags_noref() in this case.
> > 
> > WDYT? I plan to test this patch in our regression, but I'm not sure if I
> > missed other cases that might remain broken.
> 
> one of the requests with VRF has been to bind a socket to a port device
> and expect the lookup to enforce use of that egress port (e.g.,
> multipath). Switching the oif to the VRF device and then ignoring the
> oif check was making that check too flexible for that use case.

I see

> 
> What's the callchain for this failure? Perhaps the
> FLOWI_FLAG_SKIP_NH_OIF needs to be kept for this particular use case.

This is the stack trace for the failure:

    fib_lookup_good_nhc+5
    fib_table_lookup+3281
    fib4_rule_action+501
    fib_rules_lookup+858
    __fib_lookup+233
    fib_lookup.constprop.0+926
    ip_route_output_key_hash_rcu+3707
    ip_route_output_key_hash+392
    ip_route_output_flow+33
    ip_tunnel_xmit+1794
    gre_tap_xmit+1312
    dev_hard_start_xmit+448
    sch_direct_xmit+615
    __dev_queue_xmit+4841

The GRE tap is using a dummy device enslaved to a VRF as its physical
device.

      reply	other threads:[~2022-03-22 15:31 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-14 20:45 [PATCH net-next] net: Add l3mdev index to flow struct and avoid oif reset for port devices David Ahern
2022-03-16  5:00 ` patchwork-bot+netdevbpf
2022-03-22  9:22 ` Ido Schimmel
2022-03-22 14:26   ` David Ahern
2022-03-22 15:31     ` Ido Schimmel [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yjnrz7vL9HqE5UBz@shredder \
    --to=idosch@idosch.org \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=greearb@candelatech.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox