From: Ido Schimmel <idosch@idosch.org>
To: David Ahern <dsahern@kernel.org>
Cc: netdev@vger.kernel.org, kuba@kernel.org, davem@davemloft.net,
greearb@candelatech.com
Subject: Re: [PATCH net-next] net: Add l3mdev index to flow struct and avoid oif reset for port devices
Date: Tue, 22 Mar 2022 17:31:27 +0200 [thread overview]
Message-ID: <Yjnrz7vL9HqE5UBz@shredder> (raw)
In-Reply-To: <0b0b61a1-e46d-6134-0151-913b324f056a@kernel.org>
On Tue, Mar 22, 2022 at 08:26:48AM -0600, David Ahern wrote:
> On 3/22/22 3:22 AM, Ido Schimmel wrote:
> > On Mon, Mar 14, 2022 at 02:45:51PM -0600, David Ahern wrote:
> >> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> >> index 2af2b99e0bea..fb0e49c36c2e 100644
> >> --- a/net/ipv4/fib_trie.c
> >> +++ b/net/ipv4/fib_trie.c
> >> @@ -1429,11 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> >> !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> >> return false;
> >>
> >> - if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
> >> - if (flp->flowi4_oif &&
> >> - flp->flowi4_oif != nhc->nhc_oif)
> >> - return false;
> >> - }
> >> + if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> >> + return false;
> >
> > David, we have several test cases that are failing which I have tracked
> > down to this patch.
> >
> > Before the patch, if the original output interface was enslaved to a
> > VRF, the output interface in the flow struct would be updated to the VRF
> > and the 'FLOWI_FLAG_SKIP_NH_OIF' flag would be set, causing the above
> > check to be skipped.
> >
> > After the patch, the check is no longer skipped, as original output
> > interface is retained and the flag was removed.
> >
> > This breaks scenarios where a GRE tunnel specifies a dummy device
> > enslaved to a VRF as its physical device. The purpose of this
> > configuration is to redirect the underlay lookup to the table associated
> > with the VRF to which the dummy device is enslaved to. The check fails
> > because 'flp->flowi4_oif' points to the dummy device, whereas
> > 'nhc->nhc_oif' points to the interface via which the encapsulated packet
> > should egress.
> >
> > Skipping the check when an l3mdev was set seems to solve the problem:
> >
> > diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> > index fb0e49c36c2e..cf1164e05d92 100644
> > --- a/net/ipv4/fib_trie.c
> > +++ b/net/ipv4/fib_trie.c
> > @@ -1429,7 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
> > !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
> > return false;
> >
> > - if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> > + if (!flp->flowi4_l3mdev &&
> > + flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> > return false;
> >
> > return true;
> >
> > AFAICT, this scenario does not break with ip6gre/ip6gretap tunnels
> > because 'RT6_LOOKUP_F_IFACE' is not set in
> > ip6_route_output_flags_noref() in this case.
> >
> > WDYT? I plan to test this patch in our regression, but I'm not sure if I
> > missed other cases that might remain broken.
>
> one of the requests with VRF has been to bind a socket to a port device
> and expect the lookup to enforce use of that egress port (e.g.,
> multipath). Switching the oif to the VRF device and then ignoring the
> oif check was making that check too flexible for that use case.
I see
>
> What's the callchain for this failure? Perhaps the
> FLOWI_FLAG_SKIP_NH_OIF needs to be kept for this particular use case.
This is the stack trace for the failure:
fib_lookup_good_nhc+5
fib_table_lookup+3281
fib4_rule_action+501
fib_rules_lookup+858
__fib_lookup+233
fib_lookup.constprop.0+926
ip_route_output_key_hash_rcu+3707
ip_route_output_key_hash+392
ip_route_output_flow+33
ip_tunnel_xmit+1794
gre_tap_xmit+1312
dev_hard_start_xmit+448
sch_direct_xmit+615
__dev_queue_xmit+4841
The GRE tap is using a dummy device enslaved to a VRF as its physical
device.
prev parent reply other threads:[~2022-03-22 15:31 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-14 20:45 [PATCH net-next] net: Add l3mdev index to flow struct and avoid oif reset for port devices David Ahern
2022-03-16 5:00 ` patchwork-bot+netdevbpf
2022-03-22 9:22 ` Ido Schimmel
2022-03-22 14:26 ` David Ahern
2022-03-22 15:31 ` Ido Schimmel [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yjnrz7vL9HqE5UBz@shredder \
--to=idosch@idosch.org \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=greearb@candelatech.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox