From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Avinash Duduskar <avinash.duduskar@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
Jiri Olsa <jolsa@kernel.org>,
Emil Tsalapatis <emil@etsalapatis.com>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>, David Ahern <dsahern@kernel.org>,
Shuah Khan <shuah@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Mykyta Yatsenko <yatsenko@meta.com>,
Leon Hwang <leon.hwang@linux.dev>, KP Singh <kpsingh@kernel.org>,
Anton Protopopov <a.s.protopopov@gmail.com>,
Amery Hung <ameryhung@gmail.com>,
Eyal Birger <eyal.birger@gmail.com>, Rong Tao <rongtao@cestc.cn>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v2 3/4] bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
Date: Wed, 17 Jun 2026 11:42:53 +0200 [thread overview]
Message-ID: <874ij1h59e.fsf@toke.dk> (raw)
In-Reply-To: <20260616223426.3568080-4-avinash.duduskar@gmail.com>
Avinash Duduskar <avinash.duduskar@gmail.com> writes:
> BPF_FIB_LOOKUP_VLAN resolves a VLAN egress. The reverse is also
> useful: an XDP program receiving a VLAN-tagged frame on a physical
> device wants the lookup to behave as if the packet had arrived on the
> corresponding VLAN subinterface, so iif-based policy routing and VRF
> table selection use the right ingress.
>
> Add BPF_FIB_LOOKUP_VLAN_INPUT. When set, params->h_vlan_proto and
> params->h_vlan_TCI are read as an input VLAN tag and the matching VLAN
> device of params->ifindex is resolved with __vlan_find_dev_deep_rcu().
> The device must be up and in the same network namespace as
> params->ifindex (a VLAN device can be moved to another netns while
> registered on its parent; receive would deliver into that other
> namespace, which a lookup here cannot represent). If params->ifindex
> is itself a VLAN device, its inner (QinQ) subinterface is matched.
> For a bond or team, a tag on a port matches no device and returns
> NOT_FWDED; pass the master's ifindex.
> The lookup then runs with the resolved device as the ingress;
> params->ifindex itself is not modified on the input side. When the
> resolved device is enslaved to a VRF, both the full lookup (via the
> l3mdev rule) and BPF_FIB_LOOKUP_DIRECT (via l3mdev_fib_table_rcu())
> select the VRF's table from the resolved ingress. That follows from
> feeding the resolved device to the flow as the ingress
> (fl4.flowi4_iif = dev->ifindex), which is what makes l3mdev resolve
> the VRF master from the subinterface rather than from
> params->ifindex.
>
> The two failure classes get different treatment on purpose. A
> h_vlan_proto other than 802.1Q/802.1ad is API misuse and returns
> -EINVAL, since it would otherwise reach the WARN in vlan_proto_idx()
> with a program-controlled value. An unmatched VID, a device that is
> down, or one in another namespace is a data outcome and returns
> BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
> fib_get_table() finds no table and mirroring real ingress, where the
> receive path drops such frames. A VID of 0 (a priority tag) is looked
> up literally and normally fails the same way; receive instead
> processes such frames untagged, so callers should not set the flag for
> priority tags. Proceeding on the physical device for any of these
> would be fail-open for the policy-routing cases above.
>
> The h_vlan fields share a union with tbid, so the flag cannot be
> combined with BPF_FIB_LOOKUP_TBID. It describes ingress, so it also
> cannot be combined with BPF_FIB_LOOKUP_OUTPUT. Both combinations
> return -EINVAL; restricting now keeps a later relaxation backward
> compatible. Combining with BPF_FIB_LOOKUP_VLAN is allowed: the tag is
> consumed on the ingress side and the egress tag is written on
> success.
>
> Under !CONFIG_VLAN_8021Q the __vlan_find_dev_deep_rcu() stub returns
> NULL, so every lookup with the flag returns NOT_FWDED, which is
> correct since no VLAN device can exist.
>
> Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
> ---
> include/uapi/linux/bpf.h | 34 ++++++++++++++-
> net/core/filter.c | 80 +++++++++++++++++++++++++++++++---
> tools/include/uapi/linux/bpf.h | 34 ++++++++++++++-
> 3 files changed, 141 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f77aa9472bf1..57e28da3336a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3552,6 +3552,35 @@ union bpf_attr {
> * reports the route mtu in *params*->mtu_result, and on
> * the tc path without tot_len the mtu check runs after
> * the swap, against the parent device.
> + * **BPF_FIB_LOOKUP_VLAN_INPUT**
> + * Treat *params*->h_vlan_proto and *params*->h_vlan_TCI
> + * as an input VLAN tag (e.g. parsed from the packet) and
> + * run the lookup as if ingress had happened on the VLAN
> + * subinterface carrying that tag for *params*->ifindex,
> + * rather than on *params*->ifindex itself. The VID is the
> + * low 12 bits of *params*->h_vlan_TCI;
> + * *params*->h_vlan_proto must be ETH_P_8021Q or
> + * ETH_P_8021AD in network byte order (any other value
> + * returns **-EINVAL**). The
> + * subinterface is the one configured for that tag on
> + * *params*->ifindex; if *params*->ifindex is itself a
> + * VLAN device, its inner (QinQ) subinterface is matched.
> + * For a bond or team, a tag on a port matches no
> + * device and returns NOT_FWDED; pass the master's
> + * ifindex.
> + * If no matching subinterface exists, or it is not up,
> + * or it was moved to another network namespace, the
> + * lookup returns **BPF_FIB_LKUP_RET_NOT_FWDED**,
> + * mirroring real ingress, which drops a frame whose tag
> + * is unconfigured or whose VLAN device is down. A VID of
> + * 0 (a priority-tagged frame) is looked up literally like
> + * any other VID; receive instead processes such frames
> + * untagged on the device itself, so do not set this flag
> + * for priority tags.
> + * Cannot be combined with **BPF_FIB_LOOKUP_TBID** (both
> + * use the same input fields) or **BPF_FIB_LOOKUP_OUTPUT**
> + * (this flag is ingress-only); doing so returns
> + * **-EINVAL**.
This comment is also overly long - please trim.
> *
> * *ctx* is either **struct xdp_md** for XDP programs or
> * **struct sk_buff** tc cls_act programs.
> @@ -7348,6 +7377,7 @@ enum {
> BPF_FIB_LOOKUP_SRC = (1U << 4),
> BPF_FIB_LOOKUP_MARK = (1U << 5),
> BPF_FIB_LOOKUP_VLAN = (1U << 6),
> + BPF_FIB_LOOKUP_VLAN_INPUT = (1U << 7),
> };
>
> enum {
> @@ -7416,7 +7446,9 @@ struct bpf_fib_lookup {
> struct {
> /* output with BPF_FIB_LOOKUP_VLAN: set from the
> * resolved egress VLAN device (see the flag); zeroed
> - * on other successful lookups.
> + * on other successful lookups. input with
> + * BPF_FIB_LOOKUP_VLAN_INPUT: the VLAN tag to scope
> + * the lookup by.
> */
> __be16 h_vlan_proto;
> __be16 h_vlan_TCI;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index b37a12321fba..cfbdd842ce61 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6158,6 +6158,41 @@ static int bpf_fib_set_fwd_params(struct net_device *dev,
>
> return 0;
> }
> +
> +/* With BPF_FIB_LOOKUP_VLAN_INPUT the caller passes the packet's VLAN tag in
> + * params->h_vlan_proto and params->h_vlan_TCI; the lookup is done as if
> + * ingress had happened on the matching VLAN subinterface of *dev. Resolve
> + * it and store it in *dev. params is not modified.
> + *
> + * A protocol other than 802.1Q/802.1AD is API misuse (it would otherwise
> + * reach the WARN in vlan_proto_idx()), so it is rejected with -EINVAL. An
> + * unmatched VID, a matching device that is down, or one that was moved
> + * to another netns (receive would deliver into that netns' stack, which
> + * a lookup here cannot represent) is a data outcome, reported as
> + * NOT_FWDED, the same way the DIRECT path reports a missing table. Under
> + * !CONFIG_VLAN_8021Q __vlan_find_dev_deep_rcu() returns NULL, so every
> + * call returns NOT_FWDED, which is correct since no subinterface can
> + * exist.
> + */
As in the previous patch, please drop this comment.
> +static int bpf_fib_vlan_input_dev(struct net_device **dev,
> + const struct bpf_fib_lookup *params)
> +{
Just return the dev pointer and use ERR_PTR for errors? That's what we
usually do for these kinds of functions.
-Toke
next prev parent reply other threads:[~2026-06-17 9:43 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 22:34 [PATCH bpf-next v2 0/4] bpf: bidirectional VLAN support for bpf_fib_lookup() Avinash Duduskar
2026-06-16 22:34 ` [PATCH bpf-next v2 1/4] bpf: Initialize the l3mdev field for the fib lookup flow Avinash Duduskar
2026-06-17 9:06 ` Toke Høiland-Jørgensen
2026-06-16 22:34 ` [PATCH bpf-next v2 2/4] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper Avinash Duduskar
2026-06-16 22:47 ` sashiko-bot
2026-06-17 9:26 ` Toke Høiland-Jørgensen
2026-06-16 22:34 ` [PATCH bpf-next v2 3/4] bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT " Avinash Duduskar
2026-06-17 9:42 ` Toke Høiland-Jørgensen [this message]
2026-06-16 22:34 ` [PATCH bpf-next v2 4/4] selftests/bpf: Add bpf_fib_lookup() VLAN flag tests Avinash Duduskar
2026-06-16 22:40 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=874ij1h59e.fsf@toke.dk \
--to=toke@redhat.com \
--cc=a.s.protopopov@gmail.com \
--cc=ameryhung@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=avinash.duduskar@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=emil@etsalapatis.com \
--cc=eyal.birger@gmail.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=leon.hwang@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rongtao@cestc.cn \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=yatsenko@meta.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox