From: Avinash Duduskar <avinash.duduskar@gmail.com>
To: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Cc: "Eduard Zingerman" <eddyz87@gmail.com>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"Jiri Olsa" <jolsa@kernel.org>,
"Emil Tsalapatis" <emil@etsalapatis.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Stanislav Fomichev" <sdf@fomichev.me>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"David Ahern" <dsahern@kernel.org>,
"Shuah Khan" <shuah@kernel.org>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Mykyta Yatsenko" <yatsenko@meta.com>,
"Leon Hwang" <leon.hwang@linux.dev>,
"KP Singh" <kpsingh@kernel.org>,
"Anton Protopopov" <a.s.protopopov@gmail.com>,
"Amery Hung" <ameryhung@gmail.com>,
"Eyal Birger" <eyal.birger@gmail.com>,
"Rong Tao" <rongtao@cestc.cn>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next v2 3/4] bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
Date: Wed, 17 Jun 2026 04:04:25 +0530 [thread overview]
Message-ID: <20260616223426.3568080-4-avinash.duduskar@gmail.com> (raw)
In-Reply-To: <20260616223426.3568080-1-avinash.duduskar@gmail.com>
BPF_FIB_LOOKUP_VLAN resolves a VLAN egress. The reverse is also
useful: an XDP program receiving a VLAN-tagged frame on a physical
device wants the lookup to behave as if the packet had arrived on the
corresponding VLAN subinterface, so iif-based policy routing and VRF
table selection use the right ingress.
Add BPF_FIB_LOOKUP_VLAN_INPUT. When set, params->h_vlan_proto and
params->h_vlan_TCI are read as an input VLAN tag and the matching VLAN
device of params->ifindex is resolved with __vlan_find_dev_deep_rcu().
The device must be up and in the same network namespace as
params->ifindex (a VLAN device can be moved to another netns while
registered on its parent; receive would deliver into that other
namespace, which a lookup here cannot represent). If params->ifindex
is itself a VLAN device, its inner (QinQ) subinterface is matched.
For a bond or team, a tag on a port matches no device and returns
NOT_FWDED; pass the master's ifindex.
The lookup then runs with the resolved device as the ingress;
params->ifindex itself is not modified on the input side. When the
resolved device is enslaved to a VRF, both the full lookup (via the
l3mdev rule) and BPF_FIB_LOOKUP_DIRECT (via l3mdev_fib_table_rcu())
select the VRF's table from the resolved ingress. That follows from
feeding the resolved device to the flow as the ingress
(fl4.flowi4_iif = dev->ifindex), which is what makes l3mdev resolve
the VRF master from the subinterface rather than from
params->ifindex.
The two failure classes get different treatment on purpose. A
h_vlan_proto other than 802.1Q/802.1ad is API misuse and returns
-EINVAL, since it would otherwise reach the WARN in vlan_proto_idx()
with a program-controlled value. An unmatched VID, a device that is
down, or one in another namespace is a data outcome and returns
BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
fib_get_table() finds no table and mirroring real ingress, where the
receive path drops such frames. A VID of 0 (a priority tag) is looked
up literally and normally fails the same way; receive instead
processes such frames untagged, so callers should not set the flag for
priority tags. Proceeding on the physical device for any of these
would be fail-open for the policy-routing cases above.
The h_vlan fields share a union with tbid, so the flag cannot be
combined with BPF_FIB_LOOKUP_TBID. It describes ingress, so it also
cannot be combined with BPF_FIB_LOOKUP_OUTPUT. Both combinations
return -EINVAL; restricting now keeps a later relaxation backward
compatible. Combining with BPF_FIB_LOOKUP_VLAN is allowed: the tag is
consumed on the ingress side and the egress tag is written on
success.
Under !CONFIG_VLAN_8021Q the __vlan_find_dev_deep_rcu() stub returns
NULL, so every lookup with the flag returns NOT_FWDED, which is
correct since no VLAN device can exist.
Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
---
include/uapi/linux/bpf.h | 34 ++++++++++++++-
net/core/filter.c | 80 +++++++++++++++++++++++++++++++---
tools/include/uapi/linux/bpf.h | 34 ++++++++++++++-
3 files changed, 141 insertions(+), 7 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f77aa9472bf1..57e28da3336a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3552,6 +3552,35 @@ union bpf_attr {
* reports the route mtu in *params*->mtu_result, and on
* the tc path without tot_len the mtu check runs after
* the swap, against the parent device.
+ * **BPF_FIB_LOOKUP_VLAN_INPUT**
+ * Treat *params*->h_vlan_proto and *params*->h_vlan_TCI
+ * as an input VLAN tag (e.g. parsed from the packet) and
+ * run the lookup as if ingress had happened on the VLAN
+ * subinterface carrying that tag for *params*->ifindex,
+ * rather than on *params*->ifindex itself. The VID is the
+ * low 12 bits of *params*->h_vlan_TCI;
+ * *params*->h_vlan_proto must be ETH_P_8021Q or
+ * ETH_P_8021AD in network byte order (any other value
+ * returns **-EINVAL**). The
+ * subinterface is the one configured for that tag on
+ * *params*->ifindex; if *params*->ifindex is itself a
+ * VLAN device, its inner (QinQ) subinterface is matched.
+ * For a bond or team, a tag on a port matches no
+ * device and returns NOT_FWDED; pass the master's
+ * ifindex.
+ * If no matching subinterface exists, or it is not up,
+ * or it was moved to another network namespace, the
+ * lookup returns **BPF_FIB_LKUP_RET_NOT_FWDED**,
+ * mirroring real ingress, which drops a frame whose tag
+ * is unconfigured or whose VLAN device is down. A VID of
+ * 0 (a priority-tagged frame) is looked up literally like
+ * any other VID; receive instead processes such frames
+ * untagged on the device itself, so do not set this flag
+ * for priority tags.
+ * Cannot be combined with **BPF_FIB_LOOKUP_TBID** (both
+ * use the same input fields) or **BPF_FIB_LOOKUP_OUTPUT**
+ * (this flag is ingress-only); doing so returns
+ * **-EINVAL**.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
@@ -7348,6 +7377,7 @@ enum {
BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
BPF_FIB_LOOKUP_VLAN = (1U << 6),
+ BPF_FIB_LOOKUP_VLAN_INPUT = (1U << 7),
};
enum {
@@ -7416,7 +7446,9 @@ struct bpf_fib_lookup {
struct {
/* output with BPF_FIB_LOOKUP_VLAN: set from the
* resolved egress VLAN device (see the flag); zeroed
- * on other successful lookups.
+ * on other successful lookups. input with
+ * BPF_FIB_LOOKUP_VLAN_INPUT: the VLAN tag to scope
+ * the lookup by.
*/
__be16 h_vlan_proto;
__be16 h_vlan_TCI;
diff --git a/net/core/filter.c b/net/core/filter.c
index b37a12321fba..cfbdd842ce61 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6158,6 +6158,41 @@ static int bpf_fib_set_fwd_params(struct net_device *dev,
return 0;
}
+
+/* With BPF_FIB_LOOKUP_VLAN_INPUT the caller passes the packet's VLAN tag in
+ * params->h_vlan_proto and params->h_vlan_TCI; the lookup is done as if
+ * ingress had happened on the matching VLAN subinterface of *dev. Resolve
+ * it and store it in *dev. params is not modified.
+ *
+ * A protocol other than 802.1Q/802.1AD is API misuse (it would otherwise
+ * reach the WARN in vlan_proto_idx()), so it is rejected with -EINVAL. An
+ * unmatched VID, a matching device that is down, or one that was moved
+ * to another netns (receive would deliver into that netns' stack, which
+ * a lookup here cannot represent) is a data outcome, reported as
+ * NOT_FWDED, the same way the DIRECT path reports a missing table. Under
+ * !CONFIG_VLAN_8021Q __vlan_find_dev_deep_rcu() returns NULL, so every
+ * call returns NOT_FWDED, which is correct since no subinterface can
+ * exist.
+ */
+static int bpf_fib_vlan_input_dev(struct net_device **dev,
+ const struct bpf_fib_lookup *params)
+{
+ __be16 proto = params->h_vlan_proto;
+ struct net_device *vlan_dev;
+ u16 vid;
+
+ if (proto != htons(ETH_P_8021Q) && proto != htons(ETH_P_8021AD))
+ return -EINVAL;
+
+ vid = ntohs(params->h_vlan_TCI) & VLAN_VID_MASK;
+ vlan_dev = __vlan_find_dev_deep_rcu(*dev, proto, vid);
+ if (!vlan_dev || !(vlan_dev->flags & IFF_UP) ||
+ !net_eq(dev_net(vlan_dev), dev_net(*dev)))
+ return BPF_FIB_LKUP_RET_NOT_FWDED;
+
+ *dev = vlan_dev;
+ return 0;
+}
#endif
#if IS_ENABLED(CONFIG_INET)
@@ -6177,6 +6212,12 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (unlikely(!dev))
return -ENODEV;
+ if (flags & BPF_FIB_LOOKUP_VLAN_INPUT) {
+ err = bpf_fib_vlan_input_dev(&dev, params);
+ if (err)
+ return err;
+ }
+
/* verify forwarding is enabled on this interface */
in_dev = __in_dev_get_rcu(dev);
if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
@@ -6186,7 +6227,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
fl4.flowi4_iif = 1;
fl4.flowi4_oif = params->ifindex;
} else {
- fl4.flowi4_iif = params->ifindex;
+ /* dev->ifindex, not params->ifindex: VLAN_INPUT may have
+ * resolved dev to a subinterface above.
+ */
+ fl4.flowi4_iif = dev->ifindex;
fl4.flowi4_oif = 0;
}
fl4.flowi4_dscp = inet_dsfield_to_dscp(params->tos);
@@ -6323,6 +6367,12 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
if (unlikely(!dev))
return -ENODEV;
+ if (flags & BPF_FIB_LOOKUP_VLAN_INPUT) {
+ err = bpf_fib_vlan_input_dev(&dev, params);
+ if (err)
+ return err;
+ }
+
idev = __in6_dev_get_safely(dev);
if (unlikely(!idev || !READ_ONCE(idev->cnf.forwarding)))
return BPF_FIB_LKUP_RET_FWD_DISABLED;
@@ -6331,7 +6381,11 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
fl6.flowi6_iif = 1;
oif = fl6.flowi6_oif = params->ifindex;
} else {
- oif = fl6.flowi6_iif = params->ifindex;
+ /* dev->ifindex, not params->ifindex: VLAN_INPUT may have
+ * resolved dev to a subinterface above.
+ */
+ oif = dev->ifindex;
+ fl6.flowi6_iif = oif;
fl6.flowi6_oif = 0;
strict = RT6_LOOKUP_F_HAS_SADDR;
}
@@ -6443,7 +6497,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK | \
- BPF_FIB_LOOKUP_VLAN)
+ BPF_FIB_LOOKUP_VLAN | BPF_FIB_LOOKUP_VLAN_INPUT)
+
+static bool bpf_fib_lookup_flags_ok(u32 flags)
+{
+ if (flags & ~BPF_FIB_LOOKUP_MASK)
+ return false;
+
+ /* VLAN_INPUT reads h_vlan_proto/h_vlan_TCI, which alias tbid, so it
+ * cannot be combined with TBID. It is also ingress-only, so it
+ * cannot be combined with the egress-perspective OUTPUT flag.
+ */
+ if ((flags & BPF_FIB_LOOKUP_VLAN_INPUT) &&
+ (flags & (BPF_FIB_LOOKUP_TBID | BPF_FIB_LOOKUP_OUTPUT)))
+ return false;
+
+ return true;
+}
BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
struct bpf_fib_lookup *, params, int, plen, u32, flags)
@@ -6451,7 +6521,7 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
if (plen < sizeof(*params))
return -EINVAL;
- if (flags & ~BPF_FIB_LOOKUP_MASK)
+ if (!bpf_fib_lookup_flags_ok(flags))
return -EINVAL;
switch (params->family) {
@@ -6489,7 +6559,7 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
if (plen < sizeof(*params))
return -EINVAL;
- if (flags & ~BPF_FIB_LOOKUP_MASK)
+ if (!bpf_fib_lookup_flags_ok(flags))
return -EINVAL;
if (params->tot_len)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f77aa9472bf1..57e28da3336a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3552,6 +3552,35 @@ union bpf_attr {
* reports the route mtu in *params*->mtu_result, and on
* the tc path without tot_len the mtu check runs after
* the swap, against the parent device.
+ * **BPF_FIB_LOOKUP_VLAN_INPUT**
+ * Treat *params*->h_vlan_proto and *params*->h_vlan_TCI
+ * as an input VLAN tag (e.g. parsed from the packet) and
+ * run the lookup as if ingress had happened on the VLAN
+ * subinterface carrying that tag for *params*->ifindex,
+ * rather than on *params*->ifindex itself. The VID is the
+ * low 12 bits of *params*->h_vlan_TCI;
+ * *params*->h_vlan_proto must be ETH_P_8021Q or
+ * ETH_P_8021AD in network byte order (any other value
+ * returns **-EINVAL**). The
+ * subinterface is the one configured for that tag on
+ * *params*->ifindex; if *params*->ifindex is itself a
+ * VLAN device, its inner (QinQ) subinterface is matched.
+ * For a bond or team, a tag on a port matches no
+ * device and returns NOT_FWDED; pass the master's
+ * ifindex.
+ * If no matching subinterface exists, or it is not up,
+ * or it was moved to another network namespace, the
+ * lookup returns **BPF_FIB_LKUP_RET_NOT_FWDED**,
+ * mirroring real ingress, which drops a frame whose tag
+ * is unconfigured or whose VLAN device is down. A VID of
+ * 0 (a priority-tagged frame) is looked up literally like
+ * any other VID; receive instead processes such frames
+ * untagged on the device itself, so do not set this flag
+ * for priority tags.
+ * Cannot be combined with **BPF_FIB_LOOKUP_TBID** (both
+ * use the same input fields) or **BPF_FIB_LOOKUP_OUTPUT**
+ * (this flag is ingress-only); doing so returns
+ * **-EINVAL**.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
@@ -7348,6 +7377,7 @@ enum {
BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
BPF_FIB_LOOKUP_VLAN = (1U << 6),
+ BPF_FIB_LOOKUP_VLAN_INPUT = (1U << 7),
};
enum {
@@ -7416,7 +7446,9 @@ struct bpf_fib_lookup {
struct {
/* output with BPF_FIB_LOOKUP_VLAN: set from the
* resolved egress VLAN device (see the flag); zeroed
- * on other successful lookups.
+ * on other successful lookups. input with
+ * BPF_FIB_LOOKUP_VLAN_INPUT: the VLAN tag to scope
+ * the lookup by.
*/
__be16 h_vlan_proto;
__be16 h_vlan_TCI;
--
2.54.0
next prev parent reply other threads:[~2026-06-16 22:35 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-16 22:34 [PATCH bpf-next v2 0/4] bpf: bidirectional VLAN support for bpf_fib_lookup() Avinash Duduskar
2026-06-16 22:34 ` [PATCH bpf-next v2 1/4] bpf: Initialize the l3mdev field for the fib lookup flow Avinash Duduskar
2026-06-16 22:34 ` [PATCH bpf-next v2 2/4] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper Avinash Duduskar
2026-06-16 22:47 ` sashiko-bot
2026-06-16 22:34 ` Avinash Duduskar [this message]
2026-06-16 22:34 ` [PATCH bpf-next v2 4/4] selftests/bpf: Add bpf_fib_lookup() VLAN flag tests Avinash Duduskar
2026-06-16 22:40 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260616223426.3568080-4-avinash.duduskar@gmail.com \
--to=avinash.duduskar@gmail.com \
--cc=a.s.protopopov@gmail.com \
--cc=ameryhung@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=eddyz87@gmail.com \
--cc=edumazet@google.com \
--cc=emil@etsalapatis.com \
--cc=eyal.birger@gmail.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=leon.hwang@linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rongtao@cestc.cn \
--cc=sdf@fomichev.me \
--cc=shuah@kernel.org \
--cc=song@kernel.org \
--cc=toke@redhat.com \
--cc=yatsenko@meta.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.