* [PATCH bpf-next] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
@ 2026-06-09 17:20 Avinash Duduskar
2026-06-09 17:51 ` bot+bpf-ci
2026-06-09 20:14 ` Toke Høiland-Jørgensen
0 siblings, 2 replies; 3+ messages in thread
From: Avinash Duduskar @ 2026-06-09 17:20 UTC
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, John Fastabend,
Stanislav Fomichev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jesper Dangaard Brouer, KP Singh,
Toke Høiland-Jørgensen, bpf, netdev, linux-kernel
bpf_fib_lookup() returns the FIB-resolved egress ifindex straight
from the fib result. When the egress is a VLAN device, the returned
ifindex is the VLAN netdev's, which has no XDP xmit handler; XDP
programs that want to forward the frame (e.g. xdp-forward) must
instead target the underlying physical device and push the VLAN tag
themselves. Today the program has no way to learn either the
underlying ifindex or the VLAN tag without maintaining its own
VLAN-to-ifindex map in userspace and refreshing it on netlink
events.
Add BPF_FIB_LOOKUP_VLAN. When the caller sets this flag and the fib
result is a VLAN device, populate the existing output fields
params->h_vlan_proto and params->h_vlan_TCI from the VLAN device,
and replace params->ifindex with the underlying real device's
ifindex. params->h_vlan_TCI carries the VID only, with PCP and DEI
bits zero; a consumer wanting to set egress priority writes PCP
itself. Only the immediate parent is resolved; stacked VLANs (QinQ)
are not walked. When the flag is not set, behaviour is unchanged:
h_vlan_proto and h_vlan_TCI are zeroed and ifindex is left at the
FIB result.
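
To illustrate the "VID only, PCP and DEI zero" wording above, here is a minimal user-space sketch of how a consumer might merge a priority into the returned value. It assumes the standard 802.1Q TCI layout (PCP in bits 15-13, DEI in bit 12, VID in bits 11-0); the helper name `tci_with_pcp` and the `TCI_*` macros are hypothetical and are not part of the patch or the kernel UAPI.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* 802.1Q TCI layout: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
#define TCI_VID_MASK  0x0fffu
#define TCI_PCP_SHIFT 13

/*
 * Hypothetical helper: take a VID-only TCI in network byte order, as the
 * flag is described as returning, and set the PCP bits. DEI stays zero.
 */
static uint16_t tci_with_pcp(uint16_t tci_be, unsigned int pcp)
{
	uint16_t tci = ntohs(tci_be) & TCI_VID_MASK;

	tci |= (uint16_t)((pcp & 0x7u) << TCI_PCP_SHIFT);
	return htons(tci);
}
```

For VID 100 and priority 5, `tci_with_pcp(htons(100), 5)` returns `htons(0xa064)`: 5 shifted into the top three bits is 0xa000, and 100 is 0x064.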
This lets an XDP redirect target the physical device and learn the
tag to push in a single lookup, which xdp-forward's optional VLAN
mode (xdp-project/xdp-tools#504) wants from the kernel side.
The change extends bpf_fib_set_fwd_params() to take the egress dev
and the lookup flags so the VLAN swap happens in the same place the
vlan output fields are zeroed by default. Both IPv4 and IPv6
callers pass through. The helper's input semantics are unchanged.
Under !CONFIG_VLAN_8021Q, is_vlan_dev() returns false and the new
block is a no-op.
Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
---
include/uapi/linux/bpf.h | 21 ++++++++++++++++++++-
net/core/filter.c | 27 +++++++++++++++++++++++----
tools/include/uapi/linux/bpf.h | 21 ++++++++++++++++++++-
3 files changed, 63 insertions(+), 6 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 11dd610fa5fa..aa7fe378a35d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3527,6 +3527,19 @@ union bpf_attr {
* Use the mark present in *params*->mark for the fib lookup.
* This option should not be used with BPF_FIB_LOOKUP_DIRECT,
* as it only has meaning for full lookups.
+ * **BPF_FIB_LOOKUP_VLAN**
+ * If the fib lookup resolves to a VLAN device, set
+ * *params*->h_vlan_proto and *params*->h_vlan_TCI from
+ * the VLAN device and replace *params*->ifindex with the
+ * underlying real device's ifindex. This lets XDP
+ * programs that target the underlying physical device
+ * (VLAN devices have no XDP xmit) discover both the
+ * real egress ifindex and the VLAN tag to push in one
+ * call. *params*->h_vlan_TCI carries the VID only,
+ * with PCP and DEI bits zero; a consumer wanting to
+ * set egress priority writes PCP itself. Only the
+ * immediate parent is resolved; stacked VLANs (QinQ)
+ * are not walked.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
@@ -7322,6 +7335,7 @@ enum {
BPF_FIB_LOOKUP_TBID = (1U << 3),
BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
+ BPF_FIB_LOOKUP_VLAN = (1U << 6),
};
enum {
@@ -7388,7 +7402,12 @@ struct bpf_fib_lookup {
union {
struct {
- /* output */
+ /* output: only populated with BPF_FIB_LOOKUP_VLAN
+ * when the resolved egress is a VLAN device, in
+ * which case *ifindex* is replaced with the
+ * underlying real device's ifindex. Otherwise
+ * both fields are zeroed.
+ */
__be16 h_vlan_proto;
__be16 h_vlan_TCI;
};
diff --git a/net/core/filter.c b/net/core/filter.c
index 9590877b0714..782fa86df95a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6119,10 +6119,28 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
#endif
#if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
-static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu)
+static int bpf_fib_set_fwd_params(struct net_device *dev,
+ struct bpf_fib_lookup *params,
+ u32 flags, u32 mtu)
{
params->h_vlan_TCI = 0;
params->h_vlan_proto = 0;
+
+ if ((flags & BPF_FIB_LOOKUP_VLAN) && is_vlan_dev(dev)) {
+ struct net_device *real_dev = vlan_dev_real_dev(dev);
+
+ /* Only the immediate parent is resolved; stacked VLANs
+ * (QinQ) are not walked, and a NULL real_dev (which
+ * is_vlan_dev() rules out in practice) keeps the
+ * original ifindex.
+ */
+ if (real_dev) {
+ params->h_vlan_proto = vlan_dev_vlan_proto(dev);
+ params->h_vlan_TCI = htons(vlan_dev_vlan_id(dev));
+ params->ifindex = real_dev->ifindex;
+ }
+ }
+
if (mtu)
params->mtu_result = mtu; /* union with tot_len */
@@ -6265,7 +6283,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
memcpy(params->smac, dev->dev_addr, ETH_ALEN);
set_fwd_params:
- return bpf_fib_set_fwd_params(params, mtu);
+ return bpf_fib_set_fwd_params(dev, params, flags, mtu);
}
#endif
@@ -6404,13 +6422,14 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
memcpy(params->smac, dev->dev_addr, ETH_ALEN);
set_fwd_params:
- return bpf_fib_set_fwd_params(params, mtu);
+ return bpf_fib_set_fwd_params(dev, params, flags, mtu);
}
#endif
#define BPF_FIB_LOOKUP_MASK (BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT | \
BPF_FIB_LOOKUP_SKIP_NEIGH | BPF_FIB_LOOKUP_TBID | \
- BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK)
+ BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_MARK | \
+ BPF_FIB_LOOKUP_VLAN)
BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
struct bpf_fib_lookup *, params, int, plen, u32, flags)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 11dd610fa5fa..aa7fe378a35d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3527,6 +3527,19 @@ union bpf_attr {
* Use the mark present in *params*->mark for the fib lookup.
* This option should not be used with BPF_FIB_LOOKUP_DIRECT,
* as it only has meaning for full lookups.
+ * **BPF_FIB_LOOKUP_VLAN**
+ * If the fib lookup resolves to a VLAN device, set
+ * *params*->h_vlan_proto and *params*->h_vlan_TCI from
+ * the VLAN device and replace *params*->ifindex with the
+ * underlying real device's ifindex. This lets XDP
+ * programs that target the underlying physical device
+ * (VLAN devices have no XDP xmit) discover both the
+ * real egress ifindex and the VLAN tag to push in one
+ * call. *params*->h_vlan_TCI carries the VID only,
+ * with PCP and DEI bits zero; a consumer wanting to
+ * set egress priority writes PCP itself. Only the
+ * immediate parent is resolved; stacked VLANs (QinQ)
+ * are not walked.
*
* *ctx* is either **struct xdp_md** for XDP programs or
* **struct sk_buff** tc cls_act programs.
@@ -7322,6 +7335,7 @@ enum {
BPF_FIB_LOOKUP_TBID = (1U << 3),
BPF_FIB_LOOKUP_SRC = (1U << 4),
BPF_FIB_LOOKUP_MARK = (1U << 5),
+ BPF_FIB_LOOKUP_VLAN = (1U << 6),
};
enum {
@@ -7388,7 +7402,12 @@ struct bpf_fib_lookup {
union {
struct {
- /* output */
+ /* output: only populated with BPF_FIB_LOOKUP_VLAN
+ * when the resolved egress is a VLAN device, in
+ * which case *ifindex* is replaced with the
+ * underlying real device's ifindex. Otherwise
+ * both fields are zeroed.
+ */
__be16 h_vlan_proto;
__be16 h_vlan_TCI;
};
base-commit: f1a660bbd12dd855fce6cf13f144008c4e45e7c7
--
2.54.0
* Re: [PATCH bpf-next] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: bot+bpf-ci @ 2026-06-09 17:51 UTC
To: avinash.duduskar, ast, daniel, andrii
Cc: martin.lau, eddyz87, memxor, song, yonghong.song, jolsa,
john.fastabend, sdf, davem, edumazet, kuba, pabeni, horms, hawk,
kpsingh, toke, bpf, netdev, linux-kernel, ast, andrii, daniel,
martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 11dd610fa5fa..aa7fe378a35d 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
[ ... ]
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9590877b0714..782fa86df95a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6119,10 +6119,28 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> #endif
>
> #if IS_ENABLED(CONFIG_INET) || IS_ENABLED(CONFIG_IPV6)
> -static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params, u32 mtu)
> +static int bpf_fib_set_fwd_params(struct net_device *dev,
> + struct bpf_fib_lookup *params,
> + u32 flags, u32 mtu)
> {
> params->h_vlan_TCI = 0;
> params->h_vlan_proto = 0;
> +
> + if ((flags & BPF_FIB_LOOKUP_VLAN) && is_vlan_dev(dev)) {
> + struct net_device *real_dev = vlan_dev_real_dev(dev);
Does this correctly handle stacked VLANs (QinQ)? Both the commit message
and the inline comment below mention that stacked VLANs are not walked and
only the immediate parent is resolved.
However, looking at the implementation of vlan_dev_real_dev() in
net/8021q/vlan_core.c:
const struct net_device *vlan_dev_real_dev(const struct net_device *dev)
{
while (is_vlan_dev(dev))
dev = vlan_dev_priv(dev)->real_dev;
return dev;
}
It appears vlan_dev_real_dev() loops through all stacked VLANs to return
the ultimate non-VLAN underlying device.
If the configuration is QinQ, wouldn't this set params->ifindex to the base
physical device while only returning the topmost VLAN tag in
params->h_vlan_TCI? This could cause an XDP program to push only a single
tag and drop the intermediate VLAN tags, bypassing the outer VLAN entirely.
Should this use vlan_dev_priv(dev)->real_dev instead to only get the
immediate parent as intended?
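
The difference between the two lookups can be sketched in user space with a toy device model. This is only an illustration of the concern raised above: `struct toy_dev` and the `toy_*` functions are hypothetical stand-ins, not kernel structures or APIs, and the only thing taken from the kernel is the shape of the loop quoted above.

```c
#include <stddef.h>

/* Toy stand-in for a net_device; NOT the kernel structure. */
struct toy_dev {
	int ifindex;
	int vid;               /* VLAN ID, 0 for a non-VLAN device */
	struct toy_dev *lower; /* device this VLAN sits on, NULL otherwise */
};

static int toy_is_vlan(const struct toy_dev *dev)
{
	return dev->lower != NULL;
}

/* Same shape as the quoted loop: walks to the bottom of the stack. */
static const struct toy_dev *toy_real_dev(const struct toy_dev *dev)
{
	while (toy_is_vlan(dev))
		dev = dev->lower;
	return dev;
}

/* Single step, analogous to reading vlan_dev_priv(dev)->real_dev once. */
static const struct toy_dev *toy_immediate_parent(const struct toy_dev *dev)
{
	return toy_is_vlan(dev) ? dev->lower : dev;
}
```

With a base device (ifindex 2), an outer VLAN 100 on top of it (ifindex 10), and an inner VLAN 200 on top of that (ifindex 11), the full walk from the inner device returns ifindex 2, while the single step returns ifindex 10. Pairing ifindex 2 with only VID 200 is the mismatch described above.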
> +
> + /* Only the immediate parent is resolved; stacked VLANs
This isn't a bug, but the comment style here differs from BPF subsystem
guidelines. The opening /* has comment text on the same line, but the BPF
comment style prefers the opening /* on its own line with text starting
on the next line.
> + * (QinQ) are not walked, and a NULL real_dev (which
> + * is_vlan_dev() rules out in practice) keeps the
> + * original ifindex.
> + */
> + if (real_dev) {
> + params->h_vlan_proto = vlan_dev_vlan_proto(dev);
> + params->h_vlan_TCI = htons(vlan_dev_vlan_id(dev));
> + params->ifindex = real_dev->ifindex;
> + }
> + }
> +
> if (mtu)
> params->mtu_result = mtu; /* union with tot_len */
>
> @@ -6265,7 +6283,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
> memcpy(params->smac, dev->dev_addr, ETH_ALEN);
>
> set_fwd_params:
> - return bpf_fib_set_fwd_params(params, mtu);
> + return bpf_fib_set_fwd_params(dev, params, flags, mtu);
> }
> #endif
[ ... ]
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 11dd610fa5fa..aa7fe378a35d 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/27223878961
* Re: [PATCH bpf-next] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
From: Toke Høiland-Jørgensen @ 2026-06-09 20:14 UTC
To: Avinash Duduskar, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko
Cc: Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Song Liu, Yonghong Song, Jiri Olsa, John Fastabend,
Stanislav Fomichev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Jesper Dangaard Brouer, KP Singh, bpf,
netdev, linux-kernel
Avinash Duduskar <avinash.duduskar@gmail.com> writes:
> bpf_fib_lookup() returns the FIB-resolved egress ifindex straight
> from the fib result. When the egress is a VLAN device, the returned
> ifindex is the VLAN netdev's, which has no XDP xmit handler; XDP
> programs that want to forward the frame (e.g. xdp-forward) must
> instead target the underlying physical device and push the VLAN tag
> themselves. Today the program has no way to learn either the
> underlying ifindex or the VLAN tag without maintaining its own
> VLAN-to-ifindex map in userspace and refreshing it on netlink
> events.
>
> Add BPF_FIB_LOOKUP_VLAN. When the caller sets this flag and the fib
> result is a VLAN device, populate the existing output fields
> params->h_vlan_proto and params->h_vlan_TCI from the VLAN device,
> and replace params->ifindex with the underlying real device's
> ifindex. params->h_vlan_TCI carries the VID only, with PCP and DEI
> bits zero; a consumer wanting to set egress priority writes PCP
> itself. Only the immediate parent is resolved; stacked VLANs (QinQ)
> are not walked. When the flag is not set, behaviour is unchanged:
> h_vlan_proto and h_vlan_TCI are zeroed and ifindex is left at the
> FIB result.
>
> This lets an XDP redirect target the physical device and learn the
> tag to push in a single lookup, which xdp-forward's optional VLAN
> mode (xdp-project/xdp-tools#504) wants from the kernel side.
>
> The change extends bpf_fib_set_fwd_params() to take the egress dev
> and the lookup flags so the VLAN swap happens in the same place the
> vlan output fields are zeroed by default. Both IPv4 and IPv6
> callers pass through. The helper's input semantics are unchanged.
> Under !CONFIG_VLAN_8021Q, is_vlan_dev() returns false and the new
> block is a no-op.
>
> Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Avinash Duduskar <avinash.duduskar@gmail.com>
Other than the bot's comment, I think we should make this bidirectional.
I.e., it should also be possible to supply the vlan tag from the packet
when doing the lookup.
This requires a second flag, which has to be exclusive with
BPF_FIB_LOOKUP_TBID, as the tbid field unfortunately overlaps with the
VLAN fields (so they can't be used together as input).
-Toke
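
The overlap mentioned above can be sketched with a local mirror of the relevant union. In struct bpf_fib_lookup the two VLAN fields share storage with the 32-bit tbid input, so an input-side VLAN flag and BPF_FIB_LOOKUP_TBID would both claim the same four bytes. The names below are hypothetical: `TOY_FIB_LOOKUP_VLAN_IN` and its bit position are invented for illustration, and `toy_check_flags` only models the exclusivity check; it is not proposed kernel code.

```c
#include <stdint.h>

/* Local mirror of the overlapping fields; layout follows the UAPI union. */
union toy_vlan_or_tbid {
	struct {
		uint16_t h_vlan_proto;
		uint16_t h_vlan_TCI;
	};
	uint32_t tbid;
};

#define TOY_FIB_LOOKUP_TBID    (1u << 3) /* same bit as BPF_FIB_LOOKUP_TBID */
#define TOY_FIB_LOOKUP_VLAN_IN (1u << 7) /* hypothetical; bit chosen arbitrarily */

/* Returns 0 if the combination is usable, -1 if both flags claim the union. */
static int toy_check_flags(uint32_t flags)
{
	if ((flags & TOY_FIB_LOOKUP_TBID) && (flags & TOY_FIB_LOOKUP_VLAN_IN))
		return -1;
	return 0;
}
```

Writing either VLAN field changes the bytes that tbid reads, which is why the two inputs cannot be combined in one call.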