From: Avinash Duduskar <avinash.duduskar@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: ameryhung@gmail.com, a.s.protopopov@gmail.com,
bpf@vger.kernel.org, davem@davemloft.net, dsahern@kernel.org,
eddyz87@gmail.com, edumazet@google.com, emil@etsalapatis.com,
eyal.birger@gmail.com, hawk@kernel.org, horms@kernel.org,
john.fastabend@gmail.com, jolsa@kernel.org, kpsingh@kernel.org,
kuba@kernel.org, leon.hwang@linux.dev,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
martin.lau@linux.dev, memxor@gmail.com, netdev@vger.kernel.org,
pabeni@redhat.com, rongtao@cestc.cn, sdf@fomichev.me,
shuah@kernel.org, song@kernel.org, toke@redhat.com,
yatsenko@meta.com, yonghong.song@linux.dev
Subject: [PATCH bpf-next v3 0/3] bpf: bidirectional VLAN support for bpf_fib_lookup()
Date: Thu, 18 Jun 2026 04:17:26 +0530
Message-ID: <20260617224729.1428662-1-avinash.duduskar@gmail.com>

This series adds VLAN awareness to bpf_fib_lookup() in both directions.
BPF_FIB_LOOKUP_VLAN resolves a VLAN egress to its underlying real device
plus the VLAN tag (XDP programs need this because VLAN devices have no XDP
xmit), and BPF_FIB_LOOKUP_VLAN_INPUT runs the lookup as if a tagged frame
had arrived on the matching VLAN subinterface, for iif policy routing and
VRF table selection.
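
To make the egress direction concrete, here is a minimal userspace sketch of the reduction described above: a VLAN device is resolved to its real device plus one tag, and anything that cannot be reduced that way is returned unchanged. It is a model under stated assumptions, not the code in net/core/filter.c. struct model_dev, struct model_result and model_resolve_egress() are hypothetical names, and the fields are kept in host byte order for readability (the h_vlan_proto and h_vlan_TCI fields of struct bpf_fib_lookup are __be16).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a net device; not the kernel's struct net_device. */
struct model_dev {
	int ifindex;
	int netns;                     /* illustrative namespace id */
	const struct model_dev *lower; /* NULL for a physical device */
	uint16_t vlan_proto;           /* e.g. 0x8100; 0 if not a VLAN device */
	uint16_t vlan_id;
};

/* Hypothetical result; field names mirror struct bpf_fib_lookup. */
struct model_result {
	int ifindex;
	uint16_t h_vlan_proto;
	uint16_t h_vlan_TCI;
};

static struct model_result model_resolve_egress(const struct model_dev *dev)
{
	/* Default: report the device as-is with the vlan fields zero. */
	struct model_result res = { .ifindex = dev->ifindex };
	const struct model_dev *parent = dev->lower;

	/* Physical egress: nothing to resolve. */
	if (!parent)
		return res;

	/*
	 * Stacked VLAN (QinQ) or a parent in another namespace: cannot be
	 * expressed as "physical parent plus one tag", so leave the result
	 * as a plain lookup would.
	 */
	if (parent->lower || parent->netns != dev->netns)
		return res;

	/* Single VLAN on a physical parent: swap ifindex, report the tag. */
	res.ifindex = parent->ifindex;
	res.h_vlan_proto = dev->vlan_proto;
	res.h_vlan_TCI = dev->vlan_id;
	return res;
}
```

In this model a caller that sees a non-zero h_vlan_proto would push the tag and transmit on the returned ifindex. A zero h_vlan_proto covers both a physical egress and an unreduced VLAN device.
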
The l3mdev/VRF flow-init fix that was patch 1 in v1 and v2 has been split
out and sent to bpf on its own, since it is an independent Fixes:-tagged
fix that routes to stable on its own schedule. This series is otherwise
independent of it: with the default CONFIG_INIT_STACK_ALL_ZERO, the VRF
selftests pass with or without the fix. Only one full-lookup VRF arm
("IPv4 VLAN input, tag selects VRF table") depends on it, and only on
INIT_STACK_ALL_PATTERN or INIT_STACK_NONE builds, where the uninitialized
flowi_l3mdev otherwise misses the l3mdev rule and the lookup falls
through to the main table. Applying the l3mdev fix first closes that
window.
Changes v2 -> v3 (all from Toke's review unless noted):
- Split the l3mdev/VRF flow-init fix out to a standalone bpf submission
(it was patch 1 in v2).
- Patch 2 (VLAN_INPUT): bpf_fib_vlan_input_dev() returns a
struct net_device * with ERR_PTR() for the -EINVAL case and NULL for
NOT_FWDED, instead of an int return and a **dev out-parameter.
- Trim the BPF_FIB_LOOKUP_VLAN and BPF_FIB_LOOKUP_VLAN_INPUT UAPI doc
blocks, and drop the in-function comments that restated the commit
message or the flag doc.
- Patch 1 (VLAN egress): on the skb path without tot_len, the deferred mtu
check now runs against the resolved egress (VLAN) device rather than the
parent that params->ifindex was swapped to. A VLAN device with a smaller
mtu than its parent is therefore no longer checked against, or reported
as, the parent's larger mtu. Found by the BPF CI bot; this was an open
question in v2.
- Patch 3 (selftests): re-run every case through bpf_xdp_fib_lookup() as
well, since the feature targets XDP; and flip the no-tot_len mtu arm to
expect the VLAN device's mtu after the fix above.
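
The return convention adopted for bpf_fib_vlan_input_dev() in this revision (a device pointer, ERR_PTR() for -EINVAL, NULL for NOT_FWDED) can be sketched in userspace as follows. This is an illustration of the three-way pattern only. The model_* helpers are local equivalents of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and the lookup function, its arguments, and the condition that yields -EINVAL are assumptions rather than the patch's actual logic.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace equivalents of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical VLAN subinterface record. */
struct model_vdev {
	int ifindex;
	uint16_t vlan_id;
	int up;
};

/* Stand-ins for the helper's return codes. */
enum { MODEL_RET_SUCCESS = 0, MODEL_RET_NOT_FWDED = 4 };

/*
 * Three outcomes in one return value:
 *   valid pointer   -> matching subinterface that is up
 *   NULL            -> no usable match, caller reports NOT_FWDED
 *   ERR_PTR(-EINVAL)-> invalid request (bad_args is a made-up condition)
 */
static struct model_vdev *model_vlan_input_dev(struct model_vdev *subifs,
					       size_t n, uint16_t vlan_id,
					       int bad_args)
{
	size_t i;

	if (bad_args)
		return model_err_ptr(-EINVAL);

	for (i = 0; i < n; i++)
		if (subifs[i].vlan_id == vlan_id && subifs[i].up)
			return &subifs[i];

	return NULL;
}

/* Caller side: check the error encoding first, then NULL. */
static int model_fib_lookup(struct model_vdev *subifs, size_t n,
			    uint16_t vlan_id, int bad_args)
{
	struct model_vdev *dev;

	dev = model_vlan_input_dev(subifs, n, vlan_id, bad_args);
	if (model_is_err(dev))
		return (int)model_ptr_err(dev);
	if (!dev)
		return MODEL_RET_NOT_FWDED;

	/* A real lookup would continue with dev as the input interface. */
	return MODEL_RET_SUCCESS;
}
```

Compared with an int return plus a **dev out-parameter, the caller has a single value to test and cannot read an out-parameter that was never written.
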
Open questions (defaults chosen, noted here in case a maintainer
prefers otherwise):
1. An unmatched, down, or foreign-netns tag returns
BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
fib_get_table() finds no table, rather than a new return code.
2. BPF_FIB_LOOKUP_OUTPUT | BPF_FIB_LOOKUP_VLAN_INPUT is rejected with
-EINVAL; restricting now keeps relaxing later backward-compatible.
3. The name BPF_FIB_LOOKUP_VLAN_INPUT reads oddly next to
BPF_FIB_LOOKUP_OUTPUT. A pair like _VLAN_EGRESS/_VLAN_INGRESS is an
option while nothing is merged.
4. The egress flag leaves a VLAN it cannot reduce to a physical parent
plus one tag (QinQ, or a parent in another namespace) as SUCCESS with
the VLAN device's ifindex and the vlan fields zero, like a plain
lookup. The input side instead fails closed (NOT_FWDED) on the
cross-namespace case. An XDP caller cannot xmit on a VLAN device, and
a zero h_vlan_proto does not distinguish this result from a physical
egress, so returning NOT_FWDED would be safer for XDP. But the two
cases differ: a foreign-netns parent is clearly fail-worthy, while a
QinQ egress is still a forwardable route (tc xmits on the inner VLAN
device), so failing it closed would reject a usable route. Should
egress signal NOT_FWDED, for both or only foreign-netns? I left it
best-effort, but will change it if you prefer.
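
For open question 2, the flag restriction amounts to a check along the following lines. This is a sketch only: the MODEL_* names are hypothetical, and the bit chosen for the VLAN_INPUT flag is a placeholder, since the value assigned in the UAPI header is not stated in this cover letter.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the UAPI flags; the VLAN_INPUT bit is a placeholder. */
#define MODEL_FIB_LOOKUP_OUTPUT     (1U << 1)
#define MODEL_FIB_LOOKUP_VLAN_INPUT (1U << 7)

/*
 * Reject the OUTPUT | VLAN_INPUT combination up front. Programs that pass
 * both flags get -EINVAL today, so a later kernel could start accepting
 * the combination without changing behaviour for any working program.
 */
static int model_check_flags(uint32_t flags)
{
	const uint32_t both = MODEL_FIB_LOOKUP_OUTPUT |
			      MODEL_FIB_LOOKUP_VLAN_INPUT;

	if ((flags & both) == both)
		return -EINVAL;

	return 0;
}
```
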
Taking the tag as lookup input follows the approach David Ahern
suggested in the 2021 fwmark discussion:
https://lore.kernel.org/bpf/6248c547-ad64-04d6-fcec-374893cc1ef2@gmail.com/
v2: https://lore.kernel.org/all/20260616223426.3568080-1-avinash.duduskar@gmail.com/
v1: https://lore.kernel.org/all/20260609172052.81613-1-avinash.duduskar@gmail.com/

Avinash Duduskar (3):
  bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
  bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
  selftests/bpf: Add bpf_fib_lookup() VLAN flag tests

 include/uapi/linux/bpf.h                      |  41 +-
 net/core/filter.c                             | 125 +++-
 tools/include/uapi/linux/bpf.h                |  41 +-
 .../selftests/bpf/prog_tests/fib_lookup.c     | 554 +++++++++++++++++-
 .../testing/selftests/bpf/progs/fib_lookup.c  |   9 +
 5 files changed, 741 insertions(+), 29 deletions(-)

base-commit: e771677c937da5808f7b6c1f0e4a97ec1a84f8a8
--
2.54.0