From: Avinash Duduskar <avinash.duduskar@gmail.com>
To: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>
Cc: "Eduard Zingerman" <eddyz87@gmail.com>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"Jiri Olsa" <jolsa@kernel.org>,
"Emil Tsalapatis" <emil@etsalapatis.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Stanislav Fomichev" <sdf@fomichev.me>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"David Ahern" <dsahern@kernel.org>,
"Shuah Khan" <shuah@kernel.org>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Mykyta Yatsenko" <yatsenko@meta.com>,
"Leon Hwang" <leon.hwang@linux.dev>,
"KP Singh" <kpsingh@kernel.org>,
"Anton Protopopov" <a.s.protopopov@gmail.com>,
"Amery Hung" <ameryhung@gmail.com>,
"Eyal Birger" <eyal.birger@gmail.com>,
"Rong Tao" <rongtao@cestc.cn>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
bpf@vger.kernel.org, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next v2 0/4] bpf: bidirectional VLAN support for bpf_fib_lookup()
Date: Wed, 17 Jun 2026 04:04:22 +0530
Message-ID: <20260616223426.3568080-1-avinash.duduskar@gmail.com>

v1 added a single flag, BPF_FIB_LOOKUP_VLAN, to resolve a VLAN egress to
its underlying real device plus the VLAN tag. v2 fixes a QinQ bug the bpf
ci bot found, adds the input direction Toke asked for, adds selftests,
and prepends a fix for a pre-existing l3mdev/VRF lookup bug in the helper.

Patch 1 is an independent fix. bpf_fib_lookup() never initializes the
flow's flowi_l3mdev field, so on the fib-rules path the field is read
before it is written. The VRF master is then not resolved and the l3mdev
rule fails to match, so ingress on a VRF slave can fail to select its VRF
table. This happens today, independent of the rest of this series. The
helper already initializes every other rules-path flow field (mark,
tun_key, uid); flowi_l3mdev was added to that set later and this helper
was missed. CONFIG_INIT_STACK_ALL_ZERO (the default) masks the bug, which
is why the VRF selftests in patch 4 pass with or without the fix. Built
with CONFIG_INIT_STACK_ALL_PATTERN, a plain bpf_fib_lookup() over a VRF
slave returns NOT_FWDED without the patch and resolves with it.

The fix comes first so that the VRF behaviour the later patches document
and test is well defined. If you would rather take it through bpf or net
on its own, I am happy to send it separately. It will not apply cleanly
before v6.18, where the flowi4_dscp context line reads flowi4_tos, so a
stable backport needs a trivial context fixup.
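
For illustration only, the read-before-write that patch 1 addresses can be
modelled in userspace. Everything below is a hypothetical stand-in:
struct demo_flow is not the kernel's flow key, and the resolve/match
helpers are simplified assumptions about the behaviour described above,
not code from the patch. The stack contents are simulated with memset()
so that the sketch itself never reads uninitialized memory.

```c
#include <string.h>

/* Hypothetical stand-in for the lookup flow; not the kernel's struct. */
struct demo_flow {
	int iif;
	int oif;
	unsigned int mark;
	int l3mdev;	/* consulted by the modelled l3mdev rule */
};

/*
 * Assumed model: the VRF master is resolved only while l3mdev is still
 * zero, so a stale non-zero value suppresses the resolution.
 */
static void demo_resolve_l3mdev(struct demo_flow *fl, int master_ifindex)
{
	if (!fl->l3mdev)
		fl->l3mdev = master_ifindex;
}

/* Assumed model of the l3mdev rule: match on the VRF master ifindex. */
static int demo_rule_matches(const struct demo_flow *fl, int vrf_ifindex)
{
	return fl->l3mdev == vrf_ifindex;
}

/*
 * Returns 1 when the modelled lookup selects the VRF table.
 * stack_fill simulates what the stack held before the flow was built:
 * 0x00 for a zero-initialized stack, any other byte for a pattern fill.
 * init_l3mdev selects whether the field is set explicitly (the fix).
 */
static int demo_lookup(int init_l3mdev, unsigned char stack_fill)
{
	const int vrf_master = 10;	/* arbitrary ifindex for the sketch */
	struct demo_flow fl;

	memset(&fl, stack_fill, sizeof(fl));
	fl.iif = 5;
	fl.oif = 0;
	fl.mark = 0;
	if (init_l3mdev)
		fl.l3mdev = 0;

	demo_resolve_l3mdev(&fl, vrf_master);
	return demo_rule_matches(&fl, vrf_master);
}
```

In this model, demo_lookup(0, 0x00) succeeds because the zeroed stack
hides the missing assignment, demo_lookup(0, 0xAA) fails, and
demo_lookup(1, 0xAA) succeeds once the field is set explicitly.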

Changes v1 -> v2:

 - Fix QinQ handling (found by the bpf ci bot): resolve the immediate
   parent with vlan_dev_priv(dev)->real_dev instead of
   vlan_dev_real_dev() (which walks to the bottom of a stack), and only
   swap when that parent is a real device; stacked VLANs are left
   unchanged. The egress block is guarded with CONFIG_VLAN_8021Q.

 - Add BPF_FIB_LOOKUP_VLAN_INPUT for the input direction (requested by
   Toke): supply the packet tag, run the lookup on the matching VLAN
   subinterface. Exclusive with BPF_FIB_LOOKUP_TBID (shared union) and
   BPF_FIB_LOOKUP_OUTPUT (ingress-only); both return -EINVAL. Taking the
   tag as lookup input follows the approach David Ahern suggested in the
   2021 fwmark discussion:
   https://lore.kernel.org/bpf/6248c547-ad64-04d6-fcec-374893cc1ef2@gmail.com/

 - Both directions are network-namespace aware: a VLAN device can be
   moved to another netns while registered on its parent, so the egress
   swap is skipped (foreign parent ifindex is meaningless) and the input
   resolution fails closed for a device in another netns.

 - Add 36 selftest cases plus a cross-netns subtest in
   prog_tests/fib_lookup.c, covering both directions, the neighbour path,
   OUTPUT and DIRECT|TBID, VRF (rule and DIRECT), resolution semantics
   (802.1ad, PCP/DEI, QinQ-inner, bond master and port), the frag-needed
   mtu_result, the error returns on both families, and the netns boundary
   in both directions.

 - Document both flags and the now-bidirectional h_vlan_proto/h_vlan_TCI
   fields.
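
The egress rules in the first and third items above can be sketched as a
small userspace model. This is an illustration under stated assumptions,
not the code in patch 2: struct demo_dev, the integer netns id and all
demo_* helpers are hypothetical, and only the decision logic (immediate
parent, no swap for stacked VLANs, no swap across a netns boundary) is
taken from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; not struct net_device. */
struct demo_dev {
	int ifindex;
	int netns;			/* hypothetical netns id */
	bool is_vlan;
	unsigned short vlan_tci;
	const struct demo_dev *parent;	/* immediate lower device, or NULL */
};

/* Hypothetical result of the egress resolution. */
struct demo_result {
	int ifindex;
	bool tagged;
	unsigned short tci;
};

/* Walks to the bottom of a VLAN stack (the behaviour v1 relied on). */
static const struct demo_dev *demo_bottom_dev(const struct demo_dev *dev)
{
	while (dev->is_vlan && dev->parent)
		dev = dev->parent;
	return dev;
}

/* Returns only the immediate parent of a VLAN device (the v2 approach). */
static const struct demo_dev *demo_immediate_parent(const struct demo_dev *dev)
{
	return dev->is_vlan ? dev->parent : NULL;
}

/*
 * Swap the egress device for its parent plus tag only when the parent
 * is a real device in the same netns; otherwise leave the result as is.
 */
static struct demo_result demo_resolve_egress(const struct demo_dev *dev)
{
	struct demo_result res = { .ifindex = dev->ifindex };
	const struct demo_dev *parent = demo_immediate_parent(dev);

	if (!parent || parent->is_vlan)		/* not a VLAN, or stacked */
		return res;
	if (parent->netns != dev->netns)	/* foreign parent ifindex */
		return res;

	res.ifindex = parent->ifindex;
	res.tagged = true;
	res.tci = dev->vlan_tci;
	return res;
}
```

With a real device, a VLAN on it, and a second VLAN stacked on the first,
demo_bottom_dev() of the inner VLAN reaches the real device, which would
pair the inner tag with the wrong device. demo_resolve_egress() leaves
the stacked device unchanged instead.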

Open questions (defaults chosen, noted here in case a maintainer prefers
otherwise):

 1. An unmatched, down, or foreign-netns tag returns
    BPF_FIB_LKUP_RET_NOT_FWDED, matching the DIRECT path when
    fib_get_table() finds no table, rather than a new return code.

 2. BPF_FIB_LOOKUP_OUTPUT | BPF_FIB_LOOKUP_VLAN_INPUT is rejected with
    -EINVAL; rejecting the combination now means it can be permitted
    later without breaking backward compatibility.

 3. The name BPF_FIB_LOOKUP_VLAN_INPUT reads oddly next to
    BPF_FIB_LOOKUP_OUTPUT. A pair like _VLAN_EGRESS/_VLAN_INGRESS is an
    option while nothing is merged.

 4. With BPF_FIB_LOOKUP_VLAN, the tc-path mtu check that runs when
    tot_len is not set follows params->ifindex, so after a swap it
    checks against the parent device rather than the VLAN device (the
    route-mtu path via tot_len is unaffected). Checking against the
    VLAN device would preserve the pre-flag semantics if that is
    preferred.
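
The flag restriction in question 2, together with the TBID exclusion
listed in the changes, amounts to a check of roughly the following shape.
This is a sketch only: the DEMO_* bit values are placeholders local to
the example and are not the UAPI values, and demo_check_flags() is a
hypothetical name that models just the new exclusivity rule, not the
helper's full flag validation.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bits for this sketch only; not the UAPI flag values. */
#define DEMO_FIB_LOOKUP_OUTPUT		(1U << 0)
#define DEMO_FIB_LOOKUP_TBID		(1U << 1)
#define DEMO_FIB_LOOKUP_VLAN_INPUT	(1U << 2)

/*
 * VLAN_INPUT is exclusive with TBID (the two share a union in the
 * params) and with OUTPUT (the input direction is ingress-only).
 * Returns 0 when the combination is acceptable, -EINVAL otherwise.
 */
static int demo_check_flags(uint32_t flags)
{
	if ((flags & DEMO_FIB_LOOKUP_VLAN_INPUT) &&
	    (flags & (DEMO_FIB_LOOKUP_TBID | DEMO_FIB_LOOKUP_OUTPUT)))
		return -EINVAL;

	return 0;
}
```

Because every rejected combination currently fails with -EINVAL, a later
kernel could start accepting one of them without changing the result for
any program that works today.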

On the bot's comment-style note: the new comments keep the form that
prevails in net/core/filter.c, and checkpatch --strict is clean.

v1: https://lore.kernel.org/all/20260609172052.81613-1-avinash.duduskar@gmail.com/

Avinash Duduskar (4):
  bpf: Initialize the l3mdev field for the fib lookup flow
  bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
  bpf: Add BPF_FIB_LOOKUP_VLAN_INPUT flag to bpf_fib_lookup() helper
  selftests/bpf: Add bpf_fib_lookup() VLAN flag tests

 include/uapi/linux/bpf.h                      |  63 ++-
 net/core/filter.c                             | 119 ++++-
 tools/include/uapi/linux/bpf.h                |  63 ++-
 .../selftests/bpf/prog_tests/fib_lookup.c     | 494 +++++++++++++++++-
 4 files changed, 726 insertions(+), 13 deletions(-)

base-commit: 140fa23df957b51385aa847986d44ad7f59b0563
--
2.54.0