Netdev List
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: David Ahern <dsahern@kernel.org>,
	Avinash Duduskar <avinash.duduskar@gmail.com>,
	ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: eddyz87@gmail.com, memxor@gmail.com, martin.lau@linux.dev,
	song@kernel.org, yonghong.song@linux.dev, jolsa@kernel.org,
	emil@etsalapatis.com, john.fastabend@gmail.com, sdf@fomichev.me,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, shuah@kernel.org,
	hawk@kernel.org, yatsenko@meta.com, leon.hwang@linux.dev,
	kpsingh@kernel.org, a.s.protopopov@gmail.com,
	ameryhung@gmail.com, rongtao@cestc.cn, eyal.birger@gmail.com,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf-next v5 1/3] bpf: Add BPF_FIB_LOOKUP_VLAN flag to bpf_fib_lookup() helper
Date: Wed, 01 Jul 2026 13:02:50 +0200
Message-ID: <87y0fv0y79.fsf@toke.dk>
In-Reply-To: <916191fc-2e10-4449-b82b-c086d90283ae@kernel.org>

David Ahern <dsahern@kernel.org> writes:

> On 6/30/26 10:04 AM, Toke Høiland-Jørgensen wrote:
>> David Ahern <dsahern@kernel.org> writes:
>> 
>>> On 6/30/26 4:00 AM, Toke Høiland-Jørgensen wrote:
>>>>> It does not make sense to require a flag to get lookup output. vlan
>>>>> proto of 0 is not valid, so it is a clear indication that the vlan
>>>>> output parameters were not set during the lookup.
>>>>
>>>> Okay, so we could just unconditionally set the VLAN fields, but if we
>>>> start rewriting the ifindex that would be a change of the existing
>>>> behaviour that could break existing applications, no?
>>>
>>> Consistently dealing with upper devices is one of the reasons I never
>>> sent patches for vlan support.
>>>
>>> xdp support is at the driver layer for real (physical) devices. The fib
>>> lookup is going to return the vlan device index - a virtual device.
>>> Support for xdp should not be propagated to virtual devices; it goes
>>> against the intent of xdp. Any trip down this path will have to decide
>>> how to handle vlan-in-vlan use cases. Where is the line drawn for fast
>>> networking?
>> 
>> Right, which is why we need building blocks that make it possible for
>> XDP programs to do the right thing in the BPF code :)
>> 
>> A helper that resolves the parent could be used for stacked VLANs as
>> well (just calling the helper multiple times).
>> 
>>>> Specifically, if an XDP application has a table of the interfaces it
>>>> forwards between, today they'd get a VLAN interface ifindex, which would
>>>> not be in that table, and the application would return XDP_PASS. Whereas
>>>> if we change the ifindex and populate the VLAN tag, suddenly the
>>>> interface would be in the table, but because the application doesn't
>>>> read the returned VLAN tag, it will end up sending packets out without
>>>> tagging them, leading to broken forwarding.
>>>
>>> I have not followed developments over the past few years. Does XDP have
>>> support for vlan acceleration in the Tx path now? You really want that
>>> to deal with vlans and not replicating s/w processing in ebpf.
>> 
>> It does not, no. There's TX metadata for AF_XDP, but VLAN support is not
>> in there (see include/uapi/linux/if_xdp.h).
>> 
>> Doesn't mean software VLAN handling can't be useful, though; there are
>> use cases other than the very high end where XDP can speed things up
>> even if it has to write a VLAN tag or two...
>> 
>>>> So if we don't want the flag, we'd need some other mechanism to resolve
>>>> the parent ifindex, AFAICT? Maybe a xdp_get_parent_ifindex() kfunc, say?
>>>> That could also be made generic for other stacked interface types, I
>>>> suppose.
>>>>
>>>> WDYT?
>>>
>>> dealing with stacked devices is hard :-)
>>>
>>> What if the return is a bond device or a vlan on a bond device?
>> 
>> Well, bond devices have XDP support, so you can just redirect to those :)
>> 
>> But yeah, each type of stacked device would need to pass different
>> information through to the XDP program, and the program would need to
>> support those. Building a single XDP program that supports all of them
>> will require quite a bit of code, and would probably not perform super
>> well. But most deployments have distinct subsets of features they need,
>> so this does not have to be a blocker, IMO?
>> 
>
> Seems to me the fib_lookup for xdp needs to return the bottom device,
> not the vlan device, for forwarding to work. That's why I added the
> fields to the struct. That allows the program to push the vlan header if
> required. My preference (dream?) was that Tx path had support to tell
> the redirect the vlan and h/w added it on send.

Sure, returning the bottom device index with the VLAN tag makes sense,
and that's basically what this series does (but bails out on stacked
VLANs). However, that's not what the helper does today, which is why the
flag is there, to opt in to the new behaviour. I don't think we can just
change the ifindex without breaking existing applications (as noted
up-thread).

> But really, once stacked devices come into play, I just wanted to make
> sure thought is given to different use cases. As you know the lookup
> struct is hard bound to 64B and it is trying to cover a lot of use cases.

Agreed, I don't think we can handle stacked devices in this helper. But
we could split it out into a new one. Something like:

struct lower_device_info {
	enum device_type type;
	struct {
		__be16	h_vlan_proto;
		__be16	h_vlan_TCI;
	} vlan;
	/* add other types here */
};

int xdp_get_lower_device(int ifindex, struct lower_device_info *info);

called like:

int xdp_program(struct xdp_md *ctx)
{
	struct lower_device_info dev_info = {};
	int ifindex, ret;

	ifindex = find_destination(ctx); /* does fib lookup, or something else */

	/* a positive return value is the ifindex of the lower device */
	while ((ret = xdp_get_lower_device(ifindex, &dev_info)) > 0) {
		if (dev_info.type == VLAN) {
			push_vlan_tag(ctx, &dev_info.vlan);
			ifindex = ret;
		} else {
			return XDP_PASS; /* we only handle VLAN devices */
		}
	}

	return bpf_redirect(ifindex, 0);
}
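
For illustration only, here is a rough userspace sketch of what the
push_vlan_tag() step has to do to the frame bytes. Everything in it
(push_vlan_tag_sketch(), struct vlan_info_sketch, the *_SKETCH constants)
is a made-up name operating on a plain buffer, not an existing kernel or
BPF API; the struct just mirrors the vlan member proposed above, with the
two fields kept as raw network-order bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH  6 /* length of one MAC address */
#define VLAN_HLEN_SKETCH 4 /* TPID (2 bytes) + TCI (2 bytes) */

/* Hypothetical stand-in for the 'vlan' member of struct lower_device_info.
 * Bytes are stored in network order, so host endianness does not matter. */
struct vlan_info_sketch {
	uint8_t proto[2]; /* e.g. 0x81 0x00 for 802.1Q */
	uint8_t tci[2];   /* PCP, DEI and VLAN ID */
};

/*
 * Insert a VLAN tag right after the source MAC of the Ethernet frame in buf.
 * len is the current frame length, cap is the size of the buffer.
 * Returns the new frame length, or -1 if the frame is shorter than an
 * Ethernet header or the buffer has no room for four more bytes.
 */
static int push_vlan_tag_sketch(uint8_t *buf, size_t len, size_t cap,
				const struct vlan_info_sketch *vlan)
{
	const size_t mac_len = 2 * ETH_ALEN_SKETCH;

	assert(buf && vlan);

	if (len < mac_len + 2 || len + VLAN_HLEN_SKETCH > cap)
		return -1;

	/* Move the EtherType and payload back to make room for the tag. */
	memmove(buf + mac_len + VLAN_HLEN_SKETCH, buf + mac_len, len - mac_len);

	/* The tag sits where the original EtherType used to be. */
	memcpy(buf + mac_len, vlan->proto, 2);
	memcpy(buf + mac_len + 2, vlan->tci, 2);

	return (int)(len + VLAN_HLEN_SKETCH);
}
```

Because each call inserts directly after the MAC addresses, calling it
once per iteration of the loop above would leave the tag of the lowest
VLAN device outermost, which is the order needed on the wire. In an
actual XDP program the room would presumably come from
bpf_xdp_adjust_head() with a negative delta, moving the MAC addresses
forward instead of the payload backward, and every access would need the
usual bounds checks against data_end.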


With a helper like this, we don't, strictly speaking, need to
change the fib lookup helper at all. However, for the single-tagged VLAN
case, I think supporting it directly in the fib lookup could still have
value, as an optimisation: it saves an extra call for resolving the
ifindex, and the fields are already there. So I think my preference
would be to merge this series as-is, and then follow up with a new kfunc
to handle the stacked case. But we could also just drop this series and
go straight to the new kfunc.

WDYT?

-Toke

