From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: "Hudson, Nick" <nhudson@akamai.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "Glasgall, Anna" <aglasgal@akamai.com>,
"Tottenham, Max" <mtottenh@akamai.com>,
"Hunt, Joshua" <johunt@akamai.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>,
Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Jason Xing <kerneljasonxing@gmail.com>,
Willem de Bruijn <willemb@google.com>,
Paul Chaignon <paul.chaignon@gmail.com>,
Mykyta Yatsenko <yatsenko@meta.com>,
Tao Chen <chen.dylane@linux.dev>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Anton Protopopov <a.s.protopopov@gmail.com>,
Tobias Klauser <tklauser@distanz.ch>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
Date: Tue, 10 Mar 2026 15:42:55 -0400
Message-ID: <willemdebruijn.kernel.326800cb16195@gmail.com>
In-Reply-To: <58BD9063-C619-45C4-AB60-CCA40E391A52@akamai.com>

Hudson, Nick wrote:
>
>
> > On 25 Feb 2026, at 15:45, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Hudson, Nick wrote:
> >>
> >>
> >>> On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >>>
> >>> Nick Hudson wrote:
> >>>> Enable BPF programs to properly handle GSO state when decapsulating
> >>>> tunneled packets by adding selective GSO flag clearing and a trusted
> >>>> mode for GSO handling.
> >>>>
> >>>> New decapsulation flags:
> >>>>
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
> >>>>   (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
> >>>>   (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
> >>>>   IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
> >>>>   IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
> >>>> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
> >>>>   SKB_GSO_DODGY when the BPF program is trusted and modifications
> >>>>   are known to be valid
> >>>>
> >>>> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
> >>>> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
> >>>> Run Everywhere) lookups in BPF programs.
> >>>>
> >>>> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
> >>>> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
> >>>> for trusted programs that guarantee GSO correctness.
> >>>>
> >>>> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
> >>>>
> >>>>   bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
> >>>>                       BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
> >>>>                       BPF_F_ADJ_ROOM_DECAP_L4_UDP);
> >>>
> >>> This patch is doing too much in one patch.
> >>
> >> Sure, I’ll split it up.
> >>
> >>>
> >>> Also not convinced of the need for the NO_DODGY flag.
> >>
> >> The reason for NO_DODGY is that, without it, the egress interface will see the
> >> SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as
> >> NETIF_F_GSO_ROBUST, so the skb will fail skb_gso_ok() with SKB_GSO_DODGY set.
> >> When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().
> >
> > I understand why you might want it. But the dodgy check has long been
> > there for a reason: because these transformations are not blindly
> > accepted by the kernel. This use case does not change that.
>
> The defence I came up with here is...
>
> - Setting NETIF_F_GSO_ROBUST on the tun/tap device is not an option. It is a
>   device-level property, so it affects both host-to-guest and guest-to-host
>   traffic. The former is trusted; the latter is not.
> - The host-to-guest direction is fully trusted
> - Physical NIC driver is trusted (kernel driver, hardware-validated GSO)
> - BPF program is trusted (privileged, CAP_BPF, verified by kernel)
> - Decapsulation is a trusted operation for BPF code authors
> - Bridge + TAP is internal kernel forwarding
>
> Would protecting its use with a sysctl make it acceptable (if it still isn’t)?

Does taking the DODGY path and going through GSO have a significant
impact on your workload?

So far we have always declined to add such custom opt-outs. This is
not at all the first affected use case.

Either way, let's separate this from the main functional decap patch.
Thread overview: 9+ messages
2026-02-19 10:47 [RFC PATCH 0/1] bpf: Add tunnel decapsulation and GSO state updates per new flags Nick Hudson
2026-02-19 10:47 ` [RFC PATCH 1/1] " Nick Hudson
2026-02-19 11:50 ` Hudson, Nick
2026-02-19 12:18 ` Oliver Hartkopp
2026-02-20 21:08 ` Willem de Bruijn
2026-02-25 7:12 ` Hudson, Nick
2026-02-25 15:45 ` Willem de Bruijn
2026-03-10 16:26 ` Hudson, Nick
2026-03-10 19:42 ` Willem de Bruijn [this message]