public inbox for netdev@vger.kernel.org
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: "Hudson, Nick" <nhudson@akamai.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "Glasgall, Anna" <aglasgal@akamai.com>,
	 "Tottenham, Max" <mtottenh@akamai.com>,
	 "Hunt, Joshua" <johunt@akamai.com>,
	 Alexei Starovoitov <ast@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 Martin KaFai Lau <martin.lau@linux.dev>,
	 Eduard Zingerman <eddyz87@gmail.com>,
	 Song Liu <song@kernel.org>,
	 Yonghong Song <yonghong.song@linux.dev>,
	 John Fastabend <john.fastabend@gmail.com>,
	 KP Singh <kpsingh@kernel.org>,
	 Stanislav Fomichev <sdf@fomichev.me>,
	 Hao Luo <haoluo@google.com>,  Jiri Olsa <jolsa@kernel.org>,
	 "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Simon Horman <horms@kernel.org>,
	 Jason Xing <kerneljasonxing@gmail.com>,
	 Willem de Bruijn <willemb@google.com>,
	 Paul Chaignon <paul.chaignon@gmail.com>,
	 Mykyta Yatsenko <yatsenko@meta.com>,
	 Tao Chen <chen.dylane@linux.dev>,
	 Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	 Anton Protopopov <a.s.protopopov@gmail.com>,
	 Tobias Klauser <tklauser@distanz.ch>,
	 "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
Date: Tue, 10 Mar 2026 15:42:55 -0400
Message-ID: <willemdebruijn.kernel.326800cb16195@gmail.com>
In-Reply-To: <58BD9063-C619-45C4-AB60-CCA40E391A52@akamai.com>

Hudson, Nick wrote:
> 
> 
> > On 25 Feb 2026, at 15:45, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > 
> > Hudson, Nick wrote:
> >> 
> >> 
> >>> On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >>> 
> >>> Nick Hudson wrote:
> >>>> Enable BPF programs to properly handle GSO state when decapsulating
> >>>> tunneled packets by adding selective GSO flag clearing and a trusted
> >>>> mode for GSO handling.
> >>>> 
> >>>> New decapsulation flags:
> >>>> 
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
> >>>> (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
> >>>> (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
> >>>> IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
> >>>> IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
> >>>> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
> >>>> SKB_GSO_DODGY when the BPF program is trusted and modifications
> >>>> are known to be valid
> >>>> 
> >>>> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
> >>>> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
> >>>> Run Everywhere) lookups in BPF programs.
> >>>> 
> >>>> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
> >>>> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
> >>>> for trusted programs that guarantee GSO correctness.
> >>>> 
> >>>> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
> >>>> bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
> >>>>                     BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
> >>>>                     BPF_F_ADJ_ROOM_DECAP_L4_UDP);
> >>> 
> >>> This patch is doing too much in one patch.
> >> 
> >> Sure, I’ll split it up.
> >> 
> >>> 
> >>> Also not convinced of the need for the NO_DODGY flag.
> >> 
> >> The reason for NO_DODGY is that, without it, the egress interface will see the
> >> SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as
> >> NETIF_F_GSO_ROBUST, so the skb will fail skb_gso_ok() with SKB_GSO_DODGY set.
> >> When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().
> > 
> > I understand why you might want it. But the dodgy check has long been
> > there for a reason: because these transformations are not blindly
> > accepted by the kernel. This use case does not change that.
> 
> The defence I came up with here is...
> 
>     - Setting NETIF_F_GSO_ROBUST on the tun/tap device is not an option. It is a
>       device-level property, so it affects both host-to-guest and guest-to-host
>       traffic. The former is trusted; the latter is not.
>     - The host-to-guest direction is fully trusted:
>         - Physical NIC driver is trusted (kernel driver, hardware-validated GSO)
>         - BPF program is trusted (privileged, CAP_BPF, verified by kernel)
>         - Decapsulation is a trusted operation for BPF code authors
>         - Bridge + TAP is internal kernel forwarding
> 
> Would protecting its use with a sysctl make it acceptable (if it still isn't)?

Does taking the DODGY path and going through GSO have a significant
impact on your workload?

So far we have always declined to add such custom opt-outs. This is
not at all the first affected use case.

Either way, let's separate this from the main functional decap patch.
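
The trade-off discussed above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not kernel code: the
struct, the model_* helpers, the bit values, and the shift are hypothetical
and chosen for illustration. It models the behaviour described in the thread:
by default the adjust-room step marks the packet dodgy and resets gso_segs,
and a device whose features do not cover every GSO type bit on the packet
fails the skb_gso_ok()-style check, so the packet is segmented in software.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values for this model only (assumption, not kernel headers). */
#define GSO_TCPV4       (1u << 0)
#define GSO_DODGY       (1u << 1)
#define GSO_UDP_TUNNEL  (1u << 2)

/* In this model, a device feature bit is the GSO type bit shifted left. */
#define FEATURE_SHIFT   16
#define FEATURE_OF(gso) ((uint32_t)(gso) << FEATURE_SHIFT)

/* Hypothetical stand-in for the GSO state carried by a packet. */
struct model_skb {
	uint32_t gso_type;
	uint16_t gso_segs;
};

/*
 * Idea behind the skb_gso_ok() check as described in the thread: every GSO
 * type bit set on the packet must have a matching device feature bit.
 */
static bool model_gso_ok(const struct model_skb *skb, uint32_t dev_features)
{
	uint32_t needed = FEATURE_OF(skb->gso_type);

	return (dev_features & needed) == needed;
}

/*
 * Models a UDP tunnel decap: clear the tunnel GSO bit, then, unless the
 * proposed opt-out is requested, mark the packet dodgy and reset gso_segs
 * so that it is revalidated later.
 */
static void model_decap_udp(struct model_skb *skb, bool no_dodgy)
{
	skb->gso_type &= ~GSO_UDP_TUNNEL;
	if (!no_dodgy) {
		skb->gso_type |= GSO_DODGY;
		skb->gso_segs = 0;
	}
}

/* True when the transmit path would fall back to software segmentation. */
static bool model_needs_sw_segmentation(const struct model_skb *skb,
					uint32_t dev_features)
{
	return !model_gso_ok(skb, dev_features);
}
```

In this model, a tap device that advertises only FEATURE_OF(GSO_TCPV4)
rejects a packet decapsulated with no_dodgy == false, while the same packet
passes if the device also advertises FEATURE_OF(GSO_DODGY), the analogue of
NETIF_F_GSO_ROBUST in the discussion.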

Thread overview: 8+ messages
     [not found] <20260219104710.1490304-1-nhudson@akamai.com>
2026-02-19 10:47 ` [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags Nick Hudson
2026-02-19 11:50   ` Hudson, Nick
2026-02-19 12:18     ` Oliver Hartkopp
2026-02-20 21:08   ` Willem de Bruijn
2026-02-25  7:12     ` Hudson, Nick
2026-02-25 15:45       ` Willem de Bruijn
2026-03-10 16:26         ` Hudson, Nick
2026-03-10 19:42           ` Willem de Bruijn [this message]
