* [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
       [not found] <20260219104710.1490304-1-nhudson@akamai.com>
@ 2026-02-19 10:47 ` Nick Hudson
  2026-02-19 11:50   ` Hudson, Nick
  2026-02-20 21:08   ` Willem de Bruijn
  0 siblings, 2 replies; 8+ messages in thread
From: Nick Hudson @ 2026-02-19 10:47 UTC
  Cc: Nick Hudson, Anna Glasgall, Max Tottenham, Josh Hunt,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Jason Xing, Willem de Bruijn, Paul Chaignon,
	Mykyta Yatsenko, Tao Chen, Kumar Kartikeya Dwivedi,
	Anton Protopopov, Tobias Klauser, bpf, linux-kernel, netdev

Enable BPF programs to properly handle GSO state when decapsulating
tunneled packets by adding selective GSO flag clearing and a trusted
mode for GSO handling.

New decapsulation flags:

- BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
  (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
- BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
  (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
- BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
  IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
- BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
  IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
- BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
  SKB_GSO_DODGY when the BPF program is trusted and modifications
  are known to be valid

The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
Run Everywhere) lookups in BPF programs.

By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
for trusted programs that guarantee GSO correctness.

Usage example (decapsulating UDP tunnel with IPv4 inner packet):
  bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
                      BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
                      BPF_F_ADJ_ROOM_DECAP_L4_UDP);

Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
 include/uapi/linux/bpf.h       | 45 +++++++++++++++++++--
 net/core/filter.c              | 73 ++++++++++++++++++++++++++++------
 tools/include/uapi/linux/bpf.h | 45 +++++++++++++++++++--
 3 files changed, 145 insertions(+), 18 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c8d400b7680a..0cb24ab70af7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3010,8 +3010,42 @@ union bpf_attr {
  *
  * 		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
  * 		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- * 		  Indicate the new IP header version after decapsulating the outer
- * 		  IP header. Used when the inner and outer IP versions are different.
+ * 		  Indicate the new IP header version after decapsulating the
+ * 		  outer IP header. Used when the inner and outer IP versions
+ * 		  are different. These flags only trigger a protocol change
+ * 		  without clearing any tunnel-specific GSO flags.
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ * 		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ * 		  when decapsulating a GRE tunnel.
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ * 		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ * 		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ * 		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ * 		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ * 		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ * 		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ * 		  or IPv4-in-IPv6).
+ *
+ * 		  When using the decapsulation flags above, the skb->encapsulation
+ * 		  flag is automatically cleared if all tunnel-specific GSO flags
+ * 		  (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ * 		  SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ * 		  removed from the packet. This handles cases where all tunnel
+ * 		  layers have been decapsulated.
+ *
+ * 		* **BPF_F_ADJ_ROOM_NO_DODGY**:
+ * 		  Do not mark the packet as dodgy (untrusted) and preserve
+ * 		  the existing gso_segs count. By default, packet modifications
+ * 		  set SKB_GSO_DODGY and reset gso_segs to 0, forcing
+ * 		  revalidation. This flag is useful when decapsulating the
+ * 		  tunnel, the BPF program is trusted, and the modifications
+ * 		  are known to be valid.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -6209,7 +6243,7 @@ enum {
 };
 
 /* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
@@ -6219,6 +6253,11 @@ enum {
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
+	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
+	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
+	BPF_F_ADJ_ROOM_NO_DODGY		= (1ULL << 13),
 };
 
 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index ba019ded773d..681dd53ab841 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3484,14 +3484,28 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 #define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
 					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
 
-#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
-					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
+#define BPF_F_ADJ_ROOM_DECAP_L4_MASK	(BPF_F_ADJ_ROOM_DECAP_L4_UDP | \
+					 BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+
+#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK	(BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \
+					 BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+
+#define BPF_F_ADJ_ROOM_ENCAP_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2( \
-					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
-					 BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
+
+#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK | \
+					 BPF_F_ADJ_ROOM_DECAP_L4_MASK | \
+					 BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
+
+#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
+					 BPF_F_ADJ_ROOM_ENCAP_MASK | \
+					 BPF_F_ADJ_ROOM_DECAP_MASK | \
+					 BPF_F_ADJ_ROOM_NO_CSUM_RESET | \
+					 BPF_F_ADJ_ROOM_NO_DODGY)
 
 static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 			    u64 flags)
@@ -3503,6 +3517,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 	unsigned int gso_type = SKB_GSO_DODGY;
 	int ret;
 
+	if (unlikely(flags & (BPF_F_ADJ_ROOM_DECAP_MASK |
+			      BPF_F_ADJ_ROOM_NO_DODGY)))
+		return -EINVAL;
+
 	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
 		/* udp gso_size delineates datagrams, only allow if fixed */
 		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
@@ -3588,8 +3606,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 	if (skb_is_gso(skb)) {
 		struct skb_shared_info *shinfo = skb_shinfo(skb);
 
-		/* Header must be checked, and gso_segs recomputed. */
+		/* Add tunnel GSO type flags as appropriate. */
 		shinfo->gso_type |= gso_type;
+
+		/* Header must be checked, and gso_segs recomputed. */
 		shinfo->gso_segs = 0;
 
 		/* Due to header growth, MSS needs to be downgraded.
@@ -3610,11 +3630,14 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 			      u64 flags)
 {
+	bool no_dodgy = flags & BPF_F_ADJ_ROOM_NO_DODGY;
 	int ret;
 
 	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
 			       BPF_F_ADJ_ROOM_DECAP_L3_MASK |
-			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
+			       BPF_F_ADJ_ROOM_DECAP_MASK |
+			       BPF_F_ADJ_ROOM_NO_CSUM_RESET |
+			       BPF_F_ADJ_ROOM_NO_DODGY)))
 		return -EINVAL;
 
 	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
@@ -3647,9 +3670,36 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
 			skb_increase_gso_size(shinfo, len_diff);
 
-		/* Header must be checked, and gso_segs recomputed. */
-		shinfo->gso_type |= SKB_GSO_DODGY;
-		shinfo->gso_segs = 0;
+		/* Selective GSO flag clearing based on decap type.
+		 * Only clear the flags for the tunnel layer being removed.
+		 */
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP)
+			shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL |
+					      SKB_GSO_UDP_TUNNEL_CSUM);
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+			shinfo->gso_type &= ~(SKB_GSO_GRE |
+					      SKB_GSO_GRE_CSUM);
+		if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4)
+			shinfo->gso_type &= ~SKB_GSO_IPXIP4;
+		if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+			shinfo->gso_type &= ~SKB_GSO_IPXIP6;
+
+		/* Clear encapsulation flag only when no tunnel GSO flags remain */
+		if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
+			if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
+						  SKB_GSO_UDP_TUNNEL_CSUM |
+						  SKB_GSO_GRE |
+						  SKB_GSO_GRE_CSUM |
+						  SKB_GSO_IPXIP4 |
+						  SKB_GSO_IPXIP6)))
+				skb->encapsulation = 0;
+		}
+
+		/* NO_DODGY: preserve gso_segs, don't mark as dodgy. */
+		if (!no_dodgy) {
+			shinfo->gso_type |= SKB_GSO_DODGY;
+			shinfo->gso_segs = 0;
+		}
 	}
 
 	return 0;
@@ -3709,8 +3759,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	u32 off;
 	int ret;
 
-	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK |
-			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
+	if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
 		return -EINVAL;
 	if (unlikely(len_diff_abs > 0xfffU))
 		return -EFAULT;
@@ -3729,7 +3778,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 			return -ENOTSUPP;
 	}
 
-	if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
+	if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
 		if (!shrink)
 			return -EINVAL;
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 5e38b4887de6..664bc8438186 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3010,8 +3010,42 @@ union bpf_attr {
  *
  * 		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
  * 		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- * 		  Indicate the new IP header version after decapsulating the outer
- * 		  IP header. Used when the inner and outer IP versions are different.
+ * 		  Indicate the new IP header version after decapsulating the
+ * 		  outer IP header. Used when the inner and outer IP versions
+ * 		  are different. These flags only trigger a protocol change
+ * 		  without clearing any tunnel-specific GSO flags.
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ * 		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ * 		  when decapsulating a GRE tunnel.
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ * 		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ * 		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ * 		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ * 		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ * 		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ * 		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ * 		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ * 		  or IPv4-in-IPv6).
+ *
+ * 		  When using the decapsulation flags above, the skb->encapsulation
+ * 		  flag is automatically cleared if all tunnel-specific GSO flags
+ * 		  (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ * 		  SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ * 		  removed from the packet. This handles cases where all tunnel
+ * 		  layers have been decapsulated.
+ *
+ * 		* **BPF_F_ADJ_ROOM_NO_DODGY**:
+ * 		  Do not mark the packet as dodgy (untrusted) and preserve
+ * 		  the existing gso_segs count. By default, packet modifications
+ * 		  set SKB_GSO_DODGY and reset gso_segs to 0, forcing
+ * 		  revalidation. This flag is useful when decapsulating the
+ * 		  tunnel, the BPF program is trusted, and the modifications
+ * 		  are known to be valid.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -6209,7 +6243,7 @@ enum {
 };
 
 /* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
@@ -6219,6 +6253,11 @@ enum {
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
+	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
+	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
+	BPF_F_ADJ_ROOM_NO_DODGY		= (1ULL << 13),
 };
 
 enum {
-- 
2.34.1
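
For readers who want to trace the shrink-path GSO handling in the patch above without a kernel tree, here is a minimal userspace sketch of that logic. It is a model, not kernel code: struct model_skb, the MODEL_* names and model_shrink_gso_update() are hypothetical, and the MODEL_GSO_* bit positions are arbitrary illustration values (the kernel's own SKB_GSO_* definitions are authoritative). Only the MODEL_DECAP_* and MODEL_NO_DODGY bit positions are copied from the enum in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for SKB_GSO_* bits; values are illustrative only. */
enum {
	MODEL_GSO_DODGY           = 1 << 0,
	MODEL_GSO_GRE             = 1 << 1,
	MODEL_GSO_GRE_CSUM        = 1 << 2,
	MODEL_GSO_IPXIP4          = 1 << 3,
	MODEL_GSO_IPXIP6          = 1 << 4,
	MODEL_GSO_UDP_TUNNEL      = 1 << 5,
	MODEL_GSO_UDP_TUNNEL_CSUM = 1 << 6,
	MODEL_GSO_TCPV4           = 1 << 7,
};

#define MODEL_GSO_TUNNEL_MASK (MODEL_GSO_GRE | MODEL_GSO_GRE_CSUM | \
			       MODEL_GSO_IPXIP4 | MODEL_GSO_IPXIP6 | \
			       MODEL_GSO_UDP_TUNNEL | MODEL_GSO_UDP_TUNNEL_CSUM)

/* Bit positions mirror the BPF_F_ADJ_ROOM_* values proposed in the patch. */
enum {
	MODEL_DECAP_L3_IPV4 = 1 << 7,
	MODEL_DECAP_L3_IPV6 = 1 << 8,
	MODEL_DECAP_L4_GRE  = 1 << 9,
	MODEL_DECAP_L4_UDP  = 1 << 10,
	MODEL_DECAP_IPXIP4  = 1 << 11,
	MODEL_DECAP_IPXIP6  = 1 << 12,
	MODEL_NO_DODGY      = 1 << 13,
};

#define MODEL_DECAP_MASK (MODEL_DECAP_L3_IPV4 | MODEL_DECAP_L3_IPV6 | \
			  MODEL_DECAP_L4_GRE | MODEL_DECAP_L4_UDP | \
			  MODEL_DECAP_IPXIP4 | MODEL_DECAP_IPXIP6)

/* Hypothetical reduction of the skb state the shrink path touches. */
struct model_skb {
	uint32_t gso_type;
	uint16_t gso_segs;
	bool encapsulation;
};

/* Mirrors the order of operations in the patched bpf_skb_net_shrink(). */
static void model_shrink_gso_update(struct model_skb *skb, uint64_t flags)
{
	/* Clear only the GSO bits of the tunnel layer being removed. */
	if (flags & MODEL_DECAP_L4_UDP)
		skb->gso_type &= ~(uint32_t)(MODEL_GSO_UDP_TUNNEL |
					     MODEL_GSO_UDP_TUNNEL_CSUM);
	if (flags & MODEL_DECAP_L4_GRE)
		skb->gso_type &= ~(uint32_t)(MODEL_GSO_GRE | MODEL_GSO_GRE_CSUM);
	if (flags & MODEL_DECAP_IPXIP4)
		skb->gso_type &= ~(uint32_t)MODEL_GSO_IPXIP4;
	if (flags & MODEL_DECAP_IPXIP6)
		skb->gso_type &= ~(uint32_t)MODEL_GSO_IPXIP6;

	/* Drop the encapsulation mark once no tunnel GSO bit remains. */
	if ((flags & MODEL_DECAP_MASK) &&
	    !(skb->gso_type & MODEL_GSO_TUNNEL_MASK))
		skb->encapsulation = false;

	/* Default behaviour: mark dodgy and force gso_segs recomputation. */
	if (!(flags & MODEL_NO_DODGY)) {
		skb->gso_type |= MODEL_GSO_DODGY;
		skb->gso_segs = 0;
	}
}
```

In this model, a packet carrying only UDP tunnel GSO bits that is decapsulated with the L4_UDP and NO_DODGY flags keeps its gso_segs and loses the encapsulation mark, while a packet that still carries a GRE bit stays marked as encapsulated. Passing only an L3 decap flag clears no tunnel bits, which matches the updated helper documentation.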
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
  2026-02-19 10:47 ` [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags Nick Hudson
@ 2026-02-19 11:50   ` Hudson, Nick
  2026-02-19 12:18     ` Oliver Hartkopp
  2026-02-20 21:08   ` Willem de Bruijn
  1 sibling, 1 reply; 8+ messages in thread
From: Hudson, Nick @ 2026-02-19 11:50 UTC
  Cc: Glasgall, Anna, Tottenham, Max, Hunt, Joshua, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jason Xing, Willem de Bruijn, Paul Chaignon, Mykyta Yatsenko,
	Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov,
	Tobias Klauser, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org

[still learning git send-email]

This was supposed to be sent with this:

Our use case involves a BPF program that removes VXLAN headers from
packets that are subsequently forwarded via a tap interface. The new
flags allow the BPF program to strip the tunnel header and selectively
clear the tunnel-specific GSO state. This allows larger packets to be
forwarded without segmentation through the (non-robust) tap interface
for the decapsulated packets.

> On 19 Feb 2026, at 10:47, Nick Hudson <nhudson@akamai.com> wrote:
>
> Enable BPF programs to properly handle GSO state when decapsulating
> tunneled packets by adding selective GSO flag clearing and a trusted
> mode for GSO handling.
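
As a rough aid for the VXLAN use case described above, the sketch below counts the encapsulation bytes that sit between the outer Ethernet header and the inner IP header of a VXLAN frame. The MODEL_* constants and vxlan_decap_len() are hypothetical names introduced here; the sizes assume no VLAN tags, no IPv4 options and no IPv6 extension headers. Which adjust-room mode and exact length a real program passes to bpf_skb_adjust_room depends on where it wants the cut, so treat this as arithmetic only.

```c
/* Header sizes in bytes, under the assumptions stated in the lead-in. */
#define MODEL_ETH_HDR_LEN   14u /* Ethernet header, no VLAN tag */
#define MODEL_IPV4_HDR_LEN  20u /* IPv4 header, no options */
#define MODEL_IPV6_HDR_LEN  40u /* fixed IPv6 header, no extension headers */
#define MODEL_UDP_HDR_LEN    8u
#define MODEL_VXLAN_HDR_LEN  8u

enum model_outer_ip { MODEL_OUTER_IPV4, MODEL_OUTER_IPV6 };

/*
 * Bytes from the end of the outer Ethernet header to the start of the
 * inner IP header: outer IP + UDP + VXLAN + inner Ethernet.
 */
static unsigned int vxlan_decap_len(enum model_outer_ip outer)
{
	unsigned int l3 = (outer == MODEL_OUTER_IPV6) ? MODEL_IPV6_HDR_LEN
						      : MODEL_IPV4_HDR_LEN;

	return l3 + MODEL_UDP_HDR_LEN + MODEL_VXLAN_HDR_LEN + MODEL_ETH_HDR_LEN;
}
```

With these assumptions the overhead is 50 bytes for an outer IPv4 header and 70 bytes for an outer IPv6 header.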
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
  2026-02-19 11:50   ` Hudson, Nick
@ 2026-02-19 12:18     ` Oliver Hartkopp
  0 siblings, 0 replies; 8+ messages in thread
From: Oliver Hartkopp @ 2026-02-19 12:18 UTC
  To: Hudson, Nick
  Cc: Glasgall, Anna, Tottenham, Max, Hunt, Joshua, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jason Xing, Willem de Bruijn, Paul Chaignon, Mykyta Yatsenko,
	Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov,
	Tobias Klauser, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org

On 19.02.26 12:50, Hudson, Nick wrote:
> [still learning git send-email]

Definitely :-D

No idea why I am in the recipient list ;-)

Best regards,
Oliver

>
> This was supposed to be sent with this:
>
>
> Our use case involves a BPF program that removes VXLAN headers from
> packets that are subsequently forwarded via a tap interface. The new
> flags allow the BPF program to strip the tunnel header and selectively
> clear the tunnel-specific GSO state. This allows larger packets to be
> forwarded without segmentation through the (non-robust) tap interface
> for the decapsulated packets.
>
>
>
>> On 19 Feb 2026, at 10:47, Nick Hudson <nhudson@akamai.com> wrote:
>>
>> Enable BPF programs to properly handle GSO state when decapsulating
>> tunneled packets by adding selective GSO flag clearing and a trusted
>> mode for GSO handling.
>>
>> New decapsulation flags:
>>
>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
>>   (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
>>   (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
>>   IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
>>   IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
>> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
>>   SKB_GSO_DODGY when the BPF program is trusted and modifications
>>   are known to be valid
>>
>> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
>> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
>> Run Everywhere) lookups in BPF programs.
>>
>> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
>> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
>> for trusted programs that guarantee GSO correctness.
>>
>> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
>>   bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
>>                       BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
>>                       BPF_F_ADJ_ROOM_DECAP_L4_UDP);
>>
>> Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
>> Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
>> Co-developed-by: Max Tottenham <mtottenh@akamai.com>
>> Signed-off-by: Max Tottenham <mtottenh@akamai.com>
>> Signed-off-by: Josh Hunt <johunt@akamai.com>
>> Signed-off-by: Nick Hudson <nhudson@akamai.com>
>> ---
>>  include/uapi/linux/bpf.h       | 45 +++++++++++++++++++--
>>  net/core/filter.c              | 73 ++++++++++++++++++++++++++++------
>>  tools/include/uapi/linux/bpf.h | 45 +++++++++++++++++++--
>>  3 files changed, 145 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index c8d400b7680a..0cb24ab70af7 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3010,8 +3010,42 @@ union bpf_attr {
>>   *
>>   * 		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
>>   * 		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
>> - * 		  Indicate the new IP header version after decapsulating the outer
>> - * 		  IP header. Used when the inner and outer IP versions are different.
>> + * 		  Indicate the new IP header version after decapsulating the
>> + * 		  outer IP header. Used when the inner and outer IP versions
>> + * 		  are different. These flags only trigger a protocol change
>> + * 		  without clearing any tunnel-specific GSO flags.
>> + *
>> + * 		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
>> + * 		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
>> + * 		  when decapsulating a GRE tunnel.
>> + *
>> + * 		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
>> + * 		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
>> + * 		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
>> + *
>> + * 		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
>> + * 		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
>> + * 		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
>> + *
>> + * 		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
>> + * 		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
>> + * 		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
>> + * 		  or IPv4-in-IPv6).
>> + *
>> + * 		  When using the decapsulation flags above, the skb->encapsulation
>> + * 		  flag is automatically cleared if all tunnel-specific GSO flags
>> + * 		  (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
>> + * 		  SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
>> + * 		  removed from the packet. This handles cases where all tunnel
>> + * 		  layers have been decapsulated.
>> + *
>> + * 		* **BPF_F_ADJ_ROOM_NO_DODGY**:
>> + * 		  Do not mark the packet as dodgy (untrusted) and preserve
>> + * 		  the existing gso_segs count. By default, packet modifications
>> + * 		  set SKB_GSO_DODGY and reset gso_segs to 0, forcing
>> + * 		  revalidation. This flag is useful when decapsulating the
>> + * 		  tunnel, the BPF program is trusted, and the modifications
>> + * 		  are known to be valid.
>>   *
>>   * 		A call to this helper is susceptible to change the underlying
>>   * 		packet buffer. Therefore, at load time, all checks on pointers
>> @@ -6209,7 +6243,7 @@ enum {
>>  };
>>
>>  /* BPF_FUNC_skb_adjust_room flags. */
>> -enum {
>> +enum bpf_adj_room_flags {
>>  	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
>>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
>>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
>> @@ -6219,6 +6253,11 @@ enum {
>>  	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
>>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
>>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
>> +	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
>> +	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
>> +	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
>> +	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
>> +	BPF_F_ADJ_ROOM_NO_DODGY		= (1ULL << 13),
>>  };
>>
>>  enum {
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index ba019ded773d..681dd53ab841 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3484,14 +3484,28 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
>>  #define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
>>  					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
>>
>> -#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
>> -					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
>> +#define BPF_F_ADJ_ROOM_DECAP_L4_MASK	(BPF_F_ADJ_ROOM_DECAP_L4_UDP | \
>> +					 BPF_F_ADJ_ROOM_DECAP_L4_GRE)
>> +
>> +#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK	(BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \
>> +					 BPF_F_ADJ_ROOM_DECAP_IPXIP6)
>> +
>> +#define BPF_F_ADJ_ROOM_ENCAP_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L2( \
>> -					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
>> -					 BPF_F_ADJ_ROOM_DECAP_L3_MASK)
>> +					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
>> +
>> +#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK | \
>> +					 BPF_F_ADJ_ROOM_DECAP_L4_MASK | \
>> +					 BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
>> +
>> +#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
>> +					 BPF_F_ADJ_ROOM_ENCAP_MASK | \
>> +					 BPF_F_ADJ_ROOM_DECAP_MASK | \
>> +					 BPF_F_ADJ_ROOM_NO_CSUM_RESET | \
>> +					 BPF_F_ADJ_ROOM_NO_DODGY)
>>
>>  static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  			    u64 flags)
>> @@ -3503,6 +3517,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  	unsigned int gso_type = SKB_GSO_DODGY;
>>  	int ret;
>>
>> +	if (unlikely(flags & (BPF_F_ADJ_ROOM_DECAP_MASK |
>> +			      BPF_F_ADJ_ROOM_NO_DODGY)))
>> +		return -EINVAL;
>> +
>>  	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
>>  		/* udp gso_size delineates datagrams, only allow if fixed */
>>  		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
>> @@ -3588,8 +3606,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  	if (skb_is_gso(skb)) {
>>  		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>
>> -		/* Header must be checked, and gso_segs recomputed. */
>> +		/* Add tunnel GSO type flags as appropriate. */
>>  		shinfo->gso_type |= gso_type;
>> +
>> +		/* Header must be checked, and gso_segs recomputed. */
>>  		shinfo->gso_segs = 0;
>>
>>  		/* Due to header growth, MSS needs to be downgraded.
>> @@ -3610,11 +3630,14 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, >> static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, >> u64 flags) >> { >> + bool no_dodgy = flags & BPF_F_ADJ_ROOM_NO_DODGY; >> int ret; >> >> if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO | >> BPF_F_ADJ_ROOM_DECAP_L3_MASK | >> - BPF_F_ADJ_ROOM_NO_CSUM_RESET))) >> + BPF_F_ADJ_ROOM_DECAP_MASK | >> + BPF_F_ADJ_ROOM_NO_CSUM_RESET | >> + BPF_F_ADJ_ROOM_NO_DODGY))) >> return -EINVAL; >> >> if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) { >> @@ -3647,9 +3670,36 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, >> if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO)) >> skb_increase_gso_size(shinfo, len_diff); >> >> - /* Header must be checked, and gso_segs recomputed. */ >> - shinfo->gso_type |= SKB_GSO_DODGY; >> - shinfo->gso_segs = 0; >> + /* Selective GSO flag clearing based on decap type. >> + * Only clear the flags for the tunnel layer being removed. >> + */ >> + if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) >> + shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL | >> + SKB_GSO_UDP_TUNNEL_CSUM); >> + if (flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) >> + shinfo->gso_type &= ~(SKB_GSO_GRE | >> + SKB_GSO_GRE_CSUM); >> + if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) >> + shinfo->gso_type &= ~SKB_GSO_IPXIP4; >> + if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) >> + shinfo->gso_type &= ~SKB_GSO_IPXIP6; >> + >> + /* Clear encapsulation flag only when no tunnel GSO flags remain */ >> + if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) { >> + if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL | >> + SKB_GSO_UDP_TUNNEL_CSUM | >> + SKB_GSO_GRE | >> + SKB_GSO_GRE_CSUM | >> + SKB_GSO_IPXIP4 | >> + SKB_GSO_IPXIP6))) >> + skb->encapsulation = 0; >> + } >> + >> + /* NO_DODGY: preserve gso_segs, don't mark as dodgy. 
*/ >> + if (!no_dodgy) { >> + shinfo->gso_type |= SKB_GSO_DODGY; >> + shinfo->gso_segs = 0; >> + } >> } >> >> return 0; >> @@ -3709,8 +3759,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, >> u32 off; >> int ret; >> >> - if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK | >> - BPF_F_ADJ_ROOM_NO_CSUM_RESET))) >> + if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK)) >> return -EINVAL; >> if (unlikely(len_diff_abs > 0xfffU)) >> return -EFAULT; >> @@ -3729,7 +3778,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, >> return -ENOTSUPP; >> } >> >> - if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) { >> + if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) { >> if (!shrink) >> return -EINVAL; >> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h >> index 5e38b4887de6..664bc8438186 100644 >> --- a/tools/include/uapi/linux/bpf.h >> +++ b/tools/include/uapi/linux/bpf.h >> @@ -3010,8 +3010,42 @@ union bpf_attr { >> * >> * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, >> * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: >> - * Indicate the new IP header version after decapsulating the outer >> - * IP header. Used when the inner and outer IP versions are different. >> + * Indicate the new IP header version after decapsulating the >> + * outer IP header. Used when the inner and outer IP versions >> + * are different. These flags only trigger a protocol change >> + * without clearing any tunnel-specific GSO flags. >> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**: >> + * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM) >> + * when decapsulating a GRE tunnel. >> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**: >> + * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and >> + * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel. >> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**: >> + * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating >> + * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4). 
>> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**: >> + * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when >> + * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6 >> + * or IPv4-in-IPv6). >> + * >> + * When using the decapsulation flags above, the skb->encapsulation >> + * flag is automatically cleared if all tunnel-specific GSO flags >> + * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE, >> + * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been >> + * removed from the packet. This handles cases where all tunnel >> + * layers have been decapsulated. >> + * >> + * * **BPF_F_ADJ_ROOM_NO_DODGY**: >> + * Do not mark the packet as dodgy (untrusted) and preserve >> + * the existing gso_segs count. By default, packet modifications >> + * set SKB_GSO_DODGY and reset gso_segs to 0, forcing >> + * revalidation. This flag is useful when decapsulating the >> + * tunnel, the BPF program is trusted, and the modifications >> + * are known to be valid. >> * >> * A call to this helper is susceptible to change the underlying >> * packet buffer. Therefore, at load time, all checks on pointers >> @@ -6209,7 +6243,7 @@ enum { >> }; >> >> /* BPF_FUNC_skb_adjust_room flags. */ >> -enum { >> +enum bpf_adj_room_flags { >> BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0), >> BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1), >> BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2), >> @@ -6219,6 +6253,11 @@ enum { >> BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6), >> BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7), >> BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8), >> + BPF_F_ADJ_ROOM_DECAP_L4_GRE = (1ULL << 9), >> + BPF_F_ADJ_ROOM_DECAP_L4_UDP = (1ULL << 10), >> + BPF_F_ADJ_ROOM_DECAP_IPXIP4 = (1ULL << 11), >> + BPF_F_ADJ_ROOM_DECAP_IPXIP6 = (1ULL << 12), >> + BPF_F_ADJ_ROOM_NO_DODGY = (1ULL << 13), >> }; >> >> enum { >> -- >> 2.34.1 >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
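As a rough illustration of the shrink-path behaviour in the patch quoted above, the following userspace model mirrors the order of operations: clear the tunnel GSO bits selected by the decap flags, drop the encapsulation marker once no tunnel bits remain, then apply the dodgy/gso_segs reset unless NO_DODGY is given. This is only a sketch. The `model_*` and `M_*` names are hypothetical, the GSO bit values are made up for the model, and only the helper flag bit positions (7 to 13) are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative GSO type bits for this model only; the kernel's real
 * SKB_GSO_* values are defined in its own headers and may differ. */
#define M_GSO_DODGY            (1u << 0)
#define M_GSO_GRE              (1u << 1)
#define M_GSO_GRE_CSUM         (1u << 2)
#define M_GSO_IPXIP4           (1u << 3)
#define M_GSO_IPXIP6           (1u << 4)
#define M_GSO_UDP_TUNNEL       (1u << 5)
#define M_GSO_UDP_TUNNEL_CSUM  (1u << 6)
#define M_GSO_TUNNEL_MASK      (M_GSO_GRE | M_GSO_GRE_CSUM | \
				M_GSO_IPXIP4 | M_GSO_IPXIP6 | \
				M_GSO_UDP_TUNNEL | M_GSO_UDP_TUNNEL_CSUM)

/* Helper flag bits, using the positions proposed in the patch. */
#define M_DECAP_L3_IPV4  (1ull << 7)
#define M_DECAP_L3_IPV6  (1ull << 8)
#define M_DECAP_L4_GRE   (1ull << 9)
#define M_DECAP_L4_UDP   (1ull << 10)
#define M_DECAP_IPXIP4   (1ull << 11)
#define M_DECAP_IPXIP6   (1ull << 12)
#define M_NO_DODGY       (1ull << 13)
#define M_DECAP_MASK     (M_DECAP_L3_IPV4 | M_DECAP_L3_IPV6 | \
			  M_DECAP_L4_GRE | M_DECAP_L4_UDP | \
			  M_DECAP_IPXIP4 | M_DECAP_IPXIP6)

/* Hypothetical stand-in for the few skb fields the patch touches. */
struct model_skb {
	uint32_t gso_type;
	uint16_t gso_segs;
	bool encapsulation;
};

/* Models the GSO bookkeeping for a GSO packet on the shrink path. */
static void model_shrink_gso(struct model_skb *skb, uint64_t flags)
{
	assert(skb != NULL);

	/* Clear only the bits for the tunnel layer being removed. */
	if (flags & M_DECAP_L4_UDP)
		skb->gso_type &= ~(M_GSO_UDP_TUNNEL | M_GSO_UDP_TUNNEL_CSUM);
	if (flags & M_DECAP_L4_GRE)
		skb->gso_type &= ~(M_GSO_GRE | M_GSO_GRE_CSUM);
	if (flags & M_DECAP_IPXIP4)
		skb->gso_type &= ~M_GSO_IPXIP4;
	if (flags & M_DECAP_IPXIP6)
		skb->gso_type &= ~M_GSO_IPXIP6;

	/* Drop the encapsulation marker once no tunnel bits remain. */
	if ((flags & M_DECAP_MASK) && !(skb->gso_type & M_GSO_TUNNEL_MASK))
		skb->encapsulation = false;

	/* Default behaviour: mark dodgy and force gso_segs recomputation. */
	if (!(flags & M_NO_DODGY)) {
		skb->gso_type |= M_GSO_DODGY;
		skb->gso_segs = 0;
	}
}
```

In this model, removing the UDP layer from a GRE-over-UDP packet leaves the GRE bit and the encapsulation marker in place, which is the "only clear the flags for the tunnel layer being removed" behaviour described in the patch comment.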
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags 2026-02-19 10:47 ` [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags Nick Hudson 2026-02-19 11:50 ` Hudson, Nick @ 2026-02-20 21:08 ` Willem de Bruijn 2026-02-25 7:12 ` Hudson, Nick 1 sibling, 1 reply; 8+ messages in thread From: Willem de Bruijn @ 2026-02-20 21:08 UTC (permalink / raw) To: Nick Hudson Cc: Nick Hudson, Anna Glasgall, Max Tottenham, Josh Hunt, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Jason Xing, Willem de Bruijn, Paul Chaignon, Mykyta Yatsenko, Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov, Tobias Klauser, bpf, linux-kernel, netdev Nick Hudson wrote: > Enable BPF programs to properly handle GSO state when decapsulating > tunneled packets by adding selective GSO flag clearing and a trusted > mode for GSO handling. > > New decapsulation flags: > > - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags > (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM) > - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags > (SKB_GSO_GRE, SKB_GSO_GRE_CSUM) > - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for > IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels > - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for > IPv6-in-IPv6 and IPv4-in-IPv6 tunnels > - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set > SKB_GSO_DODGY when the BPF program is trusted and modifications > are known to be valid > > The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is > renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once - > Run Everywhere) lookups in BPF programs. > > By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets > gso_segs to 0, forcing revalidation. 
The NO_DODGY flag bypasses this > for trusted programs that guarantee GSO correctness. > > Usage example (decapsulating UDP tunnel with IPv4 inner packet): > bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET, > BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | > BPF_F_ADJ_ROOM_DECAP_L4_UDP);

This patch is doing too much in one patch.

Also not convinced of the need for the NO_DODGY flag.

> Co-developed-by: Anna Glasgall <aglasgal@akamai.com> > Signed-off-by: Anna Glasgall <aglasgal@akamai.com> > Co-developed-by: Max Tottenham <mtottenh@akamai.com> > Signed-off-by: Max Tottenham <mtottenh@akamai.com> > Signed-off-by: Josh Hunt <johunt@akamai.com> > Signed-off-by: Nick Hudson <nhudson@akamai.com> > --- > include/uapi/linux/bpf.h | 45 +++++++++++++++++++-- > net/core/filter.c | 73 ++++++++++++++++++++++++++++------ > tools/include/uapi/linux/bpf.h | 45 +++++++++++++++++++-- > 3 files changed, 145 insertions(+), 18 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index c8d400b7680a..0cb24ab70af7 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -3010,8 +3010,42 @@ union bpf_attr { > * > * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, > * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: > - * Indicate the new IP header version after decapsulating the outer > - * IP header. Used when the inner and outer IP versions are different. > + * Indicate the new IP header version after decapsulating the > + * outer IP header. Used when the inner and outer IP versions > + * are different. These flags only trigger a protocol change > + * without clearing any tunnel-specific GSO flags. > + * > + * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**: > + * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM) > + * when decapsulating a GRE tunnel. > + * > + * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**: > + * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and > + * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
> + * > + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**: > + * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating > + * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4). > + * > + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**: > + * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when > + * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6 > + * or IPv4-in-IPv6). > + * > + * When using the decapsulation flags above, the skb->encapsulation > + * flag is automatically cleared if all tunnel-specific GSO flags > + * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE, > + * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been > + * removed from the packet. This handles cases where all tunnel > + * layers have been decapsulated. > + * > + * * **BPF_F_ADJ_ROOM_NO_DODGY**: > + * Do not mark the packet as dodgy (untrusted) and preserve > + * the existing gso_segs count. By default, packet modifications > + * set SKB_GSO_DODGY and reset gso_segs to 0, forcing > + * revalidation. This flag is useful when decapsulating the > + * tunnel, the BPF program is trusted, and the modifications > + * are known to be valid. > * > * A call to this helper is susceptible to change the underlying > * packet buffer. Therefore, at load time, all checks on pointers > @@ -6209,7 +6243,7 @@ enum { > }; > > /* BPF_FUNC_skb_adjust_room flags. 
*/ > -enum { > +enum bpf_adj_room_flags { > BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0), > BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1), > BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2), > @@ -6219,6 +6253,11 @@ enum { > BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6), > BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7), > BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8), > + BPF_F_ADJ_ROOM_DECAP_L4_GRE = (1ULL << 9), > + BPF_F_ADJ_ROOM_DECAP_L4_UDP = (1ULL << 10), > + BPF_F_ADJ_ROOM_DECAP_IPXIP4 = (1ULL << 11), > + BPF_F_ADJ_ROOM_DECAP_IPXIP6 = (1ULL << 12), > + BPF_F_ADJ_ROOM_NO_DODGY = (1ULL << 13), > }; > > enum { > diff --git a/net/core/filter.c b/net/core/filter.c > index ba019ded773d..681dd53ab841 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -3484,14 +3484,28 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb) > #define BPF_F_ADJ_ROOM_DECAP_L3_MASK (BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \ > BPF_F_ADJ_ROOM_DECAP_L3_IPV6) > > -#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \ > - BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \ > +#define BPF_F_ADJ_ROOM_DECAP_L4_MASK (BPF_F_ADJ_ROOM_DECAP_L4_UDP | \ > + BPF_F_ADJ_ROOM_DECAP_L4_GRE) > + > +#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK (BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \ > + BPF_F_ADJ_ROOM_DECAP_IPXIP6) > + > +#define BPF_F_ADJ_ROOM_ENCAP_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \ > BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \ > BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \ > BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \ > BPF_F_ADJ_ROOM_ENCAP_L2( \ > - BPF_ADJ_ROOM_ENCAP_L2_MASK) | \ > - BPF_F_ADJ_ROOM_DECAP_L3_MASK) > + BPF_ADJ_ROOM_ENCAP_L2_MASK)) > + > +#define BPF_F_ADJ_ROOM_DECAP_MASK (BPF_F_ADJ_ROOM_DECAP_L3_MASK | \ > + BPF_F_ADJ_ROOM_DECAP_L4_MASK | \ > + BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK) > + > +#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \ > + BPF_F_ADJ_ROOM_ENCAP_MASK | \ > + BPF_F_ADJ_ROOM_DECAP_MASK | \ > + BPF_F_ADJ_ROOM_NO_CSUM_RESET | \ > + BPF_F_ADJ_ROOM_NO_DODGY) > > static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, > u64 flags) > 
@@ -3503,6 +3517,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, > unsigned int gso_type = SKB_GSO_DODGY; > int ret; > > + if (unlikely(flags & (BPF_F_ADJ_ROOM_DECAP_MASK | > + BPF_F_ADJ_ROOM_NO_DODGY))) > + return -EINVAL; > + > if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) { > /* udp gso_size delineates datagrams, only allow if fixed */ > if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) || > @@ -3588,8 +3606,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, > if (skb_is_gso(skb)) { > struct skb_shared_info *shinfo = skb_shinfo(skb); > > - /* Header must be checked, and gso_segs recomputed. */ > + /* Add tunnel GSO type flags as appropriate. */ > shinfo->gso_type |= gso_type; > + > + /* Header must be checked, and gso_segs recomputed. */ > shinfo->gso_segs = 0; > > /* Due to header growth, MSS needs to be downgraded. > @@ -3610,11 +3630,14 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff, > static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, > u64 flags) > { > + bool no_dodgy = flags & BPF_F_ADJ_ROOM_NO_DODGY; > int ret; > > if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO | > BPF_F_ADJ_ROOM_DECAP_L3_MASK | > - BPF_F_ADJ_ROOM_NO_CSUM_RESET))) > + BPF_F_ADJ_ROOM_DECAP_MASK | > + BPF_F_ADJ_ROOM_NO_CSUM_RESET | > + BPF_F_ADJ_ROOM_NO_DODGY))) > return -EINVAL; > > if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) { > @@ -3647,9 +3670,36 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff, > if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO)) > skb_increase_gso_size(shinfo, len_diff); > > - /* Header must be checked, and gso_segs recomputed. */ > - shinfo->gso_type |= SKB_GSO_DODGY; > - shinfo->gso_segs = 0; > + /* Selective GSO flag clearing based on decap type. > + * Only clear the flags for the tunnel layer being removed. 
> + */ > + if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) > + shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL | > + SKB_GSO_UDP_TUNNEL_CSUM); > + if (flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) > + shinfo->gso_type &= ~(SKB_GSO_GRE | > + SKB_GSO_GRE_CSUM); > + if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) > + shinfo->gso_type &= ~SKB_GSO_IPXIP4; > + if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) > + shinfo->gso_type &= ~SKB_GSO_IPXIP6; > +

Probably check that the flags were set in the first place.

And perhaps that len_diff >= the minimum length that would match tunnel header removal.

Basically, maximize guard rails against misuse.

> + /* Clear encapsulation flag only when no tunnel GSO flags remain */ > + if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) { > + if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL | > + SKB_GSO_UDP_TUNNEL_CSUM | > + SKB_GSO_GRE | > + SKB_GSO_GRE_CSUM | > + SKB_GSO_IPXIP4 | > + SKB_GSO_IPXIP6))) > + skb->encapsulation = 0; > + } > + > + /* NO_DODGY: preserve gso_segs, don't mark as dodgy. */ > + if (!no_dodgy) { > + shinfo->gso_type |= SKB_GSO_DODGY; > + shinfo->gso_segs = 0; > + } > } > > return 0; > @@ -3709,8 +3759,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, > u32 off; > int ret; > > - if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK | > - BPF_F_ADJ_ROOM_NO_CSUM_RESET))) > + if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK)) > return -EINVAL; > if (unlikely(len_diff_abs > 0xfffU)) > return -EFAULT; > @@ -3729,7 +3778,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, > return -ENOTSUPP; > } > > - if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) { > + if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) { > if (!shrink) > return -EINVAL; > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index 5e38b4887de6..664bc8438186 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -3010,8 +3010,42 @@ union bpf_attr { > * > * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, > * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: >
- * Indicate the new IP header version after decapsulating the outer > - * IP header. Used when the inner and outer IP versions are different. > + * Indicate the new IP header version after decapsulating the > + * outer IP header. Used when the inner and outer IP versions > + * are different. These flags only trigger a protocol change > + * without clearing any tunnel-specific GSO flags. > + * > + * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**: > + * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM) > + * when decapsulating a GRE tunnel. > + * > + * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**: > + * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and > + * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel. > + * > + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**: > + * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating > + * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4). > + * > + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**: > + * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when > + * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6 > + * or IPv4-in-IPv6). > + * > + * When using the decapsulation flags above, the skb->encapsulation > + * flag is automatically cleared if all tunnel-specific GSO flags > + * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE, > + * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been > + * removed from the packet. This handles cases where all tunnel > + * layers have been decapsulated. > + * > + * * **BPF_F_ADJ_ROOM_NO_DODGY**: > + * Do not mark the packet as dodgy (untrusted) and preserve > + * the existing gso_segs count. By default, packet modifications > + * set SKB_GSO_DODGY and reset gso_segs to 0, forcing > + * revalidation. This flag is useful when decapsulating the > + * tunnel, the BPF program is trusted, and the modifications > + * are known to be valid. > * > * A call to this helper is susceptible to change the underlying > * packet buffer. 
Therefore, at load time, all checks on pointers > @@ -6209,7 +6243,7 @@ enum { > }; > > /* BPF_FUNC_skb_adjust_room flags. */ > -enum { > +enum bpf_adj_room_flags { > BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0), > BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1), > BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2), > @@ -6219,6 +6253,11 @@ enum { > BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6), > BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7), > BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8), > + BPF_F_ADJ_ROOM_DECAP_L4_GRE = (1ULL << 9), > + BPF_F_ADJ_ROOM_DECAP_L4_UDP = (1ULL << 10), > + BPF_F_ADJ_ROOM_DECAP_IPXIP4 = (1ULL << 11), > + BPF_F_ADJ_ROOM_DECAP_IPXIP6 = (1ULL << 12), > + BPF_F_ADJ_ROOM_NO_DODGY = (1ULL << 13), > }; > > enum { > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 8+ messages in thread
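One possible reading of the guard rails suggested in this review (check that the matching GSO flags are actually set, and that the removed length is at least a plausible tunnel header size) is sketched below as a userspace model. Everything here is an assumption for illustration: the `model_*` and `M_*` names are hypothetical, the GSO bit values are invented, and the per-layer minimums are simply the fixed base header sizes (UDP 8 bytes, GRE 4 bytes, IPv4 20 bytes, IPv6 40 bytes). It is not what a later revision of the patch necessarily does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative GSO type bits for this model only. */
#define M_GSO_GRE              (1u << 1)
#define M_GSO_GRE_CSUM         (1u << 2)
#define M_GSO_IPXIP4           (1u << 3)
#define M_GSO_IPXIP6           (1u << 4)
#define M_GSO_UDP_TUNNEL       (1u << 5)
#define M_GSO_UDP_TUNNEL_CSUM  (1u << 6)

/* Helper flag bits, using the positions proposed in the patch. */
#define M_DECAP_L4_GRE   (1ull << 9)
#define M_DECAP_L4_UDP   (1ull << 10)
#define M_DECAP_IPXIP4   (1ull << 11)
#define M_DECAP_IPXIP6   (1ull << 12)

/* Returns 0 if the decap request looks consistent for a GSO packet,
 * -EINVAL otherwise. len_diff is the number of bytes being removed. */
static int model_check_decap(uint32_t gso_type, uint64_t flags,
			     uint32_t len_diff)
{
	uint32_t min_len = 0;

	if (flags & M_DECAP_L4_UDP) {
		if (!(gso_type & (M_GSO_UDP_TUNNEL | M_GSO_UDP_TUNNEL_CSUM)))
			return -EINVAL;
		min_len += 8;	/* UDP header */
	}
	if (flags & M_DECAP_L4_GRE) {
		if (!(gso_type & (M_GSO_GRE | M_GSO_GRE_CSUM)))
			return -EINVAL;
		min_len += 4;	/* GRE base header, no optional fields */
	}
	if (flags & M_DECAP_IPXIP4) {
		if (!(gso_type & M_GSO_IPXIP4))
			return -EINVAL;
		min_len += 20;	/* IPv4 header without options */
	}
	if (flags & M_DECAP_IPXIP6) {
		if (!(gso_type & M_GSO_IPXIP6))
			return -EINVAL;
		min_len += 40;	/* fixed IPv6 header */
	}

	/* Removing fewer bytes than the claimed headers occupy is suspect. */
	return len_diff < min_len ? -EINVAL : 0;
}

/* Small usage example for the model above. */
static void model_check_decap_examples(void)
{
	/* UDP tunnel bit set and 8 bytes removed: accepted. */
	assert(model_check_decap(M_GSO_UDP_TUNNEL, M_DECAP_L4_UDP, 8) == 0);
	/* UDP decap requested but no UDP tunnel bit present: rejected. */
	assert(model_check_decap(0, M_DECAP_L4_UDP, 36) == -EINVAL);
}
```

Whether the minimums should be summed across layers, or also account for the outer IP header of a UDP or GRE tunnel, is a design question this sketch leaves open.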
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags 2026-02-20 21:08 ` Willem de Bruijn @ 2026-02-25 7:12 ` Hudson, Nick 2026-02-25 15:45 ` Willem de Bruijn 0 siblings, 1 reply; 8+ messages in thread From: Hudson, Nick @ 2026-02-25 7:12 UTC (permalink / raw) To: Willem de Bruijn Cc: Glasgall, Anna, Tottenham, Max, Hunt, Joshua, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, Jason Xing, Willem de Bruijn, Paul Chaignon, Mykyta Yatsenko, Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov, Tobias Klauser, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org > On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Nick Hudson wrote: >> Enable BPF programs to properly handle GSO state when decapsulating >> tunneled packets by adding selective GSO flag clearing and a trusted >> mode for GSO handling.
>> >> New decapsulation flags: >> >> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags >> (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM) >> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags >> (SKB_GSO_GRE, SKB_GSO_GRE_CSUM) >> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for >> IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels >> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for >> IPv6-in-IPv6 and IPv4-in-IPv6 tunnels >> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set >> SKB_GSO_DODGY when the BPF program is trusted and modifications >> are known to be valid >> >> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is >> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once - >> Run Everywhere) lookups in BPF programs. >> >> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets >> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this >> for trusted programs that guarantee GSO correctness. >> >> Usage example (decapsulating UDP tunnel with IPv4 inner packet): >> bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET, >> BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | >> BPF_F_ADJ_ROOM_DECAP_L4_UDP);
>
> This patch is doing too much in one patch.

Sure, I’ll split it up.

>
> Also not convinced of the need for the NO_DODGY flag.

The reason for NO_DODGY is that, without it, the egress interface will see the SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as NETIF_F_GSO_ROBUST, so an skb with SKB_GSO_DODGY set will fail skb_gso_ok(). When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().
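To make the skb_gso_ok() interaction concrete, here is a simplified userspace model of the feature check: a packet passes only if every bit in its GSO type is matched by a feature the device advertises. This is a sketch under stated assumptions, not kernel code. The `model_*` and `M_*` names are hypothetical, the bit values are invented, and the model uses identical bit positions for GSO types and device features, whereas the kernel maps one onto the other with a shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative GSO type bits for this model only. */
#define M_GSO_TCPV4  (1u << 0)
#define M_GSO_DODGY  (1u << 1)

/* Device feature bits; the model reuses the GSO bit positions. */
#define M_F_TSO         M_GSO_TCPV4	/* device can segment TCPv4 */
#define M_F_GSO_ROBUST  M_GSO_DODGY	/* device accepts untrusted GSO */

/* True if the device advertises a feature for every GSO type bit. */
static bool model_gso_ok(uint32_t dev_features, uint32_t gso_type)
{
	return (dev_features & gso_type) == gso_type;
}

/* Small usage example for the model above. */
static void model_gso_ok_examples(void)
{
	/* Plain TCPv4 GSO on a TSO-capable device passes the check. */
	assert(model_gso_ok(M_F_TSO, M_GSO_TCPV4));
	/* The same packet marked dodgy fails unless the device is robust. */
	assert(!model_gso_ok(M_F_TSO, M_GSO_TCPV4 | M_GSO_DODGY));
	assert(model_gso_ok(M_F_TSO | M_F_GSO_ROBUST,
			    M_GSO_TCPV4 | M_GSO_DODGY));
}
```

In this model, the middle case corresponds to the situation described above: the egress device does not advertise the robust feature, so a packet carrying the dodgy bit no longer passes the check.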
> >> Co-developed-by: Anna Glasgall <aglasgal@akamai.com> >> Signed-off-by: Anna Glasgall <aglasgal@akamai.com> >> Co-developed-by: Max Tottenham <mtottenh@akamai.com> >> Signed-off-by: Max Tottenham <mtottenh@akamai.com> >> Signed-off-by: Josh Hunt <johunt@akamai.com> >> Signed-off-by: Nick Hudson <nhudson@akamai.com> >> --- >> include/uapi/linux/bpf.h | 45 +++++++++++++++++++-- >> net/core/filter.c | 73 ++++++++++++++++++++++++++++------ >> tools/include/uapi/linux/bpf.h | 45 +++++++++++++++++++-- >> 3 files changed, 145 insertions(+), 18 deletions(-) >> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index c8d400b7680a..0cb24ab70af7 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -3010,8 +3010,42 @@ union bpf_attr { >> * >> * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**, >> * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**: >> - * Indicate the new IP header version after decapsulating the outer >> - * IP header. Used when the inner and outer IP versions are different. >> + * Indicate the new IP header version after decapsulating the >> + * outer IP header. Used when the inner and outer IP versions >> + * are different. These flags only trigger a protocol change >> + * without clearing any tunnel-specific GSO flags. >> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**: >> + * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM) >> + * when decapsulating a GRE tunnel. >> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**: >> + * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and >> + * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel. >> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**: >> + * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating >> + * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4). 
>> + * >> + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**: >> + * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when >> + * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6 >> + * or IPv4-in-IPv6). >> + * >> + * When using the decapsulation flags above, the skb->encapsulation >> + * flag is automatically cleared if all tunnel-specific GSO flags >> + * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE, >> + * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been >> + * removed from the packet. This handles cases where all tunnel >> + * layers have been decapsulated. >> + * >> + * * **BPF_F_ADJ_ROOM_NO_DODGY**: >> + * Do not mark the packet as dodgy (untrusted) and preserve >> + * the existing gso_segs count. By default, packet modifications >> + * set SKB_GSO_DODGY and reset gso_segs to 0, forcing >> + * revalidation. This flag is useful when decapsulating the >> + * tunnel, the BPF program is trusted, and the modifications >> + * are known to be valid. >> * >> * A call to this helper is susceptible to change the underlying >> * packet buffer. Therefore, at load time, all checks on pointers >> @@ -6209,7 +6243,7 @@ enum { >> }; >> >> /* BPF_FUNC_skb_adjust_room flags. 
*/
>> -enum {
>> +enum bpf_adj_room_flags {
>>  	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
>>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
>>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
>> @@ -6219,6 +6253,11 @@ enum {
>>  	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
>>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
>>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
>> +	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
>> +	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
>> +	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
>> +	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
>> +	BPF_F_ADJ_ROOM_NO_DODGY		= (1ULL << 13),
>>  };
>>
>>  enum {
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index ba019ded773d..681dd53ab841 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3484,14 +3484,28 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
>>  #define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
>>  					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
>>
>> -#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
>> -					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
>> +#define BPF_F_ADJ_ROOM_DECAP_L4_MASK	(BPF_F_ADJ_ROOM_DECAP_L4_UDP | \
>> +					 BPF_F_ADJ_ROOM_DECAP_L4_GRE)
>> +
>> +#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK	(BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \
>> +					 BPF_F_ADJ_ROOM_DECAP_IPXIP6)
>> +
>> +#define BPF_F_ADJ_ROOM_ENCAP_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
>>  					 BPF_F_ADJ_ROOM_ENCAP_L2( \
>> -					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
>> -					 BPF_F_ADJ_ROOM_DECAP_L3_MASK)
>> +					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
>> +
>> +#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK | \
>> +					 BPF_F_ADJ_ROOM_DECAP_L4_MASK | \
>> +					 BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
>> +
>> +#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
>> +					 BPF_F_ADJ_ROOM_ENCAP_MASK | \
>> +					 BPF_F_ADJ_ROOM_DECAP_MASK | \
>> +					 BPF_F_ADJ_ROOM_NO_CSUM_RESET | \
>> +					 BPF_F_ADJ_ROOM_NO_DODGY)
>>
>>  static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  			    u64 flags)
>> @@ -3503,6 +3517,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  	unsigned int gso_type = SKB_GSO_DODGY;
>>  	int ret;
>>
>> +	if (unlikely(flags & (BPF_F_ADJ_ROOM_DECAP_MASK |
>> +			      BPF_F_ADJ_ROOM_NO_DODGY)))
>> +		return -EINVAL;
>> +
>>  	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
>>  		/* udp gso_size delineates datagrams, only allow if fixed */
>>  		if (!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) ||
>> @@ -3588,8 +3606,10 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  	if (skb_is_gso(skb)) {
>>  		struct skb_shared_info *shinfo = skb_shinfo(skb);
>>
>> -		/* Header must be checked, and gso_segs recomputed. */
>> +		/* Add tunnel GSO type flags as appropriate. */
>>  		shinfo->gso_type |= gso_type;
>> +
>> +		/* Header must be checked, and gso_segs recomputed. */
>>  		shinfo->gso_segs = 0;
>>
>>  		/* Due to header growth, MSS needs to be downgraded.
>> @@ -3610,11 +3630,14 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>>  static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>>  			      u64 flags)
>>  {
>> +	bool no_dodgy = flags & BPF_F_ADJ_ROOM_NO_DODGY;
>>  	int ret;
>>
>>  	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
>>  			       BPF_F_ADJ_ROOM_DECAP_L3_MASK |
>> -			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
>> +			       BPF_F_ADJ_ROOM_DECAP_MASK |
>> +			       BPF_F_ADJ_ROOM_NO_CSUM_RESET |
>> +			       BPF_F_ADJ_ROOM_NO_DODGY)))
>>  		return -EINVAL;
>>
>>  	if (skb_is_gso(skb) && !skb_is_gso_tcp(skb)) {
>> @@ -3647,9 +3670,36 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>>  		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
>>  			skb_increase_gso_size(shinfo, len_diff);
>>
>> -		/* Header must be checked, and gso_segs recomputed. */
>> -		shinfo->gso_type |= SKB_GSO_DODGY;
>> -		shinfo->gso_segs = 0;
>> +		/* Selective GSO flag clearing based on decap type.
>> +		 * Only clear the flags for the tunnel layer being removed.
>> +		 */
>> +		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP)
>> +			shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL |
>> +					      SKB_GSO_UDP_TUNNEL_CSUM);
>> +		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE)
>> +			shinfo->gso_type &= ~(SKB_GSO_GRE |
>> +					      SKB_GSO_GRE_CSUM);
>> +		if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4)
>> +			shinfo->gso_type &= ~SKB_GSO_IPXIP4;
>> +		if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6)
>> +			shinfo->gso_type &= ~SKB_GSO_IPXIP6;
>> +
>
> Probably check that the flags were set in the first place.

Not sure it matters, but I can add this.

>
> And perhaps that length_diff >= the minimum length that would match
> tunnel header removal.
>
> Basically, maximize guard rails against misuse.

Will add.

>> +		/* Clear encapsulation flag only when no tunnel GSO flags remain */
>> +		if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
>> +			if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
>> +						  SKB_GSO_UDP_TUNNEL_CSUM |
>> +						  SKB_GSO_GRE |
>> +						  SKB_GSO_GRE_CSUM |
>> +						  SKB_GSO_IPXIP4 |
>> +						  SKB_GSO_IPXIP6)))
>> +				skb->encapsulation = 0;
>> +		}
>> +
>> +		/* NO_DODGY: preserve gso_segs, don't mark as dodgy. */
>> +		if (!no_dodgy) {
>> +			shinfo->gso_type |= SKB_GSO_DODGY;
>> +			shinfo->gso_segs = 0;
>> +		}
>>  	}
>>
>>  	return 0;
>> @@ -3709,8 +3759,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>>  	u32 off;
>>  	int ret;
>>
>> -	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK |
>> -			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
>> +	if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
>>  		return -EINVAL;
>>  	if (unlikely(len_diff_abs > 0xfffU))
>>  		return -EFAULT;
>> @@ -3729,7 +3778,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>>  		return -ENOTSUPP;
>>  	}
>>
>> -	if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
>> +	if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
>>  		if (!shrink)
>>  			return -EINVAL;
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 5e38b4887de6..664bc8438186 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -3010,8 +3010,42 @@ union bpf_attr {
>>   *
>>   *		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
>>   *		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
>> - *		  Indicate the new IP header version after decapsulating the outer
>> - *		  IP header. Used when the inner and outer IP versions are different.
>> + *		  Indicate the new IP header version after decapsulating the
>> + *		  outer IP header. Used when the inner and outer IP versions
>> + *		  are different. These flags only trigger a protocol change
>> + *		  without clearing any tunnel-specific GSO flags.
>> + *
>> + *		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
>> + *		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
>> + *		  when decapsulating a GRE tunnel.
>> + *
>> + *		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
>> + *		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
>> + *		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
>> + *
>> + *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
>> + *		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
>> + *		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
>> + *
>> + *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
>> + *		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
>> + *		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
>> + *		  or IPv4-in-IPv6).
>> + *
>> + *		  When using the decapsulation flags above, the skb->encapsulation
>> + *		  flag is automatically cleared if all tunnel-specific GSO flags
>> + *		  (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
>> + *		  SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
>> + *		  removed from the packet. This handles cases where all tunnel
>> + *		  layers have been decapsulated.
>> + *
>> + *		* **BPF_F_ADJ_ROOM_NO_DODGY**:
>> + *		  Do not mark the packet as dodgy (untrusted) and preserve
>> + *		  the existing gso_segs count. By default, packet modifications
>> + *		  set SKB_GSO_DODGY and reset gso_segs to 0, forcing
>> + *		  revalidation. This flag is useful when decapsulating the
>> + *		  tunnel, the BPF program is trusted, and the modifications
>> + *		  are known to be valid.
>>   *
>>   *		A call to this helper is susceptible to change the underlying
>>   *		packet buffer. Therefore, at load time, all checks on pointers
>> @@ -6209,7 +6243,7 @@ enum {
>>  };
>>
>>  /* BPF_FUNC_skb_adjust_room flags. */
>> -enum {
>> +enum bpf_adj_room_flags {
>>  	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
>>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
>>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
>> @@ -6219,6 +6253,11 @@ enum {
>>  	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
>>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
>>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
>> +	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
>> +	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
>> +	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
>> +	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
>> +	BPF_F_ADJ_ROOM_NO_DODGY		= (1ULL << 13),
>>  };
>>
>>  enum {
>> --
>> 2.34.1
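The two guard rails raised in the review (only clear tunnel GSO bits that are actually set, and require the removed length to cover at least a minimal tunnel header) can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the patch and not kernel code: `struct pkt_model`, `decap_clear_gso()`, the `GSO_*` bit values and the per-tunnel minimum lengths are hypothetical; only the `DECAP_*` bit positions mirror the enum in the quoted diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative GSO type bits; the kernel's SKB_GSO_* values differ. */
enum {
	GSO_GRE             = 1u << 0,
	GSO_GRE_CSUM        = 1u << 1,
	GSO_IPXIP4          = 1u << 2,
	GSO_IPXIP6          = 1u << 3,
	GSO_UDP_TUNNEL      = 1u << 4,
	GSO_UDP_TUNNEL_CSUM = 1u << 5,
};
#define GSO_TUNNEL_MASK 0x3fu	/* all six tunnel bits above */

/* Bit positions copied from the enum in the quoted patch. */
enum {
	DECAP_L4_GRE = 1u << 9,
	DECAP_L4_UDP = 1u << 10,
	DECAP_IPXIP4 = 1u << 11,
	DECAP_IPXIP6 = 1u << 12,
};

/* Hypothetical stand-in for the skb state the helper touches. */
struct pkt_model {
	uint32_t gso_type;
	bool encapsulation;
};

/* One row per decap flag: GSO bits to clear, smallest header it removes. */
static const struct {
	uint32_t flag, gso_bits, min_len;
} decap_rules[] = {
	{ DECAP_L4_UDP, GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM, 8 },	/* UDP header */
	{ DECAP_L4_GRE, GSO_GRE | GSO_GRE_CSUM, 4 },		/* base GRE header */
	{ DECAP_IPXIP4, GSO_IPXIP4, 20 },			/* IPv4, no options */
	{ DECAP_IPXIP6, GSO_IPXIP6, 40 },			/* fixed IPv6 header */
};

/* Returns 0 on success, -EINVAL if either guard rail rejects the request.
 * The packet model is left untouched on failure.
 */
int decap_clear_gso(struct pkt_model *p, uint32_t flags, uint32_t len_diff)
{
	uint32_t clear = 0, need = 0;
	size_t i;

	for (i = 0; i < sizeof(decap_rules) / sizeof(decap_rules[0]); i++) {
		if (!(flags & decap_rules[i].flag))
			continue;
		/* Guard rail 1: the tunnel type must be present on the packet. */
		if (!(p->gso_type & decap_rules[i].gso_bits))
			return -EINVAL;
		clear |= decap_rules[i].gso_bits;
		need += decap_rules[i].min_len;
	}

	/* Guard rail 2: removed bytes must cover the headers being stripped. */
	if (len_diff < need)
		return -EINVAL;

	p->gso_type &= ~clear;
	/* Drop the encapsulation mark only once no tunnel bits remain. */
	if (clear && !(p->gso_type & GSO_TUNNEL_MASK))
		p->encapsulation = false;
	return 0;
}
```

Validating every requested flag before modifying anything keeps a rejected call free of side effects, which seems the least surprising behaviour for a helper that returns -EINVAL.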
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
  2026-02-25  7:12 ` Hudson, Nick
@ 2026-02-25 15:45 ` Willem de Bruijn
  2026-03-10 16:26 ` Hudson, Nick
From: Willem de Bruijn @ 2026-02-25 15:45 UTC
To: Hudson, Nick, Willem de Bruijn
Cc: Glasgall, Anna, Tottenham, Max, Hunt, Joshua, Alexei Starovoitov,
    Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
    Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
    Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Simon Horman, Jason Xing, Willem de Bruijn, Paul Chaignon,
    Mykyta Yatsenko, Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov,
    Tobias Klauser, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org

Hudson, Nick wrote:
>
>
> > On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Nick Hudson wrote:
> >> Enable BPF programs to properly handle GSO state when decapsulating
> >> tunneled packets by adding selective GSO flag clearing and a trusted
> >> mode for GSO handling.
> >>
> >> New decapsulation flags:
> >>
> >> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
> >>   (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
> >> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
> >>   (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
> >> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
> >>   IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
> >> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
> >>   IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
> >> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
> >>   SKB_GSO_DODGY when the BPF program is trusted and modifications
> >>   are known to be valid
> >>
> >> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
> >> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
> >> Run Everywhere) lookups in BPF programs.
> >>
> >> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
> >> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
> >> for trusted programs that guarantee GSO correctness.
> >>
> >> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
> >>   bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
> >>                       BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
> >>                       BPF_F_ADJ_ROOM_DECAP_L4_UDP);
> >
> > This patch is doing too much in one patch.
>
> Sure, I’ll split it up.
>
> >
> > Also not convinced of the need for the NO_DODGY flag.
>
> The reason for NO_DODGY is that, without it, the egress interface will see the
> SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as
> NETIF_F_GSO_ROBUST, so the skb will fail skb_gso_ok() with SKB_GSO_DODGY set.
> When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().

I understand why you might want it. But the dodgy check has long been
there for a reason: because these transformations are not blindly
accepted by the kernel. This use case does not change that.
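The feature check discussed above can be pictured as a plain mask comparison: each bit in a packet's GSO type needs a matching feature bit on the egress device, and the dodgy bit pairs with the "robust" feature. The sketch below is a simplified user-space model of that idea only; `model_gso_ok()`, the `GSO_*` and `F_*` names, the shift, and the bit values are all illustrative assumptions rather than the kernel's definitions, and the real skb_gso_ok() may apply further checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout: GSO type bits map to device feature bits by a shift. */
#define GSO_FEATURE_SHIFT 16

#define GSO_TCPV4 (1u << 0)	/* stand-in for a TCP segmentation type */
#define GSO_DODGY (1u << 1)	/* stand-in for SKB_GSO_DODGY */

#define F_TSO        ((uint64_t)GSO_TCPV4 << GSO_FEATURE_SHIFT)
#define F_GSO_ROBUST ((uint64_t)GSO_DODGY << GSO_FEATURE_SHIFT)

/* True when the device advertises a feature for every GSO type bit set. */
bool model_gso_ok(uint64_t features, uint32_t gso_type)
{
	uint64_t need = (uint64_t)gso_type << GSO_FEATURE_SHIFT;

	return (features & need) == need;
}
```

In this model a device advertising only `F_TSO` accepts a `GSO_TCPV4` packet but rejects the same packet once `GSO_DODGY` is set, unless `F_GSO_ROBUST` is also advertised. That rejection corresponds to the case where, per the thread, validate_xmit_skb() falls back to skb_gso_segment().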
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
  2026-02-25 15:45 ` Willem de Bruijn
@ 2026-03-10 16:26 ` Hudson, Nick
  2026-03-10 19:42 ` Willem de Bruijn
From: Hudson, Nick @ 2026-03-10 16:26 UTC
To: Willem de Bruijn
Cc: Glasgall, Anna, Tottenham, Max, Hunt, Joshua, Alexei Starovoitov,
    Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
    Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
    Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Simon Horman, Jason Xing, Willem de Bruijn, Paul Chaignon,
    Mykyta Yatsenko, Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov,
    Tobias Klauser, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org

> On 25 Feb 2026, at 15:45, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> Hudson, Nick wrote:
>>
>>
>>> On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>>>
>>> Nick Hudson wrote:
>>>> Enable BPF programs to properly handle GSO state when decapsulating
>>>> tunneled packets by adding selective GSO flag clearing and a trusted
>>>> mode for GSO handling.
>>>>
>>>> New decapsulation flags:
>>>>
>>>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
>>>>   (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
>>>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
>>>>   (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
>>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
>>>>   IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
>>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
>>>>   IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
>>>> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
>>>>   SKB_GSO_DODGY when the BPF program is trusted and modifications
>>>>   are known to be valid
>>>>
>>>> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
>>>> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
>>>> Run Everywhere) lookups in BPF programs.
>>>>
>>>> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
>>>> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
>>>> for trusted programs that guarantee GSO correctness.
>>>>
>>>> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
>>>>   bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
>>>>                       BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
>>>>                       BPF_F_ADJ_ROOM_DECAP_L4_UDP);
>>>
>>> This patch is doing too much in one patch.
>>
>> Sure, I’ll split it up.
>>
>>>
>>> Also not convinced of the need for the NO_DODGY flag.
>>
>> The reason for NO_DODGY is that, without it, the egress interface will see the
>> SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as
>> NETIF_F_GSO_ROBUST, so the skb will fail skb_gso_ok() with SKB_GSO_DODGY set.
>> When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().
>
> I understand why you might want it. But the dodgy check has long been
> there for a reason: because these transformations are not blindly
> accepted by the kernel. This use case does not change that.

The defence I came up with here is...

- Setting NETIF_F_GSO_ROBUST for the tun/tap device, as it is a device-level property, affects both host to guest and guest to host. The former is trusted. The latter is not. Therefore this is not an option.
- The host to guest direction is fully trusted
- Physical NIC driver is trusted (kernel driver, hardware-validated GSO)
- BPF program is trusted (privileged, CAP_BPF, verified by kernel)
- Decapsulation is a trusted operation for BPF code authors
- Bridge + TAP is internal kernel forwarding

Would protecting its use with a sysctl make it acceptable? (If it isn’t still)
* Re: [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags
  2026-03-10 16:26 ` Hudson, Nick
@ 2026-03-10 19:42 ` Willem de Bruijn
From: Willem de Bruijn @ 2026-03-10 19:42 UTC
To: Hudson, Nick, Willem de Bruijn
Cc: Glasgall, Anna, Tottenham, Max, Hunt, Joshua, Alexei Starovoitov,
    Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
    Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
    Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
    Paolo Abeni, Simon Horman, Jason Xing, Willem de Bruijn, Paul Chaignon,
    Mykyta Yatsenko, Tao Chen, Kumar Kartikeya Dwivedi, Anton Protopopov,
    Tobias Klauser, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org

Hudson, Nick wrote:
>
>
> > On 25 Feb 2026, at 15:45, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >
> > Hudson, Nick wrote:
> >>
> >>
> >>> On 20 Feb 2026, at 21:08, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> >>>
> >>> Nick Hudson wrote:
> >>>> Enable BPF programs to properly handle GSO state when decapsulating
> >>>> tunneled packets by adding selective GSO flag clearing and a trusted
> >>>> mode for GSO handling.
> >>>>
> >>>> New decapsulation flags:
> >>>>
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP: Clear UDP tunnel GSO flags
> >>>>   (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE: Clear GRE tunnel GSO flags
> >>>>   (SKB_GSO_GRE, SKB_GSO_GRE_CSUM)
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4: Clear SKB_GSO_IPXIP4 flag for
> >>>>   IPv4-in-IPv4 (IPIP) and IPv6-in-IPv4 (SIT) tunnels
> >>>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6: Clear SKB_GSO_IPXIP6 flag for
> >>>>   IPv6-in-IPv6 and IPv4-in-IPv6 tunnels
> >>>> - BPF_F_ADJ_ROOM_NO_DODGY: Preserve gso_segs and don't set
> >>>>   SKB_GSO_DODGY when the BPF program is trusted and modifications
> >>>>   are known to be valid
> >>>>
> >>>> The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
> >>>> renamed to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
> >>>> Run Everywhere) lookups in BPF programs.
> >>>>
> >>>> By default, bpf_skb_adjust_room sets SKB_GSO_DODGY and resets
> >>>> gso_segs to 0, forcing revalidation. The NO_DODGY flag bypasses this
> >>>> for trusted programs that guarantee GSO correctness.
> >>>>
> >>>> Usage example (decapsulating UDP tunnel with IPv4 inner packet):
> >>>>   bpf_skb_adjust_room(skb, -hdr_len, BPF_ADJ_ROOM_NET,
> >>>>                       BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
> >>>>                       BPF_F_ADJ_ROOM_DECAP_L4_UDP);
> >>>
> >>> This patch is doing too much in one patch.
> >>
> >> Sure, I’ll split it up.
> >>
> >>>
> >>> Also not convinced of the need for the NO_DODGY flag.
> >>
> >> The reason for NO_DODGY is that, without it, the egress interface will see the
> >> SKB_GSO_DODGY flag. In our use case, we want to avoid marking the egress tap as
> >> NETIF_F_GSO_ROBUST, so the skb will fail skb_gso_ok() with SKB_GSO_DODGY set.
> >> When skb_gso_ok() fails, validate_xmit_skb() calls skb_gso_segment().
> >
> > I understand why you might want it. But the dodgy check has long been
> > there for a reason: because these transformations are not blindly
> > accepted by the kernel. This use case does not change that.
>
> The defence I came up with here is...
>
> - Setting NETIF_F_GSO_ROBUST for the tun/tap device, as it is a device-level property, affects both host to guest and guest to host. The former is trusted. The latter is not. Therefore this is not an option.
> - The host to guest direction is fully trusted
> - Physical NIC driver is trusted (kernel driver, hardware-validated GSO)
> - BPF program is trusted (privileged, CAP_BPF, verified by kernel)
> - Decapsulation is a trusted operation for BPF code authors
> - Bridge + TAP is internal kernel forwarding
>
> Would protecting its use with a sysctl make it acceptable? (If it isn’t still)

Does taking the DODGY path and going through GSO have a significant
impact on your workload?

So far we have always declined to add such custom opt-outs. This is
not at all the first affected use case.

Either way, let's separate this from the main functional decap patch.
Thread overview: 8+ messages
[not found] <20260219104710.1490304-1-nhudson@akamai.com>
2026-02-19 10:47 ` [RFC PATCH 1/1] bpf: Add tunnel decapsulation and GSO state updates per new flags Nick Hudson
2026-02-19 11:50 ` Hudson, Nick
2026-02-19 12:18 ` Oliver Hartkopp
2026-02-20 21:08 ` Willem de Bruijn
2026-02-25 7:12 ` Hudson, Nick
2026-02-25 15:45 ` Willem de Bruijn
2026-03-10 16:26 ` Hudson, Nick
2026-03-10 19:42 ` Willem de Bruijn