public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates
@ 2026-05-04 10:17 Nick Hudson
  2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall


This series extends bpf_skb_adjust_room() with decapsulation-specific
flags and tunnel GSO state updates for decap use cases.

Motivation
----------

When BPF decapsulates tunneled packets, skb GSO state needs to be
updated to match the removed tunnel layer. This includes clearing the
corresponding tunnel GSO type bits and resetting encapsulation state
once no tunnel GSO flags remain.

Series Overview
---------------

- Name the adjust_room flag enum for CO-RE lookups.
- Refactor adjust_room helper masks for maintainable validation logic.
- Add new DECAP flags to UAPI.
- Add guard rails for incompatible/invalid decap flag combinations.
- Implement decap GSO state clearing on shrink.
- Add selftests to validate decap GSO and encapsulation state.

Changes v5 -> v6:
- Patch 5: extend decap-state handling for the new L4/IPXIP decap
  flags to non-GSO packets as well: when decapsulation is requested
  on a non-GSO skb, clear skb->encapsulation directly so behavior is
  consistent with the GSO path.
- Patch 6: broaden tc_tunnel coverage to exercise and validate both
  GSO and non-GSO decapsulation paths. This includes selecting
  IPXIP decap flags from the outer tunnel header family, adding
  explicit post-decap encapsulation checks for non-GSO packets,
  and removing forced TSO disable so GSO cases are exercised in
  the test harness.

Changes v4 -> v5:
- Patch 5: Remove explicit clearing of encap_hdr_csum and
  remcsum_offload on UDP decap, per review feedback.
- Patch 6: Remove SKB_GSO_TUNNEL_REMCSUM from SKB_GSO_UDP_TUNNEL_MASK
  in selftests, and minor test improvements.

Changes v3 -> v4:
- Patch 5: drop SKB_GSO_TUNNEL_REMCSUM handling from this series.
- Patch 5: clear encap_hdr_csum and remcsum_offload directly on UDP
  decap.

Changes v2 -> v3:
- Add a new selftests patch to validate decap GSO state behavior.
- Reorder the series so helper-mask refactoring precedes UAPI DECAP
  flag additions.
- Refresh patch 2 and patch 3 split to keep refactoring
  behavior-neutral.
- Patch 5: add decap tunnel GSO-state checks in "bpf: clear decap
  tunnel GSO state in skb_adjust_room" (per Gemini/sashiko).

Changes v1 -> v2:
- Patch 3: decap flag acceptance intentionally remains L3-only while
  adding helper masks.
- Patch 4: decap with L4/IPXIP support enabled with guard rails.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>

Nick Hudson (6):
  bpf: name the enum for BPF_FUNC_skb_adjust_room flags
  bpf: refactor masks for ADJ_ROOM flags and encap validation
  bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation
  bpf: allow new DECAP flags and add guard rails
  bpf: clear decap state on skb_adjust_room shrink path
  selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state

 include/uapi/linux/bpf.h                                |  36 ++++++++++-
 net/core/filter.c                                       | 119 ++++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h                          |  36 ++++++++++-
 tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c |   1 -
 tools/testing/selftests/bpf/progs/test_tc_tunnel.c      |  91 ++++++++++++++++++++++++---
 5 files changed, 252 insertions(+), 31 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags
  2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
  2026-05-04 11:03   ` bot+bpf-ci
  2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, linux-kernel

The existing anonymous enum for BPF_FUNC_skb_adjust_room flags is
named to enum bpf_adj_room_flags to enable CO-RE (Compile Once -
Run Everywhere) lookups in BPF programs.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/bpf.h       | 2 +-
 tools/include/uapi/linux/bpf.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..c021ed8d7b44 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6211,7 +6211,7 @@ enum {
 };
 
 /* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 677be9a47347..ca35ed622ed5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6211,7 +6211,7 @@ enum {
 };
 
 /* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
 	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation
  2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
  2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
  2026-05-04 11:03   ` bot+bpf-ci
  2026-05-04 17:14   ` Willem de Bruijn
  2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel

Refactor the helper masks for bpf_skb_adjust_room() flags to simplify
validation logic and introduce:

- BPF_F_ADJ_ROOM_ENCAP_MASK
- BPF_F_ADJ_ROOM_DECAP_MASK

Refactor existing validation checks in bpf_skb_net_shrink()
and bpf_skb_adjust_room() to use the new masks (no behavior change).

This is in preparation for supporting the new decap flags.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
---
 net/core/filter.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 80a3b702a2d4..02d3947cca32 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3484,14 +3484,19 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 #define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
 					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
 
-#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
-					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
+#define BPF_F_ADJ_ROOM_ENCAP_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
 					 BPF_F_ADJ_ROOM_ENCAP_L2( \
-					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
-					 BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
+
+#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+
+#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
+					 BPF_F_ADJ_ROOM_ENCAP_MASK | \
+					 BPF_F_ADJ_ROOM_DECAP_MASK | \
+					 BPF_F_ADJ_ROOM_NO_CSUM_RESET)
 
 static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
 			    u64 flags)
@@ -3614,8 +3619,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 	bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
 	int ret;
 
-	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
-			       BPF_F_ADJ_ROOM_DECAP_L3_MASK |
+	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_DECAP_MASK |
+			       BPF_F_ADJ_ROOM_FIXED_GSO |
 			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
 		return -EINVAL;
 
@@ -3714,8 +3719,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	u32 off;
 	int ret;
 
-	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK |
-			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
+	if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
 		return -EINVAL;
 	if (unlikely(len_diff_abs > 0xfffU))
 		return -EFAULT;
@@ -3734,20 +3738,20 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 		return -ENOTSUPP;
 	}
 
-	if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
+	if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
 		if (!shrink)
 			return -EINVAL;
 
-		switch (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
-		case BPF_F_ADJ_ROOM_DECAP_L3_IPV4:
+		/* Reject mutually exclusive decap flag pairs. */
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) ==
+		    BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+			return -EINVAL;
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
 			len_min = sizeof(struct iphdr);
-			break;
-		case BPF_F_ADJ_ROOM_DECAP_L3_IPV6:
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
 			len_min = sizeof(struct ipv6hdr);
-			break;
-		default:
-			return -EINVAL;
-		}
 	}
 
 	len_cur = skb->len - skb_network_offset(skb);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation
  2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
  2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
  2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
  2026-05-04 11:03   ` bot+bpf-ci
  2026-05-04 10:17 ` [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails Nick Hudson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, linux-kernel

Add new bpf_skb_adjust_room() decapsulation flags:

- BPF_F_ADJ_ROOM_DECAP_L4_GRE
- BPF_F_ADJ_ROOM_DECAP_L4_UDP
- BPF_F_ADJ_ROOM_DECAP_IPXIP4
- BPF_F_ADJ_ROOM_DECAP_IPXIP6

These flags let BPF programs describe which tunnel layer is being
removed, so later changes can update tunnel-related GSO state
accordingly during decapsulation.

This patch only introduces the UAPI flag definitions and helper
documentation.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 include/uapi/linux/bpf.h       | 34 ++++++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h | 34 ++++++++++++++++++++++++++++++++--
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c021ed8d7b44..4a53e731c554 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3010,8 +3010,34 @@ union bpf_attr {
  *
  *		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
  *		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- *		  Indicate the new IP header version after decapsulating the outer
- *		  IP header. Used when the inner and outer IP versions are different.
+ *		  Indicate the new IP header version after decapsulating the
+ *		  outer IP header. Used when the inner and outer IP versions
+ *		  are different. These flags only trigger a protocol change
+ *		  without clearing any tunnel-specific GSO flags.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ *		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ *		  when decapsulating a GRE tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ *		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ *		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ *		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ *		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ *		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ *		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ *		  or IPv4-in-IPv6).
+ *
+ *		When using the decapsulation flags above, the skb->encapsulation
+ *		flag is automatically cleared if all tunnel-specific GSO flags
+ *		(SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ *		SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ *		removed from the packet. This handles cases where all tunnel
+ *		layers have been decapsulated.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
+	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
+	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
 };
 
 enum {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ca35ed622ed5..f4c2fbd8fe68 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3010,8 +3010,34 @@ union bpf_attr {
  *
  *		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
  *		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- *		  Indicate the new IP header version after decapsulating the outer
- *		  IP header. Used when the inner and outer IP versions are different.
+ *		  Indicate the new IP header version after decapsulating the
+ *		  outer IP header. Used when the inner and outer IP versions
+ *		  are different. These flags only trigger a protocol change
+ *		  without clearing any tunnel-specific GSO flags.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ *		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ *		  when decapsulating a GRE tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ *		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ *		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ *		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ *		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ *		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ *		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ *		  or IPv4-in-IPv6).
+ *
+ *		When using the decapsulation flags above, the skb->encapsulation
+ *		flag is automatically cleared if all tunnel-specific GSO flags
+ *		(SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ *		SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ *		removed from the packet. This handles cases where all tunnel
+ *		layers have been decapsulated.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
@@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
 	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
 	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
+	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
+	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
+	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
 };
 
 enum {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails
  2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
                   ` (2 preceding siblings ...)
  2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
  2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
  2026-05-04 10:17 ` [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state Nick Hudson
  5 siblings, 0 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel

Add checks to require shrink-only decap, reject conflicting decap flag
combinations, and verify removed length is sufficient for claimed header
decapsulation.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 net/core/filter.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 02d3947cca32..185a11f425fa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -56,6 +56,7 @@
 #include <net/sock_reuseport.h>
 #include <net/busy_poll.h>
 #include <net/tcp.h>
+#include <net/gre.h>
 #include <net/xfrm.h>
 #include <net/udp.h>
 #include <linux/bpf_trace.h>
@@ -3484,6 +3485,12 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 #define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
 					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
 
+#define BPF_F_ADJ_ROOM_DECAP_L4_MASK	(BPF_F_ADJ_ROOM_DECAP_L4_UDP | \
+					 BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+
+#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK	(BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \
+					 BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+
 #define BPF_F_ADJ_ROOM_ENCAP_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
 					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
@@ -3491,7 +3498,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
 					 BPF_F_ADJ_ROOM_ENCAP_L2( \
 					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
 
-#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK | \
+					 BPF_F_ADJ_ROOM_DECAP_L4_MASK | \
+					 BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
 
 #define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
 					 BPF_F_ADJ_ROOM_ENCAP_MASK | \
@@ -3739,6 +3748,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 	}
 
 	if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
+		u32 len_decap_min = 0;
+
 		if (!shrink)
 			return -EINVAL;
 
@@ -3747,6 +3758,37 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
 		    BPF_F_ADJ_ROOM_DECAP_L3_MASK)
 			return -EINVAL;
 
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_MASK) ==
+		    BPF_F_ADJ_ROOM_DECAP_L4_MASK)
+			return -EINVAL;
+
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK) ==
+		    BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
+			return -EINVAL;
+
+		/* Reject mutually exclusive decap tunnel type flags. */
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_MASK) &&
+		    (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK))
+			return -EINVAL;
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_MASK)
+			len_decap_min += bpf_skb_net_base_len(skb);
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP)
+			len_decap_min += sizeof(struct udphdr);
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+			len_decap_min += sizeof(struct gre_base_hdr);
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4)
+			len_decap_min += sizeof(struct iphdr);
+
+		if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+			len_decap_min += sizeof(struct ipv6hdr);
+
+		if (len_diff_abs < len_decap_min)
+			return -EINVAL;
+
 		if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
 			len_min = sizeof(struct iphdr);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path
  2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
                   ` (3 preceding siblings ...)
  2026-05-04 10:17 ` [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
  2026-05-04 17:15   ` Willem de Bruijn
  2026-05-04 10:17 ` [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state Nick Hudson
  5 siblings, 1 reply; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel

On shrink in bpf_skb_adjust_room(), apply decapsulation state updates
according to BPF_F_ADJ_ROOM_DECAP_* flags.

For GSO skbs, clear only the tunnel gso_type bits that correspond to the
requested decap layer:
- DECAP_L4_UDP: SKB_GSO_UDP_TUNNEL{,_CSUM}
- DECAP_L4_GRE: SKB_GSO_GRE{,_CSUM}
- DECAP_IPXIP4: SKB_GSO_IPXIP4
- DECAP_IPXIP6: SKB_GSO_IPXIP6

Then clear skb->encapsulation only if no tunnel GSO bits remain, keeping
encapsulation set for cases such as ESP-in-UDP where tunnel state remains.

For non-GSO skbs, there are no tunnel GSO bits to consult, so clear
skb->encapsulation directly when DECAP_L4_* or DECAP_IPXIP_* flags are set.

This keeps decap state handling consistent between GSO and non-GSO packets.

Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
 net/core/filter.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 185a11f425fa..3213732dff84 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3666,9 +3666,48 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
 		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
 			skb_increase_gso_size(shinfo, len_diff);
 
+		/* Selective GSO flag clearing based on decap type.
+		 * Only clear the flags for the tunnel layer being removed.
+		 */
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) &&
+		    (shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
+					 SKB_GSO_UDP_TUNNEL_CSUM)))
+			shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL |
+					      SKB_GSO_UDP_TUNNEL_CSUM);
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) &&
+		    (shinfo->gso_type & (SKB_GSO_GRE | SKB_GSO_GRE_CSUM)))
+			shinfo->gso_type &= ~(SKB_GSO_GRE |
+					      SKB_GSO_GRE_CSUM);
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) &&
+		    (shinfo->gso_type & SKB_GSO_IPXIP4))
+			shinfo->gso_type &= ~SKB_GSO_IPXIP4;
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) &&
+		    (shinfo->gso_type & SKB_GSO_IPXIP6))
+			shinfo->gso_type &= ~SKB_GSO_IPXIP6;
+
+		/* Clear encapsulation flag only when no tunnel GSO flags remain */
+		if (flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+			     BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) {
+			if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
+						  SKB_GSO_UDP_TUNNEL_CSUM |
+						  SKB_GSO_GRE |
+						  SKB_GSO_GRE_CSUM |
+						  SKB_GSO_IPXIP4 |
+						  SKB_GSO_IPXIP6 |
+						  SKB_GSO_ESP)))
+				if (skb->encapsulation)
+					skb->encapsulation = 0;
+		}
+
 		/* Header must be checked, and gso_segs recomputed. */
 		shinfo->gso_type |= SKB_GSO_DODGY;
 		shinfo->gso_segs = 0;
+	} else {
+		/* For non-GSO packets, clear encapsulation if decap flags are set */
+		if ((flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+			      BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) &&
+		    skb->encapsulation)
+			skb->encapsulation = 0;
 	}
 
 	return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state
  2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
                   ` (4 preceding siblings ...)
  2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
  5 siblings, 0 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
  To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Kumar Kartikeya Dwivedi,
	Shuah Khan, linux-kselftest, linux-kernel

tc_tunnel only partially validated decap state and missed some tunnel
cases. In particular, IPXIP decap checks were not exercised for
IPIP/SIT paths, and non-GSO decap encapsulation state was not
verified.

Tighten the test by:

- setting DECAP_IPXIP4/6 flags for IPIP/SIT/IP6 decap paths based on
  the outer tunnel header family;
- requiring needed DECAP enum values via CO-RE enum existence checks
  so missing kernel support fails fast;
- validating post-decap tunnel state for both GSO and non-GSO packets:
  expected gso_type bits must be cleared and skb->encapsulation must
  match remaining tunnel flags;
- removing forced TSO disable in the test harness so GSO validation is
  exercised.

This improves coverage for decap tunnel-state regressions and ensures
sit_none/ipip-style paths are checked correctly.

Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
 .../selftests/bpf/prog_tests/test_tc_tunnel.c |  1 -
 .../selftests/bpf/progs/test_tc_tunnel.c      | 91 +++++++++++++++++--
 2 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
index 1aa7c9463980..67ba27d69347 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
@@ -438,7 +438,6 @@ static int setup(void)
 	SYS(fail_close_ns_client, "ip link add %s type veth peer name %s",
 	    "veth1 mtu 1500 netns " CLIENT_NS " address " MAC_ADDR_VETH1,
 	    "veth2 mtu 1500 netns " SERVER_NS " address " MAC_ADDR_VETH2);
-	SYS(fail_close_ns_client, "ethtool -K veth1 tso off");
 	SYS(fail_close_ns_client, "ip link set veth1 up");
 	nstoken_server = open_netns(SERVER_NS);
 	if (!ASSERT_OK_PTR(nstoken_server, "open server ns"))
diff --git a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c
index 7376df405a6b..853bca962910 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c
@@ -6,6 +6,7 @@
 
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
+#include <bpf/bpf_core_read.h>
 #include "bpf_tracing_net.h"
 #include "bpf_compiler.h"
 
@@ -37,6 +38,22 @@ struct vxlanhdr___local {
 
 #define	EXTPROTO_VXLAN	0x1
 
+#define SKB_GSO_UDP_TUNNEL_MASK	(SKB_GSO_UDP_TUNNEL |			\
+				 SKB_GSO_UDP_TUNNEL_CSUM)
+
+#define SKB_GSO_TUNNEL_MASK	(SKB_GSO_UDP_TUNNEL_MASK |		\
+				 SKB_GSO_GRE |				\
+				 SKB_GSO_GRE_CSUM |			\
+				 SKB_GSO_IPXIP4 |			\
+				 SKB_GSO_IPXIP6 |			\
+				 SKB_GSO_ESP)
+
+#define BPF_F_ADJ_ROOM_DECAP_L4_MASK	(BPF_F_ADJ_ROOM_DECAP_L4_UDP |	\
+					 BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+
+#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK	(BPF_F_ADJ_ROOM_DECAP_IPXIP4 |	\
+					 BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+
 #define	VXLAN_FLAGS     bpf_htonl(1<<27)
 #define	VNI_ID		1
 #define	VXLAN_VNI	bpf_htonl(VNI_ID << 8)
@@ -589,9 +606,12 @@ int __encap_ip6vxlan_eth(struct __sk_buff *skb)
 		return TC_ACT_OK;
 }
 
-static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
+static int decap_internal(struct __sk_buff *skb, int off, int len, char proto,
+			  __u64 ipxip_flag)
 {
 	__u64 flags = BPF_F_ADJ_ROOM_FIXED_GSO;
+	struct sk_buff *kskb;
+	struct skb_shared_info *shinfo;
 	struct ipv6_opt_hdr ip6_opt_hdr;
 	struct gre_hdr greh;
 	struct udphdr udph;
@@ -599,10 +619,12 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
 
 	switch (proto) {
 	case IPPROTO_IPIP:
-		flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4;
+		flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
+			 ipxip_flag;
 		break;
 	case IPPROTO_IPV6:
-		flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6;
+		flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6 |
+			 ipxip_flag;
 		break;
 	case NEXTHDR_DEST:
 		if (bpf_skb_load_bytes(skb, off + len, &ip6_opt_hdr,
@@ -610,10 +632,12 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
 			return TC_ACT_OK;
 		switch (ip6_opt_hdr.nexthdr) {
 		case IPPROTO_IPIP:
-			flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4;
+			flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
+				 ipxip_flag;
 			break;
 		case IPPROTO_IPV6:
-			flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6;
+			flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6 |
+				 ipxip_flag;
 			break;
 		default:
 			return TC_ACT_OK;
@@ -621,6 +645,11 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
 		break;
 	case IPPROTO_GRE:
 		olen += sizeof(struct gre_hdr);
+		if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+						BPF_F_ADJ_ROOM_DECAP_L4_GRE))
+			return TC_ACT_SHOT;
+		flags |= BPF_F_ADJ_ROOM_DECAP_L4_GRE;
+
 		if (bpf_skb_load_bytes(skb, off + len, &greh, sizeof(greh)) < 0)
 			return TC_ACT_OK;
 		switch (bpf_ntohs(greh.protocol)) {
@@ -634,6 +663,10 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
 		break;
 	case IPPROTO_UDP:
 		olen += sizeof(struct udphdr);
+		if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+						BPF_F_ADJ_ROOM_DECAP_L4_UDP))
+			return TC_ACT_SHOT;
+		flags |= BPF_F_ADJ_ROOM_DECAP_L4_UDP;
 		if (bpf_skb_load_bytes(skb, off + len, &udph, sizeof(udph)) < 0)
 			return TC_ACT_OK;
 		switch (bpf_ntohs(udph.dest)) {
@@ -655,6 +688,40 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
 	if (bpf_skb_adjust_room(skb, -olen, BPF_ADJ_ROOM_MAC, flags))
 		return TC_ACT_SHOT;
 
+	kskb = bpf_cast_to_kern_ctx(skb);
+	shinfo = bpf_core_cast(kskb->head + kskb->end, struct skb_shared_info);
+	if (shinfo->gso_size) {
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) &&
+		    (shinfo->gso_type & SKB_GSO_UDP_TUNNEL_MASK))
+			return TC_ACT_SHOT;
+
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) &&
+		    (shinfo->gso_type & (SKB_GSO_GRE | SKB_GSO_GRE_CSUM)))
+			return TC_ACT_SHOT;
+
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) &&
+		    (shinfo->gso_type & SKB_GSO_IPXIP4))
+			return TC_ACT_SHOT;
+
+		if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) &&
+		    (shinfo->gso_type & SKB_GSO_IPXIP6))
+			return TC_ACT_SHOT;
+
+		if (flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+			     BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) {
+			if ((shinfo->gso_type & SKB_GSO_TUNNEL_MASK) &&
+			    !kskb->encapsulation)
+				return TC_ACT_SHOT;
+			if (!(shinfo->gso_type & SKB_GSO_TUNNEL_MASK) &&
+			    kskb->encapsulation)
+				return TC_ACT_SHOT;
+		}
+	} else if ((flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+			     BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) &&
+		   kskb->encapsulation) {
+		return TC_ACT_SHOT;
+	}
+
 	return TC_ACT_OK;
 }
 
@@ -662,6 +729,10 @@ static int decap_ipv4(struct __sk_buff *skb)
 {
 	struct iphdr iph_outer;
 
+	if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+					BPF_F_ADJ_ROOM_DECAP_IPXIP4))
+		return TC_ACT_SHOT;
+
 	if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_outer,
 			       sizeof(iph_outer)) < 0)
 		return TC_ACT_OK;
@@ -670,19 +741,25 @@ static int decap_ipv4(struct __sk_buff *skb)
 		return TC_ACT_OK;
 
 	return decap_internal(skb, ETH_HLEN, sizeof(iph_outer),
-			      iph_outer.protocol);
+			      iph_outer.protocol,
+			      BPF_F_ADJ_ROOM_DECAP_IPXIP4);
 }
 
 static int decap_ipv6(struct __sk_buff *skb)
 {
 	struct ipv6hdr iph_outer;
 
+	if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+					BPF_F_ADJ_ROOM_DECAP_IPXIP6))
+		return TC_ACT_SHOT;
+
 	if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_outer,
 			       sizeof(iph_outer)) < 0)
 		return TC_ACT_OK;
 
 	return decap_internal(skb, ETH_HLEN, sizeof(iph_outer),
-			      iph_outer.nexthdr);
+			      iph_outer.nexthdr,
+			      BPF_F_ADJ_ROOM_DECAP_IPXIP6);
 }
 
 SEC("tc")
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation
  2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
@ 2026-05-04 11:03   ` bot+bpf-ci
  2026-05-04 17:14   ` Willem de Bruijn
  1 sibling, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-05-04 11:03 UTC (permalink / raw)
  To: nhudson, bpf, netdev, willemb, martin.lau
  Cc: nhudson, mtottenh, aglasgal, daniel, ast, andrii, eddyz87, memxor,
	davem, edumazet, kuba, pabeni, linux-kernel, ast, andrii, daniel,
	martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]

> diff --git a/net/core/filter.c b/net/core/filter.c
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3484,15 +3484,20 @@ #define BPF_F_ADJ_ROOM_ENCAP_L3_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | \
>  #define BPF_F_ADJ_ROOM_DECAP_L3_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
>  					 BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
>
> -#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
> -					 BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
> +#define BPF_F_ADJ_ROOM_ENCAP_MASK	(BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
>  					 BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
>  					 BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
>  					 BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
>  					 BPF_F_ADJ_ROOM_ENCAP_L2( \
> -					  BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
> -					 BPF_F_ADJ_ROOM_DECAP_L3_MASK)
> +					  BPF_ADJ_ROOM_ENCAP_L2_MASK))
> +
> +#define BPF_F_ADJ_ROOM_DECAP_MASK	(BPF_F_ADJ_ROOM_DECAP_L3_MASK)
>
> +#define BPF_F_ADJ_ROOM_MASK		(BPF_F_ADJ_ROOM_FIXED_GSO | \
> +					 BPF_F_ADJ_ROOM_ENCAP_MASK | \
> +					 BPF_F_ADJ_ROOM_DECAP_MASK | \
> +					 BPF_F_ADJ_ROOM_NO_CSUM_RESET)
> +
>  static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
>  			    u64 flags)
>  {

[ ... ]

> @@ -3614,8 +3619,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>  	bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
>  	int ret;
>
> -	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
> -			       BPF_F_ADJ_ROOM_DECAP_L3_MASK |
> +	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_DECAP_MASK |
> +			       BPF_F_ADJ_ROOM_FIXED_GSO |
>  			       BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
>  		return -EINVAL;
>
> @@ -3627,6 +3632,9 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>  	if (decap) {
>  		/* Match skb->protocol to new outer l3 protocol */
>  		if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)

Martin KaFai Lau raised a concern in v5 about whether non-GSO packets
properly handle decapsulation state clearing when bpf_skb_adjust_room()
is used to decapsulate them. He noted this has been asked multiple
times by AI review bots but has been ignored.

The concern is that BPF programs can call bpf_skb_adjust_room() to
decapsulate non-GSO packets, but if skb->encapsulation remains set after
tunnel headers are stripped, downstream network stack components could
misinterpret the packet as still being encapsulated.

Martin noted that bpf_skb_net_grow() sets encapsulation regardless of
skb_is_gso. Looking at the v6 patch series, the code that handles
decapsulation state still only operates inside the skb_is_gso() block.

Should the decapsulation state clearing logic handle non-GSO packets as
well?

Lore reference: https://lore.kernel.org/bpf/2026427201356.-eYM.martin.lau@linux.dev/
Reviewer: Martin KaFai Lau
Date: Mon, 27 Apr 2026 15:19:53 -0700

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25314425016

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation
  2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
@ 2026-05-04 11:03   ` bot+bpf-ci
  0 siblings, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-05-04 11:03 UTC (permalink / raw)
  To: nhudson, bpf, netdev, willemb, martin.lau
  Cc: nhudson, mtottenh, aglasgal, ast, daniel, andrii, eddyz87, memxor,
	linux-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 3617 bytes --]

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c021ed8d7b44..4a53e731c554 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3010,8 +3010,34 @@ union bpf_attr {
>   *
>   *		* **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
>   *		  **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
> - *		  Indicate the new IP header version after decapsulating the outer
> - *		  IP header. Used when the inner and outer IP versions are different.
> + *		  Indicate the new IP header version after decapsulating the
> + *		  outer IP header. Used when the inner and outer IP versions
> + *		  are different. These flags only trigger a protocol change
> + *		  without clearing any tunnel-specific GSO flags.
> + *
> + *		* **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
> + *		  Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
> + *		  when decapsulating a GRE tunnel.
> + *
> + *		* **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
> + *		  Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
> + *		  SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
> + *
> + *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
> + *		  Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
> + *		  a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
> + *
> + *		* **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
> + *		  Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
> + *		  decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
> + *		  or IPv4-in-IPv6).
> + *
> + *		When using the decapsulation flags above, the skb->encapsulation
> + *		flag is automatically cleared if all tunnel-specific GSO flags
> + *		(SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
> + *		SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
> + *		removed from the packet. This handles cases where all tunnel
> + *		layers have been decapsulated.

The documentation states that skb->encapsulation is automatically
cleared when all tunnel-specific GSO flags have been removed. Does the
implementation in bpf_skb_net_shrink() handle non-GSO packets correctly?

In v5 review, Martin KaFai Lau noted that the GSO flag clearing logic
is only executed inside the if (skb_is_gso(skb)) block in
bpf_skb_net_shrink(). This means non-GSO packets bypass decapsulation
state clearing. When a non-GSO packet undergoes decapsulation with
tunnel headers stripped but skb->encapsulation remains set, downstream
network stack components may misinterpret the packet as still being
encapsulated.

Note that bpf_skb_net_grow() sets encapsulation regardless of
skb_is_gso, which suggests non-GSO packets can have encapsulation set.

This concern has been raised multiple times by reviewers but has not
been addressed in the current version.

Reference: https://lore.kernel.org/bpf/2026427201356.-eYM.martin.lau@linux.dev/

>   *
>   * 		A call to this helper is susceptible to change the underlying
>   * 		packet buffer. Therefore, at load time, all checks on pointers
> @@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
>  	BPF_F_ADJ_ROOM_ENCAP_L2_ETH	= (1ULL << 6),
>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV4	= (1ULL << 7),
>  	BPF_F_ADJ_ROOM_DECAP_L3_IPV6	= (1ULL << 8),
> +	BPF_F_ADJ_ROOM_DECAP_L4_GRE	= (1ULL << 9),
> +	BPF_F_ADJ_ROOM_DECAP_L4_UDP	= (1ULL << 10),
> +	BPF_F_ADJ_ROOM_DECAP_IPXIP4	= (1ULL << 11),
> +	BPF_F_ADJ_ROOM_DECAP_IPXIP6	= (1ULL << 12),
>  };
>
>  enum {

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25314425016

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags
  2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
@ 2026-05-04 11:03   ` bot+bpf-ci
  0 siblings, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-05-04 11:03 UTC (permalink / raw)
  To: nhudson, bpf, netdev, willemb, martin.lau
  Cc: nhudson, mtottenh, aglasgal, ast, daniel, andrii, eddyz87, memxor,
	linux-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 552bc5d9afbd..c021ed8d7b44 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6211,7 +6211,7 @@ enum {
>  };
>
>  /* BPF_FUNC_skb_adjust_room flags. */
> -enum {
> +enum bpf_adj_room_flags {
>  	BPF_F_ADJ_ROOM_FIXED_GSO	= (1ULL << 0),
>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV4	= (1ULL << 1),
>  	BPF_F_ADJ_ROOM_ENCAP_L3_IPV6	= (1ULL << 2),

Martin KaFai Lau raised a concern in v2 about demonstrating the intended
CO-RE usage, specifically suggesting bpf_core_enum_value_exists():

https://lore.kernel.org/bpf/e5a2ad16-d2ac-4fa0-be41-162227257682@linux.dev/

He also requested that similar tests be added to test_tc_tunnel.c for the
new flags, as was done for earlier BPF_F_ADJ_ROOM_* flag additions.

While v3+ added selftests for decap GSO state validation, does the current
version include tests that explicitly demonstrate the CO-RE enum lookups for
bpf_adj_room_flags that motivated naming this enum?

> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 677be9a47347..ca35ed622ed5 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25314425016

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation
  2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
  2026-05-04 11:03   ` bot+bpf-ci
@ 2026-05-04 17:14   ` Willem de Bruijn
  1 sibling, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2026-05-04 17:14 UTC (permalink / raw)
  To: Nick Hudson, bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel

Nick Hudson wrote:
> Refactor the helper masks for bpf_skb_adjust_room() flags to simplify
> validation logic and introduce:
> 
> - BPF_F_ADJ_ROOM_ENCAP_MASK
> - BPF_F_ADJ_ROOM_DECAP_MASK
> 
> Refactor existing validation checks in bpf_skb_net_shrink()
> and bpf_skb_adjust_room() to use the new masks (no behavior change).
> 
> This is in preparation for supporting the new decap flags.
> 
> Co-developed-by: Max Tottenham <mtottenh@akamai.com>
> Signed-off-by: Max Tottenham <mtottenh@akamai.com>
> Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Nick Hudson <nhudson@akamai.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path
  2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
@ 2026-05-04 17:15   ` Willem de Bruijn
  0 siblings, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2026-05-04 17:15 UTC (permalink / raw)
  To: Nick Hudson, bpf, netdev, Willem de Bruijn, Martin KaFai Lau
  Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-kernel

Nick Hudson wrote:
> On shrink in bpf_skb_adjust_room(), apply decapsulation state updates
> according to BPF_F_ADJ_ROOM_DECAP_* flags.
> 
> For GSO skbs, clear only the tunnel gso_type bits that correspond to the
> requested decap layer:
> - DECAP_L4_UDP: SKB_GSO_UDP_TUNNEL{,_CSUM}
> - DECAP_L4_GRE: SKB_GSO_GRE{,_CSUM}
> - DECAP_IPXIP4: SKB_GSO_IPXIP4
> - DECAP_IPXIP6: SKB_GSO_IPXIP6
> 
> Then clear skb->encapsulation only if no tunnel GSO bits remain, keeping
> encapsulation set for cases such as ESP-in-UDP where tunnel state remains.
> 
> For non-GSO skbs, there are no tunnel GSO bits to consult, so clear
> skb->encapsulation directly when DECAP_L4_* or DECAP_IPXIP_* flags are set.
> 
> This keeps decap state handling consistent between GSO and non-GSO packets.
> 
> Co-developed-by: Max Tottenham <mtottenh@akamai.com>
> Signed-off-by: Max Tottenham <mtottenh@akamai.com>
> Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Nick Hudson <nhudson@akamai.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-05-04 17:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
2026-05-04 11:03   ` bot+bpf-ci
2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
2026-05-04 11:03   ` bot+bpf-ci
2026-05-04 17:14   ` Willem de Bruijn
2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
2026-05-04 11:03   ` bot+bpf-ci
2026-05-04 10:17 ` [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails Nick Hudson
2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
2026-05-04 17:15   ` Willem de Bruijn
2026-05-04 10:17 ` [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state Nick Hudson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox