* [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
2026-05-04 11:03 ` bot+bpf-ci
2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, linux-kernel
Name the existing anonymous enum for BPF_FUNC_skb_adjust_room flags
enum bpf_adj_room_flags, so that BPF programs can use CO-RE (Compile
Once - Run Everywhere) lookups on its values.
Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
include/uapi/linux/bpf.h | 2 +-
tools/include/uapi/linux/bpf.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 552bc5d9afbd..c021ed8d7b44 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6211,7 +6211,7 @@ enum {
};
/* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0),
BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1),
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 677be9a47347..ca35ed622ed5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6211,7 +6211,7 @@ enum {
};
/* BPF_FUNC_skb_adjust_room flags. */
-enum {
+enum bpf_adj_room_flags {
BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0),
BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1),
BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
--
2.34.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags
2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
@ 2026-05-04 11:03 ` bot+bpf-ci
0 siblings, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-05-04 11:03 UTC (permalink / raw)
To: nhudson, bpf, netdev, willemb, martin.lau
Cc: nhudson, mtottenh, aglasgal, ast, daniel, andrii, eddyz87, memxor,
linux-kernel, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 552bc5d9afbd..c021ed8d7b44 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6211,7 +6211,7 @@ enum {
> };
>
> /* BPF_FUNC_skb_adjust_room flags. */
> -enum {
> +enum bpf_adj_room_flags {
> BPF_F_ADJ_ROOM_FIXED_GSO = (1ULL << 0),
> BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 = (1ULL << 1),
> BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 = (1ULL << 2),
Martin KaFai Lau raised a concern in v2 about demonstrating the intended
CO-RE usage, specifically suggesting bpf_core_enum_value_exists():
https://lore.kernel.org/bpf/e5a2ad16-d2ac-4fa0-be41-162227257682@linux.dev/
He also requested that similar tests be added to test_tc_tunnel.c for the
new flags, as was done for earlier BPF_F_ADJ_ROOM_* flag additions.
While v3+ added selftests for decap GSO state validation, does the current
version include tests that explicitly demonstrate the CO-RE enum lookups for
bpf_adj_room_flags that motivated naming this enum?
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 677be9a47347..ca35ed622ed5 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
[ ... ]
---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25314425016
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
2026-05-04 11:03 ` bot+bpf-ci
2026-05-04 17:14 ` Willem de Bruijn
2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
` (3 subsequent siblings)
5 siblings, 2 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel
Refactor the helper masks for bpf_skb_adjust_room() flags to simplify
validation logic and introduce:
- BPF_F_ADJ_ROOM_ENCAP_MASK
- BPF_F_ADJ_ROOM_DECAP_MASK
Refactor existing validation checks in bpf_skb_net_shrink()
and bpf_skb_adjust_room() to use the new masks (no behavior change).
This is in preparation for supporting the new decap flags.
Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
net/core/filter.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 80a3b702a2d4..02d3947cca32 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3484,14 +3484,19 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
#define BPF_F_ADJ_ROOM_DECAP_L3_MASK (BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
-#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \
- BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
+#define BPF_F_ADJ_ROOM_ENCAP_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
BPF_F_ADJ_ROOM_ENCAP_L2( \
- BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
- BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+ BPF_ADJ_ROOM_ENCAP_L2_MASK))
+
+#define BPF_F_ADJ_ROOM_DECAP_MASK (BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+
+#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \
+ BPF_F_ADJ_ROOM_ENCAP_MASK | \
+ BPF_F_ADJ_ROOM_DECAP_MASK | \
+ BPF_F_ADJ_ROOM_NO_CSUM_RESET)
static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
u64 flags)
@@ -3614,8 +3619,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
int ret;
- if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
- BPF_F_ADJ_ROOM_DECAP_L3_MASK |
+ if (unlikely(flags & ~(BPF_F_ADJ_ROOM_DECAP_MASK |
+ BPF_F_ADJ_ROOM_FIXED_GSO |
BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
return -EINVAL;
@@ -3714,8 +3719,7 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
u32 off;
int ret;
- if (unlikely(flags & ~(BPF_F_ADJ_ROOM_MASK |
- BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
+ if (unlikely(flags & ~BPF_F_ADJ_ROOM_MASK))
return -EINVAL;
if (unlikely(len_diff_abs > 0xfffU))
return -EFAULT;
@@ -3734,20 +3738,20 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
return -ENOTSUPP;
}
- if (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
+ if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
if (!shrink)
return -EINVAL;
- switch (flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) {
- case BPF_F_ADJ_ROOM_DECAP_L3_IPV4:
+ /* Reject mutually exclusive decap flag pairs. */
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK) ==
+ BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+ return -EINVAL;
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
len_min = sizeof(struct iphdr);
- break;
- case BPF_F_ADJ_ROOM_DECAP_L3_IPV6:
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
len_min = sizeof(struct ipv6hdr);
- break;
- default:
- return -EINVAL;
- }
}
len_cur = skb->len - skb_network_offset(skb);
--
2.34.1
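The mask layering in this patch can be sketched in plain userspace C. This is a hedged illustration, not the kernel code: the ADJ_* macros and the check_old()/check_new() helpers are hypothetical names, the bit positions for the flags not visible in the diffs above (ENCAP_L4_GRE, ENCAP_L4_UDP, NO_CSUM_RESET) are assumed, and the L2 length field is left out for brevity. It models the top-level flag check plus the L3 decap block before and after the refactor so the two can be compared.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for BPF_F_ADJ_ROOM_*; bits 3-5 are assumed. */
#define ADJ_FIXED_GSO      (1ULL << 0)
#define ADJ_ENCAP_L3_IPV4  (1ULL << 1)
#define ADJ_ENCAP_L3_IPV6  (1ULL << 2)
#define ADJ_ENCAP_L4_GRE   (1ULL << 3)
#define ADJ_ENCAP_L4_UDP   (1ULL << 4)
#define ADJ_NO_CSUM_RESET  (1ULL << 5)
#define ADJ_ENCAP_L2_ETH   (1ULL << 6)
#define ADJ_DECAP_L3_IPV4  (1ULL << 7)
#define ADJ_DECAP_L3_IPV6  (1ULL << 8)

#define ADJ_ENCAP_L3_MASK  (ADJ_ENCAP_L3_IPV4 | ADJ_ENCAP_L3_IPV6)
#define ADJ_DECAP_L3_MASK  (ADJ_DECAP_L3_IPV4 | ADJ_DECAP_L3_IPV6)

/* Old layout: one flat mask, NO_CSUM_RESET OR-ed in at the call site. */
#define ADJ_MASK_OLD (ADJ_FIXED_GSO | ADJ_ENCAP_L3_MASK | ADJ_ENCAP_L4_GRE | \
		      ADJ_ENCAP_L4_UDP | ADJ_ENCAP_L2_ETH | ADJ_DECAP_L3_MASK)

/* New layout: per-direction masks folded into one top-level mask. */
#define ADJ_ENCAP_MASK (ADJ_ENCAP_L3_MASK | ADJ_ENCAP_L4_GRE | \
			ADJ_ENCAP_L4_UDP | ADJ_ENCAP_L2_ETH)
#define ADJ_DECAP_MASK (ADJ_DECAP_L3_MASK)
#define ADJ_MASK_NEW   (ADJ_FIXED_GSO | ADJ_ENCAP_MASK | ADJ_DECAP_MASK | \
			ADJ_NO_CSUM_RESET)

static_assert(ADJ_MASK_NEW == (ADJ_MASK_OLD | ADJ_NO_CSUM_RESET),
	      "both layouts accept the same bits");
static_assert((ADJ_ENCAP_MASK & ADJ_DECAP_MASK) == 0,
	      "encap and decap masks must not overlap");

/* Pre-refactor shape: switch on the L3 decap bits. Returns 0 or -EINVAL.
 * *len_min is only written when exactly one L3 decap flag is accepted
 * (20 = IPv4 header, 40 = IPv6 header); the caller supplies the default.
 */
static int check_old(uint64_t flags, bool shrink, uint32_t *len_min)
{
	if (flags & ~(ADJ_MASK_OLD | ADJ_NO_CSUM_RESET))
		return -EINVAL;
	if (flags & ADJ_DECAP_L3_MASK) {
		if (!shrink)
			return -EINVAL;
		switch (flags & ADJ_DECAP_L3_MASK) {
		case ADJ_DECAP_L3_IPV4:
			*len_min = 20;
			break;
		case ADJ_DECAP_L3_IPV6:
			*len_min = 40;
			break;
		default:
			return -EINVAL;
		}
	}
	return 0;
}

/* Post-refactor shape: reject the "both set" pair first, then test bits. */
static int check_new(uint64_t flags, bool shrink, uint32_t *len_min)
{
	if (flags & ~ADJ_MASK_NEW)
		return -EINVAL;
	if (flags & ADJ_DECAP_MASK) {
		if (!shrink)
			return -EINVAL;
		if ((flags & ADJ_DECAP_L3_MASK) == ADJ_DECAP_L3_MASK)
			return -EINVAL;
		if (flags & ADJ_DECAP_L3_IPV4)
			*len_min = 20;
		if (flags & ADJ_DECAP_L3_IPV6)
			*len_min = 40;
	}
	return 0;
}
```

Under these assumptions the two helpers agree for every flag combination, which matches the "no behavior change" claim in the commit message. The if-chain form leaves room for further mutually exclusive pairs to be added next to the L3 check.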
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation
2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
@ 2026-05-04 11:03 ` bot+bpf-ci
2026-05-04 17:14 ` Willem de Bruijn
1 sibling, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-05-04 11:03 UTC (permalink / raw)
To: nhudson, bpf, netdev, willemb, martin.lau
Cc: nhudson, mtottenh, aglasgal, daniel, ast, andrii, eddyz87, memxor,
davem, edumazet, kuba, pabeni, linux-kernel, ast, andrii, daniel,
martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]
> diff --git a/net/core/filter.c b/net/core/filter.c
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3484,15 +3484,20 @@ #define BPF_F_ADJ_ROOM_ENCAP_L3_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 | \
> #define BPF_F_ADJ_ROOM_DECAP_L3_MASK (BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
> BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
>
> -#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \
> - BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
> +#define BPF_F_ADJ_ROOM_ENCAP_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
> BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
> BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
> BPF_F_ADJ_ROOM_ENCAP_L2_ETH | \
> BPF_F_ADJ_ROOM_ENCAP_L2( \
> - BPF_ADJ_ROOM_ENCAP_L2_MASK) | \
> - BPF_F_ADJ_ROOM_DECAP_L3_MASK)
> + BPF_ADJ_ROOM_ENCAP_L2_MASK))
> +
> +#define BPF_F_ADJ_ROOM_DECAP_MASK (BPF_F_ADJ_ROOM_DECAP_L3_MASK)
>
> +#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \
> + BPF_F_ADJ_ROOM_ENCAP_MASK | \
> + BPF_F_ADJ_ROOM_DECAP_MASK | \
> + BPF_F_ADJ_ROOM_NO_CSUM_RESET)
> +
> static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
> u64 flags)
> {
[ ... ]
> @@ -3614,8 +3619,8 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> bool decap = flags & BPF_F_ADJ_ROOM_DECAP_L3_MASK;
> int ret;
>
> - if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO |
> - BPF_F_ADJ_ROOM_DECAP_L3_MASK |
> + if (unlikely(flags & ~(BPF_F_ADJ_ROOM_DECAP_MASK |
> + BPF_F_ADJ_ROOM_FIXED_GSO |
> BPF_F_ADJ_ROOM_NO_CSUM_RESET)))
> return -EINVAL;
>
> @@ -3627,6 +3632,9 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> if (decap) {
> /* Match skb->protocol to new outer l3 protocol */
> if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
Martin KaFai Lau raised a concern in v5 about whether non-GSO packets
properly handle decapsulation state clearing when bpf_skb_adjust_room()
is used to decapsulate them. He noted this has been asked multiple
times by AI review bots but has been ignored.
The concern is that BPF programs can call bpf_skb_adjust_room() to
decapsulate non-GSO packets, but if skb->encapsulation remains set after
tunnel headers are stripped, downstream network stack components could
misinterpret the packet as still being encapsulated.
Martin noted that bpf_skb_net_grow() sets encapsulation regardless of
skb_is_gso. Looking at the v6 patch series, the code that handles
decapsulation state still only operates inside the skb_is_gso() block.
Should the decapsulation state clearing logic handle non-GSO packets as
well?
Lore reference: https://lore.kernel.org/bpf/2026427201356.-eYM.martin.lau@linux.dev/
Reviewer: Martin KaFai Lau
Date: Mon, 27 Apr 2026 15:19:53 -0700
[ ... ]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation
2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
2026-05-04 11:03 ` bot+bpf-ci
@ 2026-05-04 17:14 ` Willem de Bruijn
1 sibling, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2026-05-04 17:14 UTC (permalink / raw)
To: Nick Hudson, bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel
Nick Hudson wrote:
> Refactor the helper masks for bpf_skb_adjust_room() flags to simplify
> validation logic and introduce:
>
> - BPF_F_ADJ_ROOM_ENCAP_MASK
> - BPF_F_ADJ_ROOM_DECAP_MASK
>
> Refactor existing validation checks in bpf_skb_net_shrink()
> and bpf_skb_adjust_room() to use the new masks (no behavior change).
>
> This is in preparation for supporting the new decap flags.
>
> Co-developed-by: Max Tottenham <mtottenh@akamai.com>
> Signed-off-by: Max Tottenham <mtottenh@akamai.com>
> Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
2026-05-04 10:17 ` [PATCH v6 1/6] bpf: name the enum for BPF_FUNC_skb_adjust_room flags Nick Hudson
2026-05-04 10:17 ` [PATCH v6 2/6] bpf: refactor masks for ADJ_ROOM flags and encap validation Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
2026-05-04 11:03 ` bot+bpf-ci
2026-05-04 10:17 ` [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails Nick Hudson
` (2 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, linux-kernel
Add new bpf_skb_adjust_room() decapsulation flags:
- BPF_F_ADJ_ROOM_DECAP_L4_GRE
- BPF_F_ADJ_ROOM_DECAP_L4_UDP
- BPF_F_ADJ_ROOM_DECAP_IPXIP4
- BPF_F_ADJ_ROOM_DECAP_IPXIP6
These flags let BPF programs describe which tunnel layer is being
removed, so later changes can update tunnel-related GSO state
accordingly during decapsulation.
This patch only introduces the UAPI flag definitions and helper
documentation.
Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
include/uapi/linux/bpf.h | 34 ++++++++++++++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c021ed8d7b44..4a53e731c554 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3010,8 +3010,34 @@ union bpf_attr {
*
* * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
* **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- * Indicate the new IP header version after decapsulating the outer
- * IP header. Used when the inner and outer IP versions are different.
+ * Indicate the new IP header version after decapsulating the
+ * outer IP header. Used when the inner and outer IP versions
+ * are different. These flags only trigger a protocol change
+ * without clearing any tunnel-specific GSO flags.
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ * when decapsulating a GRE tunnel.
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ * or IPv4-in-IPv6).
+ *
+ * When using the decapsulation flags above, the skb->encapsulation
+ * flag is automatically cleared if all tunnel-specific GSO flags
+ * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ * removed from the packet. This handles cases where all tunnel
+ * layers have been decapsulated.
*
* A call to this helper is susceptible to change the underlying
* packet buffer. Therefore, at load time, all checks on pointers
@@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6),
BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7),
BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8),
+ BPF_F_ADJ_ROOM_DECAP_L4_GRE = (1ULL << 9),
+ BPF_F_ADJ_ROOM_DECAP_L4_UDP = (1ULL << 10),
+ BPF_F_ADJ_ROOM_DECAP_IPXIP4 = (1ULL << 11),
+ BPF_F_ADJ_ROOM_DECAP_IPXIP6 = (1ULL << 12),
};
enum {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ca35ed622ed5..f4c2fbd8fe68 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3010,8 +3010,34 @@ union bpf_attr {
*
* * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
* **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
- * Indicate the new IP header version after decapsulating the outer
- * IP header. Used when the inner and outer IP versions are different.
+ * Indicate the new IP header version after decapsulating the
+ * outer IP header. Used when the inner and outer IP versions
+ * are different. These flags only trigger a protocol change
+ * without clearing any tunnel-specific GSO flags.
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
+ * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
+ * when decapsulating a GRE tunnel.
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
+ * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
+ * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
+ * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
+ * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
+ *
+ * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
+ * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
+ * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
+ * or IPv4-in-IPv6).
+ *
+ * When using the decapsulation flags above, the skb->encapsulation
+ * flag is automatically cleared if all tunnel-specific GSO flags
+ * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
+ * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
+ * removed from the packet. This handles cases where all tunnel
+ * layers have been decapsulated.
*
* A call to this helper is susceptible to change the underlying
* packet buffer. Therefore, at load time, all checks on pointers
@@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6),
BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7),
BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8),
+ BPF_F_ADJ_ROOM_DECAP_L4_GRE = (1ULL << 9),
+ BPF_F_ADJ_ROOM_DECAP_L4_UDP = (1ULL << 10),
+ BPF_F_ADJ_ROOM_DECAP_IPXIP4 = (1ULL << 11),
+ BPF_F_ADJ_ROOM_DECAP_IPXIP6 = (1ULL << 12),
};
enum {
--
2.34.1
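As a reading aid for the flag descriptions in this patch, here is a minimal sketch of how a caller might choose decap flags for a given tunnel. It is one plausible reading of the helper documentation, not code from the series: pick_decap_flags() and enum tunnel_kind are hypothetical, the DECAP_* macros only mirror the bit values in the enum above, and combining an L3 flag with a tunnel flag is an assumption drawn from the documentation text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror the bit positions of the BPF_F_ADJ_ROOM_DECAP_* enum values. */
#define DECAP_L3_IPV4  (1ULL << 7)
#define DECAP_L3_IPV6  (1ULL << 8)
#define DECAP_L4_GRE   (1ULL << 9)
#define DECAP_L4_UDP   (1ULL << 10)
#define DECAP_IPXIP4   (1ULL << 11)
#define DECAP_IPXIP6   (1ULL << 12)

static_assert(((DECAP_L4_GRE | DECAP_L4_UDP) &
	       (DECAP_IPXIP4 | DECAP_IPXIP6)) == 0,
	      "tunnel-type flag groups are disjoint");

/* Hypothetical description of the tunnel layer being removed. */
enum tunnel_kind { TUNNEL_IPXIP, TUNNEL_GRE, TUNNEL_UDP };

/* Build a flag word for removing one tunnel layer.
 * - The L3 flag names the IP version left after decap, and is only
 *   needed when the inner and outer versions differ.
 * - Exactly one tunnel-type flag describes the layer being removed.
 */
static uint64_t pick_decap_flags(enum tunnel_kind kind, bool outer_is_v6,
				 bool inner_is_v6)
{
	uint64_t flags = 0;

	if (outer_is_v6 != inner_is_v6)
		flags |= inner_is_v6 ? DECAP_L3_IPV6 : DECAP_L3_IPV4;

	switch (kind) {
	case TUNNEL_GRE:
		flags |= DECAP_L4_GRE;
		break;
	case TUNNEL_UDP:
		flags |= DECAP_L4_UDP;
		break;
	case TUNNEL_IPXIP:
		flags |= outer_is_v6 ? DECAP_IPXIP6 : DECAP_IPXIP4;
		break;
	}
	return flags;
}
```

For example, an IPv6-in-IPv4 (SIT-style) packet would get DECAP_L3_IPV6 | DECAP_IPXIP4 from this helper, while plain IPv4-in-IPv4 would get DECAP_IPXIP4 alone.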
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation
2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
@ 2026-05-04 11:03 ` bot+bpf-ci
0 siblings, 0 replies; 12+ messages in thread
From: bot+bpf-ci @ 2026-05-04 11:03 UTC (permalink / raw)
To: nhudson, bpf, netdev, willemb, martin.lau
Cc: nhudson, mtottenh, aglasgal, ast, daniel, andrii, eddyz87, memxor,
linux-kernel, ast, andrii, daniel, martin.lau, eddyz87,
yonghong.song, clm, ihor.solodrai
[-- Attachment #1: Type: text/plain, Size: 3617 bytes --]
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c021ed8d7b44..4a53e731c554 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3010,8 +3010,34 @@ union bpf_attr {
> *
> * * **BPF_F_ADJ_ROOM_DECAP_L3_IPV4**,
> * **BPF_F_ADJ_ROOM_DECAP_L3_IPV6**:
> - * Indicate the new IP header version after decapsulating the outer
> - * IP header. Used when the inner and outer IP versions are different.
> + * Indicate the new IP header version after decapsulating the
> + * outer IP header. Used when the inner and outer IP versions
> + * are different. These flags only trigger a protocol change
> + * without clearing any tunnel-specific GSO flags.
> + *
> + * * **BPF_F_ADJ_ROOM_DECAP_L4_GRE**:
> + * Clear GRE tunnel GSO flags (SKB_GSO_GRE and SKB_GSO_GRE_CSUM)
> + * when decapsulating a GRE tunnel.
> + *
> + * * **BPF_F_ADJ_ROOM_DECAP_L4_UDP**:
> + * Clear UDP tunnel GSO flags (SKB_GSO_UDP_TUNNEL and
> + * SKB_GSO_UDP_TUNNEL_CSUM) when decapsulating a UDP tunnel.
> + *
> + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP4**:
> + * Clear IPIP/SIT tunnel GSO flag (SKB_GSO_IPXIP4) when decapsulating
> + * a tunnel with an outer IPv4 header (IPv4-in-IPv4 or IPv6-in-IPv4).
> + *
> + * * **BPF_F_ADJ_ROOM_DECAP_IPXIP6**:
> + * Clear IPv6 encapsulation tunnel GSO flag (SKB_GSO_IPXIP6) when
> + * decapsulating a tunnel with an outer IPv6 header (IPv6-in-IPv6
> + * or IPv4-in-IPv6).
> + *
> + * When using the decapsulation flags above, the skb->encapsulation
> + * flag is automatically cleared if all tunnel-specific GSO flags
> + * (SKB_GSO_UDP_TUNNEL, SKB_GSO_UDP_TUNNEL_CSUM, SKB_GSO_GRE,
> + * SKB_GSO_GRE_CSUM, SKB_GSO_IPXIP4, SKB_GSO_IPXIP6) have been
> + * removed from the packet. This handles cases where all tunnel
> + * layers have been decapsulated.
The documentation states that skb->encapsulation is automatically
cleared when all tunnel-specific GSO flags have been removed. Does the
implementation in bpf_skb_net_shrink() handle non-GSO packets correctly?
In v5 review, Martin KaFai Lau noted that the GSO flag clearing logic
is only executed inside the if (skb_is_gso(skb)) block in
bpf_skb_net_shrink(). This means non-GSO packets bypass decapsulation
state clearing. When a non-GSO packet undergoes decapsulation with
tunnel headers stripped but skb->encapsulation remains set, downstream
network stack components may misinterpret the packet as still being
encapsulated.
Note that bpf_skb_net_grow() sets encapsulation regardless of
skb_is_gso, which suggests non-GSO packets can have encapsulation set.
This concern has been raised multiple times by reviewers but has not
been addressed in the current version.
Reference: https://lore.kernel.org/bpf/2026427201356.-eYM.martin.lau@linux.dev/
> *
> * A call to this helper is susceptible to change the underlying
> * packet buffer. Therefore, at load time, all checks on pointers
> @@ -6221,6 +6247,10 @@ enum bpf_adj_room_flags {
> BPF_F_ADJ_ROOM_ENCAP_L2_ETH = (1ULL << 6),
> BPF_F_ADJ_ROOM_DECAP_L3_IPV4 = (1ULL << 7),
> BPF_F_ADJ_ROOM_DECAP_L3_IPV6 = (1ULL << 8),
> + BPF_F_ADJ_ROOM_DECAP_L4_GRE = (1ULL << 9),
> + BPF_F_ADJ_ROOM_DECAP_L4_UDP = (1ULL << 10),
> + BPF_F_ADJ_ROOM_DECAP_IPXIP4 = (1ULL << 11),
> + BPF_F_ADJ_ROOM_DECAP_IPXIP6 = (1ULL << 12),
> };
>
> enum {
[ ... ]
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
` (2 preceding siblings ...)
2026-05-04 10:17 ` [PATCH v6 3/6] bpf: add BPF_F_ADJ_ROOM_DECAP_* flags for tunnel decapsulation Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
2026-05-04 10:17 ` [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state Nick Hudson
5 siblings, 0 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel
Add checks that allow decap flags only when shrinking, reject conflicting
decap flag combinations, and verify that the removed length is large enough
to cover the headers the decap flags claim to remove.
Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
net/core/filter.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 02d3947cca32..185a11f425fa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -56,6 +56,7 @@
#include <net/sock_reuseport.h>
#include <net/busy_poll.h>
#include <net/tcp.h>
+#include <net/gre.h>
#include <net/xfrm.h>
#include <net/udp.h>
#include <linux/bpf_trace.h>
@@ -3484,6 +3485,12 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
#define BPF_F_ADJ_ROOM_DECAP_L3_MASK (BPF_F_ADJ_ROOM_DECAP_L3_IPV4 | \
BPF_F_ADJ_ROOM_DECAP_L3_IPV6)
+#define BPF_F_ADJ_ROOM_DECAP_L4_MASK (BPF_F_ADJ_ROOM_DECAP_L4_UDP | \
+ BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+
+#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK (BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \
+ BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+
#define BPF_F_ADJ_ROOM_ENCAP_MASK (BPF_F_ADJ_ROOM_ENCAP_L3_MASK | \
BPF_F_ADJ_ROOM_ENCAP_L4_GRE | \
BPF_F_ADJ_ROOM_ENCAP_L4_UDP | \
@@ -3491,7 +3498,9 @@ static u32 bpf_skb_net_base_len(const struct sk_buff *skb)
BPF_F_ADJ_ROOM_ENCAP_L2( \
BPF_ADJ_ROOM_ENCAP_L2_MASK))
-#define BPF_F_ADJ_ROOM_DECAP_MASK (BPF_F_ADJ_ROOM_DECAP_L3_MASK)
+#define BPF_F_ADJ_ROOM_DECAP_MASK (BPF_F_ADJ_ROOM_DECAP_L3_MASK | \
+ BPF_F_ADJ_ROOM_DECAP_L4_MASK | \
+ BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
#define BPF_F_ADJ_ROOM_MASK (BPF_F_ADJ_ROOM_FIXED_GSO | \
BPF_F_ADJ_ROOM_ENCAP_MASK | \
@@ -3739,6 +3748,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
}
if (flags & BPF_F_ADJ_ROOM_DECAP_MASK) {
+ u32 len_decap_min = 0;
+
if (!shrink)
return -EINVAL;
@@ -3747,6 +3758,37 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
BPF_F_ADJ_ROOM_DECAP_L3_MASK)
return -EINVAL;
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_MASK) ==
+ BPF_F_ADJ_ROOM_DECAP_L4_MASK)
+ return -EINVAL;
+
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK) ==
+ BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)
+ return -EINVAL;
+
+ /* Reject mutually exclusive decap tunnel type flags. */
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_MASK) &&
+ (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK))
+ return -EINVAL;
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_L4_MASK)
+ len_decap_min += bpf_skb_net_base_len(skb);
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP)
+ len_decap_min += sizeof(struct udphdr);
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+ len_decap_min += sizeof(struct gre_base_hdr);
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4)
+ len_decap_min += sizeof(struct iphdr);
+
+ if (flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+ len_decap_min += sizeof(struct ipv6hdr);
+
+ if (len_diff_abs < len_decap_min)
+ return -EINVAL;
+
if (flags & BPF_F_ADJ_ROOM_DECAP_L3_IPV4)
len_min = sizeof(struct iphdr);
--
2.34.1
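The guard rails added here can be modelled in userspace C to see which lengths pass. This is a sketch under stated assumptions rather than the kernel implementation: check_decap() and the DECAP_* and *_HDR_LEN names are hypothetical, the outer header size that bpf_skb_net_base_len() would return is passed in as a boolean, and header sizes are hard-coded (20 for IPv4, 40 for IPv6, 8 for UDP, 4 for the base GRE header).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror the bit positions of the BPF_F_ADJ_ROOM_DECAP_* enum values. */
#define DECAP_L3_IPV4  (1ULL << 7)
#define DECAP_L3_IPV6  (1ULL << 8)
#define DECAP_L4_GRE   (1ULL << 9)
#define DECAP_L4_UDP   (1ULL << 10)
#define DECAP_IPXIP4   (1ULL << 11)
#define DECAP_IPXIP6   (1ULL << 12)

#define DECAP_L3_MASK     (DECAP_L3_IPV4 | DECAP_L3_IPV6)
#define DECAP_L4_MASK     (DECAP_L4_UDP | DECAP_L4_GRE)
#define DECAP_IPXIP_MASK  (DECAP_IPXIP4 | DECAP_IPXIP6)
#define DECAP_MASK        (DECAP_L3_MASK | DECAP_L4_MASK | DECAP_IPXIP_MASK)

static_assert((DECAP_L4_MASK & DECAP_IPXIP_MASK) == 0,
	      "tunnel-type flag groups are disjoint");

/* Fixed header sizes used by the length check. */
enum {
	IPV4_HDR_LEN = 20,
	IPV6_HDR_LEN = 40,
	UDP_HDR_LEN = 8,
	GRE_BASE_HDR_LEN = 4,
};

/* Returns 0 if the decap request is acceptable, -EINVAL otherwise.
 * outer_is_v6 stands in for the skb's current network protocol.
 */
static int check_decap(uint64_t flags, bool shrink, bool outer_is_v6,
		       uint32_t len_diff_abs)
{
	uint32_t len_decap_min = 0;

	if (!(flags & DECAP_MASK))
		return 0;		/* nothing to validate */
	if (!shrink)
		return -EINVAL;		/* decap only makes sense on shrink */

	/* Within each group the two flags are mutually exclusive. */
	if ((flags & DECAP_L3_MASK) == DECAP_L3_MASK ||
	    (flags & DECAP_L4_MASK) == DECAP_L4_MASK ||
	    (flags & DECAP_IPXIP_MASK) == DECAP_IPXIP_MASK)
		return -EINVAL;

	/* An L4 tunnel and an IP-in-IP tunnel cannot both be claimed. */
	if ((flags & DECAP_L4_MASK) && (flags & DECAP_IPXIP_MASK))
		return -EINVAL;

	/* L4 tunnels remove the outer IP header plus the tunnel header. */
	if (flags & DECAP_L4_MASK)
		len_decap_min += outer_is_v6 ? IPV6_HDR_LEN : IPV4_HDR_LEN;
	if (flags & DECAP_L4_UDP)
		len_decap_min += UDP_HDR_LEN;
	if (flags & DECAP_L4_GRE)
		len_decap_min += GRE_BASE_HDR_LEN;
	if (flags & DECAP_IPXIP4)
		len_decap_min += IPV4_HDR_LEN;
	if (flags & DECAP_IPXIP6)
		len_decap_min += IPV6_HDR_LEN;

	return len_diff_abs < len_decap_min ? -EINVAL : 0;
}
```

With these sizes, a UDP tunnel over IPv4 needs at least 28 bytes removed and a GRE tunnel over IPv6 at least 44. The check is a lower bound only; tunnels with options or an inner L2 header remove more.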
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
` (3 preceding siblings ...)
2026-05-04 10:17 ` [PATCH v6 4/6] bpf: allow new DECAP flags and add guard rails Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
2026-05-04 17:15 ` Willem de Bruijn
2026-05-04 10:17 ` [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state Nick Hudson
5 siblings, 1 reply; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel
On shrink in bpf_skb_adjust_room(), apply decapsulation state updates
according to BPF_F_ADJ_ROOM_DECAP_* flags.
For GSO skbs, clear only the tunnel gso_type bits that correspond to the
requested decap layer:
- DECAP_L4_UDP: SKB_GSO_UDP_TUNNEL{,_CSUM}
- DECAP_L4_GRE: SKB_GSO_GRE{,_CSUM}
- DECAP_IPXIP4: SKB_GSO_IPXIP4
- DECAP_IPXIP6: SKB_GSO_IPXIP6
Then clear skb->encapsulation only if no tunnel GSO bits remain, keeping
encapsulation set for cases such as ESP-in-UDP where tunnel state remains.
For non-GSO skbs, there are no tunnel GSO bits to consult, so clear
skb->encapsulation directly when DECAP_L4_* or DECAP_IPXIP_* flags are set.
This keeps decap state handling consistent between GSO and non-GSO packets.
Co-developed-by: Max Tottenham <mtottenh@akamai.com>
Signed-off-by: Max Tottenham <mtottenh@akamai.com>
Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
net/core/filter.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/net/core/filter.c b/net/core/filter.c
index 185a11f425fa..3213732dff84 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3666,9 +3666,48 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
skb_increase_gso_size(shinfo, len_diff);
+ /* Selective GSO flag clearing based on decap type.
+ * Only clear the flags for the tunnel layer being removed.
+ */
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) &&
+ (shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
+ SKB_GSO_UDP_TUNNEL_CSUM)))
+ shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL |
+ SKB_GSO_UDP_TUNNEL_CSUM);
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) &&
+ (shinfo->gso_type & (SKB_GSO_GRE | SKB_GSO_GRE_CSUM)))
+ shinfo->gso_type &= ~(SKB_GSO_GRE |
+ SKB_GSO_GRE_CSUM);
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) &&
+ (shinfo->gso_type & SKB_GSO_IPXIP4))
+ shinfo->gso_type &= ~SKB_GSO_IPXIP4;
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) &&
+ (shinfo->gso_type & SKB_GSO_IPXIP6))
+ shinfo->gso_type &= ~SKB_GSO_IPXIP6;
+
+ /* Clear encapsulation flag only when no tunnel GSO flags remain */
+ if (flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+ BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) {
+ if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
+ SKB_GSO_UDP_TUNNEL_CSUM |
+ SKB_GSO_GRE |
+ SKB_GSO_GRE_CSUM |
+ SKB_GSO_IPXIP4 |
+ SKB_GSO_IPXIP6 |
+ SKB_GSO_ESP)))
+ if (skb->encapsulation)
+ skb->encapsulation = 0;
+ }
+
/* Header must be checked, and gso_segs recomputed. */
shinfo->gso_type |= SKB_GSO_DODGY;
shinfo->gso_segs = 0;
+ } else {
+ /* For non-GSO packets, clear encapsulation if decap flags are set */
+ if ((flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+ BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) &&
+ skb->encapsulation)
+ skb->encapsulation = 0;
}
return 0;
--
2.34.1
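The state handling described in this patch can be followed with a small userspace model. It is a sketch, not the kernel code: struct fake_skb and apply_decap_state() are hypothetical, the GSO_* bit positions are placeholders rather than the real SKB_GSO_* values, and the gso_size and gso_segs updates are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirror the bit positions of the BPF_F_ADJ_ROOM_DECAP_* enum values. */
#define DECAP_L3_IPV4  (1ULL << 7)
#define DECAP_L4_GRE   (1ULL << 9)
#define DECAP_L4_UDP   (1ULL << 10)
#define DECAP_IPXIP4   (1ULL << 11)
#define DECAP_IPXIP6   (1ULL << 12)
#define DECAP_TUNNEL_MASK (DECAP_L4_GRE | DECAP_L4_UDP | \
			   DECAP_IPXIP4 | DECAP_IPXIP6)

/* Placeholder gso_type bits; NOT the kernel's SKB_GSO_* values. */
#define GSO_DODGY            (1u << 0)
#define GSO_GRE              (1u << 1)
#define GSO_GRE_CSUM         (1u << 2)
#define GSO_IPXIP4           (1u << 3)
#define GSO_IPXIP6           (1u << 4)
#define GSO_UDP_TUNNEL       (1u << 5)
#define GSO_UDP_TUNNEL_CSUM  (1u << 6)
#define GSO_ESP              (1u << 7)

/* Bits that keep skb->encapsulation set while any of them remains. */
#define GSO_TUNNEL_STATE (GSO_GRE | GSO_GRE_CSUM | GSO_IPXIP4 | GSO_IPXIP6 | \
			  GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM | GSO_ESP)

static_assert((GSO_DODGY & GSO_TUNNEL_STATE) == 0,
	      "DODGY is not a tunnel state bit");

/* Minimal stand-in for the skb fields the shrink path touches. */
struct fake_skb {
	bool is_gso;
	bool encapsulation;
	uint32_t gso_type;
};

static void apply_decap_state(struct fake_skb *skb, uint64_t flags)
{
	bool tunnel_decap = (flags & DECAP_TUNNEL_MASK) != 0;

	if (!skb->is_gso) {
		/* No gso_type to consult: clear encapsulation directly. */
		if (tunnel_decap)
			skb->encapsulation = false;
		return;
	}

	/* Clear only the bits for the tunnel layer being removed. */
	if (flags & DECAP_L4_UDP)
		skb->gso_type &= ~(GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM);
	if (flags & DECAP_L4_GRE)
		skb->gso_type &= ~(GSO_GRE | GSO_GRE_CSUM);
	if (flags & DECAP_IPXIP4)
		skb->gso_type &= ~GSO_IPXIP4;
	if (flags & DECAP_IPXIP6)
		skb->gso_type &= ~GSO_IPXIP6;

	/* Drop encapsulation only once no tunnel state is left. */
	if (tunnel_decap && !(skb->gso_type & GSO_TUNNEL_STATE))
		skb->encapsulation = false;

	/* Header must be rechecked downstream. */
	skb->gso_type |= GSO_DODGY;
}
```

In this model an ESP-in-UDP GSO packet keeps encapsulation set after DECAP_L4_UDP because the ESP bit remains, while a plain GRE packet has it cleared. A call with only DECAP_L3_* flags leaves encapsulation untouched in both branches.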
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path
2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
@ 2026-05-04 17:15 ` Willem de Bruijn
0 siblings, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2026-05-04 17:15 UTC (permalink / raw)
To: Nick Hudson, bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Max Tottenham, Anna Glasgall, Daniel Borkmann,
Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman,
Kumar Kartikeya Dwivedi, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, linux-kernel
Nick Hudson wrote:
> On shrink in bpf_skb_adjust_room(), apply decapsulation state updates
> according to BPF_F_ADJ_ROOM_DECAP_* flags.
>
> For GSO skbs, clear only the tunnel gso_type bits that correspond to the
> requested decap layer:
> - DECAP_L4_UDP: SKB_GSO_UDP_TUNNEL{,_CSUM}
> - DECAP_L4_GRE: SKB_GSO_GRE{,_CSUM}
> - DECAP_IPXIP4: SKB_GSO_IPXIP4
> - DECAP_IPXIP6: SKB_GSO_IPXIP6
>
> Then clear skb->encapsulation only if no tunnel GSO bits remain, keeping
> encapsulation set for cases such as ESP-in-UDP where tunnel state remains.
>
> For non-GSO skbs, there are no tunnel GSO bits to consult, so clear
> skb->encapsulation directly when DECAP_L4_* or DECAP_IPXIP_* flags are set.
>
> This keeps decap state handling consistent between GSO and non-GSO packets.
>
> Co-developed-by: Max Tottenham <mtottenh@akamai.com>
> Signed-off-by: Max Tottenham <mtottenh@akamai.com>
> Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
> Signed-off-by: Nick Hudson <nhudson@akamai.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
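
For readers following the thread, the state update described in the quoted commit message can be modelled in user space. This is only a sketch under stated assumptions, not the kernel code: the M_* bit values, struct model_skb, and model_shrink_decap() are hypothetical names invented for illustration, and the real constants are the SKB_GSO_* and BPF_F_ADJ_ROOM_DECAP_* definitions in the kernel headers. The SKB_GSO_DODGY marking and gso_segs reset are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, for illustration only (not the kernel's values). */
#define M_GSO_GRE             (1u << 0)
#define M_GSO_GRE_CSUM        (1u << 1)
#define M_GSO_IPXIP4          (1u << 2)
#define M_GSO_IPXIP6          (1u << 3)
#define M_GSO_UDP_TUNNEL      (1u << 4)
#define M_GSO_UDP_TUNNEL_CSUM (1u << 5)
#define M_GSO_ESP             (1u << 6)

#define M_DECAP_L4_UDP (1u << 0)
#define M_DECAP_L4_GRE (1u << 1)
#define M_DECAP_IPXIP4 (1u << 2)
#define M_DECAP_IPXIP6 (1u << 3)

#define M_GSO_TUNNEL_MASK (M_GSO_GRE | M_GSO_GRE_CSUM | M_GSO_IPXIP4 | \
			   M_GSO_IPXIP6 | M_GSO_UDP_TUNNEL | \
			   M_GSO_UDP_TUNNEL_CSUM | M_GSO_ESP)
#define M_DECAP_ANY (M_DECAP_L4_UDP | M_DECAP_L4_GRE | \
		     M_DECAP_IPXIP4 | M_DECAP_IPXIP6)

/* Hypothetical stand-in for the few skb fields the patch touches. */
struct model_skb {
	bool is_gso;
	uint32_t gso_type;
	bool encapsulation;
};

/* Model of the decap state update on the shrink path. */
void model_shrink_decap(struct model_skb *skb, uint32_t flags)
{
	if (!(flags & M_DECAP_ANY))
		return;		/* no decap requested: tunnel state untouched */

	if (!skb->is_gso) {
		/* Non-GSO: no tunnel GSO bits to consult, clear directly. */
		skb->encapsulation = false;
		return;
	}

	/* GSO: clear only the bits of the tunnel layer being removed. */
	if (flags & M_DECAP_L4_UDP)
		skb->gso_type &= ~(M_GSO_UDP_TUNNEL | M_GSO_UDP_TUNNEL_CSUM);
	if (flags & M_DECAP_L4_GRE)
		skb->gso_type &= ~(M_GSO_GRE | M_GSO_GRE_CSUM);
	if (flags & M_DECAP_IPXIP4)
		skb->gso_type &= ~M_GSO_IPXIP4;
	if (flags & M_DECAP_IPXIP6)
		skb->gso_type &= ~M_GSO_IPXIP6;

	/* Drop encapsulation only when no tunnel GSO bit remains. */
	if (!(skb->gso_type & M_GSO_TUNNEL_MASK))
		skb->encapsulation = false;
}
```

In this model, a GSO skb carrying M_GSO_UDP_TUNNEL | M_GSO_ESP that is decapped with M_DECAP_L4_UDP keeps M_GSO_ESP and therefore keeps encapsulation set, which corresponds to the ESP-in-UDP case mentioned in the commit message.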
* [PATCH v6 6/6] selftests/bpf: tc_tunnel - validate decap GSO and encapsulation state
2026-05-04 10:17 [PATCH bpf-next v6 0/6] bpf: decap flags and GSO state updates Nick Hudson
` (4 preceding siblings ...)
2026-05-04 10:17 ` [PATCH v6 5/6] bpf: clear decap state on skb_adjust_room shrink path Nick Hudson
@ 2026-05-04 10:17 ` Nick Hudson
5 siblings, 0 replies; 12+ messages in thread
From: Nick Hudson @ 2026-05-04 10:17 UTC (permalink / raw)
To: bpf, netdev, Willem de Bruijn, Martin KaFai Lau
Cc: Nick Hudson, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Kumar Kartikeya Dwivedi,
Shuah Khan, linux-kselftest, linux-kernel
tc_tunnel only partially validated decap state and missed some tunnel
cases. In particular, IPXIP decap checks were not exercised for
IPIP/SIT paths, and non-GSO decap encapsulation state was not
verified.
Tighten the test by:
- setting DECAP_IPXIP4/6 flags for IPIP/SIT/IP6 decap paths based on
the outer tunnel header family;
- requiring needed DECAP enum values via CO-RE enum existence checks
so missing kernel support fails fast;
- validating post-decap tunnel state for both GSO and non-GSO packets:
expected gso_type bits must be cleared and skb->encapsulation must
match remaining tunnel flags;
- removing the forced "tso off" on veth1 in the test harness so that the
GSO validation path is actually exercised.
This improves coverage for decap tunnel-state regressions and ensures
sit_none/ipip-style paths are checked correctly.
Signed-off-by: Nick Hudson <nhudson@akamai.com>
---
.../selftests/bpf/prog_tests/test_tc_tunnel.c | 1 -
.../selftests/bpf/progs/test_tc_tunnel.c | 91 +++++++++++++++++--
2 files changed, 84 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
index 1aa7c9463980..67ba27d69347 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tc_tunnel.c
@@ -438,7 +438,6 @@ static int setup(void)
SYS(fail_close_ns_client, "ip link add %s type veth peer name %s",
"veth1 mtu 1500 netns " CLIENT_NS " address " MAC_ADDR_VETH1,
"veth2 mtu 1500 netns " SERVER_NS " address " MAC_ADDR_VETH2);
- SYS(fail_close_ns_client, "ethtool -K veth1 tso off");
SYS(fail_close_ns_client, "ip link set veth1 up");
nstoken_server = open_netns(SERVER_NS);
if (!ASSERT_OK_PTR(nstoken_server, "open server ns"))
diff --git a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c
index 7376df405a6b..853bca962910 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_tunnel.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_tunnel.c
@@ -6,6 +6,7 @@
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>
+#include <bpf/bpf_core_read.h>
#include "bpf_tracing_net.h"
#include "bpf_compiler.h"
@@ -37,6 +38,22 @@ struct vxlanhdr___local {
#define EXTPROTO_VXLAN 0x1
+#define SKB_GSO_UDP_TUNNEL_MASK (SKB_GSO_UDP_TUNNEL | \
+ SKB_GSO_UDP_TUNNEL_CSUM)
+
+#define SKB_GSO_TUNNEL_MASK (SKB_GSO_UDP_TUNNEL_MASK | \
+ SKB_GSO_GRE | \
+ SKB_GSO_GRE_CSUM | \
+ SKB_GSO_IPXIP4 | \
+ SKB_GSO_IPXIP6 | \
+ SKB_GSO_ESP)
+
+#define BPF_F_ADJ_ROOM_DECAP_L4_MASK (BPF_F_ADJ_ROOM_DECAP_L4_UDP | \
+ BPF_F_ADJ_ROOM_DECAP_L4_GRE)
+
+#define BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK (BPF_F_ADJ_ROOM_DECAP_IPXIP4 | \
+ BPF_F_ADJ_ROOM_DECAP_IPXIP6)
+
#define VXLAN_FLAGS bpf_htonl(1<<27)
#define VNI_ID 1
#define VXLAN_VNI bpf_htonl(VNI_ID << 8)
@@ -589,9 +606,12 @@ int __encap_ip6vxlan_eth(struct __sk_buff *skb)
return TC_ACT_OK;
}
-static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
+static int decap_internal(struct __sk_buff *skb, int off, int len, char proto,
+ __u64 ipxip_flag)
{
__u64 flags = BPF_F_ADJ_ROOM_FIXED_GSO;
+ struct sk_buff *kskb;
+ struct skb_shared_info *shinfo;
struct ipv6_opt_hdr ip6_opt_hdr;
struct gre_hdr greh;
struct udphdr udph;
@@ -599,10 +619,12 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
switch (proto) {
case IPPROTO_IPIP:
- flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4;
+ flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
+ ipxip_flag;
break;
case IPPROTO_IPV6:
- flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6;
+ flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6 |
+ ipxip_flag;
break;
case NEXTHDR_DEST:
if (bpf_skb_load_bytes(skb, off + len, &ip6_opt_hdr,
@@ -610,10 +632,12 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
return TC_ACT_OK;
switch (ip6_opt_hdr.nexthdr) {
case IPPROTO_IPIP:
- flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4;
+ flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV4 |
+ ipxip_flag;
break;
case IPPROTO_IPV6:
- flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6;
+ flags |= BPF_F_ADJ_ROOM_DECAP_L3_IPV6 |
+ ipxip_flag;
break;
default:
return TC_ACT_OK;
@@ -621,6 +645,11 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
break;
case IPPROTO_GRE:
olen += sizeof(struct gre_hdr);
+ if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+ BPF_F_ADJ_ROOM_DECAP_L4_GRE))
+ return TC_ACT_SHOT;
+ flags |= BPF_F_ADJ_ROOM_DECAP_L4_GRE;
+
if (bpf_skb_load_bytes(skb, off + len, &greh, sizeof(greh)) < 0)
return TC_ACT_OK;
switch (bpf_ntohs(greh.protocol)) {
@@ -634,6 +663,10 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
break;
case IPPROTO_UDP:
olen += sizeof(struct udphdr);
+ if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+ BPF_F_ADJ_ROOM_DECAP_L4_UDP))
+ return TC_ACT_SHOT;
+ flags |= BPF_F_ADJ_ROOM_DECAP_L4_UDP;
if (bpf_skb_load_bytes(skb, off + len, &udph, sizeof(udph)) < 0)
return TC_ACT_OK;
switch (bpf_ntohs(udph.dest)) {
@@ -655,6 +688,40 @@ static int decap_internal(struct __sk_buff *skb, int off, int len, char proto)
if (bpf_skb_adjust_room(skb, -olen, BPF_ADJ_ROOM_MAC, flags))
return TC_ACT_SHOT;
+ kskb = bpf_cast_to_kern_ctx(skb);
+ shinfo = bpf_core_cast(kskb->head + kskb->end, struct skb_shared_info);
+ if (shinfo->gso_size) {
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) &&
+ (shinfo->gso_type & SKB_GSO_UDP_TUNNEL_MASK))
+ return TC_ACT_SHOT;
+
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) &&
+ (shinfo->gso_type & (SKB_GSO_GRE | SKB_GSO_GRE_CSUM)))
+ return TC_ACT_SHOT;
+
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) &&
+ (shinfo->gso_type & SKB_GSO_IPXIP4))
+ return TC_ACT_SHOT;
+
+ if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) &&
+ (shinfo->gso_type & SKB_GSO_IPXIP6))
+ return TC_ACT_SHOT;
+
+ if (flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+ BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) {
+ if ((shinfo->gso_type & SKB_GSO_TUNNEL_MASK) &&
+ !kskb->encapsulation)
+ return TC_ACT_SHOT;
+ if (!(shinfo->gso_type & SKB_GSO_TUNNEL_MASK) &&
+ kskb->encapsulation)
+ return TC_ACT_SHOT;
+ }
+ } else if ((flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
+ BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) &&
+ kskb->encapsulation) {
+ return TC_ACT_SHOT;
+ }
+
return TC_ACT_OK;
}
@@ -662,6 +729,10 @@ static int decap_ipv4(struct __sk_buff *skb)
{
struct iphdr iph_outer;
+ if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+ BPF_F_ADJ_ROOM_DECAP_IPXIP4))
+ return TC_ACT_SHOT;
+
if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_outer,
sizeof(iph_outer)) < 0)
return TC_ACT_OK;
@@ -670,19 +741,25 @@ static int decap_ipv4(struct __sk_buff *skb)
return TC_ACT_OK;
return decap_internal(skb, ETH_HLEN, sizeof(iph_outer),
- iph_outer.protocol);
+ iph_outer.protocol,
+ BPF_F_ADJ_ROOM_DECAP_IPXIP4);
}
static int decap_ipv6(struct __sk_buff *skb)
{
struct ipv6hdr iph_outer;
+ if (!bpf_core_enum_value_exists(enum bpf_adj_room_flags,
+ BPF_F_ADJ_ROOM_DECAP_IPXIP6))
+ return TC_ACT_SHOT;
+
if (bpf_skb_load_bytes(skb, ETH_HLEN, &iph_outer,
sizeof(iph_outer)) < 0)
return TC_ACT_OK;
return decap_internal(skb, ETH_HLEN, sizeof(iph_outer),
- iph_outer.nexthdr);
+ iph_outer.nexthdr,
+ BPF_F_ADJ_ROOM_DECAP_IPXIP6);
}
SEC("tc")
--
2.34.1
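
The post-decap checks that this selftest adds to decap_internal() amount to a single predicate over the skb state. The following user-space sketch restates that predicate so it can be reasoned about in isolation. It is an illustration under assumptions rather than the selftest itself: the M_* bit values and model_decap_state_ok() are hypothetical, and the is_gso argument stands in for the test's shinfo->gso_size check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values, for illustration only (not the kernel's values). */
#define M_GSO_GRE             (1u << 0)
#define M_GSO_GRE_CSUM        (1u << 1)
#define M_GSO_IPXIP4          (1u << 2)
#define M_GSO_IPXIP6          (1u << 3)
#define M_GSO_UDP_TUNNEL      (1u << 4)
#define M_GSO_UDP_TUNNEL_CSUM (1u << 5)
#define M_GSO_ESP             (1u << 6)

#define M_DECAP_L4_UDP (1u << 0)
#define M_DECAP_L4_GRE (1u << 1)
#define M_DECAP_IPXIP4 (1u << 2)
#define M_DECAP_IPXIP6 (1u << 3)

#define M_GSO_TUNNEL_MASK (M_GSO_GRE | M_GSO_GRE_CSUM | M_GSO_IPXIP4 | \
			   M_GSO_IPXIP6 | M_GSO_UDP_TUNNEL | \
			   M_GSO_UDP_TUNNEL_CSUM | M_GSO_ESP)
#define M_DECAP_ANY (M_DECAP_L4_UDP | M_DECAP_L4_GRE | \
		     M_DECAP_IPXIP4 | M_DECAP_IPXIP6)

/* Returns true when the state would pass the selftest's checks,
 * false where the BPF program would return TC_ACT_SHOT.
 */
bool model_decap_state_ok(bool is_gso, uint32_t gso_type,
			  bool encapsulation, uint32_t flags)
{
	bool tunnel_left;

	if (!(flags & M_DECAP_ANY))
		return true;	/* nothing was decapped, nothing to verify */

	if (!is_gso)
		return !encapsulation;	/* non-GSO: must be cleared */

	/* The gso_type bits of each removed layer must be gone. */
	if ((flags & M_DECAP_L4_UDP) &&
	    (gso_type & (M_GSO_UDP_TUNNEL | M_GSO_UDP_TUNNEL_CSUM)))
		return false;
	if ((flags & M_DECAP_L4_GRE) &&
	    (gso_type & (M_GSO_GRE | M_GSO_GRE_CSUM)))
		return false;
	if ((flags & M_DECAP_IPXIP4) && (gso_type & M_GSO_IPXIP4))
		return false;
	if ((flags & M_DECAP_IPXIP6) && (gso_type & M_GSO_IPXIP6))
		return false;

	/* encapsulation must agree with whatever tunnel bits remain. */
	tunnel_left = (gso_type & M_GSO_TUNNEL_MASK) != 0;
	return encapsulation == tunnel_left;
}
```

The last comparison is the part that catches both failure directions: encapsulation left set with no tunnel bits remaining, and encapsulation cleared while a tunnel bit such as ESP is still present.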