[PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt

* [PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt
@ 2026-05-29 15:13 Leon Hwang
  2026-05-29 15:13 ` [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue " Leon Hwang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Leon Hwang @ 2026-05-29 15:13 UTC (permalink / raw)
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver, e.g. ice, reuses the stale skb->transport_header to
offload checksum computation to NIC hardware, VxLAN packets encapsulated
by the bpf_lwt_push_encap() helper may be dropped due to an incorrect
checksum.

Update skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP, so checksum offload works correctly.
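
To illustrate the offset involved, here is a minimal userspace sketch, not
the kernel code. It assumes the pushed buffer starts with the outer IP
header and that an IPv6 outer header carries no extension headers. All
sketch_* names and SKETCH_* constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IPV4_MIN_HLEN 20u /* bytes, i.e. ihl == 5 */
#define SKETCH_IPV6_HLEN     40u /* fixed IPv6 header, no extension headers */
#define SKETCH_UDP_HLEN       8u
#define SKETCH_IPPROTO_UDP   17u

/*
 * Return the offset of the outer UDP header relative to the start of the
 * pushed header, or -1 if the buffer is not a well-formed UDP encap header.
 */
static int sketch_outer_udp_offset(const uint8_t *hdr, size_t len)
{
	unsigned int version, hlen;
	uint8_t proto;

	if (len < SKETCH_IPV4_MIN_HLEN)
		return -1;

	version = hdr[0] >> 4;
	if (version == 4) {
		hlen = (hdr[0] & 0x0f) * 4u;	/* ihl counts 32-bit words */
		if (hlen < SKETCH_IPV4_MIN_HLEN || len < hlen)
			return -1;
		proto = hdr[9];			/* IPv4 protocol field */
	} else if (version == 6) {
		hlen = SKETCH_IPV6_HLEN;
		if (len < hlen)
			return -1;
		proto = hdr[6];			/* IPv6 next header field */
	} else {
		return -1;
	}

	if (proto != SKETCH_IPPROTO_UDP)
		return -1;
	if (len < hlen + SKETCH_UDP_HLEN)
		return -1;

	return (int)hlen;
}
```

In the kernel the equivalent value would be added to the network header
offset before being stored as the transport header offset.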

Also fix these two issues reported by sashiko:

1. memcpy() hdr to a local buffer to avoid a TOCTOU issue.
2. Add the missing "iph->ihl < 5" check to avoid an infinite loop in
   ip_fast_csum() on architectures like MIPS.

Changes:
v1 -> v2:
* Address sashiko's reviews:
  * Fix TOCTOU issue in lwt to avoid changing hdr after checks.
  * Add check iph->ihl < 5 in lwt to avoid an infinite loop in
    ip_fast_csum() on architectures like MIPS.
  * Switch the selftest comments to the BPF comment style.
* v1: https://lore.kernel.org/bpf/20260525142650.2569-1-leon.hwang@linux.dev/

Leon Hwang (4):
  bpf: Fix TOCTOU issue in lwt
  bpf: Add check iph->ihl < 5 in lwt
  bpf: Update transport_header when encapsulating UDP tunnel in lwt
  selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in
    lwt

 net/core/lwt_bpf.c                            |  20 ++-
 .../selftests/bpf/prog_tests/lwt_ip_encap.c   | 158 ++++++++++++++++++
 .../selftests/bpf/progs/test_lwt_ip_encap.c   | 112 +++++++++++++
 .../bpf/progs/test_lwt_ip_encap_fix.c         |  36 ++++
 4 files changed, 323 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c

-- 
2.54.0

* [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue in lwt
  2026-05-29 15:13 [PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
@ 2026-05-29 15:13 ` Leon Hwang
  2026-05-29 15:49   ` sashiko-bot
  2026-06-01  0:44   ` Alexei Starovoitov
  2026-05-29 15:13 ` [PATCH bpf v2 2/4] bpf: Add check iph->ihl < 5 " Leon Hwang
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Leon Hwang @ 2026-05-29 15:13 UTC (permalink / raw)
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Sashiko pointed out [1]:
The hdr pointer passed to bpf_lwt_push_ip_encap() can point to concurrently
mutable memory such as a BPF map value.

So, the memory of hdr pointer can be updated after skb_postpush_rcsum().

To fix it, memcpy() the hdr to a local buffer, which will be used for the
following checks and updates.
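
The snapshot-then-validate pattern described above can be sketched in
userspace as follows. This is only an illustration under stated
assumptions, not the patch itself: SKETCH_MAX_HDR is assumed to mirror
LWT_BPF_MAX_HEADROOM (256 bytes), and the sketch_* names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_HDR 256u /* assumed to mirror LWT_BPF_MAX_HEADROOM */
#define SKETCH_MIN_HDR  20u /* smallest valid IPv4 header */

struct sketch_snapshot {
	uint8_t buf[SKETCH_MAX_HDR];
	size_t len;
};

/*
 * Copy the caller-supplied header once. Every later check and every later
 * use must read snap->buf, never the shared pointer, so a concurrent writer
 * cannot change the bytes between validation and use.
 */
static int sketch_snapshot_hdr(struct sketch_snapshot *snap,
			       const void *shared, size_t len)
{
	if (len < SKETCH_MIN_HDR || len > SKETCH_MAX_HDR)
		return -1;

	memcpy(snap->buf, shared, len);
	snap->len = len;
	return 0;
}
```

The cost of this approach is a buffer of the maximum header size on the
stack of the caller.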

[1] https://lore.kernel.org/bpf/20260525150010.CDEBA1F000E9@smtp.kernel.org/

Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 net/core/lwt_bpf.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index f71ef82a5f3d..8009e427851f 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -599,6 +599,7 @@ static int handle_gso_encap(struct sk_buff *skb, bool ipv4, int encap_len)
 
 int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 {
+	u8 buff[LWT_BPF_MAX_HEADROOM];
 	struct iphdr *iph;
 	bool ipv4;
 	int err;
@@ -606,8 +607,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 	if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
 		return -EINVAL;
 
+	memcpy(buff, hdr, len);
+
 	/* validate protocol and length */
-	iph = (struct iphdr *)hdr;
+	iph = (struct iphdr *)buff;
 	if (iph->version == 4) {
 		ipv4 = true;
 		if (unlikely(len < iph->ihl * 4))
@@ -637,7 +640,7 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 	if (ingress)
 		skb_postpush_rcsum(skb, iph, len);
 	skb_reset_network_header(skb);
-	memcpy(skb_network_header(skb), hdr, len);
+	memcpy(skb_network_header(skb), buff, len);
 	bpf_compute_data_pointers(skb);
 	skb_clear_hash(skb);
 
-- 
2.54.0


* Re: [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue in lwt
  2026-05-29 15:13 ` [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue " Leon Hwang
@ 2026-05-29 15:49   ` sashiko-bot
  2026-06-01  0:44   ` Alexei Starovoitov
  1 sibling, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-29 15:49 UTC (permalink / raw)
  To: Leon Hwang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Medium] Using plain `memcpy()` to read concurrently mutable BPF map memory causes a data race.

Pre-existing issues:
- [High] Modifying `iph->check` after `skb_postpush_rcsum()` corrupts `skb->csum` for `CHECKSUM_COMPLETE` packets.
- [High] `handle_gso_encap()` hardcodes IPv4 header length to 20 bytes, ignoring IP options and leading to corrupted GSO parsing.
- [High] The pushed header length `len` is not validated for 4-byte alignment, causing unaligned memory accesses and kernel panics.
--

commit 9a2f99096b35970231963f30d1371e5ea1824572
Author: Leon Hwang <leon.hwang@linux.dev>

bpf: Fix TOCTOU issue in lwt

This commit fixes a TOCTOU issue where the header pointer passed to
bpf_lwt_push_ip_encap() could point to concurrently mutable memory like
a BPF map value. It resolves this by copying the header into a local
buffer before validating and pushing it.

Link: https://lore.kernel.org/bpf/20260525150010.CDEBA1F000E9@smtp.kernel.org/

> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d3..8009e427851f1 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -599,6 +599,7 @@ static int handle_gso_encap(struct sk_buff *skb, bool ipv4, int encap_len)
>  
>  int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  {
> +	u8 buff[LWT_BPF_MAX_HEADROOM];
>  	struct iphdr *iph;
>  	bool ipv4;
>  	int err;
> @@ -606,8 +607,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  	if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
>  		return -EINVAL;
>  

[Severity: High]
This is a pre-existing issue, but is it possible for the len argument to be
unaligned here? If len is not a multiple of 4 (e.g. 21 bytes), skb_push()
later will shift skb->data by len bytes, making skb->data unaligned.

When ip_fast_csum() is called, it performs 32-bit memory accesses on the
pointer. On architectures that do not support unaligned memory accesses
natively (like older ARM, MIPS, or SPARC), would this trigger a fatal
alignment exception and crash the kernel?

> +	memcpy(buff, hdr, len);

[Severity: Medium]
Since hdr can point to BPF map memory that is concurrently mutated by user
space or other CPUs, could this plain memcpy() cause a data race? The
compiler might emit multiple loads or tear the reads, potentially resulting
in an inconsistent header state and triggering KCSAN warnings.

Should data_race() or a lockless safe memory copy mechanism be used here?

> +
>  	/* validate protocol and length */
> -	iph = (struct iphdr *)hdr;
> +	iph = (struct iphdr *)buff;

[ ... ]

> @@ -637,7 +640,7 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  	if (ingress)
>  		skb_postpush_rcsum(skb, iph, len);
>  	skb_reset_network_header(skb);
> -	memcpy(skb_network_header(skb), hdr, len);
> +	memcpy(skb_network_header(skb), buff, len);
>  	bpf_compute_data_pointers(skb);
>  	skb_clear_hash(skb);

[Severity: High]
This isn't a bug introduced by this patch, but does modifying iph->check
after skb_postpush_rcsum() corrupt skb->csum for CHECKSUM_COMPLETE packets?

When an IPv4 header is pushed on an ingress packet, skb_postpush_rcsum()
adds the checksum of the pushed header to skb->csum. If the BPF program
provides an IP header with iph->check == 0, skb->csum is updated using this
zeroed checksum.

Later in this function, we have:

	if (ipv4) {
		skb->protocol = htons(ETH_P_IP);
		iph = ip_hdr(skb);

		if (!iph->check)
			iph->check = ip_fast_csum((unsigned char *)iph,
						  iph->ihl);

This calculates and overwrites iph->check in the SKB, making the IP header
mathematically sum to 0. However, skb->csum is not updated to reflect this
modification, leaving it inflated by the partial sum of the IP header.

Could this break CHECKSUM_COMPLETE synchronization, causing valid packets to
be dropped later in the network stack?
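
A simplified userspace model of the arithmetic behind this concern,
offered as a sketch rather than kernel code: the ones' complement sum of
an IPv4 header differs before and after its checksum field is filled in,
so a running sum taken before the fill no longer matches the final bytes.
The sketch_* helpers are hypothetical and the sample header is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of big-endian 16-bit words, folded to 16 bits. */
static uint16_t sketch_csum_fold(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Zero the IPv4 checksum field (bytes 10-11), then store the checksum. */
static void sketch_fill_ipv4_check(uint8_t *hdr, size_t hlen)
{
	uint16_t csum;

	hdr[10] = 0;
	hdr[11] = 0;
	csum = (uint16_t)~sketch_csum_fold(hdr, hlen);
	hdr[10] = (uint8_t)(csum >> 8);
	hdr[11] = (uint8_t)(csum & 0xff);
}
```

With the checksum field zeroed the header contributes a nonzero partial
sum; once the field is filled in, the header folds to 0xffff, which is
zero in ones' complement arithmetic.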

[Severity: High]
This is a pre-existing issue, but does handle_gso_encap() properly handle
IPv4 headers with IP options?

At the end of bpf_lwt_push_ip_encap(), handle_gso_encap() is called:

	if (skb_is_gso(skb))
		return handle_gso_encap(skb, ipv4, len);

bpf_lwt_push_ip_encap() allows BPF programs to push IPv4 headers with IP
options (where iph->ihl > 5). However, if the packet is a GSO packet,
handle_gso_encap() hardcodes the IPv4 header length to sizeof(struct iphdr)
instead of using ip_hdr(skb)->ihl * 4.

Would this cause the function to point next_hdr into the middle of the IP
options rather than the encapsulated inner protocol header (e.g., GRE or
UDP), leading to corrupted GSO parsing?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529151351.69911-1-leon.hwang@linux.dev?part=1

* Re: [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue in lwt
  2026-05-29 15:13 ` [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue " Leon Hwang
  2026-05-29 15:49   ` sashiko-bot
@ 2026-06-01  0:44   ` Alexei Starovoitov
  2026-06-01 13:34     ` Leon Hwang
  1 sibling, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2026-06-01  0:44 UTC (permalink / raw)
  To: Leon Hwang
  Cc: bpf, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, LKML,
	Network Development, open list:KERNEL SELFTEST FRAMEWORK,
	kernel-patches-bot

On Fri, May 29, 2026 at 8:14 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>
> Sashiko pointed out [1]:
> The hdr pointer passed to bpf_lwt_push_ip_encap() can point to concurrently
> mutable memory such as a BPF map value.
>
> So, the memory of hdr pointer can be updated after skb_postpush_rcsum().
>
> To fix it, memcpy() the hdr to a local buffer, which will be used for the
> following checks and updates.
>
> [1] https://lore.kernel.org/bpf/20260525150010.CDEBA1F000E9@smtp.kernel.org/
>
> Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap")
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>  net/core/lwt_bpf.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d..8009e427851f 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -599,6 +599,7 @@ static int handle_gso_encap(struct sk_buff *skb, bool ipv4, int encap_len)
>
>  int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  {
> +       u8 buff[LWT_BPF_MAX_HEADROOM];

extra 256 bytes of stack to partially close a hypothetical issue
is not worth it.
Ignore such AI complaints.
Not every "bug" needs a fix.
If a malicious bpf user wants to crash the kernel they will
find a way to do so. Especially with agents.
We cannot realistically close all of the holes.
Right now the priority is to fix the issues that normal
users can hit and not bots.

pw-bot: cr

* Re: [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue in lwt
  2026-06-01  0:44   ` Alexei Starovoitov
@ 2026-06-01 13:34     ` Leon Hwang
  0 siblings, 0 replies; 11+ messages in thread
From: Leon Hwang @ 2026-06-01 13:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, LKML,
	Network Development, open list:KERNEL SELFTEST FRAMEWORK,
	kernel-patches-bot

On 2026/6/1 08:44, Alexei Starovoitov wrote:
> On Fri, May 29, 2026 at 8:14 AM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>> Sashiko pointed out [1]:
>> The hdr pointer passed to bpf_lwt_push_ip_encap() can point to concurrently
>> mutable memory such as a BPF map value.
>>
>> So, the memory of hdr pointer can be updated after skb_postpush_rcsum().
>>
>> To fix it, memcpy() the hdr to a local buffer, which will be used for the
>> following checks and updates.
>>
>> [1] https://lore.kernel.org/bpf/20260525150010.CDEBA1F000E9@smtp.kernel.org/
>>
>> Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap")
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>  net/core/lwt_bpf.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
>> index f71ef82a5f3d..8009e427851f 100644
>> --- a/net/core/lwt_bpf.c
>> +++ b/net/core/lwt_bpf.c
>> @@ -599,6 +599,7 @@ static int handle_gso_encap(struct sk_buff *skb, bool ipv4, int encap_len)
>>
>>  int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>>  {
>> +       u8 buff[LWT_BPF_MAX_HEADROOM];
> 
> extra 256 bytes of stack to partially close a hypothetical issue
> is not worth it.
> Ignore such AI complaints.
> Not every "bug" needs a fix.
> If a malicious bpf user wants to crash the kernel they will
> find a way to do so. Especially with agents.
> We cannot realistically close all of the holes.
> Right now the priority is to fix the issues that normal
> users can hit and not bots.
> 

Ack.

Will focus on the fix for encapsulating VxLAN in lwt.

Thanks,
Leon


* [PATCH bpf v2 2/4] bpf: Add check iph->ihl < 5 in lwt
  2026-05-29 15:13 [PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
  2026-05-29 15:13 ` [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue " Leon Hwang
@ 2026-05-29 15:13 ` Leon Hwang
  2026-05-29 16:06   ` sashiko-bot
  2026-05-29 15:13 ` [PATCH bpf v2 3/4] bpf: Update transport_header when encapsulating UDP tunnel " Leon Hwang
  2026-05-29 15:13 ` [PATCH bpf v2 4/4] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2026-05-29 15:13 UTC (permalink / raw)
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Sashiko pointed out [1]: On architectures like MIPS, the while-loop in
ip_fast_csum() won't stop if iph->ihl is less than 5.

To avoid such issues caused by an invalid iph->ihl in lwt, add the check
"iph->ihl < 5" to bpf_lwt_push_ip_encap() to make sure iph->ihl is valid.
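
For illustration only, the combined length validation can be modelled in
userspace like this. The function name is hypothetical and the header is
read as raw bytes instead of through struct iphdr.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if the buffer holds a plausible IPv4 header: version 4, ihl of
 * at least 5 words (20 bytes), and enough bytes to cover ihl * 4.
 */
static int sketch_ipv4_hdr_len_ok(const uint8_t *hdr, size_t len)
{
	unsigned int ihl;

	if (len < 20)
		return 0;
	if ((hdr[0] >> 4) != 4)
		return 0;

	ihl = hdr[0] & 0x0f;	/* header length in 32-bit words */
	return ihl >= 5 && len >= ihl * 4u;
}
```

Rejecting ihl values below 5 up front means later code that trusts ihl,
such as a checksum loop, never sees a header shorter than 20 bytes.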

[1] https://lore.kernel.org/bpf/20260525150010.CDEBA1F000E9@smtp.kernel.org/

Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 net/core/lwt_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 8009e427851f..c306120e11d2 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -613,7 +613,7 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 	iph = (struct iphdr *)buff;
 	if (iph->version == 4) {
 		ipv4 = true;
-		if (unlikely(len < iph->ihl * 4))
+		if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))
 			return -EINVAL;
 	} else if (iph->version == 6) {
 		ipv4 = false;
-- 
2.54.0


* Re: [PATCH bpf v2 2/4] bpf: Add check iph->ihl < 5 in lwt
  2026-05-29 15:13 ` [PATCH bpf v2 2/4] bpf: Add check iph->ihl < 5 " Leon Hwang
@ 2026-05-29 16:06   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-29 16:06 UTC (permalink / raw)
  To: Leon Hwang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] bpf_lwt_push_ip_encap updates skb->csum incorrectly for CHECKSUM_COMPLETE packets by calling skb_postpush_rcsum() before calculating the IP header checksum.
- [High] handle_gso_encap() hardcodes the inner IPv4 header length to 20 bytes, ignoring IP options, which leads to reading garbage data as transport headers.
--

commit f2ce106c7be32fdcff1b6d3eb89a4fe4496a86d1
Author: Leon Hwang <leon.hwang@linux.dev>
Subject: bpf: Add check iph->ihl < 5 in lwt

This commit adds a validation check for the IP header length (ihl) in the BPF
lightweight tunnel IP encapsulation path. It ensures that the header length is
at least 5 (20 bytes) to prevent an infinite loop in ip_fast_csum() on certain
architectures like MIPS.

Link: https://lore.kernel.org/bpf/20260525150010.CDEBA1F000E9@smtp.kernel.org/

> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index 8009e427851f1..c306120e11d29 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -613,7 +613,7 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  	iph = (struct iphdr *)buff;
>  	if (iph->version == 4) {
>  		ipv4 = true;
> -		if (unlikely(len < iph->ihl * 4))
> +		if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))

[Severity: High]
This isn't a bug introduced by this patch, but since we are validating the IP
header here, could this lead to an incorrect skb->csum for CHECKSUM_COMPLETE
packets later in this function?

When a BPF program leaves iph->check = 0, skb_postpush_rcsum() is called:

    if (ingress)
        skb_postpush_rcsum(skb, iph, len);

This calculates the partial checksum of the IP header and adds it to
skb->csum. Later, the correct checksum is calculated and overwrites
iph->check:

    if (!iph->check)
        iph->check = ip_fast_csum((unsigned char *)iph, iph->ihl);

However, skb->csum is not updated to reflect this modification. Because a
valid IP header's checksum zeroes out its 16-bit one's complement sum, does
the addition in skb_postpush_rcsum() incorrectly add the header's
unchecksummed sum to skb->csum, causing packets to fail CHECKSUM_COMPLETE
validation and drop?

[Severity: High]
This is a pre-existing issue, but since this patch explicitly permits IP
headers with options (where iph->ihl > 5), does handle_gso_encap() handle
them correctly at the end of this function?

If the packet requires GSO, handle_gso_encap() hardcodes the IPv4 header
length to 20 bytes:

net/core/lwt_bpf.c:handle_gso_encap() {
    ...
    next_hdr_offset = sizeof(struct iphdr);
    ...
}

If IP options are present, won't handle_gso_encap() mistakenly read the IP
options as the transport header's protocol flags (e.g., GRE flags or UDP
checksum)? 

Will this misconfigure GSO and result in corrupted segmented packets or
hardware drops?

>  			return -EINVAL;
>  	} else if (iph->version == 6) {
>  		ipv4 = false;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529151351.69911-1-leon.hwang@linux.dev?part=2

* [PATCH bpf v2 3/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt
  2026-05-29 15:13 [PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
  2026-05-29 15:13 ` [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue " Leon Hwang
  2026-05-29 15:13 ` [PATCH bpf v2 2/4] bpf: Add check iph->ihl < 5 " Leon Hwang
@ 2026-05-29 15:13 ` Leon Hwang
  2026-05-29 16:31   ` sashiko-bot
  2026-05-29 15:13 ` [PATCH bpf v2 4/4] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2026-05-29 15:13 UTC (permalink / raw)
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot, Leon Hwang

Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver, e.g. ice, reuses the stale skb->transport_header to
offload checksum computation to NIC hardware, VxLAN packets encapsulated
by the bpf_lwt_push_encap() helper may be dropped due to an incorrect
checksum.

Update skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP, so checksum offload works correctly.

Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap")
Cc: Leon Hwang <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 net/core/lwt_bpf.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index c306120e11d2..1d556dec94b4 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -600,6 +600,7 @@ static int handle_gso_encap(struct sk_buff *skb, bool ipv4, int encap_len)
 int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 {
 	u8 buff[LWT_BPF_MAX_HEADROOM];
+	bool is_udp_tunnel;
 	struct iphdr *iph;
 	bool ipv4;
 	int err;
@@ -615,10 +616,16 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 		ipv4 = true;
 		if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))
 			return -EINVAL;
+		is_udp_tunnel = iph->protocol == IPPROTO_UDP;
+		if (unlikely(is_udp_tunnel && len < iph->ihl * 4 + sizeof(struct udphdr)))
+			return -EINVAL;
 	} else if (iph->version == 6) {
 		ipv4 = false;
 		if (unlikely(len < sizeof(struct ipv6hdr)))
 			return -EINVAL;
+		is_udp_tunnel = ((struct ipv6hdr *)iph)->nexthdr == NEXTHDR_UDP;
+		if (unlikely(is_udp_tunnel && len < sizeof(struct ipv6hdr) + sizeof(struct udphdr)))
+			return -EINVAL;
 	} else {
 		return -EINVAL;
 	}
@@ -641,6 +648,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 		skb_postpush_rcsum(skb, iph, len);
 	skb_reset_network_header(skb);
 	memcpy(skb_network_header(skb), buff, len);
+	if (ipv4 && is_udp_tunnel)
+		skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);
+	else if (!ipv4 && is_udp_tunnel)
+		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr));
 	bpf_compute_data_pointers(skb);
 	skb_clear_hash(skb);
 
-- 
2.54.0


* Re: [PATCH bpf v2 3/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt
  2026-05-29 15:13 ` [PATCH bpf v2 3/4] bpf: Update transport_header when encapsulating UDP tunnel " Leon Hwang
@ 2026-05-29 16:31   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-29 16:31 UTC (permalink / raw)
  To: Leon Hwang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] The patch only updates `skb->transport_header` for UDP tunnels, leaving it stale for GRE encapsulation, which breaks hardware offloads (e.g., TSO/checksums) on modern NICs like `ice`.
- [High] `handle_gso_encap()` hardcodes the IPv4 header length to 20 bytes, failing to account for IP options, which causes out-of-bounds reads when parsing the encapsulated L4 header.
--

commit 069ecde1299856dcf6cef2dee89cb71a921e3024
Author: Leon Hwang <leon.hwang@linux.dev>

bpf: Update transport_header when encapsulating UDP tunnel in lwt

This commit fixes checksum offloading for encapsulated UDP tunnels in the
bpf_lwt_push_ip_encap() helper by updating the stale skb->transport_header.
Without this, hardware NICs like ice may compute incorrect checksums and
drop packets.

> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index c306120e11d29..1d556dec94b4f 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
[ ... ]
> @@ -615,10 +616,16 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  		ipv4 = true;
>  		if (unlikely(iph->ihl < 5 || len < iph->ihl * 4))
>  			return -EINVAL;
> +		is_udp_tunnel = iph->protocol == IPPROTO_UDP;
> +		if (unlikely(is_udp_tunnel && len < iph->ihl * 4 + sizeof(struct udphdr)))
> +			return -EINVAL;

[Severity: High]
This is a pre-existing issue, but does handle_gso_encap() handle IPv4
options correctly when parsing the encapsulated L4 header?

bpf_lwt_push_ip_encap() explicitly allows IPv4 headers with options by
checking iph->ihl < 5. When a GSO packet is encapsulated,
handle_gso_encap() is called.

In handle_gso_encap(), the offset to the outer L4 header is hardcoded to 20
bytes (sizeof(struct iphdr)), ignoring the actual iph->ihl.

If a BPF program pushes an encapsulation header with IPv4 options, wouldn't
the next_hdr pointer incorrectly point to the IP options payload instead of
the GRE or UDP header?

This could lead to out-of-bounds reads or misparsing the L4 fields,
potentially misclassifying the GSO type (e.g., wrongly setting
SKB_GSO_UDP_TUNNEL_CSUM) which leads to incorrect offload processing or
dropped packets.

[ ... ]
> @@ -641,6 +648,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  		skb_postpush_rcsum(skb, iph, len);
>  	skb_reset_network_header(skb);
>  	memcpy(skb_network_header(skb), buff, len);
> +	if (ipv4 && is_udp_tunnel)
> +		skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);
> +	else if (!ipv4 && is_udp_tunnel)
> +		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr));

[Severity: High]
This isn't a bug introduced by this patch, but does this leave the
transport_header stale for GRE encapsulation?

bpf_lwt_push_ip_encap() permits pushing IP encapsulation headers with
IPPROTO_GRE, but this only updates skb->transport_header when is_udp_tunnel
is true.

For GRE packets, the transport_header remains stale and points to the inner
packet's transport header. The ice driver (and others) handles GRE and UDP
tunnel GSO offloads by calculating the outer L4 header length as:

skb_inner_network_header(skb) - skb_transport_header(skb)

With a stale transport_header, this driver length calculation evaluates to
(inner_network_offset) - (inner_transport_offset), yielding a negative
value. This large unsigned length exceeds ICE_TXD_L4LEN_MAX, causing
hardware offloads to fail and the packet to be dropped.
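
The wrap-around described above can be modelled with a small userspace
sketch. This is not the ice driver's code; the function name is
hypothetical and the offsets in the note below assume a 20-byte outer
IPv4 header, a 4-byte base GRE header and a 20-byte inner IPv4 header.

```c
#include <stdint.h>

/*
 * Model the outer L4 header length as the distance from the transport
 * header to the inner network header, kept in an unsigned type. If the
 * transport offset is stale and lies beyond the inner network header,
 * the subtraction wraps to a very large value.
 */
static uint32_t sketch_outer_l4_len(uint16_t inner_network_off,
				    uint16_t transport_off)
{
	return (uint32_t)inner_network_off - (uint32_t)transport_off;
}
```

Under those assumptions a correct transport offset of 20 and an inner
network offset of 24 give a length of 4, while a stale transport offset
of 44 (the inner L4 header) gives 24 - 44, which wraps in 32 bits.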

>  	bpf_compute_data_pointers(skb);
>  	skb_clear_hash(skb);
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529151351.69911-1-leon.hwang@linux.dev?part=3

* [PATCH bpf v2 4/4] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt
  2026-05-29 15:13 [PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
                   ` (2 preceding siblings ...)
  2026-05-29 15:13 ` [PATCH bpf v2 3/4] bpf: Update transport_header when encapsulating UDP tunnel " Leon Hwang
@ 2026-05-29 15:13 ` Leon Hwang
  2026-05-29 16:48   ` sashiko-bot
  3 siblings, 1 reply; 11+ messages in thread
From: Leon Hwang @ 2026-05-29 15:13 UTC (permalink / raw)
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot, Leon Hwang

Add two tests to verify the transport header of skb has been set when
encapsulating VxLAN using the bpf_lwt_push_encap() helper.

1. VxLAN over IPv4.
2. VxLAN over IPv6.

Without the fix, the tests would fail:

 lwt_ip_encap_vxlan:FAIL:transport_hdr offset unexpected transport_hdr offset: actual 70 != expected 20
 #208     lwt_ip_encap_vxlan_ipv4:FAIL
 lwt_ip_encap_vxlan:FAIL:transport_hdr offset unexpected transport_hdr offset: actual 110 != expected 40
 #209     lwt_ip_encap_vxlan_ipv6:FAIL
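
One plausible reading of the "actual" values, sketched under assumptions
that the output above does not state: the inner packet uses the same IP
family as the outer header, neither IP header carries options or
extension headers, and the stale transport header points at the inner L4
header. The names below are hypothetical.

```c
#include <stddef.h>

enum {
	SKETCH_ETH_HLEN   = 14,	/* inner Ethernet header */
	SKETCH_UDP_HLEN   = 8,	/* outer UDP header */
	SKETCH_VXLAN_HLEN = 8,	/* VxLAN header */
	SKETCH_IPV4_HLEN  = 20,	/* IPv4 header without options */
	SKETCH_IPV6_HLEN  = 40	/* IPv6 header without extension headers */
};

/* Offset of the inner L4 header from the start of the outer IP header. */
static size_t sketch_inner_l4_offset(size_t outer_ip_hlen, size_t inner_ip_hlen)
{
	return outer_ip_hlen + SKETCH_UDP_HLEN + SKETCH_VXLAN_HLEN +
	       SKETCH_ETH_HLEN + inner_ip_hlen;
}
```

Under these assumptions the IPv4 case yields 20 + 8 + 8 + 14 + 20 = 70 and
the IPv6 case yields 40 + 8 + 8 + 14 + 40 = 110, while the expected values
of 20 and 40 are simply the outer IP header lengths.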

Assisted-by: Claude:claude-sonnet-4-6
Cc: Leon Hwang <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/lwt_ip_encap.c   | 158 ++++++++++++++++++
 .../selftests/bpf/progs/test_lwt_ip_encap.c   | 112 +++++++++++++
 .../bpf/progs/test_lwt_ip_encap_fix.c         |  36 ++++
 3 files changed, 306 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
index b6391af5f6f9..3d79e3b47f89 100644
--- a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
+++ b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
@@ -1,8 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <netinet/in.h>
 
 #include "network_helpers.h"
 #include "test_progs.h"
+#include "test_lwt_ip_encap_fix.skel.h"
 
 #define BPF_FILE "test_lwt_ip_encap.bpf.o"
 
@@ -35,6 +38,10 @@
 #define IP6_ADDR_SRC IP6_ADDR_1
 #define IP6_ADDR_DST IP6_ADDR_4
 
+/* VxLAN tunnel endpoints, reachable via the bottom route (veth5/6/7/8). */
+#define IP4_ADDR_VXLAN  "172.16.17.100"
+#define IP6_ADDR_VXLAN  "fb20::1"
+
 /* Setup/topology:
  *
  *    NS1             NS2             NS3
@@ -538,3 +545,154 @@ void test_lwt_ip_encap_ipv4(void)
 	if (test__start_subtest("ingress"))
 		lwt_ip_encap(IPV4_ENCAP, INGRESS, "");
 }
+
+/*
+ * VxLAN Setup/topology:
+ *
+ * NS1 (IP*_ADDR_1)                NS2                  NS3 (IP*_ADDR_4)
+ *       [ping src]
+ *           |                          top route
+ *         veth1 (LWT encap)  <<-- veth2        veth3  -X-  veth4 (ping dst)
+ *           |                                                ^
+ *       (bottom route)                                       | (inner pkt)
+ *           v                        bottom route            |
+ *         veth5              -->> veth6        veth7  -->> veth8 (vxlan decap)
+ *                                                          (IP*_ADDR_VXLAN)
+ *
+ * Add the VxLAN endpoint addresses to NS3's veth8, create standard
+ * VxLAN decap devices bound to those addresses, and install routes so
+ * NS1/NS2 can reach the endpoints via the bottom route.
+ */
+static int setup_vxlan_routes(const char *ns3, const char *ns1, const char *ns2,
+			      const char *vrf)
+{
+	struct nstoken *nstoken;
+
+	nstoken = open_netns(ns3);
+	if (!ASSERT_OK_PTR(nstoken, "open ns3 for vxlan"))
+		return -1;
+
+	SYS(fail_close, "ip    a add %s/32  dev veth8", IP4_ADDR_VXLAN);
+	SYS(fail_close, "ip -6 a add %s/128 dev veth8", IP6_ADDR_VXLAN);
+	/*
+	 * Standard VxLAN devices to decap the encapsulated packets.  The inner
+	 * Ethernet frame uses a broadcast dst MAC so the IP stack accepts it
+	 * without ARP or FDB configuration.
+	 */
+	SYS(fail_close, "ip link add vxlan4 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning noudpcsum",
+	    IP4_ADDR_VXLAN);
+	SYS(fail_close, "ip link set vxlan4 up");
+	SYS(fail_close, "ip link add vxlan6 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning udp6zerocsumrx",
+	    IP6_ADDR_VXLAN);
+	SYS(fail_close, "ip link set vxlan6 up");
+	close_netns(nstoken);
+
+	SYS(fail, "ip -n %s    route add %s/32  dev veth5 via %s %s",
+	    ns1, IP4_ADDR_VXLAN, IP4_ADDR_6, vrf);
+	SYS(fail, "ip -n %s    route add %s/32  dev veth7 via %s %s",
+	    ns2, IP4_ADDR_VXLAN, IP4_ADDR_8, vrf);
+	SYS(fail, "ip -n %s -6 route add %s/128 dev veth5 via %s %s",
+	    ns1, IP6_ADDR_VXLAN, IP6_ADDR_6, vrf);
+	SYS(fail, "ip -n %s -6 route add %s/128 dev veth7 via %s %s",
+	    ns2, IP6_ADDR_VXLAN, IP6_ADDR_8, vrf);
+	return 0;
+
+fail_close:
+	close_netns(nstoken);
+fail:
+	return -1;
+}
+
+/*
+ * VxLAN encap tests (IPv4-outer and IPv6-outer variants).
+ *
+ * Test 1 - functional: the BPF LWT xmit program encapsulates the packet
+ *   (protocol=UDP, port=4789) and re-routes it without dropping it.
+ *   Verified by ping success.
+ *
+ * Test 2 - fix verification: after bpf_lwt_push_ip_encap() the
+ *   skb->transport_header must point at the outer UDP header, i.e.
+ *   transport_header - network_header == sizeof(outer IP header).
+ *   Without the fix the transport_header still points at the inner
+ *   transport layer, giving a wrong (larger) offset.
+ */
+static void lwt_ip_encap_vxlan(bool ipv4_encap)
+{
+	char ns1[NETNS_NAME_SIZE] = NETNS_BASE "-1-";
+	char ns2[NETNS_NAME_SIZE] = NETNS_BASE "-2-";
+	char ns3[NETNS_NAME_SIZE] = NETNS_BASE "-3-";
+	const char *sec = ipv4_encap ? "encap_vxlan" : "encap_vxlan6";
+	int expected_offset = ipv4_encap ? (int)sizeof(struct iphdr)
+					 : (int)sizeof(struct ipv6hdr);
+	struct test_lwt_ip_encap_fix *skel = NULL;
+	int thdr_offset;
+
+	if (!ASSERT_OK(create_ns(ns1, NETNS_NAME_SIZE), "create ns1"))
+		goto out;
+	if (!ASSERT_OK(create_ns(ns2, NETNS_NAME_SIZE), "create ns2"))
+		goto out;
+	if (!ASSERT_OK(create_ns(ns3, NETNS_NAME_SIZE), "create ns3"))
+		goto out;
+
+	if (!ASSERT_OK(setup_network(ns1, ns2, ns3, ""), "setup network"))
+		goto out;
+
+	if (!ASSERT_OK(setup_vxlan_routes(ns3, ns1, ns2, ""), "setup vxlan routes"))
+		goto out;
+
+	/*
+	 * Attach fexit to bpf_lwt_push_ip_encap() before installing the
+	 * LWT route so we don't miss the first encap call.
+	 */
+	skel = test_lwt_ip_encap_fix__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_lwt_ip_encap_fix__open_and_load"))
+		goto out;
+
+	if (!ASSERT_OK(test_lwt_ip_encap_fix__attach(skel), "test_lwt_ip_encap_fix__attach"))
+		goto out;
+
+	/* Remove the direct NS2->DST route so packets must go via LWT encap. */
+	SYS(out, "ip -n %s    route del %s/32  dev veth3", ns2, IP4_ADDR_DST);
+	SYS(out, "ip -n %s -6 route del %s/128 dev veth3", ns2, IP6_ADDR_DST);
+
+	/* Install the VxLAN BPF LWT xmit route. */
+	if (ipv4_encap)
+		SYS(out, "ip -n %s route add %s encap bpf xmit obj %s sec %s dev veth1",
+		    ns1, IP4_ADDR_DST, BPF_FILE, sec);
+	else
+		SYS(out, "ip -n %s -6 route add %s encap bpf xmit obj %s sec %s dev veth1",
+		    ns1, IP6_ADDR_DST, BPF_FILE, sec);
+
+	skel->bss->fexit_triggered = false;
+	if (ipv4_encap)
+		SYS(out, "ip netns exec %s ping  -c 1 -W1 %s", ns1, IP4_ADDR_DST);
+	else
+		SYS(out, "ip netns exec %s ping6 -c 1 -W1 %s", ns1, IP6_ADDR_DST);
+
+	/* Test 1: fexit triggered means bpf_lwt_push_ip_encap() succeeded. */
+	if (!ASSERT_TRUE(skel->bss->fexit_triggered, "fexit_triggered"))
+		goto out;
+
+	/*
+	 * Test 2: transport_header must sit immediately after the outer IP
+	 * header, pointing at the UDP header of the VxLAN encap.
+	 */
+	thdr_offset = (int)skel->bss->transport_hdr - (int)skel->bss->network_hdr;
+	ASSERT_EQ(thdr_offset, expected_offset, "transport_hdr offset");
+
+out:
+	test_lwt_ip_encap_fix__destroy(skel);
+	SYS_NOFAIL("ip netns del %s", ns1);
+	SYS_NOFAIL("ip netns del %s", ns2);
+	SYS_NOFAIL("ip netns del %s", ns3);
+}
+
+void test_lwt_ip_encap_vxlan_ipv4(void)
+{
+	lwt_ip_encap_vxlan(IPV4_ENCAP);
+}
+
+void test_lwt_ip_encap_vxlan_ipv6(void)
+{
+	lwt_ip_encap_vxlan(IPV6_ENCAP);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
index d6cb986e7533..36f0fc682ffb 100644
--- a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
+++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
@@ -2,8 +2,10 @@
 #include <stddef.h>
 #include <string.h>
 #include <linux/bpf.h>
+#include <linux/if_ether.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
+#include <linux/udp.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
 
@@ -82,4 +84,114 @@ int bpf_lwt_encap_gre6(struct __sk_buff *skb)
 	return BPF_LWT_REROUTE;
 }
 
+struct vxlanhdr {
+	__be32 vx_flags;  /* I flag = 0x08000000 (valid VNI) */
+	__be32 vx_vni;    /* VNI in top 24 bits */
+};
+
+#define VXLAN_PORT  4789
+#define VXLAN_FLAGS 0x08000000
+#define VXLAN_VNI   1
+
+static const __u8 bcast[ETH_ALEN] = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+};
+
+static const __u8 srcmac[ETH_ALEN] = {
+	0x02, 0x00, 0x00, 0x00, 0x00, 0x01,
+};
+
+SEC("encap_vxlan")
+int bpf_lwt_encap_vxlan(struct __sk_buff *skb)
+{
+	struct encap_hdr {
+		struct iphdr    iph;
+		struct udphdr   udph;
+		struct vxlanhdr vxh;
+		struct ethhdr   eth;
+	} __attribute__((__packed__)) /* packed is required to avoid padding */ hdr;
+	int err;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.iph.ihl      = 5;
+	hdr.iph.version  = 4;
+	hdr.iph.ttl      = 0x40;
+	hdr.iph.protocol = 17; /* IPPROTO_UDP */
+	hdr.iph.tot_len  = bpf_htons(skb->len + sizeof(hdr));
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	hdr.iph.saddr = 0x640510ac;  /* 172.16.5.100  */
+	hdr.iph.daddr = 0x641110ac;  /* 172.16.17.100 */
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	hdr.iph.saddr = 0xac100564;  /* 172.16.5.100 */
+	hdr.iph.daddr = 0xac101164;  /* 172.16.17.100 */
+#else
+#error "Fix your compiler's __BYTE_ORDER__?!"
+#endif
+
+	hdr.udph.source = bpf_htons(VXLAN_PORT);
+	hdr.udph.dest   = bpf_htons(VXLAN_PORT);
+	hdr.udph.len    = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
+				    sizeof(hdr.eth));
+
+	hdr.vxh.vx_flags = bpf_htonl(VXLAN_FLAGS);
+	hdr.vxh.vx_vni   = bpf_htonl(VXLAN_VNI << 8);
+
+	__builtin_memcpy(hdr.eth.h_dest, bcast, ETH_ALEN);
+	__builtin_memcpy(hdr.eth.h_source, srcmac, ETH_ALEN);
+	hdr.eth.h_proto = bpf_htons(ETH_P_IP);
+
+	err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr));
+	if (err)
+		return BPF_DROP;
+
+	return BPF_LWT_REROUTE;
+}
+
+SEC("encap_vxlan6")
+int bpf_lwt_encap_vxlan6(struct __sk_buff *skb)
+{
+	struct encap_hdr {
+		struct ipv6hdr  ip6hdr;
+		struct udphdr   udph;
+		struct vxlanhdr vxh;
+		struct ethhdr   eth;
+	} __attribute__((__packed__)) /* packed is required to avoid padding */ hdr;
+	int err;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.ip6hdr.version     = 6;
+	hdr.ip6hdr.nexthdr     = 17; /* IPPROTO_UDP */
+	hdr.ip6hdr.hop_limit   = 0x40;
+	hdr.ip6hdr.payload_len = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
+					   sizeof(hdr.eth));
+	/* fb05::1 */
+	hdr.ip6hdr.saddr.s6_addr[0]  = 0xfb;
+	hdr.ip6hdr.saddr.s6_addr[1]  = 0x05;
+	hdr.ip6hdr.saddr.s6_addr[15] = 1;
+	/* fb20::1 */
+	hdr.ip6hdr.daddr.s6_addr[0]  = 0xfb;
+	hdr.ip6hdr.daddr.s6_addr[1]  = 0x20;
+	hdr.ip6hdr.daddr.s6_addr[15] = 1;
+
+	hdr.udph.source = bpf_htons(VXLAN_PORT);
+	hdr.udph.dest   = bpf_htons(VXLAN_PORT);
+	hdr.udph.len    = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
+				    sizeof(hdr.eth));
+
+	hdr.vxh.vx_flags = bpf_htonl(VXLAN_FLAGS);
+	hdr.vxh.vx_vni   = bpf_htonl(VXLAN_VNI << 8);
+
+	__builtin_memcpy(hdr.eth.h_dest, bcast, ETH_ALEN);
+	__builtin_memcpy(hdr.eth.h_source, srcmac, ETH_ALEN);
+	hdr.eth.h_proto = bpf_htons(ETH_P_IPV6);
+
+	err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr));
+	if (err)
+		return BPF_DROP;
+
+	return BPF_LWT_REROUTE;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
new file mode 100644
index 000000000000..e9043fe654eb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fexit on bpf_lwt_push_ip_encap() to verify skb->transport_header is
+ * correctly updated when a UDP-based tunnel (e.g. VxLAN) is pushed.
+ */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+/* Written by fexit, read by the user-space test via skeleton BSS. */
+__u16 transport_hdr = 0;
+__u16 network_hdr = 0;
+bool fexit_triggered = false;
+
+/*
+ * bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
+ *
+ * After a successful push the transport_header must point at the outer
+ * transport header (UDP for VxLAN), i.e.
+ *   transport_header - network_header == sizeof(outer IP header)
+ */
+SEC("fexit/bpf_lwt_push_ip_encap")
+int BPF_PROG(fexit_lwt_push_ip_encap, struct sk_buff *skb, void *hdr, u32 len, bool ingress,
+	     int retval)
+{
+	if (retval || fexit_triggered)
+		return 0;
+
+	fexit_triggered = true;
+	transport_hdr = BPF_CORE_READ(skb, transport_header);
+	network_hdr   = BPF_CORE_READ(skb, network_header);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.54.0


* Re: [PATCH bpf v2 4/4] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt
  2026-05-29 15:13 ` [PATCH bpf v2 4/4] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
@ 2026-05-29 16:48   ` sashiko-bot
  0 siblings, 0 replies; 11+ messages in thread
From: sashiko-bot @ 2026-05-29 16:48 UTC (permalink / raw)
  To: Leon Hwang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] The fexit program `fexit_lwt_push_ip_encap` is a global hook without filtering, causing test flakiness when tests run concurrently.
--

commit 1b1d75b61046d2f71544d30d5afec12a9644a6b0
Author: Leon Hwang <leon.hwang@linux.dev>

selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt

This commit adds two new selftests (IPv4 and IPv6) to verify that the
transport header of an skb is correctly set when encapsulating VxLAN using
the bpf_lwt_push_encap() helper.
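
For reference, the offset the new tests assert can be worked out from the
header layout alone. The following is a minimal userspace sketch, not part of
the patch. All struct and function names are hypothetical, and the structs are
size-only stand-ins for the on-wire headers (20-byte IPv4 header with ihl = 5,
40-byte fixed IPv6 header, 8-byte UDP and VxLAN headers, 14-byte Ethernet
header). The stale offset follows the test comment's description that an
unfixed transport_header still points at the inner transport layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size-only stand-ins for the on-wire headers (hypothetical names). */
struct outer_ipv4 { uint8_t bytes[20]; };	/* IPv4 header, ihl = 5 */
struct outer_ipv6 { uint8_t bytes[40]; };	/* fixed IPv6 header */
struct udp_hdr    { uint8_t bytes[8]; };
struct vxlan_hdr  { uint8_t bytes[8]; };	/* flags + VNI */
struct eth_hdr    { uint8_t bytes[14]; };	/* inner Ethernet header */

/* Same member order as the encap_hdr structs pushed by the BPF programs. */
struct encap4 {
	struct outer_ipv4 ip;
	struct udp_hdr    udp;
	struct vxlan_hdr  vx;
	struct eth_hdr    eth;
};

struct encap6 {
	struct outer_ipv6 ip;
	struct udp_hdr    udp;
	struct vxlan_hdr  vx;
	struct eth_hdr    eth;
};

/* Offset expected after the fix: outer UDP header right after outer IP. */
size_t expected_offset(int outer_is_ipv4)
{
	return outer_is_ipv4 ? offsetof(struct encap4, udp)
			     : offsetof(struct encap6, udp);
}

/*
 * Offset a stale transport_header would give: the whole pushed encap header
 * plus the inner IP header still lie between network_header and the inner
 * transport header.
 */
size_t stale_offset(int outer_is_ipv4, size_t inner_ip_hdr_len)
{
	size_t encap_len = outer_is_ipv4 ? sizeof(struct encap4)
					 : sizeof(struct encap6);

	return encap_len + inner_ip_hdr_len;
}
```

Under these assumptions the fixed offset is 20 bytes for an IPv4 outer header
and 40 bytes for an IPv6 outer header, while a stale offset is always larger
than the full encap header (50 or 70 bytes), so the ASSERT_EQ() in the test
can tell the two cases apart.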

> diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
> new file mode 100644
> index 0000000000000..e9043fe654eb7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
> @@ -0,0 +1,36 @@
[ ... ]
> +SEC("fexit/bpf_lwt_push_ip_encap")
> +int BPF_PROG(fexit_lwt_push_ip_encap, struct sk_buff *skb, void *hdr, u32 len, bool ingress,
> +	     int retval)
> +{
> +	if (retval || fexit_triggered)
> +		return 0;
> +
> +	fexit_triggered = true;
> +	transport_hdr = BPF_CORE_READ(skb, transport_header);

[Severity: Medium]
Will this fexit program cause test flakiness when tests run concurrently?

Since test_progs can execute test functions concurrently across multiple
workers (when run with -j), this global fexit program might intercept the
bpf_lwt_push_ip_encap call from a different test running simultaneously
(like the existing GRE test).

Because it only captures the first successful execution and has no filtering
for a specific test or network namespace, could it record offsets for the
wrong packet type and lead to random failures in the transport_hdr offset
assertion?

> +	network_hdr   = BPF_CORE_READ(skb, network_header);
> +	return 0;
> +}
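
One possible shape for the filtering asked about above, shown as a userspace
model rather than as BPF code. Everything here is hypothetical: struct capture,
capture_record() and the target_id field are illustrative names, and target_id
stands for whatever identifier the test could set before sending traffic (for
example a device ifindex or a namespace identifier; which of those is practical
to read from an fexit program is left open).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capture state, modelled on the globals in the fexit program. */
struct capture {
	uint64_t target_id;	/* set by the test before sending traffic */
	bool     triggered;
	uint16_t transport_hdr;
	uint16_t network_hdr;
};

/*
 * Model of a filtered fexit body: record only the first successful call
 * that comes from the expected source and ignore everything else.
 * Returns true when the offsets were recorded.
 */
bool capture_record(struct capture *c, uint64_t caller_id, int retval,
		    uint16_t thdr, uint16_t nhdr)
{
	if (retval || c->triggered || caller_id != c->target_id)
		return false;

	c->triggered = true;
	c->transport_hdr = thdr;
	c->network_hdr = nhdr;
	return true;
}
```

With such a check, an encap call from an unrelated test (a different
caller_id) would leave the recorded offsets untouched. Running the test
serially might be another way to avoid the interference.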

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529151351.69911-1-leon.hwang@linux.dev?part=4

end of thread, other threads:[~2026-06-01 13:34 UTC | newest]

Thread overview: 11+ messages
2026-05-29 15:13 [PATCH bpf v2 0/4] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
2026-05-29 15:13 ` [PATCH bpf v2 1/4] bpf: Fix TOCTOU issue " Leon Hwang
2026-05-29 15:49   ` sashiko-bot
2026-06-01  0:44   ` Alexei Starovoitov
2026-06-01 13:34     ` Leon Hwang
2026-05-29 15:13 ` [PATCH bpf v2 2/4] bpf: Add check iph->ihl < 5 " Leon Hwang
2026-05-29 16:06   ` sashiko-bot
2026-05-29 15:13 ` [PATCH bpf v2 3/4] bpf: Update transport_header when encapsulating UDP tunnel " Leon Hwang
2026-05-29 16:31   ` sashiko-bot
2026-05-29 15:13 ` [PATCH bpf v2 4/4] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
2026-05-29 16:48   ` sashiko-bot
