[PATCH bpf 0/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt

* [PATCH bpf 0/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt
@ 2026-05-25 14:26 Leon Hwang
  2026-05-25 14:26 ` [PATCH bpf 1/2] " Leon Hwang
  2026-05-25 14:26 ` [PATCH bpf 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
  0 siblings, 2 replies; 7+ messages in thread
From: Leon Hwang @ 2026-05-25 14:26 UTC
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot

Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver, e.g. ice, reuses the stale skb->transport_header to
offload checksum computation to NIC hardware, VxLAN packets encapsulated
by the bpf_lwt_push_encap() helper may be dropped due to an incorrect
checksum.

Update skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP, so checksum offload works correctly.

Leon Hwang (2):
  bpf: Update transport_header when encapsulating UDP tunnel in lwt
  selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in
    lwt

 net/core/lwt_bpf.c                            |   4 +
 .../selftests/bpf/prog_tests/lwt_ip_encap.c   | 153 ++++++++++++++++++
 .../selftests/bpf/progs/test_lwt_ip_encap.c   | 112 +++++++++++++
 .../bpf/progs/test_lwt_ip_encap_fix.c         |  36 +++++
 4 files changed, 305 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c

-- 
2.54.0



* [PATCH bpf 1/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt
  2026-05-25 14:26 [PATCH bpf 0/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
@ 2026-05-25 14:26 ` Leon Hwang
  2026-05-25 15:00   ` sashiko-bot
  2026-05-25 14:26 ` [PATCH bpf 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
  1 sibling, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-05-25 14:26 UTC
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot, Leon Hwang

Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver, e.g. ice, reuses the stale skb->transport_header to
offload checksum computation to NIC hardware, VxLAN packets encapsulated
by the bpf_lwt_push_encap() helper may be dropped due to an incorrect
checksum.

Update skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP, so checksum offload works correctly.

Fixes: 52f278774e79 ("bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap")
Cc: Leon Hwang <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 net/core/lwt_bpf.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index f71ef82a5f3d..cbb2803367c6 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -637,6 +637,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
 	if (ingress)
 		skb_postpush_rcsum(skb, iph, len);
 	skb_reset_network_header(skb);
+	if (ipv4 && iph->protocol == IPPROTO_UDP /* UDP tunnel */)
+		skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);
+	else if (!ipv4 && ((struct ipv6hdr *)hdr)->nexthdr == NEXTHDR_UDP /* UDP tunnel */)
+		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr));
 	memcpy(skb_network_header(skb), hdr, len);
 	bpf_compute_data_pointers(skb);
 	skb_clear_hash(skb);
-- 
2.54.0



* [PATCH bpf 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt
  2026-05-25 14:26 [PATCH bpf 0/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt Leon Hwang
  2026-05-25 14:26 ` [PATCH bpf 1/2] " Leon Hwang
@ 2026-05-25 14:26 ` Leon Hwang
  2026-05-25 15:17   ` sashiko-bot
  1 sibling, 1 reply; 7+ messages in thread
From: Leon Hwang @ 2026-05-25 14:26 UTC
  To: bpf
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa,
	Shuah Khan, Guillaume Nault, Leon Hwang, Ido Schimmel,
	Fernando Fernandez Mancera, Peter Oskolkov, linux-kernel, netdev,
	linux-kselftest, kernel-patches-bot, Leon Hwang

Add two tests to verify that the skb transport header is set when
encapsulating VxLAN using the bpf_lwt_push_encap() helper.

1. VxLAN over IPv4.
2. VxLAN over IPv6.

Without the fix, the tests fail:

 lwt_ip_encap_vxlan:FAIL:transport_hdr offset unexpected transport_hdr offset: actual 70 != expected 20
 #208     lwt_ip_encap_vxlan_ipv4:FAIL
 lwt_ip_encap_vxlan:FAIL:transport_hdr offset unexpected transport_hdr offset: actual 110 != expected 40
 #209     lwt_ip_encap_vxlan_ipv6:FAIL

The unexpected offsets equal the outer encap headers
(IPv4: iphdr+udp+vxlan+eth = 50 bytes, IPv6: ipv6hdr+udp+vxlan+eth = 70 bytes)
plus the inner IP header (20 bytes for IPv4, 40 bytes for IPv6): without
the fix, transport_header still points at the inner transport layer
instead of the outer UDP header.
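
The arithmetic behind the 70 and 110 byte offsets can be checked with a
small user-space sketch. The constant and function names below are
illustrative only (they do not appear in the patch), and the sketch
assumes fixed-size headers with no IPv4 options.

```c
#include <assert.h>

/* Header sizes in bytes. Assumes IPv4 without options (ihl = 5). */
enum {
	IPV4_HDR_LEN  = 20,
	IPV6_HDR_LEN  = 40,
	UDP_HDR_LEN   = 8,
	VXLAN_HDR_LEN = 8,
	ETH_HDR_LEN   = 14,
};

/* Bytes pushed in front of the inner packet: outer IP + UDP + VxLAN + inner Ethernet. */
static int encap_len(int outer_ip_len)
{
	assert(outer_ip_len > 0);
	return outer_ip_len + UDP_HDR_LEN + VXLAN_HDR_LEN + ETH_HDR_LEN;
}

/*
 * Offset from the new network header to a transport_header that was
 * left pointing at the inner transport layer (the behaviour without
 * the fix).
 */
static int stale_offset(int outer_ip_len, int inner_ip_len)
{
	return encap_len(outer_ip_len) + inner_ip_len;
}

/* Offset expected once transport_header points at the outer UDP header. */
static int fixed_offset(int outer_ip_len)
{
	return outer_ip_len;
}
```

With these sizes, stale_offset(IPV4_HDR_LEN, IPV4_HDR_LEN) is 70 and
stale_offset(IPV6_HDR_LEN, IPV6_HDR_LEN) is 110, matching the failure
output above, while fixed_offset() yields the expected 20 and 40.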

Assisted-by: Claude:claude-sonnet-4-6
Cc: Leon Hwang <leon.huangfu@shopee.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 .../selftests/bpf/prog_tests/lwt_ip_encap.c   | 153 ++++++++++++++++++
 .../selftests/bpf/progs/test_lwt_ip_encap.c   | 112 +++++++++++++
 .../bpf/progs/test_lwt_ip_encap_fix.c         |  36 +++++
 3 files changed, 301 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c

diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
index b6391af5f6f9..04fc35470f21 100644
--- a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
+++ b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
@@ -1,8 +1,11 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/ip.h>
+#include <linux/ipv6.h>
 #include <netinet/in.h>
 
 #include "network_helpers.h"
 #include "test_progs.h"
+#include "test_lwt_ip_encap_fix.skel.h"
 
 #define BPF_FILE "test_lwt_ip_encap.bpf.o"
 
@@ -35,6 +38,10 @@
 #define IP6_ADDR_SRC IP6_ADDR_1
 #define IP6_ADDR_DST IP6_ADDR_4
 
+/* VxLAN tunnel endpoints, reachable via the bottom route (veth5/6/7/8). */
+#define IP4_ADDR_VXLAN  "172.16.17.100"
+#define IP6_ADDR_VXLAN  "fb20::1"
+
 /* Setup/topology:
  *
  *    NS1             NS2             NS3
@@ -538,3 +545,149 @@ void test_lwt_ip_encap_ipv4(void)
 	if (test__start_subtest("ingress"))
 		lwt_ip_encap(IPV4_ENCAP, INGRESS, "");
 }
+
+/* VxLAN Setup/topology:
+ *
+ * NS1 (IP*_ADDR_1)                NS2                  NS3 (IP*_ADDR_4)
+ *       [ping src]
+ *           |                          top route
+ *         veth1 (LWT encap)  <<-- veth2        veth3  -X-  veth4 (ping dst)
+ *           |                                                ^
+ *       (bottom route)                                       | (inner pkt)
+ *           v                        bottom route            |
+ *         veth5              -->> veth6        veth7  -->> veth8 (vxlan decap)
+ *                                                          (IP*_ADDR_VXLAN)
+ *
+ * Add the VxLAN endpoint addresses to NS3's veth8, create standard
+ * VxLAN decap devices bound to those addresses, and install routes so
+ * NS1/NS2 can reach the endpoints via the bottom route.
+ */
+static int setup_vxlan_routes(const char *ns3, const char *ns1, const char *ns2,
+			      const char *vrf)
+{
+	struct nstoken *nstoken;
+
+	nstoken = open_netns(ns3);
+	if (!ASSERT_OK_PTR(nstoken, "open ns3 for vxlan"))
+		return -1;
+
+	SYS(fail_close, "ip    a add %s/32  dev veth8", IP4_ADDR_VXLAN);
+	SYS(fail_close, "ip -6 a add %s/128 dev veth8", IP6_ADDR_VXLAN);
+	/* Standard VxLAN devices to decap the encapsulated packets.  The inner
+	 * Ethernet frame uses a broadcast dst MAC so the IP stack accepts it
+	 * without ARP or FDB configuration.
+	 */
+	SYS(fail_close, "ip link add vxlan4 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning noudpcsum",
+	    IP4_ADDR_VXLAN);
+	SYS(fail_close, "ip link set vxlan4 up");
+	SYS(fail_close, "ip link add vxlan6 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning udp6zerocsumrx",
+	    IP6_ADDR_VXLAN);
+	SYS(fail_close, "ip link set vxlan6 up");
+	close_netns(nstoken);
+
+	SYS(fail, "ip -n %s    route add %s/32  dev veth5 via %s %s",
+	    ns1, IP4_ADDR_VXLAN, IP4_ADDR_6, vrf);
+	SYS(fail, "ip -n %s    route add %s/32  dev veth7 via %s %s",
+	    ns2, IP4_ADDR_VXLAN, IP4_ADDR_8, vrf);
+	SYS(fail, "ip -n %s -6 route add %s/128 dev veth5 via %s %s",
+	    ns1, IP6_ADDR_VXLAN, IP6_ADDR_6, vrf);
+	SYS(fail, "ip -n %s -6 route add %s/128 dev veth7 via %s %s",
+	    ns2, IP6_ADDR_VXLAN, IP6_ADDR_8, vrf);
+	return 0;
+
+fail_close:
+	close_netns(nstoken);
+fail:
+	return -1;
+}
+
+/* VxLAN encap tests (IPv4-outer and IPv6-outer variants).
+ *
+ * Test 1 - functional: the BPF LWT xmit program encapsulates the packet
+ *   (protocol=UDP, port=4789) and re-routes it without dropping it.
+ *   Verified by ping success.
+ *
+ * Test 2 - fix verification: after bpf_lwt_push_ip_encap() the
+ *   skb->transport_header must point at the outer UDP header, i.e.
+ *   transport_header - network_header == sizeof(outer IP header).
+ *   Without the fix the transport_header still points at the inner
+ *   transport layer, giving a wrong (larger) offset.
+ */
+static void lwt_ip_encap_vxlan(bool ipv4_encap)
+{
+	char ns1[NETNS_NAME_SIZE] = NETNS_BASE "-1-";
+	char ns2[NETNS_NAME_SIZE] = NETNS_BASE "-2-";
+	char ns3[NETNS_NAME_SIZE] = NETNS_BASE "-3-";
+	const char *sec = ipv4_encap ? "encap_vxlan" : "encap_vxlan6";
+	int expected_offset = ipv4_encap ? (int)sizeof(struct iphdr)
+					 : (int)sizeof(struct ipv6hdr);
+	struct test_lwt_ip_encap_fix *skel = NULL;
+	int thdr_offset;
+
+	if (!ASSERT_OK(create_ns(ns1, NETNS_NAME_SIZE), "create ns1"))
+		goto out;
+	if (!ASSERT_OK(create_ns(ns2, NETNS_NAME_SIZE), "create ns2"))
+		goto out;
+	if (!ASSERT_OK(create_ns(ns3, NETNS_NAME_SIZE), "create ns3"))
+		goto out;
+
+	if (!ASSERT_OK(setup_network(ns1, ns2, ns3, ""), "setup network"))
+		goto out;
+
+	if (!ASSERT_OK(setup_vxlan_routes(ns3, ns1, ns2, ""), "setup vxlan routes"))
+		goto out;
+
+	/* Attach fexit to bpf_lwt_push_ip_encap() before installing the
+	 * LWT route so we don't miss the first encap call.
+	 */
+	skel = test_lwt_ip_encap_fix__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_lwt_ip_encap_fix__open_and_load"))
+		goto out;
+
+	if (!ASSERT_OK(test_lwt_ip_encap_fix__attach(skel), "test_lwt_ip_encap_fix__attach"))
+		goto out;
+
+	/* Remove the direct NS2->DST route so packets must go via LWT encap. */
+	SYS(out, "ip -n %s    route del %s/32  dev veth3", ns2, IP4_ADDR_DST);
+	SYS(out, "ip -n %s -6 route del %s/128 dev veth3", ns2, IP6_ADDR_DST);
+
+	/* Install the VxLAN BPF LWT xmit route. */
+	if (ipv4_encap)
+		SYS(out, "ip -n %s route add %s encap bpf xmit obj %s sec %s dev veth1",
+		    ns1, IP4_ADDR_DST, BPF_FILE, sec);
+	else
+		SYS(out, "ip -n %s -6 route add %s encap bpf xmit obj %s sec %s dev veth1",
+		    ns1, IP6_ADDR_DST, BPF_FILE, sec);
+
+	skel->bss->fexit_triggered = false;
+	if (ipv4_encap)
+		SYS(out, "ip netns exec %s ping  -c 1 -W1 %s", ns1, IP4_ADDR_DST);
+	else
+		SYS(out, "ip netns exec %s ping6 -c 1 -W1 %s", ns1, IP6_ADDR_DST);
+
+	/* Test 1: fexit triggered means bpf_lwt_push_ip_encap() succeeded. */
+	if (!ASSERT_TRUE(skel->bss->fexit_triggered, "fexit_triggered"))
+		goto out;
+
+	/* Test 2: transport_header must sit immediately after the outer IP
+	 * header, pointing at the UDP header of the VxLAN encap.
+	 */
+	thdr_offset = (int)skel->bss->transport_hdr - (int)skel->bss->network_hdr;
+	ASSERT_EQ(thdr_offset, expected_offset, "transport_hdr offset");
+
+out:
+	test_lwt_ip_encap_fix__destroy(skel);
+	SYS_NOFAIL("ip netns del %s", ns1);
+	SYS_NOFAIL("ip netns del %s", ns2);
+	SYS_NOFAIL("ip netns del %s", ns3);
+}
+
+void test_lwt_ip_encap_vxlan_ipv4(void)
+{
+	lwt_ip_encap_vxlan(IPV4_ENCAP);
+}
+
+void test_lwt_ip_encap_vxlan_ipv6(void)
+{
+	lwt_ip_encap_vxlan(IPV6_ENCAP);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
index d6cb986e7533..36f0fc682ffb 100644
--- a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
+++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
@@ -2,8 +2,10 @@
 #include <stddef.h>
 #include <string.h>
 #include <linux/bpf.h>
+#include <linux/if_ether.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
+#include <linux/udp.h>
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_endian.h>
 
@@ -82,4 +84,114 @@ int bpf_lwt_encap_gre6(struct __sk_buff *skb)
 	return BPF_LWT_REROUTE;
 }
 
+struct vxlanhdr {
+	__be32 vx_flags;  /* I flag = 0x08000000 (valid VNI) */
+	__be32 vx_vni;    /* VNI in top 24 bits */
+};
+
+#define VXLAN_PORT  4789
+#define VXLAN_FLAGS 0x08000000
+#define VXLAN_VNI   1
+
+static const __u8 bcast[ETH_ALEN] = {
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+};
+
+static const __u8 srcmac[ETH_ALEN] = {
+	0x02, 0x00, 0x00, 0x00, 0x00, 0x01,
+};
+
+SEC("encap_vxlan")
+int bpf_lwt_encap_vxlan(struct __sk_buff *skb)
+{
+	struct encap_hdr {
+		struct iphdr    iph;
+		struct udphdr   udph;
+		struct vxlanhdr vxh;
+		struct ethhdr   eth;
+	} __attribute__((__packed__)) /* packed is required to avoid padding */ hdr;
+	int err;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.iph.ihl      = 5;
+	hdr.iph.version  = 4;
+	hdr.iph.ttl      = 0x40;
+	hdr.iph.protocol = 17; /* IPPROTO_UDP */
+	hdr.iph.tot_len  = bpf_htons(skb->len + sizeof(hdr));
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	hdr.iph.saddr = 0x640510ac;  /* 172.16.5.100  */
+	hdr.iph.daddr = 0x641110ac;  /* 172.16.17.100 */
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	hdr.iph.saddr = 0xac100564;  /* 172.16.5.100 */
+	hdr.iph.daddr = 0xac101164;  /* 172.16.17.100 */
+#else
+#error "Fix your compiler's __BYTE_ORDER__?!"
+#endif
+
+	hdr.udph.source = bpf_htons(VXLAN_PORT);
+	hdr.udph.dest   = bpf_htons(VXLAN_PORT);
+	hdr.udph.len    = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
+				    sizeof(hdr.eth));
+
+	hdr.vxh.vx_flags = bpf_htonl(VXLAN_FLAGS);
+	hdr.vxh.vx_vni   = bpf_htonl(VXLAN_VNI << 8);
+
+	__builtin_memcpy(hdr.eth.h_dest, bcast, ETH_ALEN);
+	__builtin_memcpy(hdr.eth.h_source, srcmac, ETH_ALEN);
+	hdr.eth.h_proto = bpf_htons(ETH_P_IP);
+
+	err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr));
+	if (err)
+		return BPF_DROP;
+
+	return BPF_LWT_REROUTE;
+}
+
+SEC("encap_vxlan6")
+int bpf_lwt_encap_vxlan6(struct __sk_buff *skb)
+{
+	struct encap_hdr {
+		struct ipv6hdr  ip6hdr;
+		struct udphdr   udph;
+		struct vxlanhdr vxh;
+		struct ethhdr   eth;
+	} __attribute__((__packed__)) /* packed is required to avoid padding */ hdr;
+	int err;
+
+	memset(&hdr, 0, sizeof(hdr));
+
+	hdr.ip6hdr.version     = 6;
+	hdr.ip6hdr.nexthdr     = 17; /* IPPROTO_UDP */
+	hdr.ip6hdr.hop_limit   = 0x40;
+	hdr.ip6hdr.payload_len = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
+					   sizeof(hdr.eth));
+	/* fb05::1 */
+	hdr.ip6hdr.saddr.s6_addr[0]  = 0xfb;
+	hdr.ip6hdr.saddr.s6_addr[1]  = 0x05;
+	hdr.ip6hdr.saddr.s6_addr[15] = 1;
+	/* fb20::1 */
+	hdr.ip6hdr.daddr.s6_addr[0]  = 0xfb;
+	hdr.ip6hdr.daddr.s6_addr[1]  = 0x20;
+	hdr.ip6hdr.daddr.s6_addr[15] = 1;
+
+	hdr.udph.source = bpf_htons(VXLAN_PORT);
+	hdr.udph.dest   = bpf_htons(VXLAN_PORT);
+	hdr.udph.len    = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
+				    sizeof(hdr.eth));
+
+	hdr.vxh.vx_flags = bpf_htonl(VXLAN_FLAGS);
+	hdr.vxh.vx_vni   = bpf_htonl(VXLAN_VNI << 8);
+
+	__builtin_memcpy(hdr.eth.h_dest, bcast, ETH_ALEN);
+	__builtin_memcpy(hdr.eth.h_source, srcmac, ETH_ALEN);
+	hdr.eth.h_proto = bpf_htons(ETH_P_IPV6);
+
+	err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr));
+	if (err)
+		return BPF_DROP;
+
+	return BPF_LWT_REROUTE;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
new file mode 100644
index 000000000000..e9043fe654eb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fexit on bpf_lwt_push_ip_encap() to verify skb->transport_header is
+ * correctly updated when a UDP-based tunnel (e.g. VxLAN) is pushed.
+ */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+/* Written by fexit, read by the user-space test via skeleton BSS. */
+__u16 transport_hdr = 0;
+__u16 network_hdr = 0;
+bool fexit_triggered = false;
+
+/*
+ * bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
+ *
+ * After a successful push the transport_header must point at the outer
+ * transport header (UDP for VxLAN), i.e.
+ *   transport_header - network_header == sizeof(outer IP header)
+ */
+SEC("fexit/bpf_lwt_push_ip_encap")
+int BPF_PROG(fexit_lwt_push_ip_encap, struct sk_buff *skb, void *hdr, u32 len, bool ingress,
+	     int retval)
+{
+	if (retval || fexit_triggered)
+		return 0;
+
+	fexit_triggered = true;
+	transport_hdr = BPF_CORE_READ(skb, transport_header);
+	network_hdr   = BPF_CORE_READ(skb, network_header);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.54.0



* Re: [PATCH bpf 1/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt
  2026-05-25 14:26 ` [PATCH bpf 1/2] " Leon Hwang
@ 2026-05-25 15:00   ` sashiko-bot
  2026-05-27 13:40     ` Leon Hwang
  0 siblings, 1 reply; 7+ messages in thread
From: sashiko-bot @ 2026-05-25 15:00 UTC
  To: Leon Hwang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

New issues:
- [High] A Time-of-Check to Time-of-Use (TOCTOU) race condition exists due to reading header fields directly from the mutable BPF-provided buffer after length validation.
- [Medium] The transport header is updated without verifying that the packet length actually contains the full UDP header.

Pre-existing issues:
- [Critical] Missing validation for the minimum IPv4 header length (iph->ihl >= 5) before calling ip_fast_csum() can cause a kernel panic.
--

commit 957c78832bf7e636702330e3ec4773ffefe95be0
Author: Leon Hwang <leon.hwang@linux.dev>

bpf: Update transport_header when encapsulating UDP tunnel in lwt

Currently, bpf_lwt_push_ip_encap() does not update skb->transport_header.
When a driver reuses the stale skb->transport_header to offload checksum
computation to NIC hardware, VxLAN packets encapsulated by
bpf_lwt_push_encap() may be dropped due to incorrect checksum. This patch
updates skb->transport_header in bpf_lwt_push_ip_encap() whenever the
encapsulated packet uses UDP.

> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index f71ef82a5f3d3..cbb2803367c6d 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -637,6 +637,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>  	if (ingress)
>  		skb_postpush_rcsum(skb, iph, len);
>  	skb_reset_network_header(skb);
> +	if (ipv4 && iph->protocol == IPPROTO_UDP /* UDP tunnel */)
> +		skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);

[Severity: High]
Can a concurrent modification to the BPF-provided header cause an
out-of-bounds transport header offset?

The hdr pointer passed to bpf_lwt_push_ip_encap() can point to concurrently
mutable memory such as a BPF map value. Reading iph->protocol and iph->ihl
directly from hdr here, after length validation has already occurred,
appears to introduce a Time-of-Check to Time-of-Use race condition.

If a concurrent modification changes iph->ihl, the calculated transport
header could point well past the end of the pushed IP header. For small
SKBs, this could cause skb_transport_header to point out-of-bounds into
skb_shared_info.

Would it be safer to perform these reads after copying the header into the
SKB, using the copied, stable values instead?
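
The idea of reading only from the copied header can be illustrated with
a user-space model. This is a sketch under assumptions, not kernel code:
copy_then_thoff() and PROTO_UDP are hypothetical names, plain byte
buffers stand in for the skb, and only IPv4 is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROTO_UDP 17	/* same value as IPPROTO_UDP */

/*
 * Copy the caller-supplied header into dst, then derive the transport
 * offset from dst only, so a later change to src cannot affect it.
 *
 * Returns the transport offset in bytes for a UDP header, 0 for any
 * other protocol, and -1 if the IPv4 header is malformed.
 */
static int copy_then_thoff(uint8_t *dst, const uint8_t *src, size_t len)
{
	assert(dst != NULL && src != NULL);

	if (len < 20)			/* shorter than a minimal IPv4 header */
		return -1;

	memcpy(dst, src, len);		/* snapshot the header once */

	uint8_t version  = dst[0] >> 4;
	uint8_t ihl      = dst[0] & 0x0f;	/* header length in 32-bit words */
	uint8_t protocol = dst[9];

	if (version != 4 || ihl < 5 || (size_t)ihl * 4 > len)
		return -1;
	if (protocol != PROTO_UDP)
		return 0;

	return ihl * 4;
}
```

Every field used for the offset is read from dst after the copy, so the
value that was validated is the value that is used.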

[Severity: Critical]
This is a pre-existing issue and was not introduced by the patch under
review, but is there a missing validation for the minimum IPv4 header
length before ip_fast_csum() is called?

Earlier in the function, the length validation checks:

    if (unlikely(len < iph->ihl * 4))
        return -EINVAL;

If a BPF program provides an IP header with ihl = 0, the check len < 0
evaluates to false because len is at least 20.

Later, ip_fast_csum() is called with ihl = 0. On architectures like MIPS,
the ip_fast_csum() implementation reads 16 bytes and then uses a loop:

    do { ... word++ } while (word != stop);

where stop = iph + ihl. If ihl = 0, stop is physically behind word,
causing the loop to endlessly increment through the address space until
it page faults.

Does this code need a check to ensure iph->ihl >= 5?
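
One possible shape for such a check, as a user-space sketch rather than
the kernel function itself: ipv4_encap_len_ok() is a hypothetical name,
and the header is modelled as a raw byte buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Accept an IPv4 encap header only if ihl is at least 5 (the minimum
 * legal value) and len covers the whole header that ihl describes.
 */
static bool ipv4_encap_len_ok(const uint8_t *hdr, size_t len)
{
	assert(hdr != NULL);

	if (len < 20)			/* cannot even hold the fixed header */
		return false;

	uint8_t ihl = hdr[0] & 0x0f;	/* header length in 32-bit words */

	if (ihl < 5)			/* rejects ihl = 0..4 */
		return false;

	return len >= (size_t)ihl * 4;
}
```

With ihl = 0 the length comparison alone passes for any len, which is
why the explicit lower bound on ihl is checked first.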

[Severity: Medium]
Does this code verify that the packet length actually contains the full UDP
header?

This updates the transport header for UDP tunnels based solely on the IP
protocol, but it doesn't seem to validate that len actually encompasses
the full struct udphdr.

If an IP header is pushed with IPPROTO_UDP but a truncated length (like
len = 20), the transport header is set to the end of the IP header. A NIC
driver might then attempt to parse the subsequent memory as a UDP header,
which could lead to broken offloads or out-of-bounds reads.
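
A length test of this kind could look like the following sketch. The
helper name encap_covers_udp_hdr() and the UDP_HDR_LEN constant are
assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UDP_HDR_LEN 8	/* size of a UDP header in bytes */

/*
 * Return true only if the pushed encap of len bytes contains the full
 * UDP header that follows an outer IP header of ip_hdr_len bytes.
 */
static bool encap_covers_udp_hdr(size_t ip_hdr_len, size_t len)
{
	assert(ip_hdr_len >= 20);	/* minimal IPv4 header; IPv6 is 40 */
	return len >= ip_hdr_len + UDP_HDR_LEN;
}
```

For the truncated case mentioned above, encap_covers_udp_hdr(20, 20) is
false, so the transport header would be left untouched.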

> +	else if (!ipv4 && ((struct ipv6hdr *)hdr)->nexthdr == NEXTHDR_UDP /* UDP tunnel */)
> +		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr));
>  	memcpy(skb_network_header(skb), hdr, len);
>  	bpf_compute_data_pointers(skb);
>  	skb_clear_hash(skb);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260525142650.2569-1-leon.hwang@linux.dev?part=1


* Re: [PATCH bpf 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt
  2026-05-25 14:26 ` [PATCH bpf 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN " Leon Hwang
@ 2026-05-25 15:17   ` sashiko-bot
  2026-05-27 13:40     ` Leon Hwang
  0 siblings, 1 reply; 7+ messages in thread
From: sashiko-bot @ 2026-05-25 15:17 UTC
  To: Leon Hwang; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] Multi-line comments do not follow the modern kernel/BPF comment style.
--

commit cb1735fb3bcd5263a9150a9e452a94ee09b427e7
Author: Leon Hwang <leon.hwang@linux.dev>

selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt

This commit adds two tests to verify that the skb transport header is correctly
set when encapsulating VxLAN using the bpf_lwt_push_encap() helper for both
IPv4 and IPv6.

> diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
> --- a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
> +++ b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
[ ... ]
> @@ -538,3 +545,149 @@ void test_lwt_ip_encap_ipv4(void)
>  	if (test__start_subtest("ingress"))
>  		lwt_ip_encap(IPV4_ENCAP, INGRESS, "");
>  }
> +
> +/* VxLAN Setup/topology:

[Severity: Low]
This isn't a bug, but does this multi-line comment follow the preferred BPF
subsystem comment style? The modern multi-line comment style places the opening
/* on its own line, with the text starting on the following line.
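
For reference, the layout described here looks like the sketch below.
The function outer_ip_hdr_len() is a made-up example used only to carry
the comment; it is not taken from the patch.

```c
#include <stdbool.h>

/*
 * Return the outer IP header length in bytes.
 *
 * The opening delimiter sits on its own line and the text starts on
 * the following line.
 */
static int outer_ip_hdr_len(bool ipv4)
{
	return ipv4 ? 20 : 40;	/* single-line comments are unaffected */
}
```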

> + *
> + * NS1 (IP*_ADDR_1)                NS2                  NS3 (IP*_ADDR_4)
[ ... ]
> +	SYS(fail_close, "ip    a add %s/32  dev veth8", IP4_ADDR_VXLAN);
> +	SYS(fail_close, "ip -6 a add %s/128 dev veth8", IP6_ADDR_VXLAN);
> +	/* Standard VxLAN devices to decap the encapsulated packets.  The inner

[Severity: Low]
This isn't a bug, but here is another instance where the multi-line comment
could be adjusted to have the opening /* on a separate line.

> +	 * Ethernet frame uses a broadcast dst MAC so the IP stack accepts it
> +	 * without ARP or FDB configuration.
> +	 */
> +	SYS(fail_close, "ip link add vxlan4 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning noudpcsum",
[ ... ]
> +	return -1;
> +}
> +
> +/* VxLAN encap tests (IPv4-outer and IPv6-outer variants).

[Severity: Low]
This isn't a bug, but this comment block also places text on the same line as
the opening /*.

> + *
> + * Test 1 - functional: the BPF LWT xmit program encapsulates the packet
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260525142650.2569-1-leon.hwang@linux.dev?part=2


* Re: [PATCH bpf 1/2] bpf: Update transport_header when encapsulating UDP tunnel in lwt
  2026-05-25 15:00   ` sashiko-bot
@ 2026-05-27 13:40     ` Leon Hwang
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-05-27 13:40 UTC
  To: sashiko-reviews; +Cc: bpf

On 2026/5/25 23:00, sashiko-bot@kernel.org wrote:
[...]
>> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
>> index f71ef82a5f3d3..cbb2803367c6d 100644
>> --- a/net/core/lwt_bpf.c
>> +++ b/net/core/lwt_bpf.c
>> @@ -637,6 +637,10 @@ int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
>>  	if (ingress)
>>  		skb_postpush_rcsum(skb, iph, len);
>>  	skb_reset_network_header(skb);
>> +	if (ipv4 && iph->protocol == IPPROTO_UDP /* UDP tunnel */)
>> +		skb_set_transport_header(skb, skb_network_offset(skb) + iph->ihl * 4);
> 
> [Severity: High]
> Can a concurrent modification to the BPF-provided header cause an
> out-of-bounds transport header offset?
> 
> The hdr pointer passed to bpf_lwt_push_ip_encap() can point to concurrently
> mutable memory such as a BPF map value. Reading iph->protocol and iph->ihl
> directly from hdr here, after length validation has already occurred,
> appears to introduce a Time-of-Check to Time-of-Use race condition.
> 
> If a concurrent modification changes iph->ihl, the calculated transport
> header could point well past the end of the pushed IP header. For small
> SKBs, this could cause skb_transport_header to point out-of-bounds into
> skb_shared_info.
> 
> Would it be safer to perform these reads after copying the header into the
> SKB, using the copied, stable values instead?
> 

Agreed.

Will move skb_set_transport_header() after memcpy().

> [Severity: Critical]
> This is a pre-existing issue and was not introduced by the patch under
> review, but is there a missing validation for the minimum IPv4 header
> length before ip_fast_csum() is called?
> 
> Earlier in the function, the length validation checks:
> 
>     if (unlikely(len < iph->ihl * 4))
>         return -EINVAL;
> 
> If a BPF program provides an IP header with ihl = 0, the check len < 0
> evaluates to false because len is at least 20.
> 
> Later, ip_fast_csum() is called with ihl = 0. On architectures like MIPS,
> the ip_fast_csum() implementation reads 16 bytes and then uses a loop:
> 
>     do { ... word++ } while (word != stop);
> 
> where stop = iph + ihl. If ihl = 0, stop is physically behind word,
> causing the loop to endlessly increment through the address space until
> it page faults.
> 
> Does this code need a check to ensure iph->ihl >= 5?
> 

I'm not sure about this case, but the check "len < iph->ihl * 4" can be
updated to "iph->ihl < 5 || len < iph->ihl * 4".

> [Severity: Medium]
> Does this code verify that the packet length actually contains the full UDP
> header?
> 
> This updates the transport header for UDP tunnels based solely on the IP
> protocol, but it doesn't seem to validate that len actually encompasses
> the full struct udphdr.
> 
> If an IP header is pushed with IPPROTO_UDP but a truncated length (like
> len = 20), the transport header is set to the end of the IP header. A NIC
> driver might then attempt to parse the subsequent memory as a UDP header,
> which could lead to broken offloads or out-of-bounds reads.
> 

Agreed.

Will check that len includes the UDP header.

Thanks,
Leon

>> +	else if (!ipv4 && ((struct ipv6hdr *)hdr)->nexthdr == NEXTHDR_UDP /* UDP tunnel */)
>> +		skb_set_transport_header(skb, skb_network_offset(skb) + sizeof(struct ipv6hdr));
>>  	memcpy(skb_network_header(skb), hdr, len);
>>  	bpf_compute_data_pointers(skb);
>>  	skb_clear_hash(skb);
> 



* Re: [PATCH bpf 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt
  2026-05-25 15:17   ` sashiko-bot
@ 2026-05-27 13:40     ` Leon Hwang
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Hwang @ 2026-05-27 13:40 UTC
  To: sashiko-reviews; +Cc: bpf

On 2026/5/25 23:17, sashiko-bot@kernel.org wrote:
[...]
>> +	 * Ethernet frame uses a broadcast dst MAC so the IP stack accepts it
>> +	 * without ARP or FDB configuration.
>> +	 */
>> +	SYS(fail_close, "ip link add vxlan4 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning noudpcsum",
> [ ... ]
>> +	return -1;
>> +}
>> +
>> +/* VxLAN encap tests (IPv4-outer and IPv6-outer variants).
> 
> [Severity: Low]
> This isn't a bug, but this comment block also places text on the same line as
> the opening /*.
> 

Will update all the comments to follow the BPF comment style.

Thanks,
Leon

>> + *
>> + * Test 1 - functional: the BPF LWT xmit program encapsulates the packet
> [ ... ]
> 


