From: "Emil Tsalapatis" <emil@etsalapatis.com>
To: "Leon Hwang" <leon.hwang@linux.dev>, <bpf@vger.kernel.org>
Cc: "David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"Jiri Olsa" <jolsa@kernel.org>, "Shuah Khan" <shuah@kernel.org>,
"Guillaume Nault" <gnault@redhat.com>,
"Ido Schimmel" <idosch@nvidia.com>,
"Fernando Fernandez Mancera" <fmancera@suse.de>,
"Peter Oskolkov" <posk@google.com>,
<linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>,
<linux-kselftest@vger.kernel.org>, <kernel-patches-bot@fb.com>,
"Leon Hwang" <leon.huangfu@shopee.com>
Subject: Re: [PATCH bpf v3 2/2] selftests/bpf: Add tests to verify the fix of encapsulating VxLAN in lwt
Date: Mon, 01 Jun 2026 14:24:43 -0400
Message-ID: <DIXX90BMJRXM.1XKUMNSDY9OO@etsalapatis.com>
In-Reply-To: <20260601150203.20352-3-leon.hwang@linux.dev>
On Mon Jun 1, 2026 at 11:02 AM EDT, Leon Hwang wrote:
> Add two tests to verify the transport header of skb has been set when
> encapsulate VxLAN using bpf_lwt_push_encap() helper.
>
> 1. VxLAN over IPv4.
> 2. VxLAN over IPv6.
>
> Without the fix, the tests would fail:
>
> lwt_ip_encap_vxlan:FAIL:transport_hdr offset unexpected transport_hdr offset: actual 70 != expected 20
> #208 lwt_ip_encap_vxlan_ipv4:FAIL
> lwt_ip_encap_vxlan:FAIL:transport_hdr offset unexpected transport_hdr offset: actual 110 != expected 40
> #209 lwt_ip_encap_vxlan_ipv6:FAIL
>
> Assisted-by: Claude:claude-sonnet-4-6
> Cc: Leon Hwang <leon.huangfu@shopee.com>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> .../selftests/bpf/prog_tests/lwt_ip_encap.c | 164 ++++++++++++++++++
> .../selftests/bpf/progs/test_lwt_ip_encap.c | 112 ++++++++++++
> .../bpf/progs/test_lwt_ip_encap_fix.c | 44 +++++
> 3 files changed, 320 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
> index b6391af5f6f9..50104d847fde 100644
> --- a/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
> +++ b/tools/testing/selftests/bpf/prog_tests/lwt_ip_encap.c
> @@ -1,8 +1,11 @@
> // SPDX-License-Identifier: GPL-2.0-only
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> #include <netinet/in.h>
>
> #include "network_helpers.h"
> #include "test_progs.h"
> +#include "test_lwt_ip_encap_fix.skel.h"
>
> #define BPF_FILE "test_lwt_ip_encap.bpf.o"
>
> @@ -35,6 +38,10 @@
> #define IP6_ADDR_SRC IP6_ADDR_1
> #define IP6_ADDR_DST IP6_ADDR_4
>
> +/* VxLAN tunnel endpoints, reachable via the bottom route (veth5/6/7/8). */
> +#define IP4_ADDR_VXLAN "172.16.17.100"
> +#define IP6_ADDR_VXLAN "fb20::1"
There's a whole series of existing IP definitions at the top of the file;
we should put these new ones right below them. Also fb20 -> fb11 to keep
with the pattern, imo.
> +
> /* Setup/topology:
> *
> * NS1 NS2 NS3
> @@ -538,3 +545,160 @@ void test_lwt_ip_encap_ipv4(void)
> if (test__start_subtest("ingress"))
> lwt_ip_encap(IPV4_ENCAP, INGRESS, "");
> }
> +
> +/*
> + * VxLAN Setup/topology:
> + *
> + * NS1 (IP*_ADDR_1) NS2 NS3 (IP*_ADDR_4)
> + * [ping src]
> + * | top route
> + * veth1 (LWT encap) <<-- veth2 veth3 <<-- veth4 (ping dst)
> + * | ^
> + * (bottom route) | (inner pkt)
> + * v bottom route |
> + * veth5 -->> veth6 veth7 -->> veth8 (vxlan decap)
> + * (IP*_ADDR_VXLAN)
> + *
Not sure if this is rendering weird for me, but NS2 could be tabbed over
once more for clarity.
> + * Add the VxLAN endpoint addresses to NS3's veth8, create standard
> + * VxLAN decap devices bound to those addresses, and install routes so
> + * NS1/NS2 can reach the endpoints via the bottom route.
> + */
> +static int setup_vxlan_routes(const char *ns3, const char *ns1, const char *ns2)
> +{
> + struct nstoken *nstoken;
> +
> + nstoken = open_netns(ns3);
> + if (!ASSERT_OK_PTR(nstoken, "open ns3 for vxlan"))
> + return -1;
> +
> + SYS(fail_close, "ip a add %s/32 dev veth8", IP4_ADDR_VXLAN);
> + SYS(fail_close, "ip -6 a add %s/128 dev veth8", IP6_ADDR_VXLAN);
> + /*
> + * Standard VxLAN devices to decap the encapsulated packets. The inner
> + * Ethernet frame uses a broadcast dst MAC so the IP stack accepts it
> + * without ARP or FDB configuration.
> + */
> + SYS(fail_close, "ip link add vxlan4 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning noudpcsum",
> + IP4_ADDR_VXLAN);
> + SYS(fail_close, "ip link set vxlan4 up");
> + SYS(fail_close, "ip link add vxlan6 type vxlan id 1 dstport 4789 local %s dev veth8 nolearning udp6zerocsumrx",
> + IP6_ADDR_VXLAN);
> + SYS(fail_close, "ip link set vxlan6 up");
> + close_netns(nstoken);
> +
> + SYS(fail, "ip -n %s route add %s/32 dev veth5 via %s",
> + ns1, IP4_ADDR_VXLAN, IP4_ADDR_6);
> + SYS(fail, "ip -n %s route add %s/32 dev veth7 via %s",
> + ns2, IP4_ADDR_VXLAN, IP4_ADDR_8);
> + SYS(fail, "ip -n %s -6 route add %s/128 dev veth5 via %s",
> + ns1, IP6_ADDR_VXLAN, IP6_ADDR_6);
> + SYS(fail, "ip -n %s -6 route add %s/128 dev veth7 via %s",
> + ns2, IP6_ADDR_VXLAN, IP6_ADDR_8);
> + return 0;
> +
> +fail_close:
> + close_netns(nstoken);
> +fail:
> + return -1;
> +}
> +
> +/*
> + * VxLAN encap tests (IPv4-outer and IPv6-outer variants).
> + *
> + * Test 1 - functional: the BPF LWT xmit program encapsulates the packet
> + * (protocol=UDP, port=4789) and re-routes it without dropping it.
> + * Verified by ping success.
> + *
> + * Test 2 - fix verification: after bpf_lwt_push_ip_encap() the
> + * skb->transport_header must point at the outer UDP header, i.e.
> + * transport_header - network_header == sizeof(outer IP header).
> + * Without the fix the transport_header still points at the inner
> + * transport layer, giving a wrong (larger) offset.
> + */
The Test 2 bullet regurgitates the AI's context. Can you rephrase it so
that it's framed as a test rather than as a fix?
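For reference, the offsets in the failure log in the commit message look
consistent with what the test checks. A rough sketch of the arithmetic,
assuming the usual header sizes (20-byte IPv4 header without options,
40-byte IPv6, 8-byte UDP, 8-byte VxLAN, 14-byte Ethernet) and an inner
packet of the same IP version as the outer one. The enum and function
names are made up for illustration and are not part of the patch:

```c
#include <assert.h>

/* Assumed header sizes in bytes; illustrative, not taken from the patch. */
enum {
	HLEN_IPV4  = 20,	/* struct iphdr, no options */
	HLEN_IPV6  = 40,	/* struct ipv6hdr */
	HLEN_UDP   = 8,
	HLEN_VXLAN = 8,
	HLEN_ETH   = 14,
};

/* Offset the test expects: transport_header right after the outer IP header. */
static int expected_thdr_offset(int outer_ip_hlen)
{
	return outer_ip_hlen;
}

/*
 * Offset if transport_header were left at the inner L4 header: the whole
 * pushed encap header plus the inner IP header sit in between.
 */
static int stale_thdr_offset(int outer_ip_hlen, int inner_ip_hlen)
{
	return outer_ip_hlen + HLEN_UDP + HLEN_VXLAN + HLEN_ETH + inner_ip_hlen;
}

/* Hypothetical self-check against the values in the failure log. */
static void check_offsets(void)
{
	assert(expected_thdr_offset(HLEN_IPV4) == 20);
	assert(expected_thdr_offset(HLEN_IPV6) == 40);
	assert(stale_thdr_offset(HLEN_IPV4, HLEN_IPV4) == 70);
	assert(stale_thdr_offset(HLEN_IPV6, HLEN_IPV6) == 110);
}
```

So the comment could probably just state the expected relation
(transport_header - network_header == size of the outer IP header) and
leave the before/after story to the commit message.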
> +static void lwt_ip_encap_vxlan(bool ipv4_encap)
> +{
> + char ns1[NETNS_NAME_SIZE] = NETNS_BASE "-1-";
> + char ns2[NETNS_NAME_SIZE] = NETNS_BASE "-2-";
> + char ns3[NETNS_NAME_SIZE] = NETNS_BASE "-3-";
> + const char *sec = ipv4_encap ? "encap_vxlan" : "encap_vxlan6";
> + int expected_offset = ipv4_encap ? (int)sizeof(struct iphdr)
> + : (int)sizeof(struct ipv6hdr);
> + struct test_lwt_ip_encap_fix *skel = NULL;
> + int thdr_offset, err;
> +
> + if (!ASSERT_OK(create_ns(ns1, NETNS_NAME_SIZE), "create ns1"))
> + goto out;
> + if (!ASSERT_OK(create_ns(ns2, NETNS_NAME_SIZE), "create ns2"))
> + goto out;
> + if (!ASSERT_OK(create_ns(ns3, NETNS_NAME_SIZE), "create ns3"))
> + goto out;
> +
> + if (!ASSERT_OK(setup_network(ns1, ns2, ns3, ""), "setup network"))
> + goto out;
> +
> + if (!ASSERT_OK(setup_vxlan_routes(ns3, ns1, ns2), "setup vxlan routes"))
> + goto out;
> +
> + /*
> + * Attach fexit to bpf_lwt_push_ip_encap() before installing the
> + * LWT route so we don't miss the first encap call.
> + */
> + skel = test_lwt_ip_encap_fix__open();
> + if (!ASSERT_OK_PTR(skel, "test_lwt_ip_encap_fix__open"))
> + goto out;
> +
> + skel->rodata->tgt_ip_version = ipv4_encap ? 4 : 6;
> +
> + err = test_lwt_ip_encap_fix__load(skel);
> + if (!ASSERT_OK(err, "test_lwt_ip_encap_fix__load"))
> + goto out;
> +
> + err = test_lwt_ip_encap_fix__attach(skel);
> + if (!ASSERT_OK(err, "test_lwt_ip_encap_fix__attach"))
> + goto out;
> +
> + /* Remove the direct NS2->DST route so packets must go via LWT encap. */
> + SYS(out, "ip -n %s route del %s/32 dev veth3", ns2, IP4_ADDR_DST);
> + SYS(out, "ip -n %s -6 route del %s/128 dev veth3", ns2, IP6_ADDR_DST);
> +
> + /* Install the VxLAN BPF LWT xmit route. */
> + if (ipv4_encap)
> + SYS(out, "ip -n %s route add %s encap bpf xmit obj %s sec %s dev veth1",
> + ns1, IP4_ADDR_DST, BPF_FILE, sec);
> + else
> + SYS(out, "ip -n %s -6 route add %s encap bpf xmit obj %s sec %s dev veth1",
> + ns1, IP6_ADDR_DST, BPF_FILE, sec);
> +
> + skel->bss->fexit_triggered = false;
> + if (ipv4_encap)
> + SYS(out, "ip netns exec %s ping -c 1 -W1 %s", ns1, IP4_ADDR_DST);
> + else
> + SYS(out, "ip netns exec %s ping6 -c 1 -W1 %s", ns1, IP6_ADDR_DST);
> +
> + /* Test 1: fexit triggered means bpf_lwt_push_ip_encap() succeeded. */
> + if (!ASSERT_TRUE(skel->bss->fexit_triggered, "fexit_triggered"))
> + goto out;
> +
> + /*
> + * Test 2: transport_header must sit immediately after the outer IP
> + * header, pointing at the UDP header of the VxLAN encap.
> + */
> + thdr_offset = (int)skel->bss->transport_hdr - (int)skel->bss->network_hdr;
> + ASSERT_EQ(thdr_offset, expected_offset, "transport_hdr offset");
> +
> +out:
> + test_lwt_ip_encap_fix__destroy(skel);
> + SYS_NOFAIL("ip netns del %s", ns1);
> + SYS_NOFAIL("ip netns del %s", ns2);
> + SYS_NOFAIL("ip netns del %s", ns3);
> +}
> +
> +void test_lwt_ip_encap_vxlan_ipv4(void)
> +{
> + lwt_ip_encap_vxlan(IPV4_ENCAP);
> +}
> +
> +void test_lwt_ip_encap_vxlan_ipv6(void)
> +{
> + lwt_ip_encap_vxlan(IPV6_ENCAP);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
> index d6cb986e7533..36f0fc682ffb 100644
> --- a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
> +++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap.c
> @@ -2,8 +2,10 @@
> #include <stddef.h>
> #include <string.h>
> #include <linux/bpf.h>
> +#include <linux/if_ether.h>
> #include <linux/ip.h>
> #include <linux/ipv6.h>
> +#include <linux/udp.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_endian.h>
>
> @@ -82,4 +84,114 @@ int bpf_lwt_encap_gre6(struct __sk_buff *skb)
> return BPF_LWT_REROUTE;
> }
>
> +struct vxlanhdr {
> + __be32 vx_flags; /* I flag = 0x08000000 (valid VNI) */
> + __be32 vx_vni; /* VNI in top 24 bits */
> +};
> +
> +#define VXLAN_PORT 4789
> +#define VXLAN_FLAGS 0x08000000
> +#define VXLAN_VNI 1
> +
> +static const __u8 bcast[ETH_ALEN] = {
> + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
> +};
> +
> +static const __u8 srcmac[ETH_ALEN] = {
> + 0x02, 0x00, 0x00, 0x00, 0x00, 0x01,
> +};
> +
> +SEC("encap_vxlan")
> +int bpf_lwt_encap_vxlan(struct __sk_buff *skb)
> +{
> + struct encap_hdr {
> + struct iphdr iph;
> + struct udphdr udph;
> + struct vxlanhdr vxh;
> + struct ethhdr eth;
> + } __attribute__((__packed__)) /* packed is required to avoid padding */ hdr;
Comment is unnecessary here.
> + int err;
> +
> + memset(&hdr, 0, sizeof(hdr));
> +
> + hdr.iph.ihl = 5;
> + hdr.iph.version = 4;
> + hdr.iph.ttl = 0x40;
> + hdr.iph.protocol = 17; /* IPPROTO_UDP */
> + hdr.iph.tot_len = bpf_htons(skb->len + sizeof(hdr));
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> + hdr.iph.saddr = 0x640510ac; /* 172.16.5.100 */
> + hdr.iph.daddr = 0x641110ac; /* 172.16.17.100 */
Ideally we want to keep the addresses we are hardcoding here in sync
with the addresses we're declaring in the userspace part. Here we've
hardcoded the addresses in three different ways (be, le, and string
in the userspace part).
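One possible direction, as a userspace sketch only: define each address
once from its octets and derive the network-order value with a byte-swap
helper, so there is a single spelling to compare against the string.
IP4_HOST, VXLAN_SRC4, VXLAN_DST4 and the functions below are
hypothetical names, not something that exists in the selftests today:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical single source of truth: the octets of each address. */
#define IP4_HOST(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

#define VXLAN_SRC4 IP4_HOST(172, 16, 5, 100)
#define VXLAN_DST4 IP4_HOST(172, 16, 17, 100)

/* Network-order value suitable for iph.saddr/iph.daddr on any endianness. */
static uint32_t ip4_net(uint32_t host_order)
{
	return htonl(host_order);
}

/* Returns 1 if the octet form and the dotted-quad string agree. */
static int ip4_matches(uint32_t host_order, const char *str)
{
	struct in_addr parsed;

	if (inet_pton(AF_INET, str, &parsed) != 1)
		return 0;
	return parsed.s_addr == ip4_net(host_order);
}

/* Hypothetical self-check against the strings used in the userspace part. */
static void check_addrs(void)
{
	assert(ip4_matches(VXLAN_SRC4, "172.16.5.100"));
	assert(ip4_matches(VXLAN_DST4, "172.16.17.100"));
}
```

In the BPF program, bpf_htonl() (already used in this patch for the
VxLAN header) should be able to take the place of htonl(), which would
also make the #if/#elif on __BYTE_ORDER__ unnecessary.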
> +#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> + hdr.iph.saddr = 0xac100564; /* 172.16.5.100 */
> + hdr.iph.daddr = 0xac101164; /* 172.16.17.100 */
> +#else
> +#error "Fix your compiler's __BYTE_ORDER__?!"
> +#endif
> +
> + hdr.udph.source = bpf_htons(VXLAN_PORT);
> + hdr.udph.dest = bpf_htons(VXLAN_PORT);
> + hdr.udph.len = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
> + sizeof(hdr.eth));
> +
> + hdr.vxh.vx_flags = bpf_htonl(VXLAN_FLAGS);
> + hdr.vxh.vx_vni = bpf_htonl(VXLAN_VNI << 8);
> +
> + __builtin_memcpy(hdr.eth.h_dest, bcast, ETH_ALEN);
> + __builtin_memcpy(hdr.eth.h_source, srcmac, ETH_ALEN);
> + hdr.eth.h_proto = bpf_htons(ETH_P_IP);
> +
> + err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr));
> + if (err)
> + return BPF_DROP;
> +
> + return BPF_LWT_REROUTE;
> +}
> +
> +SEC("encap_vxlan6")
> +int bpf_lwt_encap_vxlan6(struct __sk_buff *skb)
> +{
> + struct encap_hdr {
> + struct ipv6hdr ip6hdr;
> + struct udphdr udph;
> + struct vxlanhdr vxh;
> + struct ethhdr eth;
> + } __attribute__((__packed__)) /* packed is required to avoid padding */ hdr;
> + int err;
> +
> + memset(&hdr, 0, sizeof(hdr));
> +
> + hdr.ip6hdr.version = 6;
> + hdr.ip6hdr.nexthdr = 17; /* IPPROTO_UDP */
> + hdr.ip6hdr.hop_limit = 0x40;
> + hdr.ip6hdr.payload_len = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
> + sizeof(hdr.eth));
> + /* fb05::1 */
> + hdr.ip6hdr.saddr.s6_addr[0] = 0xfb;
> + hdr.ip6hdr.saddr.s6_addr[1] = 0x05;
> + hdr.ip6hdr.saddr.s6_addr[15] = 1;
> + /* fb20::1 */
> + hdr.ip6hdr.daddr.s6_addr[0] = 0xfb;
> + hdr.ip6hdr.daddr.s6_addr[1] = 0x20;
> + hdr.ip6hdr.daddr.s6_addr[15] = 1;
> +
> + hdr.udph.source = bpf_htons(VXLAN_PORT);
> + hdr.udph.dest = bpf_htons(VXLAN_PORT);
> + hdr.udph.len = bpf_htons(skb->len + sizeof(hdr.udph) + sizeof(hdr.vxh) +
> + sizeof(hdr.eth));
> +
> + hdr.vxh.vx_flags = bpf_htonl(VXLAN_FLAGS);
> + hdr.vxh.vx_vni = bpf_htonl(VXLAN_VNI << 8);
> +
> + __builtin_memcpy(hdr.eth.h_dest, bcast, ETH_ALEN);
> + __builtin_memcpy(hdr.eth.h_source, srcmac, ETH_ALEN);
> + hdr.eth.h_proto = bpf_htons(ETH_P_IPV6);
> +
> + err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr, sizeof(hdr));
> + if (err)
> + return BPF_DROP;
> +
> + return BPF_LWT_REROUTE;
> +}
> +
> char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
We definitely need to change the name here; this is straight from the AI.
> new file mode 100644
> index 000000000000..6945f83b94f2
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_lwt_ip_encap_fix.c
> @@ -0,0 +1,44 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +#define NEXTHDR_UDP 17 /* UDP message. */
Unnecessary, we hardcode 17 in the other test.
> +
> +volatile const int tgt_ip_version;
> +
> +__u16 transport_hdr = 0;
> +__u16 network_hdr = 0;
> +bool fexit_triggered = false;
> +
> +/*
> + * bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
> + *
> + * After a successful push the transport_header must point at the outer
> + * transport header (UDP for VxLAN), i.e.
> + * transport_header - network_header == sizeof(outer IP header)
> + */
> +SEC("fexit/bpf_lwt_push_ip_encap")
> +int BPF_PROG(fexit_lwt_push_ip_encap, struct sk_buff *skb, void *hdr, u32 len, bool ingress,
> + int retval)
> +{
> + struct iphdr *iph;
> +
> + if (retval || fexit_triggered)
> + return 0;
> +
> + iph = (typeof(iph)) (skb->head + skb->network_header);
> + if (iph->version != tgt_ip_version)
> + return 0;
> +
> + if ((iph->version == 4 && iph->protocol == IPPROTO_UDP) ||
> + (iph->version == 6 && ((struct ipv6hdr *)iph)->nexthdr == NEXTHDR_UDP)) {
> + fexit_triggered = true;
> + transport_hdr = BPF_CORE_READ(skb, transport_header);
> + network_hdr = BPF_CORE_READ(skb, network_header);
> + }
> + return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";