BPF List
 help / color / mirror / Atom feed
* [PATCH net v3 0/2] Drop skb metadata before LWT encapsulation
@ 2026-06-19 17:09 Jakub Sitnicki
  2026-06-19 17:09 ` [PATCH net v3 1/2] net: lwtunnel: " Jakub Sitnicki
  2026-06-19 17:09 ` [PATCH net v3 2/2] selftests/bpf: Add LWT encap tests for skb metadata Jakub Sitnicki
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Sitnicki @ 2026-06-19 17:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau
  Cc: netdev, bpf, kernel-team

See description for patch 1.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Changes in v3:
- Clear metadata for non-BPF LWT encaps as well (Sashiko)
- Add selftests for LWT encap + XDP metadata
- Link to v2: https://lore.kernel.org/r/20260514-bpf-lwt-drop-skb-metadata-v2-1-458664edc2b5@cloudflare.com

Changes in v2:
- Clear metadata in bpf_xmit to allow access from tc(x) egress (Daniel)
- Add WARNING snippet to the description
- Link to v1: https://lore.kernel.org/r/20260428-wip-skb-local-storage-from-scratch-v1-1-8f7ca9b378ce@cloudflare.com

---
Jakub Sitnicki (2):
      net: lwtunnel: Drop skb metadata before LWT encapsulation
      selftests/bpf: Add LWT encap tests for skb metadata

 net/core/lwtunnel.c                                |   6 +
 tools/testing/selftests/bpf/config                 |   3 +
 .../bpf/prog_tests/xdp_context_test_run.c          | 175 +++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_xdp_meta.c  | 123 +++++++++------
 4 files changed, 255 insertions(+), 52 deletions(-)


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net v3 1/2] net: lwtunnel: Drop skb metadata before LWT encapsulation
  2026-06-19 17:09 [PATCH net v3 0/2] Drop skb metadata before LWT encapsulation Jakub Sitnicki
@ 2026-06-19 17:09 ` Jakub Sitnicki
  2026-06-20 17:09   ` sashiko-bot
  2026-06-19 17:09 ` [PATCH net v3 2/2] selftests/bpf: Add LWT encap tests for skb metadata Jakub Sitnicki
  1 sibling, 1 reply; 5+ messages in thread
From: Jakub Sitnicki @ 2026-06-19 17:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau
  Cc: netdev, bpf, kernel-team

skb metadata is meant for passing information between XDP and TC. It lives
in the skb headroom, immediately before skb->data. LWT programs cannot
access the __sk_buff->data_meta pseudo-pointer to metadata.

However, LWT encapsulation prepends outer headers, moving skb->data back
over the headroom where the metadata sits. On an RX-originated (forwarded)
packet that still carries XDP metadata this goes wrong in two different
ways, depending on the encap type:

1. Non-BPF LWT encaps (mpls, seg6, ioam6 ...) call skb_push()/skb_pull()
   and silently overwrite the metadata that sits in the headroom.

2) BPF LWT xmit calls bpf_skb_change_head(), which uses skb_data_move().
   That helper expects metadata immediately before skb->data. But since
   the IP output path runs LWT xmit before neighbour output has built
   the outgoing L2 header, for forwarded packets skb->data points at the
   L3 header while skb_mac_header() still points at the old L2 header.
   skb_data_move() sees metadata ending at skb_mac_header(), not before
   skb->data, warns and clears metadata:

  WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
  CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G           O        6.18.21 #1
  RIP: 0010:skb_data_move+0x47/0x90
  Call Trace:
   <IRQ>
   bpf_skb_change_head+0xe6/0x1a0
   bpf_prog_...+0x213/0x2e3
   run_lwt_bpf.isra.0+0x1d3/0x360
   bpf_xmit+0x46/0xe0
   lwtunnel_xmit+0xa1/0xf0
   ip_finish_output2+0x1e7/0x5e0
   ip_output+0x63/0x100
   __netif_receive_skb_one_core+0x85/0xa0
   process_backlog+0x9c/0x150
   __napi_poll+0x2b/0x190
   net_rx_action+0x40b/0x7f0
   handle_softirqs+0xd2/0x270
   do_softirq+0x3f/0x60
   </IRQ>

That is what happens, as for how to fix it - a received packet that
carries metadata can reach an encap through any of the three LWT
redirect modes:

  LWTUNNEL_STATE_INPUT_REDIRECT
   ip6_rcv_finish
     dst_input
       lwtunnel_input

  LWTUNNEL_STATE_OUTPUT_REDIRECT
   ip6_rcv_finish
     dst_input
       ip6_forward
         ip6_forward_finish
           dst_output
             lwtunnel_output

  LWTUNNEL_STATE_XMIT_REDIRECT
   ip6_rcv_finish
     dst_input
       ip6_forward
         ip6_forward_finish
           dst_output
             ip6_output
               ip6_finish_output
                 ip6_finish_output2
                   lwtunnel_xmit

Every encap funnels through the three LWT dispatch helpers, so drop the
metadata there, right before handing the skb to the encap op. This
single chokepoint covers all encap types and all three redirect modes:

  - lwtunnel_input():  seg6, rpl, ila, seg6_local
  - lwtunnel_output(): ioam6
  - lwtunnel_xmit():   mpls, LWT BPF xmit

Alternatively, we could clear the metadata right after TC ingress hook.
That would require a compromise, however. Metadata would become
inaccessible from TC egress (in setups where it actually reaches the
hook it tact, that is without any L2 tunnels on path).

Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/lwtunnel.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index f9d76d85d04f..b01a395d9a96 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -350,6 +350,8 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	rcu_read_lock();
 	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
 	if (likely(ops && ops->output)) {
+		/* Encap pushes outer headers over the metadata; drop it. */
+		skb_metadata_clear(skb);
 		dev_xmit_recursion_inc();
 		ret = ops->output(net, sk, skb);
 		dev_xmit_recursion_dec();
@@ -404,6 +406,8 @@ int lwtunnel_xmit(struct sk_buff *skb)
 	rcu_read_lock();
 	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
 	if (likely(ops && ops->xmit)) {
+		/* Encap pushes outer headers over the metadata; drop it. */
+		skb_metadata_clear(skb);
 		dev_xmit_recursion_inc();
 		ret = ops->xmit(skb);
 		dev_xmit_recursion_dec();
@@ -455,6 +459,8 @@ int lwtunnel_input(struct sk_buff *skb)
 	rcu_read_lock();
 	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
 	if (likely(ops && ops->input)) {
+		/* Encap pushes outer headers over the metadata; drop it. */
+		skb_metadata_clear(skb);
 		dev_xmit_recursion_inc();
 		ret = ops->input(skb);
 		dev_xmit_recursion_dec();

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net v3 2/2] selftests/bpf: Add LWT encap tests for skb metadata
  2026-06-19 17:09 [PATCH net v3 0/2] Drop skb metadata before LWT encapsulation Jakub Sitnicki
  2026-06-19 17:09 ` [PATCH net v3 1/2] net: lwtunnel: " Jakub Sitnicki
@ 2026-06-19 17:09 ` Jakub Sitnicki
  2026-06-20 17:09   ` sashiko-bot
  1 sibling, 1 reply; 5+ messages in thread
From: Jakub Sitnicki @ 2026-06-19 17:09 UTC (permalink / raw)
  To: Daniel Borkmann, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Martin KaFai Lau
  Cc: netdev, bpf, kernel-team

Test that an LWT encapsulation does not silently corrupt XDP metadata
sitting in the skb headroom. Exercise all three LWT dispatch paths:

- BPF LWT xmit prog reserves headroom on the LWT .xmit redirect,
- mpls pushes an MPLS label on the LWT .xmit redirect,
- seg6 in encap mode runs on the LWT .input redirect,
- ioam6 encap inserts an IOAM Hop-by-Hop option on LWT .output redirect.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/config                 |   3 +
 .../bpf/prog_tests/xdp_context_test_run.c          | 175 +++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_xdp_meta.c  | 123 +++++++++------
 3 files changed, 249 insertions(+), 52 deletions(-)

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index bac60b444551..adb25146e88c 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -45,13 +45,16 @@ CONFIG_IPV6=y
 CONFIG_IPV6_FOU=y
 CONFIG_IPV6_FOU_TUNNEL=y
 CONFIG_IPV6_GRE=y
+CONFIG_IPV6_IOAM6_LWTUNNEL=y
 CONFIG_IPV6_SEG6_BPF=y
+CONFIG_IPV6_SEG6_LWTUNNEL=y
 CONFIG_IPV6_SIT=y
 CONFIG_IPV6_TUNNEL=y
 CONFIG_KEYS=y
 CONFIG_LIRC=y
 CONFIG_LIVEPATCH=y
 CONFIG_LWTUNNEL=y
+CONFIG_LWTUNNEL_BPF=y
 CONFIG_MODULE_SIG=y
 CONFIG_MODULE_SRCVERSION_ALL=y
 CONFIG_MODULE_UNLOAD=y
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
index 26159e0499c7..448807676176 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <linux/ipv6.h>
+#include <arpa/inet.h>
 #include "test_xdp_context_test_run.skel.h"
 #include "test_xdp_meta.skel.h"
 
@@ -8,9 +10,12 @@
 #define TX_NAME "veth1"
 #define TX_NETNS "xdp_context_tx"
 #define RX_NETNS "xdp_context_rx"
+#define RX_MAC "02:00:00:00:00:01"
+#define TX_MAC "02:00:00:00:00:02"
 #define TAP_NAME "tap0"
 #define DUMMY_NAME "dum0"
 #define TAP_NETNS "xdp_context_tuntap"
+#define LWT_NETNS "xdp_context_lwt"
 
 #define TEST_PAYLOAD_LEN 32
 static const __u8 test_payload[TEST_PAYLOAD_LEN] = {
@@ -187,6 +192,42 @@ static int write_test_packet(int tap_fd)
 	return 0;
 }
 
+/* Inject Ethernet+IPv6+UDP frame into TAP */
+static int write_test_packet_udp(int tap_fd)
+{
+	__u8 pkt[sizeof(struct ethhdr) + sizeof(struct ipv6hdr) +
+		 sizeof(struct udphdr) + TEST_PAYLOAD_LEN] = {};
+	struct ethhdr *eth = (void *)pkt;
+	struct ipv6hdr *ip6 = (void *)(eth + 1);
+	struct udphdr *udp = (void *)(ip6 + 1);
+	__u8 *payload = (void *)(udp + 1);
+	const __u8 tap_mac[ETH_ALEN] = { 0x02, 0, 0, 0, 0, 0x01 };
+	int n;
+
+	memcpy(eth->h_dest, tap_mac, ETH_ALEN);
+	eth->h_proto = htons(ETH_P_IPV6);
+
+	ip6->version = 6;
+	ip6->hop_limit = 64;
+	ip6->nexthdr = IPPROTO_UDP;
+	ip6->payload_len = htons(sizeof(*udp) + TEST_PAYLOAD_LEN);
+	inet_pton(AF_INET6, "fd00::2", &ip6->saddr);
+	inet_pton(AF_INET6, "fd00:1::1", &ip6->daddr);
+
+	udp->source = htons(42);
+	udp->dest = htons(42);
+	udp->len = htons(sizeof(*udp) + TEST_PAYLOAD_LEN);
+	/* UDP checksum is not validated on the forwarding path. */
+
+	memcpy(payload, test_payload, TEST_PAYLOAD_LEN);
+
+	n = write(tap_fd, pkt, sizeof(pkt));
+	if (!ASSERT_EQ(n, sizeof(pkt), "write frame"))
+		return -1;
+
+	return 0;
+}
+
 static void dump_err_stream(const struct bpf_program *prog)
 {
 	char buf[512];
@@ -518,3 +559,137 @@ void test_xdp_context_tuntap(void)
 
 	test_xdp_meta__destroy(skel);
 }
+
+/*
+ * Test topology:
+ *
+ *	tap0 fd00::1
+ *	  RX:  injected IPv6 UDP frame, XDP ingress sets metadata
+ *	  fwd: encap route prepends outer header(s)
+ *	  TX:  TC egress validates metadata
+ *
+ * A routable IPv6 UDP frame is written into the tap fd, so it enters the RX
+ * path where XDP stores metadata. Routing then forwards it back out the same
+ * tap through an encapsulating route that prepends outer header(s). The TC
+ * egress program checks that the pushed header did not silently corrupt
+ * metadata.
+ */
+#define LWT_PIN_PATH "/sys/fs/bpf/xdp_context_lwt_xmit"
+
+enum lwt_encap_type {
+	LWT_ENCAP_BPF,
+	LWT_ENCAP_MPLS,
+	LWT_ENCAP_SEG6,
+	LWT_ENCAP_IOAM6,
+};
+
+static void test_lwt_encap(struct test_xdp_meta *skel,
+			   enum lwt_encap_type type)
+{
+	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_EGRESS);
+	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+	struct bpf_program *lwt_prog = NULL;
+	struct netns_obj *ns = NULL;
+	const char *encap;
+	bool pinned = false;
+	int tap_ifindex;
+	int tap_fd = -1;
+	int ret;
+
+	skel->bss->test_pass = false;
+
+	switch (type) {
+	case LWT_ENCAP_BPF:
+		encap = "encap bpf xmit pinned " LWT_PIN_PATH " via fd00::2";
+		lwt_prog = skel->progs.dummy_lwt_xmit;
+		break;
+	case LWT_ENCAP_MPLS:
+		encap = "encap mpls 100 via inet6 fd00::2";
+		break;
+	case LWT_ENCAP_SEG6:
+		encap = "encap seg6 mode encap segs fd00::2";
+		break;
+	case LWT_ENCAP_IOAM6:
+		encap = "encap ioam6 mode encap tundst fd00::2 "
+			"trace prealloc type 0x800000 ns 0 size 4 via fd00::2";
+		break;
+	default:
+		return;
+	}
+
+	if (lwt_prog) {
+		unlink(LWT_PIN_PATH);
+		ret = bpf_program__pin(lwt_prog, LWT_PIN_PATH);
+		if (!ASSERT_OK(ret, "pin lwt prog"))
+			return;
+		pinned = true;
+	}
+
+	ns = netns_new(LWT_NETNS, true);
+	if (!ASSERT_OK_PTR(ns, "netns_new"))
+		goto close;
+
+	tap_fd = open_tuntap(TAP_NAME, true);
+	if (!ASSERT_GE(tap_fd, 0, "open_tuntap"))
+		goto close;
+
+	SYS(close, "ip link set dev " TAP_NAME " address " RX_MAC);
+	SYS(close, "sysctl -wq net.ipv6.conf.all.forwarding=1");
+	SYS(close, "ip addr add fd00::1/64 dev " TAP_NAME " nodad");
+	SYS(close, "ip link set dev " TAP_NAME " up");
+	SYS(close, "ip neigh add fd00::2 lladdr " TX_MAC " nud permanent dev " TAP_NAME);
+	SYS(close, "ip -6 route add fd00:1::/64 %s dev %s", encap, TAP_NAME);
+
+	tap_ifindex = if_nametoindex(TAP_NAME);
+	if (!ASSERT_GE(tap_ifindex, 0, "if_nametoindex"))
+		goto close;
+
+	ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(skel->progs.ing_xdp),
+			     0, NULL);
+	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
+		goto close;
+
+	tc_hook.ifindex = tap_ifindex;
+	ret = bpf_tc_hook_create(&tc_hook);
+	if (!ASSERT_OK(ret, "bpf_tc_hook_create"))
+		goto close;
+
+	tc_opts.prog_fd = bpf_program__fd(skel->progs.tc_is_meta_empty);
+	ret = bpf_tc_attach(&tc_hook, &tc_opts);
+	if (!ASSERT_OK(ret, "bpf_tc_attach"))
+		goto close;
+
+	ret = write_test_packet_udp(tap_fd);
+	if (!ASSERT_OK(ret, "write_test_packet_udp"))
+		goto close;
+
+	if (!ASSERT_TRUE(skel->bss->test_pass, "test_pass"))
+		dump_err_stream(skel->progs.tc_is_meta_empty);
+
+close:
+	if (tap_fd >= 0)
+		close(tap_fd);
+	netns_free(ns);
+	if (pinned)
+		unlink(LWT_PIN_PATH);
+}
+
+void test_xdp_context_lwt_encap(void)
+{
+	struct test_xdp_meta *skel;
+
+	skel = test_xdp_meta__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open and load skeleton"))
+		return;
+
+	if (test__start_subtest("bpf_encap"))
+		test_lwt_encap(skel, LWT_ENCAP_BPF);
+	if (test__start_subtest("mpls_encap"))
+		test_lwt_encap(skel, LWT_ENCAP_MPLS);
+	if (test__start_subtest("seg6_encap"))
+		test_lwt_encap(skel, LWT_ENCAP_SEG6);
+	if (test__start_subtest("ioam6_encap"))
+		test_lwt_encap(skel, LWT_ENCAP_IOAM6);
+
+	test_xdp_meta__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_meta.c b/tools/testing/selftests/bpf/progs/test_xdp_meta.c
index fa73b17cb999..08b03be0b891 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_meta.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_meta.c
@@ -21,10 +21,6 @@
 
 bool test_pass;
 
-static const __u8 smac_want[ETH_ALEN] = {
-	0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF,
-};
-
 static const __u8 meta_want[META_SIZE] = {
 	0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
 	0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18,
@@ -32,11 +28,6 @@ static const __u8 meta_want[META_SIZE] = {
 	0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
 };
 
-static bool check_smac(const struct ethhdr *eth)
-{
-	return !__builtin_memcmp(eth->h_source, smac_want, ETH_ALEN);
-}
-
 static bool check_metadata(const char *file, int line, __u8 *meta_have)
 {
 	if (!__builtin_memcmp(meta_have, meta_want, META_SIZE))
@@ -280,18 +271,47 @@ int ing_cls_dynptr_offset_oob(struct __sk_buff *ctx)
 	return TC_ACT_SHOT;
 }
 
+/* Test packets carry test metadata pattern as payload. */
+static bool is_test_packet_xdp(struct xdp_md *ctx)
+{
+	__u8 meta_have[META_SIZE];
+	__u32 len;
+
+	len = bpf_xdp_get_buff_len(ctx);
+	if (len < META_SIZE)
+		return false;
+	if (bpf_xdp_load_bytes(ctx, len - META_SIZE, meta_have, META_SIZE))
+		return false;
+	if (__builtin_memcmp(meta_have, meta_want, META_SIZE))
+		return false;
+
+	return true;
+}
+
+/* Test packets carry test metadata pattern as payload. */
+static bool is_test_packet_tc(struct __sk_buff *ctx)
+{
+	__u8 meta_have[META_SIZE];
+
+	if (ctx->len < META_SIZE)
+		return false;
+	if (bpf_skb_load_bytes(ctx, ctx->len - META_SIZE, meta_have, META_SIZE))
+		return false;
+	if (__builtin_memcmp(meta_have, meta_want, META_SIZE))
+		return false;
+
+	return true;
+}
+
 /* Reserve and clear space for metadata but don't populate it */
 SEC("xdp")
 int ing_xdp_zalloc_meta(struct xdp_md *ctx)
 {
-	struct ethhdr *eth = ctx_ptr(ctx, data);
 	__u8 *meta;
 	int ret;
 
 	/* Drop any non-test packets */
-	if (eth + 1 > ctx_ptr(ctx, data_end))
-		return XDP_DROP;
-	if (!check_smac(eth))
+	if (!is_test_packet_xdp(ctx))
 		return XDP_DROP;
 
 	ret = bpf_xdp_adjust_meta(ctx, -META_SIZE);
@@ -310,33 +330,24 @@ int ing_xdp_zalloc_meta(struct xdp_md *ctx)
 SEC("xdp")
 int ing_xdp(struct xdp_md *ctx)
 {
-	__u8 *data, *data_meta, *data_end, *payload;
-	struct ethhdr *eth;
+	__u8 *data, *data_meta;
 	int ret;
 
+	/* Drop any non-test packets */
+	if (!is_test_packet_xdp(ctx))
+		return XDP_DROP;
+
 	ret = bpf_xdp_adjust_meta(ctx, -META_SIZE);
 	if (ret < 0)
 		return XDP_DROP;
 
 	data_meta = ctx_ptr(ctx, data_meta);
-	data_end  = ctx_ptr(ctx, data_end);
 	data      = ctx_ptr(ctx, data);
 
-	eth = (struct ethhdr *)data;
-	payload = data + sizeof(struct ethhdr);
-
-	if (payload + META_SIZE > data_end ||
-	    data_meta + META_SIZE > data)
+	if (data_meta + META_SIZE > data)
 		return XDP_DROP;
 
-	/* The Linux networking stack may send other packets on the test
-	 * interface that interfere with the test. Just drop them.
-	 * The test packets can be recognized by their source MAC address.
-	 */
-	if (!check_smac(eth))
-		return XDP_DROP;
-
-	__builtin_memcpy(data_meta, payload, META_SIZE);
+	__builtin_memcpy(data_meta, meta_want, META_SIZE);
 	return XDP_PASS;
 }
 
@@ -353,7 +364,7 @@ int clone_data_meta_survives_data_write(struct __sk_buff *ctx)
 	if (eth + 1 > ctx_ptr(ctx, data_end))
 		goto out;
 	/* Ignore non-test packets */
-	if (!check_smac(eth))
+	if (!is_test_packet_tc(ctx))
 		goto out;
 
 	if (meta_have + META_SIZE > eth)
@@ -383,7 +394,7 @@ int clone_data_meta_survives_meta_write(struct __sk_buff *ctx)
 	if (eth + 1 > ctx_ptr(ctx, data_end))
 		goto out;
 	/* Ignore non-test packets */
-	if (!check_smac(eth))
+	if (!is_test_packet_tc(ctx))
 		goto out;
 
 	if (meta_have + META_SIZE > eth)
@@ -416,7 +427,7 @@ int clone_meta_dynptr_survives_data_slice_write(struct __sk_buff *ctx)
 	if (!eth)
 		goto out;
 	/* Ignore non-test packets */
-	if (!check_smac(eth))
+	if (!is_test_packet_tc(ctx))
 		goto out;
 
 	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
@@ -436,16 +447,11 @@ int clone_meta_dynptr_survives_data_slice_write(struct __sk_buff *ctx)
 SEC("tc")
 int clone_meta_dynptr_survives_meta_slice_write(struct __sk_buff *ctx)
 {
-	struct bpf_dynptr data, meta;
-	const struct ethhdr *eth;
+	struct bpf_dynptr meta;
 	__u8 *meta_have;
 
-	bpf_dynptr_from_skb(ctx, 0, &data);
-	eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
-	if (!eth)
-		goto out;
 	/* Ignore non-test packets */
-	if (!check_smac(eth))
+	if (!is_test_packet_tc(ctx))
 		goto out;
 
 	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
@@ -471,15 +477,10 @@ int clone_meta_dynptr_rw_before_data_dynptr_write(struct __sk_buff *ctx)
 {
 	struct bpf_dynptr data, meta;
 	__u8 meta_have[META_SIZE];
-	const struct ethhdr *eth;
 	int err;
 
-	bpf_dynptr_from_skb(ctx, 0, &data);
-	eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
-	if (!eth)
-		goto out;
 	/* Ignore non-test packets */
-	if (!check_smac(eth))
+	if (!is_test_packet_tc(ctx))
 		goto out;
 
 	/* Expect read-write metadata before unclone */
@@ -492,6 +493,7 @@ int clone_meta_dynptr_rw_before_data_dynptr_write(struct __sk_buff *ctx)
 		goto out;
 
 	/* Helper write to payload will unclone the packet */
+	bpf_dynptr_from_skb(ctx, 0, &data);
 	bpf_dynptr_write(&data, offsetof(struct ethhdr, h_proto), "x", 1, 0);
 
 	err = bpf_dynptr_read(meta_have, META_SIZE, &meta, 0, 0);
@@ -511,17 +513,12 @@ int clone_meta_dynptr_rw_before_data_dynptr_write(struct __sk_buff *ctx)
 SEC("tc")
 int clone_meta_dynptr_rw_before_meta_dynptr_write(struct __sk_buff *ctx)
 {
-	struct bpf_dynptr data, meta;
+	struct bpf_dynptr meta;
 	__u8 meta_have[META_SIZE];
-	const struct ethhdr *eth;
 	int err;
 
-	bpf_dynptr_from_skb(ctx, 0, &data);
-	eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
-	if (!eth)
-		goto out;
 	/* Ignore non-test packets */
-	if (!check_smac(eth))
+	if (!is_test_packet_tc(ctx))
 		goto out;
 
 	/* Expect read-write metadata before unclone */
@@ -545,6 +542,28 @@ int clone_meta_dynptr_rw_before_meta_dynptr_write(struct __sk_buff *ctx)
 	return TC_ACT_SHOT;
 }
 
+SEC("lwt_xmit")
+int dummy_lwt_xmit(struct __sk_buff *ctx)
+{
+	if (bpf_skb_change_head(ctx, sizeof(struct ipv6hdr), 0))
+		return BPF_DROP;
+
+	return BPF_OK;
+}
+
+SEC("tc")
+int tc_is_meta_empty(struct __sk_buff *ctx)
+{
+	if (!is_test_packet_tc(ctx))
+		return TC_ACT_OK;
+
+	if (ctx->data_meta != ctx->data)
+		return TC_ACT_OK;
+
+	test_pass = true;
+	return TC_ACT_OK;
+}
+
 SEC("tc")
 int helper_skb_vlan_push_pop(struct __sk_buff *ctx)
 {

-- 
2.43.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net v3 2/2] selftests/bpf: Add LWT encap tests for skb metadata
  2026-06-19 17:09 ` [PATCH net v3 2/2] selftests/bpf: Add LWT encap tests for skb metadata Jakub Sitnicki
@ 2026-06-20 17:09   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-20 17:09 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Low] The assertion `ASSERT_GE(tap_ifindex, 0)` incorrectly accepts 0, which is the error return value of `if_nametoindex()`, masking interface lookup failures.
--

commit 78aa5c949a7515e0e86daf8bde1669b476228831
Author: Jakub Sitnicki <jakub@cloudflare.com>

selftests/bpf: Add LWT encap tests for skb metadata

This commit adds testing for LWT encapsulation and XDP metadata sitting in the
skb headroom. It exercises LWT dispatch paths including BPF, mpls, seg6, and
ioam6 to ensure outer headers do not silently corrupt metadata.

> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
> index 26159e0499c76..448807676176c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
[ ... ]
> @@ -518,3 +559,137 @@ void test_xdp_context_tuntap(void)
[ ... ]
> +	SYS(close, "ip neigh add fd00::2 lladdr " TX_MAC " nud permanent dev " TAP_NAME);
> +	SYS(close, "ip -6 route add fd00:1::/64 %s dev %s", encap, TAP_NAME);
> +
> +	tap_ifindex = if_nametoindex(TAP_NAME);
> +	if (!ASSERT_GE(tap_ifindex, 0, "if_nametoindex"))
> +		goto close;

[Severity: Low]
Does this properly catch interface lookup failures? 

Looking at if_nametoindex(), it returns 0 on error, so using ASSERT_GE
will evaluate to true when tap_ifindex is 0. This allows the test to mask
the lookup failure and inappropriately proceed with an invalid interface index
to bpf_xdp_attach().

Could this be ASSERT_GT(tap_ifindex, 0, ...) instead?

> +
> +	ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(skel->progs.ing_xdp),
> +			     0, NULL);
> +	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
> +		goto close;
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-bpf-lwt-drop-skb-metadata-v3-0-71d6a33ab76b@cloudflare.com?part=2

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v3 1/2] net: lwtunnel: Drop skb metadata before LWT encapsulation
  2026-06-19 17:09 ` [PATCH net v3 1/2] net: lwtunnel: " Jakub Sitnicki
@ 2026-06-20 17:09   ` sashiko-bot
  0 siblings, 0 replies; 5+ messages in thread
From: sashiko-bot @ 2026-06-20 17:09 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Calling `skb_metadata_clear(skb)` on a potentially cloned SKB modifies the shared `skb_shinfo` structure without exclusive access, leading to a data race and silent data corruption for other clones.
--

commit a6915ae09d5ba82236d77c5574df958bccb552d5
Author: Jakub Sitnicki <jakub@cloudflare.com>

net: lwtunnel: Drop skb metadata before LWT encapsulation

This commit clears skb metadata before lwt encapsulation across all dispatch
helpers. It prevents non-bpf lwt encapsulations from overwriting metadata in
the headroom and stops bpf lwt xmit from triggering warnings due to metadata
position during skb_data_move().

> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
> index f9d76d85d04fd..b01a395d9a966 100644
> --- a/net/core/lwtunnel.c
> +++ b/net/core/lwtunnel.c
> @@ -350,6 +350,8 @@ int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb)
>  	rcu_read_lock();
>  	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
>  	if (likely(ops && ops->output)) {
> +		/* Encap pushes outer headers over the metadata; drop it. */
> +		skb_metadata_clear(skb);

[Severity: High]
Does calling skb_metadata_clear() here modify the shared skb_shinfo
structure without ensuring exclusive access?

Since skb_metadata_clear() modifies skb_shinfo(skb)->meta_len, if the
buffer is cloned (for instance, via tc mirred or multicast forwarding),
this might cause a data race and corrupt the metadata for other clones still
processing the packet.

Should we unshare the skb first to safely modify the shared info?

>  		dev_xmit_recursion_inc();
>  		ret = ops->output(net, sk, skb);
>  		dev_xmit_recursion_dec();

[ ... ]

> @@ -404,6 +406,8 @@ int lwtunnel_xmit(struct sk_buff *skb)
>  	rcu_read_lock();
>  	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
>  	if (likely(ops && ops->xmit)) {
> +		/* Encap pushes outer headers over the metadata; drop it. */
> +		skb_metadata_clear(skb);

[Severity: High]
Could this similarly corrupt shared metadata if the buffer is cloned?

>  		dev_xmit_recursion_inc();
>  		ret = ops->xmit(skb);
>  		dev_xmit_recursion_dec();

[ ... ]

> @@ -455,6 +459,8 @@ int lwtunnel_input(struct sk_buff *skb)
>  	rcu_read_lock();
>  	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
>  	if (likely(ops && ops->input)) {
> +		/* Encap pushes outer headers over the metadata; drop it. */
> +		skb_metadata_clear(skb);

[Severity: High]
Could this also lead to a data race by modifying skb_shinfo on a potentially
shared buffer before handing it to ops->input()?

>  		dev_xmit_recursion_inc();
>  		ret = ops->input(skb);
>  		dev_xmit_recursion_dec();

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260619-bpf-lwt-drop-skb-metadata-v3-0-71d6a33ab76b@cloudflare.com?part=1

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-20 17:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-19 17:09 [PATCH net v3 0/2] Drop skb metadata before LWT encapsulation Jakub Sitnicki
2026-06-19 17:09 ` [PATCH net v3 1/2] net: lwtunnel: " Jakub Sitnicki
2026-06-20 17:09   ` sashiko-bot
2026-06-19 17:09 ` [PATCH net v3 2/2] selftests/bpf: Add LWT encap tests for skb metadata Jakub Sitnicki
2026-06-20 17:09   ` sashiko-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox