BPF List
 help / color / mirror / Atom feed
* [PATCH bpf v3 0/2] bpf: Fix generic devmap egress skb sharing
@ 2026-06-11  4:33 Sun Jian
  2026-06-11  4:33 ` [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb Sun Jian
  2026-06-11  4:33 ` [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite Sun Jian
  0 siblings, 2 replies; 7+ messages in thread
From: Sun Jian @ 2026-06-11  4:33 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
	martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
	jiayuan.chen, toke, menglong.dong, emil, Sun Jian

Generic XDP devmap multi redirect can leave cloned skbs sharing packet
data. When a devmap egress program mutates packet data, another
destination sharing the same data may observe that mutation.

Fix this by making cloned skbs private before running the generic devmap
egress program. Add selftest coverage for the last-destination case,
where the final destination runs on the original skb while earlier
destinations use cloned skbs.

---
v3:
- Split the kernel fix and selftest into separate patches.
- Move the private-copy logic into dev_map_bpf_prog_run_skb().
- Use deterministic DEVMAP_HASH keys in the last-destination selftest.
- Fix the Fixes tag.

v2: https://lore.kernel.org/bpf/08c35c70-a59e-4e0e-91db-22b5ec30b611@linux.dev/T/#m613671b243e8651c47f84bb329f0ea668974c549
- Move the private-copy step into dev_map_generic_redirect() so the
  last-destination path is covered as well.
- Use skb_copy() instead of skb_unshare() to keep caller ownership
  unchanged on allocation failure.
- Add a generic XDP last-destination selftest case.

v1: https://lore.kernel.org/bpf/CABFUUZFimdrZdq=NWi+N-0sJZWvMwY=f4iF6-3TVMS8=m07Zmw@mail.gmail.com/

Sun Jian (2):
  bpf: Run generic devmap egress prog on private skb
  selftests/bpf: Cover generic devmap egress last-dst rewrite

 kernel/bpf/devmap.c                           |  21 ++-
 .../selftests/bpf/prog_tests/test_xdp_veth.c  | 152 +++++++++++++++++-
 2 files changed, 168 insertions(+), 5 deletions(-)


base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
-- 
2.43.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb
  2026-06-11  4:33 [PATCH bpf v3 0/2] bpf: Fix generic devmap egress skb sharing Sun Jian
@ 2026-06-11  4:33 ` Sun Jian
  2026-06-11  4:58   ` sashiko-bot
  2026-06-11  4:33 ` [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite Sun Jian
  1 sibling, 1 reply; 7+ messages in thread
From: Sun Jian @ 2026-06-11  4:33 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
	martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
	jiayuan.chen, toke, menglong.dong, emil, Sun Jian

Generic XDP devmap multi redirect uses skb_clone() for intermediate
destinations and sends the last destination with the original skb. This
can leave multiple destinations sharing the same packet data.

This becomes visible after generic devmap egress-program support was
added: a devmap egress program may mutate packet data, and another
destination sharing the same data can observe that mutation.

Native XDP broadcast redirect does not have this issue because
xdpf_clone() copies the frame data for each destination. Generic XDP
should provide the same per-destination isolation before running a
devmap egress program.

Fix this by making cloned skbs private before running the generic devmap
egress program. Use skb_copy() instead of skb_unshare() so allocation
failure does not consume the skb and the existing caller error paths keep
their ownership semantics.

Fixes: 2ea5eabaf04a ("bpf: devmap: Implement devmap prog execution for generic XDP")
Suggested-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---
 kernel/bpf/devmap.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index cc0a43ebab6b..14506834345a 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -512,8 +512,10 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 	return 0;
 }
 
-static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev *dst)
+static int dev_map_bpf_prog_run_skb(struct sk_buff **pskb,
+				    struct bpf_dtab_netdev *dst)
 {
+	struct sk_buff *skb = *pskb;
 	struct xdp_txq_info txq = { .dev = dst->dev };
 	struct xdp_buff xdp;
 	u32 act;
@@ -521,6 +523,18 @@ static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev
 	if (!dst->xdp_prog)
 		return XDP_PASS;
 
+	if (skb_cloned(skb)) {
+		struct sk_buff *nskb;
+
+		nskb = skb_copy(skb, GFP_ATOMIC);
+		if (!nskb)
+			return -ENOMEM;
+
+		consume_skb(skb);
+		skb = nskb;
+		*pskb = nskb;
+	}
+
 	__skb_pull(skb, skb->mac_len);
 	xdp.txq = &txq;
 
@@ -710,7 +724,10 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
 	 * return 0 even if packet is dropped. Helper below takes care of
 	 * freeing skb.
 	 */
-	if (dev_map_bpf_prog_run_skb(skb, dst) != XDP_PASS)
+	err = dev_map_bpf_prog_run_skb(&skb, dst);
+	if (err < 0)
+		return err;
+	if (err != XDP_PASS)
 		return 0;
 
 	skb->dev = dst->dev;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite
  2026-06-11  4:33 [PATCH bpf v3 0/2] bpf: Fix generic devmap egress skb sharing Sun Jian
  2026-06-11  4:33 ` [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb Sun Jian
@ 2026-06-11  4:33 ` Sun Jian
  2026-06-11  4:47   ` sashiko-bot
  1 sibling, 1 reply; 7+ messages in thread
From: Sun Jian @ 2026-06-11  4:33 UTC (permalink / raw)
  To: bpf
  Cc: netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
	martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
	jiayuan.chen, toke, menglong.dong, emil, Sun Jian

Strengthen xdp_veth_egress to check that each destination observes the
MAC selected for its own egress ifindex, instead of only checking that
the observed MAC differs from a single magic value.

Add a generic XDP last-destination test where earlier destinations do
not have a devmap egress program while the final destination does. This
covers the case where the final destination runs on the original skb and
could otherwise rewrite packet data still shared with earlier cloned
skbs.

Use deterministic DEVMAP_HASH keys for the egress map so the intended
last destination is stable.

Suggested-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---
 .../selftests/bpf/prog_tests/test_xdp_veth.c  | 152 +++++++++++++++++-
 1 file changed, 149 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
index 3e98a1665936..681415cacb75 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
@@ -456,7 +456,11 @@ static void xdp_veth_egress(u32 flags)
 			.remote_flags = flags,
 		}
 	};
-	const char magic_mac[6] = { 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF};
+	const unsigned char egress_macs[VETH_PAIRS_COUNT][ETH_ALEN] = {
+		{ 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x01 },
+		{ 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x02 },
+		{ 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x03 },
+	};
 	struct xdp_redirect_multi_kern *xdp_redirect_multi_kern;
 	struct bpf_object *bpf_objs[VETH_EGRESS_SKEL_NB];
 	struct xdp_redirect_map *xdp_redirect_map;
@@ -512,7 +516,7 @@ static void xdp_veth_egress(u32 flags)
 						 &net_config, prog_cfg, i))
 			goto destroy_xdp_redirect_map;
 
-		err = bpf_map_update_elem(mac_map, &ifindex, magic_mac, 0);
+		err = bpf_map_update_elem(mac_map, &ifindex, egress_macs[i], 0);
 		if (!ASSERT_OK(err, "bpf_map_update_elem"))
 			goto destroy_xdp_redirect_map;
 
@@ -531,13 +535,16 @@ static void xdp_veth_egress(u32 flags)
 
 	for (i = 0; i < 2; i++) {
 		u32 key = i;
+		__be64 expected = 0;
 		u64 res;
 
 		err = bpf_map_lookup_elem(res_map, &key, &res);
 		if (!ASSERT_OK(err, "get MAC res"))
 			goto destroy_xdp_redirect_map;
 
-		ASSERT_STRNEQ((const char *)&res, magic_mac, ETH_ALEN, "compare mac");
+		/* store_mac_1/2 run on the second/third remote veths. */
+		memcpy(&expected, egress_macs[i + 1], ETH_ALEN);
+		ASSERT_EQ(res, expected, "compare mac");
 	}
 
 destroy_xdp_redirect_map:
@@ -551,6 +558,142 @@ static void xdp_veth_egress(u32 flags)
 	cleanup_network(&net_config);
 }
 
+static void xdp_veth_egress_last_dst(u32 flags)
+{
+	struct prog_configuration prog_cfg[VETH_PAIRS_COUNT] = {
+		{
+			.local_name = "xdp_redirect_map_all_prog",
+			.remote_name = "store_mac_1",
+			.local_flags = flags,
+			.remote_flags = flags,
+		},
+		{
+			.local_name = "xdp_redirect_map_all_prog",
+			.remote_name = "store_mac_2",
+			.local_flags = flags,
+			.remote_flags = flags,
+		},
+		{
+			.local_name = "xdp_redirect_map_all_prog",
+			.remote_name = "xdp_dummy_prog",
+			.local_flags = flags,
+			.remote_flags = flags,
+		}
+	};
+	const unsigned char egress_macs[VETH_PAIRS_COUNT][ETH_ALEN] = {
+		{ 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x01 },
+		{ 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x02 },
+		{ 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x03 },
+	};
+	struct xdp_redirect_multi_kern *xdp_redirect_multi_kern;
+	struct bpf_object *bpf_objs[VETH_EGRESS_SKEL_NB];
+	struct xdp_redirect_map *xdp_redirect_map;
+	struct net_configuration net_config;
+	int mac_map, egress_map, res_map;
+	struct nstoken *nstoken = NULL;
+	struct xdp_dummy *xdp_dummy;
+	__be64 last_mac = 0;
+	bool found = false;
+	int err;
+	int i;
+
+	xdp_dummy = xdp_dummy__open_and_load();
+	if (!ASSERT_OK_PTR(xdp_dummy, "xdp_dummy__open_and_load"))
+		return;
+
+	xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
+	if (!ASSERT_OK_PTR(xdp_redirect_multi_kern, "xdp_redirect_multi_kern__open_and_load"))
+		goto destroy_xdp_dummy;
+
+	xdp_redirect_map = xdp_redirect_map__open_and_load();
+	if (!ASSERT_OK_PTR(xdp_redirect_map, "xdp_redirect_map__open_and_load"))
+		goto destroy_xdp_redirect_multi_kern;
+
+	if (!ASSERT_OK(create_network(&net_config), "create network"))
+		goto destroy_xdp_redirect_map;
+
+	mac_map = bpf_map__fd(xdp_redirect_multi_kern->maps.mac_map);
+	if (!ASSERT_OK_FD(mac_map, "open mac_map"))
+		goto destroy_xdp_redirect_map;
+
+	egress_map = bpf_map__fd(xdp_redirect_multi_kern->maps.map_egress);
+	if (!ASSERT_OK_FD(egress_map, "open map_egress"))
+		goto destroy_xdp_redirect_map;
+
+	bpf_objs[0] = xdp_dummy->obj;
+	bpf_objs[1] = xdp_redirect_multi_kern->obj;
+	bpf_objs[2] = xdp_redirect_map->obj;
+
+	nstoken = open_netns(net_config.ns0_name);
+	if (!ASSERT_OK_PTR(nstoken, "open NS0"))
+		goto destroy_xdp_redirect_map;
+
+	for (i = 0; i < VETH_PAIRS_COUNT; i++) {
+		struct bpf_devmap_val devmap_val = {};
+		int ifindex = if_nametoindex(net_config.veth_cfg[i].local_veth);
+		u32 key = i;
+
+		SYS(destroy_xdp_redirect_map,
+		    "ip -n %s neigh add %s lladdr 00:00:00:00:00:01 dev %s",
+		    net_config.veth_cfg[i].namespace, IP_NEIGH,
+		    net_config.veth_cfg[i].remote_veth);
+
+		if (attach_programs_to_veth_pair(bpf_objs, VETH_EGRESS_SKEL_NB,
+						 &net_config, prog_cfg, i))
+			goto destroy_xdp_redirect_map;
+
+		err = bpf_map_update_elem(mac_map, &ifindex, egress_macs[i], 0);
+		if (!ASSERT_OK(err, "bpf_map_update_elem"))
+			goto destroy_xdp_redirect_map;
+
+		devmap_val.ifindex = ifindex;
+		devmap_val.bpf_prog.fd = -1;
+
+		if (i == VETH_PAIRS_COUNT - 1)
+			devmap_val.bpf_prog.fd =
+				bpf_program__fd(xdp_redirect_multi_kern->progs.xdp_devmap_prog);
+
+		err = bpf_map_update_elem(egress_map, &key, &devmap_val, 0);
+		if (!ASSERT_OK(err, "bpf_map_update_elem"))
+			goto destroy_xdp_redirect_map;
+	}
+
+	SYS_NOFAIL("ip netns exec %s ping %s -i 0.1 -c 4 -W1 > /dev/null ",
+		   net_config.veth_cfg[0].namespace, IP_NEIGH);
+
+	res_map = bpf_map__fd(xdp_redirect_map->maps.rx_mac);
+	if (!ASSERT_OK_FD(res_map, "open rx_map"))
+		goto destroy_xdp_redirect_map;
+
+	memcpy(&last_mac, egress_macs[VETH_PAIRS_COUNT - 1], ETH_ALEN);
+
+	for (i = 0; i < VETH_PAIRS_COUNT - 1; i++) {
+		u32 key = i;
+		u64 res;
+
+		err = bpf_map_lookup_elem(res_map, &key, &res);
+		if (err == -ENOENT)
+			continue;
+		if (!ASSERT_OK(err, "get MAC res"))
+			goto destroy_xdp_redirect_map;
+
+		found = true;
+		ASSERT_NEQ(res, last_mac, "compare last dst mac");
+	}
+
+	ASSERT_TRUE(found, "found earlier dst mac");
+
+destroy_xdp_redirect_map:
+	close_netns(nstoken);
+	xdp_redirect_map__destroy(xdp_redirect_map);
+destroy_xdp_redirect_multi_kern:
+	xdp_redirect_multi_kern__destroy(xdp_redirect_multi_kern);
+destroy_xdp_dummy:
+	xdp_dummy__destroy(xdp_dummy);
+
+	cleanup_network(&net_config);
+}
+
 void test_xdp_veth_redirect(void)
 {
 	if (test__start_subtest("0"))
@@ -596,4 +739,7 @@ void test_xdp_veth_egress(void)
 
 	if (test__start_subtest("SKB_MODE/egress"))
 		xdp_veth_egress(XDP_FLAGS_SKB_MODE);
+
+	if (test__start_subtest("SKB_MODE/egress_last_dst"))
+		xdp_veth_egress_last_dst(XDP_FLAGS_SKB_MODE);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite
  2026-06-11  4:33 ` [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite Sun Jian
@ 2026-06-11  4:47   ` sashiko-bot
  2026-06-11  7:14     ` sun jian
  0 siblings, 1 reply; 7+ messages in thread
From: sashiko-bot @ 2026-06-11  4:47 UTC (permalink / raw)
  To: Sun Jian; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Uninitialized `net_config` is passed to `cleanup_network()` on early error paths, causing arbitrary shell commands to be executed via `system()`.
--

commit 60bbc3bd1e74368d2d682d8956f86b733750dfd9
Author: Sun Jian <sun.jian.kdev@gmail.com>
selftests/bpf: Cover generic devmap egress last-dst rewrite

This patch strengthens xdp_veth_egress to check that destinations observe
the correct MAC address for their own egress ifindex. It also adds a new
generic XDP last-destination test to cover cases where earlier destinations
do not have a devmap egress program.

> diff --git a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> index 3e98a16659362..681415cacb75c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> +++ b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
[ ... ]
> +static void xdp_veth_egress_last_dst(u32 flags)
> +{
[ ... ]
> +	struct net_configuration net_config;
> +	int mac_map, egress_map, res_map;
> +	struct nstoken *nstoken = NULL;
> +	struct xdp_dummy *xdp_dummy;
> +	__be64 last_mac = 0;
> +	bool found = false;
> +	int err;
> +	int i;
> +
> +	xdp_dummy = xdp_dummy__open_and_load();
> +	if (!ASSERT_OK_PTR(xdp_dummy, "xdp_dummy__open_and_load"))
> +		return;
> +
> +	xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
> +	if (!ASSERT_OK_PTR(xdp_redirect_multi_kern, "xdp_redirect_multi_kern__open_and_load"))
> +		goto destroy_xdp_dummy;
> +
> +	xdp_redirect_map = xdp_redirect_map__open_and_load();
> +	if (!ASSERT_OK_PTR(xdp_redirect_map, "xdp_redirect_map__open_and_load"))
> +		goto destroy_xdp_redirect_multi_kern;
> +
> +	if (!ASSERT_OK(create_network(&net_config), "create network"))
> +		goto destroy_xdp_redirect_map;
[ ... ]
> +destroy_xdp_redirect_map:
> +	close_netns(nstoken);
> +	xdp_redirect_map__destroy(xdp_redirect_map);
> +destroy_xdp_redirect_multi_kern:
> +	xdp_redirect_multi_kern__destroy(xdp_redirect_multi_kern);
> +destroy_xdp_dummy:
> +	xdp_dummy__destroy(xdp_dummy);
> +
> +	cleanup_network(&net_config);

[Severity: High]
This isn't a bug introduced by this patch, but does this early error
handling path pass uninitialized stack memory to cleanup_network()?

If xdp_redirect_multi_kern__open_and_load() or
xdp_redirect_map__open_and_load() fail before create_network() is called,
the goto labels eventually reach cleanup_network(&net_config).

Since cleanup_network() passes net_config fields like ns0_name to SYS_NOFAIL
which executes system commands, could passing uninitialized stack garbage
here lead to unpredictable shell command execution as root?

Would it be safer to zero-initialize net_config with `= {}` when declaring it?

> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611043317.512843-1-sun.jian.kdev@gmail.com?part=2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb
  2026-06-11  4:33 ` [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb Sun Jian
@ 2026-06-11  4:58   ` sashiko-bot
  2026-06-11  7:10     ` sun jian
  0 siblings, 1 reply; 7+ messages in thread
From: sashiko-bot @ 2026-06-11  4:58 UTC (permalink / raw)
  To: Sun Jian; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Critical] A maliciously crafted or buggy BPF devmap egress program can cause a double free of an skb by returning an action code that implicitly casts to a negative integer.
- [High] skb_copy() zeroes out skb->mac_len, causing generic XDP to operate on the wrong skb->data offset and potentially read garbage from uninitialized headroom.
--

commit b95ebb55a25ce44657b1e9ee2add3b1177f06853
Author: Sun Jian <sun.jian.kdev@gmail.com>

bpf: Run generic devmap egress prog on private skb

This commit introduces a fix to prevent multiple destinations from sharing
the same packet data in generic devmap egress programs by copying cloned
skbs before running the program.

> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index cc0a43ebab6b9..14506834345aa 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -512,8 +512,10 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
>  	return 0;
>  }
>  
> -static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev *dst)
> +static int dev_map_bpf_prog_run_skb(struct sk_buff **pskb,
> +				    struct bpf_dtab_netdev *dst)

[Severity: Critical]
The return type here was changed from u32 to int. Because xdp action codes
returned by bpf programs are unconstrained 32-bit unsigned integers, what
happens if a program returns a value with the highest bit set? 

Could this arbitrary unsigned action code be implicitly cast to a negative
integer, causing callers to mistakenly interpret it as a kernel error?

>  {
> +	struct sk_buff *skb = *pskb;
>  	struct xdp_txq_info txq = { .dev = dst->dev };
>  	struct xdp_buff xdp;

[ ... ]

> @@ -521,6 +523,18 @@ static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev
>  	if (!dst->xdp_prog)
>  		return XDP_PASS;
>  
> +	if (skb_cloned(skb)) {
> +		struct sk_buff *nskb;
> +
> +		nskb = skb_copy(skb, GFP_ATOMIC);

[Severity: High]
Does skb_copy preserve the mac_len field?

Because skb_copy allocates a new skb and zeroes out the structure up to the
headers, mac_len is left as 0.

> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		consume_skb(skb);
> +		skb = nskb;
> +		*pskb = nskb;
> +	}
> +
>  	__skb_pull(skb, skb->mac_len);

[Severity: High]
Following up on the skb_copy question above, if mac_len is 0, will this
pull 0 bytes instead of the expected header length?

If so, the data pointer would remain at the mac header instead of the
network header. If an egress program later modifies the ethernet header
and triggers the mac update path, could this push the data pointer
backwards into uninitialized headroom, leading to protocol corruption
or leaked kernel memory?

[ ... ]

> @@ -710,7 +724,10 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
>  	 * return 0 even if packet is dropped. Helper below takes care of
>  	 * freeing skb.
>  	 */
> -	if (dev_map_bpf_prog_run_skb(skb, dst) != XDP_PASS)
> +	err = dev_map_bpf_prog_run_skb(&skb, dst);
> +	if (err < 0)
> +		return err;

[Severity: Critical]
If dev_map_bpf_prog_run_skb returns a negative integer due to an implicitly
cast unsigned action code as mentioned above, will it hit this error path?

If dev_map_bpf_prog_run_skb already dropped the packet via kfree_skb, will
propagating this negative value cause higher level callers to execute their
own error paths and free the identical skb a second time?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611043317.512843-1-sun.jian.kdev@gmail.com?part=1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb
  2026-06-11  4:58   ` sashiko-bot
@ 2026-06-11  7:10     ` sun jian
  0 siblings, 0 replies; 7+ messages in thread
From: sun jian @ 2026-06-11  7:10 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf

On Thu, Jun 11, 2026 at 12:58 PM <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
> - [Critical] A maliciously crafted or buggy BPF devmap egress program can cause a double free of an skb by returning an action code that implicitly casts to a negative integer.
> - [High] skb_copy() zeroes out skb->mac_len, causing generic XDP to operate on the wrong skb->data offset and potentially read garbage from uninitialized headroom.
> --
>
> commit b95ebb55a25ce44657b1e9ee2add3b1177f06853
> Author: Sun Jian <sun.jian.kdev@gmail.com>
>
> bpf: Run generic devmap egress prog on private skb
>
> This commit introduces a fix to prevent multiple destinations from sharing
> the same packet data in generic devmap egress programs by copying cloned
> skbs before running the program.
>
> > diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> > index cc0a43ebab6b9..14506834345aa 100644
> > --- a/kernel/bpf/devmap.c
> > +++ b/kernel/bpf/devmap.c
> > @@ -512,8 +512,10 @@ static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
> >       return 0;
> >  }
> >
> > -static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev *dst)
> > +static int dev_map_bpf_prog_run_skb(struct sk_buff **pskb,
> > +                                 struct bpf_dtab_netdev *dst)
>
> [Severity: Critical]
> The return type here was changed from u32 to int. Because xdp action codes
> returned by bpf programs are unconstrained 32-bit unsigned integers, what
> happens if a program returns a value with the highest bit set?
>
> Could this arbitrary unsigned action code be implicitly cast to a negative
> integer, causing callers to mistakenly interpret it as a kernel error?
>
> >  {
> > +     struct sk_buff *skb = *pskb;
> >       struct xdp_txq_info txq = { .dev = dst->dev };
> >       struct xdp_buff xdp;
>
> [ ... ]
>
> > @@ -521,6 +523,18 @@ static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev
> >       if (!dst->xdp_prog)
> >               return XDP_PASS;
> >
> > +     if (skb_cloned(skb)) {
> > +             struct sk_buff *nskb;
> > +
> > +             nskb = skb_copy(skb, GFP_ATOMIC);
>
> [Severity: High]
> Does skb_copy preserve the mac_len field?
>
> Because skb_copy allocates a new skb and zeroes out the structure up to the
> headers, mac_len is left as 0.
>
> > +             if (!nskb)
> > +                     return -ENOMEM;
> > +
> > +             consume_skb(skb);
> > +             skb = nskb;
> > +             *pskb = nskb;
> > +     }
> > +
> >       __skb_pull(skb, skb->mac_len);
>
> [Severity: High]
> Following up on the skb_copy question above, if mac_len is 0, will this
> pull 0 bytes instead of the expected header length?
>
> If so, the data pointer would remain at the mac header instead of the
> network header. If an egress program later modifies the ethernet header
> and triggers the mac update path, could this push the data pointer
> backwards into uninitialized headroom, leading to protocol corruption
> or leaked kernel memory?
>
> [ ... ]
>
> > @@ -710,7 +724,10 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
> >        * return 0 even if packet is dropped. Helper below takes care of
> >        * freeing skb.
> >        */
> > -     if (dev_map_bpf_prog_run_skb(skb, dst) != XDP_PASS)
> > +     err = dev_map_bpf_prog_run_skb(&skb, dst);
> > +     if (err < 0)
> > +             return err;
>
> [Severity: Critical]
> If dev_map_bpf_prog_run_skb returns a negative integer due to an implicitly
> cast unsigned action code as mentioned above, will it hit this error path?
>
> If dev_map_bpf_prog_run_skb already dropped the packet via kfree_skb, will
> propagating this negative value cause higher level callers to execute their
> own error paths and free the identical skb a second time?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260611043317.512843-1-sun.jian.kdev@gmail.com?part=1

Thanks for the review.

I checked the skb_copy() path and this concern is valid. skb_copy()
calls skb_copy_header(), which does not copy mac_len. I will preserve
mac_len explicitly after skb_copy() in v4.

I will also avoid mixing errno and XDP action values in the helper return
value. The helper will return only errno and pass the XDP action back
through an output argument, so invalid action values cannot be interpreted
as negative errors by the caller.

Thanks,
sun jian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite
  2026-06-11  4:47   ` sashiko-bot
@ 2026-06-11  7:14     ` sun jian
  0 siblings, 0 replies; 7+ messages in thread
From: sun jian @ 2026-06-11  7:14 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf

On Thu, Jun 11, 2026 at 12:47 PM <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] Uninitialized `net_config` is passed to `cleanup_network()` on early error paths, causing arbitrary shell commands to be executed via `system()`.
> --
>
> commit 60bbc3bd1e74368d2d682d8956f86b733750dfd9
> Author: Sun Jian <sun.jian.kdev@gmail.com>
> selftests/bpf: Cover generic devmap egress last-dst rewrite
>
> This patch strengthens xdp_veth_egress to check that destinations observe
> the correct MAC address for their own egress ifindex. It also adds a new
> generic XDP last-destination test to cover cases where earlier destinations
> do not have a devmap egress program.
>
> > diff --git a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> > index 3e98a16659362..681415cacb75c 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> [ ... ]
> > +static void xdp_veth_egress_last_dst(u32 flags)
> > +{
> [ ... ]
> > +     struct net_configuration net_config;
> > +     int mac_map, egress_map, res_map;
> > +     struct nstoken *nstoken = NULL;
> > +     struct xdp_dummy *xdp_dummy;
> > +     __be64 last_mac = 0;
> > +     bool found = false;
> > +     int err;
> > +     int i;
> > +
> > +     xdp_dummy = xdp_dummy__open_and_load();
> > +     if (!ASSERT_OK_PTR(xdp_dummy, "xdp_dummy__open_and_load"))
> > +             return;
> > +
> > +     xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
> > +     if (!ASSERT_OK_PTR(xdp_redirect_multi_kern, "xdp_redirect_multi_kern__open_and_load"))
> > +             goto destroy_xdp_dummy;
> > +
> > +     xdp_redirect_map = xdp_redirect_map__open_and_load();
> > +     if (!ASSERT_OK_PTR(xdp_redirect_map, "xdp_redirect_map__open_and_load"))
> > +             goto destroy_xdp_redirect_multi_kern;
> > +
> > +     if (!ASSERT_OK(create_network(&net_config), "create network"))
> > +             goto destroy_xdp_redirect_map;
> [ ... ]
> > +destroy_xdp_redirect_map:
> > +     close_netns(nstoken);
> > +     xdp_redirect_map__destroy(xdp_redirect_map);
> > +destroy_xdp_redirect_multi_kern:
> > +     xdp_redirect_multi_kern__destroy(xdp_redirect_multi_kern);
> > +destroy_xdp_dummy:
> > +     xdp_dummy__destroy(xdp_dummy);
> > +
> > +     cleanup_network(&net_config);
>
> [Severity: High]
> This isn't a bug introduced by this patch, but does this early error
> handling path pass uninitialized stack memory to cleanup_network()?
>
> If xdp_redirect_multi_kern__open_and_load() or
> xdp_redirect_map__open_and_load() fail before create_network() is called,
> the goto labels eventually reach cleanup_network(&net_config).
>
> Since cleanup_network() passes net_config fields like ns0_name to SYS_NOFAIL
> which executes system commands, could passing uninitialized stack garbage
> here lead to unpredictable shell command execution as root?
>
> Would it be safer to zero-initialize net_config with `= {}` when declaring it?
>
> > +}
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260611043317.512843-1-sun.jian.kdev@gmail.com?part=2

Thanks for the review.

I will zero-initialize net_config in the new test in v4, so the cleanup
path is safe even if an earlier setup step fails before create_network()
initializes it.

Thanks,
Jian

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-11  7:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-11  4:33 [PATCH bpf v3 0/2] bpf: Fix generic devmap egress skb sharing Sun Jian
2026-06-11  4:33 ` [PATCH bpf v3 1/2] bpf: Run generic devmap egress prog on private skb Sun Jian
2026-06-11  4:58   ` sashiko-bot
2026-06-11  7:10     ` sun jian
2026-06-11  4:33 ` [PATCH bpf v3 2/2] selftests/bpf: Cover generic devmap egress last-dst rewrite Sun Jian
2026-06-11  4:47   ` sashiko-bot
2026-06-11  7:14     ` sun jian

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox