* [PATCH] bpf: Unshare cloned skb before devmap egress XDP program
@ 2026-06-09 10:02 Sun Jian
2026-06-09 10:17 ` sashiko-bot
2026-06-09 11:06 ` Menglong Dong
0 siblings, 2 replies; 6+ messages in thread
From: Sun Jian @ 2026-06-09 10:02 UTC (permalink / raw)
To: bpf
Cc: netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
liuhangbin, Sun Jian
dev_map_redirect_clone() uses skb_clone() when redirecting a generic XDP
skb to multiple devmap destinations. The cloned skb can share packet data
with other clones.
If the destination devmap entry has an egress XDP program, that program
can modify packet data. Such modifications can then be observed by other
clones sharing the same packet data.
This can be reproduced by strengthening xdp_veth_egress to configure a
different source MAC for each egress device and checking that store_mac_1/2
observe the MAC configured for their own egress devices. Without the fix,
the SKB_MODE subtest observes store_mac_1 receiving the MAC configured for
the next egress device.
Fix this by unsharing the cloned skb before running the devmap egress XDP
program. Limit the extra copy to destinations with an attached egress
program.
Tested with:
./test_progs -t xdp_veth_egress
./test_progs -t xdp_veth
./test_progs -t xdp
Fixes: e624d4ed4aa8 ("xdp: Extend xdp_redirect_map with broadcast support")
Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
---
kernel/bpf/devmap.c | 6 ++++++
.../selftests/bpf/prog_tests/test_xdp_veth.c | 13 ++++++++++---
2 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index cc0a43ebab6b..4ae65d44f9d6 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -730,6 +730,12 @@ static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst,
if (!nskb)
return -ENOMEM;
+ if (dst->xdp_prog) {
+ nskb = skb_unshare(nskb, GFP_ATOMIC);
+ if (!nskb)
+ return -ENOMEM;
+ }
+
err = dev_map_generic_redirect(dst, nskb, xdp_prog);
if (unlikely(err)) {
consume_skb(nskb);
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
index 3e98a1665936..52d79d5c5629 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
@@ -456,7 +456,11 @@ static void xdp_veth_egress(u32 flags)
.remote_flags = flags,
}
};
- const char magic_mac[6] = { 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF};
+ const unsigned char egress_macs[VETH_PAIRS_COUNT][ETH_ALEN] = {
+ { 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x01 },
+ { 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x02 },
+ { 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0x03 },
+ };
struct xdp_redirect_multi_kern *xdp_redirect_multi_kern;
struct bpf_object *bpf_objs[VETH_EGRESS_SKEL_NB];
struct xdp_redirect_map *xdp_redirect_map;
@@ -512,7 +516,7 @@ static void xdp_veth_egress(u32 flags)
&net_config, prog_cfg, i))
goto destroy_xdp_redirect_map;
- err = bpf_map_update_elem(mac_map, &ifindex, magic_mac, 0);
+ err = bpf_map_update_elem(mac_map, &ifindex, egress_macs[i], 0);
if (!ASSERT_OK(err, "bpf_map_update_elem"))
goto destroy_xdp_redirect_map;
@@ -531,13 +535,16 @@ static void xdp_veth_egress(u32 flags)
for (i = 0; i < 2; i++) {
u32 key = i;
+ __be64 expected = 0;
u64 res;
err = bpf_map_lookup_elem(res_map, &key, &res);
if (!ASSERT_OK(err, "get MAC res"))
goto destroy_xdp_redirect_map;
- ASSERT_STRNEQ((const char *)&res, magic_mac, ETH_ALEN, "compare mac");
+ /* store_mac_1/2 run on the second/third remote veths. */
+ memcpy(&expected, egress_macs[i + 1], ETH_ALEN);
+ ASSERT_EQ(res, expected, "compare mac");
}
destroy_xdp_redirect_map:
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] bpf: Unshare cloned skb before devmap egress XDP program
2026-06-09 10:02 [PATCH] bpf: Unshare cloned skb before devmap egress XDP program Sun Jian
@ 2026-06-09 10:17 ` sashiko-bot
2026-06-09 11:06 ` Menglong Dong
1 sibling, 0 replies; 6+ messages in thread
From: sashiko-bot @ 2026-06-09 10:17 UTC (permalink / raw)
To: Sun Jian; +Cc: bpf
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] The patch incompletely fixes shared SKB data corruption in generic XDP devmap redirection. When broadcasting to multiple destinations, the original `skb` is used for the final destination (`last_dst`). If `last_dst` has an egress XDP program, it runs directly on the original `skb` and modifies its packet data in-place without checking if it is shared. If prior destinations lacked an egress XDP program, their clones were not unshared by the new logic in `dev_map_redirect_clone()`. Consequently, they still share the same packet data buffer as the original `skb`, and the final destination's egress program will silently corrupt the packet data of the earlier destinations' clones.
--
commit 280ec1e15889a77555ac8877632a39ede4e76d91
Author: Sun Jian <sun.jian.kdev@gmail.com>
bpf: Unshare cloned skb before devmap egress XDP program
This commit modifies dev_map_redirect_clone() to unshare cloned skbs if the
destination devmap entry has an egress XDP program. This prevents modifications
by the egress XDP program from being observed by other clones sharing the same
packet data.
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index cc0a43ebab6b9..4ae65d44f9d6e 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -730,6 +730,12 @@ static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst,
> if (!nskb)
> return -ENOMEM;
>
> + if (dst->xdp_prog) {
> + nskb = skb_unshare(nskb, GFP_ATOMIC);
> + if (!nskb)
> + return -ENOMEM;
> + }
> +
[Severity: High]
This is a pre-existing issue, but does this logic still leave the original skb
vulnerable to data corruption during broadcast?
In dev_map_redirect_multi(), the loop clones the skb for N-1 destinations,
and then calls dev_map_generic_redirect() on the original skb for the final
destination:
kernel/bpf/devmap.c:dev_map_redirect_multi() {
...
/* consume the first skb and return */
if (last_dst)
return dev_map_generic_redirect(last_dst, skb, xdp_prog);
...
}
If the earlier destinations lack an egress XDP program, dev_map_redirect_clone()
will not unshare their clones. If the final destination (last_dst) does have an
egress XDP program, dev_map_bpf_prog_run_skb() will run directly on the original
skb and modify its packet data in-place without ensuring the buffer is unshared:
kernel/bpf/devmap.c:dev_map_bpf_prog_run_skb() {
...
act = bpf_prog_run_generic_xdp(skb, &xdp, dst->xdp_prog);
...
}
Could this silently corrupt or truncate the packet data for the earlier
clones that still share the same buffer?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260609100214.337538-1-sun.jian.kdev@gmail.com?part=1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bpf: Unshare cloned skb before devmap egress XDP program
2026-06-09 10:02 [PATCH] bpf: Unshare cloned skb before devmap egress XDP program Sun Jian
2026-06-09 10:17 ` sashiko-bot
@ 2026-06-09 11:06 ` Menglong Dong
2026-06-10 0:06 ` Emil Tsalapatis
1 sibling, 1 reply; 6+ messages in thread
From: Menglong Dong @ 2026-06-09 11:06 UTC (permalink / raw)
To: Sun Jian
Cc: bpf, netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
liuhangbin, Sun Jian
On 2026/6/9 18:02 Sun Jian <sun.jian.kdev@gmail.com> write:
> dev_map_redirect_clone() uses skb_clone() when redirecting a generic XDP
> skb to multiple devmap destinations. The cloned skb can share packet data
> with other clones.
>
> If the destination devmap entry has an egress XDP program, that program
> can modify packet data. Such modifications can then be observed by other
> clones sharing the same packet data.
>
> This can be reproduced by strengthening xdp_veth_egress to configure a
> different source MAC for each egress device and checking that store_mac_1/2
> observe the MAC configured for their own egress devices. Without the fix,
> the SKB_MODE subtest observes store_mac_1 receiving the MAC configured for
> the next egress device.
>
> Fix this by unsharing the cloned skb before running the devmap egress XDP
> program. Limit the extra copy to destinations with an attached egress
> program.
Hi, Jian.
This sounds like a good idea in this case. When I have a look at bpf_clone_redirect(),
I found that it use skb_clone() too, which means it has the same problem. The
data can be modified by other xdp prog in the destination NIC if we use
bpf_clone_redirect().
So maybe this is the default logic, and I'm not sure if this patch can break the
existing users :/
Thanks!
Menglong Dong
>
> Tested with:
> ./test_progs -t xdp_veth_egress
> ./test_progs -t xdp_veth
> ./test_progs -t xdp
[...]
>
> destroy_xdp_redirect_map:
> --
> 2.43.0
>
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bpf: Unshare cloned skb before devmap egress XDP program
2026-06-09 11:06 ` Menglong Dong
@ 2026-06-10 0:06 ` Emil Tsalapatis
2026-06-10 1:21 ` Menglong Dong
0 siblings, 1 reply; 6+ messages in thread
From: Emil Tsalapatis @ 2026-06-10 0:06 UTC (permalink / raw)
To: Menglong Dong, Sun Jian
Cc: bpf, netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
liuhangbin
On Tue Jun 9, 2026 at 7:06 AM EDT, Menglong Dong wrote:
> On 2026/6/9 18:02 Sun Jian <sun.jian.kdev@gmail.com> write:
>> dev_map_redirect_clone() uses skb_clone() when redirecting a generic XDP
>> skb to multiple devmap destinations. The cloned skb can share packet data
>> with other clones.
>>
>> If the destination devmap entry has an egress XDP program, that program
>> can modify packet data. Such modifications can then be observed by other
>> clones sharing the same packet data.
>>
>> This can be reproduced by strengthening xdp_veth_egress to configure a
>> different source MAC for each egress device and checking that store_mac_1/2
>> observe the MAC configured for their own egress devices. Without the fix,
>> the SKB_MODE subtest observes store_mac_1 receiving the MAC configured for
>> the next egress device.
>>
>> Fix this by unsharing the cloned skb before running the devmap egress XDP
>> program. Limit the extra copy to destinations with an attached egress
>> program.
>
> Hi, Jian.
>
> This sounds like a good idea in this case. When I have a look at bpf_clone_redirect(),
> I found that it use skb_clone() too, which means it has the same problem. The
> data can be modified by other xdp prog in the destination NIC if we use
> bpf_clone_redirect().
>
> So maybe this is the default logic, and I'm not sure if this patch can break the
> existing users :/
I think for use cases where we are using bpf_clone_redirect() to use
one clone for inspection this would add an unnecessary copy. Maybe
adding *_copy() variants instead of changing the *_clone() would be
better? That way we wouldn't be changing the behavior for existing
consumers and the naming would be consistent with the skb_* methods.
But more importantly, is there an actual use case for the kind of API
that the modified selftest requires? Nobody until now has considered
the existing behavior to be a problem.
>
> Thanks!
> Menglong Dong
>
>>
>> Tested with:
>> ./test_progs -t xdp_veth_egress
>> ./test_progs -t xdp_veth
>> ./test_progs -t xdp
> [...]
>>
>> destroy_xdp_redirect_map:
>> --
>> 2.43.0
>>
>>
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bpf: Unshare cloned skb before devmap egress XDP program
2026-06-10 0:06 ` Emil Tsalapatis
@ 2026-06-10 1:21 ` Menglong Dong
2026-06-10 1:58 ` sun jian
0 siblings, 1 reply; 6+ messages in thread
From: Menglong Dong @ 2026-06-10 1:21 UTC (permalink / raw)
To: Sun Jian, Emil Tsalapatis
Cc: bpf, netdev, linux-kernel, linux-kselftest, ast, daniel, andrii,
martin.lau, davem, kuba, hawk, john.fastabend, sdf, shuah,
liuhangbin
On 2026/6/10 08:06 Emil Tsalapatis <emil@etsalapatis.com> write:
> On Tue Jun 9, 2026 at 7:06 AM EDT, Menglong Dong wrote:
> > On 2026/6/9 18:02 Sun Jian <sun.jian.kdev@gmail.com> write:
> >> dev_map_redirect_clone() uses skb_clone() when redirecting a generic XDP
> >> skb to multiple devmap destinations. The cloned skb can share packet data
> >> with other clones.
> >>
> >> If the destination devmap entry has an egress XDP program, that program
> >> can modify packet data. Such modifications can then be observed by other
> >> clones sharing the same packet data.
> >>
> >> This can be reproduced by strengthening xdp_veth_egress to configure a
> >> different source MAC for each egress device and checking that store_mac_1/2
> >> observe the MAC configured for their own egress devices. Without the fix,
> >> the SKB_MODE subtest observes store_mac_1 receiving the MAC configured for
> >> the next egress device.
> >>
> >> Fix this by unsharing the cloned skb before running the devmap egress XDP
> >> program. Limit the extra copy to destinations with an attached egress
> >> program.
> >
> > Hi, Jian.
> >
> > This sounds like a good idea in this case. When I have a look at bpf_clone_redirect(),
> > I found that it use skb_clone() too, which means it has the same problem. The
> > data can be modified by other xdp prog in the destination NIC if we use
> > bpf_clone_redirect().
> >
> > So maybe this is the default logic, and I'm not sure if this patch can break the
> > existing users :/
>
> I think for use cases where we are using bpf_clone_redirect() to use
> one clone for inspection this would add an unnecessary copy. Maybe
> adding *_copy() variants instead of changing the *_clone() would be
> better? That way we wouldn't be changing the behavior for existing
> consumers and the naming would be consistent with the skb_* methods.
Agree. It's not a good idea to change the logic of the existing API. Or
maybe we can add a BPF_F_CLONE flag for the existing API.
>
> But more importantly, is there an actual use case for the kind of API
> that the modified selftest requires? Nobody until now has considered
> the existing behavior to be a problem.
Agree too. Obviously, this is not a bug. For the use case in the commit
log, it's something that can be fixed by the user themself. If we need
modify the MAC, we'd better attach a BPF program for all the egress
device in the devmap.
>
> >
> > Thanks!
> > Menglong Dong
> >
> >>
> >> Tested with:
> >> ./test_progs -t xdp_veth_egress
> >> ./test_progs -t xdp_veth
> >> ./test_progs -t xdp
> > [...]
> >>
> >> destroy_xdp_redirect_map:
> >> --
> >> 2.43.0
> >>
> >>
> >>
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] bpf: Unshare cloned skb before devmap egress XDP program
2026-06-10 1:21 ` Menglong Dong
@ 2026-06-10 1:58 ` sun jian
0 siblings, 0 replies; 6+ messages in thread
From: sun jian @ 2026-06-10 1:58 UTC (permalink / raw)
To: Menglong Dong
Cc: Emil Tsalapatis, bpf, netdev, linux-kernel, linux-kselftest, ast,
daniel, andrii, martin.lau, davem, kuba, hawk, john.fastabend,
sdf, shuah, liuhangbin
On Wed, Jun 10, 2026 at 9:21 AM Menglong Dong <menglong.dong@linux.dev> wrote:
>
> On 2026/6/10 08:06 Emil Tsalapatis <emil@etsalapatis.com> write:
> > On Tue Jun 9, 2026 at 7:06 AM EDT, Menglong Dong wrote:
> > > On 2026/6/9 18:02 Sun Jian <sun.jian.kdev@gmail.com> write:
> > >> dev_map_redirect_clone() uses skb_clone() when redirecting a generic XDP
> > >> skb to multiple devmap destinations. The cloned skb can share packet data
> > >> with other clones.
> > >>
> > >> If the destination devmap entry has an egress XDP program, that program
> > >> can modify packet data. Such modifications can then be observed by other
> > >> clones sharing the same packet data.
> > >>
> > >> This can be reproduced by strengthening xdp_veth_egress to configure a
> > >> different source MAC for each egress device and checking that store_mac_1/2
> > >> observe the MAC configured for their own egress devices. Without the fix,
> > >> the SKB_MODE subtest observes store_mac_1 receiving the MAC configured for
> > >> the next egress device.
> > >>
> > >> Fix this by unsharing the cloned skb before running the devmap egress XDP
> > >> program. Limit the extra copy to destinations with an attached egress
> > >> program.
> > >
> > > Hi, Jian.
> > >
> > > This sounds like a good idea in this case. When I have a look at bpf_clone_redirect(),
> > > I found that it use skb_clone() too, which means it has the same problem. The
> > > data can be modified by other xdp prog in the destination NIC if we use
> > > bpf_clone_redirect().
> > >
> > > So maybe this is the default logic, and I'm not sure if this patch can break the
> > > existing users :/
> >
> > I think for use cases where we are using bpf_clone_redirect() to use
> > one clone for inspection this would add an unnecessary copy. Maybe
> > adding *_copy() variants instead of changing the *_clone() would be
> > better? That way we wouldn't be changing the behavior for existing
> > consumers and the naming would be consistent with the skb_* methods.
>
> Agree. It's not a good idea to change the logic of the existing API. Or
> maybe we can add a BPF_F_CLONE flag for the existing API.
>
> >
> > But more importantly, is there an actual use case for the kind of API
> > that the modified selftest requires? Nobody until now has considered
> > the existing behavior to be a problem.
>
> Agree too. Obviously, this is not a bug. For the use case in the commit
> log, it's something that can be fixed by the user themself. If we need
> modify the MAC, we'd better attach a BPF program for all the egress
> device in the devmap.
>
> >
> > >
> > > Thanks!
> > > Menglong Dong
> > >
> > >>
> > >> Tested with:
> > >> ./test_progs -t xdp_veth_egress
> > >> ./test_progs -t xdp_veth
> > >> ./test_progs -t xdp
> > > [...]
> > >>
> > >> destroy_xdp_redirect_map:
> > >> --
> > >> 2.43.0
> > >>
> > >>
> > >>
> >
>
>
>
>
Thanks for all the comments.
I agree this should not be treated as a generic skb_clone() issue, and I
understand the concern about changing existing clone/shared-data semantics.
The case I was trying to address is narrower: generic XDP devmap
broadcast/multi redirect with per-devmap-entry egress programs. The use case is
not newly introduced by this patch. The existing xdp_veth_egress test already
attaches xdp_devmap_prog to all egress devmap entries, and that program rewrites
eth->h_source based on ctx->egress_ifindex. My change only strengthens the test
from checking that the observed MAC is not equal to a single magic MAC to
checking that each destination observes the MAC selected for its own egress
ifindex.
So attaching an egress program to all devmap entries is already what this test
does. The SKB_MODE failure happens because the cloned skbs can still share the
packet data, and a later per-egress rewrite can be observed by another
destination.
That said, I see the concern that this may need a clearer semantic boundary
between clone and copy behavior. I will also check the last_dst path pointed out
by Sashiko before deciding whether this should be handled as a bug fix, or
whether it needs a separate explicit copy/isolated redirect semantic instead.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-06-10 1:58 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09 10:02 [PATCH] bpf: Unshare cloned skb before devmap egress XDP program Sun Jian
2026-06-09 10:17 ` sashiko-bot
2026-06-09 11:06 ` Menglong Dong
2026-06-10 0:06 ` Emil Tsalapatis
2026-06-10 1:21 ` Menglong Dong
2026-06-10 1:58 ` sun jian
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.