Netdev List
 help / color / mirror / Atom feed
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Sun Jian <sun.jian.kdev@gmail.com>, bpf@vger.kernel.org
Cc: "Menglong Dong" <menglong.dong@linux.dev>,
	"Emil Tsalapatis" <emil@etsalapatis.com>,
	"Jiayuan Chen" <jiayuan.chen@linux.dev>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"Jiri Olsa" <jolsa@kernel.org>, "Shuah Khan" <shuah@kernel.org>,
	"Hangbin Liu" <liuhangbin@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH bpf v2] bpf: Run generic devmap egress prog on private skb
Date: Wed, 10 Jun 2026 20:59:31 +0800	[thread overview]
Message-ID: <08c35c70-a59e-4e0e-91db-22b5ec30b611@linux.dev> (raw)
In-Reply-To: <20260610102850.483291-1-sun.jian.kdev@gmail.com>


On 6/10/26 6:28 PM, Sun Jian wrote:
> Generic XDP devmap multi redirect uses skb_clone() for the
> intermediate destinations and sends the last destination with the
> original skb. This can leave multiple destinations sharing the same
> packet data.
>
> This becomes visible when a devmap egress program mutates packet data.
> One destination can observe changes made for another destination. The
> last-destination path has the same problem: the last destination runs on
> the original skb, so its egress program can modify packet data still
> shared with earlier cloned skbs.
>
> Native XDP broadcast redirect does not have this issue because
> xdpf_clone() copies the frame data for each destination. Generic XDP
> should provide the same per-destination isolation before running a
> devmap egress program.
>
> Fix this by making cloned skbs private in dev_map_generic_redirect()
> before running the devmap egress program. Use skb_copy() instead of
> skb_unshare() so that allocation failure does not consume the skb and
> the existing caller error paths keep their ownership semantics.
>
> Add a selftest that covers the last-destination case where earlier
> destinations do not have a devmap egress program, while the final
> destination does.
>
> Tested with:
>    ./test_progs -t xdp_veth_egress
>    ./test_progs -t xdp_veth
>    ./test_progs -t xdp
>
> Fixes: e624d4ed4aa8 ("xdp: Extend xdp_redirect_map with broadcast support")
> Suggested-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
> v1: https://lore.kernel.org/bpf/CABFUUZFimdrZdq=NWi+N-0sJZWvMwY=f4iF6-3TVMS8=m07Zmw@mail.gmail.com/
>
> Changes in v2:
> - Move the private-copy step into dev_map_generic_redirect() so the
>    last-destination path is covered as well.
> - Use skb_copy() instead of skb_unshare() to keep caller ownership
>    unchanged on allocation failure.
> - Add a generic XDP last-destination selftest case.
>
>   kernel/bpf/devmap.c                           |  10 ++
>   .../selftests/bpf/prog_tests/test_xdp_veth.c  | 151 +++++++++++++++++-
>   2 files changed, 158 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index cc0a43ebab6b..59f267685bc6 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -700,12 +700,22 @@ int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
>   int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
>   			     const struct bpf_prog *xdp_prog)
>   {
> +	struct sk_buff *nskb;
>   	int err;
>   
>   	err = xdp_ok_fwd_dev(dst->dev, skb->len);
>   	if (unlikely(err))
>   		return err;
>   
> +	if (dst->xdp_prog && skb_cloned(skb)) {
> +		nskb = skb_copy(skb, GFP_ATOMIC);
> +		if (!nskb)
> +			return -ENOMEM;
> +
> +		consume_skb(skb);
> +		skb = nskb;
> +	}
> +


LGTM.


Please split selftest as a seprated patch of a patchset.

>   	/* Redirect has already succeeded semantically at this point, so we just
>   	 * return 0 even if packet is dropped. Helper below takes care of
>   	 * freeing skb.
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> index 3e98a1665936..1f0b9ade12fe 100644
> --- a/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> +++ b/tools/testing/selftests/bpf/prog_tests/test_xdp_veth.c
> @@ -456,7 +456,11 @@ static void xdp_veth_egress(u32 flags)
>   			.remote_flags = flags,
>   		


[...]

> +	for (i = 0; i < VETH_PAIRS_COUNT; i++) {
> +		struct bpf_devmap_val devmap_val = {};
> +		int ifindex = if_nametoindex(net_config.veth_cfg[i].local_veth);
> +
> +		SYS(destroy_xdp_redirect_map,
> +		    "ip -n %s neigh add %s lladdr 00:00:00:00:00:01 dev %s",
> +		    net_config.veth_cfg[i].namespace, IP_NEIGH,
> +		    net_config.veth_cfg[i].remote_veth);
> +
> +		if (attach_programs_to_veth_pair(bpf_objs, VETH_EGRESS_SKEL_NB,
> +						 &net_config, prog_cfg, i))
> +			goto destroy_xdp_redirect_map;
> +
> +		err = bpf_map_update_elem(mac_map, &ifindex, egress_macs[i], 0);
> +		if (!ASSERT_OK(err, "bpf_map_update_elem"))
> +			goto destroy_xdp_redirect_map;
> +
> +		devmap_val.ifindex = ifindex;
> +		devmap_val.bpf_prog.fd = -1;
> +
> +		if (i == VETH_PAIRS_COUNT - 1)
> +			devmap_val.bpf_prog.fd =
> +				bpf_program__fd(xdp_redirect_multi_kern->progs.xdp_devmap_prog);
> +
> +		err = bpf_map_update_elem(egress_map, &ifindex, &devmap_val, 0);


The kernel walks DEVMAP_HASH in bucket order so I don't think we can 
guarantee that the entry at VETH_PAIRS_COUNT - 1 is the last dst.

Could we use i as the key instead, so the entry order becomes 
deterministic? (bucket is key & (n_buckets - 1).)

      parent reply	other threads:[~2026-06-10 13:00 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-10 10:28 [PATCH bpf v2] bpf: Run generic devmap egress prog on private skb Sun Jian
2026-06-10 10:52 ` Toke Høiland-Jørgensen
2026-06-10 11:04 ` bot+bpf-ci
2026-06-10 12:59 ` Jiayuan Chen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08c35c70-a59e-4e0e-91db-22b5ec30b611@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=emil@etsalapatis.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=liuhangbin@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=menglong.dong@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=sun.jian.kdev@gmail.com \
    --cc=toke@redhat.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox