From: Stanislav Fomichev <stfomichev@gmail.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, netdev@vger.kernel.org,
magnus.karlsson@intel.com, kerneljasonxing@gmail.com,
Eryk Kubanski <e.kubanski@partner.samsung.com>
Subject: Re: [PATCH v7 bpf] xsk: fix immature cq descriptor production
Date: Tue, 2 Sep 2025 10:26:08 -0700
Message-ID: <aLcosKaalC8DYR5j@mini-arch>
In-Reply-To: <20250829180950.2305157-1-maciej.fijalkowski@intel.com>

On 08/29, Maciej Fijalkowski wrote:
> Eryk reported an issue that I have put under Closes: tag, related to
> umem addrs being prematurely produced onto pool's completion queue.
> Let us make the skb's destructor responsible for producing all addrs
> that given skb used.
>
> Commit from fixes tag introduced the buggy behavior, it was not broken
> from day 1, but rather when XSK multi-buffer got introduced.
>
> In order to mitigate performance impact as much as possible, mimic the
> linear and frag parts within skb by storing the first address from XSK
> descriptor at sk_buff::destructor_arg. For fragments, store them at ::cb
> via list. The nodes that will go onto list will be allocated via
> kmem_cache. xsk_destruct_skb() will consume address stored at
> ::destructor_arg and optionally go through list from ::cb, if count of
> descriptors associated with this particular skb is bigger than 1.
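
For anyone following along, the scheme described above can be sketched in
plain userspace C roughly as below. This is only an illustration under
stated assumptions, not the patch's code: tx_skb, addr_node, cq and the
helper names are all hypothetical stand-ins (first_addr plays the role of
sk_buff::destructor_arg, the frags list plays the role of the list kept
in ::cb), and malloc() stands in for the kmem_cache.

```c
#include <stdint.h>
#include <stdlib.h>

#define CQ_SIZE 16	/* hypothetical ring size, power of two */

/* Hypothetical stand-in for one list node holding a frag address. */
struct addr_node {
	uint64_t addr;
	struct addr_node *next;
};

/* Hypothetical stand-in for the skb state used by the Tx path. */
struct tx_skb {
	uint64_t first_addr;		/* role of ::destructor_arg */
	uint32_t num_descs;		/* descriptors used by this skb */
	struct addr_node *frags;	/* role of the list kept in ::cb */
};

/* Hypothetical stand-in for the pool's completion queue. */
struct cq {
	uint64_t ring[CQ_SIZE];
	uint32_t prod;
};

static void tx_skb_init(struct tx_skb *skb, uint64_t addr)
{
	skb->first_addr = addr;
	skb->num_descs = 1;
	skb->frags = NULL;
}

/* Append one frag address at the tail; returns 0 or -1 on alloc failure. */
static int tx_skb_add_frag(struct tx_skb *skb, uint64_t addr)
{
	struct addr_node **pos = &skb->frags;
	struct addr_node *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->addr = addr;
	node->next = NULL;
	while (*pos)
		pos = &(*pos)->next;
	*pos = node;
	skb->num_descs++;
	return 0;
}

static void cq_produce(struct cq *cq, uint64_t addr)
{
	cq->ring[cq->prod++ & (CQ_SIZE - 1)] = addr;
}

/* Destructor: only here do the addresses become visible on the CQ. */
static void tx_skb_destruct(struct tx_skb *skb, struct cq *cq)
{
	struct addr_node *node = skb->frags;

	cq_produce(cq, skb->first_addr);
	if (skb->num_descs > 1) {
		while (node) {
			struct addr_node *next = node->next;

			cq_produce(cq, node->addr);
			free(node);
			node = next;
		}
	}
	skb->frags = NULL;
}
```

The point of the sketch is that a single-descriptor skb never touches the
list or the allocator, which is where the 64b case should win.
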
>
> Previous approach where whole array for storing UMEM addresses from XSK
> descriptors was pre-allocated during first fragment processing yielded
> too big performance regression for 64b traffic. In current approach
> impact is much reduced on my tests and for jumbo frames I observed
> traffic being slower by at most 9%.
>
> Magnus suggested to have this way of processing special cased for
> XDP_SHARED_UMEM, so we would identify this during bind and set different
> hooks for 'backpressure mechanism' on CQ and for skb destructor, but
> given that results looked promising on my side I decided to have a
> single data path for XSK generic Tx. I suppose other auxiliary stuff
> such as helpers introduced in this patch would have to land as well in
> order to make it work, so we might have ended up with more noisy diff.
>
> Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> Reported-by: Eryk Kubanski <e.kubanski@partner.samsung.com>
> Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>
> Jason, please test this v7 on your setup, I would appreciate if you
> would report results from your testbed. Thanks!
>
> v1:
> https://lore.kernel.org/bpf/20250702101648.1942562-1-maciej.fijalkowski@intel.com/
> v2:
> https://lore.kernel.org/bpf/20250705135512.1963216-1-maciej.fijalkowski@intel.com/
> v3:
> https://lore.kernel.org/bpf/20250806154127.2161434-1-maciej.fijalkowski@intel.com/
> v4:
> https://lore.kernel.org/bpf/20250813171210.2205259-1-maciej.fijalkowski@intel.com/
> v5:
> https://lore.kernel.org/bpf/aKXBHGPxjpBDKOHq@boxer/T/
> v6:
> https://lore.kernel.org/bpf/20250820154416.2248012-1-maciej.fijalkowski@intel.com/
>
> v1->v2:
> * store addrs in array carried via destructor_arg instead having them
> stored in skb headroom; cleaner and less hacky approach;
> v2->v3:
> * use kmem_cache for xsk_addrs allocation (Stan/Olek)
> * set err when xsk_addrs allocation fails (Dan)
> * change xsk_addrs layout to avoid holes
> * free xsk_addrs on error path
> * rebase
> v3->v4:
> * have kmem_cache as percpu vars
> * don't drop unnecessary braces (unrelated) (Stan)
> * use idx + i in xskq_prod_write_addr (Stan)
> * alloc kmem_cache on bind (Stan)
> * keep num_descs as first member in xsk_addrs (Magnus)
> * add ack from Magnus
> v4->v5:
> * have a single kmem_cache per xsk subsystem (Stan)
> v5->v6:
> * free skb in xsk_build_skb_zerocopy() when xsk_addrs allocation fails
> (Stan)
> * unregister netdev notifier if creating kmem_cache fails (Stan)
> v6->v7:
> * don't include Acks from Magnus/Stan; let them review the new
> approach:)
> * store first desc at sk_buff::destructor_arg and rest of frags in list
> stored at sk_buff::cb

This is a nice way out :-)

> * keep the kmem_cache but don't use it for allocation of whole array at
> one shot but rather alloc single nodes of list
>
> ---
> net/xdp/xsk.c | 99 ++++++++++++++++++++++++++++++++++++++-------
> net/xdp/xsk_queue.h | 12 ++++++
> 2 files changed, 97 insertions(+), 14 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 9c3acecc14b1..3d12d1fbda41 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -36,6 +36,20 @@
> #define TX_BATCH_SIZE 32
> #define MAX_PER_SOCKET_BUDGET 32
>
> +struct xsk_addr_node {
> +	u64 addr;
> +	struct list_head addr_node;
> +};
> +
> +struct xsk_addr_head {
> +	u32 num_descs;
> +	struct list_head addrs_list;
> +};
> +
> +static struct kmem_cache *xsk_tx_generic_cache;
> +
> +#define XSKCB(skb) ((struct xsk_addr_head *)((skb)->cb))

Since you're going to respin anyway, maybe add a BUILD_BUG_ON() here that
checks sizeof(struct xsk_addr_head) against the size of sk_buff::cb? Who
knows, at some point we might put more info into that struct.

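
Roughly what I have in mind, as an untested sketch. In the kernel this
would presumably be a one-liner along the lines of
BUILD_BUG_ON(sizeof(struct xsk_addr_head) > sizeof_field(struct sk_buff, cb));
the version below is a userspace approximation so it can be compiled
standalone. sk_buff_mock, the local list_head and xsk_cb_fits() are
stand-ins made up for the example, and static_assert() from C11 plays the
role of BUILD_BUG_ON().

```c
#include <assert.h>	/* C11 static_assert */
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical mock: only models the 48-byte control buffer of sk_buff. */
struct sk_buff_mock {
	char cb[48];
};

/* Same layout as in the patch, with u32 spelled as uint32_t. */
struct xsk_addr_head {
	uint32_t num_descs;
	struct list_head addrs_list;
};

/* Local copy of the kernel's sizeof_field() helper. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

/* Build fails here if xsk_addr_head ever outgrows the cb area. */
static_assert(sizeof(struct xsk_addr_head) <=
	      sizeof_field(struct sk_buff_mock, cb),
	      "xsk_addr_head does not fit in skb->cb");

/* Runtime mirror of the same condition, handy for a quick test. */
static inline int xsk_cb_fits(void)
{
	return sizeof(struct xsk_addr_head) <=
	       sizeof_field(struct sk_buff_mock, cb);
}
```

That way, if someone grows the struct past the cb area later, the build
breaks instead of the skb getting silently corrupted.
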
Acked-by: Stanislav Fomichev <sdf@fomichev.me>