netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Stanislav Fomichev <stfomichev@gmail.com>
Cc: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
	<andrii@kernel.org>, <netdev@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <aleksander.lobakin@intel.com>,
	Eryk Kubanski <e.kubanski@partner.samsung.com>
Subject: Re: [PATCH v3 bpf] xsk: fix immature cq descriptor production
Date: Wed, 13 Aug 2025 13:47:33 +0200	[thread overview]
Message-ID: <aJx7VQoyIFHGivfg@boxer> (raw)
In-Reply-To: <aJTWQDcpkz3Q4eNU@mini-arch>

On Thu, Aug 07, 2025 at 09:37:20AM -0700, Stanislav Fomichev wrote:
> On 08/07, Maciej Fijalkowski wrote:
> > On Wed, Aug 06, 2025 at 10:42:58PM +0200, Maciej Fijalkowski wrote:
> > > On Wed, Aug 06, 2025 at 09:43:53AM -0700, Stanislav Fomichev wrote:
> > > > On 08/06, Maciej Fijalkowski wrote:
> > > > > Eryk reported an issue that I have put under Closes: tag, related to
> > > > > umem addrs being prematurely produced onto pool's completion queue.
> > > > > Let us make the skb's destructor responsible for producing all addrs
> > > > > that given skb used.
> > > > > 
> > > > > Introduce struct xsk_addrs which will carry descriptor count with array
> > > > > of addresses taken from processed descriptors that will be carried via
> > > > > skb_shared_info::destructor_arg. This way we can refer to it within
> > > > > xsk_destruct_skb(). In order to mitigate the overhead that will be
> > > > > coming from memory allocations, let us introduce kmem_cache of xsk_addrs
> > > > > onto xdp_sock. Utilize the existing struct hole in xdp_sock for that.
> > > > > 
> > > > > Commit from fixes tag introduced the buggy behavior, it was not broken
> > > > > from day 1, but rather when xsk multi-buffer got introduced.
> > > > > 
> > > > > Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> > > > > Reported-by: Eryk Kubanski <e.kubanski@partner.samsung.com>
> > > > > Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
> > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > > ---
> > > > > v1:
> > > > > https://lore.kernel.org/bpf/20250702101648.1942562-1-maciej.fijalkowski@intel.com/
> > > > > v2:
> > > > > https://lore.kernel.org/bpf/20250705135512.1963216-1-maciej.fijalkowski@intel.com/
> > > > > 
> > > > > v1->v2:
> > > > > * store addrs in array carried via destructor_arg instead having them
> > > > >   stored in skb headroom; cleaner and less hacky approach;
> > > > > v2->v3:
> > > > > * use kmem_cache for xsk_addrs allocation (Stan/Olek)
> > > > > * set err when xsk_addrs allocation fails (Dan)
> > > > > * change xsk_addrs layout to avoid holes
> > > > > * free xsk_addrs on error path
> > > > > * rebase
> > > > > ---
> > > > >  include/net/xdp_sock.h |  1 +
> > > > >  net/xdp/xsk.c          | 94 ++++++++++++++++++++++++++++++++++--------
> > > > >  net/xdp/xsk_queue.h    | 12 ++++++
> > > > >  3 files changed, 89 insertions(+), 18 deletions(-)
> > > > > 
> > > > 

(...)

> > > > > +	xs->xsk_addrs_cache = kmem_cache_create("xsk_generic_xmit_cache",
> > > > > +						sizeof(struct xsk_addrs), 0,
> > > > > +						SLAB_HWCACHE_ALIGN, NULL);
> > > > > +
> > > > > +	if (!xs->xsk_addrs_cache) {
> > > > > +		sk_free(sk);
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > 
> > > > Should we move this up to happen before sk_add_node_rcu? Otherwise we
> > > > also have to do sk_del_node_init_rcu on !xs->xsk_addrs_cache here?
> > > > 
> > > > Btw, alternatively, why not make this happen at bind time when we know
> > > > whether the socket is gonna be copy or zc? And do it only for the copy
> > > > mode?
> > > 
> > > thanks for quick review Stan. makes sense to do it for copy mode only.
> > > i'll send next revision tomorrow.
> > 
> > FWIW syzbot reported an issue that "xsk_generic_xmit_cache" exists, so
> > probably we should include queue id within name so that each socket gets
> > its own cache with unique name.
> 
> Interesting. I was wondering whether it's gonna be confusing to see
> multiple "xsk_generic_xmit_cache" entries in /proc/slabinfo, but looks
> like it's not allowed :-)

I played with this a bit more, side note is that i have not seen these
entries in /proc/slabinfo unless i provided SLAB_POISON to
kmem_cache_create().

Besides I think a solution where each socket adds its own kmem_cache is
not really scalable. In theory, if someone would have such a use case that
would require loading copy mode xsk socket per each queue on NIC and there
would be multiple NICs that require it on the system, then you get a
pretty massive count of kmem_caches. I am not sure what would be the
consequences of that.

I come up with idea to have these kmem_caches as percpu vars with embedded
refcounting. xsk will index these by queue id and at the bind time if
kmem_cache under certain id was already created we just bump the refcnt.
I'll send a v4 with this implemented and I would appreciate the input on
this :)

> 

  reply	other threads:[~2025-08-13 11:47 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06 15:41 [PATCH v3 bpf] xsk: fix immature cq descriptor production Maciej Fijalkowski
2025-08-06 16:43 ` Stanislav Fomichev
2025-08-06 20:42   ` Maciej Fijalkowski
2025-08-07 12:01     ` Maciej Fijalkowski
2025-08-07 16:37       ` Stanislav Fomichev
2025-08-13 11:47         ` Maciej Fijalkowski [this message]
2025-08-07  8:06 ` [syzbot ci] " syzbot ci
2025-08-08  9:48 ` [PATCH v3 bpf] " Magnus Karlsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJx7VQoyIFHGivfg@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=e.kubanski@partner.samsung.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=stfomichev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).