All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Stanislav Fomichev <stfomichev@gmail.com>
Cc: <bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
	<andrii@kernel.org>, <netdev@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <aleksander.lobakin@intel.com>,
	Eryk Kubanski <e.kubanski@partner.samsung.com>
Subject: Re: [PATCH v3 bpf] xsk: fix immature cq descriptor production
Date: Wed, 13 Aug 2025 13:47:33 +0200	[thread overview]
Message-ID: <aJx7VQoyIFHGivfg@boxer> (raw)
In-Reply-To: <aJTWQDcpkz3Q4eNU@mini-arch>

On Thu, Aug 07, 2025 at 09:37:20AM -0700, Stanislav Fomichev wrote:
> On 08/07, Maciej Fijalkowski wrote:
> > On Wed, Aug 06, 2025 at 10:42:58PM +0200, Maciej Fijalkowski wrote:
> > > On Wed, Aug 06, 2025 at 09:43:53AM -0700, Stanislav Fomichev wrote:
> > > > On 08/06, Maciej Fijalkowski wrote:
> > > > > Eryk reported an issue that I have put under Closes: tag, related to
> > > > > umem addrs being prematurely produced onto pool's completion queue.
> > > > > Let us make the skb's destructor responsible for producing all addrs
> > > > > that given skb used.
> > > > > 
> > > > > Introduce struct xsk_addrs which will carry descriptor count with array
> > > > > of addresses taken from processed descriptors that will be carried via
> > > > > skb_shared_info::destructor_arg. This way we can refer to it within
> > > > > xsk_destruct_skb(). In order to mitigate the overhead that will be
> > > > > coming from memory allocations, let us introduce kmem_cache of xsk_addrs
> > > > > onto xdp_sock. Utilize the existing struct hole in xdp_sock for that.
> > > > > 
> > > > > Commit from fixes tag introduced the buggy behavior, it was not broken
> > > > > from day 1, but rather when xsk multi-buffer got introduced.
> > > > > 
> > > > > Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> > > > > Reported-by: Eryk Kubanski <e.kubanski@partner.samsung.com>
> > > > > Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
> > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > > > > ---
> > > > > v1:
> > > > > https://lore.kernel.org/bpf/20250702101648.1942562-1-maciej.fijalkowski@intel.com/
> > > > > v2:
> > > > > https://lore.kernel.org/bpf/20250705135512.1963216-1-maciej.fijalkowski@intel.com/
> > > > > 
> > > > > v1->v2:
> > > > > * store addrs in array carried via destructor_arg instead having them
> > > > >   stored in skb headroom; cleaner and less hacky approach;
> > > > > v2->v3:
> > > > > * use kmem_cache for xsk_addrs allocation (Stan/Olek)
> > > > > * set err when xsk_addrs allocation fails (Dan)
> > > > > * change xsk_addrs layout to avoid holes
> > > > > * free xsk_addrs on error path
> > > > > * rebase
> > > > > ---
> > > > >  include/net/xdp_sock.h |  1 +
> > > > >  net/xdp/xsk.c          | 94 ++++++++++++++++++++++++++++++++++--------
> > > > >  net/xdp/xsk_queue.h    | 12 ++++++
> > > > >  3 files changed, 89 insertions(+), 18 deletions(-)
> > > > > 
> > > > 

(...)

> > > > > +	xs->xsk_addrs_cache = kmem_cache_create("xsk_generic_xmit_cache",
> > > > > +						sizeof(struct xsk_addrs), 0,
> > > > > +						SLAB_HWCACHE_ALIGN, NULL);
> > > > > +
> > > > > +	if (!xs->xsk_addrs_cache) {
> > > > > +		sk_free(sk);
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > 
> > > > Should we move this up to happen before sk_add_node_rcu? Otherwise we
> > > > also have to do sk_del_node_init_rcu on !xs->xsk_addrs_cache here?
> > > > 
> > > > Btw, alternatively, why not make this happen at bind time when we know
> > > > whether the socket is gonna be copy or zc? And do it only for the copy
> > > > mode?
> > > 
> > > thanks for quick review Stan. makes sense to do it for copy mode only.
> > > i'll send next revision tomorrow.
> > 
> > FWIW syzbot reported an issue that "xsk_generic_xmit_cache" exists, so
> > probably we should include queue id within name so that each socket gets
> > its own cache with unique name.
> 
> Interesting. I was wondering whether it's gonna be confusing to see
> multiple "xsk_generic_xmit_cache" entries in /proc/slabinfo, but looks
> like it's not allowed :-)

I played with this a bit more, side note is that i have not seen these
entries in /proc/slabinfo unless i provided SLAB_POISON to
kmem_cache_create().

Besides I think a solution where each socket adds its own kmem_cache is
not really scalable. In theory, if someone would have such a use case that
would require loading copy mode xsk socket per each queue on NIC and there
would be multiple NICs that require it on the system, then you get a
pretty massive count of kmem_caches. I am not sure what would be the
consequences of that.

I come up with idea to have these kmem_caches as percpu vars with embedded
refcounting. xsk will index these by queue id and at the bind time if
kmem_cache under certain id was already created we just bump the refcnt.
I'll send a v4 with this implemented and I would appreciate the input on
this :)

> 

  reply	other threads:[~2025-08-13 11:47 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-06 15:41 [PATCH v3 bpf] xsk: fix immature cq descriptor production Maciej Fijalkowski
2025-08-06 16:43 ` Stanislav Fomichev
2025-08-06 20:42   ` Maciej Fijalkowski
2025-08-07 12:01     ` Maciej Fijalkowski
2025-08-07 16:37       ` Stanislav Fomichev
2025-08-13 11:47         ` Maciej Fijalkowski [this message]
2025-08-07  8:06 ` [syzbot ci] " syzbot ci
2025-08-08  9:48 ` [PATCH v3 bpf] " Magnus Karlsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aJx7VQoyIFHGivfg@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=e.kubanski@partner.samsung.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=stfomichev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.