From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Stanislav Fomichev <stfomichev@gmail.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
<bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
<andrii@kernel.org>, <netdev@vger.kernel.org>,
<magnus.karlsson@intel.com>,
Eryk Kubanski <e.kubanski@partner.samsung.com>
Subject: Re: [PATCH v2 bpf] xsk: fix immature cq descriptor production
Date: Tue, 8 Jul 2025 16:14:39 +0200 [thread overview]
Message-ID: <aG0nz2W/rBIbB7Bl@boxer> (raw)
In-Reply-To: <aGwUsDK0u3vegaYq@mini-arch>
On Mon, Jul 07, 2025 at 11:40:48AM -0700, Stanislav Fomichev wrote:
> On 07/07, Alexander Lobakin wrote:
> > From: Stanislav Fomichev <stfomichev@gmail.com>
> > Date: Mon, 7 Jul 2025 08:06:21 -0700
> >
> > > On 07/07, Alexander Lobakin wrote:
> > >> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > >> Date: Sat, 5 Jul 2025 15:55:12 +0200
> > >>
> > >>> Eryk reported an issue that I have put under Closes: tag, related to
> > >>> umem addrs being prematurely produced onto pool's completion queue.
> > >>> Let us make the skb's destructor responsible for producing all addrs
> > >>> that given skb used.
> > >>>
> > >>> Commit from fixes tag introduced the buggy behavior, it was not broken
> > >>> from day 1, but rather when xsk multi-buffer got introduced.
> > >>>
> > >>> Introduce a struct which will carry descriptor count with array of
> > >>> addresses taken from processed descriptors that will be carried via
> > >>> skb_shared_info::destructor_arg. This way we can refer to it within
> > >>> xsk_destruct_skb().
> > >>>
> > >>> To summarize, behavior is changed from:
> > >>> - produce addr to cq, increase cq's cached_prod
> > >>> - increment descriptor count and store it on
> > >>> - (xmit and rest of path...)
> > >>> skb_shared_info::destructor_arg
> > >>> - use destructor_arg on skb destructor to update global state of cq
> > >>> producer
> > >>>
> > >>> to the following:
> > >>> - increment cq's cached_prod
> > >>> - increment descriptor count, save xdp_desc::addr in custom array and
> > >>> store this custom array on skb_shared_info::destructor_arg
> > >>> - (xmit and rest of path...)
> > >>> - use destructor_arg on skb destructor to walk the array of addrs and
> > >>> write them to cq and finally update global state of cq producer
> > >>>
> > >>> Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> > >>> Reported-by: Eryk Kubanski <e.kubanski@partner.samsung.com>
> > >>> Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@partner.samsung.com/
> > >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > >>> ---
> > >>> v1:
> > >>> https://lore.kernel.org/bpf/20250702101648.1942562-1-maciej.fijalkowski@intel.com/
> > >>>
> > >>> v1->v2:
> > >>> * store addrs in array carried via destructor_arg instead having them
> > >>> stored in skb headroom; cleaner and less hacky approach;
> > >>
> > >> Might look cleaner, but what about the performance given that you're
> > >> adding a memory allocation?
> > >>
> > >> (I realize that's only for the skb mode, still)
> > >>
> > >> Yeah we anyway allocate an skb and may even copy the whole frame, just
> > >> curious.
> > >> I could recommend using skb->cb for that, but its 48 bytes would cover
> > >> only 6 addresses =\
> >
> > BTW isn't num_descs from that new structure would be the same as
> > shinfo->nr_frags + 1 (or just nr_frags for xsk_build_skb_zerocopy())?
>
> So you're saying we don't need to store it? Agreed. But storing the rest
> in cb still might be problematic with kconfig-configurable MAX_SKB_FRAGS?
Hi Stan & Olek,
no, as said in v1 drivers might linearize the skb and all frags will be
lost. This storage is needed unfortunately.
>
> > > Can we pre-allocate an array of xsk_addrs during xsk_bind (the number of
> > > xsk_addrs is bound by the tx ring size)? Then we can remove the alloc on tx
> > > and replace it with some code to manage that pool of xsk_addrs..
That would be pool-bound which makes it a shared resource so I believe
that we would repeat the problem being fixed here ;)
> >
> > Nice idea BTW.
> >
> > We could even use system per-cpu Page Pools to allocate these structs*
> > :D It wouldn't waste 1 page per one struct as PP is frag-aware and has
> > API for allocating only a small frag.
> >
> > Headroom stuff was also ok to me: we either way allocate a new skb, so
> > we could allocate it with a bit bigger headroom and put that table there
> > being sure that nobody will overwrite it (some drivers insert special
> > headers or descriptors in front of the actual skb->data).
headroom approach was causing one of bpf selftests to fail, but I didn't
check in-depth the reason. I didn't really like the check in destructor if
addr array was corrupted in v1 and I came up with v2 which seems to me a
cleaner fix.
> >
> > [*] Offtop: we could also use system PP to allocate skbs in
> > xsk_build_skb() just like it's done in xdp_build_skb_from_zc() +
> > xdp_copy_frags_from_zc() -- no way to avoid memcpy(), but the payload
> > buffers would be recycled then.
>
> Or maybe kmem_cache_alloc_node with a custom cache is good enough?
> Headroom also feels ok if we store the whole xsk_addrs struct in it.
Yep both of these approaches was something I considered, but keep in mind
it's a bugfix so I didn't want to go with something flashy. I have not
observed big performance impact but I checked only MAX_SKB_FRAGS being set
to standard value.
Would you guys be ok if I do the follow-up with possible optimization
after my vacation which would be a -next candidate?
Thanks,
MF
next prev parent reply other threads:[~2025-07-08 14:14 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-05 13:55 [PATCH v2 bpf] xsk: fix immature cq descriptor production Maciej Fijalkowski
2025-07-07 13:59 ` Alexander Lobakin
2025-07-07 15:06 ` Stanislav Fomichev
2025-07-07 15:47 ` Alexander Lobakin
2025-07-07 18:40 ` Stanislav Fomichev
2025-07-08 14:14 ` Maciej Fijalkowski [this message]
2025-07-08 15:54 ` Alexander Lobakin
2025-07-08 17:49 ` Stanislav Fomichev
2025-07-09 10:00 ` Maciej Fijalkowski
2025-07-14 19:04 ` Dan Carpenter
-- strict thread matches above, loose matches on Subject: below --
2025-07-06 7:03 kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aG0nz2W/rBIbB7Bl@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=aleksander.lobakin@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=e.kubanski@partner.samsung.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.