From: Stanislav Fomichev <stfomichev@gmail.com>
To: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
bpf@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
andrii@kernel.org, netdev@vger.kernel.org,
magnus.karlsson@intel.com,
Eryk Kubanski <e.kubanski@partner.samsung.com>
Subject: Re: [PATCH v2 bpf] xsk: fix immature cq descriptor production
Date: Tue, 8 Jul 2025 10:49:36 -0700 [thread overview]
Message-ID: <aG1aMOmnb-6K7syY@mini-arch> (raw)
In-Reply-To: <beaab8ec-d11a-4147-b7f4-487a4c3fe45b@intel.com>
On 07/08, Alexander Lobakin wrote:
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Date: Tue, 8 Jul 2025 16:14:39 +0200
>
> > On Mon, Jul 07, 2025 at 11:40:48AM -0700, Stanislav Fomichev wrote:
> >> On 07/07, Alexander Lobakin wrote:
>
> [...]
>
> >>> BTW isn't num_descs from that new structure would be the same as
> >>> shinfo->nr_frags + 1 (or just nr_frags for xsk_build_skb_zerocopy())?
> >>
> >> So you're saying we don't need to store it? Agreed. But storing the rest
> >> in cb still might be problematic with kconfig-configurable MAX_SKB_FRAGS?
>
> For sure skb->cb is too small for 17+ u64s.
>
> >
> > Hi Stan & Olek,
> >
> > no, as said in v1 drivers might linearize the skb and all frags will be
> > lost. This storage is needed unfortunately.
>
> Aaah sorry. In this case yeah, you need this separate frag count.
>
> >
> >>
> >>>> Can we pre-allocate an array of xsk_addrs during xsk_bind (the number of
> >>>> xsk_addrs is bound by the tx ring size)? Then we can remove the alloc on tx
> >>>> and replace it with some code to manage that pool of xsk_addrs..
> >
> > That would be pool-bound which makes it a shared resource so I believe
> > that we would repeat the problem being fixed here ;)
>
> Except the system Page Pool idea right below maybe :>
It doesn't have to be a shared resource, the pool (in whatever form) can be
per xsk. (unless I'm missing something)
> >>> Nice idea BTW.
> >>>
> >>> We could even use system per-cpu Page Pools to allocate these structs*
> >>> :D It wouldn't waste 1 page per one struct as PP is frag-aware and has
> >>> API for allocating only a small frag.
> >>>
> >>> Headroom stuff was also ok to me: we either way allocate a new skb, so
> >>> we could allocate it with a bit bigger headroom and put that table there
> >>> being sure that nobody will overwrite it (some drivers insert special
> >>> headers or descriptors in front of the actual skb->data).
> >
> > headroom approach was causing one of bpf selftests to fail, but I didn't
> > check in-depth the reason. I didn't really like the check in destructor if
> > addr array was corrupted in v1 and I came up with v2 which seems to me a
> > cleaner fix.
> >
> >>>
> >>> [*] Offtop: we could also use system PP to allocate skbs in
> >>> xsk_build_skb() just like it's done in xdp_build_skb_from_zc() +
> >>> xdp_copy_frags_from_zc() -- no way to avoid memcpy(), but the payload
> >>> buffers would be recycled then.
> >>
> >> Or maybe kmem_cache_alloc_node with a custom cache is good enough?
> >> Headroom also feels ok if we store the whole xsk_addrs struct in it.
> >
> > Yep both of these approaches was something I considered, but keep in mind
> > it's a bugfix so I didn't want to go with something flashy. I have not
> > observed big performance impact but I checked only MAX_SKB_FRAGS being set
> > to standard value.
> >
> > Would you guys be ok if I do the follow-up with possible optimization
> > after my vacation which would be a -next candidate?
>
> As a fix, it's totally fine for me to go in the current form, sure.
+1
next prev parent reply other threads:[~2025-07-08 17:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-05 13:55 [PATCH v2 bpf] xsk: fix immature cq descriptor production Maciej Fijalkowski
2025-07-07 13:59 ` Alexander Lobakin
2025-07-07 15:06 ` Stanislav Fomichev
2025-07-07 15:47 ` Alexander Lobakin
2025-07-07 18:40 ` Stanislav Fomichev
2025-07-08 14:14 ` Maciej Fijalkowski
2025-07-08 15:54 ` Alexander Lobakin
2025-07-08 17:49 ` Stanislav Fomichev [this message]
2025-07-09 10:00 ` Maciej Fijalkowski
2025-07-14 19:04 ` Dan Carpenter
-- strict thread matches above, loose matches on Subject: below --
2025-07-06 7:03 kernel test robot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aG1aMOmnb-6K7syY@mini-arch \
--to=stfomichev@gmail.com \
--cc=aleksander.lobakin@intel.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=e.kubanski@partner.samsung.com \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.