From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Stanislav Fomichev <stfomichev@gmail.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
<bpf@vger.kernel.org>, <ast@kernel.org>, <daniel@iogearbox.net>,
<andrii@kernel.org>, <netdev@vger.kernel.org>,
<magnus.karlsson@intel.com>,
Eryk Kubanski <e.kubanski@partner.samsung.com>
Subject: Re: [PATCH v2 bpf] xsk: fix immature cq descriptor production
Date: Wed, 9 Jul 2025 12:00:10 +0200
Message-ID: <aG49qrcYiapvMFFV@boxer>
In-Reply-To: <aG1aMOmnb-6K7syY@mini-arch>
On Tue, Jul 08, 2025 at 10:49:36AM -0700, Stanislav Fomichev wrote:
> On 07/08, Alexander Lobakin wrote:
> > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Date: Tue, 8 Jul 2025 16:14:39 +0200
> >
> > > On Mon, Jul 07, 2025 at 11:40:48AM -0700, Stanislav Fomichev wrote:
> > >> On 07/07, Alexander Lobakin wrote:
> >
> > [...]
> >
> > >>> BTW, wouldn't num_descs from that new structure be the same as
> > >>> shinfo->nr_frags + 1 (or just nr_frags for xsk_build_skb_zerocopy())?
> > >>
> > >> So you're saying we don't need to store it? Agreed. But storing the rest
> > >> in cb might still be problematic with a kconfig-configurable MAX_SKB_FRAGS?
> >
> > For sure skb->cb is too small for 17+ u64s.
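
Right -- and to put numbers on it, a quick size check (illustrative only;
the layout below is a sketch along the lines of v2, not necessarily the
exact struct from the patch):

	/* skb->cb is 48 bytes, while the address array alone needs
	 * (MAX_SKB_FRAGS + 1) * sizeof(u64), i.e. 144 bytes with the
	 * default MAX_SKB_FRAGS of 17 -- and more if kconfig raises it.
	 */
	struct xsk_addrs {
		u32 num_descs;
		u64 addrs[MAX_SKB_FRAGS + 1];
	};

	static_assert(sizeof(struct xsk_addrs) >
		      sizeof_field(struct sk_buff, cb));
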
> >
> > >
> > > Hi Stan & Olek,
> > >
> > > No, as said in v1, drivers might linearize the skb, in which case all
> > > frags would be lost. This storage is unfortunately needed.
> >
> > Aaah sorry. In this case yeah, you need this separate frag count.
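
(To make the linearization point concrete -- a sketch of what a driver may
do, illustrative only:

	/* e.g. in a driver's ndo_start_xmit, before DMA mapping */
	if (skb_linearize(skb))		/* returns 0 on success */
		goto drop;
	/* on success all frags have been pulled into the linear area and
	 * skb_shinfo(skb)->nr_frags is 0, so the original descriptor
	 * count is unrecoverable from the skb itself
	 */

hence the separate num_descs.)
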
> >
> > >
> > >>
> > >>>> Can we pre-allocate an array of xsk_addrs during xsk_bind (the number
> > >>>> of xsk_addrs is bounded by the tx ring size)? Then we could remove the
> > >>>> alloc on tx and replace it with some code to manage that pool of
> > >>>> xsk_addrs.
> > >
> > > That would be pool-bound, which makes it a shared resource, so I believe
> > > we would repeat the problem being fixed here ;)
> >
> > Except the system Page Pool idea right below maybe :>
>
> It doesn't have to be a shared resource; the pool (in whatever form) can be
> per xsk. (Unless I'm missing something.)
It cannot. When you attach multiple xsks to the same {dev,qid} tuple, the
pool is shared:
https://elixir.bootlin.com/linux/v6.16-rc5/source/net/xdp/xsk.c#L1249
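
Roughly, the bind path does (paraphrased and simplified, not verbatim):

	/* in xsk_bind(), when another socket targets the same {dev,qid} */
	if (umem_xs->dev == dev && umem_xs->queue_id == qid) {
		/* share the buffer pool with the other socket */
		xp_get_pool(umem_xs->pool);
		xs->pool = umem_xs->pool;
	}

so anything we hang off the pool is visible to all such sockets.
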
I'm not sure right now; maybe we could store the pointer to the xsk_addrs
pool in xdp_sock instead. Let me get back to this after my time off.
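
If that pans out, I am thinking of something along these lines (untested
sketch, all names illustrative, locking vs. the skb destructor omitted):

	/* pre-allocate one slot per Tx descriptor at bind time and keep a
	 * per-socket LIFO of free slots, so nothing is shared between
	 * sockets bound to the same {dev,qid}
	 */
	struct xsk_addrs {
		u32 num_descs;
		u64 addrs[MAX_SKB_FRAGS + 1];
	};

	struct xdp_sock {
		/* ...existing members... */
		struct xsk_addrs *addrs_slots;	/* tx ring size entries */
		struct xsk_addrs **addrs_free;	/* LIFO of free slots */
		u32 addrs_free_cnt;
	};

	static struct xsk_addrs *xsk_addrs_get(struct xdp_sock *xs)
	{
		return xs->addrs_free_cnt ?
		       xs->addrs_free[--xs->addrs_free_cnt] : NULL;
	}

	static void xsk_addrs_put(struct xdp_sock *xs, struct xsk_addrs *a)
	{
		xs->addrs_free[xs->addrs_free_cnt++] = a;
	}
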
BTW, I didn't realize you had added yourself as an xsk reviewer. It would
have been nice to CC Magnus or me when doing so :P
>
> > >>> Nice idea BTW.
> > >>>
> > >>> We could even use system per-cpu Page Pools to allocate these structs*
> > >>> :D It wouldn't waste 1 page per one struct as PP is frag-aware and has
> > >>> API for allocating only a small frag.
> > >>>
> > >>> The headroom stuff was also OK to me: we allocate a new skb either way,
> > >>> so we could allocate it with a slightly bigger headroom and put the table
> > >>> there, being sure that nobody will overwrite it (some drivers insert
> > >>> special headers or descriptors in front of the actual skb->data).
> > >
> > > The headroom approach was causing one of the bpf selftests to fail, but I
> > > didn't check the reason in depth. I didn't really like v1's check in the
> > > destructor for whether the addr array had been corrupted, so I came up
> > > with v2, which seems to me a cleaner fix.
> > >
> > >>>
> > >>> [*] Off-topic: we could also use the system PP to allocate skbs in
> > >>> xsk_build_skb(), just like it's done in xdp_build_skb_from_zc() +
> > >>> xdp_copy_frags_from_zc() -- there's no way to avoid the memcpy(), but
> > >>> the payload buffers would then be recycled.
> > >>
> > >> Or maybe kmem_cache_alloc_node with a custom cache is good enough?
> > >> Headroom also feels ok if we store the whole xsk_addrs struct in it.
> > >
> > > Yep, both of these approaches are something I considered, but keep in
> > > mind this is a bugfix, so I didn't want to go with anything flashy. I
> > > have not observed a big performance impact, but I only checked with
> > > MAX_SKB_FRAGS set to its standard value.
> > >
> > > Would you guys be OK if I did the follow-up with the possible
> > > optimizations after my vacation, as a -next candidate?
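
(If the follow-up ends up using the kmem_cache idea, I'd imagine roughly
the following -- untested sketch, names illustrative, not actual code:

	/* hypothetical: one slab cache for all sockets, one object per skb */
	static struct kmem_cache *xsk_addrs_cache;

	/* once, e.g. at module init */
	xsk_addrs_cache = kmem_cache_create("xsk_addrs",
					    sizeof(struct xsk_addrs), 0,
					    SLAB_HWCACHE_ALIGN, NULL);

	/* on Tx, replacing the current per-skb allocation */
	struct xsk_addrs *a = kmem_cache_zalloc(xsk_addrs_cache, GFP_KERNEL);
	...
	/* in the completion path, once descriptors were produced to cq */
	kmem_cache_free(xsk_addrs_cache, a);

just to make the intent clear.)
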
> >
> > As a fix, I'm totally fine with this going in in its current form, sure.
>
> +1
Thread overview: 10+ messages
2025-07-05 13:55 [PATCH v2 bpf] xsk: fix immature cq descriptor production Maciej Fijalkowski
2025-07-07 13:59 ` Alexander Lobakin
2025-07-07 15:06 ` Stanislav Fomichev
2025-07-07 15:47 ` Alexander Lobakin
2025-07-07 18:40 ` Stanislav Fomichev
2025-07-08 14:14 ` Maciej Fijalkowski
2025-07-08 15:54 ` Alexander Lobakin
2025-07-08 17:49 ` Stanislav Fomichev
2025-07-09 10:00 ` Maciej Fijalkowski [this message]
2025-07-14 19:04 ` Dan Carpenter