From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Stanislav Fomichev <sdf.kernel@gmail.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>, <netdev@vger.kernel.org>,
<bpf@vger.kernel.org>, <magnus.karlsson@intel.com>,
<stfomichev@gmail.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
<horms@kernel.org>, <bjorn@kernel.org>
Subject: Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
Date: Fri, 26 Jun 2026 09:42:08 +0200 [thread overview]
Message-ID: <aj4tUMqAXmQskO6r@boxer> (raw)
In-Reply-To: <aj1O4vKzCuodwgYL@devvm7509.cco0.facebook.com>
On Thu, Jun 25, 2026 at 09:05:28AM -0700, Stanislav Fomichev wrote:
> On 06/25, Jason Xing wrote:
> > On Thu, Jun 25, 2026 at 12:37 AM Maciej Fijalkowski
> > <maciej.fijalkowski@intel.com> wrote:
> > >
> > > On Wed, Jun 24, 2026 at 08:38:20AM -0700, Stanislav Fomichev wrote:
> > > > On 06/23, Maciej Fijalkowski wrote:
> > > > > Hi,
> > > > >
> > > > > This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> > > > > consumed from the Tx ring are not consistently returned to userspace
> > > > > through the completion ring when the packet is later dropped as invalid.
> > > > >
> > > > > The affected cases are invalid or oversized multi-buffer Tx packets in
> > > > > both the generic and zero-copy paths. In these cases, the kernel can
> > > > > consume one or more Tx descriptors while building or validating a
> > > > > multi-buffer packet, then drop the packet before it reaches the device.
> > > > > Userspace still owns the UMEM buffers only after the corresponding
> > > > > addresses are returned through the CQ. Missing completions therefore
> > > > > make userspace lose track of those buffers.
> > > > >
> > > > > The generic path fixes cover three related cases:
> > > > > * partially built multi-buffer skbs dropped by xsk_drop_skb();
> > > > > continuation descriptors left in the Tx ring after xsk_build_skb()
> > > > > reports overflow;
> > > > > * invalid descriptors encountered in the middle of a multi-buffer
> > > > > packet, including the offending invalid descriptor itself.
> > > > >
> > > > > The zero-copy path is handled separately. The batched Tx parser now
> > > > > distinguishes descriptors that can be passed to the driver from
> > > > > descriptors that are consumed only because they belong to an invalid
> > > > > multi-buffer packet. Reclaim-only descriptors are written to the CQ
> > > > > address area and published in completion order, after any earlier
> > > > > driver-visible Tx descriptors.
> > > > >
> > > > > The ZC batching path can also retain drain state when userspace has not
> > > > > yet provided the end of an invalid multi-buffer packet. To keep this
> > > > > state local to the singular batched path, the series prevents a second
> > > > > Tx socket from joining the same pool while such drain state exists.
> > > > > During the singular-to-shared transition, Tx batching is gated,
> > > > > pre-existing readers are waited out, and bind fails with -EAGAIN if the
> > > > > existing socket still has pending drain state. This avoids adding
> > > > > multi-buffer drain handling to the shared-UMEM fallback path.
> > > > >
> > > > > The last two patches update xskxceiver so the tests account invalid
> > > > > multi-buffer Tx packets as descriptors that must be reclaimed, while
> > > > > still not expecting those invalid packets on the Rx side.
> > > > >
> > > > > This is a follow-up to Jason's changes [0] which were addressing generic
> > > > > xmit only and this set allows me to pass full xskxceiver test suite run
> > > > > against ice driver.
> > > >
> > > > There is a fair amount of feedback from sashiko already :-( So the meta
> > > > question from me is: is it time to scrap our current approach where
> > > > we parse descriptor by descriptor? (and maintain half-baked skb and
> > > > half-consumed descriptor queues)
> > > >
> > > > Should we:
> > > >
> > > > 1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
> > > > PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
> > > > and do a full stop here)
> > > > 2. now that we really know the number of valid descriptors -> reserve
> > > > the cq space (if not -> EAGAIN)
> > > > 3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
> > > > locally, don't ever create semi-initialized skb)
> > > > 4. construct the skb
> > > > 5. xmit
> > >
> > > Yeah generic xmit became utterly horrible, haven't gone through sashiko
> > > reviews yet, but bare in mind this set also aligns zc side to what was
> > > previously being addressed by Jason.
> > >
> > > I believe planned logistics were to get these fixes onto net and then
> > > Jason had an implementation of batching on generic xmit, directed towards
> > > -next and that's where we could address current flow.
> >
> > Agreed. That's what I'm hoping for. There would be much more
> > discussion on how to do batch xmit in an elegant way, I believe.
>
> This doesn't have to depend on the batch rewrite, we should be able to rewrite
> this non-zc in net, this is still technically fixes, not feature work..
>
> There was already a couple of revisions with this drain_cont approach
> and every time I look at it feels like the cure is worse than the
> decease :-( Obviously not gonna stop you from going with the current approach,
> but these fixes feel a bit of a wasted effort to me (since the bugs keep
> coming and we are piling more complexity).
Well this is my fault as I took Jason's patches as-is and did not realize
Sashiko had issues with it. I *think* I got ZC side almost right so I'd
like to have at least one last round with trying to make the generic side
right...
next prev parent reply other threads:[~2026-06-26 7:44 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 13:32 [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 1/7] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Maciej Fijalkowski
2026-06-26 11:12 ` Larysa Zaremba
2026-06-26 12:24 ` Jason Xing
2026-06-26 13:43 ` Fijalkowski, Maciej
2026-06-23 13:32 ` [PATCH net 2/7] xsk: drain continuation descs after overflow in xsk_build_skb() Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 3/7] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 4/7] xsk: reclaim offending invalid desc in generic multi-buffer Tx Maciej Fijalkowski
2026-06-25 14:16 ` Jason Xing
2026-06-23 13:32 ` [PATCH net 5/7] xsk: reclaim invalid multi-buffer Tx descs in ZC path Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 6/7] selftests/xsk: fix too-many-frags multi-buffer Tx test Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 7/7] selftests/xsk: account invalid multi-buffer Tx descriptors Maciej Fijalkowski
2026-06-24 15:38 ` [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim Stanislav Fomichev
2026-06-24 16:37 ` Maciej Fijalkowski
2026-06-25 1:33 ` Jason Xing
2026-06-25 16:05 ` Stanislav Fomichev
2026-06-26 0:24 ` Jason Xing
2026-06-26 16:25 ` Stanislav Fomichev
2026-06-26 7:42 ` Maciej Fijalkowski [this message]
2026-06-26 16:30 ` Stanislav Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aj4tUMqAXmQskO6r@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=horms@kernel.org \
--cc=kerneljasonxing@gmail.com \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf.kernel@gmail.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox