From: Stanislav Fomichev <sdf.kernel@gmail.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
magnus.karlsson@intel.com, stfomichev@gmail.com,
kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
kerneljasonxing@gmail.com, bjorn@kernel.org
Subject: Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
Date: Wed, 24 Jun 2026 08:38:20 -0700 [thread overview]
Message-ID: <ajv23_r4M3FGrFNN@devvm7509.cco0.facebook.com> (raw)
In-Reply-To: <20260623133240.1048434-1-maciej.fijalkowski@intel.com>
On 06/23, Maciej Fijalkowski wrote:
> Hi,
>
> This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> consumed from the Tx ring are not consistently returned to userspace
> through the completion ring when the packet is later dropped as invalid.
>
> The affected cases are invalid or oversized multi-buffer Tx packets in
> both the generic and zero-copy paths. In these cases, the kernel can
> consume one or more Tx descriptors while building or validating a
> multi-buffer packet, then drop the packet before it reaches the device.
> Userspace still owns the UMEM buffers only after the corresponding
> addresses are returned through the CQ. Missing completions therefore
> make userspace lose track of those buffers.
>
> The generic path fixes cover three related cases:
> * partially built multi-buffer skbs dropped by xsk_drop_skb();
> continuation descriptors left in the Tx ring after xsk_build_skb()
> reports overflow;
> * invalid descriptors encountered in the middle of a multi-buffer
> packet, including the offending invalid descriptor itself.
>
> The zero-copy path is handled separately. The batched Tx parser now
> distinguishes descriptors that can be passed to the driver from
> descriptors that are consumed only because they belong to an invalid
> multi-buffer packet. Reclaim-only descriptors are written to the CQ
> address area and published in completion order, after any earlier
> driver-visible Tx descriptors.
>
> The ZC batching path can also retain drain state when userspace has not
> yet provided the end of an invalid multi-buffer packet. To keep this
> state local to the singular batched path, the series prevents a second
> Tx socket from joining the same pool while such drain state exists.
> During the singular-to-shared transition, Tx batching is gated,
> pre-existing readers are waited out, and bind fails with -EAGAIN if the
> existing socket still has pending drain state. This avoids adding
> multi-buffer drain handling to the shared-UMEM fallback path.
>
> The last two patches update xskxceiver so the tests account invalid
> multi-buffer Tx packets as descriptors that must be reclaimed, while
> still not expecting those invalid packets on the Rx side.
>
> This is a follow-up to Jason's changes [0] which were addressing generic
> xmit only and this set allows me to pass full xskxceiver test suite run
> against ice driver.
There is a fair amount of feedback from sashiko already :-( So the meta
question from me is: is it time to scrap our current approach where
we parse descriptor by descriptor? (and maintain half-baked skb and
half-consumed descriptor queues)
Should we:
1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
and do a full stop here)
2. now that we really know the number of valid descriptors -> reserve
the cq space (if not -> EAGAIN)
3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
locally, don't ever create semi-initialized skb)
4. construct the skb
5. xmit
If at any point there is an issue, the cleanup is straightforward. That
whole xk->skb goes away, no state between syscalls. Thoughts?
next prev parent reply other threads:[~2026-06-24 15:38 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 13:32 [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 1/7] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 2/7] xsk: drain continuation descs after overflow in xsk_build_skb() Maciej Fijalkowski
2026-06-24 13:33 ` sashiko-bot
2026-06-23 13:32 ` [PATCH net 3/7] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Maciej Fijalkowski
2026-06-24 13:33 ` sashiko-bot
2026-06-23 13:32 ` [PATCH net 4/7] xsk: reclaim offending invalid desc in generic multi-buffer Tx Maciej Fijalkowski
2026-06-24 13:33 ` sashiko-bot
2026-06-23 13:32 ` [PATCH net 5/7] xsk: reclaim invalid multi-buffer Tx descs in ZC path Maciej Fijalkowski
2026-06-24 13:33 ` sashiko-bot
2026-06-23 13:32 ` [PATCH net 6/7] selftests/xsk: fix too-many-frags multi-buffer Tx test Maciej Fijalkowski
2026-06-24 13:33 ` sashiko-bot
2026-06-23 13:32 ` [PATCH net 7/7] selftests/xsk: account invalid multi-buffer Tx descriptors Maciej Fijalkowski
2026-06-24 13:33 ` sashiko-bot
2026-06-24 15:38 ` Stanislav Fomichev [this message]
2026-06-24 16:37 ` [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim Maciej Fijalkowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ajv23_r4M3FGrFNN@devvm7509.cco0.facebook.com \
--to=sdf.kernel@gmail.com \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=horms@kernel.org \
--cc=kerneljasonxing@gmail.com \
--cc=kuba@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stfomichev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox