Netdev List
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, magnus.karlsson@intel.com,
	stfomichev@gmail.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, kerneljasonxing@gmail.com, bjorn@kernel.org,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Subject: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
Date: Tue, 23 Jun 2026 15:32:33 +0200
Message-ID: <20260623133240.1048434-1-maciej.fijalkowski@intel.com>

Hi,

This series fixes several AF_XDP multi-buffer Tx paths where descriptors
consumed from the Tx ring are not consistently returned to userspace
through the completion ring when the packet is later dropped as invalid.

The affected cases are invalid or oversized multi-buffer Tx packets in
both the generic and zero-copy paths. In these cases, the kernel can
consume one or more Tx descriptors while building or validating a
multi-buffer packet, then drop the packet before it reaches the device.
Userspace regains ownership of those UMEM buffers only once their
addresses are returned through the completion ring (CQ), so a missing
completion makes userspace lose track of the affected buffers.
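To make the ownership rule concrete, here is a minimal userspace-side sketch. It assumes an application that marks a UMEM frame as in flight when its address goes onto the Tx ring and frees it only when that same address comes back on the CQ. `struct umem_tracker`, `tracker_submit()` and `tracker_complete()` are hypothetical names, not libxdp or libbpf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a small UMEM of fixed-size frames. */
#define NUM_FRAMES 8
#define FRAME_SIZE 4096

struct umem_tracker {
	bool in_flight[NUM_FRAMES]; /* currently owned by the kernel */
	unsigned int outstanding;   /* frames not yet returned via the CQ */
};

/* Called when @addr is written to the Tx ring: ownership moves to the kernel. */
static void tracker_submit(struct umem_tracker *t, uint64_t addr)
{
	size_t idx = addr / FRAME_SIZE;

	assert(idx < NUM_FRAMES && !t->in_flight[idx]);
	t->in_flight[idx] = true;
	t->outstanding++;
}

/* Called for each address read from the CQ: ownership returns to userspace. */
static void tracker_complete(struct umem_tracker *t, uint64_t addr)
{
	size_t idx = addr / FRAME_SIZE;

	assert(idx < NUM_FRAMES && t->in_flight[idx]);
	t->in_flight[idx] = false;
	t->outstanding--;
}
```

If a three-frag packet is submitted and only one of its descriptors is ever completed, `outstanding` stays at 2 and those two frames can never be reused, which is the leak the series addresses.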

The generic path fixes cover three related cases:
* partially built multi-buffer skbs dropped by xsk_drop_skb();
* continuation descriptors left in the Tx ring after xsk_build_skb()
  reports overflow;
* invalid descriptors encountered in the middle of a multi-buffer
  packet, including the offending invalid descriptor itself.
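A rough model of the drain these three generic-path cases share: starting at the first descriptor of the dropped packet, every descriptor up to and including the one without the continuation bit is consumed and its address is returned via the CQ, the offending invalid descriptor included. The struct and flag only mirror `struct xdp_desc` and `XDP_PKT_CONTD` from <linux/if_xdp.h> so the sketch builds without kernel headers; `drain_mb_packet()` and `struct sketch_cq` are hypothetical and are not the code in these patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors XDP_PKT_CONTD and struct xdp_desc from <linux/if_xdp.h>. */
#define SKETCH_PKT_CONTD (1U << 0)

struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

/* Hypothetical CQ model: an append-only array of returned addresses. */
struct sketch_cq {
	uint64_t addrs[16];
	size_t prod;
};

static void cq_reclaim(struct sketch_cq *cq, uint64_t addr)
{
	assert(cq->prod < sizeof(cq->addrs) / sizeof(cq->addrs[0]));
	cq->addrs[cq->prod++] = addr;
}

/*
 * Drop the multi-buffer packet starting at tx[first]: consume every
 * descriptor up to and including the end-of-packet one (no CONTD bit)
 * and return each address through the CQ. Validity is deliberately not
 * checked here: once the packet is being dropped, every consumed
 * descriptor, the offending one included, is reclaimed. *eop_seen is
 * left at 0 if the ring ran out before the end of the packet.
 */
static size_t drain_mb_packet(const struct sketch_desc *tx, size_t n,
			      size_t first, struct sketch_cq *cq,
			      int *eop_seen)
{
	size_t i = first;

	*eop_seen = 0;
	while (i < n) {
		uint32_t opts = tx[i].options;

		cq_reclaim(cq, tx[i].addr);
		i++;
		if (!(opts & SKETCH_PKT_CONTD)) {
			*eop_seen = 1;
			break;
		}
	}
	return i - first;
}
```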

The zero-copy path is handled separately. The batched Tx parser now
distinguishes descriptors that can be passed to the driver from
descriptors that are consumed only because they belong to an invalid
multi-buffer packet. Reclaim-only descriptors are written to the CQ
address area and published in completion order, after any earlier
driver-visible Tx descriptors.
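The ZC split can be sketched as two steps: classify each complete packet in a batch as driver-visible or reclaim-only, then publish CQ entries strictly in ring order, so that a reclaim-only descriptor never overtakes an earlier driver-visible one the device has not completed yet. All names are hypothetical, and `desc_valid()` is a placeholder rather than the kernel's validation logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors XDP_PKT_CONTD and struct xdp_desc from <linux/if_xdp.h>. */
#define SKETCH_PKT_CONTD  (1U << 0)
#define SKETCH_FRAME_SIZE 4096U

struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t options;
};

enum desc_kind { DESC_TO_DRIVER, DESC_RECLAIM_ONLY };

/* Placeholder validity check, not the kernel's. */
static int desc_valid(const struct sketch_desc *d)
{
	return d->len > 0 && d->len <= SKETCH_FRAME_SIZE;
}

/*
 * Classify each complete packet in tx[0..n): if any fragment is invalid,
 * all of its fragments become reclaim-only. Returns how many descriptors
 * were classified; a trailing unterminated packet is left untouched.
 */
static size_t classify_batch(const struct sketch_desc *tx, size_t n,
			     enum desc_kind *kind)
{
	size_t start = 0;
	int bad = 0;

	for (size_t i = 0; i < n; i++) {
		if (!desc_valid(&tx[i]))
			bad = 1;
		if (tx[i].options & SKETCH_PKT_CONTD)
			continue;
		/* tx[i] ends the packet tx[start..i] */
		for (size_t j = start; j <= i; j++)
			kind[j] = bad ? DESC_RECLAIM_ONLY : DESC_TO_DRIVER;
		start = i + 1;
		bad = 0;
	}
	return start;
}

/*
 * How many CQ entries may be published, in ring order, once the driver
 * has completed @driver_done of the driver-visible descriptors.
 * Reclaim-only entries need no device completion but must not overtake
 * an earlier driver-visible descriptor that is still pending.
 */
static size_t cq_publishable(const enum desc_kind *kind, size_t n,
			     size_t driver_done)
{
	size_t i, seen = 0;

	assert(driver_done <= n);
	for (i = 0; i < n; i++) {
		if (kind[i] != DESC_TO_DRIVER)
			continue;
		if (seen == driver_done)
			break;
		seen++;
	}
	return i;
}
```

With ring order driver, driver, reclaim, reclaim, driver: one driver completion allows one CQ entry, two allow four (the reclaim-only pair follows immediately), and three allow all five.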

The ZC batching path can also retain drain state when userspace has not
yet posted the end of an invalid multi-buffer packet. To keep this state
local to the batched path, which runs only while a single Tx socket is
bound to the pool (the singular case), the series prevents a second Tx
socket from joining the same pool while such drain state exists. During
the singular-to-shared transition, Tx batching is gated, pre-existing
readers are waited out, and bind fails with -EAGAIN if the existing
socket still has pending drain state. This avoids adding multi-buffer
drain handling to the shared-UMEM fallback path.
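The bind-time ordering can be modelled as a small single-threaded state machine. The field and function names are illustrative, waiting out pre-existing readers is only marked by a comment, and re-opening the gate after a refused bind is an assumption of this sketch rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pool state; names are illustrative, not the kernel's. */
struct sketch_pool {
	unsigned int tx_sockets; /* Tx sockets bound to this pool */
	bool batching_gated;     /* batched ZC Tx must not start */
	bool drain_pending;      /* an invalid mb packet is still draining */
};

/* Batched ZC Tx only runs for a singular, ungated pool. */
static bool batching_allowed(const struct sketch_pool *p)
{
	return p->tx_sockets == 1 && !p->batching_gated;
}

/*
 * Try to add a second Tx socket: gate batching, wait out readers already
 * inside the batched path, then refuse while drain state is pending.
 */
static int sketch_bind_shared(struct sketch_pool *p)
{
	assert(p->tx_sockets >= 1);
	p->batching_gated = true;
	/* A real implementation must wait here for in-flight batched Tx
	 * callers; this single-threaded model has none. */
	if (p->drain_pending) {
		/* Assumption: reopen the gate so the existing socket can
		 * finish draining. */
		p->batching_gated = false;
		return -EAGAIN;
	}
	p->tx_sockets++;
	return 0;
}
```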

The last two patches update xskxceiver so that the tests count the
descriptors of invalid multi-buffer Tx packets as ones that must be
reclaimed, while still not expecting those invalid packets on the Rx
side.
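The accounting rule reduces to two counters: every Tx descriptor is expected back on the CQ whether or not its packet is valid, while only valid packets are expected on Rx. The structs below are hypothetical and do not reflect xskxceiver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-packet test spec. */
struct sketch_pkt {
	unsigned int nr_frags; /* Tx descriptors this packet uses */
	bool valid;            /* whether the kernel should transmit it */
};

struct sketch_expect {
	unsigned int cq_entries; /* completions userspace must see */
	unsigned int rx_pkts;    /* packets the Rx side should receive */
};

static struct sketch_expect expected_counts(const struct sketch_pkt *pkts,
					    size_t n)
{
	struct sketch_expect e = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		assert(pkts[i].nr_frags > 0);
		/* Every consumed Tx descriptor comes back via the CQ... */
		e.cq_entries += pkts[i].nr_frags;
		/* ...but only valid packets show up on Rx. */
		if (pkts[i].valid)
			e.rx_pkts++;
	}
	return e;
}
```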

This is a follow-up to Jason's changes [0], which addressed only the
generic xmit path. This set allows me to pass a full xskxceiver test
suite run against the ice driver.

Thanks,
Maciej

[0]: https://lore.kernel.org/netdev/20260520004244.55663-1-kerneljasonxing@gmail.com/


Jason Xing (3):
  xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
  xsk: drain continuation descs after overflow in xsk_build_skb()
  xsk: drain continuation descs on invalid descriptor in
    __xsk_generic_xmit()

Maciej Fijalkowski (4):
  xsk: reclaim offending invalid desc in generic multi-buffer Tx
  xsk: reclaim invalid multi-buffer Tx descs in ZC path
  selftests/xsk: fix too-many-frags multi-buffer Tx test
  selftests/xsk: account invalid multi-buffer Tx descriptors

 include/net/xdp_sock.h                        |   1 +
 include/net/xsk_buff_pool.h                   |   6 +
 net/xdp/xsk.c                                 | 114 ++++++++++++++++--
 net/xdp/xsk_buff_pool.c                       |  66 ++++++++++
 net/xdp/xsk_queue.h                           |  66 +++++++---
 .../selftests/bpf/prog_tests/test_xsk.c       |  44 ++++---
 6 files changed, 254 insertions(+), 43 deletions(-)

-- 
2.43.0



Thread overview: 10+ messages
2026-06-23 13:32 Maciej Fijalkowski [this message]
2026-06-23 13:32 ` [PATCH net 1/7] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 2/7] xsk: drain continuation descs after overflow in xsk_build_skb() Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 3/7] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 4/7] xsk: reclaim offending invalid desc in generic multi-buffer Tx Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 5/7] xsk: reclaim invalid multi-buffer Tx descs in ZC path Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 6/7] selftests/xsk: fix too-many-frags multi-buffer Tx test Maciej Fijalkowski
2026-06-23 13:32 ` [PATCH net 7/7] selftests/xsk: account invalid multi-buffer Tx descriptors Maciej Fijalkowski
2026-06-24 15:38 ` [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim Stanislav Fomichev
2026-06-24 16:37   ` Maciej Fijalkowski
