BPF List
* [PATCH net v2 0/5] xsk: fix meta and publish of cq issues
@ 2026-05-15 12:30 Jason Xing
  2026-05-15 12:30 ` [PATCH net v2 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-15 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

The series is the product of previous review from sashiko[1].

1) META
patch 1: address TOCTOU around metadata.

2) PUBLISH of CQ
patch 2: make sure xsk_addr->addrs[] can be published to cq when
         overflow occurs.
patch 3: keep cleaning up the continuation descs (more than 17) and
         publish their addresses when overflow occurs.
patch 4: like patch 3, but only handles the invalid descs cases.

[1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/

---
V2
Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/
1. adjust selftests (Jakub)
2. add READ_ONCE in patch 1 (Stan)


Jason Xing (5):
  xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
  xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
  xsk: drain continuation descs after overflow in xsk_build_skb()
  xsk: drain continuation descs on invalid descriptor in
    __xsk_generic_xmit()
  selftests/xsk: fix multi-buffer invalid desc tests for drain_cont

 include/net/xdp_sock.h                        |  1 +
 net/xdp/xsk.c                                 | 48 +++++++++++++++----
 .../selftests/bpf/prog_tests/test_xsk.c       | 45 +++++++++--------
 3 files changed, 64 insertions(+), 30 deletions(-)

-- 
2.41.3



* [PATCH net v2 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
  2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
@ 2026-05-15 12:30 ` Jason Xing
  2026-05-15 12:30 ` [PATCH net v2 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-15 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

The TX metadata area resides in the UMEM buffer which is memory-mapped
and concurrently writable by userspace. In xsk_skb_metadata(),
csum_start and csum_offset are read from shared memory for bounds
validation, then read again for skb assignment. A malicious userspace
application can race to overwrite these values between the two reads,
bypassing the bounds check and causing out-of-bounds memory access
during checksum computation in the transmit path.

Fix this by reading csum_start and csum_offset into local variables
once, then using the local copies for both validation and assignment.

Note that other metadata fields (flags, launch_time) and the cached
csum fields may be mutually inconsistent due to concurrent userspace
writes, but this is benign: the only security-critical invariant is
that each field's validated value is the same one used, which local
caching guarantees.
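
For illustration only (not part of the patch), the single-fetch pattern
can be sketched in userspace C. The struct and function names below
(demo_csum_req, demo_skb, demo_apply_csum) are hypothetical, and plain
volatile reads stand in for READ_ONCE(); this is a sketch under those
assumptions, not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the csum request fields living in shared,
 * userspace-writable memory.
 */
struct demo_csum_req {
	volatile uint16_t csum_start;
	volatile uint16_t csum_offset;
};

/* Hypothetical stand-in for the two skb fields being assigned. */
struct demo_skb {
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Snapshot each shared field exactly once, then validate and assign
 * from the snapshot, so the value that was checked is the value used.
 */
static int demo_apply_csum(const struct demo_csum_req *req,
			   uint32_t desc_len, uint32_t hr,
			   struct demo_skb *skb)
{
	uint16_t start, offset;

	assert(req && skb);

	start = req->csum_start;	/* single read of shared memory */
	offset = req->csum_offset;	/* single read of shared memory */

	if ((uint32_t)start + offset + sizeof(uint16_t) > desc_len)
		return -EINVAL;

	skb->csum_start = (uint16_t)(hr + start);
	skb->csum_offset = offset;
	return 0;
}
```

A concurrent writer can still change the shared fields after the
snapshot, but the bounds check and the assignment no longer disagree.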

Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6bcd77068e52..adcec1d22e8b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -722,6 +722,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
 			    u32 hr)
 {
 	struct xsk_tx_metadata *meta = NULL;
+	u16 csum_start, csum_offset;
 
 	if (unlikely(pool->tx_metadata_len == 0))
 		return -EINVAL;
@@ -731,13 +732,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
 		return -EINVAL;
 
 	if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
-		if (unlikely(meta->request.csum_start +
-			     meta->request.csum_offset +
+		csum_start = READ_ONCE(meta->request.csum_start);
+		csum_offset = READ_ONCE(meta->request.csum_offset);
+
+		if (unlikely(csum_start + csum_offset +
 			     sizeof(__sum16) > desc->len))
 			return -EINVAL;
 
-		skb->csum_start = hr + meta->request.csum_start;
-		skb->csum_offset = meta->request.csum_offset;
+		skb->csum_start = hr + csum_start;
+		skb->csum_offset = csum_offset;
 		skb->ip_summed = CHECKSUM_PARTIAL;
 
 		if (unlikely(pool->tx_sw_csum)) {
-- 
2.41.3



* [PATCH net v2 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
  2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
  2026-05-15 12:30 ` [PATCH net v2 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
@ 2026-05-15 12:30 ` Jason Xing
  2026-05-15 12:30 ` [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-15 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

This patch is inspired by the check[1] from sashiko. It says that when
overflow happens, the address of cq to be published is invalid.
Actually, the more severe problem is that the whole process of
publishing the address of cq in this particular case is not right: it
should truly publish the address and advance the cached_prod in cq as
long as it reads descriptors from txq.

The following is the full analysis.
xsk_drop_skb() is called in three places, which all discard a partially
built multi-buffer skb:
1) xsk_build_skb() -EOVERFLOW error path: packet exceeds MAX_SKB_FRAGS
2) __xsk_generic_xmit() post-loop cleanup: an invalid descriptor in
   the TX ring prevents the partial packet from completing
3) xsk_release(): socket close while xs->skb holds an incomplete packet

In all three cases, the TX descriptors for the already-processed frags
have been consumed from the TX ring (xskq_cons_release), and CQ slots
have been reserved. However, xsk_drop_skb() calls xsk_consume_skb()
which cancels the CQ reservations via xsk_cq_cancel_locked(). Since
the buffer addresses never appear in the completion queue, userspace
permanently loses track of these buffers.

Fix this by letting consume_skb() trigger the existing xsk_destruct_skb
destructor, which already submits buffer addresses to the CQ via
xsk_cq_submit_addr_locked().
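
For illustration only (not part of the patch), the difference between
cancelling and submitting CQ reservations can be sketched with a toy
ring. All names here (demo_cq, demo_cq_reserve, demo_cq_cancel,
demo_cq_submit) are hypothetical, and the model ignores the consumer
side and locking; it only shows why a cancelled reservation never makes
an address visible to userspace.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CQ_SIZE 8u

/* Toy completion queue: cached_prod counts reserved slots, producer
 * counts slots already visible to userspace.
 */
struct demo_cq {
	uint64_t ring[DEMO_CQ_SIZE];
	uint32_t cached_prod;
	uint32_t producer;
};

/* Reserve one slot for a descriptor taken from the TX ring. */
static void demo_cq_reserve(struct demo_cq *cq)
{
	assert(cq->cached_prod - cq->producer < DEMO_CQ_SIZE);
	cq->cached_prod++;
}

/* Old drop behaviour in this model: hand the reservations back.
 * Nothing is written, so userspace never sees the buffer addresses.
 */
static void demo_cq_cancel(struct demo_cq *cq, uint32_t n)
{
	assert(n <= cq->cached_prod - cq->producer);
	cq->cached_prod -= n;
}

/* New drop behaviour in this model: fill the reserved slots and
 * publish them, so userspace can reclaim the buffers.
 */
static void demo_cq_submit(struct demo_cq *cq, const uint64_t *addrs,
			   uint32_t n)
{
	uint32_t i;

	assert(n <= cq->cached_prod - cq->producer);
	for (i = 0; i < n; i++)
		cq->ring[(cq->producer + i) % DEMO_CQ_SIZE] = addrs[i];
	cq->producer += n;
}
```

In this model, reserving two slots and cancelling them leaves producer
unchanged, while submitting them advances producer by two and exposes
both addresses.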

Note that cancelling the descriptors back to the TX ring (via
xskq_cons_cancel_n) is not an appropriate option because an oversized
packet that always exceeds MAX_SKB_FRAGS would be retried indefinitely,
which is obviously a deadlock bug in the TX path.

Also move the desc->addr assignment in xsk_build_skb() above the
overflow check so that the current descriptor's address is recorded
before a potential -EOVERFLOW jump to free_err, consistent with the
zerocopy path in xsk_build_skb_zerocopy().

[1]: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index adcec1d22e8b..1cc14cb415f3 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -713,8 +713,11 @@ static void xsk_consume_skb(struct sk_buff *skb)
 
 static void xsk_drop_skb(struct sk_buff *skb)
 {
-	xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb);
-	xsk_consume_skb(skb);
+	struct xdp_sock *xs = xdp_sk(skb->sk);
+
+	xs->tx->invalid_descs += xsk_get_num_desc(skb);
+	consume_skb(skb);
+	xs->skb = NULL;
 }
 
 static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
@@ -796,7 +799,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 			return ERR_PTR(-ENOMEM);
 
 		/* in case of -EOVERFLOW that could happen below,
-		 * xsk_consume_skb() will release this node as whole skb
+		 * xsk_drop_skb() will release this node as whole skb
 		 * would be dropped, which implies freeing all list elements
 		 */
 		xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
@@ -888,6 +891,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 				goto free_err;
 			}
 
+			xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
+
 			if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) {
 				err = -EOVERFLOW;
 				goto free_err;
@@ -905,8 +910,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 
 			skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE);
 			refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc);
-
-			xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
 		}
 	}
 
-- 
2.41.3



* [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb()
  2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
  2026-05-15 12:30 ` [PATCH net v2 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
  2026-05-15 12:30 ` [PATCH net v2 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
@ 2026-05-15 12:30 ` Jason Xing
  2026-05-16 12:31   ` sashiko-bot
  2026-05-15 12:30 ` [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Jason Xing @ 2026-05-15 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
only the current descriptor is released from the TX ring. The remaining
continuation descriptors of the same packet stay in the ring. Since
xs->skb is set to NULL after the drop, the TX loop picks up these
leftover frags and misinterprets each one as the beginning of a new
packet, corrupting the packet stream.

Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs
and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised.
The main TX loop in __xsk_generic_xmit() then handles continuation
descriptors one at a time: each gets a normal CQ reservation (with
backpressure), its address is submitted to the completion queue, and
the descriptor is released from the TX ring. When the last fragment
(without XDP_PKT_CONTD) is processed, the flag is cleared and the
function returns -EOVERFLOW so the next call starts with a fresh
budget for normal packets.

This reuses the existing CQ backpressure and budget mechanisms, so if
the CQ is full the function returns -EAGAIN and userspace drains the
CQ before retrying. Zero buffer leakage, zero packet stream corruption.
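
For illustration only (not part of the patch), the drain behaviour
described above can be modelled as a small state machine. The types and
names (demo_desc, demo_sock, demo_drain, DEMO_PKT_CONTD) are
hypothetical; the toy CQ has a fixed size and the -EAGAIN/-EOVERFLOW
return codes are left out, so this is only a sketch of the
"publish, release, stop at the last fragment" logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PKT_CONTD	0x1u	/* same bit position as XDP_PKT_CONTD */
#define DEMO_CQ_SIZE	16u

struct demo_desc {
	uint64_t addr;
	uint32_t options;
};

struct demo_sock {
	bool drain_cont;		/* a dropped desc had CONTD set */
	uint64_t cq[DEMO_CQ_SIZE];	/* toy completion queue */
	size_t cq_len;
	size_t tx_cons;			/* next TX descriptor to process */
};

/* Drain leftover continuation descriptors of a dropped packet.
 * Each one has its address published and is released from the TX
 * ring; draining stops after the first descriptor without CONTD.
 * Returns the number of descriptors drained.
 */
static size_t demo_drain(struct demo_sock *xs, const struct demo_desc *tx,
			 size_t n)
{
	size_t drained = 0;

	assert(xs && tx);

	while (xs->drain_cont && xs->tx_cons < n &&
	       xs->cq_len < DEMO_CQ_SIZE) {
		const struct demo_desc *d = &tx[xs->tx_cons];

		xs->cq[xs->cq_len++] = d->addr;	/* publish the address */
		xs->tx_cons++;			/* release from TX ring */
		drained++;

		if (!(d->options & DEMO_PKT_CONTD))
			xs->drain_cont = false;	/* last frag reached */
	}

	return drained;
}
```

With three leftover fragments followed by a fresh packet, the model
drains exactly three descriptors and leaves the fresh packet untouched.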

Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/xdp_sock.h |  1 +
 net/xdp/xsk.c          | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 23e8861e8b25..1958d19d9925 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -80,6 +80,7 @@ struct xdp_sock {
 	 * call of __xsk_generic_xmit().
 	 */
 	struct sk_buff *skb;
+	bool drain_cont;
 
 	struct list_head map_list;
 	/* Protects map_list */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 1cc14cb415f3..f98a9553e536 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -936,6 +936,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 			xs->tx->invalid_descs++;
 		}
 		xskq_cons_release(xs->tx);
+		if (xp_mb_desc(desc))
+			xs->drain_cont = true;
 	} else {
 		/* Let application retry */
 		xsk_cq_cancel_locked(xs->pool, 1);
@@ -982,6 +984,26 @@ static int __xsk_generic_xmit(struct sock *sk)
 			goto out;
 		}
 
+		if (unlikely(xs->drain_cont)) {
+			unsigned long flags;
+			u32 idx;
+
+			spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
+			idx = xskq_get_prod(xs->pool->cq);
+			xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
+			xskq_prod_submit_n(xs->pool->cq, 1);
+			spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
+
+			xs->tx->invalid_descs++;
+			xskq_cons_release(xs->tx);
+			if (!xp_mb_desc(&desc)) {
+				xs->drain_cont = false;
+				err = -EOVERFLOW;
+				goto out;
+			}
+			continue;
+		}
+
 		skb = xsk_build_skb(xs, &desc);
 		if (IS_ERR(skb)) {
 			err = PTR_ERR(skb);
-- 
2.41.3



* [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit()
  2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
                   ` (2 preceding siblings ...)
  2026-05-15 12:30 ` [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
@ 2026-05-15 12:30 ` Jason Xing
  2026-05-16 12:31   ` sashiko-bot
  2026-05-15 12:30 ` [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont Jason Xing
  2026-05-15 16:59 ` [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski
  5 siblings, 1 reply; 15+ messages in thread
From: Jason Xing @ 2026-05-15 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When the TX loop in __xsk_generic_xmit() encounters an invalid
descriptor mid-packet (e.g. an out-of-bounds address), the partial
skb is dropped and the offending descriptor is released. However,
remaining continuation descriptors belonging to the same multi-buffer
packet still sit in the TX ring. Since xs->skb becomes NULL after the
drop, the next iteration treats the leftover continuation fragment as
a brand-new packet, corrupting the packet stream.

Fix this by setting the drain_cont flag when the released descriptor
has XDP_PKT_CONTD set. On the next call to __xsk_generic_xmit(), the
drain logic introduced in the previous patch handles the remaining
fragments with normal CQ backpressure.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f98a9553e536..3b604776f09f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1045,6 +1045,8 @@ static int __xsk_generic_xmit(struct sock *sk)
 		if (xs->skb)
 			xsk_drop_skb(xs->skb);
 		xskq_cons_release(xs->tx);
+		if (xp_mb_desc(&desc))
+			xs->drain_cont = true;
 	}
 
 out:
-- 
2.41.3



* [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
  2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
                   ` (3 preceding siblings ...)
  2026-05-15 12:30 ` [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
@ 2026-05-15 12:30 ` Jason Xing
  2026-05-16 12:31   ` sashiko-bot
  2026-05-15 16:59 ` [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski
  5 siblings, 1 reply; 15+ messages in thread
From: Jason Xing @ 2026-05-15 12:30 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

After the kernel xsk drain_cont patches, dropped and drained
multi-buffer descriptors have their buffer addresses published to the
completion queue (CQ) rather than being cancelled. This is the correct
behavior, but it makes the existing selftests fail.

In the existing selftests, all the sub-tests need to be handled with
these two things in mind:
1) Invalid packets whose last descriptor has XDP_PKT_CONTD set
   cause drain_cont to leak past the packet boundary, consuming
   subsequent valid packets from the TX ring.
2) The extra CQ entries from dropped descriptors cause outstanding_tx
   to reach zero before the kernel finishes processing all TX ring
   descriptors, so wait_for_tx_completion() exits early and valid
   packets after invalid ones are never transmitted.

This patch makes the following changes to match the fixed kernel behavior.
- XSK_DESC__INVALID_OPTION: change the value from 0xffff to 0xfffe,
  so it no longer sets the XDP_PKT_CONTD bit (bit 0).
- complete_pkts: tolerate extra CQ completions by clamping
  outstanding_tx to zero instead of failing.
- wait_for_tx_completion: add a drain loop that consumes CQ entries
  after outstanding_tx reaches zero, ensuring that the remaining valid
  packets are transmitted. This change is needed because patch 3 in
  the series adds logic to __xsk_generic_xmit() that returns -EOVERFLOW
  after detecting and handling the remaining part of the skb.
- testapp_invalid_desc_mb: clear XDP_PKT_CONTD on the last
  descriptor of each invalid test packet, so drain stops at the
  packet boundary.
- testapp_too_many_frags: add one new/extra terminating descriptor
  to the invalid packet, so drain_cont stops before the trailing
  sync packet.
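
For illustration only (not part of the patch), two of the changes above
can be checked in isolation: the new invalid-option value no longer
carries the continuation bit, and extra completions clamp the counter
at zero. The names demo_has_contd and demo_complete are hypothetical;
DEMO_PKT_CONTD mirrors the bit 0 position stated above for
XDP_PKT_CONTD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PKT_CONTD	(1u << 0)	/* bit 0, as XDP_PKT_CONTD */

/* True when the options word marks "more fragments follow". */
static bool demo_has_contd(uint32_t options)
{
	return (options & DEMO_PKT_CONTD) != 0;
}

/* Account for rcvd completions, clamping at zero instead of failing
 * when the kernel publishes more entries than are outstanding.
 */
static void demo_complete(uint32_t *outstanding_tx, uint32_t rcvd)
{
	assert(outstanding_tx);

	if (rcvd > *outstanding_tx)
		*outstanding_tx = 0;
	else
		*outstanding_tx -= rcvd;
}
```

Here 0xffff has bit 0 set and would trigger drain_cont, while 0xfffe is
still an invalid options value but ends the packet.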

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 .../selftests/bpf/prog_tests/test_xsk.c       | 45 ++++++++++---------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 7e38ec6e656b..e23131ef7f18 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -31,7 +31,7 @@
 #define POLL_TMOUT			1000
 #define THREAD_TMOUT			3
 #define UMEM_HEADROOM_TEST_SIZE		128
-#define XSK_DESC__INVALID_OPTION	(0xffff)
+#define XSK_DESC__INVALID_OPTION	(0xfffe)
 #define XSK_UMEM__INVALID_FRAME_SIZE	(MAX_ETH_JUMBO_SIZE + 1)
 #define XSK_UMEM__LARGE_FRAME_SIZE	(3 * 1024)
 #define XSK_UMEM__MAX_FRAME_SIZE	(4 * 1024)
@@ -969,17 +969,11 @@ static int complete_pkts(struct xsk_socket_info *xsk, int batch_size)
 
 	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
 	if (rcvd) {
-		if (rcvd > xsk->outstanding_tx) {
-			u64 addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx + rcvd - 1);
-
-			ksft_print_msg("[%s] Too many packets completed\n", __func__);
-			ksft_print_msg("Last completion address: %llx\n",
-				       (unsigned long long)addr);
-			return TEST_FAILURE;
-		}
-
 		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
-		xsk->outstanding_tx -= rcvd;
+		if (rcvd > xsk->outstanding_tx)
+			xsk->outstanding_tx = 0;
+		else
+			xsk->outstanding_tx -= rcvd;
 	}
 
 	return TEST_PASS;
@@ -1293,6 +1287,8 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
 static int wait_for_tx_completion(struct xsk_socket_info *xsk)
 {
 	struct timeval tv_end, tv_now, tv_timeout = {THREAD_TMOUT, 0};
+	unsigned int rcvd;
+	u32 idx;
 	int ret;
 
 	ret = gettimeofday(&tv_now, NULL);
@@ -1312,6 +1308,14 @@ static int wait_for_tx_completion(struct xsk_socket_info *xsk)
 		complete_pkts(xsk, xsk->batch_size);
 	}
 
+	do {
+		if (xsk_ring_prod__needs_wakeup(&xsk->tx))
+			kick_tx(xsk);
+		rcvd = xsk_ring_cons__peek(&xsk->umem->cq, xsk->batch_size, &idx);
+		if (rcvd)
+			xsk_ring_cons__release(&xsk->umem->cq, rcvd);
+	} while (rcvd);
+
 	return TEST_PASS;
 }
 
@@ -2092,10 +2096,10 @@ int testapp_invalid_desc_mb(struct test_spec *test)
 		{0, 0, 0, false, 0},
 		/* Invalid address in the second frame */
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
-		{umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+		{umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, 0},
 		/* Invalid len in the middle */
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
-		{0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+		{0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, 0},
 		/* Invalid options in the middle */
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
 		{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION},
@@ -2250,7 +2254,7 @@ int testapp_too_many_frags(struct test_spec *test)
 		max_frags += 1;
 	}
 
-	pkts = calloc(2 * max_frags + 2, sizeof(struct pkt));
+	pkts = calloc(2 * max_frags + 3, sizeof(struct pkt));
 	if (!pkts)
 		return TEST_FAILURE;
 
@@ -2268,20 +2272,19 @@ int testapp_too_many_frags(struct test_spec *test)
 	}
 	pkts[max_frags].options = 0;
 
-	/* An invalid packet with the max amount of frags but signals packet
-	 * continues on the last frag
-	 */
-	for (i = max_frags + 1; i < 2 * max_frags + 1; i++) {
+	/* An invalid packet with too many frags */
+	for (i = max_frags + 1; i < 2 * max_frags + 2; i++) {
 		pkts[i].len = MIN_PKT_SIZE;
 		pkts[i].options = XDP_PKT_CONTD;
 		pkts[i].valid = false;
 	}
+	pkts[2 * max_frags + 1].options = 0;
 
 	/* Valid packet for synch */
-	pkts[2 * max_frags + 1].len = MIN_PKT_SIZE;
-	pkts[2 * max_frags + 1].valid = true;
+	pkts[2 * max_frags + 2].len = MIN_PKT_SIZE;
+	pkts[2 * max_frags + 2].valid = true;
 
-	if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2)) {
+	if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 3)) {
 		free(pkts);
 		return TEST_FAILURE;
 	}
-- 
2.41.3



* Re: [PATCH net v2 0/5] xsk: fix meta and publish of cq issues
  2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
                   ` (4 preceding siblings ...)
  2026-05-15 12:30 ` [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont Jason Xing
@ 2026-05-15 16:59 ` Maciej Fijalkowski
  2026-05-17  2:09   ` Jason Xing
  5 siblings, 1 reply; 15+ messages in thread
From: Maciej Fijalkowski @ 2026-05-15 16:59 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms,
	andrew+netdev, bpf, netdev, Jason Xing

On Fri, May 15, 2026 at 08:30:13PM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> The series is the product of previous review from sashiko[1].
> 
> 1) META
> patch 1: address TOCTOU around metadata.
> 
> 2) PUBLISH of CQ
> patch 2: make sure xsk_addr->addrs[] can be published to cq when
>          overflow occurs.
> patch 3: keep cleaning up the continuation descs (more than 17) and
>          publish its address when overflow occurs.
> patch 4: like patch 3, but only handles the invalid descs cases.
> 
> [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/
> 
> ---
> V2
> Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/
> 1. adjust selftests (Jakub)
> 2. add READ_ONCE in patch 1 (Stan)
> 
> 
> Jason Xing (5):
>   xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
>   xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
>   xsk: drain continuation descs after overflow in xsk_build_skb()
>   xsk: drain continuation descs on invalid descriptor in
>     __xsk_generic_xmit()
>   selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
> 
>  include/net/xdp_sock.h                        |  1 +
>  net/xdp/xsk.c                                 | 48 +++++++++++++++----
>  .../selftests/bpf/prog_tests/test_xsk.c       | 45 +++++++++--------
>  3 files changed, 64 insertions(+), 30 deletions(-)

This adds a regression to xskxceiver runs on real hw (ice on my side):

PREREQUISITES: [ PASS ]
1..108
ok 1 PASS: SKB SEND_RECEIVE
ok 2 PASS: SKB SEND_RECEIVE_2K_FRAME
ok 3 PASS: SKB SEND_RECEIVE_SINGLE_PKT
ok 4 PASS: SKB POLL_RX
ok 5 PASS: SKB POLL_TX
ok 6 PASS: SKB POLL_RXQ_FULL
ok 7 PASS: SKB POLL_TXQ_FULL
ok 8 PASS: SKB ALIGNED_INV_DESC
ok 9 PASS: SKB ALIGNED_INV_DESC_2K_FRAME_SIZE
ok 10 PASS: SKB UMEM_HEADROOM
ok 11 PASS: SKB BIDIRECTIONAL
ok 12 PASS: SKB STAT_RX_DROPPED
ok 13 PASS: SKB STAT_TX_INVALID
ok 14 PASS: SKB STAT_RX_FULL
ok 15 PASS: SKB STAT_FILL_EMPTY
ok 16 PASS: SKB XDP_PROG_CLEANUP
ok 17 PASS: SKB XDP_DROP_HALF
ok 18 PASS: SKB XDP_SHARED_UMEM
ok 19 PASS: SKB XDP_METADATA_COPY
ok 20 PASS: SKB XDP_METADATA_COPY_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF
ok 22 PASS: SKB TOO_MANY_FRAGS
ok 23 PASS: SKB XDP_ADJUST_TAIL_SHRINK
ok 24 PASS: SKB TX_QUEUE_CONSUMER
ok 25 PASS: SKB XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
ok 26 # SKIP SKIP: SKB XDP_ADJUST_TAIL_GROW
ok 27 # SKIP SKIP: SKB XDP_ADJUST_TAIL_GROW_MULTI_BUFF
ok 28 PASS: SKB SEND_RECEIVE_9K_PACKETS
ok 29 PASS: SKB SEND_RECEIVE_UNALIGNED
ok 30 PASS: SKB UNALIGNED_INV_DESC
ok 31 PASS: SKB UNALIGNED_INV_DESC_4001_FRAME_SIZE
ok 32 PASS: SKB SEND_RECEIVE_UNALIGNED_9K_PACKETS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF
ok 34 PASS: SKB HW_SW_MIN_RING_SIZE
ok 35 PASS: SKB HW_SW_MAX_RING_SIZE
ok 36 PASS: SKB TEARDOWN
ok 37 PASS: DRV SEND_RECEIVE
ok 38 PASS: DRV SEND_RECEIVE_2K_FRAME
ok 39 PASS: DRV SEND_RECEIVE_SINGLE_PKT
ok 40 PASS: DRV POLL_RX
ok 41 PASS: DRV POLL_TX
ok 42 PASS: DRV POLL_RXQ_FULL
ok 43 PASS: DRV POLL_TXQ_FULL
ok 44 PASS: DRV ALIGNED_INV_DESC
ok 45 PASS: DRV ALIGNED_INV_DESC_2K_FRAME_SIZE
ok 46 PASS: DRV UMEM_HEADROOM
ok 47 PASS: DRV BIDIRECTIONAL
ok 48 PASS: DRV STAT_RX_DROPPED
ok 49 PASS: DRV STAT_TX_INVALID
ok 50 PASS: DRV STAT_RX_FULL
ok 51 PASS: DRV STAT_FILL_EMPTY
ok 52 PASS: DRV XDP_PROG_CLEANUP
ok 53 PASS: DRV XDP_DROP_HALF
ok 54 PASS: DRV XDP_SHARED_UMEM
ok 55 PASS: DRV XDP_METADATA_COPY
ok 56 PASS: DRV XDP_METADATA_COPY_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF
ok 58 PASS: DRV TOO_MANY_FRAGS
ok 59 PASS: DRV XDP_ADJUST_TAIL_SHRINK
ok 60 PASS: DRV TX_QUEUE_CONSUMER
ok 61 PASS: DRV XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
ok 62 PASS: DRV XDP_ADJUST_TAIL_GROW
ok 63 PASS: DRV XDP_ADJUST_TAIL_GROW_MULTI_BUFF
ok 64 PASS: DRV SEND_RECEIVE_9K_PACKETS
ok 65 PASS: DRV SEND_RECEIVE_UNALIGNED
ok 66 PASS: DRV UNALIGNED_INV_DESC
ok 67 PASS: DRV UNALIGNED_INV_DESC_4001_FRAME_SIZE
ok 68 PASS: DRV SEND_RECEIVE_UNALIGNED_9K_PACKETS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF
ok 70 PASS: DRV HW_SW_MIN_RING_SIZE
ok 71 PASS: DRV HW_SW_MAX_RING_SIZE
ok 72 PASS: DRV TEARDOWN
ok 73 PASS: ZC SEND_RECEIVE
ok 74 PASS: ZC SEND_RECEIVE_2K_FRAME
ok 75 PASS: ZC SEND_RECEIVE_SINGLE_PKT
ok 76 PASS: ZC POLL_RX
ok 77 PASS: ZC POLL_TX
ok 78 PASS: ZC POLL_RXQ_FULL
ok 79 PASS: ZC POLL_TXQ_FULL
ok 80 PASS: ZC ALIGNED_INV_DESC
ok 81 PASS: ZC ALIGNED_INV_DESC_2K_FRAME_SIZE
ok 82 PASS: ZC UMEM_HEADROOM
ok 83 PASS: ZC BIDIRECTIONAL
# Can not run RX_DROPPED test for ZC mode
ok 84 # SKIP SKIP: ZC STAT_RX_DROPPED
ok 85 PASS: ZC STAT_TX_INVALID
ok 86 PASS: ZC STAT_RX_FULL
ok 87 PASS: ZC STAT_FILL_EMPTY
ok 88 PASS: ZC XDP_PROG_CLEANUP
ok 89 PASS: ZC XDP_DROP_HALF
ok 90 PASS: ZC XDP_SHARED_UMEM
ok 91 PASS: ZC XDP_METADATA_COPY
ok 92 PASS: ZC XDP_METADATA_COPY_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [11], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
---------------------------------------
not ok 94 FAIL: ZC TOO_MANY_FRAGS
ok 95 PASS: ZC XDP_ADJUST_TAIL_SHRINK
# Can not run TX_QUEUE_CONSUMER test for ZC mode
ok 96 # SKIP SKIP: ZC TX_QUEUE_CONSUMER
ok 97 PASS: ZC XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
ok 98 PASS: ZC XDP_ADJUST_TAIL_GROW
ok 99 PASS: ZC XDP_ADJUST_TAIL_GROW_MULTI_BUFF
ok 100 PASS: ZC SEND_RECEIVE_9K_PACKETS
ok 101 PASS: ZC SEND_RECEIVE_UNALIGNED
ok 102 PASS: ZC UNALIGNED_INV_DESC
ok 103 PASS: ZC UNALIGNED_INV_DESC_4001_FRAME_SIZE
ok 104 PASS: ZC SEND_RECEIVE_UNALIGNED_9K_PACKETS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF
ok 106 PASS: ZC HW_SW_MIN_RING_SIZE
ok 107 PASS: ZC HW_SW_MAX_RING_SIZE
ok 108 PASS: ZC TEARDOWN
# 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:97 fail:7 xfail:0 xpass:0 skip:4 error:0
XSK_SELFTESTS_ens785f0np0_SOFTIRQ: [ FAIL ]
1..108
ok 1 PASS: SKB BUSY-POLL SEND_RECEIVE
ok 2 PASS: SKB BUSY-POLL SEND_RECEIVE_2K_FRAME
ok 3 PASS: SKB BUSY-POLL SEND_RECEIVE_SINGLE_PKT
ok 4 PASS: SKB BUSY-POLL POLL_RX
ok 5 PASS: SKB BUSY-POLL POLL_TX
ok 6 PASS: SKB BUSY-POLL POLL_RXQ_FULL
ok 7 PASS: SKB BUSY-POLL POLL_TXQ_FULL
ok 8 PASS: SKB BUSY-POLL ALIGNED_INV_DESC
ok 9 PASS: SKB BUSY-POLL ALIGNED_INV_DESC_2K_FRAME_SIZE
ok 10 PASS: SKB BUSY-POLL UMEM_HEADROOM
ok 11 PASS: SKB BUSY-POLL BIDIRECTIONAL
ok 12 PASS: SKB BUSY-POLL STAT_RX_DROPPED
ok 13 PASS: SKB BUSY-POLL STAT_TX_INVALID
ok 14 PASS: SKB BUSY-POLL STAT_RX_FULL
ok 15 PASS: SKB BUSY-POLL STAT_FILL_EMPTY
ok 16 PASS: SKB BUSY-POLL XDP_PROG_CLEANUP
ok 17 PASS: SKB BUSY-POLL XDP_DROP_HALF
ok 18 PASS: SKB BUSY-POLL XDP_SHARED_UMEM
ok 19 PASS: SKB BUSY-POLL XDP_METADATA_COPY
ok 20 PASS: SKB BUSY-POLL XDP_METADATA_COPY_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
ok 22 PASS: SKB BUSY-POLL TOO_MANY_FRAGS
ok 23 PASS: SKB BUSY-POLL XDP_ADJUST_TAIL_SHRINK
ok 24 PASS: SKB BUSY-POLL TX_QUEUE_CONSUMER
ok 25 PASS: SKB BUSY-POLL XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
ok 26 # SKIP SKIP: SKB BUSY-POLL XDP_ADJUST_TAIL_GROW
ok 27 # SKIP SKIP: SKB BUSY-POLL XDP_ADJUST_TAIL_GROW_MULTI_BUFF
ok 28 PASS: SKB BUSY-POLL SEND_RECEIVE_9K_PACKETS
ok 29 PASS: SKB BUSY-POLL SEND_RECEIVE_UNALIGNED
ok 30 PASS: SKB BUSY-POLL UNALIGNED_INV_DESC
ok 31 PASS: SKB BUSY-POLL UNALIGNED_INV_DESC_4001_FRAME_SIZE
ok 32 PASS: SKB BUSY-POLL SEND_RECEIVE_UNALIGNED_9K_PACKETS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
ok 34 PASS: SKB BUSY-POLL HW_SW_MIN_RING_SIZE
ok 35 PASS: SKB BUSY-POLL HW_SW_MAX_RING_SIZE
ok 36 PASS: SKB BUSY-POLL TEARDOWN
ok 37 PASS: DRV BUSY-POLL SEND_RECEIVE
ok 38 PASS: DRV BUSY-POLL SEND_RECEIVE_2K_FRAME
ok 39 PASS: DRV BUSY-POLL SEND_RECEIVE_SINGLE_PKT
ok 40 PASS: DRV BUSY-POLL POLL_RX
ok 41 PASS: DRV BUSY-POLL POLL_TX
ok 42 PASS: DRV BUSY-POLL POLL_RXQ_FULL
ok 43 PASS: DRV BUSY-POLL POLL_TXQ_FULL
ok 44 PASS: DRV BUSY-POLL ALIGNED_INV_DESC
ok 45 PASS: DRV BUSY-POLL ALIGNED_INV_DESC_2K_FRAME_SIZE
ok 46 PASS: DRV BUSY-POLL UMEM_HEADROOM
ok 47 PASS: DRV BUSY-POLL BIDIRECTIONAL
ok 48 PASS: DRV BUSY-POLL STAT_RX_DROPPED
ok 49 PASS: DRV BUSY-POLL STAT_TX_INVALID
ok 50 PASS: DRV BUSY-POLL STAT_RX_FULL
ok 51 PASS: DRV BUSY-POLL STAT_FILL_EMPTY
ok 52 PASS: DRV BUSY-POLL XDP_PROG_CLEANUP
ok 53 PASS: DRV BUSY-POLL XDP_DROP_HALF
ok 54 PASS: DRV BUSY-POLL XDP_SHARED_UMEM
ok 55 PASS: DRV BUSY-POLL XDP_METADATA_COPY
ok 56 PASS: DRV BUSY-POLL XDP_METADATA_COPY_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
ok 58 PASS: DRV BUSY-POLL TOO_MANY_FRAGS
ok 59 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_SHRINK
ok 60 PASS: DRV BUSY-POLL TX_QUEUE_CONSUMER
ok 61 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
ok 62 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_GROW
ok 63 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_GROW_MULTI_BUFF
ok 64 PASS: DRV BUSY-POLL SEND_RECEIVE_9K_PACKETS
ok 65 PASS: DRV BUSY-POLL SEND_RECEIVE_UNALIGNED
ok 66 PASS: DRV BUSY-POLL UNALIGNED_INV_DESC
ok 67 PASS: DRV BUSY-POLL UNALIGNED_INV_DESC_4001_FRAME_SIZE
ok 68 PASS: DRV BUSY-POLL SEND_RECEIVE_UNALIGNED_9K_PACKETS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
ok 70 PASS: DRV BUSY-POLL HW_SW_MIN_RING_SIZE
ok 71 PASS: DRV BUSY-POLL HW_SW_MAX_RING_SIZE
ok 72 PASS: DRV BUSY-POLL TEARDOWN
ok 73 PASS: ZC BUSY-POLL SEND_RECEIVE
ok 74 PASS: ZC BUSY-POLL SEND_RECEIVE_2K_FRAME
ok 75 PASS: ZC BUSY-POLL SEND_RECEIVE_SINGLE_PKT
ok 76 PASS: ZC BUSY-POLL POLL_RX
ok 77 PASS: ZC BUSY-POLL POLL_TX
ok 78 PASS: ZC BUSY-POLL POLL_RXQ_FULL
ok 79 PASS: ZC BUSY-POLL POLL_TXQ_FULL
ok 80 PASS: ZC BUSY-POLL ALIGNED_INV_DESC
ok 81 PASS: ZC BUSY-POLL ALIGNED_INV_DESC_2K_FRAME_SIZE
ok 82 PASS: ZC BUSY-POLL UMEM_HEADROOM
ok 83 PASS: ZC BUSY-POLL BIDIRECTIONAL
# Can not run RX_DROPPED test for ZC mode
ok 84 # SKIP SKIP: ZC BUSY-POLL STAT_RX_DROPPED
ok 85 PASS: ZC BUSY-POLL STAT_TX_INVALID
ok 86 PASS: ZC BUSY-POLL STAT_RX_FULL
ok 87 PASS: ZC BUSY-POLL STAT_FILL_EMPTY
ok 88 PASS: ZC BUSY-POLL XDP_PROG_CLEANUP
ok 89 PASS: ZC BUSY-POLL XDP_DROP_HALF
ok 90 PASS: ZC BUSY-POLL XDP_SHARED_UMEM
ok 91 PASS: ZC BUSY-POLL XDP_METADATA_COPY
ok 92 PASS: ZC BUSY-POLL XDP_METADATA_COPY_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [11], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
---------------------------------------
not ok 94 FAIL: ZC BUSY-POLL TOO_MANY_FRAGS
ok 95 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_SHRINK
# Can not run TX_QUEUE_CONSUMER test for ZC mode
ok 96 # SKIP SKIP: ZC BUSY-POLL TX_QUEUE_CONSUMER
ok 97 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
ok 98 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_GROW
ok 99 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_GROW_MULTI_BUFF
ok 100 PASS: ZC BUSY-POLL SEND_RECEIVE_9K_PACKETS
ok 101 PASS: ZC BUSY-POLL SEND_RECEIVE_UNALIGNED
ok 102 PASS: ZC BUSY-POLL UNALIGNED_INV_DESC
ok 103 PASS: ZC BUSY-POLL UNALIGNED_INV_DESC_4001_FRAME_SIZE
ok 104 PASS: ZC BUSY-POLL SEND_RECEIVE_UNALIGNED_9K_PACKETS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# 
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# 
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# 
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 
---------------------------------------
not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
ok 106 PASS: ZC BUSY-POLL HW_SW_MIN_RING_SIZE
ok 107 PASS: ZC BUSY-POLL HW_SW_MAX_RING_SIZE
ok 108 PASS: ZC BUSY-POLL TEARDOWN
# 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:97 fail:7 xfail:0 xpass:0 skip:4 error:0
XSK_SELFTESTS_ens785f0np0_BUSY_POLL: [ FAIL ]

Summary:
XSK_SELFTESTS_ens785f0np0_SOFTIRQ: [ FAIL ]
XSK_SELFTESTS_ens785f0np0_BUSY_POLL: [ FAIL ]

> 
> -- 
> 2.41.3
> 

* Re: [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb()
  2026-05-15 12:30 ` [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
@ 2026-05-16 12:31   ` sashiko-bot
  2026-05-17  1:49     ` Jason Xing
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-05-16 12:31 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] UAPI breakage and transmission batch abortion due to returning -EOVERFLOW to userspace
--

commit 1d238b3786b35997fb62e781dd7528f204ca8b34
Author: Jason Xing <kerneljasonxing@gmail.com>

xsk: drain continuation descs after overflow in xsk_build_skb()

When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
only the current descriptor is released from the TX ring. This patch
introduces a drain_cont flag to properly handle the remaining continuation
descriptors, preventing them from being misinterpreted as the beginning
of a new packet.

Link: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 0a6203c425766..9c51eed6ee99c 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
[ ... ]
> @@ -1062,6 +1064,26 @@ static int __xsk_generic_xmit(struct sock *sk)
>  			goto out;
>  		}
>  
> +		if (unlikely(xs->drain_cont)) {
> +			unsigned long flags;
> +			u32 idx;
> +
> +			spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
> +			idx = xskq_get_prod(xs->pool->cq);
> +			xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
> +			xskq_prod_submit_n(xs->pool->cq, 1);
> +			spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
> +
> +			xs->tx->invalid_descs++;
> +			xskq_cons_release(xs->tx);
> +			if (!xp_mb_desc(&desc)) {
> +				xs->drain_cont = false;
> +				err = -EOVERFLOW;
> +				goto out;

Does returning -EOVERFLOW to userspace break the expected UAPI here?

The original code suppressed the -EOVERFLOW error to silently drop the invalid
packet and continue processing the batch:

		if (IS_ERR(skb)) {
			err = PTR_ERR(skb);
			if (err != -EOVERFLOW)
				goto out;
			err = 0;
			continue;
		}

By using goto out with err = -EOVERFLOW, doesn't this abort the transmission
batch prematurely? Any valid packets queued immediately behind the malformed
packet would not be processed during the current system call.

Additionally, if the batch is aborted before any valid packets are transmitted
(when sent_frame == false), the __xsk_tx_release(xs) call at the out label
will be skipped. Will this cause the global TX consumer index to remain
un-updated for the dropped descriptors, leading to a ring slot leak from the
userspace perspective?

Note that a later patch in the series ('selftests/xsk: fix multi-buffer
invalid desc tests for drain_cont') acknowledges this behavior by modifying
the selftests to tolerate the early return, but does this actually fix the
underlying ABI breakage in the kernel?
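
For illustration only (this is not kernel code), the difference between the
two error-handling policies can be modelled in a few lines of C. The names
model_pkt and model_xmit are hypothetical, and the sketch assumes each array
entry stands for one whole packet in a single TX batch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each entry is one whole packet in the TX batch. */
struct model_pkt {
	bool oversized;	/* stands in for a packet that hits -EOVERFLOW */
};

/*
 * Walk the batch and return how many valid packets were sent.
 * abort_on_overflow = true models "err = -EOVERFLOW; goto out;",
 * false models the older "err = 0; continue;" handling.
 */
static size_t model_xmit(const struct model_pkt *batch, size_t n,
			 bool abort_on_overflow)
{
	size_t sent = 0;

	assert(batch != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (batch[i].oversized) {
			if (abort_on_overflow)
				break;	/* rest of the batch waits for the next call */
			continue;	/* drop this packet, keep going */
		}
		sent++;
	}
	return sent;
}
```

With a batch of one valid packet, one oversized packet and two more valid
packets, the aborting variant sends only the first packet in that call, while
the continuing variant sends all three valid ones.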

> +			}
> +			continue;
> +		}
> +
>  		skb = xsk_build_skb(xs, &desc);
>  		if (IS_ERR(skb)) {
>  			err = PTR_ERR(skb);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515123018.80147-1-kerneljasonxing@gmail.com?part=3

* Re: [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit()
  2026-05-15 12:30 ` [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
@ 2026-05-16 12:31   ` sashiko-bot
  2026-05-17  1:58     ` Jason Xing
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-05-16 12:31 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] State machine logic flaw causing valid packet drop due to un-cleared drain_cont flag.
--

commit 8841b899f0208ddaf58f3df5eb5feff9db62147f
Author: Jason Xing <kerneljasonxing@gmail.com>

xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit()

This commit addresses an issue in the TX loop where an invalid descriptor
mid-packet leaves remaining continuation fragments in the ring, which
corrupts the packet stream. It introduces a drain_cont flag to drain the
remaining fragments on the next iteration.

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 9c51eed6ee99c..ac4884a87a0fb 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c

[ ... ]

> @@ -1125,6 +1125,8 @@ static int __xsk_generic_xmit(struct sock *sk)
>  		if (xs->skb)
>  			xsk_drop_skb(xs->skb);
>  		xskq_cons_release(xs->tx);
> +		if (xp_mb_desc(&desc))
> +			xs->drain_cont = true;

Does this logic inadvertently leave the socket stuck in the drain state?

If xs->drain_cont is already true and the current invalid descriptor is the
final fragment of the packet (meaning XDP_PKT_CONTD is not set), the flag
is not explicitly cleared since there is no else branch.

If xs->drain_cont remains true, could the next invocation of
__xsk_generic_xmit() intercept the first fragment of a completely valid,
unrelated packet in the if (unlikely(xs->drain_cont)) drain loop? 

It appears the valid packet would then be drained to the completion queue
and silently dropped with -EOVERFLOW.

Would it be safer to unconditionally update the state based on the
descriptor boundary, such as:

xs->drain_cont = xp_mb_desc(&desc);
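
As a rough sketch of why the unconditional assignment behaves differently,
the flag update can be modelled outside the kernel. The helper names below
are hypothetical, and 'contd' is assumed to stand in for xp_mb_desc(&desc):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the flag update in the invalid-descriptor path.
 * 'contd' stands in for xp_mb_desc(&desc), i.e. XDP_PKT_CONTD being set.
 */

/* v2 behaviour: the flag is only ever set here, never cleared. */
static bool update_set_only(bool drain_cont, bool contd)
{
	if (contd)
		drain_cont = true;
	return drain_cont;
}

/* Suggested behaviour: the flag always follows the descriptor boundary. */
static bool update_from_boundary(bool drain_cont, bool contd)
{
	(void)drain_cont;	/* previous state is intentionally ignored */
	return contd;
}

/* Scenario from above: already draining, invalid final fragment arrives. */
static void check_final_fragment_case(void)
{
	assert(update_set_only(true, false) == true);		/* still draining */
	assert(update_from_boundary(true, false) == false);	/* drain ends */
}
```

In the set-only variant nothing in this path can take the socket out of the
drain state, which is the stuck state described above.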

>  	}
>  
>  out:

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515123018.80147-1-kerneljasonxing@gmail.com?part=4

* Re: [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
  2026-05-15 12:30 ` [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont Jason Xing
@ 2026-05-16 12:31   ` sashiko-bot
  2026-05-17  2:47     ` Jason Xing
  0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-05-16 12:31 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Premature exit in the newly added `wait_for_tx_completion` drain loop causes the test to silently strand valid packets.
- [High] Clamping `outstanding_tx` to 0 in `complete_pkts` degrades test strictness by silently discarding unexpected completions.
- [High] Unhandled `EOVERFLOW` error in `kick_tx()` causes spurious test failures when the kernel synchronously processes invalid packets.
--

commit 51fe3cee29ff3a261580a926424f8e6171690b0d
Author: Jason Xing <kerneljasonxing@gmail.com>

selftests/xsk: fix multi-buffer invalid desc tests for drain_cont

This commit updates the selftests to account for the kernel returning CQ
entries for dropped and drained multi-buffer descriptors. It clamps the
outstanding TX count, adds a drain loop to consume extra CQ entries, and
modifies test descriptors to stop drains at packet boundaries.

>     - wait_for_tx_completion: add a drain loop that consumes CQ entries
>       after outstanding_tx reaches zero. Ensure remaining valid packets
>       are transmitted. This change is needed because patch 3 in the
>       series adds logic to __xsk_generic_xmit() that returns -EOVERFLOW
>       after detecting and handling the remaining part of the skb.

Since the kernel now returns -EOVERFLOW after handling invalid packets,
will kick_tx() abort the test when it encounters this new error code?

kick_tx() is called by complete_pkts(), which checks its return value:

test_xsk.c:complete_pkts() {
    if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
        ret = kick_tx(xsk);
        if (ret)
            return TEST_FAILURE;
    }
    ...
}

If kick_tx() does not tolerate -EOVERFLOW, this might cause spurious test
failures when the kernel synchronously processes invalid packets.

> diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> index 7950c504ed289..488ace93ff663 100644
> --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c

[ ... ]

> @@ -950,17 +950,11 @@ static int complete_pkts(struct xsk_socket_info *xsk, int batch_size)
>  
>  	rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
>  	if (rcvd) {
> -		if (rcvd > xsk->outstanding_tx) {
> -			u64 addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx + rcvd - 1);
> -
> -			ksft_print_msg("[%s] Too many packets completed\n", __func__);
> -			ksft_print_msg("Last completion address: %llx\n",
> -				       (unsigned long long)addr);
> -			return TEST_FAILURE;
> -		}
> -
>  		xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> -		xsk->outstanding_tx -= rcvd;
> +		if (rcvd > xsk->outstanding_tx)
> +			xsk->outstanding_tx = 0;
> +		else
> +			xsk->outstanding_tx -= rcvd;

Does clamping xsk->outstanding_tx to 0 mask underlying regressions by
silently discarding unexpected completions?

If the kernel correctly generates CQ entries for invalid descriptors now,
could outstanding_tx be incremented by the total number of submitted
descriptors instead of only the valid ones?

Tracking total descriptors would allow the test to use the existing
timeout-backed wait loop and eliminate the need for this clamping.
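
A minimal sketch of that accounting idea, assuming the kernel completes every
submitted descriptor (valid or not). The struct and helpers are hypothetical
stand-ins, not the selftest's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-socket TX bookkeeping in the test. */
struct model_sock {
	unsigned int outstanding_tx;
};

/* Account for a submitted batch; optionally include invalid descriptors. */
static void model_submit(struct model_sock *s, unsigned int valid,
			 unsigned int invalid, bool count_invalid)
{
	assert(s != NULL);
	s->outstanding_tx += valid + (count_invalid ? invalid : 0);
}

/*
 * Strict completion check, as in the pre-patch complete_pkts():
 * returns 0 on success, -1 if more completions arrive than were accounted.
 */
static int model_complete(struct model_sock *s, unsigned int rcvd)
{
	assert(s != NULL);
	if (rcvd > s->outstanding_tx)
		return -1;
	s->outstanding_tx -= rcvd;
	return 0;
}
```

If three valid and two invalid descriptors are submitted and all five are
completed, the strict check only passes when the invalid ones were counted at
submit time. That keeps the "too many completions" check meaningful.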

>  	}
>  
>  	return TEST_PASS;

[ ... ]

> @@ -1293,6 +1289,14 @@ static int wait_for_tx_completion(struct xsk_socket_info *xsk)
>  		complete_pkts(xsk, xsk->batch_size);
>  	}
>  
> +	do {
> +		if (xsk_ring_prod__needs_wakeup(&xsk->tx))
> +			kick_tx(xsk);
> +		rcvd = xsk_ring_cons__peek(&xsk->umem->cq, xsk->batch_size, &idx);
> +		if (rcvd)
> +			xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> +	} while (rcvd);

Can this loop exit prematurely and strand valid packets?

In asynchronous modes, kick_tx() wakes the NAPI thread and returns
immediately. Because packet processing happens asynchronously, the hardware
may not have populated the completion queue before the peek() occurs.

If the queue is not yet populated, xsk_ring_cons__peek() will return 0,
causing the loop to break instantly before the valid packets have actually
completed.
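
To make the race concrete, here is a small model with a fake completion queue
whose entries only become visible after a number of polls. Everything in it
(model_cq, model_peek, the poll budget) is hypothetical, and a real test
would more likely use a time-based timeout than a poll count:

```c
#include <assert.h>
#include <stddef.h>

/* Fake CQ: 'pending' entries become visible after 'delay_polls' polls. */
struct model_cq {
	unsigned int delay_polls;
	unsigned int pending;
};

/* Models a peek+release: returns the number of entries consumed. */
static unsigned int model_peek(struct model_cq *cq)
{
	unsigned int n;

	assert(cq != NULL);
	if (cq->delay_polls) {
		cq->delay_polls--;
		return 0;	/* completions not visible yet */
	}
	n = cq->pending;
	cq->pending = 0;
	return n;
}

/* Shape of the v2 loop: stops at the first empty peek. */
static unsigned int drain_until_empty(struct model_cq *cq)
{
	unsigned int total = 0, rcvd;

	do {
		rcvd = model_peek(cq);
		total += rcvd;
	} while (rcvd);
	return total;
}

/* Alternative: keep polling until 'expected' entries or the budget runs out. */
static unsigned int drain_with_budget(struct model_cq *cq,
				      unsigned int expected,
				      unsigned int max_polls)
{
	unsigned int total = 0;

	while (total < expected && max_polls--)
		total += model_peek(cq);
	return total;
}
```

With four completions that show up only on the third poll, the first loop
returns having consumed nothing, while the bounded loop collects all four.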

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260515123018.80147-1-kerneljasonxing@gmail.com?part=5

* Re: [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb()
  2026-05-16 12:31   ` sashiko-bot
@ 2026-05-17  1:49     ` Jason Xing
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-17  1:49 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf

On Sat, May 16, 2026 at 8:31 PM <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] UAPI breakage and transmission batch abortion due to returning -EOVERFLOW to userspace
> --
>
> commit 1d238b3786b35997fb62e781dd7528f204ca8b34
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
> xsk: drain continuation descs after overflow in xsk_build_skb()
>
> When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
> only the current descriptor is released from the TX ring. This patch
> introduces a drain_cont flag to properly handle the remaining continuation
> descriptors, preventing them from being misinterpreted as the beginning
> of a new packet.
>
> Link: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 0a6203c425766..9c51eed6ee99c 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> [ ... ]
> > @@ -1062,6 +1064,26 @@ static int __xsk_generic_xmit(struct sock *sk)
> >                       goto out;
> >               }
> >
> > +             if (unlikely(xs->drain_cont)) {
> > +                     unsigned long flags;
> > +                     u32 idx;
> > +
> > +                     spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
> > +                     idx = xskq_get_prod(xs->pool->cq);
> > +                     xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
> > +                     xskq_prod_submit_n(xs->pool->cq, 1);
> > +                     spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
> > +
> > +                     xs->tx->invalid_descs++;
> > +                     xskq_cons_release(xs->tx);
> > +                     if (!xp_mb_desc(&desc)) {
> > +                             xs->drain_cont = false;
> > +                             err = -EOVERFLOW;
> > +                             goto out;
>
> Does returning -EOVERFLOW to userspace break the expected UAPI here?
>
> The original code suppressed the -EOVERFLOW error to silently drop the invalid
> packet and continue processing the batch:
>
>                 if (IS_ERR(skb)) {
>                         err = PTR_ERR(skb);
>                         if (err != -EOVERFLOW)
>                                 goto out;
>                         err = 0;
>                         continue;
>                 }
>
> By using goto out with err = -EOVERFLOW, doesn't this abort the transmission
> batch prematurely? Any valid packets queued immediately behind the malformed
> packet would not be processed during the current system call.
>
> Additionally, if the batch is aborted before any valid packets are transmitted
> (when sent_frame == false), the __xsk_tx_release(xs) call at the out label
> will be skipped. Will this cause the global TX consumer index to remain
> un-updated for the dropped descriptors, leading to a ring slot leak from the
> userspace perspective?
>
> Note that a later patch in the series ('selftests/xsk: fix multi-buffer
> invalid desc tests for drain_cont') acknowledges this behavior by modifying
> the selftests to tolerate the early return, but does this actually fix the
> underlying ABI breakage in the kernel?

Yes, Stan also pointed this out. I've already removed this part in the upcoming v3.

Thanks,
Jason

>
> > +                     }
> > +                     continue;
> > +             }
> > +
> >               skb = xsk_build_skb(xs, &desc);
> >               if (IS_ERR(skb)) {
> >                       err = PTR_ERR(skb);
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260515123018.80147-1-kerneljasonxing@gmail.com?part=3

* Re: [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit()
  2026-05-16 12:31   ` sashiko-bot
@ 2026-05-17  1:58     ` Jason Xing
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-17  1:58 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf

On Sat, May 16, 2026 at 8:31 PM <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] State machine logic flaw causing valid packet drop due to un-cleared drain_cont flag.
> --
>
> commit 8841b899f0208ddaf58f3df5eb5feff9db62147f
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
> xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit()
>
> This commit addresses an issue in the TX loop where an invalid descriptor
> mid-packet leaves remaining continuation fragments in the ring, which
> corrupts the packet stream. It introduces a drain_cont flag to drain the
> remaining fragments on the next iteration.
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 9c51eed6ee99c..ac4884a87a0fb 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
>
> [ ... ]
>
> > @@ -1125,6 +1125,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> >               if (xs->skb)
> >                       xsk_drop_skb(xs->skb);
> >               xskq_cons_release(xs->tx);
> > +             if (xp_mb_desc(&desc))
> > +                     xs->drain_cont = true;
>
> Does this logic inadvertently leave the socket stuck in the drain state?
>
> If xs->drain_cont is already true and the current invalid descriptor is the
> final fragment of the packet (meaning XDP_PKT_CONTD is not set), the flag
> is not explicitly cleared since there is no else branch.
>
> If xs->drain_cont remains true, could the next invocation of
> __xsk_generic_xmit() intercept the first fragment of a completely valid,
> unrelated packet in the if (unlikely(xs->drain_cont)) drain loop?
>
> It appears the valid packet would then be drained to the completion queue
> and silently dropped with -EOVERFLOW.
>
> Would it be safer to unconditionally update the state based on the
> descriptor boundary, such as:
>
> xs->drain_cont = xp_mb_desc(&desc);

Yes, I noticed this yesterday and changed it to:

	if (xp_mb_desc(&desc))
		xs->drain_cont = true;
	else
		xs->drain_cont = false;

Sure, the one-liner is much better. I'll apply your suggestion. Thanks.

>
> >       }
> >
> >  out:
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260515123018.80147-1-kerneljasonxing@gmail.com?part=4

* Re: [PATCH net v2 0/5] xsk: fix meta and publish of cq issues
  2026-05-15 16:59 ` [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski
@ 2026-05-17  2:09   ` Jason Xing
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-17  2:09 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms,
	andrew+netdev, bpf, netdev, Jason Xing

On Sat, May 16, 2026 at 1:00 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Fri, May 15, 2026 at 08:30:13PM +0800, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > The series is the product of previous review from sashiko[1].
> >
> > 1) META
> > patch 1: address TOCTOU around metadata.
> >
> > 2) PUBLISH of CQ
> > patch 2: make sure xsk_addr->addrs[] can be published to cq when
> >          overflow occurs.
> > patch 3: keep cleaning up the continuation descs (more than 17) and
> >          publish its address when overflow occurs.
> > patch 4: like patch 3, but only handles the invalid descs cases.
> >
> > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/
> >
> > ---
> > V2
> > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/
> > 1. adjust selftests (Jakub)
> > 2. add READ_ONCE in patch 1 (Stan)
> >
> >
> > Jason Xing (5):
> >   xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
> >   xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
> >   xsk: drain continuation descs after overflow in xsk_build_skb()
> >   xsk: drain continuation descs on invalid descriptor in
> >     __xsk_generic_xmit()
> >   selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
> >
> >  include/net/xdp_sock.h                        |  1 +
> >  net/xdp/xsk.c                                 | 48 +++++++++++++++----
> >  .../selftests/bpf/prog_tests/test_xsk.c       | 45 +++++++++--------
> >  3 files changed, 64 insertions(+), 30 deletions(-)
>
> This adds a regression to xskxceiver runs on real hw (ice on my side):
>
> PREREQUISITES: [ PASS ]
> 1..108
> ok 1 PASS: SKB SEND_RECEIVE
> ok 2 PASS: SKB SEND_RECEIVE_2K_FRAME
> ok 3 PASS: SKB SEND_RECEIVE_SINGLE_PKT
> ok 4 PASS: SKB POLL_RX
> ok 5 PASS: SKB POLL_TX
> ok 6 PASS: SKB POLL_RXQ_FULL
> ok 7 PASS: SKB POLL_TXQ_FULL
> ok 8 PASS: SKB ALIGNED_INV_DESC
> ok 9 PASS: SKB ALIGNED_INV_DESC_2K_FRAME_SIZE
> ok 10 PASS: SKB UMEM_HEADROOM
> ok 11 PASS: SKB BIDIRECTIONAL
> ok 12 PASS: SKB STAT_RX_DROPPED
> ok 13 PASS: SKB STAT_TX_INVALID
> ok 14 PASS: SKB STAT_RX_FULL
> ok 15 PASS: SKB STAT_FILL_EMPTY
> ok 16 PASS: SKB XDP_PROG_CLEANUP
> ok 17 PASS: SKB XDP_DROP_HALF
> ok 18 PASS: SKB XDP_SHARED_UMEM
> ok 19 PASS: SKB XDP_METADATA_COPY
> ok 20 PASS: SKB XDP_METADATA_COPY_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF
> ok 22 PASS: SKB TOO_MANY_FRAGS
> ok 23 PASS: SKB XDP_ADJUST_TAIL_SHRINK
> ok 24 PASS: SKB TX_QUEUE_CONSUMER
> ok 25 PASS: SKB XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
> ok 26 # SKIP SKIP: SKB XDP_ADJUST_TAIL_GROW
> ok 27 # SKIP SKIP: SKB XDP_ADJUST_TAIL_GROW_MULTI_BUFF
> ok 28 PASS: SKB SEND_RECEIVE_9K_PACKETS
> ok 29 PASS: SKB SEND_RECEIVE_UNALIGNED
> ok 30 PASS: SKB UNALIGNED_INV_DESC
> ok 31 PASS: SKB UNALIGNED_INV_DESC_4001_FRAME_SIZE
> ok 32 PASS: SKB SEND_RECEIVE_UNALIGNED_9K_PACKETS
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF
> ok 34 PASS: SKB HW_SW_MIN_RING_SIZE
> ok 35 PASS: SKB HW_SW_MAX_RING_SIZE
> ok 36 PASS: SKB TEARDOWN
> ok 37 PASS: DRV SEND_RECEIVE
> ok 38 PASS: DRV SEND_RECEIVE_2K_FRAME
> ok 39 PASS: DRV SEND_RECEIVE_SINGLE_PKT
> ok 40 PASS: DRV POLL_RX
> ok 41 PASS: DRV POLL_TX
> ok 42 PASS: DRV POLL_RXQ_FULL
> ok 43 PASS: DRV POLL_TXQ_FULL
> ok 44 PASS: DRV ALIGNED_INV_DESC
> ok 45 PASS: DRV ALIGNED_INV_DESC_2K_FRAME_SIZE
> ok 46 PASS: DRV UMEM_HEADROOM
> ok 47 PASS: DRV BIDIRECTIONAL
> ok 48 PASS: DRV STAT_RX_DROPPED
> ok 49 PASS: DRV STAT_TX_INVALID
> ok 50 PASS: DRV STAT_RX_FULL
> ok 51 PASS: DRV STAT_FILL_EMPTY
> ok 52 PASS: DRV XDP_PROG_CLEANUP
> ok 53 PASS: DRV XDP_DROP_HALF
> ok 54 PASS: DRV XDP_SHARED_UMEM
> ok 55 PASS: DRV XDP_METADATA_COPY
> ok 56 PASS: DRV XDP_METADATA_COPY_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF
> ok 58 PASS: DRV TOO_MANY_FRAGS
> ok 59 PASS: DRV XDP_ADJUST_TAIL_SHRINK
> ok 60 PASS: DRV TX_QUEUE_CONSUMER
> ok 61 PASS: DRV XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
> ok 62 PASS: DRV XDP_ADJUST_TAIL_GROW
> ok 63 PASS: DRV XDP_ADJUST_TAIL_GROW_MULTI_BUFF
> ok 64 PASS: DRV SEND_RECEIVE_9K_PACKETS
> ok 65 PASS: DRV SEND_RECEIVE_UNALIGNED
> ok 66 PASS: DRV UNALIGNED_INV_DESC
> ok 67 PASS: DRV UNALIGNED_INV_DESC_4001_FRAME_SIZE
> ok 68 PASS: DRV SEND_RECEIVE_UNALIGNED_9K_PACKETS
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF
> ok 70 PASS: DRV HW_SW_MIN_RING_SIZE
> ok 71 PASS: DRV HW_SW_MAX_RING_SIZE
> ok 72 PASS: DRV TEARDOWN
> ok 73 PASS: ZC SEND_RECEIVE
> ok 74 PASS: ZC SEND_RECEIVE_2K_FRAME
> ok 75 PASS: ZC SEND_RECEIVE_SINGLE_PKT
> ok 76 PASS: ZC POLL_RX
> ok 77 PASS: ZC POLL_TX
> ok 78 PASS: ZC POLL_RXQ_FULL
> ok 79 PASS: ZC POLL_TXQ_FULL
> ok 80 PASS: ZC ALIGNED_INV_DESC
> ok 81 PASS: ZC ALIGNED_INV_DESC_2K_FRAME_SIZE
> ok 82 PASS: ZC UMEM_HEADROOM
> ok 83 PASS: ZC BIDIRECTIONAL
> # Can not run RX_DROPPED test for ZC mode
> ok 84 # SKIP SKIP: ZC STAT_RX_DROPPED
> ok 85 PASS: ZC STAT_TX_INVALID
> ok 86 PASS: ZC STAT_RX_FULL
> ok 87 PASS: ZC STAT_FILL_EMPTY
> ok 88 PASS: ZC XDP_PROG_CLEANUP
> ok 89 PASS: ZC XDP_DROP_HALF
> ok 90 PASS: ZC XDP_SHARED_UMEM
> ok 91 PASS: ZC XDP_METADATA_COPY
> ok 92 PASS: ZC XDP_METADATA_COPY_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> ---------------------------------------
> not ok 94 FAIL: ZC TOO_MANY_FRAGS
> ok 95 PASS: ZC XDP_ADJUST_TAIL_SHRINK
> # Can not run TX_QUEUE_CONSUMER test for ZC mode
> ok 96 # SKIP SKIP: ZC TX_QUEUE_CONSUMER
> ok 97 PASS: ZC XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
> ok 98 PASS: ZC XDP_ADJUST_TAIL_GROW
> ok 99 PASS: ZC XDP_ADJUST_TAIL_GROW_MULTI_BUFF
> ok 100 PASS: ZC SEND_RECEIVE_9K_PACKETS
> ok 101 PASS: ZC SEND_RECEIVE_UNALIGNED
> ok 102 PASS: ZC UNALIGNED_INV_DESC
> ok 103 PASS: ZC UNALIGNED_INV_DESC_4001_FRAME_SIZE
> ok 104 PASS: ZC SEND_RECEIVE_UNALIGNED_9K_PACKETS
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF
> ok 106 PASS: ZC HW_SW_MIN_RING_SIZE
> ok 107 PASS: ZC HW_SW_MAX_RING_SIZE
> ok 108 PASS: ZC TEARDOWN
> # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
> # Totals: pass:97 fail:7 xfail:0 xpass:0 skip:4 error:0
> XSK_SELFTESTS_ens785f0np0_SOFTIRQ: [ FAIL ]
> 1..108
> ok 1 PASS: SKB BUSY-POLL SEND_RECEIVE
> ok 2 PASS: SKB BUSY-POLL SEND_RECEIVE_2K_FRAME
> ok 3 PASS: SKB BUSY-POLL SEND_RECEIVE_SINGLE_PKT
> ok 4 PASS: SKB BUSY-POLL POLL_RX
> ok 5 PASS: SKB BUSY-POLL POLL_TX
> ok 6 PASS: SKB BUSY-POLL POLL_RXQ_FULL
> ok 7 PASS: SKB BUSY-POLL POLL_TXQ_FULL
> ok 8 PASS: SKB BUSY-POLL ALIGNED_INV_DESC
> ok 9 PASS: SKB BUSY-POLL ALIGNED_INV_DESC_2K_FRAME_SIZE
> ok 10 PASS: SKB BUSY-POLL UMEM_HEADROOM
> ok 11 PASS: SKB BUSY-POLL BIDIRECTIONAL
> ok 12 PASS: SKB BUSY-POLL STAT_RX_DROPPED
> ok 13 PASS: SKB BUSY-POLL STAT_TX_INVALID
> ok 14 PASS: SKB BUSY-POLL STAT_RX_FULL
> ok 15 PASS: SKB BUSY-POLL STAT_FILL_EMPTY
> ok 16 PASS: SKB BUSY-POLL XDP_PROG_CLEANUP
> ok 17 PASS: SKB BUSY-POLL XDP_DROP_HALF
> ok 18 PASS: SKB BUSY-POLL XDP_SHARED_UMEM
> ok 19 PASS: SKB BUSY-POLL XDP_METADATA_COPY
> ok 20 PASS: SKB BUSY-POLL XDP_METADATA_COPY_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
> ok 22 PASS: SKB BUSY-POLL TOO_MANY_FRAGS
> ok 23 PASS: SKB BUSY-POLL XDP_ADJUST_TAIL_SHRINK
> ok 24 PASS: SKB BUSY-POLL TX_QUEUE_CONSUMER
> ok 25 PASS: SKB BUSY-POLL XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
> ok 26 # SKIP SKIP: SKB BUSY-POLL XDP_ADJUST_TAIL_GROW
> ok 27 # SKIP SKIP: SKB BUSY-POLL XDP_ADJUST_TAIL_GROW_MULTI_BUFF
> ok 28 PASS: SKB BUSY-POLL SEND_RECEIVE_9K_PACKETS
> ok 29 PASS: SKB BUSY-POLL SEND_RECEIVE_UNALIGNED
> ok 30 PASS: SKB BUSY-POLL UNALIGNED_INV_DESC
> ok 31 PASS: SKB BUSY-POLL UNALIGNED_INV_DESC_4001_FRAME_SIZE
> ok 32 PASS: SKB BUSY-POLL SEND_RECEIVE_UNALIGNED_9K_PACKETS
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
> ok 34 PASS: SKB BUSY-POLL HW_SW_MIN_RING_SIZE
> ok 35 PASS: SKB BUSY-POLL HW_SW_MAX_RING_SIZE
> ok 36 PASS: SKB BUSY-POLL TEARDOWN
> ok 37 PASS: DRV BUSY-POLL SEND_RECEIVE
> ok 38 PASS: DRV BUSY-POLL SEND_RECEIVE_2K_FRAME
> ok 39 PASS: DRV BUSY-POLL SEND_RECEIVE_SINGLE_PKT
> ok 40 PASS: DRV BUSY-POLL POLL_RX
> ok 41 PASS: DRV BUSY-POLL POLL_TX
> ok 42 PASS: DRV BUSY-POLL POLL_RXQ_FULL
> ok 43 PASS: DRV BUSY-POLL POLL_TXQ_FULL
> ok 44 PASS: DRV BUSY-POLL ALIGNED_INV_DESC
> ok 45 PASS: DRV BUSY-POLL ALIGNED_INV_DESC_2K_FRAME_SIZE
> ok 46 PASS: DRV BUSY-POLL UMEM_HEADROOM
> ok 47 PASS: DRV BUSY-POLL BIDIRECTIONAL
> ok 48 PASS: DRV BUSY-POLL STAT_RX_DROPPED
> ok 49 PASS: DRV BUSY-POLL STAT_TX_INVALID
> ok 50 PASS: DRV BUSY-POLL STAT_RX_FULL
> ok 51 PASS: DRV BUSY-POLL STAT_FILL_EMPTY
> ok 52 PASS: DRV BUSY-POLL XDP_PROG_CLEANUP
> ok 53 PASS: DRV BUSY-POLL XDP_DROP_HALF
> ok 54 PASS: DRV BUSY-POLL XDP_SHARED_UMEM
> ok 55 PASS: DRV BUSY-POLL XDP_METADATA_COPY
> ok 56 PASS: DRV BUSY-POLL XDP_METADATA_COPY_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
> ok 58 PASS: DRV BUSY-POLL TOO_MANY_FRAGS
> ok 59 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_SHRINK
> ok 60 PASS: DRV BUSY-POLL TX_QUEUE_CONSUMER
> ok 61 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
> ok 62 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_GROW
> ok 63 PASS: DRV BUSY-POLL XDP_ADJUST_TAIL_GROW_MULTI_BUFF
> ok 64 PASS: DRV BUSY-POLL SEND_RECEIVE_9K_PACKETS
> ok 65 PASS: DRV BUSY-POLL SEND_RECEIVE_UNALIGNED
> ok 66 PASS: DRV BUSY-POLL UNALIGNED_INV_DESC
> ok 67 PASS: DRV BUSY-POLL UNALIGNED_INV_DESC_4001_FRAME_SIZE
> ok 68 PASS: DRV BUSY-POLL SEND_RECEIVE_UNALIGNED_9K_PACKETS
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
> ok 70 PASS: DRV BUSY-POLL HW_SW_MIN_RING_SIZE
> ok 71 PASS: DRV BUSY-POLL HW_SW_MAX_RING_SIZE
> ok 72 PASS: DRV BUSY-POLL TEARDOWN
> ok 73 PASS: ZC BUSY-POLL SEND_RECEIVE
> ok 74 PASS: ZC BUSY-POLL SEND_RECEIVE_2K_FRAME
> ok 75 PASS: ZC BUSY-POLL SEND_RECEIVE_SINGLE_PKT
> ok 76 PASS: ZC BUSY-POLL POLL_RX
> ok 77 PASS: ZC BUSY-POLL POLL_TX
> ok 78 PASS: ZC BUSY-POLL POLL_RXQ_FULL
> ok 79 PASS: ZC BUSY-POLL POLL_TXQ_FULL
> ok 80 PASS: ZC BUSY-POLL ALIGNED_INV_DESC
> ok 81 PASS: ZC BUSY-POLL ALIGNED_INV_DESC_2K_FRAME_SIZE
> ok 82 PASS: ZC BUSY-POLL UMEM_HEADROOM
> ok 83 PASS: ZC BUSY-POLL BIDIRECTIONAL
> # Can not run RX_DROPPED test for ZC mode
> ok 84 # SKIP SKIP: ZC BUSY-POLL STAT_RX_DROPPED
> ok 85 PASS: ZC BUSY-POLL STAT_TX_INVALID
> ok 86 PASS: ZC BUSY-POLL STAT_RX_FULL
> ok 87 PASS: ZC BUSY-POLL STAT_FILL_EMPTY
> ok 88 PASS: ZC BUSY-POLL XDP_PROG_CLEANUP
> ok 89 PASS: ZC BUSY-POLL XDP_DROP_HALF
> ok 90 PASS: ZC BUSY-POLL XDP_SHARED_UMEM
> ok 91 PASS: ZC BUSY-POLL XDP_METADATA_COPY
> ok 92 PASS: ZC BUSY-POLL XDP_METADATA_COPY_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
> # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> ---------------------------------------
> not ok 94 FAIL: ZC BUSY-POLL TOO_MANY_FRAGS
> ok 95 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_SHRINK
> # Can not run TX_QUEUE_CONSUMER test for ZC mode
> ok 96 # SKIP SKIP: ZC BUSY-POLL TX_QUEUE_CONSUMER
> ok 97 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF
> ok 98 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_GROW
> ok 99 PASS: ZC BUSY-POLL XDP_ADJUST_TAIL_GROW_MULTI_BUFF
> ok 100 PASS: ZC BUSY-POLL SEND_RECEIVE_9K_PACKETS
> ok 101 PASS: ZC BUSY-POLL SEND_RECEIVE_UNALIGNED
> ok 102 PASS: ZC BUSY-POLL UNALIGNED_INV_DESC
> ok 103 PASS: ZC BUSY-POLL UNALIGNED_INV_DESC_4001_FRAME_SIZE
> ok 104 PASS: ZC BUSY-POLL SEND_RECEIVE_UNALIGNED_9K_PACKETS
> # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
> # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
> DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
> DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
> .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
> ---------------------------------------
> not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
> ok 106 PASS: ZC BUSY-POLL HW_SW_MIN_RING_SIZE
> ok 107 PASS: ZC BUSY-POLL HW_SW_MAX_RING_SIZE
> ok 108 PASS: ZC BUSY-POLL TEARDOWN
> # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
> # Totals: pass:97 fail:7 xfail:0 xpass:0 skip:4 error:0
> XSK_SELFTESTS_ens785f0np0_BUSY_POLL: [ FAIL ]
>
> Summary:
> XSK_SELFTESTS_ens785f0np0_SOFTIRQ: [ FAIL ]
> XSK_SELFTESTS_ens785f0np0_BUSY_POLL: [ FAIL ]

Thanks for the test. I spent some time and finally got the environment
set up on my machine with the ixgbe driver.

Some of the failures are probably caused by the lack of patch 5 in
your setup? The rest are caused by patch 3 breaking things, as Stan
pointed out. I have fixed that in the next version.

Thanks,
Jason


* Re: [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
  2026-05-16 12:31   ` sashiko-bot
@ 2026-05-17  2:47     ` Jason Xing
  2026-05-17  6:15       ` Jason Xing
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Xing @ 2026-05-17  2:47 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf

On Sat, May 16, 2026 at 8:31 PM <sashiko-bot@kernel.org> wrote:
>
> Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
> - [High] Premature exit in the newly added `wait_for_tx_completion` drain loop causes the test to silently strand valid packets.
> - [High] Clamping `outstanding_tx` to 0 in `complete_pkts` degrades test strictness by silently discarding unexpected completions.
> - [High] Unhandled `EOVERFLOW` error in `kick_tx()` causes spurious test failures when the kernel synchronously processes invalid packets.
> --
>
> commit 51fe3cee29ff3a261580a926424f8e6171690b0d
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
> selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
>
> This commit updates the selftests to account for the kernel returning CQ
> entries for dropped and drained multi-buffer descriptors. It clamps the
> outstanding TX count, adds a drain loop to consume extra CQ entries, and
> modifies test descriptors to stop drains at packet boundaries.
>
> >     - wait_for_tx_completion: add a drain loop that consumes CQ entries
> >       after outstanding_tx reaches zero. Ensure remaining valid packets
> >       are transmitted. This change is made because of patch 3 in the
> >       series adds a logic in __xsk_generic_xmit(): return -EOVERFLOW
> >       after detecting and handling the remaining part of the skb.
>
> Since the kernel now returns -EOVERFLOW after handling invalid packets,
> will kick_tx() abort the test when it encounters this new error code?
>
> kick_tx() is called by complete_pkts(), which checks its return value:
>
> test_xsk.c:complete_pkts() {
>     if (xsk_ring_prod__needs_wakeup(&xsk->tx)) {
>         ret = kick_tx(xsk);
>         if (ret)
>             return TEST_FAILURE;
>     }
>     ...
> }
>
> If kick_tx() does not tolerate -EOVERFLOW, this might cause spurious test
> failures when the kernel synchronously processes invalid packets.

This review was made on top of patch 3 (0003), which breaks the previous
usage, so it is no longer valid for the upcoming v3 series.
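
To make the reviewer's question concrete: it comes down to whether the Tx
wakeup helper treats EOVERFLOW as a soft error or as a hard failure. Below
is a minimal sketch of that decision. It assumes a hypothetical helper
tx_errno_is_tolerable() that is not part of test_xsk.c, and the set of
tolerated errno values is illustrative only. It shows the question, not a
proposed change.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper, not part of test_xsk.c: decide whether an errno
 * from a non-blocking Tx wakeup should be treated as "retry later"
 * instead of a hard test failure. The tolerated set is an assumption.
 */
static bool tx_errno_is_tolerable(int err, bool allow_overflow)
{
	switch (err) {
	case EAGAIN:
	case EBUSY:
	case ENOBUFS:
	case ENETDOWN:
		return true;
	case EOVERFLOW:
		/* Only tolerated when the caller opts in explicitly. */
		return allow_overflow;
	default:
		return false;
	}
}
```

With allow_overflow left false, an EOVERFLOW from the wakeup call would
still surface as a failure, which is the behavior the review asks about.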

>
> > diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> > index 7950c504ed289..488ace93ff663 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
>
> [ ... ]
>
> > @@ -950,17 +950,11 @@ static int complete_pkts(struct xsk_socket_info *xsk, int batch_size)
> >
> >       rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
> >       if (rcvd) {
> > -             if (rcvd > xsk->outstanding_tx) {
> > -                     u64 addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx + rcvd - 1);
> > -
> > -                     ksft_print_msg("[%s] Too many packets completed\n", __func__);
> > -                     ksft_print_msg("Last completion address: %llx\n",
> > -                                    (unsigned long long)addr);
> > -                     return TEST_FAILURE;
> > -             }
> > -
> >               xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> > -             xsk->outstanding_tx -= rcvd;
> > +             if (rcvd > xsk->outstanding_tx)
> > +                     xsk->outstanding_tx = 0;
> > +             else
> > +                     xsk->outstanding_tx -= rcvd;
>
> Does clamping xsk->outstanding_tx to 0 mask underlying regressions by
> silently discarding unexpected completions?
>
> If the kernel correctly generates CQ entries for invalid descriptors now,
> could outstanding_tx be incremented by the total number of submitted
> descriptors instead of only the valid ones?

Yep, I'll fix it.

>
> Tracking total descriptors would allow the test to use the existing
> timeout-backed wait loop and eliminate the need for this clamping.
>
> >       }
> >
> >       return TEST_PASS;
>
> [ ... ]
>
> > @@ -1293,6 +1289,14 @@ static int wait_for_tx_completion(struct xsk_socket_info *xsk)
> >               complete_pkts(xsk, xsk->batch_size);
> >       }
> >
> > +     do {
> > +             if (xsk_ring_prod__needs_wakeup(&xsk->tx))
> > +                     kick_tx(xsk);
> > +             rcvd = xsk_ring_cons__peek(&xsk->umem->cq, xsk->batch_size, &idx);
> > +             if (rcvd)
> > +                     xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> > +     } while (rcvd);
>
> Can this loop exit prematurely and strand valid packets?
>
> In asynchronous modes, kick_tx() wakes the NAPI thread and returns
> immediately. Because packet processing happens asynchronously, the hardware
> may not have populated the completion queue before the peek() occurs.
>
> If the queue is not yet populated, xsk_ring_cons__peek() will return 0,
> causing the loop to break instantly before the valid packets have actually
> completed.

Right, my local AI also spotted this potential issue when running in
the physical test env. So I added:
+       usleep(100);
+       complete_pkts(xsk, xsk->batch_size);
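
The premature-exit problem can also be shown in isolation. The sketch
below is a self-contained simulation, not the selftest code: struct
mock_cq, mock_cq_peek() and drain_cq() are hypothetical names, and the
mock ring only becomes populated after a few empty polls to mimic
asynchronous completion. The drain loop gives up after a bounded number of
consecutive empty polls instead of stopping at the first one.

```c
/* Hypothetical stand-in for the completion ring: entries become visible
 * only after `delay_polls` empty polls, mimicking asynchronous completion.
 */
struct mock_cq {
	unsigned int delay_polls;	/* empty polls before entries show up */
	unsigned int pending;		/* entries that will eventually appear */
};

/* Return up to `batch` completed entries, or 0 if none are visible yet. */
static unsigned int mock_cq_peek(struct mock_cq *cq, unsigned int batch)
{
	unsigned int n;

	if (cq->delay_polls) {
		cq->delay_polls--;
		return 0;
	}
	n = cq->pending < batch ? cq->pending : batch;
	cq->pending -= n;
	return n;
}

/* Drain until `max_empty_polls` consecutive polls come back empty,
 * rather than stopping at the first empty poll. Returns entries drained.
 */
static unsigned int drain_cq(struct mock_cq *cq, unsigned int batch,
			     unsigned int max_empty_polls)
{
	unsigned int empty = 0, total = 0, rcvd;

	while (empty < max_empty_polls) {
		rcvd = mock_cq_peek(cq, batch);
		if (rcvd) {
			total += rcvd;
			empty = 0;
		} else {
			empty++;	/* a real test would sleep briefly here */
		}
	}
	return total;
}
```

With max_empty_polls set to 1 the loop behaves like the quoted
do/while and can return before any entry is visible. A larger budget lets
the late completions be consumed.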

Thanks,
Jason


* Re: [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont
  2026-05-17  2:47     ` Jason Xing
@ 2026-05-17  6:15       ` Jason Xing
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Xing @ 2026-05-17  6:15 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: bpf

> > > @@ -950,17 +950,11 @@ static int complete_pkts(struct xsk_socket_info *xsk, int batch_size)
> > >
> > >       rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
> > >       if (rcvd) {
> > > -             if (rcvd > xsk->outstanding_tx) {
> > > -                     u64 addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx + rcvd - 1);
> > > -
> > > -                     ksft_print_msg("[%s] Too many packets completed\n", __func__);
> > > -                     ksft_print_msg("Last completion address: %llx\n",
> > > -                                    (unsigned long long)addr);
> > > -                     return TEST_FAILURE;
> > > -             }
> > > -
> > >               xsk_ring_cons__release(&xsk->umem->cq, rcvd);
> > > -             xsk->outstanding_tx -= rcvd;
> > > +             if (rcvd > xsk->outstanding_tx)
> > > +                     xsk->outstanding_tx = 0;
> > > +             else
> > > +                     xsk->outstanding_tx -= rcvd;
> >
> > Does clamping xsk->outstanding_tx to 0 mask underlying regressions by
> > silently discarding unexpected completions?
> >
> > If the kernel correctly generates CQ entries for invalid descriptors now,
> > could outstanding_tx be incremented by the total number of submitted
> > descriptors instead of only the valid ones?

Well, this assumption doesn't always hold because
xsk_cq_cancel_locked() cancels the desc from the cq. I confirmed this
with testing: switching to outstanding_tx += i causes 6 additional test
failures.

Since this series fixes the wrong behavior and publishes the
invalid/overflowed descs into the cq, outstanding_tx is no longer used
for verification in the selftests. It is still useful, though, for
example as an indicator of when to kick the txq.
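
For reference, the clamping in the quoted diff is a saturating
subtraction. A minimal sketch, assuming the counter is unsigned and using
a hypothetical helper name outstanding_after() that does not exist in
test_xsk.c:

```c
/* Hypothetical helper mirroring the clamping in the quoted diff:
 * subtract `rcvd` completions from `outstanding` without wrapping
 * below zero when more completions arrive than were counted.
 */
static unsigned int outstanding_after(unsigned int outstanding,
				      unsigned int rcvd)
{
	return rcvd > outstanding ? 0 : outstanding - rcvd;
}
```

Without the clamp, an unsigned counter would wrap to a very large value
once extra completions show up, and any loop waiting for it to reach zero
would only end on a timeout.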

Thanks,
Jason


end of thread, other threads:[~2026-05-17  6:16 UTC | newest]

Thread overview: 15+ messages
2026-05-15 12:30 [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Jason Xing
2026-05-15 12:30 ` [PATCH net v2 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-15 12:30 ` [PATCH net v2 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
2026-05-15 12:30 ` [PATCH net v2 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
2026-05-16 12:31   ` sashiko-bot
2026-05-17  1:49     ` Jason Xing
2026-05-15 12:30 ` [PATCH net v2 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
2026-05-16 12:31   ` sashiko-bot
2026-05-17  1:58     ` Jason Xing
2026-05-15 12:30 ` [PATCH net v2 5/5] selftests/xsk: fix multi-buffer invalid desc tests for drain_cont Jason Xing
2026-05-16 12:31   ` sashiko-bot
2026-05-17  2:47     ` Jason Xing
2026-05-17  6:15       ` Jason Xing
2026-05-15 16:59 ` [PATCH net v2 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski
2026-05-17  2:09   ` Jason Xing
