From: Jason Xing <kerneljasonxing@gmail.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com,
maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
hawk@kernel.org, john.fastabend@gmail.com, horms@kernel.org,
andrew+netdev@lunn.ch
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: [PATCH net v3 5/5] selftests/xsk: drain CQ to wait for TX completion
Date: Sun, 17 May 2026 14:33:11 +0800
Message-ID: <20260517063311.28921-6-kerneljasonxing@gmail.com>
In-Reply-To: <20260517063311.28921-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
With the xsk drain_cont patches earlier in this series applied,
dropped multi-buffer descriptors get their buffer addresses
published to the completion queue (CQ) via the skb destructor
instead of being cancelled. As a
result, the CQ entries observed by user space no longer match the
software-side accounting based on valid_frags only:
__send_pkts() bumps xsk->outstanding_tx by valid_frags, while
complete_pkts() decrements it by every CQ entry it consumes,
including those produced by drops/drains. This makes
outstanding_tx underflow and causes wait_for_tx_completion() to
exit while valid descriptors are still sitting in the TX ring,
which in turn makes receive_pkts() time out for the
ALIGNED_INV_DESC_MULTI_BUFF, UNALIGNED_INV_DESC_MULTI_BUFF and
TOO_MANY_FRAGS subtests.
Fix this with two changes to the TX completion path:
- complete_pkts(): tolerate extra CQ completions by clamping
outstanding_tx to zero instead of failing.
- wait_for_tx_completion(): after the outstanding_tx loop finishes,
add a drain loop that kicks TX and consumes remaining CQ entries.
After the drain loop exits, do a short usleep and one final
complete_pkts() call so that real hardware (e.g. ice) has enough
time to post late CQ entries before we conclude the ring is
fully drained.
Adjust the multi-buffer invalid-desc tests so that the last
descriptor of every invalid packet has XDP_PKT_CONTD cleared.
Without this, the kernel drain_cont logic would consume
descriptors past the packet boundary and eat into the next valid
packet, breaking pkt_nb validation. Concretely:
- XSK_DESC__INVALID_OPTION is changed from 0xffff to 0xfffe so it
no longer asserts the XDP_PKT_CONTD bit (bit 0).
- testapp_invalid_desc_mb() clears XDP_PKT_CONTD on the trailing
descriptor of the invalid-address and invalid-length packets.
- testapp_too_many_frags() appends one extra terminating
descriptor so the over-sized invalid packet ends with
XDP_PKT_CONTD cleared, preventing the drain from spilling into
the trailing sync packet.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
.../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++--------
1 file changed, 27 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
index 7950c504ed28..1f196c8ebc73 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c
@@ -31,7 +31,7 @@
#define POLL_TMOUT 1000
#define THREAD_TMOUT 3
#define UMEM_HEADROOM_TEST_SIZE 128
-#define XSK_DESC__INVALID_OPTION (0xffff)
+#define XSK_DESC__INVALID_OPTION (0xfffe)
#define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1)
#define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024)
#define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024)
@@ -950,17 +950,11 @@ static int complete_pkts(struct xsk_socket_info *xsk, int batch_size)
rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx);
if (rcvd) {
- if (rcvd > xsk->outstanding_tx) {
- u64 addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx + rcvd - 1);
-
- ksft_print_msg("[%s] Too many packets completed\n", __func__);
- ksft_print_msg("Last completion address: %llx\n",
- (unsigned long long)addr);
- return TEST_FAILURE;
- }
-
xsk_ring_cons__release(&xsk->umem->cq, rcvd);
- xsk->outstanding_tx -= rcvd;
+ if (rcvd > xsk->outstanding_tx)
+ xsk->outstanding_tx = 0;
+ else
+ xsk->outstanding_tx -= rcvd;
}
return TEST_PASS;
@@ -1274,6 +1268,8 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b
static int wait_for_tx_completion(struct xsk_socket_info *xsk)
{
struct timeval tv_end, tv_now, tv_timeout = {THREAD_TMOUT, 0};
+ unsigned int rcvd;
+ u32 idx;
int ret;
ret = gettimeofday(&tv_now, NULL);
@@ -1293,6 +1289,17 @@ static int wait_for_tx_completion(struct xsk_socket_info *xsk)
complete_pkts(xsk, xsk->batch_size);
}
+ do {
+ if (xsk_ring_prod__needs_wakeup(&xsk->tx))
+ kick_tx(xsk);
+ rcvd = xsk_ring_cons__peek(&xsk->umem->cq, xsk->batch_size, &idx);
+ if (rcvd)
+ xsk_ring_cons__release(&xsk->umem->cq, rcvd);
+ } while (rcvd);
+
+ usleep(100);
+ complete_pkts(xsk, xsk->batch_size);
+
return TEST_PASS;
}
@@ -2075,10 +2082,10 @@ int testapp_invalid_desc_mb(struct test_spec *test)
{0, 0, 0, false, 0},
/* Invalid address in the second frame */
{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
- {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+ {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, 0},
/* Invalid len in the middle */
{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
- {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
+ {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, 0},
/* Invalid options in the middle */
{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD},
{0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION},
@@ -2229,7 +2236,7 @@ int testapp_too_many_frags(struct test_spec *test)
max_frags += 1;
}
- pkts = calloc(2 * max_frags + 2, sizeof(struct pkt));
+ pkts = calloc(2 * max_frags + 3, sizeof(struct pkt));
if (!pkts)
return TEST_FAILURE;
@@ -2247,20 +2254,19 @@ int testapp_too_many_frags(struct test_spec *test)
}
pkts[max_frags].options = 0;
- /* An invalid packet with the max amount of frags but signals packet
- * continues on the last frag
- */
- for (i = max_frags + 1; i < 2 * max_frags + 1; i++) {
+ /* An invalid packet with too many frags */
+ for (i = max_frags + 1; i < 2 * max_frags + 2; i++) {
pkts[i].len = MIN_PKT_SIZE;
pkts[i].options = XDP_PKT_CONTD;
pkts[i].valid = false;
}
+ pkts[2 * max_frags + 1].options = 0;
/* Valid packet for synch */
- pkts[2 * max_frags + 1].len = MIN_PKT_SIZE;
- pkts[2 * max_frags + 1].valid = true;
+ pkts[2 * max_frags + 2].len = MIN_PKT_SIZE;
+ pkts[2 * max_frags + 2].valid = true;
- if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2)) {
+ if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 3)) {
free(pkts);
return TEST_FAILURE;
}
--
2.43.7
Thread overview: 6 messages
2026-05-17 6:33 [PATCH net v3 0/5] xsk: fix meta and publish of cq issues Jason Xing
2026-05-17 6:33 ` [PATCH net v3 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-17 6:33 ` [PATCH net v3 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
2026-05-17 6:33 ` [PATCH net v3 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
2026-05-17 6:33 ` [PATCH net v3 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
2026-05-17 6:33 ` [PATCH net v3 5/5] selftests/xsk: drain CQ to wait for TX completion Jason Xing