* [PATCH net v4 0/5] xsk: fix meta and publish of cq issues
@ 2026-05-20 0:42 Jason Xing
2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
` (5 more replies)
0 siblings, 6 replies; 22+ messages in thread
From: Jason Xing @ 2026-05-20 0:42 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
The series is the product of previous review from sashiko[1].
1) META
patch 1: address TOCTOU around metadata.
2) PUBLISH of CQ
patch 2: make sure xsk_addr->addrs[] can be published to cq when
overflow occurs.
patch 3: keep cleaning up the continuation descs (more than 17) and
publish its address when overflow occurs.
patch 4: like patch 3, but only handles the invalid descs cases.
[1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/
---
V4
Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/
1. correct the description of xmit path in patch 3 (sashiko)
2. move set logic into xmit path in patch 3 (Stan)
V3
Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/
1. avoid breaking previous usage of sendto, and siliently handle
overflow case (Stan, sashiko)
2. add one particular exception process in patch 4 (sashiko)
3. adjust the selftest to make sure it passes in either virutal or
physical machines, which includes add usleep to support physical machine.
V2
Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/
1. adjust selftests (Jakub)
2. add READ_ONCE in patch 1 (Stan)
Jason Xing (5):
xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
xsk: drain continuation descs after overflow in xsk_build_skb()
xsk: drain continuation descs on invalid descriptor in
__xsk_generic_xmit()
selftests/xsk: drain CQ to wait for TX completion
include/net/xdp_sock.h | 1 +
net/xdp/xsk.c | 44 +++++++++++++----
.../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++--------
3 files changed, 63 insertions(+), 30 deletions(-)
--
2.43.7
^ permalink raw reply [flat|nested] 22+ messages in thread* [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing @ 2026-05-20 0:42 ` Jason Xing 2026-05-21 12:04 ` Maciej Fijalkowski 2026-05-20 0:42 ` [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing ` (4 subsequent siblings) 5 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-20 0:42 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev Cc: bpf, netdev, Jason Xing From: Jason Xing <kernelxing@tencent.com> The TX metadata area resides in the UMEM buffer which is memory-mapped and concurrently writable by userspace. In xsk_skb_metadata(), csum_start and csum_offset are read from shared memory for bounds validation, then read again for skb assignment. A malicious userspace application can race to overwrite these values between the two reads, bypassing the bounds check and causing out-of-bounds memory access during checksum computation in the transmit path. Fix this by reading csum_start and csum_offset into local variables once, then using the local copies for both validation and assignment. Note that other metadata fields (flags, launch_time) and the cached csum fields may be mutually inconsistent due to concurrent userspace writes, but this is benign: the only security-critical invariant is that each field's validated value is the same one used, which local caching guarantees. Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/ Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") Signed-off-by: Jason Xing <kernelxing@tencent.com> --- net/xdp/xsk.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 5e5786cd9af5..f8c8a8c9dfba 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -802,6 +802,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer, u32 hr) { struct xsk_tx_metadata *meta = NULL; + u16 csum_start, csum_offset; if (unlikely(pool->tx_metadata_len == 0)) return -EINVAL; @@ -811,13 +812,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer, return -EINVAL; if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) { - if (unlikely(meta->request.csum_start + - meta->request.csum_offset + + csum_start = READ_ONCE(meta->request.csum_start); + csum_offset = READ_ONCE(meta->request.csum_offset); + + if (unlikely(csum_start + csum_offset + sizeof(__sum16) > desc->len)) return -EINVAL; - skb->csum_start = hr + meta->request.csum_start; - skb->csum_offset = meta->request.csum_offset; + skb->csum_start = hr + csum_start; + skb->csum_offset = csum_offset; skb->ip_summed = CHECKSUM_PARTIAL; if (unlikely(pool->tx_sw_csum)) { -- 2.43.7 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() 2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing @ 2026-05-21 12:04 ` Maciej Fijalkowski 0 siblings, 0 replies; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-21 12:04 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Wed, May 20, 2026 at 08:42:40AM +0800, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > The TX metadata area resides in the UMEM buffer which is memory-mapped > and concurrently writable by userspace. In xsk_skb_metadata(), > csum_start and csum_offset are read from shared memory for bounds > validation, then read again for skb assignment. A malicious userspace > application can race to overwrite these values between the two reads, > bypassing the bounds check and causing out-of-bounds memory access > during checksum computation in the transmit path. > > Fix this by reading csum_start and csum_offset into local variables > once, then using the local copies for both validation and assignment. > > Note that other metadata fields (flags, launch_time) and the cached > csum fields may be mutually inconsistent due to concurrent userspace > writes, but this is benign: the only security-critical invariant is > that each field's validated value is the same one used, which local > caching guarantees. > > Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/ > Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") > Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > --- > net/xdp/xsk.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 5e5786cd9af5..f8c8a8c9dfba 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -802,6 +802,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer, > u32 hr) > { > struct xsk_tx_metadata *meta = NULL; > + u16 csum_start, csum_offset; > > if (unlikely(pool->tx_metadata_len == 0)) > return -EINVAL; > @@ -811,13 +812,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer, > return -EINVAL; > > if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) { > - if (unlikely(meta->request.csum_start + > - meta->request.csum_offset + > + csum_start = READ_ONCE(meta->request.csum_start); > + csum_offset = READ_ONCE(meta->request.csum_offset); > + > + if (unlikely(csum_start + csum_offset + > sizeof(__sum16) > desc->len)) > return -EINVAL; > > - skb->csum_start = hr + meta->request.csum_start; > - skb->csum_offset = meta->request.csum_offset; > + skb->csum_start = hr + csum_start; > + skb->csum_offset = csum_offset; > skb->ip_summed = CHECKSUM_PARTIAL; > > if (unlikely(pool->tx_sw_csum)) { > -- > 2.43.7 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing 2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing @ 2026-05-20 0:42 ` Jason Xing 2026-05-21 12:05 ` Maciej Fijalkowski 2026-05-20 0:42 ` [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing ` (3 subsequent siblings) 5 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-20 0:42 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev Cc: bpf, netdev, Jason Xing From: Jason Xing <kernelxing@tencent.com> This patch is inspired by the check[1] from sashiko. It says when overflow happens, the address of cq to be published is invalid. Actually the severer thing is the whole process of publishing the address of cq in this particular case is not right: it should truely publish the address and advance the cached_prod in cq as long as it reads descriptors from txq. The following is the full analysis. xsk_drop_skb() is called in three places, which all discard a partially built multi-buffer skb: 1) xsk_build_skb() -EOVERFLOW error path: packet exceeds MAX_SKB_FRAGS 2) __xsk_generic_xmit() post-loop cleanup: an invalid descriptor in the TX ring prevents the partial packet from completing 3) xsk_release(): socket close while xs->skb holds an incomplete packet In all three cases, the TX descriptors for the already-processed frags have been consumed from the TX ring (xskq_cons_release), and CQ slots have been reserved. However, xsk_drop_skb() calls xsk_consume_skb() which cancels the CQ reservations via xsk_cq_cancel_locked(). Since the buffer addresses never appear in the completion queue, userspace permanently loses track of these buffers. Fix this by letting consume_skb() trigger the existing xsk_destruct_skb destructor, which already submits buffer addresses to the CQ via xsk_cq_submit_addr_locked(). Note that cancelling the descriptors back to the TX ring (via xskq_cons_cancel_n) is not a appropriate option because an oversized packet that always exceeds MAX_SKB_FRAGS would be retried indefinitely, which is an obviously deadlock bug in the TX path. Also move the desc->addr assignment in xsk_build_skb() above the overflow check so that the current descriptor's address is recorded before a potential -EOVERFLOW jump to free_err, consistent with the zerocopy path in xsk_build_skb_zerocopy(). [1]: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") Signed-off-by: Jason Xing <kernelxing@tencent.com> --- net/xdp/xsk.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index f8c8a8c9dfba..0a6203c42576 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -793,8 +793,11 @@ static void xsk_consume_skb(struct sk_buff *skb) static void xsk_drop_skb(struct sk_buff *skb) { - xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb); - xsk_consume_skb(skb); + struct xdp_sock *xs = xdp_sk(skb->sk); + + xs->tx->invalid_descs += xsk_get_num_desc(skb); + consume_skb(skb); + xs->skb = NULL; } static int xsk_skb_metadata(struct sk_buff *skb, void *buffer, @@ -876,7 +879,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, return ERR_PTR(-ENOMEM); /* in case of -EOVERFLOW that could happen below, - * xsk_consume_skb() will release this node as whole skb + * xsk_drop_skb() will release this node as whole skb * would be dropped, which implies freeing all list elements */ xsk_addr->addrs[xsk_addr->num_descs] = desc->addr; @@ -968,6 +971,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, goto free_err; } + xsk_addr->addrs[xsk_addr->num_descs] = desc->addr; + if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) { err = -EOVERFLOW; goto free_err; @@ -985,8 +990,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE); refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc); - - xsk_addr->addrs[xsk_addr->num_descs] = desc->addr; } } -- 2.43.7 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx 2026-05-20 0:42 ` [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing @ 2026-05-21 12:05 ` Maciej Fijalkowski 0 siblings, 0 replies; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-21 12:05 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Wed, May 20, 2026 at 08:42:41AM +0800, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > This patch is inspired by the check[1] from sashiko. It says when > overflow happens, the address of cq to be published is invalid. > Actually the severer thing is the whole process of publishing the > address of cq in this particular case is not right: it should truely > publish the address and advance the cached_prod in cq as long as it > reads descriptors from txq. > > The following is the full analysis. > xsk_drop_skb() is called in three places, which all discard a partially > built multi-buffer skb: > 1) xsk_build_skb() -EOVERFLOW error path: packet exceeds MAX_SKB_FRAGS > 2) __xsk_generic_xmit() post-loop cleanup: an invalid descriptor in > the TX ring prevents the partial packet from completing > 3) xsk_release(): socket close while xs->skb holds an incomplete packet > > In all three cases, the TX descriptors for the already-processed frags > have been consumed from the TX ring (xskq_cons_release), and CQ slots > have been reserved. However, xsk_drop_skb() calls xsk_consume_skb() > which cancels the CQ reservations via xsk_cq_cancel_locked(). Since > the buffer addresses never appear in the completion queue, userspace > permanently loses track of these buffers. > > Fix this by letting consume_skb() trigger the existing xsk_destruct_skb > destructor, which already submits buffer addresses to the CQ via > xsk_cq_submit_addr_locked(). > > Note that cancelling the descriptors back to the TX ring (via > xskq_cons_cancel_n) is not a appropriate option because an oversized > packet that always exceeds MAX_SKB_FRAGS would be retried indefinitely, > which is an obviously deadlock bug in the TX path. > > Also move the desc->addr assignment in xsk_build_skb() above the > overflow check so that the current descriptor's address is recorded > before a potential -EOVERFLOW jump to free_err, consistent with the > zerocopy path in xsk_build_skb_zerocopy(). > > [1]: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > Signed-off-by: Jason Xing <kernelxing@tencent.com> Sorry for the noise, got lost in my inbox and replied on v3. Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > --- > net/xdp/xsk.c | 13 ++++++++----- > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index f8c8a8c9dfba..0a6203c42576 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -793,8 +793,11 @@ static void xsk_consume_skb(struct sk_buff *skb) > > static void xsk_drop_skb(struct sk_buff *skb) > { > - xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb); > - xsk_consume_skb(skb); > + struct xdp_sock *xs = xdp_sk(skb->sk); > + > + xs->tx->invalid_descs += xsk_get_num_desc(skb); > + consume_skb(skb); > + xs->skb = NULL; > } > > static int xsk_skb_metadata(struct sk_buff *skb, void *buffer, > @@ -876,7 +879,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs, > return ERR_PTR(-ENOMEM); > > /* in case of -EOVERFLOW that could happen below, > - * xsk_consume_skb() will release this node as whole skb > + * xsk_drop_skb() will release this node as whole skb > * would be dropped, which implies freeing all list elements > */ > xsk_addr->addrs[xsk_addr->num_descs] = desc->addr; > @@ -968,6 +971,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > goto free_err; > } > > + xsk_addr->addrs[xsk_addr->num_descs] = desc->addr; > + > if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) { > err = -EOVERFLOW; > goto free_err; > @@ -985,8 +990,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs, > > skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE); > refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc); > - > - xsk_addr->addrs[xsk_addr->num_descs] = desc->addr; > } > } > > -- > 2.43.7 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing 2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing 2026-05-20 0:42 ` [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing @ 2026-05-20 0:42 ` Jason Xing 2026-05-20 16:10 ` Maciej Fijalkowski 2026-05-20 0:42 ` [PATCH net v4 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing ` (2 subsequent siblings) 5 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-20 0:42 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev Cc: bpf, netdev, Jason Xing From: Jason Xing <kernelxing@tencent.com> When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, only the current descriptor is released from the TX ring. The remaining continuation descriptors of the same packet stay in the ring. Since xs->skb is set to NULL after the drop, the TX loop picks up these leftover frags and misinterprets each one as the beginning of a new packet, corrupting the packet stream. Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, so we have a chance to examine and handle the potential remaining descs of this big overflow'ed skb. When the last fragment (without XDP_PKT_CONTD) is processed, the flag is cleared and the loop continues to process subsequent descriptors with the remaining budget. This behavior follows how previous xmit path treats overflow packets. Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") Signed-off-by: Jason Xing <kernelxing@tencent.com> --- include/net/xdp_sock.h | 1 + net/xdp/xsk.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h index ebac60a3d8a1..8b51876efbed 100644 --- a/include/net/xdp_sock.h +++ b/include/net/xdp_sock.h @@ -80,6 +80,7 @@ struct xdp_sock { * call of __xsk_generic_xmit(). */ struct sk_buff *skb; + bool drain_cont; struct list_head map_list; /* Protects map_list */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 0a6203c42576..f4add7be8c93 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) goto out; } + if (unlikely(xs->drain_cont)) { + unsigned long flags; + u32 idx; + + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); + idx = xskq_get_prod(xs->pool->cq); + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); + xskq_prod_submit_n(xs->pool->cq, 1); + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); + + xs->tx->invalid_descs++; + xskq_cons_release(xs->tx); + if (!xp_mb_desc(&desc)) + xs->drain_cont = false; + continue; + } + skb = xsk_build_skb(xs, &desc); if (IS_ERR(skb)) { err = PTR_ERR(skb); if (err != -EOVERFLOW) goto out; + if (xp_mb_desc(&desc)) + xs->drain_cont = true; err = 0; continue; } -- 2.43.7 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-20 0:42 ` [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing @ 2026-05-20 16:10 ` Maciej Fijalkowski 2026-05-20 23:53 ` Jason Xing 0 siblings, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-20 16:10 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, > only the current descriptor is released from the TX ring. The remaining > continuation descriptors of the same packet stay in the ring. Since > xs->skb is set to NULL after the drop, the TX loop picks up these > leftover frags and misinterprets each one as the beginning of a new > packet, corrupting the packet stream. > > Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs > and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, > so we have a chance to examine and handle the potential remaining descs > of this big overflow'ed skb. > > When the last fragment (without XDP_PKT_CONTD) is processed, the flag > is cleared and the loop continues to process subsequent descriptors > with the remaining budget. This behavior follows how previous xmit path > treats overflow packets. > > Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > Signed-off-by: Jason Xing <kernelxing@tencent.com> > --- > include/net/xdp_sock.h | 1 + > net/xdp/xsk.c | 19 +++++++++++++++++++ > 2 files changed, 20 insertions(+) > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > index ebac60a3d8a1..8b51876efbed 100644 > --- a/include/net/xdp_sock.h > +++ b/include/net/xdp_sock.h > @@ -80,6 +80,7 @@ struct xdp_sock { > * call of __xsk_generic_xmit(). > */ > struct sk_buff *skb; > + bool drain_cont; > > struct list_head map_list; > /* Protects map_list */ > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 0a6203c42576..f4add7be8c93 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) > goto out; > } > > + if (unlikely(xs->drain_cont)) { Hi Jason, could we get away from introducing dedicated boolean to xdp_sock for handling this rare case? > + unsigned long flags; > + u32 idx; > + > + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); > + idx = xskq_get_prod(xs->pool->cq); > + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); > + xskq_prod_submit_n(xs->pool->cq, 1); > + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); It feels a bit off to do this cleanup one-by-one. I wonder if it could be covered by xsk_drop_skb() where you would walk the tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ? OTOH, in theory the amount of frags after we hit the overflow case could be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover them. > + > + xs->tx->invalid_descs++; > + xskq_cons_release(xs->tx); > + if (!xp_mb_desc(&desc)) > + xs->drain_cont = false; > + continue; > + } > + > skb = xsk_build_skb(xs, &desc); > if (IS_ERR(skb)) { > err = PTR_ERR(skb); > if (err != -EOVERFLOW) > goto out; > + if (xp_mb_desc(&desc)) > + xs->drain_cont = true; > err = 0; > continue; > } > -- > 2.43.7 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-20 16:10 ` Maciej Fijalkowski @ 2026-05-20 23:53 ` Jason Xing 2026-05-21 12:02 ` Maciej Fijalkowski 0 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-20 23:53 UTC (permalink / raw) To: Maciej Fijalkowski Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 12:10 AM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, > > only the current descriptor is released from the TX ring. The remaining > > continuation descriptors of the same packet stay in the ring. Since > > xs->skb is set to NULL after the drop, the TX loop picks up these > > leftover frags and misinterprets each one as the beginning of a new > > packet, corrupting the packet stream. > > > > Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs > > and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, > > so we have a chance to examine and handle the potential remaining descs > > of this big overflow'ed skb. > > > > When the last fragment (without XDP_PKT_CONTD) is processed, the flag > > is cleared and the loop continues to process subsequent descriptors > > with the remaining budget. This behavior follows how previous xmit path > > treats overflow packets. > > > > Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > --- > > include/net/xdp_sock.h | 1 + > > net/xdp/xsk.c | 19 +++++++++++++++++++ > > 2 files changed, 20 insertions(+) > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > index ebac60a3d8a1..8b51876efbed 100644 > > --- a/include/net/xdp_sock.h > > +++ b/include/net/xdp_sock.h > > @@ -80,6 +80,7 @@ struct xdp_sock { > > * call of __xsk_generic_xmit(). > > */ > > struct sk_buff *skb; > > + bool drain_cont; > > > > struct list_head map_list; > > /* Protects map_list */ > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > index 0a6203c42576..f4add7be8c93 100644 > > --- a/net/xdp/xsk.c > > +++ b/net/xdp/xsk.c > > @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) > > goto out; > > } > > > > + if (unlikely(xs->drain_cont)) { > > Hi Jason, > > could we get away from introducing dedicated boolean to xdp_sock for > handling this rare case? I've tried several ways, but they didn't work. Please see the short conversation with Stan: https://lore.kernel.org/all/CAL+tcoA6Cu3P-y2wYjJ_R9cSeRkW2tswY1LveqH=uxd2McuR-w@mail.gmail.com/ That means we always rely on a flag (that is either static or like this patch) to recognize how we treat this abnormal descs. > > > + unsigned long flags; > > + u32 idx; > > + > > + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); > > + idx = xskq_get_prod(xs->pool->cq); > > + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); > > + xskq_prod_submit_n(xs->pool->cq, 1); > > + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); > > It feels a bit off to do this cleanup one-by-one. Actually no, it adheres to the basic send logic in copy mode. > > I wonder if it could be covered by xsk_drop_skb() where you would walk the > tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ? > > OTOH, in theory the amount of frags after we hit the overflow case could > be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover > them. Right, this is why we now put the logic into the xmit path just like other kinds of descriptors. We need to check and send it under a series of protections like the check of cq pressure and budget. I don't think it's an exception to escape the check logic. The whole process should be fast as there is no need to build skb and send it to the driver as long as this case is recognized. Thanks, Jason > > > + > > + xs->tx->invalid_descs++; > > + xskq_cons_release(xs->tx); > > + if (!xp_mb_desc(&desc)) > > + xs->drain_cont = false; > > + continue; > > + } > > + > > skb = xsk_build_skb(xs, &desc); > > if (IS_ERR(skb)) { > > err = PTR_ERR(skb); > > if (err != -EOVERFLOW) > > goto out; > > + if (xp_mb_desc(&desc)) > > + xs->drain_cont = true; > > err = 0; > > continue; > > } > > -- > > 2.43.7 > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-20 23:53 ` Jason Xing @ 2026-05-21 12:02 ` Maciej Fijalkowski 2026-05-21 13:10 ` Jason Xing 0 siblings, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-21 12:02 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 07:53:27AM +0800, Jason Xing wrote: > On Thu, May 21, 2026 at 12:10 AM Maciej Fijalkowski > <maciej.fijalkowski@intel.com> wrote: > > > > On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, > > > only the current descriptor is released from the TX ring. The remaining > > > continuation descriptors of the same packet stay in the ring. Since > > > xs->skb is set to NULL after the drop, the TX loop picks up these > > > leftover frags and misinterprets each one as the beginning of a new > > > packet, corrupting the packet stream. > > > > > > Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs > > > and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, > > > so we have a chance to examine and handle the potential remaining descs > > > of this big overflow'ed skb. > > > > > > When the last fragment (without XDP_PKT_CONTD) is processed, the flag > > > is cleared and the loop continues to process subsequent descriptors > > > with the remaining budget. This behavior follows how previous xmit path > > > treats overflow packets. > > > > > > Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > --- > > > include/net/xdp_sock.h | 1 + > > > net/xdp/xsk.c | 19 +++++++++++++++++++ > > > 2 files changed, 20 insertions(+) > > > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > > index ebac60a3d8a1..8b51876efbed 100644 > > > --- a/include/net/xdp_sock.h > > > +++ b/include/net/xdp_sock.h > > > @@ -80,6 +80,7 @@ struct xdp_sock { > > > * call of __xsk_generic_xmit(). > > > */ > > > struct sk_buff *skb; > > > + bool drain_cont; > > > > > > struct list_head map_list; > > > /* Protects map_list */ > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > > index 0a6203c42576..f4add7be8c93 100644 > > > --- a/net/xdp/xsk.c > > > +++ b/net/xdp/xsk.c > > > @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) > > > goto out; > > > } > > > > > > + if (unlikely(xs->drain_cont)) { > > > > Hi Jason, > > > > could we get away from introducing dedicated boolean to xdp_sock for > > handling this rare case? > > I've tried several ways, but they didn't work. Please see the short > conversation with Stan: > https://lore.kernel.org/all/CAL+tcoA6Cu3P-y2wYjJ_R9cSeRkW2tswY1LveqH=uxd2McuR-w@mail.gmail.com/ > > That means we always rely on a flag (that is either static or like > this patch) to recognize how we treat this abnormal descs. > > > > > > + unsigned long flags; > > > + u32 idx; > > > + > > > + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); > > > + idx = xskq_get_prod(xs->pool->cq); > > > + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); > > > + xskq_prod_submit_n(xs->pool->cq, 1); > > > + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); > > > > It feels a bit off to do this cleanup one-by-one. > > Actually no, it adheres to the basic send logic in copy mode. Not really. Here you are producer to cq which we have batched variants for usage. But after second thoughts I don't think it would be worth to put any further effort onto optimizations here. Anyways, even though generic xmit is a whole bunch of mess, maybe you could pull this out onto a helper function? > > > > > I wonder if it could be covered by xsk_drop_skb() where you would walk the > > tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ? > > > > OTOH, in theory the amount of frags after we hit the overflow case could > > be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover > > them. > > Right, this is why we now put the logic into the xmit path just like > other kinds of descriptors. We need to check and send it under a > series of protections like the check of cq pressure and budget. I > don't think it's an exception to escape the check logic. The whole > process should be fast as there is no need to build skb and send it to > the driver as long as this case is recognized. > > Thanks, > Jason > > > > > > + > > > + xs->tx->invalid_descs++; > > > + xskq_cons_release(xs->tx); > > > + if (!xp_mb_desc(&desc)) > > > + xs->drain_cont = false; > > > + continue; > > > + } > > > + > > > skb = xsk_build_skb(xs, &desc); > > > if (IS_ERR(skb)) { > > > err = PTR_ERR(skb); > > > if (err != -EOVERFLOW) > > > goto out; > > > + if (xp_mb_desc(&desc)) > > > + xs->drain_cont = true; > > > err = 0; > > > continue; > > > } > > > -- > > > 2.43.7 > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-21 12:02 ` Maciej Fijalkowski @ 2026-05-21 13:10 ` Jason Xing 2026-05-22 9:06 ` Magnus Karlsson 0 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-21 13:10 UTC (permalink / raw) To: Maciej Fijalkowski Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 8:02 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Thu, May 21, 2026 at 07:53:27AM +0800, Jason Xing wrote: > > On Thu, May 21, 2026 at 12:10 AM Maciej Fijalkowski > > <maciej.fijalkowski@intel.com> wrote: > > > > > > On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, > > > > only the current descriptor is released from the TX ring. The remaining > > > > continuation descriptors of the same packet stay in the ring. Since > > > > xs->skb is set to NULL after the drop, the TX loop picks up these > > > > leftover frags and misinterprets each one as the beginning of a new > > > > packet, corrupting the packet stream. > > > > > > > > Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs > > > > and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, > > > > so we have a chance to examine and handle the potential remaining descs > > > > of this big overflow'ed skb. > > > > > > > > When the last fragment (without XDP_PKT_CONTD) is processed, the flag > > > > is cleared and the loop continues to process subsequent descriptors > > > > with the remaining budget. This behavior follows how previous xmit path > > > > treats overflow packets. > > > > > > > > Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > > > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > --- > > > > include/net/xdp_sock.h | 1 + > > > > net/xdp/xsk.c | 19 +++++++++++++++++++ > > > > 2 files changed, 20 insertions(+) > > > > > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > > > index ebac60a3d8a1..8b51876efbed 100644 > > > > --- a/include/net/xdp_sock.h > > > > +++ b/include/net/xdp_sock.h > > > > @@ -80,6 +80,7 @@ struct xdp_sock { > > > > * call of __xsk_generic_xmit(). > > > > */ > > > > struct sk_buff *skb; > > > > + bool drain_cont; > > > > > > > > struct list_head map_list; > > > > /* Protects map_list */ > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > > > index 0a6203c42576..f4add7be8c93 100644 > > > > --- a/net/xdp/xsk.c > > > > +++ b/net/xdp/xsk.c > > > > @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) > > > > goto out; > > > > } > > > > > > > > + if (unlikely(xs->drain_cont)) { > > > > > > Hi Jason, > > > > > > could we get away from introducing dedicated boolean to xdp_sock for > > > handling this rare case? > > > > I've tried several ways, but they didn't work. Please see the short > > conversation with Stan: > > https://lore.kernel.org/all/CAL+tcoA6Cu3P-y2wYjJ_R9cSeRkW2tswY1LveqH=uxd2McuR-w@mail.gmail.com/ > > > > That means we always rely on a flag (that is either static or like > > this patch) to recognize how we treat this abnormal descs. > > > > > > > > > + unsigned long flags; > > > > + u32 idx; > > > > + > > > > + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); > > > > + idx = xskq_get_prod(xs->pool->cq); > > > > + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); > > > > + xskq_prod_submit_n(xs->pool->cq, 1); > > > > + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); > > > > > > It feels a bit off to do this cleanup one-by-one. > > > > Actually no, it adheres to the basic send logic in copy mode. > > Not really. Here you are producer to cq which we have batched variants for > usage. > > But after second thoughts I don't think it would be worth to put any > further effort onto optimizations here. Anyways, even though generic xmit > is a whole bunch of mess, maybe you could pull this out onto a helper > function? Sure thing. My takeaway is that it will be refactored once as the batch xmit feature lands in the tree. Thanks, Jason > > > > > > > > > I wonder if it could be covered by xsk_drop_skb() where you would walk the > > > tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ? > > > > > > OTOH, in theory the amount of frags after we hit the overflow case could > > > be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover > > > them. > > > > Right, this is why we now put the logic into the xmit path just like > > other kinds of descriptors. We need to check and send it under a > > series of protections like the check of cq pressure and budget. I > > don't think it's an exception to escape the check logic. The whole > > process should be fast as there is no need to build skb and send it to > > the driver as long as this case is recognized. > > > > Thanks, > > Jason > > > > > > > > > + > > > > + xs->tx->invalid_descs++; > > > > + xskq_cons_release(xs->tx); > > > > + if (!xp_mb_desc(&desc)) > > > > + xs->drain_cont = false; > > > > + continue; > > > > + } > > > > + > > > > skb = xsk_build_skb(xs, &desc); > > > > if (IS_ERR(skb)) { > > > > err = PTR_ERR(skb); > > > > if (err != -EOVERFLOW) > > > > goto out; > > > > + if (xp_mb_desc(&desc)) > > > > + xs->drain_cont = true; > > > > err = 0; > > > > continue; > > > > } > > > > -- > > > > 2.43.7 > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-21 13:10 ` Jason Xing @ 2026-05-22 9:06 ` Magnus Karlsson 2026-05-22 9:22 ` Jason Xing 0 siblings, 1 reply; 22+ messages in thread From: Magnus Karlsson @ 2026-05-22 9:06 UTC (permalink / raw) To: Jason Xing Cc: Maciej Fijalkowski, davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, 21 May 2026 at 15:42, Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, May 21, 2026 at 8:02 PM Maciej Fijalkowski > <maciej.fijalkowski@intel.com> wrote: > > > > On Thu, May 21, 2026 at 07:53:27AM +0800, Jason Xing wrote: > > > On Thu, May 21, 2026 at 12:10 AM Maciej Fijalkowski > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote: > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, > > > > > only the current descriptor is released from the TX ring. The remaining > > > > > continuation descriptors of the same packet stay in the ring. Since > > > > > xs->skb is set to NULL after the drop, the TX loop picks up these > > > > > leftover frags and misinterprets each one as the beginning of a new > > > > > packet, corrupting the packet stream. > > > > > > > > > > Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs > > > > > and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, > > > > > so we have a chance to examine and handle the potential remaining descs > > > > > of this big overflow'ed skb. > > > > > > > > > > When the last fragment (without XDP_PKT_CONTD) is processed, the flag > > > > > is cleared and the loop continues to process subsequent descriptors > > > > > with the remaining budget. This behavior follows how previous xmit path > > > > > treats overflow packets. > > > > > > > > > > Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > > > > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > --- > > > > > include/net/xdp_sock.h | 1 + > > > > > net/xdp/xsk.c | 19 +++++++++++++++++++ > > > > > 2 files changed, 20 insertions(+) > > > > > > > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > > > > index ebac60a3d8a1..8b51876efbed 100644 > > > > > --- a/include/net/xdp_sock.h > > > > > +++ b/include/net/xdp_sock.h > > > > > @@ -80,6 +80,7 @@ struct xdp_sock { > > > > > * call of __xsk_generic_xmit(). > > > > > */ > > > > > struct sk_buff *skb; > > > > > + bool drain_cont; > > > > > > > > > > struct list_head map_list; > > > > > /* Protects map_list */ > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > > > > index 0a6203c42576..f4add7be8c93 100644 > > > > > --- a/net/xdp/xsk.c > > > > > +++ b/net/xdp/xsk.c > > > > > @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) > > > > > goto out; > > > > > } > > > > > > > > > > + if (unlikely(xs->drain_cont)) { > > > > > > > > Hi Jason, > > > > > > > > could we get away from introducing dedicated boolean to xdp_sock for > > > > handling this rare case? > > > > > > I've tried several ways, but they didn't work. Please see the short > > > conversation with Stan: > > > https://lore.kernel.org/all/CAL+tcoA6Cu3P-y2wYjJ_R9cSeRkW2tswY1LveqH=uxd2McuR-w@mail.gmail.com/ > > > > > > That means we always rely on a flag (that is either static or like > > > this patch) to recognize how we treat this abnormal descs. > > > > > > > > > > > > + unsigned long flags; > > > > > + u32 idx; > > > > > + > > > > > + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); > > > > > + idx = xskq_get_prod(xs->pool->cq); > > > > > + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); > > > > > + xskq_prod_submit_n(xs->pool->cq, 1); > > > > > + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); > > > > > > > > It feels a bit off to do this cleanup one-by-one. > > > > > > Actually no, it adheres to the basic send logic in copy mode. > > > > Not really. Here you are producer to cq which we have batched variants for > > usage. > > > > But after second thoughts I don't think it would be worth to put any > > further effort onto optimizations here. Anyways, even though generic xmit > > is a whole bunch of mess, maybe you could pull this out onto a helper > > function? > > Sure thing. My takeaway is that it will be refactored once as the > batch xmit feature lands in the tree. I agree that the code has become really messy since I get a headache when looking at it ;-). Do you want me to take a stab at simplifying it? > Thanks, > Jason > > > > > > > > > > > > > > I wonder if it could be covered by xsk_drop_skb() where you would walk the > > > > tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ? > > > > > > > > OTOH, in theory the amount of frags after we hit the overflow case could > > > > be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover > > > > them. > > > > > > Right, this is why we now put the logic into the xmit path just like > > > other kinds of descriptors. We need to check and send it under a > > > series of protections like the check of cq pressure and budget. I > > > don't think it's an exception to escape the check logic. The whole > > > process should be fast as there is no need to build skb and send it to > > > the driver as long as this case is recognized. > > > > > > Thanks, > > > Jason > > > > > > > > > > > > + > > > > > + xs->tx->invalid_descs++; > > > > > + xskq_cons_release(xs->tx); > > > > > + if (!xp_mb_desc(&desc)) > > > > > + xs->drain_cont = false; > > > > > + continue; > > > > > + } > > > > > + > > > > > skb = xsk_build_skb(xs, &desc); > > > > > if (IS_ERR(skb)) { > > > > > err = PTR_ERR(skb); > > > > > if (err != -EOVERFLOW) > > > > > goto out; > > > > > + if (xp_mb_desc(&desc)) > > > > > + xs->drain_cont = true; > > > > > err = 0; > > > > > continue; > > > > > } > > > > > -- > > > > > 2.43.7 > > > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() 2026-05-22 9:06 ` Magnus Karlsson @ 2026-05-22 9:22 ` Jason Xing 0 siblings, 0 replies; 22+ messages in thread From: Jason Xing @ 2026-05-22 9:22 UTC (permalink / raw) To: Magnus Karlsson Cc: Maciej Fijalkowski, davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Fri, May 22, 2026 at 5:06 PM Magnus Karlsson <magnus.karlsson@gmail.com> wrote: > > On Thu, 21 May 2026 at 15:42, Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, May 21, 2026 at 8:02 PM Maciej Fijalkowski > > <maciej.fijalkowski@intel.com> wrote: > > > > > > On Thu, May 21, 2026 at 07:53:27AM +0800, Jason Xing wrote: > > > > On Thu, May 21, 2026 at 12:10 AM Maciej Fijalkowski > > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW, > > > > > > only the current descriptor is released from the TX ring. The remaining > > > > > > continuation descriptors of the same packet stay in the ring. Since > > > > > > xs->skb is set to NULL after the drop, the TX loop picks up these > > > > > > leftover frags and misinterprets each one as the beginning of a new > > > > > > packet, corrupting the packet stream. > > > > > > > > > > > > Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs > > > > > > and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised, > > > > > > so we have a chance to examine and handle the potential remaining descs > > > > > > of this big overflow'ed skb. > > > > > > > > > > > > When the last fragment (without XDP_PKT_CONTD) is processed, the flag > > > > > > is cleared and the loop continues to process subsequent descriptors > > > > > > with the remaining budget. This behavior follows how previous xmit path > > > > > > treats overflow packets. > > > > > > > > > > > > Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/ > > > > > > Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com> > > > > > > --- > > > > > > include/net/xdp_sock.h | 1 + > > > > > > net/xdp/xsk.c | 19 +++++++++++++++++++ > > > > > > 2 files changed, 20 insertions(+) > > > > > > > > > > > > diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h > > > > > > index ebac60a3d8a1..8b51876efbed 100644 > > > > > > --- a/include/net/xdp_sock.h > > > > > > +++ b/include/net/xdp_sock.h > > > > > > @@ -80,6 +80,7 @@ struct xdp_sock { > > > > > > * call of __xsk_generic_xmit(). > > > > > > */ > > > > > > struct sk_buff *skb; > > > > > > + bool drain_cont; > > > > > > > > > > > > struct list_head map_list; > > > > > > /* Protects map_list */ > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > > > > > > index 0a6203c42576..f4add7be8c93 100644 > > > > > > --- a/net/xdp/xsk.c > > > > > > +++ b/net/xdp/xsk.c > > > > > > @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk) > > > > > > goto out; > > > > > > } > > > > > > > > > > > > + if (unlikely(xs->drain_cont)) { > > > > > > > > > > Hi Jason, > > > > > > > > > > could we get away from introducing dedicated boolean to xdp_sock for > > > > > handling this rare case? > > > > > > > > I've tried several ways, but they didn't work. Please see the short > > > > conversation with Stan: > > > > https://lore.kernel.org/all/CAL+tcoA6Cu3P-y2wYjJ_R9cSeRkW2tswY1LveqH=uxd2McuR-w@mail.gmail.com/ > > > > > > > > That means we always rely on a flag (that is either static or like > > > > this patch) to recognize how we treat this abnormal descs. > > > > > > > > > > > > > > > + unsigned long flags; > > > > > > + u32 idx; > > > > > > + > > > > > > + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags); > > > > > > + idx = xskq_get_prod(xs->pool->cq); > > > > > > + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr); > > > > > > + xskq_prod_submit_n(xs->pool->cq, 1); > > > > > > + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags); > > > > > > > > > > It feels a bit off to do this cleanup one-by-one. > > > > > > > > Actually no, it adheres to the basic send logic in copy mode. > > > > > > Not really. Here you are producer to cq which we have batched variants for > > > usage. > > > > > > But after second thoughts I don't think it would be worth to put any > > > further effort onto optimizations here. Anyways, even though generic xmit > > > is a whole bunch of mess, maybe you could pull this out onto a helper > > > function? > > > > Sure thing. My takeaway is that it will be refactored once as the > > batch xmit feature lands in the tree. > > I agree that the code has become really messy since I get a headache > when looking at it ;-). Do you want me to take a stab at simplifying > it? Thanks for your help. But not now probably. I think we can focus on the batch xmit feature[1] where the main xmit path will be rewritten. I admit it's still complex to understand :S Could you shed some light on the series that I'm going to post? [1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/ Thanks, Jason > > > > > > > > > > > > > > > > > > > I wonder if it could be covered by xsk_drop_skb() where you would walk the > > > > > tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ? > > > > > > > > > > OTOH, in theory the amount of frags after we hit the overflow case could > > > > > be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover > > > > > them. > > > > > > > > Right, this is why we now put the logic into the xmit path just like > > > > other kinds of descriptors. We need to check and send it under a > > > > series of protections like the check of cq pressure and budget. I > > > > don't think it's an exception to escape the check logic. The whole > > > > process should be fast as there is no need to build skb and send it to > > > > the driver as long as this case is recognized. > > > > > > > > Thanks, > > > > Jason > > > > > > > > > > > > > > > + > > > > > > + xs->tx->invalid_descs++; > > > > > > + xskq_cons_release(xs->tx); > > > > > > + if (!xp_mb_desc(&desc)) > > > > > > + xs->drain_cont = false; > > > > > > + continue; > > > > > > + } > > > > > > + > > > > > > skb = xsk_build_skb(xs, &desc); > > > > > > if (IS_ERR(skb)) { > > > > > > err = PTR_ERR(skb); > > > > > > if (err != -EOVERFLOW) > > > > > > goto out; > > > > > > + if (xp_mb_desc(&desc)) > > > > > > + xs->drain_cont = true; > > > > > > err = 0; > > > > > > continue; > > > > > > } > > > > > > -- > > > > > > 2.43.7 > > > > > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH net v4 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing ` (2 preceding siblings ...) 2026-05-20 0:42 ` [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing @ 2026-05-20 0:42 ` Jason Xing 2026-05-20 0:42 ` [PATCH net v4 5/5] selftests/xsk: drain CQ to wait for TX completion Jason Xing 2026-05-21 12:23 ` [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski 5 siblings, 0 replies; 22+ messages in thread From: Jason Xing @ 2026-05-20 0:42 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev Cc: bpf, netdev, Jason Xing From: Jason Xing <kernelxing@tencent.com> When the TX loop in __xsk_generic_xmit() encounters an invalid descriptor mid-packet (e.g. an out-of-bounds address), the partial skb is dropped and the offending descriptor is released. However, remaining continuation descriptors belonging to the same multi-buffer packet still sit in the TX ring. Since xs->skb becomes NULL after the drop, the next iteration treats the leftover continuation fragment as a brand-new packet, corrupting the packet stream. Fix this by setting the drain_cont flag when the released descriptor has XDP_PKT_CONTD set. On the next call to __xsk_generic_xmit(), the drain logic introduced in the previous patch handles the remaining fragments with normal CQ backpressure. There is one subtle case: if a subsequent continuation descriptor also has an invalid address, xskq_cons_peek_desc() rejects it and the while loop is never entered, so the in-loop drain path cannot clear drain_cont. The post-loop code already handles this: it sees xskq_has_descs() is true (the failed descriptor was read but not released by peek), releases it, and checks its XDP_PKT_CONTD flag. Add an else branch so that when the released descriptor is the last fragment (no XDP_PKT_CONTD), drain_cont is cleared. This prevents the next valid packet from being incorrectly drained. Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path") Signed-off-by: Jason Xing <kernelxing@tencent.com> --- net/xdp/xsk.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index f4add7be8c93..de953f38b9e2 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -1122,6 +1122,7 @@ static int __xsk_generic_xmit(struct sock *sk) if (xs->skb) xsk_drop_skb(xs->skb); xskq_cons_release(xs->tx); + xs->drain_cont = xp_mb_desc(&desc); } out: -- 2.43.7 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH net v4 5/5] selftests/xsk: drain CQ to wait for TX completion 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing ` (3 preceding siblings ...) 2026-05-20 0:42 ` [PATCH net v4 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing @ 2026-05-20 0:42 ` Jason Xing 2026-05-21 12:23 ` [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski 5 siblings, 0 replies; 22+ messages in thread From: Jason Xing @ 2026-05-20 0:42 UTC (permalink / raw) To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev Cc: bpf, netdev, Jason Xing From: Jason Xing <kernelxing@tencent.com> After the kernel xsk drain_cont patches, dropped multi-buffer descriptors get their buffer addresses published to the completion queue (CQ) via the skb destructor instead of being cancelled. As a result, the CQ entries observed by user space no longer match the software-side accounting based on valid_frags only: __send_pkts() bumps xsk->outstanding_tx by valid_frags, while complete_pkts() decrements it by every CQ entry it consumes, including those produced by drops/drains. This makes outstanding_tx underflow and causes wait_for_tx_completion() to exit while valid descriptors are still sitting in the TX ring, which in turn makes receive_pkts() time out for the ALIGNED_INV_DESC_MULTI_BUFF, UNALIGNED_INV_DESC_MULTI_BUFF and TOO_MANY_FRAGS subtests. Fix this with two changes to the TX completion path: - complete_pkts(): tolerate extra CQ completions by clamping outstanding_tx to zero instead of failing. - wait_for_tx_completion(): after the outstanding_tx loop finishes, add a drain loop that kicks TX and consumes remaining CQ entries. After the drain loop exits, do a short usleep and one final complete_pkts() call so that real hardware (e.g. ice) has enough time to post late CQ entries before we conclude the ring is fully drained. Adjust the multi-buffer invalid-desc tests so that the last descriptor of every invalid packet has XDP_PKT_CONTD cleared. Without this, the kernel drain_cont logic would consume descriptors past the packet boundary and eat into the next valid packet, breaking pkt_nb validation. Concretely: - XSK_DESC__INVALID_OPTION is changed from 0xffff to 0xfffe so it no longer asserts the XDP_PKT_CONTD bit (bit 0). - testapp_invalid_desc_mb() clears XDP_PKT_CONTD on the trailing descriptor of the invalid-address and invalid-length packets. - testapp_too_many_frags() appends one extra terminating descriptor so the over-sized invalid packet ends with XDP_PKT_CONTD cleared, preventing the drain from spilling into the trailing sync packet. Signed-off-by: Jason Xing <kernelxing@tencent.com> --- .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- 1 file changed, 27 insertions(+), 21 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c index 7950c504ed28..1f196c8ebc73 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c @@ -31,7 +31,7 @@ #define POLL_TMOUT 1000 #define THREAD_TMOUT 3 #define UMEM_HEADROOM_TEST_SIZE 128 -#define XSK_DESC__INVALID_OPTION (0xffff) +#define XSK_DESC__INVALID_OPTION (0xfffe) #define XSK_UMEM__INVALID_FRAME_SIZE (MAX_ETH_JUMBO_SIZE + 1) #define XSK_UMEM__LARGE_FRAME_SIZE (3 * 1024) #define XSK_UMEM__MAX_FRAME_SIZE (4 * 1024) @@ -950,17 +950,11 @@ static int complete_pkts(struct xsk_socket_info *xsk, int batch_size) rcvd = xsk_ring_cons__peek(&xsk->umem->cq, batch_size, &idx); if (rcvd) { - if (rcvd > xsk->outstanding_tx) { - u64 addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx + rcvd - 1); - - ksft_print_msg("[%s] Too many packets completed\n", __func__); - ksft_print_msg("Last completion address: %llx\n", - (unsigned long long)addr); - return TEST_FAILURE; - } - xsk_ring_cons__release(&xsk->umem->cq, rcvd); - xsk->outstanding_tx -= rcvd; + if (rcvd > xsk->outstanding_tx) + xsk->outstanding_tx = 0; + else + xsk->outstanding_tx -= rcvd; } return TEST_PASS; @@ -1274,6 +1268,8 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b static int wait_for_tx_completion(struct xsk_socket_info *xsk) { struct timeval tv_end, tv_now, tv_timeout = {THREAD_TMOUT, 0}; + unsigned int rcvd; + u32 idx; int ret; ret = gettimeofday(&tv_now, NULL); @@ -1293,6 +1289,17 @@ static int wait_for_tx_completion(struct xsk_socket_info *xsk) complete_pkts(xsk, xsk->batch_size); } + do { + if (xsk_ring_prod__needs_wakeup(&xsk->tx)) + kick_tx(xsk); + rcvd = xsk_ring_cons__peek(&xsk->umem->cq, xsk->batch_size, &idx); + if (rcvd) + xsk_ring_cons__release(&xsk->umem->cq, rcvd); + } while (rcvd); + + usleep(100); + complete_pkts(xsk, xsk->batch_size); + return TEST_PASS; } @@ -2075,10 +2082,10 @@ int testapp_invalid_desc_mb(struct test_spec *test) {0, 0, 0, false, 0}, /* Invalid address in the second frame */ {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, 0}, /* Invalid len in the middle */ {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, 0}, /* Invalid options in the middle */ {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION}, @@ -2229,7 +2236,7 @@ int testapp_too_many_frags(struct test_spec *test) max_frags += 1; } - pkts = calloc(2 * max_frags + 2, sizeof(struct pkt)); + pkts = calloc(2 * max_frags + 3, sizeof(struct pkt)); if (!pkts) return TEST_FAILURE; @@ -2247,20 +2254,19 @@ int testapp_too_many_frags(struct test_spec *test) } pkts[max_frags].options = 0; - /* An invalid packet with the max amount of frags but signals packet - * continues on the last frag - */ - for (i = max_frags + 1; i < 2 * max_frags + 1; i++) { + /* An invalid packet with too many frags */ + for (i = max_frags + 1; i < 2 * max_frags + 2; i++) { pkts[i].len = MIN_PKT_SIZE; pkts[i].options = XDP_PKT_CONTD; pkts[i].valid = false; } + pkts[2 * max_frags + 1].options = 0; /* Valid packet for synch */ - pkts[2 * max_frags + 1].len = MIN_PKT_SIZE; - pkts[2 * max_frags + 1].valid = true; + pkts[2 * max_frags + 2].len = MIN_PKT_SIZE; + pkts[2 * max_frags + 2].valid = true; - if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2)) { + if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 3)) { free(pkts); return TEST_FAILURE; } -- 2.43.7 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing ` (4 preceding siblings ...) 2026-05-20 0:42 ` [PATCH net v4 5/5] selftests/xsk: drain CQ to wait for TX completion Jason Xing @ 2026-05-21 12:23 ` Maciej Fijalkowski 2026-05-21 12:41 ` Jason Xing 5 siblings, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-21 12:23 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > From: Jason Xing <kernelxing@tencent.com> > > The series is the product of previous review from sashiko[1]. > > 1) META > patch 1: address TOCTOU around metadata. > > 2) PUBLISH of CQ > patch 2: make sure xsk_addr->addrs[] can be published to cq when > overflow occurs. > patch 3: keep cleaning up the continuation descs (more than 17) and > publish its address when overflow occurs. > patch 4: like patch 3, but only handles the invalid descs cases. > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > --- > V4 > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > 1. correct the description of xmit path in patch 3 (sashiko) > 2. move set logic into xmit path in patch 3 (Stan) > > V3 > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > 1. avoid breaking previous usage of sendto, and siliently handle > overflow case (Stan, sashiko) > 2. add one particular exception process in patch 4 (sashiko) > 3. adjust the selftest to make sure it passes in either virutal or > physical machines, which includes add usleep to support physical machine. > > V2 > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > 1. adjust selftests (Jakub) > 2. add READ_ONCE in patch 1 (Stan) FWIW I still get test failures (yes with patch 5 applied). PTAL. > > > Jason Xing (5): > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > xsk: drain continuation descs after overflow in xsk_build_skb() > xsk: drain continuation descs on invalid descriptor in > __xsk_generic_xmit() > selftests/xsk: drain CQ to wait for TX completion > > include/net/xdp_sock.h | 1 + > net/xdp/xsk.c | 44 +++++++++++++---- > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > 3 files changed, 63 insertions(+), 30 deletions(-) > > -- > 2.43.7 > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-21 12:23 ` [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski @ 2026-05-21 12:41 ` Jason Xing 2026-05-21 12:59 ` Maciej Fijalkowski 0 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-21 12:41 UTC (permalink / raw) To: Maciej Fijalkowski Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > From: Jason Xing <kernelxing@tencent.com> > > > > The series is the product of previous review from sashiko[1]. > > > > 1) META > > patch 1: address TOCTOU around metadata. > > > > 2) PUBLISH of CQ > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > overflow occurs. > > patch 3: keep cleaning up the continuation descs (more than 17) and > > publish its address when overflow occurs. > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > --- > > V4 > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > 1. correct the description of xmit path in patch 3 (sashiko) > > 2. move set logic into xmit path in patch 3 (Stan) > > > > V3 > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > 1. avoid breaking previous usage of sendto, and siliently handle > > overflow case (Stan, sashiko) > > 2. add one particular exception process in patch 4 (sashiko) > > 3. adjust the selftest to make sure it passes in either virutal or > > physical machines, which includes add usleep to support physical machine. > > > > V2 > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > 1. adjust selftests (Jakub) > > 2. add READ_ONCE in patch 1 (Stan) > > FWIW I still get test failures (yes with patch 5 applied). PTAL. Thanks for the test. But I've tried with ixgbe driver... I noticed there are some flaky tests which have nothing to do with the series. Can you confirm that it's not caused because of the series? Thanks, Jason > > > > > > > Jason Xing (5): > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > xsk: drain continuation descs after overflow in xsk_build_skb() > > xsk: drain continuation descs on invalid descriptor in > > __xsk_generic_xmit() > > selftests/xsk: drain CQ to wait for TX completion > > > > include/net/xdp_sock.h | 1 + > > net/xdp/xsk.c | 44 +++++++++++++---- > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > -- > > 2.43.7 > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-21 12:41 ` Jason Xing @ 2026-05-21 12:59 ` Maciej Fijalkowski 2026-05-21 13:07 ` Jason Xing 0 siblings, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-21 12:59 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote: > On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski > <maciej.fijalkowski@intel.com> wrote: > > > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > The series is the product of previous review from sashiko[1]. > > > > > > 1) META > > > patch 1: address TOCTOU around metadata. > > > > > > 2) PUBLISH of CQ > > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > > overflow occurs. > > > patch 3: keep cleaning up the continuation descs (more than 17) and > > > publish its address when overflow occurs. > > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > > > --- > > > V4 > > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > > 1. correct the description of xmit path in patch 3 (sashiko) > > > 2. move set logic into xmit path in patch 3 (Stan) > > > > > > V3 > > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > > 1. avoid breaking previous usage of sendto, and siliently handle > > > overflow case (Stan, sashiko) > > > 2. add one particular exception process in patch 4 (sashiko) > > > 3. adjust the selftest to make sure it passes in either virutal or > > > physical machines, which includes add usleep to support physical machine. > > > > > > V2 > > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > > 1. adjust selftests (Jakub) > > > 2. add READ_ONCE in patch 1 (Stan) > > > > FWIW I still get test failures (yes with patch 5 applied). PTAL. > > Thanks for the test. But I've tried with ixgbe driver... > > I noticed there are some flaky tests which have nothing to do with the > series. Can you confirm that it's not caused because of the series? That explains the different results as i am using i40e/ice which have multi-buffer support whereas ixgbe does not even support mbuf at XDP. Broken tests are from mbuf cases. > > Thanks, > Jason > > > > > > > > > > > > > Jason Xing (5): > > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > > xsk: drain continuation descs after overflow in xsk_build_skb() > > > xsk: drain continuation descs on invalid descriptor in > > > __xsk_generic_xmit() > > > selftests/xsk: drain CQ to wait for TX completion > > > > > > include/net/xdp_sock.h | 1 + > > > net/xdp/xsk.c | 44 +++++++++++++---- > > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > > > -- > > > 2.43.7 > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-21 12:59 ` Maciej Fijalkowski @ 2026-05-21 13:07 ` Jason Xing 2026-05-21 14:24 ` Maciej Fijalkowski 0 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-21 13:07 UTC (permalink / raw) To: Maciej Fijalkowski Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 9:00 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote: > > On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski > > <maciej.fijalkowski@intel.com> wrote: > > > > > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > The series is the product of previous review from sashiko[1]. > > > > > > > > 1) META > > > > patch 1: address TOCTOU around metadata. > > > > > > > > 2) PUBLISH of CQ > > > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > > > overflow occurs. > > > > patch 3: keep cleaning up the continuation descs (more than 17) and > > > > publish its address when overflow occurs. > > > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > > > > > --- > > > > V4 > > > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > > > 1. correct the description of xmit path in patch 3 (sashiko) > > > > 2. move set logic into xmit path in patch 3 (Stan) > > > > > > > > V3 > > > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > > > 1. avoid breaking previous usage of sendto, and siliently handle > > > > overflow case (Stan, sashiko) > > > > 2. add one particular exception process in patch 4 (sashiko) > > > > 3. adjust the selftest to make sure it passes in either virutal or > > > > physical machines, which includes add usleep to support physical machine. > > > > > > > > V2 > > > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > > > 1. adjust selftests (Jakub) > > > > 2. add READ_ONCE in patch 1 (Stan) > > > > > > FWIW I still get test failures (yes with patch 5 applied). PTAL. > > > > Thanks for the test. But I've tried with ixgbe driver... > > > > I noticed there are some flaky tests which have nothing to do with the > > series. Can you confirm that it's not caused because of the series? > > That explains the different results as i am using i40e/ice which have > multi-buffer support whereas ixgbe does not even support mbuf at XDP. > Broken tests are from mbuf cases. That's weird. I never expected the failed tests to be about multi-buffer. Are they the same as the output you attached last time? Or something new? Could you please share it so that I can investigate the root cause? Thanks, Jason > > > > > Thanks, > > Jason > > > > > > > > > > > > > > > > > > > Jason Xing (5): > > > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > > > xsk: drain continuation descs after overflow in xsk_build_skb() > > > > xsk: drain continuation descs on invalid descriptor in > > > > __xsk_generic_xmit() > > > > selftests/xsk: drain CQ to wait for TX completion > > > > > > > > include/net/xdp_sock.h | 1 + > > > > net/xdp/xsk.c | 44 +++++++++++++---- > > > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > > > > > -- > > > > 2.43.7 > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-21 13:07 ` Jason Xing @ 2026-05-21 14:24 ` Maciej Fijalkowski 2026-05-22 8:55 ` Jason Xing 0 siblings, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-21 14:24 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Thu, May 21, 2026 at 09:07:30PM +0800, Jason Xing wrote: > On Thu, May 21, 2026 at 9:00 PM Maciej Fijalkowski > <maciej.fijalkowski@intel.com> wrote: > > > > On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote: > > > On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > The series is the product of previous review from sashiko[1]. > > > > > > > > > > 1) META > > > > > patch 1: address TOCTOU around metadata. > > > > > > > > > > 2) PUBLISH of CQ > > > > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > > > > overflow occurs. > > > > > patch 3: keep cleaning up the continuation descs (more than 17) and > > > > > publish its address when overflow occurs. > > > > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > > > > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > > > > > > > --- > > > > > V4 > > > > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > > > > 1. correct the description of xmit path in patch 3 (sashiko) > > > > > 2. move set logic into xmit path in patch 3 (Stan) > > > > > > > > > > V3 > > > > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > > > > 1. avoid breaking previous usage of sendto, and siliently handle > > > > > overflow case (Stan, sashiko) > > > > > 2. add one particular exception process in patch 4 (sashiko) > > > > > 3. adjust the selftest to make sure it passes in either virutal or > > > > > physical machines, which includes add usleep to support physical machine. > > > > > > > > > > V2 > > > > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > > > > 1. adjust selftests (Jakub) > > > > > 2. add READ_ONCE in patch 1 (Stan) > > > > > > > > FWIW I still get test failures (yes with patch 5 applied). PTAL. > > > > > > Thanks for the test. But I've tried with ixgbe driver... > > > > > > I noticed there are some flaky tests which have nothing to do with the > > > series. Can you confirm that it's not caused because of the series? > > > > That explains the different results as i am using i40e/ice which have > > multi-buffer support whereas ixgbe does not even support mbuf at XDP. > > Broken tests are from mbuf cases. > > That's weird. I never expected the failed tests to be about multi-buffer. > > Are they the same as the output you attached last time? Or something > new? Could you please share it so that I can investigate the root > cause? # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# --------------------------------------- not ok 94 FAIL: ZC TOO_MANY_FRAGS # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] 1..108 # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# --------------------------------------- # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # --------------------------------------- not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] Summary: XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] > > Thanks, > Jason > > > > > > > > > Thanks, > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > Jason Xing (5): > > > > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > > > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > > > > xsk: drain continuation descs after overflow in xsk_build_skb() > > > > > xsk: drain continuation descs on invalid descriptor in > > > > > __xsk_generic_xmit() > > > > > selftests/xsk: drain CQ to wait for TX completion > > > > > > > > > > include/net/xdp_sock.h | 1 + > > > > > net/xdp/xsk.c | 44 +++++++++++++---- > > > > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > > > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > > > > > > > -- > > > > > 2.43.7 > > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-21 14:24 ` Maciej Fijalkowski @ 2026-05-22 8:55 ` Jason Xing 2026-05-22 13:48 ` Jason Xing 0 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-22 8:55 UTC (permalink / raw) To: Maciej Fijalkowski Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing [-- Attachment #1: Type: text/plain, Size: 11890 bytes --] On Thu, May 21, 2026 at 10:24 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Thu, May 21, 2026 at 09:07:30PM +0800, Jason Xing wrote: > > On Thu, May 21, 2026 at 9:00 PM Maciej Fijalkowski > > <maciej.fijalkowski@intel.com> wrote: > > > > > > On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote: > > > > On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski > > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > The series is the product of previous review from sashiko[1]. > > > > > > > > > > > > 1) META > > > > > > patch 1: address TOCTOU around metadata. > > > > > > > > > > > > 2) PUBLISH of CQ > > > > > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > > > > > overflow occurs. > > > > > > patch 3: keep cleaning up the continuation descs (more than 17) and > > > > > > publish its address when overflow occurs. > > > > > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > > > > > > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > > > > > > > > > --- > > > > > > V4 > > > > > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > > > > > 1. correct the description of xmit path in patch 3 (sashiko) > > > > > > 2. move set logic into xmit path in patch 3 (Stan) > > > > > > > > > > > > V3 > > > > > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > > > > > 1. avoid breaking previous usage of sendto, and siliently handle > > > > > > overflow case (Stan, sashiko) > > > > > > 2. add one particular exception process in patch 4 (sashiko) > > > > > > 3. adjust the selftest to make sure it passes in either virutal or > > > > > > physical machines, which includes add usleep to support physical machine. > > > > > > > > > > > > V2 > > > > > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > > > > > 1. adjust selftests (Jakub) > > > > > > 2. add READ_ONCE in patch 1 (Stan) > > > > > > > > > > FWIW I still get test failures (yes with patch 5 applied). PTAL. > > > > > > > > Thanks for the test. But I've tried with ixgbe driver... > > > > > > > > I noticed there are some flaky tests which have nothing to do with the > > > > series. Can you confirm that it's not caused because of the series? > > > > > > That explains the different results as i am using i40e/ice which have > > > multi-buffer support whereas ixgbe does not even support mbuf at XDP. > > > Broken tests are from mbuf cases. > > > > That's weird. I never expected the failed tests to be about multi-buffer. > > > > Are they the same as the output you attached last time? Or something > > new? Could you please share it so that I can investigate the root > > cause? > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > --------------------------------------- > not ok 94 FAIL: ZC TOO_MANY_FRAGS > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF > > # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. > # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 > XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] > 1..108 > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > --------------------------------------- > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > --------------------------------------- > not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. > # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 > XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] > > Summary: > XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] > XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] Sorry, Maciej. I managed to get one server with i40e nic but still couldn't reproduce it. Can you try the attachment (that is the replacement for v4-0005) instead? I removed those nasty CONT test cases... Really I don't think I have much time to spend on these tests which makes me feel extremely annoyed... It's not easy to analyze the code without a reproducer. The good news is that now I highly suspect that this kind of CONT test cases pollute the whole cq which affects other tests. Before I give up on the 0003/0004 patches, I'd like to hear some advice from you. Thank you. My original intention was to push batch xmit forward but at that time sashiko pointed out some unrelated bugs accidentally. Thanks, Jason > > > > > Thanks, > > Jason > > > > > > > > > > > > > Thanks, > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jason Xing (5): > > > > > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > > > > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > > > > > xsk: drain continuation descs after overflow in xsk_build_skb() > > > > > > xsk: drain continuation descs on invalid descriptor in > > > > > > __xsk_generic_xmit() > > > > > > selftests/xsk: drain CQ to wait for TX completion > > > > > > > > > > > > include/net/xdp_sock.h | 1 + > > > > > > net/xdp/xsk.c | 44 +++++++++++++---- > > > > > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > > > > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > > > > > > > > > -- > > > > > > 2.43.7 > > > > > > [-- Attachment #2: test.patch.1 --] [-- Type: application/octet-stream, Size: 2980 bytes --] diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c index 7950c504ed28..4eb8356523aa 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c @@ -2070,31 +2070,25 @@ int testapp_invalid_desc_mb(struct test_spec *test) /* Valid packet for synch to start with */ {0, MIN_PKT_SIZE, 0, true, 0}, /* Zero frame len is not legal */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, {0, 0, 0, false, 0}, - /* Invalid address in the second frame */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - /* Invalid len in the middle */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - /* Invalid options in the middle */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION}, + /* Invalid address */ + {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, 0}, + /* Invalid len */ + {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, 0}, + /* Invalid options */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, + XSK_DESC__INVALID_OPTION & ~XDP_PKT_CONTD}, /* Transmit 2 frags, receive 3 */ {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, XDP_PKT_CONTD}, {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, 0}, /* Middle frame crosses chunk boundary with small length */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, {-MIN_PKT_SIZE / 2, MIN_PKT_SIZE, 0, false, 0}, /* Valid packet for synch so that something is received */ {0, MIN_PKT_SIZE, 0, true, 0}}; if (umem->unaligned_mode) { /* Crossing a chunk boundary allowed */ - pkts[12].valid = true; - pkts[13].valid = true; + pkts[7].valid = true; } test->mtu = MAX_ETH_JUMBO_SIZE; @@ -2229,7 +2223,7 @@ int testapp_too_many_frags(struct test_spec *test) max_frags += 1; } - pkts = calloc(2 * max_frags + 2, sizeof(struct pkt)); + pkts = calloc(max_frags + 2, sizeof(struct pkt)); if (!pkts) return TEST_FAILURE; @@ -2247,20 +2241,11 @@ int testapp_too_many_frags(struct test_spec *test) } pkts[max_frags].options = 0; - /* An invalid packet with the max amount of frags but signals packet - * continues on the last frag - */ - for (i = max_frags + 1; i < 2 * max_frags + 1; i++) { - pkts[i].len = MIN_PKT_SIZE; - pkts[i].options = XDP_PKT_CONTD; - pkts[i].valid = false; - } - /* Valid packet for synch */ - pkts[2 * max_frags + 1].len = MIN_PKT_SIZE; - pkts[2 * max_frags + 1].valid = true; + pkts[max_frags + 1].len = MIN_PKT_SIZE; + pkts[max_frags + 1].valid = true; - if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2)) { + if (pkt_stream_generate_custom(test, pkts, max_frags + 2)) { free(pkts); return TEST_FAILURE; } ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-22 8:55 ` Jason Xing @ 2026-05-22 13:48 ` Jason Xing 2026-05-22 18:33 ` Maciej Fijalkowski 0 siblings, 1 reply; 22+ messages in thread From: Jason Xing @ 2026-05-22 13:48 UTC (permalink / raw) To: Maciej Fijalkowski Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing [-- Attachment #1: Type: text/plain, Size: 12595 bytes --] On Fri, May 22, 2026 at 4:55 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, May 21, 2026 at 10:24 PM Maciej Fijalkowski > <maciej.fijalkowski@intel.com> wrote: > > > > On Thu, May 21, 2026 at 09:07:30PM +0800, Jason Xing wrote: > > > On Thu, May 21, 2026 at 9:00 PM Maciej Fijalkowski > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote: > > > > > On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski > > > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > The series is the product of previous review from sashiko[1]. > > > > > > > > > > > > > > 1) META > > > > > > > patch 1: address TOCTOU around metadata. > > > > > > > > > > > > > > 2) PUBLISH of CQ > > > > > > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > > > > > > overflow occurs. > > > > > > > patch 3: keep cleaning up the continuation descs (more than 17) and > > > > > > > publish its address when overflow occurs. > > > > > > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > > > > > > > > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > > > > > > > > > > > --- > > > > > > > V4 > > > > > > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > > > > > > 1. correct the description of xmit path in patch 3 (sashiko) > > > > > > > 2. move set logic into xmit path in patch 3 (Stan) > > > > > > > > > > > > > > V3 > > > > > > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > > > > > > 1. avoid breaking previous usage of sendto, and siliently handle > > > > > > > overflow case (Stan, sashiko) > > > > > > > 2. add one particular exception process in patch 4 (sashiko) > > > > > > > 3. adjust the selftest to make sure it passes in either virutal or > > > > > > > physical machines, which includes add usleep to support physical machine. > > > > > > > > > > > > > > V2 > > > > > > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > > > > > > 1. adjust selftests (Jakub) > > > > > > > 2. add READ_ONCE in patch 1 (Stan) > > > > > > > > > > > > FWIW I still get test failures (yes with patch 5 applied). PTAL. > > > > > > > > > > Thanks for the test. But I've tried with ixgbe driver... > > > > > > > > > > I noticed there are some flaky tests which have nothing to do with the > > > > > series. Can you confirm that it's not caused because of the series? > > > > > > > > That explains the different results as i am using i40e/ice which have > > > > multi-buffer support whereas ixgbe does not even support mbuf at XDP. > > > > Broken tests are from mbuf cases. > > > > > > That's weird. I never expected the failed tests to be about multi-buffer. > > > > > > Are they the same as the output you attached last time? Or something > > > new? Could you please share it so that I can investigate the root > > > cause? > > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > --------------------------------------- > > not ok 94 FAIL: ZC TOO_MANY_FRAGS > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF > > > > # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. > > # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 > > XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] > > 1..108 > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > > # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > --------------------------------------- > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > --------------------------------------- > > not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > > > # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. > > # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 > > XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] > > > > Summary: > > XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] > > XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] > > Sorry, Maciej. I managed to get one server with i40e nic but still > couldn't reproduce it. Can you try the attachment (that is the > replacement for v4-0005) instead? I removed those nasty CONT test > cases... Ah, I think I eventually figured out a solution. Maciej, could you please test the 2nd patch instead? This patch reworks the CONTD test cases. Cross finger. Thanks, Jason > > Really I don't think I have much time to spend on these tests which > makes me feel extremely annoyed... It's not easy to analyze the code > without a reproducer. The good news is that now I highly suspect that > this kind of CONT test cases pollute the whole cq which affects other > tests. Before I give up on the 0003/0004 patches, I'd like to hear > some advice from you. Thank you. > > My original intention was to push batch xmit forward but at that time > sashiko pointed out some unrelated bugs accidentally. > > Thanks, > Jason > > > > > > > > > Thanks, > > > Jason > > > > > > > > > > > > > > > > > Thanks, > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jason Xing (5): > > > > > > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > > > > > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > > > > > > xsk: drain continuation descs after overflow in xsk_build_skb() > > > > > > > xsk: drain continuation descs on invalid descriptor in > > > > > > > __xsk_generic_xmit() > > > > > > > selftests/xsk: drain CQ to wait for TX completion > > > > > > > > > > > > > > include/net/xdp_sock.h | 1 + > > > > > > > net/xdp/xsk.c | 44 +++++++++++++---- > > > > > > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > > > > > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > > > > > > > > > > > -- > > > > > > > 2.43.7 > > > > > > > [-- Attachment #2: 0001-selftests-xsk-rework-CONTD-testcases.patch --] [-- Type: application/octet-stream, Size: 9451 bytes --] From 6db8d9c1fae4b798050c9502495d957627a03c46 Mon Sep 17 00:00:00 2001 From: Jason Xing <kernelxing@tencent.com> Date: Fri, 22 May 2026 22:00:00 +0300 Subject: [PATCH] selftests/xsk: rework CONTD testcases The existing *INV_DESC_MULTI_BUFF and TOO_MANY_FRAGS tests embed mid-chain invalid descriptors, which collide with both the new CQ accounting. The previous arch was not designed for this behavior: 1) it's unneeded to test the series in zc mode. 2) outstanding_tx doesn't work without newly added nb_extra_cq. In a word, it's unlikely to support the CONTD test in both non-zc and zc in the unified functions. So this patch does two things: strip the previous CONTD tests, and then group them together into two testcases. Add two SKB/DRV-only tests to cover the series in isolation: DRAIN_CONT and EOVERFLOW_DRAIN_CONT. Both declare the kernel-internal extra CQ entries via nb_extra_cq so __send_pkts() keeps strict accounting, and both are skipped under ZC since drain_cont only exists on the SKB/DRV path. Signed-off-by: Jason Xing <kernelxing@tencent.com> --- .../selftests/bpf/prog_tests/test_xsk.c | 135 ++++++++++++++---- .../selftests/bpf/prog_tests/test_xsk.h | 5 + 2 files changed, 116 insertions(+), 24 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c index 7950c504ed28..1c17123f993c 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c @@ -1247,7 +1247,8 @@ static int __send_pkts(struct ifobject *ifobject, struct xsk_socket_info *xsk, b pthread_mutex_unlock(&pacing_mutex); xsk_ring_prod__submit(&xsk->tx, i); - xsk->outstanding_tx += valid_frags; + xsk->outstanding_tx += valid_frags + xsk->nb_extra_cq; + xsk->nb_extra_cq = 0; if (use_poll) { ret = poll(&fds, 1, POLL_TMOUT); @@ -2070,31 +2071,25 @@ int testapp_invalid_desc_mb(struct test_spec *test) /* Valid packet for synch to start with */ {0, MIN_PKT_SIZE, 0, true, 0}, /* Zero frame len is not legal */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, {0, 0, 0, false, 0}, - /* Invalid address in the second frame */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - /* Invalid len in the middle */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - /* Invalid options in the middle */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XSK_DESC__INVALID_OPTION}, + /* Invalid address */ + {umem_size, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, 0}, + /* Invalid len */ + {0, XSK_UMEM__INVALID_FRAME_SIZE, 0, false, 0}, + /* Invalid options */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, + XSK_DESC__INVALID_OPTION & ~XDP_PKT_CONTD}, /* Transmit 2 frags, receive 3 */ {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, XDP_PKT_CONTD}, {0, XSK_UMEM__MAX_FRAME_SIZE, 0, true, 0}, - /* Middle frame crosses chunk boundary with small length */ - {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Single frame crosses chunk boundary with small length */ {-MIN_PKT_SIZE / 2, MIN_PKT_SIZE, 0, false, 0}, /* Valid packet for synch so that something is received */ {0, MIN_PKT_SIZE, 0, true, 0}}; if (umem->unaligned_mode) { /* Crossing a chunk boundary allowed */ - pkts[12].valid = true; - pkts[13].valid = true; + pkts[7].valid = true; } test->mtu = MAX_ETH_JUMBO_SIZE; @@ -2229,7 +2224,7 @@ int testapp_too_many_frags(struct test_spec *test) max_frags += 1; } - pkts = calloc(2 * max_frags + 2, sizeof(struct pkt)); + pkts = calloc(max_frags + 2, sizeof(struct pkt)); if (!pkts) return TEST_FAILURE; @@ -2247,24 +2242,116 @@ int testapp_too_many_frags(struct test_spec *test) } pkts[max_frags].options = 0; - /* An invalid packet with the max amount of frags but signals packet - * continues on the last frag + /* Valid packet for synch */ + pkts[max_frags + 1].len = MIN_PKT_SIZE; + pkts[max_frags + 1].valid = true; + + if (pkt_stream_generate_custom(test, pkts, max_frags + 2)) { + free(pkts); + return TEST_FAILURE; + } + + ret = testapp_validate_traffic(test); + free(pkts); + return ret; +} + +int testapp_drain_cont(struct test_spec *test) +{ + struct pkt pkts[] = { + /* Valid packet for sync */ + {0, MIN_PKT_SIZE, 0, true, 0}, + /* Kernel-valid first frag: builds partial SKB */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Kernel-invalid (len=0) with CONTD: arms drain_cont */ + {0, 0, 0, false, XDP_PKT_CONTD}, + /* Drained by drain_cont, CONTD keeps draining */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, XDP_PKT_CONTD}, + /* Drained by drain_cont, no CONTD ends draining */ + {0, XSK_UMEM__LARGE_FRAME_SIZE, 0, false, 0}, + /* Valid packet for sync */ + {0, MIN_PKT_SIZE, 0, true, 0}, + }; + + if (test->mode == TEST_MODE_ZC) { + ksft_print_msg("Can not run DRAIN_CONT test for ZC mode\n"); + return TEST_SKIP; + } + + test->mtu = MAX_ETH_JUMBO_SIZE; + if (pkt_stream_generate_custom(test, pkts, ARRAY_SIZE(pkts))) + return TEST_FAILURE; + + /* 1 CQ from xsk_drop_skb (partial SKB) + 2 CQ from drain_cont */ + test->ifobj_tx->xsk->nb_extra_cq = 3; + + return testapp_validate_traffic(test); +} + +int testapp_eoverflow_drain_cont(struct test_spec *test) +{ + struct ifobject *ifobj_tx = test->ifobj_tx; + u32 max_skb_frags; + struct pkt *pkts; + u32 nb_pkts, i; + int ret; + + if (test->mode == TEST_MODE_ZC) { + ksft_print_msg("Can not run EOVERFLOW_DRAIN_CONT test for ZC mode\n"); + return TEST_SKIP; + } + + max_skb_frags = ifobj_tx->max_skb_frags; + + /* + * Layout (let N = max_skb_frags, 17 as default): + * + * pkts[0] : valid sync packet. + * pkts[1..N+1] : N+1 CONTD frags forming a multi-buffer + * chain that overruns MAX_SKB_FRAGS. The + * last frag triggers -EOVERFLOW in + * xsk_build_skb(); xsk_drop_skb() then + * publishes N+1 accumulated UMEM + * addresses to the CQ via the SKB + * destructor and arms drain_cont. + * pkts[N+2..N+3] : 2 CONTD frags consumed by drain_cont's + * write-to-CQ branch on the next sendto(). + * pkts[N+4] : !CONTD frag that closes drain_cont + * (xp_mb_desc(&desc) becomes false). + * pkts[N+5] : valid sync packet, sent normally to + * confirm that the socket recovered and + * drain_cont is no longer active. */ - for (i = max_frags + 1; i < 2 * max_frags + 1; i++) { + nb_pkts = max_skb_frags + 6; + pkts = calloc(nb_pkts, sizeof(struct pkt)); + if (!pkts) + return TEST_FAILURE; + + test->mtu = MAX_ETH_JUMBO_SIZE; + + pkts[0].len = MIN_PKT_SIZE; + pkts[0].valid = true; + + for (i = 1; i <= max_skb_frags + 4; i++) { pkts[i].len = MIN_PKT_SIZE; pkts[i].options = XDP_PKT_CONTD; pkts[i].valid = false; } + pkts[max_skb_frags + 4].options = 0; - /* Valid packet for synch */ - pkts[2 * max_frags + 1].len = MIN_PKT_SIZE; - pkts[2 * max_frags + 1].valid = true; + pkts[max_skb_frags + 5].len = MIN_PKT_SIZE; + pkts[max_skb_frags + 5].valid = true; - if (pkt_stream_generate_custom(test, pkts, 2 * max_frags + 2)) { + if (pkt_stream_generate_custom(test, pkts, nb_pkts)) { free(pkts); return TEST_FAILURE; } + /* (max_skb_frags + 1) CQ from xsk_drop_skb on EOVERFLOW and + * 3 from drain_cont. + */ + ifobj_tx->xsk->nb_extra_cq = max_skb_frags + 4; + ret = testapp_validate_traffic(test); free(pkts); return ret; diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.h b/tools/testing/selftests/bpf/prog_tests/test_xsk.h index 1ab8aee4ce56..2e6488e0452b 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.h +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.h @@ -87,6 +87,7 @@ struct xsk_socket_info { struct xsk_socket *xsk; struct pkt_stream *pkt_stream; u32 outstanding_tx; + u32 nb_extra_cq; u32 rxqsize; u32 batch_size; u8 dst_mac[ETH_ALEN]; @@ -237,6 +238,8 @@ int testapp_aligned_inv_desc(struct test_spec *test); int testapp_aligned_inv_desc_2k_frame(struct test_spec *test); int testapp_aligned_inv_desc_mb(struct test_spec *test); int testapp_bidirectional(struct test_spec *test); +int testapp_drain_cont(struct test_spec *test); +int testapp_eoverflow_drain_cont(struct test_spec *test); int testapp_headroom(struct test_spec *test); int testapp_hw_sw_max_ring_size(struct test_spec *test); int testapp_hw_sw_min_ring_size(struct test_spec *test); @@ -292,6 +295,8 @@ static const struct test_spec tests[] = { {.name = "XDP_METADATA_COPY_MULTI_BUFF", .test_func = testapp_xdp_metadata_mb}, {.name = "ALIGNED_INV_DESC_MULTI_BUFF", .test_func = testapp_aligned_inv_desc_mb}, {.name = "TOO_MANY_FRAGS", .test_func = testapp_too_many_frags}, + {.name = "DRAIN_CONT", .test_func = testapp_drain_cont}, + {.name = "EOVERFLOW_DRAIN_CONT", .test_func = testapp_eoverflow_drain_cont}, {.name = "XDP_ADJUST_TAIL_SHRINK", .test_func = testapp_adjust_tail_shrink}, {.name = "TX_QUEUE_CONSUMER", .test_func = testapp_tx_queue_consumer}, }; -- 2.43.7 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues 2026-05-22 13:48 ` Jason Xing @ 2026-05-22 18:33 ` Maciej Fijalkowski 0 siblings, 0 replies; 22+ messages in thread From: Maciej Fijalkowski @ 2026-05-22 18:33 UTC (permalink / raw) To: Jason Xing Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing On Fri, May 22, 2026 at 09:48:39PM +0800, Jason Xing wrote: > On Fri, May 22, 2026 at 4:55 PM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, May 21, 2026 at 10:24 PM Maciej Fijalkowski > > <maciej.fijalkowski@intel.com> wrote: > > > > > > On Thu, May 21, 2026 at 09:07:30PM +0800, Jason Xing wrote: > > > > On Thu, May 21, 2026 at 9:00 PM Maciej Fijalkowski > > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote: > > > > > > On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski > > > > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > > > > > On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote: > > > > > > > > From: Jason Xing <kernelxing@tencent.com> > > > > > > > > > > > > > > > > The series is the product of previous review from sashiko[1]. > > > > > > > > > > > > > > > > 1) META > > > > > > > > patch 1: address TOCTOU around metadata. > > > > > > > > > > > > > > > > 2) PUBLISH of CQ > > > > > > > > patch 2: make sure xsk_addr->addrs[] can be published to cq when > > > > > > > > overflow occurs. > > > > > > > > patch 3: keep cleaning up the continuation descs (more than 17) and > > > > > > > > publish its address when overflow occurs. > > > > > > > > patch 4: like patch 3, but only handles the invalid descs cases. > > > > > > > > > > > > > > > > [1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/ > > > > > > > > > > > > > > > > --- > > > > > > > > V4 > > > > > > > > Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/ > > > > > > > > 1. correct the description of xmit path in patch 3 (sashiko) > > > > > > > > 2. move set logic into xmit path in patch 3 (Stan) > > > > > > > > > > > > > > > > V3 > > > > > > > > Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/ > > > > > > > > 1. avoid breaking previous usage of sendto, and siliently handle > > > > > > > > overflow case (Stan, sashiko) > > > > > > > > 2. add one particular exception process in patch 4 (sashiko) > > > > > > > > 3. adjust the selftest to make sure it passes in either virutal or > > > > > > > > physical machines, which includes add usleep to support physical machine. > > > > > > > > > > > > > > > > V2 > > > > > > > > Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/ > > > > > > > > 1. adjust selftests (Jakub) > > > > > > > > 2. add READ_ONCE in patch 1 (Stan) > > > > > > > > > > > > > > FWIW I still get test failures (yes with patch 5 applied). PTAL. > > > > > > > > > > > > Thanks for the test. But I've tried with ixgbe driver... > > > > > > > > > > > > I noticed there are some flaky tests which have nothing to do with the > > > > > > series. Can you confirm that it's not caused because of the series? > > > > > > > > > > That explains the different results as i am using i40e/ice which have > > > > > multi-buffer support whereas ixgbe does not even support mbuf at XDP. > > > > > Broken tests are from mbuf cases. > > > > > > > > That's weird. I never expected the failed tests to be about multi-buffer. > > > > > > > > Are they the same as the output you attached last time? Or something > > > > new? Could you please share it so that I can investigate the root > > > > cause? > > > > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > --------------------------------------- > > > not ok 94 FAIL: ZC TOO_MANY_FRAGS > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF > > > > > > # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. > > > # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 > > > XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] > > > 1..108 > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF > > > # [is_frag_valid] expected pkt_nb [11], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > --------------------------------------- > > > # [is_frag_valid] expected pkt_nb [10], got pkt_nb [0] > > > # DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01# > > > DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00# > > > DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....# > > > .... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # > > > --------------------------------------- > > > not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF > > > > > > # 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage. > > > # Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0 > > > XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] > > > > > > Summary: > > > XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ] > > > XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ] > > > > Sorry, Maciej. I managed to get one server with i40e nic but still > > couldn't reproduce it. Can you try the attachment (that is the > > replacement for v4-0005) instead? I removed those nasty CONT test > > cases... > > Ah, I think I eventually figured out a solution. Maciej, could you > please test the 2nd patch instead? > > This patch reworks the CONTD test cases. Cross finger. Please don't rush things here, I believe we need to think a bit more here. I have second thoughts about overall approach. My understanding wrt CQ was that it is a container that holds descriptors which have been successfully transmitted. Now we want to add also leftover descriptors from broken packets, which might confuse user space sides in case they were relying on behavior described above. The intent is right of course as we don't want to lose UMEM descs, but I feel like we need a separate mechanism for that rather than putting invalid descs to CQ. Does it make sense? Besides, even though we would stay with proposed changes, behavior between modes should be aligned. Right now ZC seems to be broken in touched regions here - when we hit the limit of frags via pool->xdp_zc_max_segs, we break the loop and discard the packet, never post it to CQ and these descs are lost from user space POV. Then we would continue on next call and interpret the rest of too big packet as a separate one (clamped) and therefore submit corrupted packet to HW. I'll be looking at ZC API but i do think we need a common approach, mode-agnostic. Thanks, Maciej > > Thanks, > Jason > > > > > Really I don't think I have much time to spend on these tests which > > makes me feel extremely annoyed... It's not easy to analyze the code > > without a reproducer. The good news is that now I highly suspect that > > this kind of CONT test cases pollute the whole cq which affects other > > tests. Before I give up on the 0003/0004 patches, I'd like to hear > > some advice from you. Thank you. > > > > My original intention was to push batch xmit forward but at that time > > sashiko pointed out some unrelated bugs accidentally. > > > > Thanks, > > Jason > > > > > > > > > > > > > Thanks, > > > > Jason > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jason Xing (5): > > > > > > > > xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() > > > > > > > > xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx > > > > > > > > xsk: drain continuation descs after overflow in xsk_build_skb() > > > > > > > > xsk: drain continuation descs on invalid descriptor in > > > > > > > > __xsk_generic_xmit() > > > > > > > > selftests/xsk: drain CQ to wait for TX completion > > > > > > > > > > > > > > > > include/net/xdp_sock.h | 1 + > > > > > > > > net/xdp/xsk.c | 44 +++++++++++++---- > > > > > > > > .../selftests/bpf/prog_tests/test_xsk.c | 48 +++++++++++-------- > > > > > > > > 3 files changed, 63 insertions(+), 30 deletions(-) > > > > > > > > > > > > > > > > -- > > > > > > > > 2.43.7 > > > > > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2026-05-22 18:34 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing 2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing 2026-05-21 12:04 ` Maciej Fijalkowski 2026-05-20 0:42 ` [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing 2026-05-21 12:05 ` Maciej Fijalkowski 2026-05-20 0:42 ` [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing 2026-05-20 16:10 ` Maciej Fijalkowski 2026-05-20 23:53 ` Jason Xing 2026-05-21 12:02 ` Maciej Fijalkowski 2026-05-21 13:10 ` Jason Xing 2026-05-22 9:06 ` Magnus Karlsson 2026-05-22 9:22 ` Jason Xing 2026-05-20 0:42 ` [PATCH net v4 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing 2026-05-20 0:42 ` [PATCH net v4 5/5] selftests/xsk: drain CQ to wait for TX completion Jason Xing 2026-05-21 12:23 ` [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski 2026-05-21 12:41 ` Jason Xing 2026-05-21 12:59 ` Maciej Fijalkowski 2026-05-21 13:07 ` Jason Xing 2026-05-21 14:24 ` Maciej Fijalkowski 2026-05-22 8:55 ` Jason Xing 2026-05-22 13:48 ` Jason Xing 2026-05-22 18:33 ` Maciej Fijalkowski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox