From: Jason Xing <kerneljasonxing@gmail.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com,
	maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
	sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
	hawk@kernel.org, john.fastabend@gmail.com, horms@kernel.org,
	andrew+netdev@lunn.ch
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Jason Xing <kernelxing@tencent.com>
Subject: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb()
Date: Wed, 20 May 2026 08:42:42 +0800
Message-ID: <20260520004244.55663-4-kerneljasonxing@gmail.com>
In-Reply-To: <20260520004244.55663-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
only the current descriptor is released from the TX ring. The remaining
continuation descriptors of the same packet stay in the ring. Since
xs->skb is set to NULL after the drop, the TX loop picks up these
leftover frags and misinterprets each one as the beginning of a new
packet, corrupting the packet stream.
Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs
and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised so
that the TX loop can recognize and discard the remaining descriptors of
the overflowed packet. Each drained descriptor is completed through the
CQ, counted in invalid_descs and released from the TX ring.
When the last fragment (without XDP_PKT_CONTD) is processed, the flag
is cleared and the loop continues to process subsequent descriptors
with the remaining budget. This behavior follows how the previous xmit
path treats overflowed packets.
Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
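Note for readers (not part of the commit): the drain logic can be
illustrated with a small userspace model. This is only a sketch under
stated assumptions, not the kernel code. toy_desc, toy_stats, toy_xmit,
PKT_CONTD and MAX_DESCS are hypothetical names invented for the
illustration; MAX_DESCS is a deliberately tiny stand-in for the
MAX_SKB_FRAGS limit, and the CQ, locking and budget handling are left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PKT_CONTD 0x1u	/* hypothetical stand-in for XDP_PKT_CONTD */
#define MAX_DESCS 2u	/* hypothetical, tiny stand-in for the frag limit */

struct toy_desc {
	unsigned int options;	/* PKT_CONTD on all but the last desc of a packet */
};

struct toy_stats {
	unsigned int sent;	/* packets completed and "transmitted" */
	unsigned int dropped;	/* descriptors discarded */
};

/* Walk the ring once; use_drain selects the behavior with the flag. */
static struct toy_stats toy_xmit(const struct toy_desc *ring, size_t n,
				 bool use_drain)
{
	struct toy_stats st = { 0, 0 };
	unsigned int frags = 0;		/* descs held by the packet being built */
	bool drain_cont = false;

	assert(ring != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		bool contd = (ring[i].options & PKT_CONTD) != 0;

		if (drain_cont) {
			/* Leftover desc of an overflowed packet: discard it. */
			st.dropped++;
			if (!contd)
				drain_cont = false;	/* last fragment seen */
			continue;
		}

		if (frags == MAX_DESCS) {
			/* Overflow: drop what was built plus the current desc. */
			st.dropped += frags + 1;
			frags = 0;
			if (use_drain && contd)
				drain_cont = true;
			continue;
		}

		frags++;
		if (!contd) {
			/* Last desc of the packet: hand it off. */
			st.sent++;
			frags = 0;
		}
	}

	return st;
}
```

With a ring holding one five-descriptor packet followed by a
single-descriptor packet, calling toy_xmit() with use_drain == false
reports two packets sent, because the tail of the oversized packet is
misread as a packet of its own. With use_drain == true only the genuine
second packet is sent and all five descriptors of the first one are
counted as dropped.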
 include/net/xdp_sock.h |  1 +
 net/xdp/xsk.c          | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index ebac60a3d8a1..8b51876efbed 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -80,6 +80,7 @@ struct xdp_sock {
 	 * call of __xsk_generic_xmit().
 	 */
 	struct sk_buff *skb;
+	bool drain_cont;
 
 	struct list_head map_list;
 	/* Protects map_list */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 0a6203c42576..f4add7be8c93 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk)
 			goto out;
 		}
 
+		if (unlikely(xs->drain_cont)) {
+			unsigned long flags;
+			u32 idx;
+
+			spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
+			idx = xskq_get_prod(xs->pool->cq);
+			xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
+			xskq_prod_submit_n(xs->pool->cq, 1);
+			spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
+
+			xs->tx->invalid_descs++;
+			xskq_cons_release(xs->tx);
+			if (!xp_mb_desc(&desc))
+				xs->drain_cont = false;
+			continue;
+		}
+
 		skb = xsk_build_skb(xs, &desc);
 		if (IS_ERR(skb)) {
 			err = PTR_ERR(skb);
 			if (err != -EOVERFLOW)
 				goto out;
+			if (xp_mb_desc(&desc))
+				xs->drain_cont = true;
 			err = 0;
 			continue;
 		}
--
2.43.7
Thread overview: 29+ messages
2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing
2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-21 12:04 ` Maciej Fijalkowski
2026-05-30 0:44 ` sashiko-bot
2026-05-20 0:42 ` [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
2026-05-21 12:05 ` Maciej Fijalkowski
2026-05-20 0:42 ` Jason Xing [this message]
2026-05-20 16:10 ` [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Maciej Fijalkowski
2026-05-20 23:53 ` Jason Xing
2026-05-21 12:02 ` Maciej Fijalkowski
2026-05-21 13:10 ` Jason Xing
2026-05-22 9:06 ` Magnus Karlsson
2026-05-22 9:22 ` Jason Xing
2026-05-30 0:44 ` sashiko-bot
2026-05-20 0:42 ` [PATCH net v4 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
2026-05-30 0:44 ` sashiko-bot
2026-05-20 0:42 ` [PATCH net v4 5/5] selftests/xsk: drain CQ to wait for TX completion Jason Xing
2026-05-30 0:44 ` sashiko-bot
2026-05-21 12:23 ` [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski
2026-05-21 12:41 ` Jason Xing
2026-05-21 12:59 ` Maciej Fijalkowski
2026-05-21 13:07 ` Jason Xing
2026-05-21 14:24 ` Maciej Fijalkowski
2026-05-22 8:55 ` Jason Xing
2026-05-22 13:48 ` Jason Xing
2026-05-22 18:33 ` Maciej Fijalkowski
2026-05-22 23:49 ` Jason Xing
2026-05-26 19:43 ` Maciej Fijalkowski
2026-05-26 23:26 ` Jason Xing