From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: <davem@davemloft.net>, <edumazet@google.com>, <kuba@kernel.org>,
<pabeni@redhat.com>, <bjorn@kernel.org>,
<magnus.karlsson@intel.com>, <jonathan.lemon@gmail.com>,
<sdf@fomichev.me>, <ast@kernel.org>, <daniel@iogearbox.net>,
<hawk@kernel.org>, <john.fastabend@gmail.com>, <horms@kernel.org>,
<andrew+netdev@lunn.ch>, <bpf@vger.kernel.org>,
<netdev@vger.kernel.org>, Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb()
Date: Wed, 20 May 2026 18:10:31 +0200 [thread overview]
Message-ID: <ag3c9/EF4h7DLQu3@boxer> (raw)
In-Reply-To: <20260520004244.55663-4-kerneljasonxing@gmail.com>
On Wed, May 20, 2026 at 08:42:42AM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
> only the current descriptor is released from the TX ring. The remaining
> continuation descriptors of the same packet stay in the ring. Since
> xs->skb is set to NULL after the drop, the TX loop picks up these
> leftover frags and misinterprets each one as the beginning of a new
> packet, corrupting the packet stream.
>
> Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs
> and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised,
> so we have a chance to examine and handle the potential remaining descs
> of this big overflow'ed skb.
>
> When the last fragment (without XDP_PKT_CONTD) is processed, the flag
> is cleared and the loop continues to process subsequent descriptors
> with the remaining budget. This behavior follows how previous xmit path
> treats overflow packets.
>
> Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
> Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> include/net/xdp_sock.h | 1 +
> net/xdp/xsk.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
> index ebac60a3d8a1..8b51876efbed 100644
> --- a/include/net/xdp_sock.h
> +++ b/include/net/xdp_sock.h
> @@ -80,6 +80,7 @@ struct xdp_sock {
> * call of __xsk_generic_xmit().
> */
> struct sk_buff *skb;
> + bool drain_cont;
>
> struct list_head map_list;
> /* Protects map_list */
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 0a6203c42576..f4add7be8c93 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -1062,11 +1062,30 @@ static int __xsk_generic_xmit(struct sock *sk)
> goto out;
> }
>
> + if (unlikely(xs->drain_cont)) {
Hi Jason,
could we get away from introducing dedicated boolean to xdp_sock for
handling this rare case?
> + unsigned long flags;
> + u32 idx;
> +
> + spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
> + idx = xskq_get_prod(xs->pool->cq);
> + xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
> + xskq_prod_submit_n(xs->pool->cq, 1);
> + spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
It feels a bit off to do this cleanup one-by-one.
I wonder if it could be covered by xsk_drop_skb() where you would walk the
tx ring and store addrs at xsk_addrs and then use xsk_cq_submit_addr_locked() ?
OTOH, in theory the amount of frags after we hit the overflow case could
be well over MAX_SKB_FRAGS again so xsk_addrs wouldn't be able to cover
them.
> +
> + xs->tx->invalid_descs++;
> + xskq_cons_release(xs->tx);
> + if (!xp_mb_desc(&desc))
> + xs->drain_cont = false;
> + continue;
> + }
> +
> skb = xsk_build_skb(xs, &desc);
> if (IS_ERR(skb)) {
> err = PTR_ERR(skb);
> if (err != -EOVERFLOW)
> goto out;
> + if (xp_mb_desc(&desc))
> + xs->drain_cont = true;
> err = 0;
> continue;
> }
> --
> 2.43.7
>
next prev parent reply other threads:[~2026-05-20 16:10 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-20 0:42 [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Jason Xing
2026-05-20 0:42 ` [PATCH net v4 1/5] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-21 12:04 ` Maciej Fijalkowski
2026-05-30 0:44 ` sashiko-bot
2026-05-20 0:42 ` [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
2026-05-21 12:05 ` Maciej Fijalkowski
2026-05-20 0:42 ` [PATCH net v4 3/5] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
2026-05-20 16:10 ` Maciej Fijalkowski [this message]
2026-05-20 23:53 ` Jason Xing
2026-05-21 12:02 ` Maciej Fijalkowski
2026-05-21 13:10 ` Jason Xing
2026-05-22 9:06 ` Magnus Karlsson
2026-05-22 9:22 ` Jason Xing
2026-05-30 0:44 ` sashiko-bot
2026-05-20 0:42 ` [PATCH net v4 4/5] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
2026-05-30 0:44 ` sashiko-bot
2026-05-20 0:42 ` [PATCH net v4 5/5] selftests/xsk: drain CQ to wait for TX completion Jason Xing
2026-05-30 0:44 ` sashiko-bot
2026-05-21 12:23 ` [PATCH net v4 0/5] xsk: fix meta and publish of cq issues Maciej Fijalkowski
2026-05-21 12:41 ` Jason Xing
2026-05-21 12:59 ` Maciej Fijalkowski
2026-05-21 13:07 ` Jason Xing
2026-05-21 14:24 ` Maciej Fijalkowski
2026-05-22 8:55 ` Jason Xing
2026-05-22 13:48 ` Jason Xing
2026-05-22 18:33 ` Maciej Fijalkowski
2026-05-22 23:49 ` Jason Xing
2026-05-26 19:43 ` Maciej Fijalkowski
2026-05-26 23:26 ` Jason Xing
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ag3c9/EF4h7DLQu3@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=andrew+netdev@lunn.ch \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jonathan.lemon@gmail.com \
--cc=kerneljasonxing@gmail.com \
--cc=kernelxing@tencent.com \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.