From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: <davem@davemloft.net>, <edumazet@google.com>, <kuba@kernel.org>,
<pabeni@redhat.com>, <bjorn@kernel.org>,
<magnus.karlsson@intel.com>, <jonathan.lemon@gmail.com>,
<sdf@fomichev.me>, <ast@kernel.org>, <daniel@iogearbox.net>,
<hawk@kernel.org>, <john.fastabend@gmail.com>, <horms@kernel.org>,
<andrew+netdev@lunn.ch>, <bpf@vger.kernel.org>,
<netdev@vger.kernel.org>, Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net v4 2/5] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
Date: Thu, 21 May 2026 14:05:39 +0200
Message-ID: <ag71E39eTnzx2Npy@boxer>
In-Reply-To: <20260520004244.55663-3-kerneljasonxing@gmail.com>
On Wed, May 20, 2026 at 08:42:41AM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> This patch is inspired by the check[1] from sashiko, which points out
> that when overflow happens, the address to be published to the cq is
> invalid. The more severe problem is that the whole process of
> publishing addresses to the cq is wrong in this particular case: the
> address should always be published, and cached_prod in the cq advanced,
> whenever descriptors have been read from the txq.
>
> The following is the full analysis.
> xsk_drop_skb() is called in three places, which all discard a partially
> built multi-buffer skb:
> 1) xsk_build_skb() -EOVERFLOW error path: packet exceeds MAX_SKB_FRAGS
> 2) __xsk_generic_xmit() post-loop cleanup: an invalid descriptor in
> the TX ring prevents the partial packet from completing
> 3) xsk_release(): socket close while xs->skb holds an incomplete packet
>
> In all three cases, the TX descriptors for the already-processed frags
> have been consumed from the TX ring (xskq_cons_release), and CQ slots
> have been reserved. However, xsk_drop_skb() calls xsk_consume_skb()
> which cancels the CQ reservations via xsk_cq_cancel_locked(). Since
> the buffer addresses never appear in the completion queue, userspace
> permanently loses track of these buffers.
>
> Fix this by letting consume_skb() trigger the existing xsk_destruct_skb
> destructor, which already submits buffer addresses to the CQ via
> xsk_cq_submit_addr_locked().
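
For readers following the thread, the difference between cancelling and
submitting the CQ reservations can be sketched in userspace. This is a toy
model under stated assumptions, not the kernel code: struct toy_cq and the
toy_* helpers are hypothetical names, with cached_prod standing in for
reserved slots and producer for what userspace can observe.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8

/* Hypothetical stand-in for a completion queue. */
struct toy_cq {
	uint64_t addrs[TOY_RING_SIZE];
	uint32_t cached_prod;	/* slots reserved by the kernel side */
	uint32_t producer;	/* slots published to userspace */
};

static void toy_cq_reserve(struct toy_cq *cq, uint32_t n)
{
	cq->cached_prod += n;
}

/* Old drop path: give the reservations back, publish nothing. */
static void toy_cq_cancel(struct toy_cq *cq, uint32_t n)
{
	cq->cached_prod -= n;
}

/* New drop path: write the addresses and advance the producer. */
static void toy_cq_submit(struct toy_cq *cq, const uint64_t *addrs, uint32_t n)
{
	uint32_t i;

	for (i = 0; i < n; i++)
		cq->addrs[(cq->producer + i) % TOY_RING_SIZE] = addrs[i];
	cq->producer += n;
}

/* Returns how many completions userspace can see after the drop. */
static uint32_t toy_drop(struct toy_cq *cq, const uint64_t *addrs,
			 uint32_t n, int submit)
{
	if (submit)
		toy_cq_submit(cq, addrs, n);
	else
		toy_cq_cancel(cq, n);
	return cq->producer;
}
```

With three reserved frags, the cancelling variant leaves the producer at
zero, so the three buffers are never handed back; the submitting variant
publishes all three.
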
>
> Note that cancelling the descriptors back to the TX ring (via
> xskq_cons_cancel_n) is not an appropriate option because an oversized
> packet that always exceeds MAX_SKB_FRAGS would be retried indefinitely,
> which is an obvious deadlock bug in the TX path.
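
The retry argument can be illustrated with a small userspace sketch. It is
only a model under assumptions: struct toy_txq, enum toy_policy and
toy_xmit_pass() are hypothetical, and TOY_MAX_FRAGS is an arbitrary limit
standing in for MAX_SKB_FRAGS.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_FRAGS 4

/* Hypothetical TX ring reduced to its consumer and producer indices. */
struct toy_txq {
	uint32_t cons;
	uint32_t prod;
};

enum toy_policy {
	TOY_CANCEL_BACK,	/* rewind the consumer on overflow */
	TOY_DROP,		/* keep the descriptors consumed */
};

/*
 * One transmit pass over a packet made of pkt_frags descriptors sitting
 * at the head of the ring. Returns 1 if the consumer moved, 0 otherwise.
 */
static int toy_xmit_pass(struct toy_txq *q, uint32_t pkt_frags,
			 enum toy_policy policy)
{
	uint32_t start = q->cons;
	uint32_t taken = 0;

	while (taken < pkt_frags && q->cons != q->prod) {
		q->cons++;
		taken++;
		if (taken > TOY_MAX_FRAGS) {
			/* Overflow: the packet can never be built. */
			if (policy == TOY_CANCEL_BACK)
				q->cons = start;
			break;
		}
	}
	return q->cons != start;
}
```

In this model a six-descriptor packet never makes progress under
TOY_CANCEL_BACK, no matter how many passes run, while TOY_DROP moves the
consumer past the descriptors it has read.
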
>
> Also move the desc->addr assignment in xsk_build_skb() above the
> overflow check so that the current descriptor's address is recorded
> before a potential -EOVERFLOW jump to free_err, consistent with the
> zerocopy path in xsk_build_skb_zerocopy().
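
Why the ordering matters can be shown with a userspace sketch. The names
below (struct toy_addrs, toy_add_frag(), toy_overflow_addr(), TOY_STALE)
are hypothetical, and the model assumes the error path counts the failing
descriptor and later publishes whatever sits in its slot.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_FRAGS 4
#define TOY_STALE 0xdeadbeefULL	/* marks a slot that was never written */

struct toy_addrs {
	uint64_t addrs[TOY_MAX_FRAGS + 1];
	uint32_t num_descs;
};

static void toy_addrs_init(struct toy_addrs *a)
{
	uint32_t i;

	for (i = 0; i <= TOY_MAX_FRAGS; i++)
		a->addrs[i] = TOY_STALE;
	a->num_descs = 0;
}

/*
 * Add one frag. record_first selects the ordering after the patch
 * (record the address, then check for overflow); 0 selects the old
 * ordering. Returns 0 on success, -1 on overflow.
 */
static int toy_add_frag(struct toy_addrs *a, uint64_t addr,
			uint32_t nr_frags, int more, int record_first)
{
	if (record_first)
		a->addrs[a->num_descs] = addr;

	if (nr_frags == TOY_MAX_FRAGS - 1 && more)
		return -1;

	if (!record_first)
		a->addrs[a->num_descs] = addr;
	a->num_descs++;
	return 0;
}

/* Assumed error path: count the failing descriptor, return its slot. */
static uint64_t toy_overflow_addr(struct toy_addrs *a)
{
	return a->addrs[a->num_descs++];
}
```

With the old ordering the slot for the overflowing descriptor still holds
TOY_STALE when the error path reads it; with the new ordering it holds the
descriptor's address.
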
>
> [1]: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
>
> Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
Sorry for the noise, got lost in my inbox and replied on v3.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
> net/xdp/xsk.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f8c8a8c9dfba..0a6203c42576 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -793,8 +793,11 @@ static void xsk_consume_skb(struct sk_buff *skb)
>
> static void xsk_drop_skb(struct sk_buff *skb)
> {
> - xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb);
> - xsk_consume_skb(skb);
> + struct xdp_sock *xs = xdp_sk(skb->sk);
> +
> + xs->tx->invalid_descs += xsk_get_num_desc(skb);
> + consume_skb(skb);
> + xs->skb = NULL;
> }
>
> static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> @@ -876,7 +879,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> return ERR_PTR(-ENOMEM);
>
> /* in case of -EOVERFLOW that could happen below,
> - * xsk_consume_skb() will release this node as whole skb
> + * xsk_drop_skb() will release this node as whole skb
> * would be dropped, which implies freeing all list elements
> */
> xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
> @@ -968,6 +971,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> goto free_err;
> }
>
> + xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
> +
> if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) {
> err = -EOVERFLOW;
> goto free_err;
> @@ -985,8 +990,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
>
> skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE);
> refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc);
> -
> - xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
> }
> }
>
> --
> 2.43.7
>