From: sashiko-bot@kernel.org
To: "Jason Xing" <kerneljasonxing@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH RFC net-next v4 11/14] xsk: support batch xmit main logic
Date: Wed, 15 Apr 2026 09:38:00 +0000 [thread overview]
Message-ID: <20260415093801.0CECEC19424@smtp.kernel.org> (raw)
In-Reply-To: <20260415082654.21026-12-kerneljasonxing@gmail.com>
Sashiko AI review found 5 potential issue(s):
- [Critical] TOCTOU race on socket state pointers leading to NULL pointer dereference and Use-After-Free.
- [Critical] Integer underflow in `xsk_alloc_batch_skb()` leading to massive OOB memory allocation and kernel crash.
- [Critical] Unconditional dropping of partially built packets (`xs->skb`) causes buffer leaks and packet stream corruption.
- [High] Double cancellation of CQ slots and incorrect TX pointer manipulation on SKB build failure.
- [High] Infinite `-EAGAIN` livelock and stat inflation when packets span batch boundaries.
--
commit 4041fe3cb310f6f01b04ae2ff1fbd04cf0aa2713
Author: Jason Xing <kernelxing@tencent.com>
xsk: support batch xmit main logic
This commit implements the core logic for batched transmission in AF_XDP
sockets via __xsk_generic_xmit_batch(). It introduces a loop to read
descriptors, reserve completion queue slots, allocate SKBs, and transmit
packets in batches.
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index b4f3cc7125804..16e47f6fc30ec 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
[ ... ]
> @@ -914,6 +914,122 @@ struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> return ERR_PTR(err);
> }
>
> +static int __xsk_generic_xmit_batch(struct xdp_sock *xs)
> +{
> + struct xsk_buff_pool *pool = xs->pool;
> + struct xsk_batch *batch = &xs->batch;
> + struct xdp_desc *descs = batch->desc_cache;
> + struct net_device *dev = xs->dev;
> + u32 max_batch, max_budget;
> + bool sent_frame = false;
> + struct sk_buff *skb;
> + u32 cons_descs;
> + int err = 0;
> + u32 i = 0;
> +
> + mutex_lock(&xs->mutex);
Can a TOCTOU race occur here with a concurrent socket bind() or setsockopt()?
The pointers for pool, dev, and batch->desc_cache are read before the mutex
is acquired. If another thread modifies the socket state, could these
pointers be NULL or freed by the time xsk_is_bound() is checked, leading to
a use-after-free or NULL pointer dereference when accessing dev or descs?
> +
> + /* Since we dropped the RCU read lock, the socket state might have changed. */
> + if (unlikely(!xsk_is_bound(xs))) {
[ ... ]
> + for (i = 0; i < max_budget; i += cons_descs) {
> + u32 nb_pkts = 0;
> + u32 nb_descs;
[ ... ]
> + cons_descs = xskq_cons_read_desc_batch_copy(xs->tx, pool, descs,
> + nb_descs, &nb_pkts);
> + if (cons_descs < nb_descs) {
> + u32 delta = nb_descs - cons_descs;
> +
> + xsk_cq_cancel_locked(pool, delta);
> + xs->tx->queue_empty_descs += delta;
> + if (!cons_descs) {
> + err = -EAGAIN;
> + goto out;
> + }
Could this lead to an infinite -EAGAIN livelock and inflated statistics?
If a multi-fragment packet spans beyond the nb_descs batch limit without an
end-of-packet descriptor, xskq_cons_read_desc_batch_copy() might safely
rewind the fragments and return 0. The code then treats these valid
descriptors as empty, adds to queue_empty_descs, and returns -EAGAIN,
causing the application to loop infinitely hitting the same boundary.
> + nb_descs = cons_descs;
> + }
> +
> + cons_descs = xsk_alloc_batch_skb(xs, nb_pkts, nb_descs, &err);
Does this calculation of nb_pkts trigger an integer underflow?
If a batch contains only fragments without an end-of-packet flag, nb_pkts
will be 0. Inside xsk_alloc_batch_skb(), if there is an existing partially
built skb (xs->skb != NULL), the code may decrement nb_pkts. Since nb_pkts
is unsigned, nb_pkts-- would underflow to 0xFFFFFFFF, resulting in a
massive out-of-bounds memory allocation.
> + /* Return 'nb_descs - cons_descs' number of descs to the
> + * pool if the batch allocation partially fails
> + */
> + if (cons_descs < nb_descs) {
> + xskq_cons_cancel_n(xs->tx, nb_descs - cons_descs);
> + xsk_cq_cancel_locked(pool, nb_descs - cons_descs);
> + }
Can this cause a double cancellation of completion queue slots?
If xsk_build_skb() fails inside xsk_alloc_batch_skb() due to memory limits,
it already calls xsk_cq_cancel_locked() internally:
net/xdp/xsk.c:xsk_build_skb() {
...
if (err == -EOVERFLOW) {
...
} else {
xsk_cq_cancel_locked(xs->pool, 1);
}
}
Calling xsk_cq_cancel_locked() again here for the failed descriptors would
underflow the completion queue pointers, potentially corrupting the user
space ring buffer.
Also, does xsk_build_skb() advancing the TX consumer pointer on -EOVERFLOW
conflict with the rewind happening here?
[ ... ]
> + if (err)
> + goto out;
> + }
> +
> + /* Maximum budget of descriptors have been consumed */
> + if (xskq_has_descs(xs->tx))
> + err = -EAGAIN;
> +
> +out:
> + if (xs->skb)
> + xsk_drop_skb(xs->skb);
Does unconditionally dropping the partially built xs->skb corrupt the packet
stream?
If transmission hits a NETDEV_TX_BUSY (-EAGAIN) or a valid multi-fragment
packet spans a batch boundary, the code drops the fragments processed so far.
Because the completion queue slots and TX pointers for these early fragments
might not be properly rewound or notified, does this permanently leak buffers
and merge unrelated fragments into corrupted packets on the next sendmsg()
retry?
> +
> + /* If send_queue has more pending skbs, we must to clear
> + * the rest of them.
> + */
> + while ((skb = __skb_dequeue(&batch->send_queue)) != NULL) {
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260415082654.21026-1-kerneljasonxing@gmail.com?part=11
next prev parent reply other threads:[~2026-04-15 9:38 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 8:26 [PATCH RFC net-next v4 00/14] xsk: batch xmit in copy mode Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 01/14] xsk: introduce XDP_GENERIC_XMIT_BATCH setsockopt Jason Xing
2026-04-15 8:51 ` sashiko-bot
2026-04-15 12:46 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 02/14] xsk: extend xsk_build_skb() to support passing an already allocated skb Jason Xing
2026-04-15 8:52 ` sashiko-bot
2026-04-15 13:19 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 03/14] xsk: add xsk_alloc_batch_skb() to build skbs in batch Jason Xing
2026-04-15 9:17 ` sashiko-bot
2026-04-16 1:18 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 04/14] xsk: cache data buffers to avoid frequently calling kmalloc_reserve Jason Xing
2026-04-15 9:38 ` sashiko-bot
2026-04-16 2:45 ` Jason Xing
2026-04-16 12:18 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 05/14] xsk: add direct xmit in batch function Jason Xing
2026-04-15 9:11 ` sashiko-bot
2026-04-16 3:04 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 06/14] xsk: support dynamic xmit.more control for batch xmit Jason Xing
2026-04-15 9:35 ` sashiko-bot
2026-04-16 3:43 ` Jason Xing
2026-04-16 4:50 ` Dmitry Torokhov
2026-04-16 4:51 ` Dmitry Torokhov
2026-04-15 8:26 ` [PATCH RFC net-next v4 07/14] xsk: try to skip validating skb list in xmit path Jason Xing
2026-04-15 9:33 ` sashiko-bot
2026-04-16 5:55 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 08/14] xsk: rename nb_pkts to nb_descs in xsk_tx_peek_release_desc_batch Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 09/14] xsk: extend xskq_cons_read_desc_batch to count nb_pkts Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 10/14] xsk: extend xsk_cq_reserve_locked() to reserve n slots Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 11/14] xsk: support batch xmit main logic Jason Xing
2026-04-15 9:38 ` sashiko-bot [this message]
2026-04-16 9:58 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 12/14] xsk: separate read-mostly and write-heavy fields in xsk_buff_pool Jason Xing
2026-04-15 9:20 ` sashiko-bot
2026-04-16 10:09 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 13/14] xsk: retire old xmit path in copy mode Jason Xing
2026-04-15 9:18 ` sashiko-bot
2026-04-16 10:33 ` Jason Xing
2026-04-15 8:26 ` [PATCH RFC net-next v4 14/14] xsk: optimize xsk_build_skb for batch copy-mode fast path Jason Xing
2026-04-15 9:47 ` sashiko-bot
2026-04-16 13:12 ` Jason Xing
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260415093801.0CECEC19424@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=kerneljasonxing@gmail.com \
--cc=sashiko@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox