From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH RFC net-next v4 11/14] xsk: support batch xmit main logic
Reply-To: sashiko@lists.linux.dev
To: "Jason Xing"
Cc: bpf@vger.kernel.org
In-Reply-To: <20260415082654.21026-12-kerneljasonxing@gmail.com>
References: <20260415082654.21026-12-kerneljasonxing@gmail.com>
Content-Type: text/plain; charset=utf-8
Date: Wed, 15 Apr 2026 09:38:00 +0000
Message-Id: <20260415093801.0CECEC19424@smtp.kernel.org>
X-Mailing-List: bpf@vger.kernel.org

Sashiko AI review found 5 potential issues:

- [Critical] TOCTOU race on socket state pointers, leading to a NULL
  pointer dereference or use-after-free.
- [Critical] Integer underflow in `xsk_alloc_batch_skb()`, leading to a
  massive out-of-bounds memory allocation and kernel crash.
- [Critical] Unconditional dropping of partially built packets (`xs->skb`),
  causing buffer leaks and packet stream corruption.
- [High] Double cancellation of CQ slots and incorrect TX pointer
  manipulation on SKB build failure.
- [High] Infinite `-EAGAIN` livelock and stat inflation when a packet
  spans a batch boundary.

--
commit 4041fe3cb310f6f01b04ae2ff1fbd04cf0aa2713
Author: Jason Xing

    xsk: support batch xmit main logic

    This commit implements the core logic for batched transmission in
    AF_XDP sockets via __xsk_generic_xmit_batch(). It introduces a loop
    to read descriptors, reserve completion queue slots, allocate SKBs,
    and transmit packets in batches.

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index b4f3cc7125804..16e47f6fc30ec 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
[ ...
]
> @@ -914,6 +914,122 @@ struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> 	return ERR_PTR(err);
> }
>
> +static int __xsk_generic_xmit_batch(struct xdp_sock *xs)
> +{
> +	struct xsk_buff_pool *pool = xs->pool;
> +	struct xsk_batch *batch = &xs->batch;
> +	struct xdp_desc *descs = batch->desc_cache;
> +	struct net_device *dev = xs->dev;
> +	u32 max_batch, max_budget;
> +	bool sent_frame = false;
> +	struct sk_buff *skb;
> +	u32 cons_descs;
> +	int err = 0;
> +	u32 i = 0;
> +
> +	mutex_lock(&xs->mutex);

Can a TOCTOU race occur here with a concurrent socket bind() or
setsockopt()? The pointers for pool, dev, and batch->desc_cache are read
before the mutex is acquired. If another thread modifies the socket state,
these pointers could be stale or NULL by the time xsk_is_bound() is
checked, leading to a use-after-free or NULL pointer dereference when dev
or descs is accessed.

> +
> +	/* Since we dropped the RCU read lock, the socket state might have changed. */
> +	if (unlikely(!xsk_is_bound(xs))) {

[ ... ]

> +	for (i = 0; i < max_budget; i += cons_descs) {
> +		u32 nb_pkts = 0;
> +		u32 nb_descs;

[ ... ]

> +		cons_descs = xskq_cons_read_desc_batch_copy(xs->tx, pool, descs,
> +							    nb_descs, &nb_pkts);
> +		if (cons_descs < nb_descs) {
> +			u32 delta = nb_descs - cons_descs;
> +
> +			xsk_cq_cancel_locked(pool, delta);
> +			xs->tx->queue_empty_descs += delta;
> +			if (!cons_descs) {
> +				err = -EAGAIN;
> +				goto out;
> +			}

Could this lead to an infinite -EAGAIN livelock and inflated statistics?
If a multi-fragment packet extends beyond the nb_descs batch limit without
an end-of-packet descriptor, xskq_cons_read_desc_batch_copy() may safely
rewind the fragments and return 0. The code then treats these valid
descriptors as empty, adds them to queue_empty_descs, and returns -EAGAIN,
so the application loops forever hitting the same boundary.
> +			nb_descs = cons_descs;
> +		}
> +
> +		cons_descs = xsk_alloc_batch_skb(xs, nb_pkts, nb_descs, &err);

Does this calculation of nb_pkts risk an integer underflow? If a batch
contains only fragments without an end-of-packet flag, nb_pkts will be 0.
Inside xsk_alloc_batch_skb(), if there is an existing partially built skb
(xs->skb != NULL), the code may decrement nb_pkts. Since nb_pkts is
unsigned, nb_pkts-- would wrap to 0xFFFFFFFF, resulting in a massive
out-of-bounds memory allocation.

> +		/* Return 'nb_descs - cons_descs' number of descs to the
> +		 * pool if the batch allocation partially fails
> +		 */
> +		if (cons_descs < nb_descs) {
> +			xskq_cons_cancel_n(xs->tx, nb_descs - cons_descs);
> +			xsk_cq_cancel_locked(pool, nb_descs - cons_descs);
> +		}

Can this cause a double cancellation of completion queue slots? If
xsk_build_skb() fails inside xsk_alloc_batch_skb() due to memory limits,
it already calls xsk_cq_cancel_locked() internally:

net/xdp/xsk.c:xsk_build_skb()
{
	...
	if (err == -EOVERFLOW) {
		...
	} else {
		xsk_cq_cancel_locked(xs->pool, 1);
	}
}

Calling xsk_cq_cancel_locked() again here for the failed descriptors would
underflow the completion queue pointers, potentially corrupting the
user-space ring buffer.

Also, does xsk_build_skb() advancing the TX consumer pointer on -EOVERFLOW
conflict with the rewind happening here?

[ ...
]
> +		if (err)
> +			goto out;
> +	}
> +
> +	/* Maximum budget of descriptors have been consumed */
> +	if (xskq_has_descs(xs->tx))
> +		err = -EAGAIN;
> +
> +out:
> +	if (xs->skb)
> +		xsk_drop_skb(xs->skb);

Does unconditionally dropping the partially built xs->skb corrupt the
packet stream? If transmission hits NETDEV_TX_BUSY (-EAGAIN), or a valid
multi-fragment packet spans a batch boundary, the code drops the fragments
processed so far. Because the completion queue slots and TX pointers for
these early fragments might not be properly rewound or notified, this
could permanently leak buffers and merge unrelated fragments into
corrupted packets on the next sendmsg() retry.

> +
> +	/* If send_queue has more pending skbs, we must to clear
> +	 * the rest of them.
> +	 */
> +	while ((skb = __skb_dequeue(&batch->send_queue)) != NULL) {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260415082654.21026-1-kerneljasonxing@gmail.com?part=11