* [PATCH net 0/4] xsk: fix meta and publish of cq issues
@ 2026-05-10 1:23 Jason Xing
2026-05-10 1:23 ` [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Jason Xing @ 2026-05-10 1:23 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
The series is the product of previous review from sashiko[1].
1) META
patch 1: address TOCTOU around metadata.
2) PUBLISH of CQ
patch 2: make sure xsk_addr->addrs[] can be published to cq when
overflow occurs.
patch 3: keep draining the continuation descs (more than 17) and
publish their addresses when overflow occurs.
patch 4: like patch 3, but handles only the invalid-descriptor case.
[1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/
Jason Xing (4):
xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
xsk: drain continuation descs after overflow in xsk_build_skb()
xsk: drain continuation descs on invalid descriptor in
__xsk_generic_xmit()
include/net/xdp_sock.h | 1 +
net/xdp/xsk.c | 48 ++++++++++++++++++++++++++++++++++--------
2 files changed, 40 insertions(+), 9 deletions(-)
--
2.41.3
* [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-10 1:23 [PATCH net 0/4] xsk: fix meta and publish of cq issues Jason Xing
@ 2026-05-10 1:23 ` Jason Xing
2026-05-11 15:03 ` Stanislav Fomichev
2026-05-10 1:23 ` [PATCH net 2/4] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Jason Xing @ 2026-05-10 1:23 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
The TX metadata area resides in the UMEM buffer which is memory-mapped
and concurrently writable by userspace. In xsk_skb_metadata(),
csum_start and csum_offset are read from shared memory for bounds
validation, then read again for skb assignment. A malicious userspace
application can race to overwrite these values between the two reads,
bypassing the bounds check and causing out-of-bounds memory access
during checksum computation in the transmit path.
Fix this by reading csum_start and csum_offset into local variables
once, then using the local copies for both validation and assignment.
Note that other metadata fields (flags, launch_time) and the cached
csum fields may be mutually inconsistent due to concurrent userspace
writes, but this is benign: the only security-critical invariant is
that each field's validated value is the same one used, which local
caching guarantees.
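
For illustration only, the single-fetch pattern described above can be
sketched in userspace C. This is not the kernel code: toy_csum_request,
toy_skb and toy_apply_csum() are hypothetical names, sizeof(uint16_t)
stands in for sizeof(__sum16), and the volatile-qualified pointer is an
assumption of this sketch (the patch itself uses plain loads).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the csum request fields in shared memory. */
struct toy_csum_request {
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical stand-in for the skb fields that receive the values. */
struct toy_skb {
	uint16_t csum_start;
	uint16_t csum_offset;
};

/*
 * Snapshot both fields once, validate the snapshot, then use the snapshot.
 * The volatile qualifier (an assumption of this sketch) forces exactly one
 * load per field, so a concurrent writer cannot change a value between the
 * bounds check and the assignment.
 * Returns true and fills *out when the request fits within desc_len bytes.
 */
static bool toy_apply_csum(const volatile struct toy_csum_request *shared,
			   uint32_t desc_len, uint32_t headroom,
			   struct toy_skb *out)
{
	uint16_t start = shared->csum_start;
	uint16_t offset = shared->csum_offset;

	if ((uint32_t)start + offset + sizeof(uint16_t) > desc_len)
		return false;

	out->csum_start = (uint16_t)(headroom + start);
	out->csum_offset = offset;
	return true;
}
```

The values written to *out are always the ones that passed the bounds
check, whatever happens to the shared area afterwards.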
Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6bcd77068e52..cd039e397018 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -722,6 +722,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
u32 hr)
{
struct xsk_tx_metadata *meta = NULL;
+ u16 csum_start, csum_offset;
if (unlikely(pool->tx_metadata_len == 0))
return -EINVAL;
@@ -731,13 +732,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
return -EINVAL;
if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
- if (unlikely(meta->request.csum_start +
- meta->request.csum_offset +
+ csum_start = meta->request.csum_start;
+ csum_offset = meta->request.csum_offset;
+
+ if (unlikely(csum_start + csum_offset +
sizeof(__sum16) > desc->len))
return -EINVAL;
- skb->csum_start = hr + meta->request.csum_start;
- skb->csum_offset = meta->request.csum_offset;
+ skb->csum_start = hr + csum_start;
+ skb->csum_offset = csum_offset;
skb->ip_summed = CHECKSUM_PARTIAL;
if (unlikely(pool->tx_sw_csum)) {
--
2.41.3
* [PATCH net 2/4] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
2026-05-10 1:23 [PATCH net 0/4] xsk: fix meta and publish of cq issues Jason Xing
2026-05-10 1:23 ` [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
@ 2026-05-10 1:23 ` Jason Xing
2026-05-10 1:23 ` [PATCH net 3/4] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2026-05-10 1:23 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
This patch is inspired by the check[1] from sashiko, which points out
that when overflow happens, the address to be published to the cq is
invalid. The more severe problem is that the whole process of
publishing addresses to the cq is wrong in this particular case: as
long as descriptors are read from the txq, their addresses should truly
be published and the cached_prod in the cq advanced.
The following is the full analysis.
xsk_drop_skb() is called in three places, which all discard a partially
built multi-buffer skb:
1) xsk_build_skb() -EOVERFLOW error path: packet exceeds MAX_SKB_FRAGS
2) __xsk_generic_xmit() post-loop cleanup: an invalid descriptor in
the TX ring prevents the partial packet from completing
3) xsk_release(): socket close while xs->skb holds an incomplete packet
In all three cases, the TX descriptors for the already-processed frags
have been consumed from the TX ring (xskq_cons_release), and CQ slots
have been reserved. However, xsk_drop_skb() calls xsk_consume_skb()
which cancels the CQ reservations via xsk_cq_cancel_locked(). Since
the buffer addresses never appear in the completion queue, userspace
permanently loses track of these buffers.
Fix this by letting consume_skb() trigger the existing xsk_destruct_skb
destructor, which already submits buffer addresses to the CQ via
xsk_cq_submit_addr_locked().
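
As a rough illustration of why cancelling loses buffers while
submitting does not, here is a toy completion queue in userspace C.
It is only a sketch under simplifying assumptions: struct toy_cq and
the toy_cq_*() helpers are hypothetical, there is no locking, and the
ring size is arbitrary. It is not the xsk_queue implementation.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CQ_SIZE 8u /* hypothetical ring size, power of two */

struct toy_cq {
	uint64_t ring[TOY_CQ_SIZE];
	uint32_t cached_prod; /* slots reserved by the kernel side */
	uint32_t producer;    /* slots made visible to userspace */
};

/* Reserve one slot for a descriptor consumed from the TX ring. */
static int toy_cq_reserve(struct toy_cq *cq, uint32_t consumer)
{
	if (cq->cached_prod - consumer >= TOY_CQ_SIZE)
		return -1; /* completion queue is full */
	cq->cached_prod++;
	return 0;
}

/* Old drop behaviour: hand the reservations back, publish nothing. */
static void toy_cq_cancel(struct toy_cq *cq, uint32_t n)
{
	cq->cached_prod -= n;
}

/* New drop behaviour: write every address and make it visible. */
static void toy_cq_submit(struct toy_cq *cq, const uint64_t *addrs, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		cq->ring[(cq->producer + i) & (TOY_CQ_SIZE - 1)] = addrs[i];
	cq->producer += n;
}

/* Number of completions userspace can currently reap. */
static uint32_t toy_cq_visible(const struct toy_cq *cq, uint32_t consumer)
{
	return cq->producer - consumer;
}
```

After three reservations followed by toy_cq_cancel(), toy_cq_visible()
stays at 0 and the three buffers never come back to the application;
with toy_cq_submit() the same three addresses become reapable.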
Note that cancelling the descriptors back to the TX ring (via
xskq_cons_cancel_n) is not an appropriate option because an oversized
packet that always exceeds MAX_SKB_FRAGS would be retried indefinitely,
which is obviously a deadlock bug in the TX path.
Also move the desc->addr assignment in xsk_build_skb() above the
overflow check so that the current descriptor's address is recorded
before a potential -EOVERFLOW jump to free_err, consistent with the
zerocopy path in xsk_build_skb_zerocopy().
[1]: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index cd039e397018..3f1e590c855d 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -713,8 +713,11 @@ static void xsk_consume_skb(struct sk_buff *skb)
static void xsk_drop_skb(struct sk_buff *skb)
{
- xdp_sk(skb->sk)->tx->invalid_descs += xsk_get_num_desc(skb);
- xsk_consume_skb(skb);
+ struct xdp_sock *xs = xdp_sk(skb->sk);
+
+ xs->tx->invalid_descs += xsk_get_num_desc(skb);
+ consume_skb(skb);
+ xs->skb = NULL;
}
static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
@@ -796,7 +799,7 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
return ERR_PTR(-ENOMEM);
/* in case of -EOVERFLOW that could happen below,
- * xsk_consume_skb() will release this node as whole skb
+ * xsk_drop_skb() will release this node as whole skb
* would be dropped, which implies freeing all list elements
*/
xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
@@ -888,6 +891,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
goto free_err;
}
+ xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
+
if (unlikely(nr_frags == (MAX_SKB_FRAGS - 1) && xp_mb_desc(desc))) {
err = -EOVERFLOW;
goto free_err;
@@ -905,8 +910,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE);
refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc);
-
- xsk_addr->addrs[xsk_addr->num_descs] = desc->addr;
}
}
--
2.41.3
* [PATCH net 3/4] xsk: drain continuation descs after overflow in xsk_build_skb()
2026-05-10 1:23 [PATCH net 0/4] xsk: fix meta and publish of cq issues Jason Xing
2026-05-10 1:23 ` [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
2026-05-10 1:23 ` [PATCH net 2/4] xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx Jason Xing
@ 2026-05-10 1:23 ` Jason Xing
2026-05-10 1:23 ` [PATCH net 4/4] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
2026-05-11 14:16 ` [PATCH net 0/4] xsk: fix meta and publish of cq issues Jakub Kicinski
4 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2026-05-10 1:23 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
only the current descriptor is released from the TX ring. The remaining
continuation descriptors of the same packet stay in the ring. Since
xs->skb is set to NULL after the drop, the TX loop picks up these
leftover frags and misinterprets each one as the beginning of a new
packet, corrupting the packet stream.
Fix this by adding a drain_cont flag to xdp_sock. When overflow occurs
and the dropped descriptor has XDP_PKT_CONTD set, the flag is raised.
The main TX loop in __xsk_generic_xmit() then handles continuation
descriptors one at a time: each gets a normal CQ reservation (with
backpressure), its address is submitted to the completion queue, and
the descriptor is released from the TX ring. When the last fragment
(without XDP_PKT_CONTD) is processed, the flag is cleared and the
function returns -EOVERFLOW so the next call starts with a fresh
budget for normal packets.
This reuses the existing CQ backpressure and budget mechanisms, so if
the CQ is full the function returns -EAGAIN and userspace drains the
CQ before retrying. Zero buffer leakage, zero packet stream corruption.
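
To make the drain state machine easier to follow, here is a simplified
userspace model in C. It is a sketch only: toy_desc, toy_sock,
TOY_PKT_CONTD and the toy_drain*() helpers are hypothetical, and CQ
reservation, backpressure and locking are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PKT_CONTD 0x1u /* hypothetical stand-in for XDP_PKT_CONTD */
#define TOY_CQ_CAP 16u     /* hypothetical capacity of the toy CQ */

struct toy_desc {
	uint64_t addr;
	uint32_t options;
};

struct toy_sock {
	bool drain_cont;                /* mirrors the new xdp_sock flag */
	uint32_t invalid_descs;         /* per-descriptor drop counter */
	uint64_t completed[TOY_CQ_CAP]; /* addresses handed back */
	size_t n_completed;
};

/*
 * Handle one leftover descriptor while draining.
 * Returns true when it was the last fragment of the dropped packet,
 * which is the point where the patch returns -EOVERFLOW.
 */
static bool toy_drain_one(struct toy_sock *xs, const struct toy_desc *desc)
{
	assert(xs->drain_cont);
	assert(xs->n_completed < TOY_CQ_CAP);

	xs->completed[xs->n_completed++] = desc->addr; /* publish address */
	xs->invalid_descs++;

	if (!(desc->options & TOY_PKT_CONTD)) {
		xs->drain_cont = false; /* packet boundary reached */
		return true;
	}
	return false;
}

/* Drain starting at ring[pos]; returns the index of the next packet. */
static size_t toy_drain(struct toy_sock *xs, const struct toy_desc *ring,
			size_t pos, size_t len)
{
	while (xs->drain_cont && pos < len) {
		if (toy_drain_one(xs, &ring[pos++]))
			break;
	}
	return pos;
}
```

With two continuation descriptors followed by a final fragment and
then an unrelated packet, toy_drain() publishes three addresses, clears
drain_cont, and stops in front of the unrelated packet.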
Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
include/net/xdp_sock.h | 1 +
net/xdp/xsk.c | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+)
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 23e8861e8b25..1958d19d9925 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -80,6 +80,7 @@ struct xdp_sock {
* call of __xsk_generic_xmit().
*/
struct sk_buff *skb;
+ bool drain_cont;
struct list_head map_list;
/* Protects map_list */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 3f1e590c855d..232dd7126905 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -936,6 +936,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
xs->tx->invalid_descs++;
}
xskq_cons_release(xs->tx);
+ if (xp_mb_desc(desc))
+ xs->drain_cont = true;
} else {
/* Let application retry */
xsk_cq_cancel_locked(xs->pool, 1);
@@ -982,6 +984,26 @@ static int __xsk_generic_xmit(struct sock *sk)
goto out;
}
+ if (unlikely(xs->drain_cont)) {
+ unsigned long flags;
+ u32 idx;
+
+ spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
+ idx = xskq_get_prod(xs->pool->cq);
+ xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
+ xskq_prod_submit_n(xs->pool->cq, 1);
+ spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
+
+ xs->tx->invalid_descs++;
+ xskq_cons_release(xs->tx);
+ if (!xp_mb_desc(&desc)) {
+ xs->drain_cont = false;
+ err = -EOVERFLOW;
+ goto out;
+ }
+ continue;
+ }
+
skb = xsk_build_skb(xs, &desc);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
--
2.41.3
* [PATCH net 4/4] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit()
2026-05-10 1:23 [PATCH net 0/4] xsk: fix meta and publish of cq issues Jason Xing
` (2 preceding siblings ...)
2026-05-10 1:23 ` [PATCH net 3/4] xsk: drain continuation descs after overflow in xsk_build_skb() Jason Xing
@ 2026-05-10 1:23 ` Jason Xing
2026-05-11 14:16 ` [PATCH net 0/4] xsk: fix meta and publish of cq issues Jakub Kicinski
4 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2026-05-10 1:23 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When the TX loop in __xsk_generic_xmit() encounters an invalid
descriptor mid-packet (e.g. an out-of-bounds address), the partial
skb is dropped and the offending descriptor is released. However,
remaining continuation descriptors belonging to the same multi-buffer
packet still sit in the TX ring. Since xs->skb becomes NULL after the
drop, the next iteration treats the leftover continuation fragment as
a brand-new packet, corrupting the packet stream.
Fix this by setting the drain_cont flag when the released descriptor
has XDP_PKT_CONTD set. On the next call to __xsk_generic_xmit(), the
drain logic introduced in the previous patch handles the remaining
fragments with normal CQ backpressure.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 232dd7126905..b41ed44e3192 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1045,6 +1045,8 @@ static int __xsk_generic_xmit(struct sock *sk)
if (xs->skb)
xsk_drop_skb(xs->skb);
xskq_cons_release(xs->tx);
+ if (xp_mb_desc(&desc))
+ xs->drain_cont = true;
}
out:
--
2.41.3
* Re: [PATCH net 0/4] xsk: fix meta and publish of cq issues
2026-05-10 1:23 [PATCH net 0/4] xsk: fix meta and publish of cq issues Jason Xing
` (3 preceding siblings ...)
2026-05-10 1:23 ` [PATCH net 4/4] xsk: drain continuation descs on invalid descriptor in __xsk_generic_xmit() Jason Xing
@ 2026-05-11 14:16 ` Jakub Kicinski
2026-05-12 14:29 ` Jason Xing
4 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2026-05-11 14:16 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On Sun, 10 May 2026 09:23:06 +0800 Jason Xing wrote:
> The series is the product of previous review from sashiko[1].
This series breaks the BPF selftests, it seems:
Error: #277 ns_xsk_drv
Error: #277/21 ns_xsk_drv/ALIGNED_INV_DESC_MULTI_BUFF
Error: #277/22 ns_xsk_drv/TOO_MANY_FRAGS
Error: #278 ns_xsk_skb
Error: #278/21 ns_xsk_skb/ALIGNED_INV_DESC_MULTI_BUFF
Error: #278/22 ns_xsk_skb/TOO_MANY_FRAGS
* Re: [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-10 1:23 ` [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata() Jason Xing
@ 2026-05-11 15:03 ` Stanislav Fomichev
2026-05-12 14:32 ` Jason Xing
0 siblings, 1 reply; 9+ messages in thread
From: Stanislav Fomichev @ 2026-05-11 15:03 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On 05/10, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> The TX metadata area resides in the UMEM buffer which is memory-mapped
> and concurrently writable by userspace. In xsk_skb_metadata(),
> csum_start and csum_offset are read from shared memory for bounds
> validation, then read again for skb assignment. A malicious userspace
> application can race to overwrite these values between the two reads,
> bypassing the bounds check and causing out-of-bounds memory access
> during checksum computation in the transmit path.
>
> Fix this by reading csum_start and csum_offset into local variables
> once, then using the local copies for both validation and assignment.
>
> Note that other metadata fields (flags, launch_time) and the cached
> csum fields may be mutually inconsistent due to concurrent userspace
> writes, but this is benign: the only security-critical invariant is
> that each field's validated value is the same one used, which local
> caching guarantees.
>
> Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
> Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 6bcd77068e52..cd039e397018 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -722,6 +722,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> u32 hr)
> {
> struct xsk_tx_metadata *meta = NULL;
> + u16 csum_start, csum_offset;
>
> if (unlikely(pool->tx_metadata_len == 0))
> return -EINVAL;
> @@ -731,13 +732,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> return -EINVAL;
>
> if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
> - if (unlikely(meta->request.csum_start +
> - meta->request.csum_offset +
> + csum_start = meta->request.csum_start;
> + csum_offset = meta->request.csum_offset;
Wondering if it's better to READ_ONCE(x) these?
* Re: [PATCH net 0/4] xsk: fix meta and publish of cq issues
2026-05-11 14:16 ` [PATCH net 0/4] xsk: fix meta and publish of cq issues Jakub Kicinski
@ 2026-05-12 14:29 ` Jason Xing
0 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2026-05-12 14:29 UTC (permalink / raw)
To: Jakub Kicinski
Cc: davem, edumazet, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On Mon, May 11, 2026 at 10:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 10 May 2026 09:23:06 +0800 Jason Xing wrote:
> > The series is the product of previous review from sashiko[1].
>
> This series breaks the BPF selftests, it seems:
>
> Error: #277 ns_xsk_drv
> Error: #277/21 ns_xsk_drv/ALIGNED_INV_DESC_MULTI_BUFF
> Error: #277/22 ns_xsk_drv/TOO_MANY_FRAGS
> Error: #278 ns_xsk_skb
> Error: #278/21 ns_xsk_skb/ALIGNED_INV_DESC_MULTI_BUFF
> Error: #278/22 ns_xsk_skb/TOO_MANY_FRAGS
Thanks. I should've updated the selftest as well.
Thanks,
Jason
* Re: [PATCH net 1/4] xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
2026-05-11 15:03 ` Stanislav Fomichev
@ 2026-05-12 14:32 ` Jason Xing
0 siblings, 0 replies; 9+ messages in thread
From: Jason Xing @ 2026-05-12 14:32 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On Mon, May 11, 2026 at 11:03 PM Stanislav Fomichev
<sdf.kernel@gmail.com> wrote:
>
> On 05/10, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > The TX metadata area resides in the UMEM buffer which is memory-mapped
> > and concurrently writable by userspace. In xsk_skb_metadata(),
> > csum_start and csum_offset are read from shared memory for bounds
> > validation, then read again for skb assignment. A malicious userspace
> > application can race to overwrite these values between the two reads,
> > bypassing the bounds check and causing out-of-bounds memory access
> > during checksum computation in the transmit path.
> >
> > Fix this by reading csum_start and csum_offset into local variables
> > once, then using the local copies for both validation and assignment.
> >
> > Note that other metadata fields (flags, launch_time) and the cached
> > csum fields may be mutually inconsistent due to concurrent userspace
> > writes, but this is benign: the only security-critical invariant is
> > that each field's validated value is the same one used, which local
> > caching guarantees.
> >
> > Closes: https://lore.kernel.org/all/20260503200927.73EA1C2BCB4@smtp.kernel.org/
> > Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support")
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > net/xdp/xsk.c | 11 +++++++----
> > 1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 6bcd77068e52..cd039e397018 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -722,6 +722,7 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> > u32 hr)
> > {
> > struct xsk_tx_metadata *meta = NULL;
> > + u16 csum_start, csum_offset;
> >
> > if (unlikely(pool->tx_metadata_len == 0))
> > return -EINVAL;
> > @@ -731,13 +732,15 @@ static int xsk_skb_metadata(struct sk_buff *skb, void *buffer,
> > return -EINVAL;
> >
> > if (meta->flags & XDP_TXMD_FLAGS_CHECKSUM) {
> > - if (unlikely(meta->request.csum_start +
> > - meta->request.csum_offset +
> > + csum_start = meta->request.csum_start;
> > + csum_offset = meta->request.csum_offset;
>
> Wondering if it's better to READ_ONCE(x) these?
I still chose not to use it after reading the suggestion from a local
AI. The reason is that there is no paired WRITE_ONCE to guarantee the
access is free of data races. I also checked some existing code around
the buffer shared between userspace and the kernel and did not find
any use of XXXX_ONCE(). Does that make sense to you :) ?
Thanks,
Jason