* [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation
@ 2026-04-22 3:36 Jason Xing
2026-04-22 3:36 ` [PATCH net v3 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices Jason Xing
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
There are rare issues around xsk_build_skb(). Some of them
were founded by Sashiko[1][2].
[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/
[2]: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/
---
v3
Link: https://lore.kernel.org/all/20260420082805.14844-1-kerneljasonxing@gmail.com/
1. use !xs->skb as the indicator of first frag in patch 3 (Stan)
2. move init skb after it's allocated successfully, so patch 4,5,6 have
been adjusted accordingly (Stan)
3. do not support 32-bit arch (Stan)
4. add acked-by tags (Stan)
v2
Link: https://lore.kernel.org/all/20260418045644.28612-1-kerneljasonxing@gmail.com/#t
1. add four patches spotted by sashiko to fix buggy pre-existing
behavior
2. adjust the order of 8 patches.
Jason Xing (8):
xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
xsk: handle NULL dereference of the skb without frags issue
xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
xsk: avoid skb leak in XDP_TX_METADATA case
xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
xsk: fix xsk_addrs slab leak on multi-buffer error path
xsk: don't support AF_XDP on 32-bit architectures
net/xdp/Kconfig | 2 +-
net/xdp/xsk.c | 31 ++++++++++++++++++++-----------
net/xdp/xsk_buff_pool.c | 3 +++
3 files changed, 24 insertions(+), 12 deletions(-)
--
2.41.3
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net v3 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 3:36 ` [PATCH net v3 2/8] xsk: handle NULL dereference of the skb without frags issue Jason Xing
` (6 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb->data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.
AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.
The patch tries to catch this and reject the combination with error at
setup time. Rejecting at bind() converts this silent per-packet failure
into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and
launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
because they do not call skb_checksum_help().
Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) >= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
(invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
the same descriptor and re-hits the same WARN until the socket
is closed.
Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk_buff_pool.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 37b7a68b89b3..c2521b6547e3 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -169,6 +169,9 @@ int xp_assign_dev(struct xsk_buff_pool *pool,
if (force_zc && force_copy)
return -EINVAL;
+ if (pool->tx_sw_csum && (netdev->priv_flags & IFF_TX_SKB_NO_LINEAR))
+ return -EOPNOTSUPP;
+
if (xsk_get_pool_from_qid(netdev, queue_id))
return -EBUSY;
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 2/8] xsk: handle NULL dereference of the skb without frags issue
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
2026-04-22 3:36 ` [PATCH net v3 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 3:36 ` [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
` (5 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy (e.g., MAX_SKB_FRAGS exceeded), the free_err
EOVERFLOW handler unconditionally dereferences xs->skb via
xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL
pointer dereference.
In this series, the skb is already freed by kfree_skb() inside
xsk_build_skb_zerocopy for the first-descriptor case, so we only need
to do the bookkeeping: cancel the one reserved CQ slot and account for
the single invalid descriptor.
Guard the existing xsk_inc_num_desc/xsk_drop_skb calls with an
xs->skb check (for the continuation case), and add an else branch
for the first-descriptor case that manually cancels the CQ slot and
increments invalid_descs by one.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6149f6a79897..6521604f8d42 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -893,9 +893,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
kfree_skb(skb);
if (err == -EOVERFLOW) {
- /* Drop the packet */
- xsk_inc_num_desc(xs->skb);
- xsk_drop_skb(xs->skb);
+ if (xs->skb) {
+ /* Drop the packet */
+ xsk_inc_num_desc(xs->skb);
+ xsk_drop_skb(xs->skb);
+ } else {
+ xsk_cq_cancel_locked(xs->pool, 1);
+ xs->tx->invalid_descs++;
+ }
xskq_cons_release(xs->tx);
} else {
/* Let application retry */
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
2026-04-22 3:36 ` [PATCH net v3 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices Jason Xing
2026-04-22 3:36 ` [PATCH net v3 2/8] xsk: handle NULL dereference of the skb without frags issue Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb() Jason Xing
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.
If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:
if (skb && !skb_shinfo(skb)->nr_frags)
kfree_skb(skb);
evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.
Fix by using a !xs->skb check to handle first frag situation, ensuring
we only free skbs that were freshly allocated in this call
(xs->skb is NULL) and never free an in-progress multi-buffer skb that
the caller still references.
Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6521604f8d42..d23d1b14b8b4 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -889,7 +889,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
return skb;
free_err:
- if (skb && !skb_shinfo(skb)->nr_frags)
+ if (skb && !xs->skb)
kfree_skb(skb);
if (err == -EOVERFLOW) {
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
` (2 preceding siblings ...)
2026-04-22 3:36 ` [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
` (3 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.
Postpone the init phase when we believe the allocation of first frag is
successfully completed. Before this init, skb can be safely freed by
kfree_skb().
Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index d23d1b14b8b4..88ec6d2cbbcf 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -739,8 +739,6 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
return ERR_PTR(err);
skb_reserve(skb, hr);
-
- xsk_skb_init_misc(skb, xs, desc->addr);
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
if (unlikely(err))
@@ -834,7 +832,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
if (unlikely(err))
goto free_err;
- xsk_skb_init_misc(skb, xs, desc->addr);
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc,
xs->pool, hr);
@@ -884,6 +881,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
}
}
+ if (!xs->skb)
+ xsk_skb_init_misc(skb, xs, desc->addr);
xsk_inc_num_desc(skb);
return skb;
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
` (3 preceding siblings ...)
2026-04-22 3:36 ` [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb() Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
` (2 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.
How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.
Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 88ec6d2cbbcf..c49b58199d2f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -741,8 +741,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
skb_reserve(skb, hr);
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
- if (unlikely(err))
+ if (unlikely(err)) {
+ kfree_skb(skb);
return ERR_PTR(err);
+ }
}
} else {
struct xsk_addrs *xsk_addr;
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
` (4 preceding siblings ...)
2026-04-22 3:36 ` [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 7/8] xsk: fix xsk_addrs slab leak on multi-buffer error path Jason Xing
2026-04-22 3:36 ` [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures Jason Xing
7 siblings, 1 reply; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.
How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
is why bug can be triggered.
4. there is no chance to free this skb anymore.
Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index c49b58199d2f..5e6326e076ab 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -776,8 +776,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
addr = buffer - pool->addrs;
for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
- if (unlikely(i >= MAX_SKB_FRAGS))
+ if (unlikely(i >= MAX_SKB_FRAGS)) {
+ if (!xs->skb)
+ kfree_skb(skb);
return ERR_PTR(-EOVERFLOW);
+ }
page = pool->umem->pgs[addr >> PAGE_SHIFT];
get_page(page);
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 7/8] xsk: fix xsk_addrs slab leak on multi-buffer error path
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
` (5 preceding siblings ...)
2026-04-22 3:36 ` [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 3:36 ` [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures Jason Xing
7 siblings, 0 replies; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to >= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).
If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs->skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1
If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -> xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs > 1:
if (unlikely(num_descs > 1))
kmem_cache_free(xsk_tx_generic_cache, destructor_arg);
Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.
Fix it by directly testing if destructor_arg is still addr. Or else it
is modified and used to store the newly allocated memory from
xsk_tx_generic_cache regardless of increment of num_desc, which we
need to handle.
Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5e6326e076ab..ed96f6ec8ff2 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -605,7 +605,7 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
spin_lock_irqsave(&pool->cq_prod_lock, flags);
idx = xskq_get_prod(pool->cq);
- if (unlikely(num_descs > 1)) {
+ if (unlikely(!xsk_skb_destructor_is_addr(skb))) {
xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
for (i = 0; i < num_descs; i++) {
@@ -660,7 +660,7 @@ static void xsk_consume_skb(struct sk_buff *skb)
u32 num_descs = xsk_get_num_desc(skb);
struct xsk_addrs *xsk_addr;
- if (unlikely(num_descs > 1)) {
+ if (unlikely(!xsk_skb_destructor_is_addr(skb))) {
xsk_addr = (struct xsk_addrs *)skb_shinfo(skb)->destructor_arg;
kmem_cache_free(xsk_tx_generic_cache, xsk_addr);
}
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
` (6 preceding siblings ...)
2026-04-22 3:36 ` [PATCH net v3 7/8] xsk: fix xsk_addrs slab leak on multi-buffer error path Jason Xing
@ 2026-04-22 3:36 ` Jason Xing
2026-04-22 16:09 ` Alexander Lobakin
` (2 more replies)
7 siblings, 3 replies; 19+ messages in thread
From: Jason Xing @ 2026-04-22 3:36 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
uintptr_t cast:
skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.
Since we hear no one is using AF_XDP on 32-bit arch, we decided to
strictly stop supporting it at compile time.
Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
index 71af2febe72a..819aa5795f50 100644
--- a/net/xdp/Kconfig
+++ b/net/xdp/Kconfig
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
config XDP_SOCKETS
bool "XDP sockets"
- depends on BPF_SYSCALL
+ depends on BPF_SYSCALL && 64BIT
default n
help
XDP sockets allows a channel between XDP programs and
--
2.41.3
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 3:36 ` [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures Jason Xing
@ 2026-04-22 16:09 ` Alexander Lobakin
2026-04-22 16:37 ` Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 20:27 ` David Laight
2 siblings, 1 reply; 19+ messages in thread
From: Alexander Lobakin @ 2026-04-22 16:09 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Wed, 22 Apr 2026 11:36:50 +0800
> From: Jason Xing <kernelxing@tencent.com>
>
> In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> uintptr_t cast:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> mode the chunk offset is encoded in bits 48-63 of the descriptor
> address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> lost entirely. The completion queue then returns a truncated address to
> userspace, making buffer recycling impossible.
What if we relax the restriction a bit? For example, refuse to configure
an XSk socket in unaligned mode if on a 32-bit arch? Or add a check
under CONFIG_32_BIT like it was done in Page Pool:
skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
#ifdef CONFIG_32BIT
if (((uintptr_t)skb_shinfo(skb)->destructor_arg) & ~0x1UL) != addr)
// WARN_ONCE or whatever + error path
#endif
I never used XSk on a 32-bit arch, but back when I was working on 32-bit
MIPS 1G routers, I wanted to add native XSk support to the Eth driver.
Sure, just for fun, now that we have cheap AArch64 and other 64-bit
embedded chips, 32-bit embedded networking SoCs are almost dead, but
OTOH, as you can see, other subsystems like PP still try to support 32 bit.
Especially given that this issue applies to only to the skb XSk path,
not native in-driver implementations.
>
> Since we hear no one is using AF_XDP on 32-bit arch, we decided to
> strictly stop supporting it at compile time.
>
> Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> index 71af2febe72a..819aa5795f50 100644
> --- a/net/xdp/Kconfig
> +++ b/net/xdp/Kconfig
> @@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> config XDP_SOCKETS
> bool "XDP sockets"
> - depends on BPF_SYSCALL
> + depends on BPF_SYSCALL && 64BIT
> default n
> help
> XDP sockets allows a channel between XDP programs and
Thanks,
Olek
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
2026-04-22 3:36 ` [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
@ 2026-04-22 16:31 ` Stanislav Fomichev
0 siblings, 0 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
> From: Jason Xing <kernelxing@tencent.com>
>
> When xsk_build_skb() processes multi-buffer packets in copy mode, the
> first descriptor stores data into the skb linear area without adding
> any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
> to accumulate subsequent descriptors.
>
> If a continuation descriptor fails (e.g. alloc_page returns NULL with
> -EAGAIN), we jump to free_err where the condition:
>
> if (skb && !skb_shinfo(skb)->nr_frags)
> kfree_skb(skb);
>
> evaluates to true because nr_frags is still 0 (the first descriptor
> used the linear area, not frags). This frees the skb while xs->skb
> still points to it, creating a dangling pointer. On the next transmit
> attempt or socket close, xs->skb is dereferenced, causing a
> use-after-free or double-free.
>
> Fix by using a !xs->skb check to handle first frag situation, ensuring
> we only free skbs that were freshly allocated in this call
> (xs->skb is NULL) and never free an in-progress multi-buffer skb that
> the caller still references.
>
> Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
> Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 6521604f8d42..d23d1b14b8b4 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -889,7 +889,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> return skb;
>
> free_err:
> - if (skb && !skb_shinfo(skb)->nr_frags)
> + if (skb && !xs->skb)
> kfree_skb(skb);
>
> if (err == -EOVERFLOW) {
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()
2026-04-22 3:36 ` [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb() Jason Xing
@ 2026-04-22 16:31 ` Stanislav Fomichev
0 siblings, 0 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
> From: Jason Xing <kernelxing@tencent.com>
>
> Once xsk_skb_init_misc() has been called on an skb, its destructor is
> set to xsk_destruct_skb(), which submits the descriptor address(es) to
> the completion queue and advances the CQ producer. If such an skb is
> subsequently freed via kfree_skb() along an error path - before the
> skb has ever been handed to the driver - the destructor still runs and
> submits a bogus, half-initialized address to the CQ.
>
> Postpone the init phase when we believe the allocation of first frag is
> successfully completed. Before this init, skb can be safely freed by
> kfree_skb().
>
> Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
> Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index d23d1b14b8b4..88ec6d2cbbcf 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -739,8 +739,6 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> return ERR_PTR(err);
>
> skb_reserve(skb, hr);
> -
> - xsk_skb_init_misc(skb, xs, desc->addr);
> if (desc->options & XDP_TX_METADATA) {
> err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
> if (unlikely(err))
> @@ -834,7 +832,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> if (unlikely(err))
> goto free_err;
>
> - xsk_skb_init_misc(skb, xs, desc->addr);
> if (desc->options & XDP_TX_METADATA) {
> err = xsk_skb_metadata(skb, buffer, desc,
> xs->pool, hr);
> @@ -884,6 +881,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> }
> }
>
> + if (!xs->skb)
> + xsk_skb_init_misc(skb, xs, desc->addr);
> xsk_inc_num_desc(skb);
>
> return skb;
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-22 3:36 ` [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
@ 2026-04-22 16:31 ` Stanislav Fomichev
0 siblings, 0 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
>
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means no frag and xs->skb is
> NULL) and users enable metadata feature.
> 2. xsk_skb_metadata() returns a error code.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> 4. there is no chance to free this skb anymore.
>
> Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 88ec6d2cbbcf..c49b58199d2f 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -741,8 +741,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> skb_reserve(skb, hr);
> if (desc->options & XDP_TX_METADATA) {
> err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
> - if (unlikely(err))
> + if (unlikely(err)) {
> + kfree_skb(skb);
> return ERR_PTR(err);
> + }
> }
> } else {
> struct xsk_addrs *xsk_addr;
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
2026-04-22 3:36 ` [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
@ 2026-04-22 16:31 ` Stanislav Fomichev
0 siblings, 0 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
>
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means xs->skb is NULL) and
> hit the limit MAX_SKB_FRAGS.
> 2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
> is why bug can be triggered.
> 4. there is no chance to free this skb anymore.
>
> Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
> call xsk_drop_skb(xs->skb) to do the right thing.
>
> Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index c49b58199d2f..5e6326e076ab 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -776,8 +776,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> addr = buffer - pool->addrs;
>
> for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
> - if (unlikely(i >= MAX_SKB_FRAGS))
> + if (unlikely(i >= MAX_SKB_FRAGS)) {
> + if (!xs->skb)
> + kfree_skb(skb);
> return ERR_PTR(-EOVERFLOW);
> + }
>
> page = pool->umem->pgs[addr >> PAGE_SHIFT];
> get_page(page);
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 3:36 ` [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures Jason Xing
2026-04-22 16:09 ` Alexander Lobakin
@ 2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 20:27 ` David Laight
2 siblings, 0 replies; 19+ messages in thread
From: Stanislav Fomichev @ 2026-04-22 16:31 UTC (permalink / raw)
To: Jason Xing; +Cc: bpf, netdev, Jason Xing
> From: Jason Xing <kernelxing@tencent.com>
>
> In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> uintptr_t cast:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> mode the chunk offset is encoded in bits 48-63 of the descriptor
> address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> lost entirely. The completion queue then returns a truncated address to
> userspace, making buffer recycling impossible.
>
> Since we hear no one is using AF_XDP on 32-bit arch, we decided to
> strictly stop supporting it at compile time.
>
> Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> index 71af2febe72a..819aa5795f50 100644
> --- a/net/xdp/Kconfig
> +++ b/net/xdp/Kconfig
> @@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> config XDP_SOCKETS
> bool "XDP sockets"
> - depends on BPF_SYSCALL
> + depends on BPF_SYSCALL && 64BIT
> default n
> help
> XDP sockets allows a channel between XDP programs and
> --
> 2.41.3
>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 16:09 ` Alexander Lobakin
@ 2026-04-22 16:37 ` Jason Xing
2026-04-22 16:58 ` Maciej Fijalkowski
2026-04-22 17:00 ` Alexander Lobakin
0 siblings, 2 replies; 19+ messages in thread
From: Jason Xing @ 2026-04-22 16:37 UTC (permalink / raw)
To: Alexander Lobakin
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On Thu, Apr 23, 2026 at 12:10 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Jason Xing <kerneljasonxing@gmail.com>
> Date: Wed, 22 Apr 2026 11:36:50 +0800
>
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> > descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> > uintptr_t cast:
> >
> > skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> >
> > On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> > the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> > mode the chunk offset is encoded in bits 48-63 of the descriptor
> > address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> > lost entirely. The completion queue then returns a truncated address to
> > userspace, making buffer recycling impossible.
>
> What if we relax the restriction a bit? For example, refuse to configure
As to the bug itself, yes, It only affects the unaligned mode.
I wonder if we can support this after someone requires us to support
32-bit arch and use it in the real world, then we can use the previous
patch to complete the full support (which doesn't harm the path on
64-bit arch).
The code looks like this based on your suggestion. Just for the record.
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 58da2f4f4397..03417b04592f 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -177,6 +177,9 @@ static int xdp_umem_reg(struct xdp_umem *umem,
struct xdp_umem_reg *mr)
if (mr->flags & ~XDP_UMEM_FLAGS_VALID)
return -EINVAL;
+ if (!IS_ENABLED(CONFIG_64BIT) && unaligned_chunks)
+ return -EOPNOTSUPP;
+
if (!unaligned_chunks && !is_power_of_2(chunk_size))
return -EINVAL;
Actually I'm fine with either of them. Right now I'm not so sure which
direction this patch should take :)
Thanks,
Jason
> an XSk socket in unaligned mode if on a 32-bit arch? Or add a check
> under CONFIG_32_BIT like it was done in Page Pool:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> #ifdef CONFIG_32BIT
> if (((uintptr_t)skb_shinfo(skb)->destructor_arg) & ~0x1UL) != addr)
> // WARN_ONCE or whatever + error path
> #endif
>
> I never used XSk on a 32-bit arch, but back when I was working on 32-bit
> MIPS 1G routers, I wanted to add native XSk support to the Eth driver.
> Sure, just for fun, now that we have cheap AArch64 and other 64-bit
> embedded chips, 32-bit embedded networking SoCs are almost dead, but
> OTOH, as you can see, other subsystems like PP still try to support 32 bit.
> Especially given that this issue applies to only to the skb XSk path,
> not native in-driver implementations.
>
> >
> > Since we hear no one is using AF_XDP on 32-bit arch, we decided to
> > strictly stop supporting it at compile time.
> >
> > Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> > Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> > Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > net/xdp/Kconfig | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> > index 71af2febe72a..819aa5795f50 100644
> > --- a/net/xdp/Kconfig
> > +++ b/net/xdp/Kconfig
> > @@ -1,7 +1,7 @@
> > # SPDX-License-Identifier: GPL-2.0-only
> > config XDP_SOCKETS
> > bool "XDP sockets"
> > - depends on BPF_SYSCALL
> > + depends on BPF_SYSCALL && 64BIT
> > default n
> > help
> > XDP sockets allows a channel between XDP programs and
>
> Thanks,
> Olek
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 16:37 ` Jason Xing
@ 2026-04-22 16:58 ` Maciej Fijalkowski
2026-04-22 17:00 ` Alexander Lobakin
1 sibling, 0 replies; 19+ messages in thread
From: Maciej Fijalkowski @ 2026-04-22 16:58 UTC (permalink / raw)
To: Jason Xing
Cc: Alexander Lobakin, davem, edumazet, kuba, pabeni, bjorn,
magnus.karlsson, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On Thu, Apr 23, 2026 at 12:37:07AM +0800, Jason Xing wrote:
> On Thu, Apr 23, 2026 at 12:10 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
> >
> > From: Jason Xing <kerneljasonxing@gmail.com>
> > Date: Wed, 22 Apr 2026 11:36:50 +0800
> >
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> > > descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> > > uintptr_t cast:
> > >
> > > skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> > >
> > > On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> > > the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> > > mode the chunk offset is encoded in bits 48-63 of the descriptor
> > > address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> > > lost entirely. The completion queue then returns a truncated address to
> > > userspace, making buffer recycling impossible.
> >
> > What if we relax the restriction a bit? For example, refuse to configure
>
> As to the bug itself, yes, It only affects the unaligned mode.
>
> I wonder if we can support this after someone requires us to support
> 32-bit arch and use it in the real world, then we can use the previous
> patch to complete the full support (which doesn't harm the path on
> 64-bit arch).
>
> The code looks like this based on your suggestion. Just for the record.
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 58da2f4f4397..03417b04592f 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -177,6 +177,9 @@ static int xdp_umem_reg(struct xdp_umem *umem,
> struct xdp_umem_reg *mr)
> if (mr->flags & ~XDP_UMEM_FLAGS_VALID)
> return -EINVAL;
> + if (!IS_ENABLED(CONFIG_64BIT) && unaligned_chunks)
> + return -EOPNOTSUPP;
> +
> if (!unaligned_chunks && !is_power_of_2(chunk_size))
> return -EINVAL;
>
> Actually I'm fine with either of them. Right now I'm not so sure which
> direction this patch should take :)
Abstracting from discussion, that is not a patch we would want against
-net. We should not forbid 32bit on stable kernels, maybe there is someone
in the 'basement' using 32bit xsk on stable kernel. Plus the fixes tag vs
patch's content looks weird ;)
>
> Thanks,
> Jason
>
> > an XSk socket in unaligned mode if on a 32-bit arch? Or add a check
> > under CONFIG_32_BIT like it was done in Page Pool:
> >
> > skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
> >
> > #ifdef CONFIG_32BIT
> > if (((uintptr_t)skb_shinfo(skb)->destructor_arg) & ~0x1UL) != addr)
> > // WARN_ONCE or whatever + error path
> > #endif
> >
> > I never used XSk on a 32-bit arch, but back when I was working on 32-bit
> > MIPS 1G routers, I wanted to add native XSk support to the Eth driver.
> > Sure, just for fun, now that we have cheap AArch64 and other 64-bit
> > embedded chips, 32-bit embedded networking SoCs are almost dead, but
> > OTOH, as you can see, other subsystems like PP still try to support 32 bit.
> > Especially given that this issue applies to only to the skb XSk path,
> > not native in-driver implementations.
> >
> > >
> > > Since we hear no one is using AF_XDP on 32-bit arch, we decided to
> > > strictly stop supporting it at compile time.
> > >
> > > Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> > > Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> > > Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > net/xdp/Kconfig | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> > > index 71af2febe72a..819aa5795f50 100644
> > > --- a/net/xdp/Kconfig
> > > +++ b/net/xdp/Kconfig
> > > @@ -1,7 +1,7 @@
> > > # SPDX-License-Identifier: GPL-2.0-only
> > > config XDP_SOCKETS
> > > bool "XDP sockets"
> > > - depends on BPF_SYSCALL
> > > + depends on BPF_SYSCALL && 64BIT
> > > default n
> > > help
> > > XDP sockets allows a channel between XDP programs and
> >
> > Thanks,
> > Olek
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 16:37 ` Jason Xing
2026-04-22 16:58 ` Maciej Fijalkowski
@ 2026-04-22 17:00 ` Alexander Lobakin
1 sibling, 0 replies; 19+ messages in thread
From: Alexander Lobakin @ 2026-04-22 17:00 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
From: Jason Xing <kerneljasonxing@gmail.com>
Date: Thu, 23 Apr 2026 00:37:07 +0800
> On Thu, Apr 23, 2026 at 12:10 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Jason Xing <kerneljasonxing@gmail.com>
>> Date: Wed, 22 Apr 2026 11:36:50 +0800
>>
>>> From: Jason Xing <kernelxing@tencent.com>
>>>
>>> In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
>>> descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
>>> uintptr_t cast:
>>>
>>> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>>>
>>> On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
>>> the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
>>> mode the chunk offset is encoded in bits 48-63 of the descriptor
>>> address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
>>> lost entirely. The completion queue then returns a truncated address to
>>> userspace, making buffer recycling impossible.
>>
>> What if we relax the restriction a bit? For example, refuse to configure
>
> As to the bug itself, yes, It only affects the unaligned mode.
>
> I wonder if we can support this after someone requires us to support
> 32-bit arch and use it in the real world, then we can use the previous
> patch to complete the full support (which doesn't harm the path on
> 64-bit arch).
>
> The code looks like this based on your suggestion. Just for the record.
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 58da2f4f4397..03417b04592f 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -177,6 +177,9 @@ static int xdp_umem_reg(struct xdp_umem *umem,
> struct xdp_umem_reg *mr)
> if (mr->flags & ~XDP_UMEM_FLAGS_VALID)
> return -EINVAL;
> + if (!IS_ENABLED(CONFIG_64BIT) && unaligned_chunks)
> + return -EOPNOTSUPP;
I'm fine with this one, as well as with the check I proposed. This one
would happen during the configuration so it won't affect the hotpath,
my check is just inspired by what Page Pool does when
`sizeof(dma_addr_t) > sizeof(unsigned long)` (rare case but perfectly
valid on some architectures).
> +
> if (!unaligned_chunks && !is_power_of_2(chunk_size))
> return -EINVAL;
>
> Actually I'm fine with either of them. Right now I'm not so sure which
> direction this patch should take :)
Up to you and other reviewers, but I really do think cutting off 32-bit
arches entirely due to a bug which can happen only under certain
conditions (skb XSk mode + unaligned mode) is too aggressive.
>
> Thanks,
> Jason
>
>> an XSk socket in unaligned mode if on a 32-bit arch? Or add a check
>> under CONFIG_32_BIT like it was done in Page Pool:
>>
>> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>>
>> #ifdef CONFIG_32BIT
>> if (((uintptr_t)skb_shinfo(skb)->destructor_arg) & ~0x1UL) != addr)
>> // WARN_ONCE or whatever + error path
>> #endif
>>
>> I never used XSk on a 32-bit arch, but back when I was working on 32-bit
>> MIPS 1G routers, I wanted to add native XSk support to the Eth driver.
>> Sure, just for fun, now that we have cheap AArch64 and other 64-bit
>> embedded chips, 32-bit embedded networking SoCs are almost dead, but
>> OTOH, as you can see, other subsystems like PP still try to support 32 bit.
>> Especially given that this issue applies to only to the skb XSk path,
>> not native in-driver implementations.
>>
>>>
>>> Since we hear no one is using AF_XDP on 32-bit arch, we decided to
>>> strictly stop supporting it at compile time.
Thanks,
Olek
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures
2026-04-22 3:36 ` [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures Jason Xing
2026-04-22 16:09 ` Alexander Lobakin
2026-04-22 16:31 ` Stanislav Fomichev
@ 2026-04-22 20:27 ` David Laight
2 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2026-04-22 20:27 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, bpf, netdev, Jason Xing
On Wed, 22 Apr 2026 11:36:50 +0800
Jason Xing <kerneljasonxing@gmail.com> wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
> descriptor address into skb_shinfo(skb)->destructor_arg (void *) via a
> uintptr_t cast:
>
> skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
> the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
> mode the chunk offset is encoded in bits 48-63 of the descriptor
> address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
> lost entirely. The completion queue then returns a truncated address to
> userspace, making buffer recycling impossible.
I had a look at how this is used.
I suspect that XSK_UNALIGNED_BUF_OFFSET_SHIFT can just be made smaller.
The 'addr' isn't really a normal address of any kind, it is a really
and offset into an array made up of pages of memory.
The actual address (kernel virtual of dma) is generated by:
ptr->array[addr >> PAGE_SHIFT] + addr & ~PAGE_MASK
(after removing the unaligned chunk offset from 'addr').
It is actually quite likely that there are enough free high bits
even in 32bit mode.
David
>
> Since we hear no one is using AF_XDP on 32-bit arch, we decided to
> strictly stop supporting it at compile time.
>
> Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
> Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
> Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xdp/Kconfig b/net/xdp/Kconfig
> index 71af2febe72a..819aa5795f50 100644
> --- a/net/xdp/Kconfig
> +++ b/net/xdp/Kconfig
> @@ -1,7 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0-only
> config XDP_SOCKETS
> bool "XDP sockets"
> - depends on BPF_SYSCALL
> + depends on BPF_SYSCALL && 64BIT
> default n
> help
> XDP sockets allows a channel between XDP programs and
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-04-22 20:27 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-22 3:36 [PATCH net v3 0/8] xsk: fix bugs around xsk skb allocation Jason Xing
2026-04-22 3:36 ` [PATCH net v3 1/8] xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices Jason Xing
2026-04-22 3:36 ` [PATCH net v3 2/8] xsk: handle NULL dereference of the skb without frags issue Jason Xing
2026-04-22 3:36 ` [PATCH net v3 3/8] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 4/8] xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb() Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 5/8] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 6/8] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 3:36 ` [PATCH net v3 7/8] xsk: fix xsk_addrs slab leak on multi-buffer error path Jason Xing
2026-04-22 3:36 ` [PATCH net v3 8/8] xsk: don't support AF_XDP on 32-bit architectures Jason Xing
2026-04-22 16:09 ` Alexander Lobakin
2026-04-22 16:37 ` Jason Xing
2026-04-22 16:58 ` Maciej Fijalkowski
2026-04-22 17:00 ` Alexander Lobakin
2026-04-22 16:31 ` Stanislav Fomichev
2026-04-22 20:27 ` David Laight
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox