[PATCH net 0/4] xsk: fix bugs around xsk skb allocation

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH net 0/4] xsk: fix bugs around xsk skb allocation
@ 2026-04-18  4:56 Jason Xing
  2026-04-18  4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

There are four extremely rare issues around xsk_build_skb(). Two of them
were founded by Sashiko[1].

[1]: https://lore.kernel.org/all/20260415082654.21026-1-kerneljasonxing@gmail.com/

Jason Xing (4):
  xsk: avoid skb leak in XDP_TX_METADATA case
  xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  xsk: handle NULL dereference of the skb without frags issue
  xsk: fix use-after-free of xs->skb in xsk_build_skb()  free_err path

 net/xdp/xsk.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

-- 
2.41.3


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-18  4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
@ 2026-04-18  4:56 ` Jason Xing
  2026-04-19  4:58   ` sashiko-bot
  2026-04-20 15:22   ` Stanislav Fomichev
  2026-04-18  4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
   NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6149f6a79897..8fcde34aec7b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -743,8 +743,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 		xsk_skb_init_misc(skb, xs, desc->addr);
 		if (desc->options & XDP_TX_METADATA) {
 			err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
-			if (unlikely(err))
+			if (unlikely(err)) {
+				kfree_skb(skb);
 				return ERR_PTR(err);
+			}
 		}
 	} else {
 		struct xsk_addrs *xsk_addr;
-- 
2.41.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  2026-04-18  4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
  2026-04-18  4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
@ 2026-04-18  4:56 ` Jason Xing
  2026-04-19  4:58   ` sashiko-bot
  2026-04-20 15:44   ` Stanislav Fomichev
  2026-04-18  4:56 ` [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue Jason Xing
  2026-04-18  4:56 ` [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
  3 siblings, 2 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
   hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fcde34aec7b..5d3dbb118730 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -778,8 +778,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
 	addr = buffer - pool->addrs;
 
 	for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
-		if (unlikely(i >= MAX_SKB_FRAGS))
+		if (unlikely(i >= MAX_SKB_FRAGS)) {
+			if (!xs->skb)
+				kfree_skb(skb);
 			return ERR_PTR(-EOVERFLOW);
+		}
 
 		page = pool->umem->pgs[addr >> PAGE_SHIFT];
 		get_page(page);
-- 
2.41.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue
  2026-04-18  4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
  2026-04-18  4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
  2026-04-18  4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
@ 2026-04-18  4:56 ` Jason Xing
  2026-04-19  4:58   ` sashiko-bot
  2026-04-18  4:56 ` [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
  3 siblings, 1 reply; 18+ messages in thread
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy (e.g., MAX_SKB_FRAGS exceeded), the free_err
EOVERFLOW handler unconditionally dereferences xs->skb via
xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL
pointer dereference.

In the patch 2/4, the skb is already freed by kfree_skb() inside
xsk_build_skb_zerocopy for the first-descriptor case, so we only need
to do the bookkeeping: cancel the one reserved CQ slot and account for
the single invalid descriptor.

Guard the existing xsk_inc_num_desc/xsk_drop_skb calls with an
xs->skb check (for the continuation case), and add an else branch
for the first-descriptor case that manually cancels the CQ slot and
increments invalid_descs by one.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5d3dbb118730..2918b773aa84 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -898,9 +898,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 		kfree_skb(skb);
 
 	if (err == -EOVERFLOW) {
-		/* Drop the packet */
-		xsk_inc_num_desc(xs->skb);
-		xsk_drop_skb(xs->skb);
+		if (xs->skb) {
+			/* Drop the packet */
+			xsk_inc_num_desc(xs->skb);
+			xsk_drop_skb(xs->skb);
+		} else {
+			xsk_cq_cancel_locked(xs->pool, 1);
+			xs->tx->invalid_descs++;
+		}
 		xskq_cons_release(xs->tx);
 	} else {
 		/* Let application retry */
-- 
2.41.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
  2026-04-18  4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
                   ` (2 preceding siblings ...)
  2026-04-18  4:56 ` [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue Jason Xing
@ 2026-04-18  4:56 ` Jason Xing
  2026-04-19  4:58   ` sashiko-bot
  3 siblings, 1 reply; 18+ messages in thread
From: Jason Xing @ 2026-04-18  4:56 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb && !skb_shinfo(skb)->nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.

Fix by adding a !xs->skb check to the condition, ensuring we only free
skbs that were freshly allocated in this call (xs->skb is NULL) and
never free an in-progress multi-buffer skb that the caller still
references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2918b773aa84..22c7a92e0734 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -894,7 +894,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 	return skb;

 free_err:
-	if (skb && !skb_shinfo(skb)->nr_frags)
+	if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
 		kfree_skb(skb);

 	if (err == -EOVERFLOW) {
-- 
2.41.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-18  4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
@ 2026-04-19  4:58   ` sashiko-bot
  2026-04-19 13:05     ` Jason Xing
  2026-04-20 15:22   ` Stanislav Fomichev
  1 sibling, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-04-19  4:58 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [High] Premature SKB free triggers AF_XDP completion destructor, corrupting the Completion Queue (CQ) and causing an infinite TX loop.
- [High] Calling `xsk_skb_metadata()` before SKB fragments are attached breaks software checksum offload and permanently wedges the TX ring.
--

commit 7fb48e575c16e2e098969186062c6a31c8245353
Author: Jason Xing <kerneljasonxing@gmail.com>

xsk: avoid skb leak in XDP_TX_METADATA case

This commit prevents a memory leak when xsk_skb_metadata() fails by
explicitly freeing the skb before returning to the caller.

Link: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 887abed254668..5b273bcd0dbce 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -823,8 +823,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
>  		xsk_skb_init_misc(skb, xs, desc->addr);
>  		if (desc->options & XDP_TX_METADATA) {
>  			err = xsk_skb_metadata(skb, buffer, desc, pool, hr);

Is it safe to call xsk_skb_metadata() here before the SKB fragments are
attached?

Because the fragments are populated later in the function via
skb_fill_page_desc(), both skb->len and skb_headlen(skb) are 0 at this point.

If XDP_TXMD_FLAGS_CHECKSUM is requested and software fallback is active,
xsk_skb_metadata() will call skb_checksum_help(). Because skb_headlen is 0,
the bounds check in skb_checksum_help() can trigger a WARN_ONCE and return
-EINVAL. This permanently wedges the TX ring since -EINVAL does not consume
the descriptor.

> -			if (unlikely(err))
> +			if (unlikely(err)) {
> +				kfree_skb(skb);

Does freeing the SKB here cause issues with the completion queue?

Just before this block, xsk_skb_init_misc() sets skb->destructor to
xsk_destruct_skb. Calling kfree_skb() triggers this destructor, which submits
the descriptor's address to the CQ and increments the producer index.

However, the caller xsk_build_skb() will see the error and call
xsk_cq_cancel_locked(), which decrements the cached producer index. This
desynchronizes the queue reservation pointers.

Additionally, because the error is not -EOVERFLOW, the kernel does not
advance the TX ring consumer pointer. The bad descriptor remains in the TX
ring and will be retried continuously, potentially causing an infinite loop.

Should skb->destructor be restored to sock_wfree before calling kfree_skb(),
or should xsk_skb_init_misc() be deferred until after the metadata is
validated?

>  				return ERR_PTR(err);
> +			}
>  		}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  2026-04-18  4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
@ 2026-04-19  4:58   ` sashiko-bot
  2026-04-19 14:03     ` Jason Xing
  2026-04-20 15:44   ` Stanislav Fomichev
  1 sibling, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-04-19  4:58 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [High] The patch leaves a guaranteed NULL pointer dereference in the caller's error handling path when `-EOVERFLOW` is returned for the first fragment.
- [High] Calling `kfree_skb(skb)` directly triggers the packet destructor and incorrectly submits the dropped packet to the userspace Completion Queue (CQ).
--

commit e5bfaac263557abd44f47ec6e9c85e50c7aa1efc
Author: Jason Xing <kerneljasonxing@gmail.com>

    xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS

This commit explicitly frees the skb when the MAX_SKB_FRAGS limit is hit
on the first descriptor to avoid a memory leak.

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 5b273bcd0dbce..3d6c6caa7ff82 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -858,8 +858,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
>  	addr = buffer - pool->addrs;
>  
>  	for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
> -		if (unlikely(i >= MAX_SKB_FRAGS))
> +		if (unlikely(i >= MAX_SKB_FRAGS)) {
> +			if (!xs->skb)
> +				kfree_skb(skb);

If we free the newly allocated skb here and return -EOVERFLOW, the caller
xsk_build_skb() catches the error and jumps to free_err, where it calls:

    if (err == -EOVERFLOW) {
        /* Drop the packet */
        xsk_inc_num_desc(xs->skb);

Since xs->skb is still NULL, will this cause xsk_inc_num_desc() to
dereference a NULL pointer?

Note that this regression appears to be fixed in a later commit 1c728000f884c
("xsk: handle NULL dereference of the skb without frags issue").

Additionally, does calling kfree_skb(skb) here trigger the packet destructor
and incorrectly submit the dropped packet to the userspace completion queue?

Prior to entering the fragmentation loop, xsk_skb_init_misc() sets the
destructor to xsk_destruct_skb(). When kfree_skb(skb) is called, it invokes
this destructor, which sequentially calls xsk_cq_submit_addr_locked().

This writes the descriptor address to the completion queue and advances the
user-visible q->ring->producer pointer, falsely notifying userspace of a
successful transmission.

The caller xsk_build_skb() then executes its own error handling, calling
xsk_cq_cancel_locked(), which only decrements the kernel's internal
q->cached_prod counter.

Could this desynchronize the internal queue tracking variables from the
shared memory ring pointers?

Should the destructor be changed back to sock_wfree before freeing it,
similar to how xsk_consume_skb() handles it?

>  			return ERR_PTR(-EOVERFLOW);
> +		}
> [ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue
  2026-04-18  4:56 ` [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue Jason Xing
@ 2026-04-19  4:58   ` sashiko-bot
  2026-04-19 14:09     ` Jason Xing
  0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-04-19  4:58 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Sashiko AI review found 2 potential issue(s):
- [Critical] Calling `kfree_skb(skb)` without clearing the SKB destructor causes Completion Queue (CQ) ring corruption and false completions.
- [High] Use-after-free of `xs->skb` in the continuation path when an error like `-EAGAIN` occurs.
--

commit 1c728000f884ce0a536d9c1fd763eb0c560ada51
Author: Jason Xing <kerneljasonxing@gmail.com>

xsk: handle NULL dereference of the skb without frags issue

This commit addresses a NULL pointer dereference in the error path of
xsk_build_skb_zerocopy() by checking if xs->skb is valid before
dereferencing it during -EOVERFLOW handling. It also adds an alternative
branch to handle bookkeeping for the first-descriptor case.

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 3d6c6caa7ff82..2f00f188533c1 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -978,9 +978,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
>  		kfree_skb(skb);

Can this lead to a use-after-free on xs->skb? 

In the non-zerocopy path, if the first descriptor is linear, nr_frags remains
0 and the caller __xsk_generic_xmit() stores the SKB in xs->skb.

During the next iteration for a continuation descriptor, if an error like
-EAGAIN occurs (e.g., from alloc_page()), the code jumps to the free_err
label. The cleanup logic if (skb && !skb_shinfo(skb)->nr_frags) would
evaluate to true, freeing the entire SKB.

The function then returns -EAGAIN, leaving the freed SKB pointer in xs->skb.
On the next application retry, the code could dereference the freed xs->skb.

Note that this regression is fixed later in the patch series by commit 'xsk:
fix use-after-free of xs->skb in xsk_build_skb() free_err path'.

Also, could calling kfree_skb(skb) without clearing the SKB destructor cause
Completion Queue (CQ) ring corruption and false completions?

In xsk_build_skb_zerocopy(), xsk_skb_init_misc() sets
skb->destructor = xsk_destruct_skb. When an error occurs on the first
descriptor (e.g., xsk_skb_metadata() returning -EINVAL), kfree_skb(skb) is
called.

Because the destructor is not overridden with sock_wfree (like
xsk_consume_skb() does), kfree_skb(skb) triggers xsk_destruct_skb(), which
submits the descriptor to the CQ and increments q->ring->producer.

>  	if (err == -EOVERFLOW) {
> -		/* Drop the packet */
> -		xsk_inc_num_desc(xs->skb);
> -		xsk_drop_skb(xs->skb);
> +		if (xs->skb) {
> +			/* Drop the packet */
> +			xsk_inc_num_desc(xs->skb);
> +			xsk_drop_skb(xs->skb);
> +		} else {
> +			xsk_cq_cancel_locked(xs->pool, 1);
> +			xs->tx->invalid_descs++;
> +		}
>  		xskq_cons_release(xs->tx);
>  	} else {
>  		/* Let application retry */
>  		xsk_cq_cancel_locked(xs->pool, 1);
>  	}

The error handling paths then call xsk_cq_cancel_locked(), which decrements
the reservation counter q->cached_prod.

Does decrementing the reservation while the actual producer pointer has
advanced break queue accounting?

The kernel calculates available space using cached_prod; this artificial
underflow could cause the queue to over-report free space, allowing the
producer to silently overwrite unread user-space CQ entries.

Furthermore, in the -EINVAL path, does the invalid descriptor generate a
false completion to user space without being released from the TX ring? When
the application reuses the completed buffer, could the kernel retransmit the
same descriptor on retry, causing a use-after-free and data corruption?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=3

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
  2026-04-18  4:56 ` [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
@ 2026-04-19  4:58   ` sashiko-bot
  2026-04-19 14:23     ` Jason Xing
  0 siblings, 1 reply; 18+ messages in thread
From: sashiko-bot @ 2026-04-19  4:58 UTC (permalink / raw)
  To: Jason Xing; +Cc: bpf

Sashiko AI review found 3 potential issue(s):
- [High] Calling `kfree_skb(skb)` in `free_err` after `xsk_skb_init_misc()` triggers premature CQ submission and corrupts Queue invariants.
- [High] Memory leak of `xsk_addr` when `alloc_page()` fails during multi-buffer packet processing.
- [High] On 32-bit architectures, `xsk_skb_destructor_set_addr()` truncates the 64-bit UMEM offset, corrupting the Completion Queue address.
--

commit e60e7fb34b1c59698d11da732db02bb3302ab41a
Author: Jason Xing <kerneljasonxing@gmail.com>

xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path

When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding any
frags. If a continuation descriptor fails, the error path frees the skb
while xs->skb still points to it. This patch adds a check to ensure we
only free skbs that were freshly allocated in this call.

Link: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/

> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 2f00f188533c1..470f6d25ddb93 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -974,7 +974,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
>  	return skb;
>  
>  free_err:
> -	if (skb && !skb_shinfo(skb)->nr_frags)
> +	if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
>  		kfree_skb(skb);

This isn't a regression introduced by this patch, but could calling
kfree_skb() here trigger a premature completion queue submission?

If xsk_skb_metadata() fails earlier in the function, the skb destructor
is already set to xsk_destruct_skb by xsk_skb_init_misc().

Freeing the skb here synchronously invokes xsk_destruct_skb(), which
submits the completion to user space and increments the queue producer.
The error path then executes xsk_cq_cancel_locked(), which decrements the
cached producer.

Will this permanently desync the completion queue invariants and report a
successful completion for a failed transmission? Would it be safer to
clear the skb destructor before freeing it?

Also, is there a potential memory leak of the xsk_addr struct when
alloc_page() fails during multi-buffer packet processing?

When processing a continuation descriptor, an xsk_addr struct is
dynamically allocated and stored in skb_shinfo(skb)->destructor_arg
with num_descs set to 1. If the subsequent alloc_page() fails, the code
jumps here to free_err and returns -EAGAIN, leaving the partial packet
in xs->skb for a retry.

If the socket is closed instead of retrying, xsk_drop_skb() is called on
xs->skb, which calls xsk_consume_skb(). Inside xsk_consume_skb(), the
cleanup logic uses this condition:

    if (unlikely(num_descs > 1))

Since num_descs is exactly 1, the condition fails. Does this permanently
leak the xsk_addr memory?

Finally, does xsk_skb_destructor_set_addr() truncate the 64-bit umem
offset on 32-bit architectures?

The inline storage attempts to cast the address to uintptr_t:

    skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);

On 32-bit architectures, uintptr_t is 32 bits, which drops the upper 32
bits of the u64 addr. In af_xdp unaligned mode, the chunk offset is
stored in bits 48-63.

Could this truncation lose the offset entirely, causing
xsk_skb_destructor_get_addr() to retrieve a truncated address and
submit it to the user-space completion queue?

>  
>  	if (err == -EOVERFLOW) {
>  		if (xs->skb) {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-19  4:58   ` sashiko-bot
@ 2026-04-19 13:05     ` Jason Xing
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-19 13:05 UTC (permalink / raw)
  To: sashiko; +Cc: bpf

On Sun, Apr 19, 2026 at 12:58 PM <sashiko-bot@kernel.org> wrote:
>
> Sashiko AI review found 2 potential issue(s):
> - [High] Premature SKB free triggers AF_XDP completion destructor, corrupting the Completion Queue (CQ) and causing an infinite TX loop.
> - [High] Calling `xsk_skb_metadata()` before SKB fragments are attached breaks software checksum offload and permanently wedges the TX ring.
> --
>
> commit 7fb48e575c16e2e098969186062c6a31c8245353
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
> xsk: avoid skb leak in XDP_TX_METADATA case
>
> This commit prevents a memory leak when xsk_skb_metadata() fails by
> explicitly freeing the skb before returning to the caller.
>
> Link: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 887abed254668..5b273bcd0dbce 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -823,8 +823,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> >               xsk_skb_init_misc(skb, xs, desc->addr);
> >               if (desc->options & XDP_TX_METADATA) {
> >                       err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
>
> Is it safe to call xsk_skb_metadata() here before the SKB fragments are
> attached?
>
> Because the fragments are populated later in the function via
> skb_fill_page_desc(), both skb->len and skb_headlen(skb) are 0 at this point.
>
> If XDP_TXMD_FLAGS_CHECKSUM is requested and software fallback is active,
> xsk_skb_metadata() will call skb_checksum_help(). Because skb_headlen is 0,
> the bounds check in skb_checksum_help() can trigger a WARN_ONCE and return
> -EINVAL. This permanently wedges the TX ring since -EINVAL does not consume
> the descriptor.

I must state that this problem is not introduced by the series. The
following commit should be fixed soon:
commit 30c3055f9c0d84a67b8fd723bdec9b1b52b3c695
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Thu Sep 25 18:00:09 2025 +0200

    xsk: wrap generic metadata handling onto separate function

    xsk_build_skb() has gone wild with its size and one of the things we can
    do about it is to pull out a branch that takes care of metadata handling
    and make it a separate function.

    While at it, let us add metadata SW support for devices supporting
    IFF_TX_SKB_NO_LINEAR flag, that happen to have separate logic for
    building skb in xsk's generic xmit path.

I'll consider folding it into the next version even though it's a bit
complex to handle :(

>
> > -                     if (unlikely(err))
> > +                     if (unlikely(err)) {
> > +                             kfree_skb(skb);
>
> Does freeing the SKB here cause issues with the completion queue?
>
> Just before this block, xsk_skb_init_misc() sets skb->destructor to
> xsk_destruct_skb. Calling kfree_skb() triggers this destructor, which submits
> the descriptor's address to the CQ and increments the producer index.
>
> However, the caller xsk_build_skb() will see the error and call
> xsk_cq_cancel_locked(), which decrements the cached producer index. This
> desynchronizes the queue reservation pointers.
>
> Additionally, because the error is not -EOVERFLOW, the kernel does not
> advance the TX ring consumer pointer. The bad descriptor remains in the TX
> ring and will be retried continuously, potentially causing an infinite loop.
>
> Should skb->destructor be restored to sock_wfree before calling kfree_skb(),
> or should xsk_skb_init_misc() be deferred until after the metadata is
> validated?

Amazing really. The same here. Sashiko checked this pre-existing
behavior is buggy, which the patch 1 only does follow.
In terms of the free_err logic in xsk_build_skb, the untransmitted skb
should be freed by the common new helper:
static void xsk_drop_untransmitted_skb(struct sk_buff *skb)
{
        skb->destructor = sock_wfree;
        kfree_skb(skb);
}

Will cook it as a standalone patch to fix the commit:
commit c30d084960cf316c95fbf145d39974ce1ff7889c
Author: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date:   Thu Sep 25 18:00:07 2025 +0200

    xsk: avoid overwriting skb fields for multi-buffer traffic

    We are unnecessarily setting a bunch of skb fields per each processed
    descriptor, which is redundant for fragmented frames.

    Let us set these respective members for first fragment only. To address
    both paths that we have within xsk_build_skb(), move assignments onto
    xsk_set_destructor_arg() and rename it to xsk_skb_init_misc().

Thanks,
Jason

>
> >                               return ERR_PTR(err);
> > +                     }
> >               }
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  2026-04-19  4:58   ` sashiko-bot
@ 2026-04-19 14:03     ` Jason Xing
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-19 14:03 UTC (permalink / raw)
  To: sashiko; +Cc: bpf

On Sun, Apr 19, 2026 at 12:58 PM <sashiko-bot@kernel.org> wrote:
>
> Sashiko AI review found 2 potential issue(s):
> - [High] The patch leaves a guaranteed NULL pointer dereference in the caller's error handling path when `-EOVERFLOW` is returned for the first fragment.
> - [High] Calling `kfree_skb(skb)` directly triggers the packet destructor and incorrectly submits the dropped packet to the userspace Completion Queue (CQ).
> --
>
> commit e5bfaac263557abd44f47ec6e9c85e50c7aa1efc
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
>     xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
>
> This commit explicitly frees the skb when the MAX_SKB_FRAGS limit is hit
> on the first descriptor to avoid a memory leak.
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 5b273bcd0dbce..3d6c6caa7ff82 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -858,8 +858,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> >       addr = buffer - pool->addrs;
> >
> >       for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
> > -             if (unlikely(i >= MAX_SKB_FRAGS))
> > +             if (unlikely(i >= MAX_SKB_FRAGS)) {
> > +                     if (!xs->skb)
> > +                             kfree_skb(skb);
>
> If we free the newly allocated skb here and return -EOVERFLOW, the caller
> xsk_build_skb() catches the error and jumps to free_err, where it calls:
>
>     if (err == -EOVERFLOW) {
>         /* Drop the packet */
>         xsk_inc_num_desc(xs->skb);
>
> Since xs->skb is still NULL, will this cause xsk_inc_num_desc() to
> dereference a NULL pointer?
>
> Note that this regression appears to be fixed in a later commit 1c728000f884c
> ("xsk: handle NULL dereference of the skb without frags issue").

As you noticed, it was already fixed by patch 3.

>
> Additionally, does calling kfree_skb(skb) here trigger the packet destructor
> and incorrectly submit the dropped packet to the userspace completion queue?
>
> Prior to entering the fragmentation loop, xsk_skb_init_misc() sets the
> destructor to xsk_destruct_skb(). When kfree_skb(skb) is called, it invokes
> this destructor, which sequentially calls xsk_cq_submit_addr_locked().
>
> This writes the descriptor address to the completion queue and advances the
> user-visible q->ring->producer pointer, falsely notifying userspace of a
> successful transmission.
>
> The caller xsk_build_skb() then executes its own error handling, calling
> xsk_cq_cancel_locked(), which only decrements the kernel's internal
> q->cached_prod counter.
>
> Could this desynchronize the internal queue tracking variables from the
> shared memory ring pointers?
>
> Should the destructor be changed back to sock_wfree before freeing it,
> similar to how xsk_consume_skb() handles it?

Right, it's the same (pre-existing) bug as pointed out in patch 1.

Thanks,
Jason

>
> >                       return ERR_PTR(-EOVERFLOW);
> > +             }
> > [ ... ]
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue
  2026-04-19  4:58   ` sashiko-bot
@ 2026-04-19 14:09     ` Jason Xing
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-19 14:09 UTC (permalink / raw)
  To: sashiko; +Cc: bpf

On Sun, Apr 19, 2026 at 12:58 PM <sashiko-bot@kernel.org> wrote:
>
> Sashiko AI review found 2 potential issue(s):
> - [Critical] Calling `kfree_skb(skb)` without clearing the SKB destructor causes Completion Queue (CQ) ring corruption and false completions.
> - [High] Use-after-free of `xs->skb` in the continuation path when an error like `-EAGAIN` occurs.
> --
>
> commit 1c728000f884ce0a536d9c1fd763eb0c560ada51
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
> xsk: handle NULL dereference of the skb without frags issue
>
> This commit addresses a NULL pointer dereference in the error path of
> xsk_build_skb_zerocopy() by checking if xs->skb is valid before
> dereferencing it during -EOVERFLOW handling. It also adds an alternative
> branch to handle bookkeeping for the first-descriptor case.
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 3d6c6caa7ff82..2f00f188533c1 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -978,9 +978,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> >               kfree_skb(skb);
>
> Can this lead to a use-after-free on xs->skb?
>
> In the non-zerocopy path, if the first descriptor is linear, nr_frags remains
> 0 and the caller __xsk_generic_xmit() stores the SKB in xs->skb.

Patch 4 already handles this.

>
> During the next iteration for a continuation descriptor, if an error like
> -EAGAIN occurs (e.g., from alloc_page()), the code jumps to the free_err
> label. The cleanup logic if (skb && !skb_shinfo(skb)->nr_frags) would
> evaluate to true, freeing the entire SKB.
>
> The function then returns -EAGAIN, leaving the freed SKB pointer in xs->skb.
> On the next application retry, the code could dereference the freed xs->skb.
>
> Note that this regression is fixed later in the patch series by commit 'xsk:
> fix use-after-free of xs->skb in xsk_build_skb() free_err path'.
>
> Also, could calling kfree_skb(skb) without clearing the SKB destructor cause
> Completion Queue (CQ) ring corruption and false completions?
>
> In xsk_build_skb_zerocopy(), xsk_skb_init_misc() sets
> skb->destructor = xsk_destruct_skb. When an error occurs on the first
> descriptor (e.g., xsk_skb_metadata() returning -EINVAL), kfree_skb(skb) is
> called.
>
> Because the destructor is not overridden with sock_wfree (like
> xsk_consume_skb() does), kfree_skb(skb) triggers xsk_destruct_skb(), which
> submits the descriptor to the CQ and increments q->ring->producer.
>
> >       if (err == -EOVERFLOW) {
> > -             /* Drop the packet */
> > -             xsk_inc_num_desc(xs->skb);
> > -             xsk_drop_skb(xs->skb);
> > +             if (xs->skb) {
> > +                     /* Drop the packet */
> > +                     xsk_inc_num_desc(xs->skb);
> > +                     xsk_drop_skb(xs->skb);
> > +             } else {
> > +                     xsk_cq_cancel_locked(xs->pool, 1);
> > +                     xs->tx->invalid_descs++;
> > +             }
> >               xskq_cons_release(xs->tx);
> >       } else {
> >               /* Let application retry */
> >               xsk_cq_cancel_locked(xs->pool, 1);
> >       }
>
> The error handling paths then call xsk_cq_cancel_locked(), which decrements
> the reservation counter q->cached_prod.
>
> Does decrementing the reservation while the actual producer pointer has
> advanced break queue accounting?
>
> The kernel calculates available space using cached_prod; this artificial
> underflow could cause the queue to over-report free space, allowing the
> producer to silently overwrite unread user-space CQ entries.
>
> Furthermore, in the -EINVAL path, does the invalid descriptor generate a
> false completion to user space without being released from the TX ring? When
> the application reuses the completed buffer, could the kernel retransmit the
> same descriptor on retry, causing a use-after-free and data corruption?

For the above two issues, yes, we've discussed in the previous patch :)

Thanks,
Jason

>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=3

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
  2026-04-19  4:58   ` sashiko-bot
@ 2026-04-19 14:23     ` Jason Xing
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-19 14:23 UTC (permalink / raw)
  To: sashiko; +Cc: bpf

On Sun, Apr 19, 2026 at 12:58 PM <sashiko-bot@kernel.org> wrote:
>
> Sashiko AI review found 3 potential issue(s):
> - [High] Calling `kfree_skb(skb)` in `free_err` after `xsk_skb_init_misc()` triggers premature CQ submission and corrupts Queue invariants.
> - [High] Memory leak of `xsk_addr` when `alloc_page()` fails during multi-buffer packet processing.
> - [High] On 32-bit architectures, `xsk_skb_destructor_set_addr()` truncates the 64-bit UMEM offset, corrupting the Completion Queue address.
> --
>
> commit e60e7fb34b1c59698d11da732db02bb3302ab41a
> Author: Jason Xing <kerneljasonxing@gmail.com>
>
> xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
>
> When xsk_build_skb() processes multi-buffer packets in copy mode, the
> first descriptor stores data into the skb linear area without adding any
> frags. If a continuation descriptor fails, the error path frees the skb
> while xs->skb still points to it. This patch adds a check to ensure we
> only free skbs that were freshly allocated in this call.
>
> Link: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
>
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 2f00f188533c1..470f6d25ddb93 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -974,7 +974,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> >       return skb;
> >
> >  free_err:
> > -     if (skb && !skb_shinfo(skb)->nr_frags)
> > +     if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
> >               kfree_skb(skb);
>
> This isn't a regression introduced by this patch, but could calling
> kfree_skb() here trigger a premature completion queue submission?
>
> If xsk_skb_metadata() fails earlier in the function, the skb destructor
> is already set to xsk_destruct_skb by xsk_skb_init_misc().
>
> Freeing the skb here synchronously invokes xsk_destruct_skb(), which
> submits the completion to user space and increments the queue producer.
> The error path then executes xsk_cq_cancel_locked(), which decrements the
> cached producer.
>
> Will this permanently desync the completion queue invariants and report a
> successful completion for a failed transmission? Would it be safer to
> clear the skb destructor before freeing it?

Ack, the same discussion again.

>
> Also, is there a potential memory leak of the xsk_addr struct when
> alloc_page() fails during multi-buffer packet processing?
>
> When processing a continuation descriptor, an xsk_addr struct is
> dynamically allocated and stored in skb_shinfo(skb)->destructor_arg
> with num_descs set to 1. If the subsequent alloc_page() fails, the code
> jumps here to free_err and returns -EAGAIN, leaving the partial packet
> in xs->skb for a retry.
>
> If the socket is closed instead of retrying, xsk_drop_skb() is called on
> xs->skb, which calls xsk_consume_skb(). Inside xsk_consume_skb(), the
> cleanup logic uses this condition:
>
>     if (unlikely(num_descs > 1))
>
> Since num_descs is exactly 1, the condition fails. Does this permanently
> leak the xsk_addr memory?

Right. See the following inline reply.

>
> Finally, does xsk_skb_destructor_set_addr() truncate the 64-bit umem
> offset on 32-bit architectures?
>
> The inline storage attempts to cast the address to uintptr_t:
>
>     skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t)addr | 0x1UL);
>
> On 32-bit architectures, uintptr_t is 32 bits, which drops the upper 32
> bits of the u64 addr. In af_xdp unaligned mode, the chunk offset is
> stored in bits 48-63.
>
> Could this truncation lose the offset entirely, causing
> xsk_skb_destructor_get_addr() to retrieve a truncated address and
> submit it to the user-space completion queue?

Right. It can happen!

Brilliant, sashiko found another two pre-existing bugs here! I'm so
shocked at sashiko that handles the extremely complicated path so
well. It's really useful to capture the bug beforehand.

I will include the corresponding two separate patches to fix the same
problematic commit:
commit 0ebc27a4c67d44e5ce88d21cdad8201862b78837
Author: Fernando Fernandez Mancera <fmancera@suse.de>
Date:   Mon Nov 24 18:14:09 2025 +0100

    xsk: avoid data corruption on cq descriptor number

    Since commit 30f241fcf52a ("xsk: Fix immature cq descriptor
    production"), the descriptor number is stored in skb control block and
    xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
    pool's completion queue.

    skb control block shouldn't be used for this purpose as after transmit
    xsk doesn't have control over it and other subsystems could use it. This
    leads to the following kernel panic due to a NULL pointer dereference.

Thanks,
Jason

>
> >
> >       if (err == -EOVERFLOW) {
> >               if (xs->skb) {
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260418045644.28612-1-kerneljasonxing@gmail.com?part=4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-18  4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
  2026-04-19  4:58   ` sashiko-bot
@ 2026-04-20 15:22   ` Stanislav Fomichev
  2026-04-20 15:42     ` Stanislav Fomichev
  1 sibling, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 15:22 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On 04/18, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
> 
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means no frag and xs->skb is
>    NULL) and users enable metadata feature.
> 2. xsk_skb_metadata() returns a error code.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> 4. there is no chance to free this skb anymore.
> 
> Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-20 15:22   ` Stanislav Fomichev
@ 2026-04-20 15:42     ` Stanislav Fomichev
  2026-04-20 16:27       ` Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 15:42 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On 04/20, Stanislav Fomichev wrote:
> On 04/18, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> > 
> > Fix it by explicitly adding kfree_skb() before returning back to its
> > caller.
> > 
> > How to reproduce it in virtio_net:
> > 1. the current skb is the first one (which means no frag and xs->skb is
> >    NULL) and users enable metadata feature.
> > 2. xsk_skb_metadata() returns a error code.
> > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > 4. there is no chance to free this skb anymore.
> > 
> > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Actually, I take that back.. While looking at patch 2 (which always
confuses me with that -EOVERFLOW handling), looks like we set
skb->destructor in xsk_skb_init_misc? So this kfree will do
xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
move xsk_skb_init_misc to happen after xsk_skb_metadata?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
  2026-04-18  4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
  2026-04-19  4:58   ` sashiko-bot
@ 2026-04-20 15:44   ` Stanislav Fomichev
  1 sibling, 0 replies; 18+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 15:44 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On 04/18, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
> 
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means xs->skb is NULL) and
>    hit the limit MAX_SKB_FRAGS.
> 2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
>    is why bug can be triggered.
> 4. there is no chance to free this skb anymore.
> 
> Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
> call xsk_drop_skb(xs->skb) to do the right thing.
> 
> Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/xdp/xsk.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 8fcde34aec7b..5d3dbb118730 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -778,8 +778,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
>  	addr = buffer - pool->addrs;
>  
>  	for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
> -		if (unlikely(i >= MAX_SKB_FRAGS))
> +		if (unlikely(i >= MAX_SKB_FRAGS)) {
> +			if (!xs->skb)
> +				kfree_skb(skb);
>  			return ERR_PTR(-EOVERFLOW);
> +		}
>  
>  		page = pool->umem->pgs[addr >> PAGE_SHIFT];
>  		get_page(page);
> -- 
> 2.41.3
> 

As mentioned for patch 1, need to do something with that
xsk_skb_init_misc... Otherwise we hit destructor with xsk_cq_submit_addr_locked
here as well

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-20 15:42     ` Stanislav Fomichev
@ 2026-04-20 16:27       ` Stanislav Fomichev
  2026-04-21  0:55         ` Jason Xing
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 16:27 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On 04/20, Stanislav Fomichev wrote:
> On 04/20, Stanislav Fomichev wrote:
> > On 04/18, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > > 
> > > Fix it by explicitly adding kfree_skb() before returning back to its
> > > caller.
> > > 
> > > How to reproduce it in virtio_net:
> > > 1. the current skb is the first one (which means no frag and xs->skb is
> > >    NULL) and users enable metadata feature.
> > > 2. xsk_skb_metadata() returns a error code.
> > > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > > 4. there is no chance to free this skb anymore.
> > > 
> > > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > 
> > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> 
> Actually, I take that back.. While looking at patch 2 (which always
> confuses me with that -EOVERFLOW handling), looks like we set
> skb->destructor in xsk_skb_init_misc? So this kfree will do
> xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
> move xsk_skb_init_misc to happen after xsk_skb_metadata?

Ooops, looks like you already addressed these in v2? Let me look into
that..

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
  2026-04-20 16:27       ` Stanislav Fomichev
@ 2026-04-21  0:55         ` Jason Xing
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Xing @ 2026-04-21  0:55 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On Tue, Apr 21, 2026 at 12:27 AM Stanislav Fomichev
<sdf.kernel@gmail.com> wrote:
>
> On 04/20, Stanislav Fomichev wrote:
> > On 04/20, Stanislav Fomichev wrote:
> > > On 04/18, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Fix it by explicitly adding kfree_skb() before returning back to its
> > > > caller.
> > > >
> > > > How to reproduce it in virtio_net:
> > > > 1. the current skb is the first one (which means no frag and xs->skb is
> > > >    NULL) and users enable metadata feature.
> > > > 2. xsk_skb_metadata() returns a error code.
> > > > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > > > 4. there is no chance to free this skb anymore.
> > > >
> > > > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > > > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > >
> > > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> >
> > Actually, I take that back.. While looking at patch 2 (which always
> > confuses me with that -EOVERFLOW handling), looks like we set
> > skb->destructor in xsk_skb_init_misc? So this kfree will do
> > xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
> > move xsk_skb_init_misc to happen after xsk_skb_metadata?
>
> Ooops, looks like you already addressed these in v2? Let me look into
> that..

Thanks for the detailed check. To be frank, my mind sometimes went
blank while checking those complex paths...

Yep, let's discuss that there instead :)

Thanks,
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2026-04-21  0:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-18  4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
2026-04-18  4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
2026-04-19  4:58   ` sashiko-bot
2026-04-19 13:05     ` Jason Xing
2026-04-20 15:22   ` Stanislav Fomichev
2026-04-20 15:42     ` Stanislav Fomichev
2026-04-20 16:27       ` Stanislav Fomichev
2026-04-21  0:55         ` Jason Xing
2026-04-18  4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
2026-04-19  4:58   ` sashiko-bot
2026-04-19 14:03     ` Jason Xing
2026-04-20 15:44   ` Stanislav Fomichev
2026-04-18  4:56 ` [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue Jason Xing
2026-04-19  4:58   ` sashiko-bot
2026-04-19 14:09     ` Jason Xing
2026-04-18  4:56 ` [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
2026-04-19  4:58   ` sashiko-bot
2026-04-19 14:23     ` Jason Xing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox