* [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-18 4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
@ 2026-04-18 4:56 ` Jason Xing
2026-04-20 15:22 ` Stanislav Fomichev
2026-04-18 4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Jason Xing @ 2026-04-18 4:56 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.
How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs->skb is
NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.
Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 6149f6a79897..8fcde34aec7b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -743,8 +743,10 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
xsk_skb_init_misc(skb, xs, desc->addr);
if (desc->options & XDP_TX_METADATA) {
err = xsk_skb_metadata(skb, buffer, desc, pool, hr);
- if (unlikely(err))
+ if (unlikely(err)) {
+ kfree_skb(skb);
return ERR_PTR(err);
+ }
}
} else {
struct xsk_addrs *xsk_addr;
--
2.41.3
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-18 4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
@ 2026-04-20 15:22 ` Stanislav Fomichev
2026-04-20 15:42 ` Stanislav Fomichev
0 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 15:22 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On 04/18, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
>
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means no frag and xs->skb is
> NULL) and users enable metadata feature.
> 2. xsk_skb_metadata() returns a error code.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> 4. there is no chance to free this skb anymore.
>
> Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-20 15:22 ` Stanislav Fomichev
@ 2026-04-20 15:42 ` Stanislav Fomichev
2026-04-20 16:27 ` Stanislav Fomichev
0 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 15:42 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On 04/20, Stanislav Fomichev wrote:
> On 04/18, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Fix it by explicitly adding kfree_skb() before returning back to its
> > caller.
> >
> > How to reproduce it in virtio_net:
> > 1. the current skb is the first one (which means no frag and xs->skb is
> > NULL) and users enable metadata feature.
> > 2. xsk_skb_metadata() returns a error code.
> > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > 4. there is no chance to free this skb anymore.
> >
> > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
>
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Actually, I take that back.. While looking at patch 2 (which always
confuses me with that -EOVERFLOW handling), looks like we set
skb->destructor in xsk_skb_init_misc? So this kfree will do
xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
move xsk_skb_init_misc to happen after xsk_skb_metadata?
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-20 15:42 ` Stanislav Fomichev
@ 2026-04-20 16:27 ` Stanislav Fomichev
2026-04-21 0:55 ` Jason Xing
0 siblings, 1 reply; 10+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 16:27 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On 04/20, Stanislav Fomichev wrote:
> On 04/20, Stanislav Fomichev wrote:
> > On 04/18, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Fix it by explicitly adding kfree_skb() before returning back to its
> > > caller.
> > >
> > > How to reproduce it in virtio_net:
> > > 1. the current skb is the first one (which means no frag and xs->skb is
> > > NULL) and users enable metadata feature.
> > > 2. xsk_skb_metadata() returns a error code.
> > > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > > 4. there is no chance to free this skb anymore.
> > >
> > > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> >
> > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
>
> Actually, I take that back.. While looking at patch 2 (which always
> confuses me with that -EOVERFLOW handling), looks like we set
> skb->destructor in xsk_skb_init_misc? So this kfree will do
> xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
> move xsk_skb_init_misc to happen after xsk_skb_metadata?
Ooops, looks like you already addressed these in v2? Let me look into
that..
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case
2026-04-20 16:27 ` Stanislav Fomichev
@ 2026-04-21 0:55 ` Jason Xing
0 siblings, 0 replies; 10+ messages in thread
From: Jason Xing @ 2026-04-21 0:55 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On Tue, Apr 21, 2026 at 12:27 AM Stanislav Fomichev
<sdf.kernel@gmail.com> wrote:
>
> On 04/20, Stanislav Fomichev wrote:
> > On 04/20, Stanislav Fomichev wrote:
> > > On 04/18, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Fix it by explicitly adding kfree_skb() before returning back to its
> > > > caller.
> > > >
> > > > How to reproduce it in virtio_net:
> > > > 1. the current skb is the first one (which means no frag and xs->skb is
> > > > NULL) and users enable metadata feature.
> > > > 2. xsk_skb_metadata() returns a error code.
> > > > 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
> > > > 4. there is no chance to free this skb anymore.
> > > >
> > > > Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
> > > > Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > >
> > > Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> >
> > Actually, I take that back.. While looking at patch 2 (which always
> > confuses me with that -EOVERFLOW handling), looks like we set
> > skb->destructor in xsk_skb_init_misc? So this kfree will do
> > xsk_cq_submit_addr_locked? Not sure that's what we want? Should we
> > move xsk_skb_init_misc to happen after xsk_skb_metadata?
>
> Ooops, looks like you already addressed these in v2? Let me look into
> that..
Thanks for the detailed check. To be frank, my mind sometimes went
blank while checking those complex paths...
Yep, let's discuss that there instead :)
Thanks,
Jason
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
2026-04-18 4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
2026-04-18 4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
@ 2026-04-18 4:56 ` Jason Xing
2026-04-20 15:44 ` Stanislav Fomichev
2026-04-18 4:56 ` [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue Jason Xing
2026-04-18 4:56 ` [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
3 siblings, 1 reply; 10+ messages in thread
From: Jason Xing @ 2026-04-18 4:56 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.
How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs->skb is NULL) and
hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
is why bug can be triggered.
4. there is no chance to free this skb anymore.
Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs->skb) to do the right thing.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fcde34aec7b..5d3dbb118730 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -778,8 +778,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
addr = buffer - pool->addrs;
for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
- if (unlikely(i >= MAX_SKB_FRAGS))
+ if (unlikely(i >= MAX_SKB_FRAGS)) {
+ if (!xs->skb)
+ kfree_skb(skb);
return ERR_PTR(-EOVERFLOW);
+ }
page = pool->umem->pgs[addr >> PAGE_SHIFT];
get_page(page);
--
2.41.3
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS
2026-04-18 4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
@ 2026-04-20 15:44 ` Stanislav Fomichev
0 siblings, 0 replies; 10+ messages in thread
From: Stanislav Fomichev @ 2026-04-20 15:44 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing
On 04/18, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix it by explicitly adding kfree_skb() before returning back to its
> caller.
>
> How to reproduce it in virtio_net:
> 1. the current skb is the first one (which means xs->skb is NULL) and
> hit the limit MAX_SKB_FRAGS.
> 2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
> 3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
> is why bug can be triggered.
> 4. there is no chance to free this skb anymore.
>
> Note that if in this case the xs->skb is not NULL, xsk_build_skb() will
> call xsk_drop_skb(xs->skb) to do the right thing.
>
> Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> net/xdp/xsk.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 8fcde34aec7b..5d3dbb118730 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -778,8 +778,11 @@ static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> addr = buffer - pool->addrs;
>
> for (copied = 0, i = skb_shinfo(skb)->nr_frags; copied < len; i++) {
> - if (unlikely(i >= MAX_SKB_FRAGS))
> + if (unlikely(i >= MAX_SKB_FRAGS)) {
> + if (!xs->skb)
> + kfree_skb(skb);
> return ERR_PTR(-EOVERFLOW);
> + }
>
> page = pool->umem->pgs[addr >> PAGE_SHIFT];
> get_page(page);
> --
> 2.41.3
>
As mentioned for patch 1, need to do something with that
xsk_skb_init_misc... Otherwise we hit destructor with xsk_cq_submit_addr_locked
here as well
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue
2026-04-18 4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
2026-04-18 4:56 ` [PATCH net 1/4] xsk: avoid skb leak in XDP_TX_METADATA case Jason Xing
2026-04-18 4:56 ` [PATCH net 2/4] xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS Jason Xing
@ 2026-04-18 4:56 ` Jason Xing
2026-04-18 4:56 ` [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path Jason Xing
3 siblings, 0 replies; 10+ messages in thread
From: Jason Xing @ 2026-04-18 4:56 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When a first descriptor (xs->skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy (e.g., MAX_SKB_FRAGS exceeded), the free_err
EOVERFLOW handler unconditionally dereferences xs->skb via
xsk_inc_num_desc(xs->skb) and xsk_drop_skb(xs->skb), causing a NULL
pointer dereference.
In the patch 2/4, the skb is already freed by kfree_skb() inside
xsk_build_skb_zerocopy for the first-descriptor case, so we only need
to do the bookkeeping: cancel the one reserved CQ slot and account for
the single invalid descriptor.
Guard the existing xsk_inc_num_desc/xsk_drop_skb calls with an
xs->skb check (for the continuation case), and add an else branch
for the first-descriptor case that manually cancels the CQ slot and
increments invalid_descs by one.
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5d3dbb118730..2918b773aa84 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -898,9 +898,14 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
kfree_skb(skb);
if (err == -EOVERFLOW) {
- /* Drop the packet */
- xsk_inc_num_desc(xs->skb);
- xsk_drop_skb(xs->skb);
+ if (xs->skb) {
+ /* Drop the packet */
+ xsk_inc_num_desc(xs->skb);
+ xsk_drop_skb(xs->skb);
+ } else {
+ xsk_cq_cancel_locked(xs->pool, 1);
+ xs->tx->invalid_descs++;
+ }
xskq_cons_release(xs->tx);
} else {
/* Let application retry */
--
2.41.3
^ permalink raw reply related [flat|nested] 10+ messages in thread* [PATCH net 4/4] xsk: fix use-after-free of xs->skb in xsk_build_skb() free_err path
2026-04-18 4:56 [PATCH net 0/4] xsk: fix bugs around xsk skb allocation Jason Xing
` (2 preceding siblings ...)
2026-04-18 4:56 ` [PATCH net 3/4] xsk: handle NULL dereference of the skb without frags issue Jason Xing
@ 2026-04-18 4:56 ` Jason Xing
3 siblings, 0 replies; 10+ messages in thread
From: Jason Xing @ 2026-04-18 4:56 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, horms, andrew+netdev
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs->skb = skb
to accumulate subsequent descriptors.
If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:
if (skb && !skb_shinfo(skb)->nr_frags)
kfree_skb(skb);
evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs->skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs->skb is dereferenced, causing a
use-after-free or double-free.
Fix by adding a !xs->skb check to the condition, ensuring we only free
skbs that were freshly allocated in this call (xs->skb is NULL) and
never free an in-progress multi-buffer skb that the caller still
references.
Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
net/xdp/xsk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2918b773aa84..22c7a92e0734 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -894,7 +894,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
return skb;
free_err:
- if (skb && !skb_shinfo(skb)->nr_frags)
+ if (skb && !xs->skb && !skb_shinfo(skb)->nr_frags)
kfree_skb(skb);
if (err == -EOVERFLOW) {
--
2.41.3
^ permalink raw reply related [flat|nested] 10+ messages in thread