public inbox for netdev@vger.kernel.org
* [PATCH net v2 0/2] xsk: Fixes for AF_XDP fragment handling
@ 2026-02-09 18:24 Nikhil P. Rao
  2026-02-09 18:24 ` [PATCH net v2 1/2] xsk: Fix fragment node deletion to prevent buffer leak Nikhil P. Rao
  2026-02-09 18:24 ` [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop Nikhil P. Rao
  0 siblings, 2 replies; 7+ messages in thread
From: Nikhil P. Rao @ 2026-02-09 18:24 UTC (permalink / raw)
  To: netdev
  Cc: nikhil.rao, magnus.karlsson, maciej.fijalkowski, sdf, davem,
	edumazet, kuba, pabeni, horms, kerneljasonxing

This series fixes two issues in AF_XDP zero-copy fragment handling:

Patch 1 fixes a buffer leak caused by incorrect list node handling after
commit b692bf9a7543, which reused the list_node field for both the xskb
pool list and the buffer free list. Using list_del() instead of
list_del_init() leaves the node's pointers stale, so the list_empty()
check in xp_free() wrongly reports the buffer as still queued and the
buffer is never added to the free list.

Patch 2 fixes partial packet delivery to userspace. In the zero-copy path,
if the Rx queue fills up while enqueuing fragments, the remaining fragments
are dropped, causing the application to receive incomplete packets. The fix
ensures the Rx queue has sufficient space for all fragments before starting
to enqueue them.

v2 changes:
 - Fix indentation issue reported by kernel test robot [1]

[1] https://lore.kernel.org/oe-kbuild-all/202602051720.YfZO23pZ-lkp@intel.com/

Nikhil P. Rao (2):
  xsk: Fix fragment node deletion to prevent buffer leak
  xsk: Fix zero-copy AF_XDP fragment drop

 include/net/xdp_sock_drv.h |  6 +++---
 net/xdp/xsk.c              | 24 ++++++++++++------------
 2 files changed, 15 insertions(+), 15 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net v2 1/2] xsk: Fix fragment node deletion to prevent buffer leak
  2026-02-09 18:24 [PATCH net v2 0/2] xsk: Fixes for AF_XDP fragment handling Nikhil P. Rao
@ 2026-02-09 18:24 ` Nikhil P. Rao
  2026-02-09 21:29   ` Maciej Fijalkowski
  2026-02-09 18:24 ` [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop Nikhil P. Rao
  1 sibling, 1 reply; 7+ messages in thread
From: Nikhil P. Rao @ 2026-02-09 18:24 UTC (permalink / raw)
  To: netdev
  Cc: nikhil.rao, magnus.karlsson, maciej.fijalkowski, sdf, davem,
	edumazet, kuba, pabeni, horms, kerneljasonxing

After commit b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node"),
the list_node field is reused for both the xskb pool list and the buffer
free list. This causes a buffer leak, as described below.

xp_free() checks if a buffer is already on the free list using
list_empty(&xskb->list_node). When list_del() is used to remove a node
from the xskb pool list, it doesn't reinitialize the node pointers.
This means list_empty() will return false even after the node has been
removed, causing xp_free() to incorrectly skip adding the buffer to the
free list.

Fix this by using list_del_init() instead of list_del() in all fragment
handling paths. This ensures the list node is reinitialized after removal,
allowing the list_empty() check to work correctly.

Fixes: b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node")
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
---
 include/net/xdp_sock_drv.h | 6 +++---
 net/xdp/xsk.c              | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 242e34f771cc..aefc368449d5 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -122,7 +122,7 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
 		goto out;
 
 	list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
-		list_del(&pos->list_node);
+		list_del_init(&pos->list_node);
 		xp_free(pos);
 	}
 
@@ -157,7 +157,7 @@ static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *first)
 	frag = list_first_entry_or_null(&xskb->pool->xskb_list,
 					struct xdp_buff_xsk, list_node);
 	if (frag) {
-		list_del(&frag->list_node);
+		list_del_init(&frag->list_node);
 		ret = &frag->xdp;
 	}
 
@@ -168,7 +168,7 @@ static inline void xsk_buff_del_frag(struct xdp_buff *xdp)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
 
-	list_del(&xskb->list_node);
+	list_del_init(&xskb->list_node);
 }
 
 static inline struct xdp_buff *xsk_buff_get_head(struct xdp_buff *first)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f093c3453f64..f2ec4f78bbb6 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -186,7 +186,7 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		err = __xsk_rcv_zc(xs, pos, len, contd);
 		if (err)
 			goto err;
-		list_del(&pos->list_node);
+		list_del_init(&pos->list_node);
 	}
 
 	return 0;
-- 
2.43.0



* [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop
  2026-02-09 18:24 [PATCH net v2 0/2] xsk: Fixes for AF_XDP fragment handling Nikhil P. Rao
  2026-02-09 18:24 ` [PATCH net v2 1/2] xsk: Fix fragment node deletion to prevent buffer leak Nikhil P. Rao
@ 2026-02-09 18:24 ` Nikhil P. Rao
  2026-02-09 21:55   ` Maciej Fijalkowski
  1 sibling, 1 reply; 7+ messages in thread
From: Nikhil P. Rao @ 2026-02-09 18:24 UTC (permalink / raw)
  To: netdev
  Cc: nikhil.rao, magnus.karlsson, maciej.fijalkowski, sdf, davem,
	edumazet, kuba, pabeni, horms, kerneljasonxing

AF_XDP should ensure that only complete packets are delivered to the
application. In the zero-copy case, if the Rx queue fills up while
fragments are being enqueued, the remaining fragments are dropped and the
application receives a partial packet.

Add a check to ensure that the Rx queue has enough space for all
fragments of a packet before starting to enqueue them.

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
---
 net/xdp/xsk.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f2ec4f78bbb6..b65be95abcdc 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -166,15 +166,20 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	u32 frags = xdp_buff_has_frags(xdp);
 	struct xdp_buff_xsk *pos, *tmp;
 	struct list_head *xskb_list;
+	u32 num_desc = 1;
 	u32 contd = 0;
-	int err;
 
-	if (frags)
+	if (frags) {
+		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
 		contd = XDP_PKT_CONTD;
+	}
 
-	err = __xsk_rcv_zc(xs, xskb, len, contd);
-	if (err)
-		goto err;
+	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
+		xs->rx_queue_full++;
+		return -ENOBUFS;
+	}
+
+	__xsk_rcv_zc(xs, xskb, len, contd);
 	if (likely(!frags))
 		return 0;
 
@@ -183,16 +188,11 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		if (list_is_singular(xskb_list))
 			contd = 0;
 		len = pos->xdp.data_end - pos->xdp.data;
-		err = __xsk_rcv_zc(xs, pos, len, contd);
-		if (err)
-			goto err;
+		__xsk_rcv_zc(xs, pos, len, contd);
 		list_del_init(&pos->list_node);
 	}
 
 	return 0;
-err:
-	xsk_buff_free(xdp);
-	return err;
 }
 
 static void *xsk_copy_xdp_start(struct xdp_buff *from)
-- 
2.43.0



* Re: [PATCH net v2 1/2] xsk: Fix fragment node deletion to prevent buffer leak
  2026-02-09 18:24 ` [PATCH net v2 1/2] xsk: Fix fragment node deletion to prevent buffer leak Nikhil P. Rao
@ 2026-02-09 21:29   ` Maciej Fijalkowski
  0 siblings, 0 replies; 7+ messages in thread
From: Maciej Fijalkowski @ 2026-02-09 21:29 UTC (permalink / raw)
  To: Nikhil P. Rao
  Cc: netdev, magnus.karlsson, sdf, davem, edumazet, kuba, pabeni,
	horms, kerneljasonxing

On Mon, Feb 09, 2026 at 06:24:50PM +0000, Nikhil P. Rao wrote:
> After commit b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node"),
> the list_node field is reused for both the xskb pool list and the buffer
> free list, this causes a buffer leak as described below.
> 
> xp_free() checks if a buffer is already on the free list using
> list_empty(&xskb->list_node). When list_del() is used to remove a node
> from the xskb pool list, it doesn't reinitialize the node pointers.
> This means list_empty() will return false even after the node has been
> removed, causing xp_free() to incorrectly skip adding the buffer to the
> free list.
> 
> Fix this by using list_del_init() instead of list_del() in all fragment
> handling paths, this ensures the list node is reinitialized after removal,
> allowing the list_empty() to work correctly.
> 
> Fixes: b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node")
> Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>

Nice catch!

I'm curious how you spotted this. I assume it was a mix of XDP_DROP/PASS
actions returned by your program in the data path with mbuf, as that's the
path that was affected. I wonder if we need to come up with a test case
for xskxceiver to cover this?

Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> ---
>  include/net/xdp_sock_drv.h | 6 +++---
>  net/xdp/xsk.c              | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
> index 242e34f771cc..aefc368449d5 100644
> --- a/include/net/xdp_sock_drv.h
> +++ b/include/net/xdp_sock_drv.h
> @@ -122,7 +122,7 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
>  		goto out;
>  
>  	list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
> -		list_del(&pos->list_node);
> +		list_del_init(&pos->list_node);
>  		xp_free(pos);
>  	}
>  
> @@ -157,7 +157,7 @@ static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *first)
>  	frag = list_first_entry_or_null(&xskb->pool->xskb_list,
>  					struct xdp_buff_xsk, list_node);
>  	if (frag) {
> -		list_del(&frag->list_node);
> +		list_del_init(&frag->list_node);
>  		ret = &frag->xdp;
>  	}
>  
> @@ -168,7 +168,7 @@ static inline void xsk_buff_del_frag(struct xdp_buff *xdp)
>  {
>  	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
>  
> -	list_del(&xskb->list_node);
> +	list_del_init(&xskb->list_node);
>  }
>  
>  static inline struct xdp_buff *xsk_buff_get_head(struct xdp_buff *first)
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f093c3453f64..f2ec4f78bbb6 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -186,7 +186,7 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  		err = __xsk_rcv_zc(xs, pos, len, contd);
>  		if (err)
>  			goto err;
> -		list_del(&pos->list_node);
> +		list_del_init(&pos->list_node);
>  	}
>  
>  	return 0;
> -- 
> 2.43.0
> 


* Re: [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop
  2026-02-09 18:24 ` [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop Nikhil P. Rao
@ 2026-02-09 21:55   ` Maciej Fijalkowski
  2026-02-10 16:19     ` Maciej Fijalkowski
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej Fijalkowski @ 2026-02-09 21:55 UTC (permalink / raw)
  To: Nikhil P. Rao
  Cc: netdev, magnus.karlsson, sdf, davem, edumazet, kuba, pabeni,
	horms, kerneljasonxing

On Mon, Feb 09, 2026 at 06:24:51PM +0000, Nikhil P. Rao wrote:
> AF_XDP should ensure that only a complete packet is sent to application.
> In the zero-copy case, if the Rx queue gets full as fragments are being
> enqueued, the remaining fragments are dropped.

All of the descs that the current xdp_buff was carrying will be dropped,
which is incorrect, as some of them have already been exposed to the Rx
queue and I don't see an error path that would rewind them. So that's my
understanding of this issue.

However, we were trying to keep the single-buf case as fast as we can, see
below.

> 
> Add a check to ensure that the Rx queue has enough space for all
> fragments of a packet before starting to enqueue them.
> 
> Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
> ---
>  net/xdp/xsk.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f2ec4f78bbb6..b65be95abcdc 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -166,15 +166,20 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  	u32 frags = xdp_buff_has_frags(xdp);
>  	struct xdp_buff_xsk *pos, *tmp;
>  	struct list_head *xskb_list;
> +	u32 num_desc = 1;
>  	u32 contd = 0;
> -	int err;
>  
> -	if (frags)
> +	if (frags) {
> +		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
>  		contd = XDP_PKT_CONTD;
> +	}
>  
> -	err = __xsk_rcv_zc(xs, xskb, len, contd);
> -	if (err)
> -		goto err;
> +	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {

this will hurt single-buf performance unfortunately; I'd rather have the
frag part still executed separately. Did you measure what impact this
patch has on throughput?

A further thought: once we are sure about sufficient space in the xsk
queue, we could skip the sanity check that xskq_prod_reserve_desc()
contains. Look at the batching that is done on the Tx side.
Please see what works best here: keeping the linear part executed
separately from the frags and producing the frags in a 'batched' way, or
including the linear part in this 'batched' production of descriptors.

> +		xs->rx_queue_full++;
> +		return -ENOBUFS;
> +	}
> +
> +	__xsk_rcv_zc(xs, xskb, len, contd);
>  	if (likely(!frags))
>  		return 0;
>  
> @@ -183,16 +188,11 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  		if (list_is_singular(xskb_list))
>  			contd = 0;
>  		len = pos->xdp.data_end - pos->xdp.data;
> -		err = __xsk_rcv_zc(xs, pos, len, contd);
> -		if (err)
> -			goto err;
> +		__xsk_rcv_zc(xs, pos, len, contd);
>  		list_del_init(&pos->list_node);
>  	}
>  
>  	return 0;
> -err:
> -	xsk_buff_free(xdp);
> -	return err;
>  }
>  
>  static void *xsk_copy_xdp_start(struct xdp_buff *from)
> -- 
> 2.43.0
> 


* Re: [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop
  2026-02-09 21:55   ` Maciej Fijalkowski
@ 2026-02-10 16:19     ` Maciej Fijalkowski
  2026-02-10 21:10       ` Rao, Nikhil
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej Fijalkowski @ 2026-02-10 16:19 UTC (permalink / raw)
  To: Nikhil P. Rao
  Cc: netdev, magnus.karlsson, sdf, davem, edumazet, kuba, pabeni,
	horms, kerneljasonxing

On Mon, Feb 09, 2026 at 10:55:16PM +0100, Maciej Fijalkowski wrote:
> On Mon, Feb 09, 2026 at 06:24:51PM +0000, Nikhil P. Rao wrote:
> > AF_XDP should ensure that only a complete packet is sent to application.
> > In the zero-copy case, if the Rx queue gets full as fragments are being
> > enqueued, the remaining fragments are dropped.
> 
> All of the descs that current xdp_buff was carrying will be dropped which
> is incorrect as some of them have been exposed to Rx queue already and I
> don't see the error path that would rewind them. So that's my
> understanding of this issue.
> 
> However, we were trying to keep the single-buf case as fast as we can, see
> below.
> 
> > 
> > Add a check to ensure that the Rx queue has enough space for all
> > fragments of a packet before starting to enqueue them.
> > 
> > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
> > ---
> >  net/xdp/xsk.c | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f2ec4f78bbb6..b65be95abcdc 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -166,15 +166,20 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> >  	u32 frags = xdp_buff_has_frags(xdp);
> >  	struct xdp_buff_xsk *pos, *tmp;
> >  	struct list_head *xskb_list;
> > +	u32 num_desc = 1;
> >  	u32 contd = 0;
> > -	int err;
> >  
> > -	if (frags)
> > +	if (frags) {
> > +		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
> >  		contd = XDP_PKT_CONTD;
> > +	}
> >  
> > -	err = __xsk_rcv_zc(xs, xskb, len, contd);
> > -	if (err)
> > -		goto err;
> > +	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
> 
> this will hurt single buf performance unfortunately, I'd rather have frag
> part still executed separately. Did you measure what impact on throughput
> this patch has?
> 
> Further thought here is once we are sure about sufficient space in xsk
> queue then we could skip sanity check that xskq_prod_reserve_desc()
> contains. Look at batching that is done on Tx side.
> 
> Please see what works best here. Whether keeping linear part execution
> separate from frags + producing frags in a 'batched' way or including
> linear part with this 'batched' production of descriptors.

What I meant was the patch below. However, this is not a fix, so I
wouldn't incorporate it into your set. Maybe let's go with just processing
the linear part separately, just like it used to be, and then I can follow
up with this diff.

From 153e1bc5d2baf6328667956ae16d47103085eac8 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Tue, 10 Feb 2026 13:39:17 +0000
Subject: [PATCH bpf-next] xsk: avoid double checking against rx queue being
 full

Currently the non-zc xsk Rx path for the multi-buffer case checks twice
whether the xsk Rx queue has enough space for producing descriptors:
1.
	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
		xs->rx_queue_full++;
		return -ENOBUFS;
	}
2.
	__xsk_rcv_zc(xs, xskb, copied - meta_len, rem ? XDP_PKT_CONTD : 0);
	-> err = xskq_prod_reserve_desc(xs->rx, addr, len, flags);
	  -> if (xskq_prod_is_full(q))

The second check is redundant: in 1. we have already peeked into the Rx
queue and verified that there is enough space to produce the given number
of descriptors.

Provide helper functions that skip it and therefore optimize the code.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/xdp/xsk.c       | 14 +++++++++++++-
 net/xdp/xsk_queue.h | 16 +++++++++++-----
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f093c3453f64..aaadc13649e1 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -160,6 +160,17 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff_xsk *xskb, u32 len,
 	return 0;
 }
 
+static void __xsk_rcv_zc_safe(struct xdp_sock *xs, struct xdp_buff_xsk *xskb,
+			      u32 len, u32 flags)
+{
+	u64 addr;
+
+	addr = xp_get_handle(xskb, xskb->pool);
+	__xskq_prod_reserve_desc(xs->rx, addr, len, flags);
+
+	xp_release(xskb);
+}
+
 static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 {
 	struct xdp_buff_xsk *xskb = container_of(xdp, struct xdp_buff_xsk, xdp);
@@ -292,7 +303,8 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 		rem -= copied;
 
 		xskb = container_of(xsk_xdp, struct xdp_buff_xsk, xdp);
-		__xsk_rcv_zc(xs, xskb, copied - meta_len, rem ? XDP_PKT_CONTD : 0);
+		__xsk_rcv_zc_safe(xs, xskb, copied - meta_len,
+				  rem ? XDP_PKT_CONTD : 0);
 		meta_len = 0;
 	} while (rem);
 
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 1eb8d9f8b104..4f764b5748d2 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -440,20 +440,26 @@ static inline void xskq_prod_write_addr_batch(struct xsk_queue *q, struct xdp_de
 	q->cached_prod = cached_prod;
 }
 
-static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
-					 u64 addr, u32 len, u32 flags)
+static inline void __xskq_prod_reserve_desc(struct xsk_queue *q,
+					    u64 addr, u32 len, u32 flags)
 {
 	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
 	u32 idx;
 
-	if (xskq_prod_is_full(q))
-		return -ENOBUFS;
-
 	/* A, matches D */
 	idx = q->cached_prod++ & q->ring_mask;
 	ring->desc[idx].addr = addr;
 	ring->desc[idx].len = len;
 	ring->desc[idx].options = flags;
+}
+
+static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
+					 u64 addr, u32 len, u32 flags)
+{
+	if (xskq_prod_is_full(q))
+		return -ENOBUFS;
+
+	__xskq_prod_reserve_desc(q, addr, len, flags);
 
 	return 0;
 }
-- 
2.43.0


> 
> > +		xs->rx_queue_full++;
> > +		return -ENOBUFS;
> > +	}
> > +
> > +	__xsk_rcv_zc(xs, xskb, len, contd);
> >  	if (likely(!frags))
> >  		return 0;
> >  
> > @@ -183,16 +188,11 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> >  		if (list_is_singular(xskb_list))
> >  			contd = 0;
> >  		len = pos->xdp.data_end - pos->xdp.data;
> > -		err = __xsk_rcv_zc(xs, pos, len, contd);
> > -		if (err)
> > -			goto err;
> > +		__xsk_rcv_zc(xs, pos, len, contd);
> >  		list_del_init(&pos->list_node);
> >  	}
> >  
> >  	return 0;
> > -err:
> > -	xsk_buff_free(xdp);
> > -	return err;
> >  }
> >  
> >  static void *xsk_copy_xdp_start(struct xdp_buff *from)
> > -- 
> > 2.43.0
> > 


* Re: [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop
  2026-02-10 16:19     ` Maciej Fijalkowski
@ 2026-02-10 21:10       ` Rao, Nikhil
  0 siblings, 0 replies; 7+ messages in thread
From: Rao, Nikhil @ 2026-02-10 21:10 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: netdev, magnus.karlsson, sdf, davem, edumazet, kuba, pabeni,
	horms, kerneljasonxing, nikhil.rao


Sorry for the duplicate - sending without HTML this time.

On 2/10/2026 8:19 AM, Maciej Fijalkowski wrote:
> 
> On Mon, Feb 09, 2026 at 10:55:16PM +0100, Maciej Fijalkowski wrote:
>> On Mon, Feb 09, 2026 at 06:24:51PM +0000, Nikhil P. Rao wrote:
>>> AF_XDP should ensure that only a complete packet is sent to application.
>>> In the zero-copy case, if the Rx queue gets full as fragments are being
>>> enqueued, the remaining fragments are dropped.
>>
>> All of the descs that current xdp_buff was carrying will be dropped which
>> is incorrect as some of them have been exposed to Rx queue already and I
>> don't see the error path that would rewind them. So that's my
>> understanding of this issue.
>>
>> However, we were trying to keep the single-buf case as fast as we can, see
>> below.
>>
>>>
>>> Add a check to ensure that the Rx queue has enough space for all
>>> fragments of a packet before starting to enqueue them.
>>>
>>> Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
>>> Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com>
>>> ---
>>>   net/xdp/xsk.c | 22 +++++++++++-----------
>>>   1 file changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
>>> index f2ec4f78bbb6..b65be95abcdc 100644
>>> --- a/net/xdp/xsk.c
>>> +++ b/net/xdp/xsk.c
>>> @@ -166,15 +166,20 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>>>      u32 frags = xdp_buff_has_frags(xdp);
>>>      struct xdp_buff_xsk *pos, *tmp;
>>>      struct list_head *xskb_list;
>>> +   u32 num_desc = 1;
>>>      u32 contd = 0;
>>> -   int err;
>>>
>>> -   if (frags)
>>> +   if (frags) {
>>> +           num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
>>>              contd = XDP_PKT_CONTD;
>>> +   }
>>>
>>> -   err = __xsk_rcv_zc(xs, xskb, len, contd);
>>> -   if (err)
>>> -           goto err;
>>> +   if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
>>
>> this will hurt single buf performance unfortunately, I'd rather have frag
>> part still executed separately. Did you measure what impact on throughput
>> this patch has?
>>
>> Further thought here is once we are sure about sufficient space in xsk
>> queue then we could skip sanity check that xskq_prod_reserve_desc()
>> contains. Look at batching that is done on Tx side.
>>
>> Please see what works best here. Whether keeping linear part execution
>> separate from frags + producing frags in a 'batched' way or including
>> linear part with this 'batched' production of descriptors.
> 
> What I meant was patch below. However this is not a fix so I wouldn't
> incorporate it to your set. Maybe let's go with just processing linear
> part separately just like it used to be and then I can follow-up with this
> diff.
>
Agreed, this change can be implemented without any impact on single-buf
performance.

Let me know if this looks good to you:

     if (frags) {

     } else {

         /* handle single buf */

     }

     /* handle multi-buf */
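
For concreteness, one way the skeleton above might be filled in (an
untested sketch against the v2 diff, with "..." eliding unchanged code;
not a tested patch):

```c
static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
{
	...
	if (frags) {
		/* multi-buf: reserve space for all descriptors up front */
		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
		contd = XDP_PKT_CONTD;
		if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
			xs->rx_queue_full++;
			return -ENOBUFS;
		}
	} else {
		/* single buf: keep the existing unbatched fast path */
		return __xsk_rcv_zc(xs, xskb, len, 0);
	}

	/* multi-buf: space is guaranteed, so production cannot fail */
	__xsk_rcv_zc(xs, xskb, len, contd);
	...
}
```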

> From 153e1bc5d2baf6328667956ae16d47103085eac8 Mon Sep 17 00:00:00 2001
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Date: Tue, 10 Feb 2026 13:39:17 +0000
> Subject: [PATCH bpf-next] xsk: avoid double checking against rx queue being
>   full
> 

