netdev.vger.kernel.org archive mirror
* [PATCH net-next v2 0/3] xsk: introduce atomic for cq in generic path
@ 2025-11-25  8:54 Jason Xing
  2025-11-25  8:54 ` [PATCH net-next v2 1/3] xsk: add atomic cached_prod for copy mode Jason Xing
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jason Xing @ 2025-11-25  8:54 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

In the hot path (__xsk_generic_xmit()), taking the spin lock is time
consuming. This series therefore replaces the spin lock with atomic
operations to get better performance.
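
For context, the end result boils down to the following change of
pattern in the copy-mode transmit path (a simplified sketch combining
the three patches, not the exact diff):

	/* before: every reservation bounces a pool-level spinlock */
	spin_lock(&pool->cq_cached_prod_lock);
	ret = xskq_prod_reserve(pool->cq);
	spin_unlock(&pool->cq_cached_prod_lock);

	/* after: a single atomic RMW on the cached producer index */
	ret = xsk_cq_cached_prod_reserve(pool->cq);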

---
V2
Link: https://lore.kernel.org/all/20251124080858.89593-1-kerneljasonxing@gmail.com/
1. use separate functions rather than branches within shared routines. (Maciej)
2. make each patch as simple as possible for easier review

Jason Xing (3):
  xsk: add atomic cached_prod for copy mode
  xsk: use atomic operations around cached_prod for copy mode
  xsk: remove spin lock protection of cached_prod

 include/net/xsk_buff_pool.h |  5 -----
 net/xdp/xsk.c               | 23 +++++------------------
 net/xdp/xsk_buff_pool.c     |  1 -
 net/xdp/xsk_queue.h         | 27 +++++++++++++++++++++++----
 4 files changed, 28 insertions(+), 28 deletions(-)

-- 
2.41.3


* [PATCH net-next v2 1/3] xsk: add atomic cached_prod for copy mode
  2025-11-25  8:54 [PATCH net-next v2 0/3] xsk: introduce atomic for cq in generic path Jason Xing
@ 2025-11-25  8:54 ` Jason Xing
  2025-11-25  8:54 ` [PATCH net-next v2 2/3] xsk: use atomic operations around " Jason Xing
  2025-11-25  8:54 ` [PATCH net-next v2 3/3] xsk: remove spin lock protection of cached_prod Jason Xing
  2 siblings, 0 replies; 10+ messages in thread
From: Jason Xing @ 2025-11-25  8:54 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Add a union member for the completion queue, used only in copy mode for
now. The purpose is to replace cq_cached_prod_lock with atomic
operations to improve performance. Note that the completion queue in
zerocopy mode does not need to be converted because that path is
already lockless.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk_queue.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 1eb8d9f8b104..44cc01555c0b 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -40,7 +40,11 @@ struct xdp_umem_ring {
 struct xsk_queue {
 	u32 ring_mask;
 	u32 nentries;
-	u32 cached_prod;
+	union {
+		u32 cached_prod;
+		/* Used for cq in copy mode only */
+		atomic_t cached_prod_atomic;
+	};
 	u32 cached_cons;
 	struct xdp_ring *ring;
 	u64 invalid_descs;
-- 
2.41.3


* [PATCH net-next v2 2/3] xsk: use atomic operations around cached_prod for copy mode
  2025-11-25  8:54 [PATCH net-next v2 0/3] xsk: introduce atomic for cq in generic path Jason Xing
  2025-11-25  8:54 ` [PATCH net-next v2 1/3] xsk: add atomic cached_prod for copy mode Jason Xing
@ 2025-11-25  8:54 ` Jason Xing
  2025-11-27 11:35   ` Paolo Abeni
  2025-11-25  8:54 ` [PATCH net-next v2 3/3] xsk: remove spin lock protection of cached_prod Jason Xing
  2 siblings, 1 reply; 10+ messages in thread
From: Jason Xing @ 2025-11-25  8:54 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Introduce dedicated helpers for cached_prod in the generic path instead
of extending the shared ones, so that the zerocopy path is not
affected.

Convert the cached_prod accesses to atomic operations.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c       |  4 ++--
 net/xdp/xsk_queue.h | 21 ++++++++++++++++++---
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index bcfd400e9cf8..b63409b1422e 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -551,7 +551,7 @@ static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
 	int ret;
 
 	spin_lock(&pool->cq_cached_prod_lock);
-	ret = xskq_prod_reserve(pool->cq);
+	ret = xsk_cq_cached_prod_reserve(pool->cq);
 	spin_unlock(&pool->cq_cached_prod_lock);
 
 	return ret;
@@ -588,7 +588,7 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
 {
 	spin_lock(&pool->cq_cached_prod_lock);
-	xskq_prod_cancel_n(pool->cq, n);
+	atomic_sub(n, &pool->cq->cached_prod_atomic);
 	spin_unlock(&pool->cq_cached_prod_lock);
 }
 
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 44cc01555c0b..3a023791b273 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -402,13 +402,28 @@ static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
 	q->cached_prod -= cnt;
 }
 
-static inline int xskq_prod_reserve(struct xsk_queue *q)
+static inline bool xsk_cq_cached_prod_nb_free(struct xsk_queue *q)
 {
-	if (xskq_prod_is_full(q))
+	u32 cached_prod = atomic_read(&q->cached_prod_atomic);
+	u32 free_entries = q->nentries - (cached_prod - q->cached_cons);
+
+	if (free_entries)
+		return true;
+
+	/* Refresh the local tail pointer */
+	q->cached_cons = READ_ONCE(q->ring->consumer);
+	free_entries = q->nentries - (cached_prod - q->cached_cons);
+
+	return free_entries ? true : false;
+}
+
+static inline int xsk_cq_cached_prod_reserve(struct xsk_queue *q)
+{
+	if (!xsk_cq_cached_prod_nb_free(q))
 		return -ENOSPC;
 
 	/* A, matches D */
-	q->cached_prod++;
+	atomic_inc(&q->cached_prod_atomic);
 	return 0;
 }
 
-- 
2.41.3


* [PATCH net-next v2 3/3] xsk: remove spin lock protection of cached_prod
  2025-11-25  8:54 [PATCH net-next v2 0/3] xsk: introduce atomic for cq in generic path Jason Xing
  2025-11-25  8:54 ` [PATCH net-next v2 1/3] xsk: add atomic cached_prod for copy mode Jason Xing
  2025-11-25  8:54 ` [PATCH net-next v2 2/3] xsk: use atomic operations around " Jason Xing
@ 2025-11-25  8:54 ` Jason Xing
  2025-11-27 11:29   ` Paolo Abeni
  2 siblings, 1 reply; 10+ messages in thread
From: Jason Xing @ 2025-11-25  8:54 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Remove the spin lock protection and adjust the related functions
accordingly.

Now cached_prod is fully converted to atomic operations, which improves
performance by around 5% across different platforms.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/xsk_buff_pool.h |  5 -----
 net/xdp/xsk.c               | 21 ++++-----------------
 net/xdp/xsk_buff_pool.c     |  1 -
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 92a2358c6ce3..0b1abdb99c9e 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -90,11 +90,6 @@ struct xsk_buff_pool {
 	 * destructor callback.
 	 */
 	spinlock_t cq_prod_lock;
-	/* Mutual exclusion of the completion ring in the SKB mode.
-	 * Protect: when sockets share a single cq when the same netdev
-	 * and queue id is shared.
-	 */
-	spinlock_t cq_cached_prod_lock;
 	struct xdp_buff_xsk *free_heads[];
 };
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index b63409b1422e..ae8a92c168b8 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -546,17 +546,6 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 	return dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id, flags);
 }
 
-static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
-{
-	int ret;
-
-	spin_lock(&pool->cq_cached_prod_lock);
-	ret = xsk_cq_cached_prod_reserve(pool->cq);
-	spin_unlock(&pool->cq_cached_prod_lock);
-
-	return ret;
-}
-
 static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 				      struct sk_buff *skb)
 {
@@ -585,11 +574,9 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
 }
 
-static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
+static void xsk_cq_cached_prod_cancel(struct xsk_buff_pool *pool, u32 n)
 {
-	spin_lock(&pool->cq_cached_prod_lock);
 	atomic_sub(n, &pool->cq->cached_prod_atomic);
-	spin_unlock(&pool->cq_cached_prod_lock);
 }
 
 static void xsk_inc_num_desc(struct sk_buff *skb)
@@ -643,7 +630,7 @@ static void xsk_consume_skb(struct sk_buff *skb)
 	}
 
 	skb->destructor = sock_wfree;
-	xsk_cq_cancel_locked(xs->pool, num_descs);
+	xsk_cq_cached_prod_cancel(xs->pool, num_descs);
 	/* Free skb without triggering the perf drop trace */
 	consume_skb(skb);
 	xs->skb = NULL;
@@ -860,7 +847,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 		xskq_cons_release(xs->tx);
 	} else {
 		/* Let application retry */
-		xsk_cq_cancel_locked(xs->pool, 1);
+		xsk_cq_cached_prod_cancel(xs->pool, 1);
 	}
 
 	return ERR_PTR(err);
@@ -898,7 +885,7 @@ static int __xsk_generic_xmit(struct sock *sk)
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
-		err = xsk_cq_reserve_locked(xs->pool);
+		err = xsk_cq_cached_prod_reserve(xs->pool->cq);
 		if (err) {
 			err = -EAGAIN;
 			goto out;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index 51526034c42a..9539f121b290 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -91,7 +91,6 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
 	spin_lock_init(&pool->xsk_tx_list_lock);
 	spin_lock_init(&pool->cq_prod_lock);
-	spin_lock_init(&pool->cq_cached_prod_lock);
 	refcount_set(&pool->users, 1);
 
 	pool->fq = xs->fq_tmp;
-- 
2.41.3


* Re: [PATCH net-next v2 3/3] xsk: remove spin lock protection of cached_prod
  2025-11-25  8:54 ` [PATCH net-next v2 3/3] xsk: remove spin lock protection of cached_prod Jason Xing
@ 2025-11-27 11:29   ` Paolo Abeni
  2025-11-27 13:18     ` Jason Xing
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-11-27 11:29 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend
  Cc: bpf, netdev, Jason Xing

On 11/25/25 9:54 AM, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Remove the spin lock protection and adjust the related functions
> accordingly.
>
> Now cached_prod is fully converted to atomic operations, which improves
> performance by around 5% across different platforms.

I must admit that I'm surprised by the above delta; AFAIK replacing a
spinlock 1:1 with an atomic should not measurably impact performance, as
the threads should still see the same contention and will issue the same
number of atomic operations on the bus.


> @@ -585,11 +574,9 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
>  	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
>  }
>  
> -static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
> +static void xsk_cq_cached_prod_cancel(struct xsk_buff_pool *pool, u32 n)
>  {
> -	spin_lock(&pool->cq_cached_prod_lock);
>  	atomic_sub(n, &pool->cq->cached_prod_atomic);

It looks like the spinlock and the protected data live in different
structs.

My wild guess/suspicion is that the real gain comes from avoiding
touching an additional cacheline.
`struct xsk_queue` is 48 bytes and is allocated via kmalloc. Adding up
to 16 bytes there will not change the kmalloc slab used, and thus the
actual memory usage.

I think that moving the cq_cached* spinlock(s) into xsk_queue should
give the same gain with much less code churn. Could you please have a
look at such an option?
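
For illustration only, roughly this kind of layout (the lock name and
exact placement are just a guess, not a patch):

struct xsk_queue {
	u32 ring_mask;
	u32 nentries;
	u32 cached_prod;
	/* Serializes cached_prod when copy-mode sockets share one cq;
	 * kept next to the data it protects.
	 */
	spinlock_t cached_prod_lock;
	u32 cached_cons;
	struct xdp_ring *ring;
	u64 invalid_descs;
	/* remaining fields unchanged */
};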

/P


* Re: [PATCH net-next v2 2/3] xsk: use atomic operations around cached_prod for copy mode
  2025-11-25  8:54 ` [PATCH net-next v2 2/3] xsk: use atomic operations around " Jason Xing
@ 2025-11-27 11:35   ` Paolo Abeni
  2025-11-27 13:55     ` Jason Xing
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-11-27 11:35 UTC (permalink / raw)
  To: Jason Xing, davem, edumazet, kuba, bjorn, magnus.karlsson,
	maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
	john.fastabend
  Cc: bpf, netdev, Jason Xing

On 11/25/25 9:54 AM, Jason Xing wrote:
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index 44cc01555c0b..3a023791b273 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -402,13 +402,28 @@ static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
>  	q->cached_prod -= cnt;
>  }
>  
> -static inline int xskq_prod_reserve(struct xsk_queue *q)
> +static inline bool xsk_cq_cached_prod_nb_free(struct xsk_queue *q)
>  {
> -	if (xskq_prod_is_full(q))
> +	u32 cached_prod = atomic_read(&q->cached_prod_atomic);
> +	u32 free_entries = q->nentries - (cached_prod - q->cached_cons);
> +
> +	if (free_entries)
> +		return true;
> +
> +	/* Refresh the local tail pointer */
> +	q->cached_cons = READ_ONCE(q->ring->consumer);
> +	free_entries = q->nentries - (cached_prod - q->cached_cons);
> +
> +	return free_entries ? true : false;
> +}

_If_ different CPUs can call xsk_cq_cached_prod_reserve() simultaneously
(as the existence of the spinlock suggests), the above change introduces
a race:

xsk_cq_cached_prod_nb_free() can return true when num_free == 1 on CPU1,
and xsk_cq_cached_prod_reserve() can increment cached_prod_atomic on
CPU2 before CPU1 completes xsk_cq_cached_prod_reserve().
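
Concretely, with a single free entry left, an interleaving like the
following (illustrative only) lets the completion queue be overrun:

	CPU1: xsk_cq_cached_prod_nb_free()	 /* sees 1 free entry */
	CPU2: xsk_cq_cached_prod_nb_free()	 /* also sees 1 free entry */
	CPU1: atomic_inc(&q->cached_prod_atomic) /* takes the last entry */
	CPU2: atomic_inc(&q->cached_prod_atomic) /* one entry too many */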

/P


* Re: [PATCH net-next v2 3/3] xsk: remove spin lock protection of cached_prod
  2025-11-27 11:29   ` Paolo Abeni
@ 2025-11-27 13:18     ` Jason Xing
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Xing @ 2025-11-27 13:18 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
	netdev, Jason Xing

On Thu, Nov 27, 2025 at 7:29 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/25/25 9:54 AM, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Remove the spin lock protection and adjust the related functions
> > accordingly.
> >
> > Now cached_prod is fully converted to atomic operations, which improves
> > performance by around 5% across different platforms.
>
> I must admit that I'm surprised by the above delta; AFAIK replacing a
> spinlock 1:1 with an atomic should not measurably impact performance, as
> the threads should still see the same contention and will issue the same
> number of atomic operations on the bus.

Interesting point.

>
>
> > @@ -585,11 +574,9 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
> >       spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
> >  }
> >
> > -static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
> > +static void xsk_cq_cached_prod_cancel(struct xsk_buff_pool *pool, u32 n)
> >  {
> > -     spin_lock(&pool->cq_cached_prod_lock);
> >       atomic_sub(n, &pool->cq->cached_prod_atomic);
>
> It looks like the spinlock and the protected data live in different
> structs.
>
> My wild guess/suspicion is that the real gain comes from avoiding
> touching an additional cacheline.
> `struct xsk_queue` is 48 bytes and is allocated via kmalloc. Adding up
> to 16 bytes there will not change the kmalloc slab used, and thus the
> actual memory usage.
>
> I think that moving the cq_cached* spinlock(s) into xsk_queue should
> give the same gain with much less code churn. Could you please have a
> look at such an option?

I just did some tests and observed the same result as you predicted.
Thanks for the lesson!

Thanks,
Jason

* Re: [PATCH net-next v2 2/3] xsk: use atomic operations around cached_prod for copy mode
  2025-11-27 11:35   ` Paolo Abeni
@ 2025-11-27 13:55     ` Jason Xing
  2025-11-27 15:32       ` Paolo Abeni
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Xing @ 2025-11-27 13:55 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
	netdev, Jason Xing

On Thu, Nov 27, 2025 at 7:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/25/25 9:54 AM, Jason Xing wrote:
> > diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> > index 44cc01555c0b..3a023791b273 100644
> > --- a/net/xdp/xsk_queue.h
> > +++ b/net/xdp/xsk_queue.h
> > @@ -402,13 +402,28 @@ static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
> >       q->cached_prod -= cnt;
> >  }
> >
> > -static inline int xskq_prod_reserve(struct xsk_queue *q)
> > +static inline bool xsk_cq_cached_prod_nb_free(struct xsk_queue *q)
> >  {
> > -     if (xskq_prod_is_full(q))
> > +     u32 cached_prod = atomic_read(&q->cached_prod_atomic);
> > +     u32 free_entries = q->nentries - (cached_prod - q->cached_cons);
> > +
> > +     if (free_entries)
> > +             return true;
> > +
> > +     /* Refresh the local tail pointer */
> > +     q->cached_cons = READ_ONCE(q->ring->consumer);
> > +     free_entries = q->nentries - (cached_prod - q->cached_cons);
> > +
> > +     return free_entries ? true : false;
> > +}
> _If_ different CPUs can call xsk_cq_cached_prod_reserve() simultaneously
> (as the existence of the spinlock suggests), the above change introduces
> a race:
>
> xsk_cq_cached_prod_nb_free() can return true when num_free == 1 on CPU1,
> and xsk_cq_cached_prod_reserve() can increment cached_prod_atomic on
> CPU2 before CPU1 completes xsk_cq_cached_prod_reserve().

I think you're right... I will give it more thought tomorrow morning.

I presume using try_cmpxchg() should work, as it can detect whether
another CPU changes @cached_prod concurrently, so the two approaches
behave similarly. But does it make any real difference compared to a
spin lock? I don't have a handy benchmark that can stably measure two
xsks sharing the same umem, so I will probably implement one.
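
Roughly what I have in mind (an untested sketch; it reads the consumer
index directly instead of going through cached_cons, just to keep it
short):

static inline int xsk_cq_cached_prod_reserve(struct xsk_queue *q)
{
	int old = atomic_read(&q->cached_prod_atomic);

	do {
		u32 free_entries = q->nentries -
				   ((u32)old - READ_ONCE(q->ring->consumer));

		if (!free_entries)
			return -ENOSPC;
		/* on failure 'old' is refreshed, so the check runs again */
	} while (!atomic_try_cmpxchg(&q->cached_prod_atomic, &old, old + 1));

	return 0;
}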

Or like what you suggested in another thread, move that lock to struct
xsk_queue?

Thanks,
Jason

* Re: [PATCH net-next v2 2/3] xsk: use atomic operations around cached_prod for copy mode
  2025-11-27 13:55     ` Jason Xing
@ 2025-11-27 15:32       ` Paolo Abeni
  2025-11-27 23:48         ` Jason Xing
  0 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2025-11-27 15:32 UTC (permalink / raw)
  To: Jason Xing
  Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
	netdev, Jason Xing

On 11/27/25 2:55 PM, Jason Xing wrote:
> On Thu, Nov 27, 2025 at 7:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 11/25/25 9:54 AM, Jason Xing wrote:
>>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
>>> index 44cc01555c0b..3a023791b273 100644
>>> --- a/net/xdp/xsk_queue.h
>>> +++ b/net/xdp/xsk_queue.h
>>> @@ -402,13 +402,28 @@ static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
>>>       q->cached_prod -= cnt;
>>>  }
>>>
>>> -static inline int xskq_prod_reserve(struct xsk_queue *q)
>>> +static inline bool xsk_cq_cached_prod_nb_free(struct xsk_queue *q)
>>>  {
>>> -     if (xskq_prod_is_full(q))
>>> +     u32 cached_prod = atomic_read(&q->cached_prod_atomic);
>>> +     u32 free_entries = q->nentries - (cached_prod - q->cached_cons);
>>> +
>>> +     if (free_entries)
>>> +             return true;
>>> +
>>> +     /* Refresh the local tail pointer */
>>> +     q->cached_cons = READ_ONCE(q->ring->consumer);
>>> +     free_entries = q->nentries - (cached_prod - q->cached_cons);
>>> +
>>> +     return free_entries ? true : false;
>>> +}
>> _If_ different CPUs can call xsk_cq_cached_prod_reserve() simultaneously
>> (as the existence of the spinlock suggests), the above change introduces
>> a race:
>>
>> xsk_cq_cached_prod_nb_free() can return true when num_free == 1 on CPU1,
>> and xsk_cq_cached_prod_reserve() can increment cached_prod_atomic on
>> CPU2 before CPU1 completes xsk_cq_cached_prod_reserve().
> 
> I think you're right... I will give it more thought tomorrow morning.
> 
> I presume using try_cmpxchg() should work, as it can detect whether
> another CPU changes @cached_prod concurrently, so the two approaches
> behave similarly. But does it make any real difference compared to a
> spin lock? I don't have a handy benchmark that can stably measure two
> xsks sharing the same umem, so I will probably implement one.
> 
> Or like what you suggested in another thread, move that lock to struct
> xsk_queue?

I think moving the lock is preferable: it makes sense from a
maintenance perspective to bundle the lock with the structure it
protects, and it should also make the whole patch simpler.

Cheers,

Paolo


* Re: [PATCH net-next v2 2/3] xsk: use atomic operations around cached_prod for copy mode
  2025-11-27 15:32       ` Paolo Abeni
@ 2025-11-27 23:48         ` Jason Xing
  0 siblings, 0 replies; 10+ messages in thread
From: Jason Xing @ 2025-11-27 23:48 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
	jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
	netdev, Jason Xing

On Thu, Nov 27, 2025 at 11:32 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/27/25 2:55 PM, Jason Xing wrote:
> > On Thu, Nov 27, 2025 at 7:35 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 11/25/25 9:54 AM, Jason Xing wrote:
> >>> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> >>> index 44cc01555c0b..3a023791b273 100644
> >>> --- a/net/xdp/xsk_queue.h
> >>> +++ b/net/xdp/xsk_queue.h
> >>> @@ -402,13 +402,28 @@ static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
> >>>       q->cached_prod -= cnt;
> >>>  }
> >>>
> >>> -static inline int xskq_prod_reserve(struct xsk_queue *q)
> >>> +static inline bool xsk_cq_cached_prod_nb_free(struct xsk_queue *q)
> >>>  {
> >>> -     if (xskq_prod_is_full(q))
> >>> +     u32 cached_prod = atomic_read(&q->cached_prod_atomic);
> >>> +     u32 free_entries = q->nentries - (cached_prod - q->cached_cons);
> >>> +
> >>> +     if (free_entries)
> >>> +             return true;
> >>> +
> >>> +     /* Refresh the local tail pointer */
> >>> +     q->cached_cons = READ_ONCE(q->ring->consumer);
> >>> +     free_entries = q->nentries - (cached_prod - q->cached_cons);
> >>> +
> >>> +     return free_entries ? true : false;
> >>> +}
> >> _If_ different CPUs can call xsk_cq_cached_prod_reserve() simultaneously
> >> (as the existence of the spinlock suggests), the above change introduces
> >> a race:
> >>
> >> xsk_cq_cached_prod_nb_free() can return true when num_free == 1 on CPU1,
> >> and xsk_cq_cached_prod_reserve() can increment cached_prod_atomic on
> >> CPU2 before CPU1 completes xsk_cq_cached_prod_reserve().
> >
> > I think you're right... I will give it more thought tomorrow morning.
> >
> > I presume using try_cmpxchg() should work, as it can detect whether
> > another CPU changes @cached_prod concurrently, so the two approaches
> > behave similarly. But does it make any real difference compared to a
> > spin lock? I don't have a handy benchmark that can stably measure two
> > xsks sharing the same umem, so I will probably implement one.
> >
> > Or like what you suggested in another thread, move that lock to struct
> > xsk_queue?
>
> I think moving the lock is preferable: it makes sense from a
> maintenance perspective to bundle the lock with the structure it
> protects, and it should also make the whole patch simpler.

Agreed. At least for now I cannot see a benefit to using try_cmpxchg()
instead, as the protected region is really small. In the future I may
try a better approach once I have pinpointed the contention that causes
the performance problem.

I'm going to add your Suggested-by tag since you provided this good
idea :) Thanks!

Thanks,
Jason
