* [PATCH net-next 0/2] xsk: mitigate the side effect of cq_lock
From: Jason Xing @ 2025-10-25  6:53 UTC
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Two optimizations regarding cq_lock yield a performance increase by
avoiding frequently disabling and enabling interrupts.

Jason Xing (2):
  xsk: avoid using heavy lock when the pool is not shared
  xsk: use a smaller new lock for shared pool case

 include/net/xsk_buff_pool.h | 13 +++++++++----
 net/xdp/xsk.c               | 20 ++++++++++++--------
 net/xdp/xsk_buff_pool.c     |  3 ++-
 3 files changed, 23 insertions(+), 13 deletions(-)

-- 
2.41.3
* [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jason Xing @ 2025-10-25  6:53 UTC
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Commit f09ced4053bc ("xsk: Fix race in SKB mode transmit with shared
cq") uses a heavy lock (spin_lock_irqsave) for the shared-pool
scenario, i.e. when multiple sockets share the same pool.

This hurts the case where the pool is owned by only one xsk. This
patch distinguishes the two cases by checking whether the xsk list
contains only one xsk. If so, the pool is exclusive and there is no
need to hold the lock or disable IRQs at all, which avoids executing
those two operations extremely frequently.

With this patch, the performance number[1] goes from 1,872,565 pps to
2,147,803 pps, a noticeable rise of around 14.6%.

[1]: taskset -c 1 ./xdpsock -i enp2s0f1 -q 0 -t -S -s 64

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7b0c68a70888..76f797fcc49c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -548,12 +548,15 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 
 static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
 {
+	bool lock = !list_is_singular(&pool->xsk_tx_list);
 	unsigned long flags;
 	int ret;
 
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	if (lock)
+		spin_lock_irqsave(&pool->cq_lock, flags);
 	ret = xskq_prod_reserve(pool->cq);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	if (lock)
+		spin_unlock_irqrestore(&pool->cq_lock, flags);
 
 	return ret;
 }
@@ -588,11 +591,14 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 
 static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
 {
+	bool lock = !list_is_singular(&pool->xsk_tx_list);
 	unsigned long flags;
 
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	if (lock)
+		spin_lock_irqsave(&pool->cq_lock, flags);
 	xskq_prod_cancel_n(pool->cq, n);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	if (lock)
+		spin_unlock_irqrestore(&pool->cq_lock, flags);
 }
 
 static void xsk_inc_num_desc(struct sk_buff *skb)
-- 
2.41.3
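For readers who don't have the helper in mind: list_is_singular() is a
constant-time check from include/linux/list.h, roughly the following
(abridged; the exact wording may differ by kernel version), so the test
the patch adds is just a pair of pointer comparisons, not a list walk:

	/* include/linux/list.h (abridged) */
	static inline int list_is_singular(const struct list_head *head)
	{
		/* non-empty, and the first entry is also the last entry */
		return !list_empty(head) && (head->next == head->prev);
	}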
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jakub Kicinski @ 2025-10-29  0:29 UTC
  To: Jason Xing
  Cc: davem, edumazet, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On Sat, 25 Oct 2025 14:53:09 +0800 Jason Xing wrote:
>  static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
>  {
> +	bool lock = !list_is_singular(&pool->xsk_tx_list);
>  	unsigned long flags;
>  	int ret;
>
> -	spin_lock_irqsave(&pool->cq_lock, flags);
> +	if (lock)
> +		spin_lock_irqsave(&pool->cq_lock, flags);
>  	ret = xskq_prod_reserve(pool->cq);
> -	spin_unlock_irqrestore(&pool->cq_lock, flags);
> +	if (lock)
> +		spin_unlock_irqrestore(&pool->cq_lock, flags);

Please explain in the commit message what guarantees that the list will
remain singular until the function exits.
-- 
pw-bot: cr
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jason Xing @ 2025-10-29  1:36 UTC
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On Wed, Oct 29, 2025 at 8:29 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 25 Oct 2025 14:53:09 +0800 Jason Xing wrote:
> >  static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
> >  {
> > +	bool lock = !list_is_singular(&pool->xsk_tx_list);
> >  	unsigned long flags;
> >  	int ret;
> >
> > -	spin_lock_irqsave(&pool->cq_lock, flags);
> > +	if (lock)
> > +		spin_lock_irqsave(&pool->cq_lock, flags);
> >  	ret = xskq_prod_reserve(pool->cq);
> > -	spin_unlock_irqrestore(&pool->cq_lock, flags);
> > +	if (lock)
> > +		spin_unlock_irqrestore(&pool->cq_lock, flags);
>
> Please explain in the commit message what guarantees that the list will
> remain singular until the function exits.

Thanks for bringing up a good point that I missed. I think I might
acquire xsk_tx_list_lock first to make sure xp_add_xsk() will not
interfere with this process. I will figure it out soon.

Thanks,
Jason

> --
> pw-bot: cr
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Maciej Fijalkowski @ 2025-10-29 15:48 UTC
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms,
      andrew+netdev, bpf, netdev, Jason Xing

On Sat, Oct 25, 2025 at 02:53:09PM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Commit f09ced4053bc ("xsk: Fix race in SKB mode transmit with shared
> cq") uses a heavy lock (spin_lock_irqsave) for the shared-pool
> scenario, i.e. when multiple sockets share the same pool.
>
> This hurts the case where the pool is owned by only one xsk. This
> patch distinguishes the two cases by checking whether the xsk list
> contains only one xsk. If so, the pool is exclusive and there is no
> need to hold the lock or disable IRQs at all, which avoids executing
> those two operations extremely frequently.

Even with a single CQ producer we need to keep the related code within
a critical section. One core can be in process context via sendmsg()
where, for some reason, xmit failed and the driver consumed the skb
(destructor called). At the same time, another core can be calling the
destructor on a different skb that was successfully xmitted, doing the
Tx completion via the driver's NAPI. Without locking, the SPSC concept
would be violated.

So I'm afraid I have to nack this.
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jason Xing @ 2025-10-29 23:43 UTC
  To: Maciej Fijalkowski
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms,
      andrew+netdev, bpf, netdev, Jason Xing

On Wed, Oct 29, 2025 at 11:48 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> Even with a single CQ producer we need to keep the related code within
> a critical section. One core can be in process context via sendmsg()
> where, for some reason, xmit failed and the driver consumed the skb
> (destructor called). At the same time, another core can be calling the
> destructor on a different skb that was successfully xmitted, doing the
> Tx completion via the driver's NAPI. Without locking, the SPSC concept
> would be violated.
>
> So I'm afraid I have to nack this.

But that will not happen around cq->cached_prod. All the places where
cached_prod is modified run in process context; I have already pointed
out the different subtle cases in patch [2/2]. SPSC is about the global
producer and consumer state visible to both layers, not the local or
cached ones. That is why we can apply a lockless policy in this patch
when the pool is exclusive, and why we can use a smaller lock as patch
[2/2] shows.

As for preventing the case Jakub mentioned, so far I cannot find a good
solution short of introducing a new option that limits one xsk to
binding to only one unique pool, and that is probably not worth it.
That is why I will drop this patch in v2.

Thanks,
Jason
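For context on the cached vs. global producer state being argued about
here, the CQ producer helpers split along exactly this line. Abridged
from net/xdp/xsk_queue.h (exact bodies may differ by kernel version):
reserve and cancel only move the driver-invisible cached_prod, while
the submit step is what publishes ring->producer to the consumer.

	/* net/xdp/xsk_queue.h (abridged) */
	static inline int xskq_prod_reserve(struct xsk_queue *q)
	{
		if (xskq_prod_is_full(q))
			return -ENOSPC;

		q->cached_prod++;	/* local bookkeeping only */
		return 0;
	}

	static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
	{
		q->cached_prod -= cnt;	/* roll back local bookkeeping */
	}

	static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
	{
		/* release-publish the new producer index to the consumer */
		smp_store_release(&q->ring->producer, idx);
	}

	static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
	{
		__xskq_prod_submit(q, q->ring->producer + nb_entries);
	}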
* [PATCH net-next 2/2] xsk: use a smaller new lock for shared pool case
From: Jason Xing @ 2025-10-25  6:53 UTC
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

- Split cq_lock into two smaller locks: cq_prod_lock and
  cq_cached_prod_lock
- Avoid disabling/enabling interrupts in the hot xmit path

In both xsk_cq_cancel_locked() and xsk_cq_reserve_locked(), the race
is only between multiple xsks sharing the same pool. They all run in
process context rather than interrupt context, so the smaller lock
named cq_cached_prod_lock can be used without touching interrupts.

While cq_cached_prod_lock ensures exclusive modification of
@cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares
about @producer and the corresponding @desc. Neither of these needs to
be consistent with @cached_prod, which is protected by
cq_cached_prod_lock. That is why the previous big lock can be split
into two smaller ones.

Frequently disabling and enabling interrupts is very time consuming in
some cases, especially at per-descriptor granularity. After this
optimization it is avoided even when the pool is shared by multiple
xsks.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/xsk_buff_pool.h | 13 +++++++++----
 net/xdp/xsk.c               | 14 ++++++--------
 net/xdp/xsk_buff_pool.c     |  3 ++-
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index cac56e6b0869..92a2358c6ce3 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -85,11 +85,16 @@ struct xsk_buff_pool {
 	bool unaligned;
 	bool tx_sw_csum;
 	void *addrs;
-	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
-	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
-	 * sockets share a single cq when the same netdev and queue id is shared.
+	/* Mutual exclusion of the completion ring in the SKB mode.
+	 * Protect: NAPI TX thread and sendmsg error paths in the SKB
+	 * destructor callback.
 	 */
-	spinlock_t cq_lock;
+	spinlock_t cq_prod_lock;
+	/* Mutual exclusion of the completion ring in the SKB mode.
+	 * Protect: when sockets share a single cq when the same netdev
+	 * and queue id is shared.
+	 */
+	spinlock_t cq_cached_prod_lock;
 	struct xdp_buff_xsk *free_heads[];
 };
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 76f797fcc49c..d254817b8a53 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -549,14 +549,13 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
 {
 	bool lock = !list_is_singular(&pool->xsk_tx_list);
-	unsigned long flags;
 	int ret;
 
 	if (lock)
-		spin_lock_irqsave(&pool->cq_lock, flags);
+		spin_lock(&pool->cq_cached_prod_lock);
 	ret = xskq_prod_reserve(pool->cq);
 	if (lock)
-		spin_unlock_irqrestore(&pool->cq_lock, flags);
+		spin_unlock(&pool->cq_cached_prod_lock);
 
 	return ret;
 }
@@ -569,7 +568,7 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 	unsigned long flags;
 	u32 idx;
 
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	spin_lock_irqsave(&pool->cq_prod_lock, flags);
 	idx = xskq_get_prod(pool->cq);
 
 	xskq_prod_write_addr(pool->cq, idx,
@@ -586,19 +585,18 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 		}
 	}
 	xskq_prod_submit_n(pool->cq, descs_processed);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
 }
 
 static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
 {
 	bool lock = !list_is_singular(&pool->xsk_tx_list);
-	unsigned long flags;
 
 	if (lock)
-		spin_lock_irqsave(&pool->cq_lock, flags);
+		spin_lock(&pool->cq_cached_prod_lock);
 	xskq_prod_cancel_n(pool->cq, n);
 	if (lock)
-		spin_unlock_irqrestore(&pool->cq_lock, flags);
+		spin_unlock(&pool->cq_cached_prod_lock);
 }
 
 static void xsk_inc_num_desc(struct sk_buff *skb)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index aa9788f20d0d..add44bd09cae 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -94,7 +94,8 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 	INIT_LIST_HEAD(&pool->xskb_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
 	spin_lock_init(&pool->xsk_tx_list_lock);
-	spin_lock_init(&pool->cq_lock);
+	spin_lock_init(&pool->cq_prod_lock);
+	spin_lock_init(&pool->cq_cached_prod_lock);
 	refcount_set(&pool->users, 1);
 
 	pool->fq = xs->fq_tmp;
-- 
2.41.3
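A reader's summary of the proposed split (my annotation, not part of
the patch; the struct below is purely illustrative, only the comments
carry the point):

	#include <linux/spinlock.h>

	/* Illustrative only: which CQ producer state each proposed lock guards. */
	struct cq_locking_scheme {
		/* Guards cq->cached_prod: xskq_prod_reserve() on the sendmsg
		 * path and xskq_prod_cancel_n() on the rollback path. Both run
		 * in process context, hence plain spin_lock() without irqsave.
		 */
		spinlock_t cq_cached_prod_lock;

		/* Guards the publish side: writing completed addresses and
		 * advancing cq->ring->producer in xsk_cq_submit_addr_locked(),
		 * reached from the skb destructor, which can run from the
		 * driver's Tx completion (NAPI/IRQ context), hence it keeps
		 * spin_lock_irqsave().
		 */
		spinlock_t cq_prod_lock;
	};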