* [PATCH net-next 0/2] xsk: mitigate the side effect of cq_lock
From: Jason Xing @ 2025-10-25  6:53 UTC
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Two optimizations regarding cq_lock yield a performance increase by
avoiding frequently disabling and enabling interrupts.

Jason Xing (2):
  xsk: avoid using heavy lock when the pool is not shared
  xsk: use a smaller new lock for shared pool case

 include/net/xsk_buff_pool.h | 13 +++++++++----
 net/xdp/xsk.c               | 20 ++++++++++++--------
 net/xdp/xsk_buff_pool.c     |  3 ++-
 3 files changed, 23 insertions(+), 13 deletions(-)

-- 
2.41.3
* [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jason Xing @ 2025-10-25  6:53 UTC
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

Commit f09ced4053bc ("xsk: Fix race in SKB mode transmit with shared
cq") uses a heavy lock (spin_lock_irqsave) for the shared-pool
scenario, i.e. when multiple sockets share the same pool.

This hurts the case where the pool is owned by only one xsk. This
patch distinguishes the two cases by checking whether the xsk list
contains only one xsk. If so, the pool is exclusive and there is no
need to hold the lock or disable IRQs at all, which avoids executing
those two operations extremely frequently.

With this patch, the performance number[1] goes from 1,872,565 pps to
2,147,803 pps, a noticeable rise of around 14.6%.

[1]: taskset -c 1 ./xdpsock -i enp2s0f1 -q 0 -t -S -s 64

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/xdp/xsk.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 7b0c68a70888..76f797fcc49c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -548,12 +548,15 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 
 static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
 {
+	bool lock = !list_is_singular(&pool->xsk_tx_list);
 	unsigned long flags;
 	int ret;
 
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	if (lock)
+		spin_lock_irqsave(&pool->cq_lock, flags);
 	ret = xskq_prod_reserve(pool->cq);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	if (lock)
+		spin_unlock_irqrestore(&pool->cq_lock, flags);
 
 	return ret;
 }
@@ -588,11 +591,14 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 
 static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
 {
+	bool lock = !list_is_singular(&pool->xsk_tx_list);
 	unsigned long flags;
 
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	if (lock)
+		spin_lock_irqsave(&pool->cq_lock, flags);
 	xskq_prod_cancel_n(pool->cq, n);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	if (lock)
+		spin_unlock_irqrestore(&pool->cq_lock, flags);
 }
 
 static void xsk_inc_num_desc(struct sk_buff *skb)
-- 
2.41.3
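For readers who don't have the helper in mind: list_is_singular() is a
constant-time check from include/linux/list.h, roughly the following
(abridged; the exact wording may differ by kernel version), so the test
the patch adds is just a pair of pointer comparisons, not a list walk:

	/* include/linux/list.h (abridged) */
	static inline int list_is_singular(const struct list_head *head)
	{
		/* non-empty, and the first entry is also the last entry */
		return !list_empty(head) && (head->next == head->prev);
	}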
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jakub Kicinski @ 2025-10-29  0:29 UTC
  To: Jason Xing
  Cc: davem, edumazet, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On Sat, 25 Oct 2025 14:53:09 +0800 Jason Xing wrote:
>  static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
>  {
> +	bool lock = !list_is_singular(&pool->xsk_tx_list);
>  	unsigned long flags;
>  	int ret;
>
> -	spin_lock_irqsave(&pool->cq_lock, flags);
> +	if (lock)
> +		spin_lock_irqsave(&pool->cq_lock, flags);
>  	ret = xskq_prod_reserve(pool->cq);
> -	spin_unlock_irqrestore(&pool->cq_lock, flags);
> +	if (lock)
> +		spin_unlock_irqrestore(&pool->cq_lock, flags);

Please explain in the commit message what guarantees that the list will
remain singular until the function exits.
-- 
pw-bot: cr
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jason Xing @ 2025-10-29  1:36 UTC
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev, bpf, netdev, Jason Xing

On Wed, Oct 29, 2025 at 8:29 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 25 Oct 2025 14:53:09 +0800 Jason Xing wrote:
> >  static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
> >  {
> > +	bool lock = !list_is_singular(&pool->xsk_tx_list);
> >  	unsigned long flags;
> >  	int ret;
> >
> > -	spin_lock_irqsave(&pool->cq_lock, flags);
> > +	if (lock)
> > +		spin_lock_irqsave(&pool->cq_lock, flags);
> >  	ret = xskq_prod_reserve(pool->cq);
> > -	spin_unlock_irqrestore(&pool->cq_lock, flags);
> > +	if (lock)
> > +		spin_unlock_irqrestore(&pool->cq_lock, flags);
>
> Please explain in the commit message what guarantees that the list will
> remain singular until the function exits.

Thanks for bringing up a good point that I missed. I think I might
acquire xsk_tx_list_lock first to make sure xp_add_xsk() will not
interfere with this process. I will figure it out soon.

Thanks,
Jason

> --
> pw-bot: cr
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Maciej Fijalkowski @ 2025-10-29 15:48 UTC
  To: Jason Xing
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms,
      andrew+netdev, bpf, netdev, Jason Xing

On Sat, Oct 25, 2025 at 02:53:09PM +0800, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> Commit f09ced4053bc ("xsk: Fix race in SKB mode transmit with shared
> cq") uses a heavy lock (spin_lock_irqsave) for the shared-pool
> scenario, i.e. when multiple sockets share the same pool.
>
> This hurts the case where the pool is owned by only one xsk. This
> patch distinguishes the two cases by checking whether the xsk list
> contains only one xsk. If so, the pool is exclusive and there is no
> need to hold the lock or disable IRQs at all, which avoids executing
> those two operations extremely frequently.

Even with a single CQ producer we need to keep the related code within
a critical section. One core can be in process context via sendmsg()
where, for some reason, xmit failed and the driver consumed the skb
(destructor called). At the same time, another core can be calling the
destructor on a different skb that was successfully xmitted, doing the
Tx completion via the driver's NAPI. Without locking, the SPSC concept
would be violated.

So I'm afraid I have to nack this.
* Re: [PATCH net-next 1/2] xsk: avoid using heavy lock when the pool is not shared
From: Jason Xing @ 2025-10-29 23:43 UTC
  To: Maciej Fijalkowski
  Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, horms,
      andrew+netdev, bpf, netdev, Jason Xing

On Wed, Oct 29, 2025 at 11:48 PM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> Even with a single CQ producer we need to keep the related code within
> a critical section. One core can be in process context via sendmsg()
> where, for some reason, xmit failed and the driver consumed the skb
> (destructor called). At the same time, another core can be calling the
> destructor on a different skb that was successfully xmitted, doing the
> Tx completion via the driver's NAPI. Without locking, the SPSC concept
> would be violated.
>
> So I'm afraid I have to nack this.

But that will not happen around cq->cached_prod. All the places where
cached_prod is modified run in process context; I have already pointed
out the different subtle cases in patch [2/2]. SPSC is about the global
producer and consumer state visible to both layers, not the local or
cached ones. That is why we can apply a lockless policy in this patch
when the pool is exclusive, and why we can use a smaller lock as patch
[2/2] shows.

As for preventing the case Jakub mentioned, so far I cannot find a good
solution short of introducing a new option that limits one xsk to
binding to only one unique pool, and that is probably not worth it.
That is why I will drop this patch in v2.

Thanks,
Jason
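For context on the cached vs. global producer state being argued about
here, the CQ producer helpers split along exactly this line. Abridged
from net/xdp/xsk_queue.h (exact bodies may differ by kernel version):
reserve and cancel only move the driver-invisible cached_prod, while
the submit step is what publishes ring->producer to the consumer.

	/* net/xdp/xsk_queue.h (abridged) */
	static inline int xskq_prod_reserve(struct xsk_queue *q)
	{
		if (xskq_prod_is_full(q))
			return -ENOSPC;

		q->cached_prod++;	/* local bookkeeping only */
		return 0;
	}

	static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
	{
		q->cached_prod -= cnt;	/* roll back local bookkeeping */
	}

	static inline void __xskq_prod_submit(struct xsk_queue *q, u32 idx)
	{
		/* release-publish the new producer index to the consumer */
		smp_store_release(&q->ring->producer, idx);
	}

	static inline void xskq_prod_submit_n(struct xsk_queue *q, u32 nb_entries)
	{
		__xskq_prod_submit(q, q->ring->producer + nb_entries);
	}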
* [PATCH net-next 2/2] xsk: use a smaller new lock for shared pool case
From: Jason Xing @ 2025-10-25  6:53 UTC
  To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
      maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
      john.fastabend, horms, andrew+netdev
  Cc: bpf, netdev, Jason Xing

From: Jason Xing <kernelxing@tencent.com>

- Split cq_lock into two smaller locks: cq_prod_lock and
  cq_cached_prod_lock
- Avoid disabling/enabling interrupts in the hot xmit path

In both xsk_cq_cancel_locked() and xsk_cq_reserve_locked(), the race
is only between multiple xsks sharing the same pool. They all run in
process context rather than interrupt context, so the smaller lock
named cq_cached_prod_lock can be used without touching interrupts.

While cq_cached_prod_lock ensures exclusive modification of
@cached_prod, cq_prod_lock in xsk_cq_submit_addr_locked() only cares
about @producer and the corresponding @desc. Neither of these needs to
be consistent with @cached_prod, which is protected by
cq_cached_prod_lock. That is why the previous big lock can be split
into two smaller ones.

Frequently disabling and enabling interrupts is very time consuming in
some cases, especially at per-descriptor granularity. After this
optimization it is avoided even when the pool is shared by multiple
xsks.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 include/net/xsk_buff_pool.h | 13 +++++++++----
 net/xdp/xsk.c               | 14 ++++++--------
 net/xdp/xsk_buff_pool.c     |  3 ++-
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index cac56e6b0869..92a2358c6ce3 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -85,11 +85,16 @@ struct xsk_buff_pool {
 	bool unaligned;
 	bool tx_sw_csum;
 	void *addrs;
-	/* Mutual exclusion of the completion ring in the SKB mode. Two cases to protect:
-	 * NAPI TX thread and sendmsg error paths in the SKB destructor callback and when
-	 * sockets share a single cq when the same netdev and queue id is shared.
+	/* Mutual exclusion of the completion ring in the SKB mode.
+	 * Protect: NAPI TX thread and sendmsg error paths in the SKB
+	 * destructor callback.
 	 */
-	spinlock_t cq_lock;
+	spinlock_t cq_prod_lock;
+	/* Mutual exclusion of the completion ring in the SKB mode.
+	 * Protect: when sockets share a single cq when the same netdev
+	 * and queue id is shared.
+	 */
+	spinlock_t cq_cached_prod_lock;
 	struct xdp_buff_xsk *free_heads[];
 };
 
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 76f797fcc49c..d254817b8a53 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -549,14 +549,13 @@ static int xsk_wakeup(struct xdp_sock *xs, u8 flags)
 static int xsk_cq_reserve_locked(struct xsk_buff_pool *pool)
 {
 	bool lock = !list_is_singular(&pool->xsk_tx_list);
-	unsigned long flags;
 	int ret;
 
 	if (lock)
-		spin_lock_irqsave(&pool->cq_lock, flags);
+		spin_lock(&pool->cq_cached_prod_lock);
 	ret = xskq_prod_reserve(pool->cq);
 	if (lock)
-		spin_unlock_irqrestore(&pool->cq_lock, flags);
+		spin_unlock(&pool->cq_cached_prod_lock);
 
 	return ret;
 }
@@ -569,7 +568,7 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 	unsigned long flags;
 	u32 idx;
 
-	spin_lock_irqsave(&pool->cq_lock, flags);
+	spin_lock_irqsave(&pool->cq_prod_lock, flags);
 	idx = xskq_get_prod(pool->cq);
 
 	xskq_prod_write_addr(pool->cq, idx,
@@ -586,19 +585,18 @@ static void xsk_cq_submit_addr_locked(struct xsk_buff_pool *pool,
 		}
 	}
 	xskq_prod_submit_n(pool->cq, descs_processed);
-	spin_unlock_irqrestore(&pool->cq_lock, flags);
+	spin_unlock_irqrestore(&pool->cq_prod_lock, flags);
 }
 
 static void xsk_cq_cancel_locked(struct xsk_buff_pool *pool, u32 n)
 {
 	bool lock = !list_is_singular(&pool->xsk_tx_list);
-	unsigned long flags;
 
 	if (lock)
-		spin_lock_irqsave(&pool->cq_lock, flags);
+		spin_lock(&pool->cq_cached_prod_lock);
 	xskq_prod_cancel_n(pool->cq, n);
 	if (lock)
-		spin_unlock_irqrestore(&pool->cq_lock, flags);
+		spin_unlock(&pool->cq_cached_prod_lock);
 }
 
 static void xsk_inc_num_desc(struct sk_buff *skb)
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index aa9788f20d0d..add44bd09cae 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -94,7 +94,8 @@ struct xsk_buff_pool *xp_create_and_assign_umem(struct xdp_sock *xs,
 	INIT_LIST_HEAD(&pool->xskb_list);
 	INIT_LIST_HEAD(&pool->xsk_tx_list);
 	spin_lock_init(&pool->xsk_tx_list_lock);
-	spin_lock_init(&pool->cq_lock);
+	spin_lock_init(&pool->cq_prod_lock);
+	spin_lock_init(&pool->cq_cached_prod_lock);
 	refcount_set(&pool->users, 1);
 
 	pool->fq = xs->fq_tmp;
-- 
2.41.3
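A reader's summary of the proposed split (my annotation, not part of
the patch; the struct below is purely illustrative, only the comments
carry the point):

	#include <linux/spinlock.h>

	/* Illustrative only: which CQ producer state each proposed lock guards. */
	struct cq_locking_scheme {
		/* Guards cq->cached_prod: xskq_prod_reserve() on the sendmsg
		 * path and xskq_prod_cancel_n() on the rollback path. Both run
		 * in process context, hence plain spin_lock() without irqsave.
		 */
		spinlock_t cq_cached_prod_lock;

		/* Guards the publish side: writing completed addresses and
		 * advancing cq->ring->producer in xsk_cq_submit_addr_locked(),
		 * reached from the skb destructor, which can run from the
		 * driver's Tx completion (NAPI/IRQ context), hence it keeps
		 * spin_lock_irqsave().
		 */
		spinlock_t cq_prod_lock;
	};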