* [RFC 1/3] net/cnxk: add pause to spinloops
2026-01-21 18:05 [RFC 0/3] Add pause to empty spinloops Stephen Hemminger
@ 2026-01-21 18:05 ` Stephen Hemminger
2026-01-21 18:05 ` [RFC 2/3] event/cnxk: " Stephen Hemminger
2026-01-21 18:05 ` [RFC 3/3] devtools/cocci: add script to find empty spinloops Stephen Hemminger
2 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-01-21 18:05 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, Nithin Dabilpuram, Kiran Kumar K,
Sunil Kumar Kori, Satha Rao, Harman Kalra
On SMT systems, a spinloop with no pause in its body can take
execution resources from the sibling hardware thread and cause
excessive latency. These loops were found by the
fix_empty_spinloops coccinelle script.
This is compile tested only! I don't have this hardware.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/cnxk/cn10k_tx.h | 4 ++--
drivers/net/cnxk/cn20k_tx.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
index be9e020ac5..3f4cad168c 100644
--- a/drivers/net/cnxk/cn10k_tx.h
+++ b/drivers/net/cnxk/cn10k_tx.h
@@ -167,7 +167,7 @@ cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req)
#else
RTE_SET_USED(pkts);
while (rte_atomic_load_explicit(&txq->fc_cache_pkts, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
cached = rte_atomic_fetch_sub_explicit(&txq->fc_cache_pkts, req, rte_memory_order_acquire) -
req;
@@ -402,7 +402,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts)
#else
/* Wait for primary core to refill FC. */
while (rte_atomic_load_explicit(fc_sw, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
val = rte_atomic_fetch_sub_explicit(fc_sw, nb_pkts, rte_memory_order_acquire) - nb_pkts;
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 9e48744831..3dfad5fd5a 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -165,7 +165,7 @@ cn20k_nix_vwqe_wait_fc(struct cn20k_eth_txq *txq, uint16_t req)
#else
RTE_SET_USED(pkts);
while (rte_atomic_load_explicit(&txq->fc_cache_pkts, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
cached = rte_atomic_fetch_sub_explicit(&txq->fc_cache_pkts, req, rte_memory_order_acquire) -
req;
@@ -392,7 +392,7 @@ cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
#else
/* Wait for primary core to refill FC. */
while (rte_atomic_load_explicit(fc_sw, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
val = rte_atomic_fetch_sub_explicit(fc_sw, nb_pkts, rte_memory_order_acquire) - nb_pkts;
--
2.51.0
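For readers without DPDK at hand, the change above can be sketched in portable C11. This is only an illustration of the pattern, not the driver code: cpu_pause() is a hypothetical stand-in for rte_pause(), and fc_wait_and_take() and its credits parameter are made-up names that mirror the shape of the flow-control wait in the hunks.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(): a pure CPU hint, no side effects. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#endif
	/* Other architectures: no hint in this sketch. */
}

/*
 * Spin until the credit counter is non-negative, then take req credits.
 * Returns the credits left after the subtraction (may be negative).
 */
static inline int64_t fc_wait_and_take(_Atomic int64_t *credits, int64_t req)
{
	while (atomic_load_explicit(credits, memory_order_relaxed) < 0)
		cpu_pause(); /* this body was an empty ';' before the patch */

	return atomic_fetch_sub_explicit(credits, req, memory_order_acquire) - req;
}
```

The loop condition and memory ordering are unchanged by the patch; only the loop body differs, so the result is the same and only the behaviour while waiting changes.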
* [RFC 2/3] event/cnxk: add pause to spinloops
2026-01-21 18:05 [RFC 0/3] Add pause to empty spinloops Stephen Hemminger
2026-01-21 18:05 ` [RFC 1/3] net/cnxk: add pause to spinloops Stephen Hemminger
@ 2026-01-21 18:05 ` Stephen Hemminger
2026-01-21 21:01 ` Stephen Hemminger
2026-01-21 18:05 ` [RFC 3/3] devtools/cocci: add script to find empty spinloops Stephen Hemminger
2 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2026-01-21 18:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Pavan Nikhilesh, Shijith Thotton
On SMT systems, a spinloop with no pause in its body can take
execution resources from the sibling hardware thread and cause
excessive latency. These loops were found by the
fix_empty_spinloops coccinelle script.
This is compile tested only! I don't have this hardware.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/event/cnxk/cn10k_worker.c | 2 +-
drivers/event/cnxk/cn20k_worker.c | 2 +-
drivers/event/cnxk/cnxk_tim_worker.h | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/event/cnxk/cn10k_worker.c b/drivers/event/cnxk/cn10k_worker.c
index 80077ec8a1..69ac67115a 100644
--- a/drivers/event/cnxk/cn10k_worker.c
+++ b/drivers/event/cnxk/cn10k_worker.c
@@ -93,7 +93,7 @@ sso_lmt_aw_wait_fc(struct cn10k_sso_hws *ws, int64_t req)
retry:
while (rte_atomic_load_explicit(ws->fc_cache_space, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
cached = rte_atomic_fetch_sub_explicit(ws->fc_cache_space, req, rte_memory_order_acquire) -
req;
diff --git a/drivers/event/cnxk/cn20k_worker.c b/drivers/event/cnxk/cn20k_worker.c
index 53daf3b4b0..49dfb2a28c 100644
--- a/drivers/event/cnxk/cn20k_worker.c
+++ b/drivers/event/cnxk/cn20k_worker.c
@@ -93,7 +93,7 @@ sso_lmt_aw_wait_fc(struct cn20k_sso_hws *ws, int64_t req)
retry:
while (rte_atomic_load_explicit(ws->fc_cache_space, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
cached = rte_atomic_fetch_sub_explicit(ws->fc_cache_space, req, rte_memory_order_acquire) -
req;
diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
index 09f84091ab..887c0800e2 100644
--- a/drivers/event/cnxk/cnxk_tim_worker.h
+++ b/drivers/event/cnxk/cnxk_tim_worker.h
@@ -405,9 +405,9 @@ cnxk_tim_add_entry_mp(struct cnxk_tim_ring *const tim_ring,
: [crem] "r"(&bkt->w1)
: "memory");
#else
- while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
+ while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
goto __retry;
} else if (!rem) {
--
2.51.0
* Re: [RFC 2/3] event/cnxk: add pause to spinloops
2026-01-21 18:05 ` [RFC 2/3] event/cnxk: " Stephen Hemminger
@ 2026-01-21 21:01 ` Stephen Hemminger
0 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-01-21 21:01 UTC (permalink / raw)
To: dev; +Cc: Pavan Nikhilesh, Shijith Thotton
On Wed, 21 Jan 2026 10:05:43 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:
> diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
> index 09f84091ab..887c0800e2 100644
> --- a/drivers/event/cnxk/cnxk_tim_worker.h
> +++ b/drivers/event/cnxk/cnxk_tim_worker.h
> @@ -405,9 +405,9 @@ cnxk_tim_add_entry_mp(struct cnxk_tim_ring *const tim_ring,
> : [crem] "r"(&bkt->w1)
> : "memory");
> #else
> - while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
> + while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
> rte_memory_order_relaxed) < 0)
> - ;
> + rte_pause();
> #endif
I noticed while looking at the code that there is inline assembly that uses the wait instructions.
Why doesn't this driver use rte_wait_until_equal_64() instead?
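One detail behind the question: rte_wait_until_equal_64() waits for an exact value, while the quoted loop waits for the value to stop being negative, so a masked wait on the sign bit is the closer fit. DPDK's rte_pause.h has, as far as I recall, a RTE_WAIT_UNTIL_MASKED() macro for that shape. A minimal generic sketch of the idea, assuming plain C11 atomics, follows; cpu_pause(), wait_until_masked_64() and SIGN_BIT_64 are hypothetical names, not DPDK API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(): a pure CPU hint, no side effects. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#endif
}

#define SIGN_BIT_64 (UINT64_C(1) << 63)

/*
 * Hypothetical helper: spin until (*addr & mask) == expected.
 * Returns the value that satisfied the condition.
 */
static inline uint64_t wait_until_masked_64(_Atomic uint64_t *addr,
					    uint64_t mask, uint64_t expected)
{
	uint64_t v;

	while (((v = atomic_load_explicit(addr, memory_order_relaxed)) & mask) != expected)
		cpu_pause();

	return v;
}
```

Calling wait_until_masked_64(&w1, SIGN_BIT_64, 0) returns once the word, read as int64_t, is no longer negative, which is the same exit condition as the loop in the hunk.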
* [RFC 3/3] devtools/cocci: add script to find empty spinloops
2026-01-21 18:05 [RFC 0/3] Add pause to empty spinloops Stephen Hemminger
2026-01-21 18:05 ` [RFC 1/3] net/cnxk: add pause to spinloops Stephen Hemminger
2026-01-21 18:05 ` [RFC 2/3] event/cnxk: " Stephen Hemminger
@ 2026-01-21 18:05 ` Stephen Hemminger
2 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-01-21 18:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
This script finds and fixes many variations of the pattern:
    while (!atomic(&flag));
by adding an rte_pause() call to the loop body.
This type of loop was causing failures in the standalone atomic
tests on systems with a high core count. The script generalizes
that to find other places with the same problem.
The script was autogenerated by AI and works, but may cover
more cases than really necessary.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
devtools/cocci/fix_empty_spinloops.cocci | 165 +++++++++++++++++++++++
1 file changed, 165 insertions(+)
create mode 100644 devtools/cocci/fix_empty_spinloops.cocci
diff --git a/devtools/cocci/fix_empty_spinloops.cocci b/devtools/cocci/fix_empty_spinloops.cocci
new file mode 100644
index 0000000000..ff64b30eac
--- /dev/null
+++ b/devtools/cocci/fix_empty_spinloops.cocci
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Find and fix empty spin loops that should call rte_pause()
+//
+// Empty spin loops waste CPU cycles and can cause performance issues.
+// This script finds various forms of busy-wait loops and adds rte_pause()
+// to give hints to the CPU and reduce power consumption.
+
+// Rule 1: Handle rte_atomic*_read() variants
+@fix_atomic_read@
+expression ptr, val;
+@@
+
+(
+- while (rte_atomic16_read(ptr) == val);
++ while (rte_atomic16_read(ptr) == val)
++ rte_pause();
+|
+- while (rte_atomic16_read(ptr) != val);
++ while (rte_atomic16_read(ptr) != val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) == val);
++ while (rte_atomic32_read(ptr) == val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) != val);
++ while (rte_atomic32_read(ptr) != val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) == val);
++ while (rte_atomic64_read(ptr) == val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) != val);
++ while (rte_atomic64_read(ptr) != val)
++ rte_pause();
+)
+
+// Rule 2: Handle rte_atomic*_read() with comparison operators
+@fix_atomic_cmp@
+expression ptr, val;
+@@
+
+(
+- while (rte_atomic16_read(ptr) < val);
++ while (rte_atomic16_read(ptr) < val)
++ rte_pause();
+|
+- while (rte_atomic16_read(ptr) > val);
++ while (rte_atomic16_read(ptr) > val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) < val);
++ while (rte_atomic32_read(ptr) < val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) > val);
++ while (rte_atomic32_read(ptr) > val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) < val);
++ while (rte_atomic64_read(ptr) < val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) > val);
++ while (rte_atomic64_read(ptr) > val)
++ rte_pause();
+)
+
+// Rule 3: Handle C11 atomics with rte_atomic_load_explicit()
+@fix_c11_atomic@
+expression ptr, order, val;
+@@
+
+(
+- while (rte_atomic_load_explicit(ptr, order) == val);
++ while (rte_atomic_load_explicit(ptr, order) == val)
++ rte_pause();
+|
+- while (rte_atomic_load_explicit(ptr, order) != val);
++ while (rte_atomic_load_explicit(ptr, order) != val)
++ rte_pause();
+|
+- while (rte_atomic_load_explicit(ptr, order) < val);
++ while (rte_atomic_load_explicit(ptr, order) < val)
++ rte_pause();
+|
+- while (rte_atomic_load_explicit(ptr, order) > val);
++ while (rte_atomic_load_explicit(ptr, order) > val)
++ rte_pause();
+)
+
+// Rule 4: Handle __atomic_load_n() directly
+@fix_gcc_atomic@
+expression ptr, order, val;
+@@
+
+(
+- while (__atomic_load_n(ptr, order) == val);
++ while (__atomic_load_n(ptr, order) == val)
++ rte_pause();
+|
+- while (__atomic_load_n(ptr, order) != val);
++ while (__atomic_load_n(ptr, order) != val)
++ rte_pause();
+|
+- while (__atomic_load_n(ptr, order) < val);
++ while (__atomic_load_n(ptr, order) < val)
++ rte_pause();
+|
+- while (__atomic_load_n(ptr, order) > val);
++ while (__atomic_load_n(ptr, order) > val)
++ rte_pause();
+)
+
+// Rule 5: Handle volatile variable reads (simple dereference)
+@fix_volatile@
+expression E;
+identifier v;
+@@
+
+(
+- while (*v == E);
++ while (*v == E)
++ rte_pause();
+|
+- while (*v != E);
++ while (*v != E)
++ rte_pause();
+|
+- while (*v < E);
++ while (*v < E)
++ rte_pause();
+|
+- while (*v > E);
++ while (*v > E)
++ rte_pause();
+|
+- while (v == E);
++ while (v == E)
++ rte_pause();
+|
+- while (v != E);
++ while (v != E)
++ rte_pause();
+)
+
+// Rule 6: Handle negated conditions
+@fix_negated@
+expression ptr, val;
+@@
+
+(
+- while (!rte_atomic32_read(ptr));
++ while (!rte_atomic32_read(ptr))
++ rte_pause();
+|
+- while (!rte_atomic64_read(ptr));
++ while (!rte_atomic64_read(ptr))
++ rte_pause();
+|
+- while (!rte_atomic_load_explicit(ptr, val));
++ while (!rte_atomic_load_explicit(ptr, val))
++ rte_pause();
+)
--
2.51.0
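As a small illustration of the rewrite the script performs on the pattern named in the commit message, here is the before and after shape in plain C11. This is a sketch only: cpu_pause() is a hypothetical stand-in for rte_pause(), and both function names are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for rte_pause(): a pure CPU hint, no side effects. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#endif
}

/* Before: the empty-bodied loop the script is meant to find. */
static inline void wait_flag_empty(atomic_bool *flag)
{
	while (!atomic_load_explicit(flag, memory_order_acquire))
		;
}

/* After: same condition and ordering, with a pause hint as the body. */
static inline void wait_flag_paused(atomic_bool *flag)
{
	while (!atomic_load_explicit(flag, memory_order_acquire))
		cpu_pause();
}
```

Both functions return under exactly the same condition; the second only changes what the CPU does between polls.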