* [RFC 0/3] Add pause to empty spinloops
@ 2026-01-21 18:05 Stephen Hemminger
2026-01-21 18:05 ` [RFC 1/3] net/cnxk: add pause to spinloops Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Stephen Hemminger @ 2026-01-21 18:05 UTC
To: dev; +Cc: Stephen Hemminger
On SMT systems, empty spinloops can cause excessive latency due to
the spinning core consuming resources that could be used by other
hardware threads. This series addresses this by adding rte_pause()
calls to busy-wait loops in the cnxk drivers.
The first two patches fix existing empty spinloops in the net/cnxk
and event/cnxk drivers. These were identified using a new coccinelle
script that finds variations of the pattern:
while (!atomic(&flag));
This is compile tested only! I don't have that hardware.
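A minimal sketch of the loop shape after the fix, in plain C so it builds
without DPDK. The names cpu_pause(), fc_cache, refill(), wait_and_take() and
run_spin_demo() are hypothetical; cpu_pause() only stands in for rte_pause()
and is not DPDK's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(): a spin-wait hint to the CPU. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#else
	__asm__ volatile("" ::: "memory"); /* compiler barrier only */
#endif
}

/* Negative means "refill in progress", as in the flow-control loops. */
static _Atomic int64_t fc_cache = -1;

/* Producer thread: publishes a non-negative credit count. */
static void *refill(void *arg)
{
	(void)arg;
	atomic_store_explicit(&fc_cache, 8, memory_order_release);
	return NULL;
}

/* Consumer: wait for credits, then take req of them. */
static int64_t wait_and_take(int64_t req)
{
	/* Before the fix the loop body was an empty statement (";"). */
	while (atomic_load_explicit(&fc_cache, memory_order_relaxed) < 0)
		cpu_pause();
	return atomic_fetch_sub_explicit(&fc_cache, req,
					 memory_order_acquire) - req;
}

/* Runs one producer and one consumer; returns the credits left over. */
int64_t run_spin_demo(void)
{
	pthread_t t;
	int64_t left;

	atomic_store(&fc_cache, -1);
	if (pthread_create(&t, NULL, refill, NULL) != 0)
		return -1;
	left = wait_and_take(3);
	pthread_join(t, NULL);
	return left;
}
```

The pause does not change what the loop waits for; it only gives the core a
hint on every iteration of the wait.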
The third patch adds this coccinelle script to devtools/ so that
similar issues can be detected and fixed automatically across the
codebase.
The script handles multiple atomic API variants:
- Legacy rte_atomic*_read() functions
- C11 atomics via rte_atomic_load_explicit()
- GCC builtins via __atomic_load_n()
- Simple volatile variable checks
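For illustration, the last three variants in this list have roughly the
following shapes once the pause has been added. This is a plain-C sketch, not
code from the series: cpu_pause() is a hypothetical stand-in for rte_pause(),
atomic_load_explicit() stands in for rte_atomic_load_explicit(), and the
function names are made up. The legacy rte_atomic*_read() form is omitted
because it needs the DPDK headers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for rte_pause(). */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#else
	__asm__ volatile("" ::: "memory"); /* compiler barrier only */
#endif
}

/* C11 atomic load compared against a value. */
static void wait_c11(atomic_int *flag)
{
	while (atomic_load_explicit(flag, memory_order_acquire) == 0)
		cpu_pause();
}

/* GCC/Clang builtin load compared against a value. */
static void wait_builtin(int *flag)
{
	while (__atomic_load_n(flag, __ATOMIC_ACQUIRE) == 0)
		cpu_pause();
}

/* Plain volatile dereference compared against a value. */
static void wait_volatile(volatile int *flag)
{
	while (*flag == 0)
		cpu_pause();
}
```

Each function returns as soon as the flag is non-zero; in the unfixed form the
loop body would be an empty statement.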
Stephen Hemminger (3):
net/cnxk: add pause to spinloops
event/cnxk: add pause to spinloops
devtools/cocci: add script to find empty spinloops
devtools/cocci/fix_empty_spinloops.cocci | 165 +++++++++++++++++++++++
drivers/event/cnxk/cn10k_worker.c | 2 +-
drivers/event/cnxk/cn20k_worker.c | 2 +-
drivers/event/cnxk/cnxk_tim_worker.h | 4 +-
drivers/net/cnxk/cn10k_tx.h | 4 +-
drivers/net/cnxk/cn20k_tx.h | 4 +-
6 files changed, 173 insertions(+), 8 deletions(-)
create mode 100644 devtools/cocci/fix_empty_spinloops.cocci
--
2.51.0
* [RFC 1/3] net/cnxk: add pause to spinloops
From: Stephen Hemminger @ 2026-01-21 18:05 UTC
To: dev
Cc: Stephen Hemminger, Nithin Dabilpuram, Kiran Kumar K,
Sunil Kumar Kori, Satha Rao, Harman Kalra
On SMT systems, a spinloop without a pause may cause
excessive latency. This problem was found
by the fix_empty_spinloops coccinelle script.
This is compile tested only! I don't have this hardware.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/cnxk/cn10k_tx.h | 4 ++--
drivers/net/cnxk/cn20k_tx.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h
index be9e020ac5..3f4cad168c 100644
--- a/drivers/net/cnxk/cn10k_tx.h
+++ b/drivers/net/cnxk/cn10k_tx.h
@@ -167,7 +167,7 @@ cn10k_nix_vwqe_wait_fc(struct cn10k_eth_txq *txq, uint16_t req)
#else
RTE_SET_USED(pkts);
while (rte_atomic_load_explicit(&txq->fc_cache_pkts, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
cached = rte_atomic_fetch_sub_explicit(&txq->fc_cache_pkts, req, rte_memory_order_acquire) -
req;
@@ -402,7 +402,7 @@ cn10k_nix_sec_fc_wait(struct cn10k_eth_txq *txq, uint16_t nb_pkts)
#else
/* Wait for primary core to refill FC. */
while (rte_atomic_load_explicit(fc_sw, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
val = rte_atomic_fetch_sub_explicit(fc_sw, nb_pkts, rte_memory_order_acquire) - nb_pkts;
diff --git a/drivers/net/cnxk/cn20k_tx.h b/drivers/net/cnxk/cn20k_tx.h
index 9e48744831..3dfad5fd5a 100644
--- a/drivers/net/cnxk/cn20k_tx.h
+++ b/drivers/net/cnxk/cn20k_tx.h
@@ -165,7 +165,7 @@ cn20k_nix_vwqe_wait_fc(struct cn20k_eth_txq *txq, uint16_t req)
#else
RTE_SET_USED(pkts);
while (rte_atomic_load_explicit(&txq->fc_cache_pkts, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
cached = rte_atomic_fetch_sub_explicit(&txq->fc_cache_pkts, req, rte_memory_order_acquire) -
req;
@@ -392,7 +392,7 @@ cn20k_nix_sec_fc_wait(struct cn20k_eth_txq *txq, uint16_t nb_pkts)
#else
/* Wait for primary core to refill FC. */
while (rte_atomic_load_explicit(fc_sw, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
val = rte_atomic_fetch_sub_explicit(fc_sw, nb_pkts, rte_memory_order_acquire) - nb_pkts;
--
2.51.0
* [RFC 2/3] event/cnxk: add pause to spinloops
From: Stephen Hemminger @ 2026-01-21 18:05 UTC
To: dev; +Cc: Stephen Hemminger, Pavan Nikhilesh, Shijith Thotton
On SMT systems, a spinloop without a pause may cause
excessive latency. This problem was found
by the fix_empty_spinloops coccinelle script.
This is compile tested only! I don't have this hardware.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/event/cnxk/cn10k_worker.c | 2 +-
drivers/event/cnxk/cn20k_worker.c | 2 +-
drivers/event/cnxk/cnxk_tim_worker.h | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/event/cnxk/cn10k_worker.c b/drivers/event/cnxk/cn10k_worker.c
index 80077ec8a1..69ac67115a 100644
--- a/drivers/event/cnxk/cn10k_worker.c
+++ b/drivers/event/cnxk/cn10k_worker.c
@@ -93,7 +93,7 @@ sso_lmt_aw_wait_fc(struct cn10k_sso_hws *ws, int64_t req)
retry:
while (rte_atomic_load_explicit(ws->fc_cache_space, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
cached = rte_atomic_fetch_sub_explicit(ws->fc_cache_space, req, rte_memory_order_acquire) -
req;
diff --git a/drivers/event/cnxk/cn20k_worker.c b/drivers/event/cnxk/cn20k_worker.c
index 53daf3b4b0..49dfb2a28c 100644
--- a/drivers/event/cnxk/cn20k_worker.c
+++ b/drivers/event/cnxk/cn20k_worker.c
@@ -93,7 +93,7 @@ sso_lmt_aw_wait_fc(struct cn20k_sso_hws *ws, int64_t req)
retry:
while (rte_atomic_load_explicit(ws->fc_cache_space, rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
cached = rte_atomic_fetch_sub_explicit(ws->fc_cache_space, req, rte_memory_order_acquire) -
req;
diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
index 09f84091ab..887c0800e2 100644
--- a/drivers/event/cnxk/cnxk_tim_worker.h
+++ b/drivers/event/cnxk/cnxk_tim_worker.h
@@ -405,9 +405,9 @@ cnxk_tim_add_entry_mp(struct cnxk_tim_ring *const tim_ring,
: [crem] "r"(&bkt->w1)
: "memory");
#else
- while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
+ while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
rte_memory_order_relaxed) < 0)
- ;
+ rte_pause();
#endif
goto __retry;
} else if (!rem) {
--
2.51.0
* [RFC 3/3] devtools/cocci: add script to find empty spinloops
From: Stephen Hemminger @ 2026-01-21 18:05 UTC
To: dev; +Cc: Stephen Hemminger
This script finds many variations of the pattern:
while (!atomic(&flag));
and fixes them by adding rte_pause() to the loop body.
This type of loop was causing failures in the standalone atomic
tests on high core count systems. The script generalizes that to find
other places with the same problem.
The script was generated by AI; it works, but may cover
more cases than necessary.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
devtools/cocci/fix_empty_spinloops.cocci | 165 +++++++++++++++++++++++
1 file changed, 165 insertions(+)
create mode 100644 devtools/cocci/fix_empty_spinloops.cocci
diff --git a/devtools/cocci/fix_empty_spinloops.cocci b/devtools/cocci/fix_empty_spinloops.cocci
new file mode 100644
index 0000000000..ff64b30eac
--- /dev/null
+++ b/devtools/cocci/fix_empty_spinloops.cocci
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: BSD-3-Clause
+// Find and fix empty spin loops that should call rte_pause()
+//
+// Empty spin loops waste CPU cycles and can cause performance issues.
+// This script finds various forms of busy-wait loops and adds rte_pause()
+// to give hints to the CPU and reduce power consumption.
+
+// Rule 1: Handle rte_atomic*_read() variants
+@fix_atomic_read@
+expression ptr, val;
+@@
+
+(
+- while (rte_atomic16_read(ptr) == val);
++ while (rte_atomic16_read(ptr) == val)
++ rte_pause();
+|
+- while (rte_atomic16_read(ptr) != val);
++ while (rte_atomic16_read(ptr) != val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) == val);
++ while (rte_atomic32_read(ptr) == val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) != val);
++ while (rte_atomic32_read(ptr) != val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) == val);
++ while (rte_atomic64_read(ptr) == val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) != val);
++ while (rte_atomic64_read(ptr) != val)
++ rte_pause();
+)
+
+// Rule 2: Handle rte_atomic*_read() with comparison operators
+@fix_atomic_cmp@
+expression ptr, val;
+@@
+
+(
+- while (rte_atomic16_read(ptr) < val);
++ while (rte_atomic16_read(ptr) < val)
++ rte_pause();
+|
+- while (rte_atomic16_read(ptr) > val);
++ while (rte_atomic16_read(ptr) > val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) < val);
++ while (rte_atomic32_read(ptr) < val)
++ rte_pause();
+|
+- while (rte_atomic32_read(ptr) > val);
++ while (rte_atomic32_read(ptr) > val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) < val);
++ while (rte_atomic64_read(ptr) < val)
++ rte_pause();
+|
+- while (rte_atomic64_read(ptr) > val);
++ while (rte_atomic64_read(ptr) > val)
++ rte_pause();
+)
+
+// Rule 3: Handle C11 atomics with rte_atomic_load_explicit()
+@fix_c11_atomic@
+expression ptr, order, val;
+@@
+
+(
+- while (rte_atomic_load_explicit(ptr, order) == val);
++ while (rte_atomic_load_explicit(ptr, order) == val)
++ rte_pause();
+|
+- while (rte_atomic_load_explicit(ptr, order) != val);
++ while (rte_atomic_load_explicit(ptr, order) != val)
++ rte_pause();
+|
+- while (rte_atomic_load_explicit(ptr, order) < val);
++ while (rte_atomic_load_explicit(ptr, order) < val)
++ rte_pause();
+|
+- while (rte_atomic_load_explicit(ptr, order) > val);
++ while (rte_atomic_load_explicit(ptr, order) > val)
++ rte_pause();
+)
+
+// Rule 4: Handle __atomic_load_n() directly
+@fix_gcc_atomic@
+expression ptr, order, val;
+@@
+
+(
+- while (__atomic_load_n(ptr, order) == val);
++ while (__atomic_load_n(ptr, order) == val)
++ rte_pause();
+|
+- while (__atomic_load_n(ptr, order) != val);
++ while (__atomic_load_n(ptr, order) != val)
++ rte_pause();
+|
+- while (__atomic_load_n(ptr, order) < val);
++ while (__atomic_load_n(ptr, order) < val)
++ rte_pause();
+|
+- while (__atomic_load_n(ptr, order) > val);
++ while (__atomic_load_n(ptr, order) > val)
++ rte_pause();
+)
+
+// Rule 5: Handle volatile variable reads (simple dereference)
+@fix_volatile@
+expression E;
+identifier v;
+@@
+
+(
+- while (*v == E);
++ while (*v == E)
++ rte_pause();
+|
+- while (*v != E);
++ while (*v != E)
++ rte_pause();
+|
+- while (*v < E);
++ while (*v < E)
++ rte_pause();
+|
+- while (*v > E);
++ while (*v > E)
++ rte_pause();
+|
+- while (v == E);
++ while (v == E)
++ rte_pause();
+|
+- while (v != E);
++ while (v != E)
++ rte_pause();
+)
+
+// Rule 6: Handle negated conditions
+@fix_negated@
+expression ptr, val;
+@@
+
+(
+- while (!rte_atomic32_read(ptr));
++ while (!rte_atomic32_read(ptr))
++ rte_pause();
+|
+- while (!rte_atomic64_read(ptr));
++ while (!rte_atomic64_read(ptr))
++ rte_pause();
+|
+- while (!rte_atomic_load_explicit(ptr, val));
++ while (!rte_atomic_load_explicit(ptr, val))
++ rte_pause();
+)
--
2.51.0
* Re: [RFC 2/3] event/cnxk: add pause to spinloops
From: Stephen Hemminger @ 2026-01-21 21:01 UTC
To: dev; +Cc: Pavan Nikhilesh, Shijith Thotton
On Wed, 21 Jan 2026 10:05:43 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:
> diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
> index 09f84091ab..887c0800e2 100644
> --- a/drivers/event/cnxk/cnxk_tim_worker.h
> +++ b/drivers/event/cnxk/cnxk_tim_worker.h
> @@ -405,9 +405,9 @@ cnxk_tim_add_entry_mp(struct cnxk_tim_ring *const tim_ring,
> : [crem] "r"(&bkt->w1)
> : "memory");
> #else
> - while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
> + while (rte_atomic_load_explicit((int64_t __rte_atomic *)&bkt->w1,
> rte_memory_order_relaxed) < 0)
> - ;
> + rte_pause();
> #endif
I noticed while looking at the code that there is inline assembly to do the wait.
Why doesn't this driver use rte_wait_until_equal_64() instead?
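For context, a wait helper of the kind asked about here wraps the
poll-and-pause loop behind a single call. The sketch below is a plain-C
stand-in under stated assumptions, not DPDK's implementation: cpu_pause(),
wait_until_equal_64(), publisher() and run_wait_demo() are hypothetical names.
Note that the loops touched by this patch wait for a value to stop being
negative rather than to equal a constant, so an equality helper may not be a
drop-in replacement.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(). */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#else
	__asm__ volatile("" ::: "memory"); /* compiler barrier only */
#endif
}

/* Hypothetical helper: spin, with a pause, until *addr == expected. */
static void wait_until_equal_64(_Atomic uint64_t *addr, uint64_t expected,
				memory_order order)
{
	while (atomic_load_explicit(addr, order) != expected)
		cpu_pause();
}

static _Atomic uint64_t state;

/* Publisher thread: stores the value the waiter is looking for. */
static void *publisher(void *arg)
{
	(void)arg;
	atomic_store_explicit(&state, 42, memory_order_release);
	return NULL;
}

/* Starts the publisher, waits for the value, and returns what was seen. */
uint64_t run_wait_demo(void)
{
	pthread_t t;

	atomic_store(&state, 0);
	if (pthread_create(&t, NULL, publisher, NULL) != 0)
		return 0;
	wait_until_equal_64(&state, 42, memory_order_acquire);
	pthread_join(t, NULL);
	return atomic_load(&state);
}
```

Keeping the loop in one helper means call sites cannot forget the pause, and
an architecture-specific wait can be swapped in at a single place.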