* [PATCH v2 0/6] fix test failures on larger core systems
[not found] <0260118201223.323024-1-stephen@networkplumber.org>
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-20 1:55 ` [PATCH v2 1/6] test: add pause to synchronization spinloops Stephen Hemminger
` (6 more replies)
0 siblings, 7 replies; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
This series addresses several test failures that occur sporadically on
systems with many cores (32+), particularly on AMD Zen architectures.
I think Ferruh may have addressed similar problems in earlier
releases.
The root causes fall into three categories:
1. Missing rte_pause() in synchronization spinloops (patch 1)
Tight spinloops without pause cause SMT thread starvation and
unpredictable timing behavior.
2. Fixed iteration counts that don't scale (patch 2)
The atomic test performs 1M iterations per worker regardless of
core count. With 32+ cores, contention causes timeout failures.
3. File-prefix collisions during parallel test execution (patches 5-6)
Multiple tests using the default "rte" prefix compete for the same
fbarray files, causing EAL initialization failures.
Additionally, there are two fixes for BPF-related failures that I was
seeing on this system.
4. Lack of error checking in BPF elf load test (patch 3)
5. Unsupported BPF instructions with newer clang (patch 4)
Clang 20+ generates JMP32 instructions that DPDK BPF doesn't support.
v2 - Drop the unnecessary fsync()
- Rework the file prefix handling for trace tests
Stephen Hemminger (6):
test: add pause to synchronization spinloops
test: fix timeout for atomic test on high core count systems
test: fix error handling in ELF load tests
test: fix unsupported BPF instructions in elf load test
test: add file-prefix for all fast-tests on Linux
test: fix trace_autotest_with_traces parallel execution
app/test/bpf/meson.build | 3 +-
app/test/suites/meson.build | 23 +++++++++----
app/test/test_atomic.c | 67 ++++++++++++++++++++++---------------
app/test/test_bpf.c | 3 +-
app/test/test_threads.c | 17 +++++-----
5 files changed, 70 insertions(+), 43 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/6] test: add pause to synchronization spinloops
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-21 16:10 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 2/6] test: fix timeout for atomic test on high core count systems Stephen Hemminger
` (5 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable
The atomic and thread tests use tight spinloops to synchronize.
These spinloops lack rte_pause() which causes problems on high core
count systems, particularly AMD Zen architectures where:
- Tight spinloops without pause can starve SMT sibling threads
- Memory ordering and store-buffer forwarding behave differently
- Higher core counts amplify timing windows for race conditions
This manifests as sporadic test failures on systems with 32+ cores
that don't reproduce on smaller core count systems.
Add rte_pause() to the synchronization spinloops in both tests (seven
in test_atomic.c and eight in test_threads.c) to allow proper CPU
resource sharing and improve memory ordering behavior.
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/test_atomic.c | 15 ++++++++-------
app/test/test_threads.c | 17 +++++++++--------
2 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/app/test/test_atomic.c b/app/test/test_atomic.c
index 8160a33e0e..b1a0d40ece 100644
--- a/app/test/test_atomic.c
+++ b/app/test/test_atomic.c
@@ -15,6 +15,7 @@
#include <rte_atomic.h>
#include <rte_eal.h>
#include <rte_lcore.h>
+#include <rte_pause.h>
#include <rte_random.h>
#include <rte_hash_crc.h>
@@ -114,7 +115,7 @@ test_atomic_usual(__rte_unused void *arg)
unsigned i;
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
for (i = 0; i < N; i++)
rte_atomic16_inc(&a16);
@@ -150,7 +151,7 @@ static int
test_atomic_tas(__rte_unused void *arg)
{
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
if (rte_atomic16_test_and_set(&a16))
rte_atomic64_inc(&count);
@@ -171,7 +172,7 @@ test_atomic_addsub_and_return(__rte_unused void *arg)
unsigned i;
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
for (i = 0; i < N; i++) {
tmp16 = rte_atomic16_add_return(&a16, 1);
@@ -210,7 +211,7 @@ static int
test_atomic_inc_and_test(__rte_unused void *arg)
{
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
if (rte_atomic16_inc_and_test(&a16)) {
rte_atomic64_inc(&count);
@@ -237,7 +238,7 @@ static int
test_atomic_dec_and_test(__rte_unused void *arg)
{
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
if (rte_atomic16_dec_and_test(&a16))
rte_atomic64_inc(&count);
@@ -269,7 +270,7 @@ test_atomic128_cmp_exchange(__rte_unused void *arg)
unsigned int i;
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
expected = count128;
@@ -407,7 +408,7 @@ test_atomic_exchange(__rte_unused void *arg)
/* Wait until all of the other threads have been dispatched */
while (rte_atomic32_read(&synchro) == 0)
- ;
+ rte_pause();
/*
* Let the battle begin! Every thread attempts to steal the current
diff --git a/app/test/test_threads.c b/app/test/test_threads.c
index 5cd8bd4559..e2700b4a92 100644
--- a/app/test/test_threads.c
+++ b/app/test/test_threads.c
@@ -7,6 +7,7 @@
#include <rte_thread.h>
#include <rte_debug.h>
#include <rte_stdatomic.h>
+#include <rte_pause.h>
#include "test.h"
@@ -23,7 +24,7 @@ thread_main(void *arg)
rte_atomic_store_explicit(&thread_id_ready, 1, rte_memory_order_release);
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 1)
- ;
+ rte_pause();
return 0;
}
@@ -39,7 +40,7 @@ test_thread_create_join(void)
"Failed to create thread.");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0,
"Unexpected thread id.");
@@ -63,7 +64,7 @@ test_thread_create_detach(void)
&thread_main_id) == 0, "Failed to create thread.");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0,
"Unexpected thread id.");
@@ -87,7 +88,7 @@ test_thread_priority(void)
"Failed to create thread");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
priority = RTE_THREAD_PRIORITY_NORMAL;
RTE_TEST_ASSERT(rte_thread_set_priority(thread_id, priority) == 0,
@@ -139,7 +140,7 @@ test_thread_affinity(void)
"Failed to create thread");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset0) == 0,
"Failed to get thread affinity");
@@ -192,7 +193,7 @@ test_thread_attributes_affinity(void)
"Failed to create attributes affinity thread.");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
RTE_TEST_ASSERT(rte_thread_get_affinity_by_id(thread_id, &cpuset1) == 0,
"Failed to get attributes thread affinity");
@@ -221,7 +222,7 @@ test_thread_attributes_priority(void)
"Failed to create attributes priority thread.");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
RTE_TEST_ASSERT(rte_thread_get_priority(thread_id, &priority) == 0,
"Failed to get thread priority");
@@ -245,7 +246,7 @@ test_thread_control_create_join(void)
"Failed to create thread.");
while (rte_atomic_load_explicit(&thread_id_ready, rte_memory_order_acquire) == 0)
- ;
+ rte_pause();
RTE_TEST_ASSERT(rte_thread_equal(thread_id, thread_main_id) != 0,
"Unexpected thread id.");
--
2.51.0
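
The spinloop pattern this patch changes can be sketched outside DPDK as
follows. This is only an illustration under stated assumptions: cpu_relax()
and wait_for_flag() are hypothetical names, and cpu_relax() is a stand-in
for rte_pause(), not DPDK's implementation. It uses C11 atomics rather
than the rte_atomic32 API.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for rte_pause(); not DPDK's implementation. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();                    /* x86 PAUSE hint */
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");    /* Arm YIELD hint */
#else
	atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
#endif
}

/*
 * Spin until *flag becomes non-zero, relaxing the CPU between polls
 * instead of spinning with an empty loop body.
 * Returns the number of polls that observed zero.
 */
static unsigned long wait_for_flag(atomic_int *flag)
{
	unsigned long polls = 0;

	while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
		cpu_relax();
		polls++;
	}
	return polls;
}
```

A worker would call wait_for_flag() on the shared start flag before entering
its measured loop; if the flag is already set, the function returns at once
with a poll count of zero.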
* [PATCH v2 2/6] test: fix timeout for atomic test on high core count systems
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
2026-01-20 1:55 ` [PATCH v2 1/6] test: add pause to synchronization spinloops Stephen Hemminger
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-21 16:11 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 3/6] test: fix error handling in ELF load tests Stephen Hemminger
` (4 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable
The atomic test performs a fixed 1,000,000 iterations per worker,
regardless of core count. This causes a problem on high core count
systems:
With many cores (e.g., 32), the massive contention on shared
atomic variables causes the test to exceed the 10 second timeout.
Scale iterations inversely with core count to maintain roughly
constant test duration regardless of system size.
With 32 cores, iterations drop from 1,000,000 to 31,250 per worker,
which keeps the test well within the timeout while still providing
meaningful coverage.
Bugzilla ID: 952
Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/test_atomic.c | 52 ++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 20 deletions(-)
diff --git a/app/test/test_atomic.c b/app/test/test_atomic.c
index b1a0d40ece..ccd8e5d29b 100644
--- a/app/test/test_atomic.c
+++ b/app/test/test_atomic.c
@@ -10,6 +10,7 @@
#include <sys/queue.h>
#include <rte_memory.h>
+#include <rte_common.h>
#include <rte_per_lcore.h>
#include <rte_launch.h>
#include <rte_atomic.h>
@@ -101,7 +102,15 @@
#define NUM_ATOMIC_TYPES 3
-#define N 1000000
+#define N_BASE 1000000u
+#define N_MIN 10000u
+
+/*
+ * Number of iterations for each test, scaled inversely with core count.
+ * More cores means more contention which increases time per operation.
+ * Calculated once at test start to avoid repeated computation in workers.
+ */
+static unsigned int num_iterations;
static rte_atomic16_t a16;
static rte_atomic32_t a32;
@@ -112,36 +121,36 @@ static rte_atomic32_t synchro;
static int
test_atomic_usual(__rte_unused void *arg)
{
- unsigned i;
+ unsigned int i;
while (rte_atomic32_read(&synchro) == 0)
rte_pause();
- for (i = 0; i < N; i++)
+ for (i = 0; i < num_iterations; i++)
rte_atomic16_inc(&a16);
- for (i = 0; i < N; i++)
+ for (i = 0; i < num_iterations; i++)
rte_atomic16_dec(&a16);
- for (i = 0; i < (N / 5); i++)
+ for (i = 0; i < (num_iterations / 5); i++)
rte_atomic16_add(&a16, 5);
- for (i = 0; i < (N / 5); i++)
+ for (i = 0; i < (num_iterations / 5); i++)
rte_atomic16_sub(&a16, 5);
- for (i = 0; i < N; i++)
+ for (i = 0; i < num_iterations; i++)
rte_atomic32_inc(&a32);
- for (i = 0; i < N; i++)
+ for (i = 0; i < num_iterations; i++)
rte_atomic32_dec(&a32);
- for (i = 0; i < (N / 5); i++)
+ for (i = 0; i < (num_iterations / 5); i++)
rte_atomic32_add(&a32, 5);
- for (i = 0; i < (N / 5); i++)
+ for (i = 0; i < (num_iterations / 5); i++)
rte_atomic32_sub(&a32, 5);
- for (i = 0; i < N; i++)
+ for (i = 0; i < num_iterations; i++)
rte_atomic64_inc(&a64);
- for (i = 0; i < N; i++)
+ for (i = 0; i < num_iterations; i++)
rte_atomic64_dec(&a64);
- for (i = 0; i < (N / 5); i++)
+ for (i = 0; i < (num_iterations / 5); i++)
rte_atomic64_add(&a64, 5);
- for (i = 0; i < (N / 5); i++)
+ for (i = 0; i < (num_iterations / 5); i++)
rte_atomic64_sub(&a64, 5);
return 0;
@@ -169,12 +178,12 @@ test_atomic_addsub_and_return(__rte_unused void *arg)
uint32_t tmp16;
uint32_t tmp32;
uint64_t tmp64;
- unsigned i;
+ unsigned int i;
while (rte_atomic32_read(&synchro) == 0)
rte_pause();
- for (i = 0; i < N; i++) {
+ for (i = 0; i < num_iterations; i++) {
tmp16 = rte_atomic16_add_return(&a16, 1);
rte_atomic64_add(&count, tmp16);
@@ -274,7 +283,7 @@ test_atomic128_cmp_exchange(__rte_unused void *arg)
expected = count128;
- for (i = 0; i < N; i++) {
+ for (i = 0; i < num_iterations; i++) {
do {
rte_int128_t desired;
@@ -401,7 +410,7 @@ get_crc8(uint8_t *message, int length)
static int
test_atomic_exchange(__rte_unused void *arg)
{
- int i;
+ unsigned int i;
test16_t nt16, ot16; /* new token, old token */
test32_t nt32, ot32;
test64_t nt64, ot64;
@@ -417,7 +426,7 @@ test_atomic_exchange(__rte_unused void *arg)
* appropriate crc32 hash for the data) then the test iteration has
* passed. If the token is invalid, increment the counter.
*/
- for (i = 0; i < N; i++) {
+ for (i = 0; i < num_iterations; i++) {
/* Test 64bit Atomic Exchange */
nt64.u64 = rte_rand();
@@ -446,6 +455,9 @@ test_atomic_exchange(__rte_unused void *arg)
static int
test_atomic(void)
{
+ /* Scale iterations by number of cores to keep test duration reasonable */
+ num_iterations = RTE_MAX(N_BASE / rte_lcore_count(), N_MIN);
+
rte_atomic16_init(&a16);
rte_atomic32_init(&a32);
rte_atomic64_init(&a64);
@@ -593,7 +605,7 @@ test_atomic(void)
rte_atomic32_clear(&synchro);
iterations = count128.val[0] - count128.val[1];
- if (iterations != (uint64_t)4*N*(rte_lcore_count()-1)) {
+ if (iterations != (uint64_t)4*num_iterations*(rte_lcore_count()-1)) {
printf("128-bit compare and swap failed\n");
return -1;
}
--
2.51.0
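
The scaling rule in this patch can be worked through in isolation. The
sketch below mirrors the RTE_MAX(N_BASE / rte_lcore_count(), N_MIN)
expression with a plain function; scaled_iterations() is a hypothetical
name, and the zero-core guard is an assumption added only to keep the
example safe to call.

```c
#define N_BASE 1000000u
#define N_MIN  10000u

/*
 * Iterations per worker, scaled inversely with the core count and
 * clamped to a floor of N_MIN.
 */
static unsigned int scaled_iterations(unsigned int lcore_count)
{
	unsigned int n;

	if (lcore_count == 0)      /* assumption: guard against divide by zero */
		return N_BASE;

	n = N_BASE / lcore_count;  /* integer division, rounds down */
	return n > N_MIN ? n : N_MIN;
}
```

With these constants, 32 cores give 31,250 iterations and 96 cores give
10,416. The N_MIN floor takes over above 100 cores, so 128 cores still
run 10,000 iterations per worker rather than 7,812.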
* [PATCH v2 3/6] test: fix error handling in ELF load tests
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
2026-01-20 1:55 ` [PATCH v2 1/6] test: add pause to synchronization spinloops Stephen Hemminger
2026-01-20 1:55 ` [PATCH v2 2/6] test: fix timeout for atomic test on high core count systems Stephen Hemminger
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-20 12:08 ` Marat Khalili
2026-01-20 1:55 ` [PATCH v2 4/6] test: fix unsupported BPF instructions in elf load test Stephen Hemminger
` (3 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable
Address related issues found during review:
- Add missing TEST_ASSERT for mempool creation in test_bpf_elf_tx_load
- Initialize port variable in test_bpf_elf_rx_load to avoid undefined
behavior in cleanup path if null_vdev_setup fails early
Fixes: cf1e03f881af ("test/bpf: add ELF loading")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/test_bpf.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index a7d56f8d86..0e969f9f13 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -3580,6 +3580,7 @@ test_bpf_elf_tx_load(void)
mb_pool = rte_pktmbuf_pool_create("bpf_tx_test_pool", BPF_TEST_POOLSIZE,
0, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
SOCKET_ID_ANY);
+ TEST_ASSERT(mb_pool != NULL, "failed to create mempool");
ret = null_vdev_setup(null_dev, &port, mb_pool);
if (ret != 0)
@@ -3664,7 +3665,7 @@ test_bpf_elf_rx_load(void)
static const char null_dev[] = "net_null_bpf0";
struct rte_mempool *pool = NULL;
char *tmpfile = NULL;
- uint16_t port;
+ uint16_t port = UINT16_MAX;
int ret;
printf("%s start\n", __func__);
--
2.51.0
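
The reason for initializing the port to UINT16_MAX can be sketched with
a toy cleanup path. Everything here is hypothetical (fake_setup(),
fake_cleanup(), run_test(), PORT_UNSET, cleanups); the patch does not
show the real cleanup code in test_bpf.c, so whether it tests the
sentinel this way is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define PORT_UNSET UINT16_MAX   /* sentinel: no port was set up */

static int cleanups;            /* counts calls to fake_cleanup() */

/* Hypothetical setup: writes *port only on success. */
static int fake_setup(bool succeed, uint16_t *port)
{
	if (!succeed)
		return -1;
	*port = 3;
	return 0;
}

/* Hypothetical teardown for a port that was set up. */
static void fake_cleanup(uint16_t port)
{
	(void)port;
	cleanups++;
}

static int run_test(bool setup_ok)
{
	uint16_t port = PORT_UNSET; /* defined value even if setup fails */
	int ret = fake_setup(setup_ok, &port);

	/* Shared cleanup path: only touch the port if one exists. */
	if (port != PORT_UNSET)
		fake_cleanup(port);
	return ret;
}
```

Without the initializer, a failed setup would leave port indeterminate
and the shared cleanup path would read it, which is the undefined
behavior the commit message refers to.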
* [PATCH v2 4/6] test: fix unsupported BPF instructions in elf load test
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
` (2 preceding siblings ...)
2026-01-20 1:55 ` [PATCH v2 3/6] test: fix error handling in ELF load tests Stephen Hemminger
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-21 16:20 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 5/6] test: add file-prefix for all fast-tests on Linux Stephen Hemminger
` (2 subsequent siblings)
6 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable, Marat Khalili
The DPDK BPF library only handles the base BPF instructions.
It does not handle JMP32 which would cause the bpf_elf_load
test to fail on clang 20 or later.
Bugzilla ID: 1844
Fixes: cf1e03f881af ("test/bpf: add ELF loading")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
app/test/bpf/meson.build | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/app/test/bpf/meson.build b/app/test/bpf/meson.build
index aaecfa7018..91c1b434f8 100644
--- a/app/test/bpf/meson.build
+++ b/app/test/bpf/meson.build
@@ -24,7 +24,8 @@ if not xxd.found()
endif
# BPF compiler flags
-bpf_cflags = [ '-O2', '-target', 'bpf', '-g', '-c']
+# At present: DPDK BPF does not support v3 or later
+bpf_cflags = [ '-O2', '-target', 'bpf', '-mcpu=v2', '-g', '-c']
# Enable test in test_bpf.c
cflags += '-DTEST_BPF_ELF_LOAD'
--
2.51.0
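
To make the JMP32 problem concrete: in the eBPF encoding the low three
bits of the opcode byte select the instruction class, and JMP32 is
class 0x06 (plain JMP is 0x05). The sketch below counts JMP32 opcodes in
a list of opcode bytes. count_jmp32() and the macro names are
hypothetical and are not part of the DPDK BPF library.

```c
#include <stddef.h>
#include <stdint.h>

#define EBPF_CLASS(code)  ((code) & 0x07) /* low 3 bits = instruction class */
#define EBPF_CLASS_JMP32  0x06            /* 32-bit compare-and-jump class */

/* Count how many opcode bytes belong to the JMP32 class. */
static size_t count_jmp32(const uint8_t *opcodes, size_t n)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (EBPF_CLASS(opcodes[i]) == EBPF_CLASS_JMP32)
			hits++;
	}
	return hits;
}
```

For example, the opcode sequence 0xb7 (64-bit move), 0x16 (32-bit
jump-if-equal), 0x15 (64-bit jump-if-equal), 0x95 (exit) contains one
JMP32 instruction. Building with -mcpu=v2 asks clang not to emit that
class at all.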
* [PATCH v2 5/6] test: add file-prefix for all fast-tests on Linux
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
` (3 preceding siblings ...)
2026-01-20 1:55 ` [PATCH v2 4/6] test: fix unsupported BPF instructions in elf load test Stephen Hemminger
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-21 16:22 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 6/6] test: fix trace_autotest_with_traces parallel execution Stephen Hemminger
2026-01-21 16:31 ` [PATCH v2 0/6] fix test failures on larger core systems Bruce Richardson
6 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable, Marat Khalili
When running tests in parallel on systems with many cores, multiple test
processes collide on the default "rte" file-prefix, causing EAL
initialization failures:
EAL: Cannot allocate memzone list: Device or resource busy
EAL: Cannot init memzone
This occurs because all DPDK tests (including --no-huge tests) use
file-backed arrays for memzone tracking. These files are created at
/var/run/dpdk/<prefix>/fbarray_memzone and require exclusive locking
during initialization. When multiple tests run in parallel with the
same file-prefix, they compete for this lock.
The original implementation included --file-prefix for Linux to
prevent this collision. This was later removed during test
infrastructure refactoring.
Restore the --file-prefix argument for all fast-tests on Linux,
regardless of whether they use hugepages. Tests that exercise
file-prefix functionality (like eal_flags_file_prefix_autotest)
spawn child processes with their own hardcoded prefixes and use
get_current_prefix() to verify the parent's resources, so they work
correctly regardless of what prefix the parent process uses.
Fixes: 50823f30f0c8 ("test: build using per-file dependencies")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
---
app/test/suites/meson.build | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/app/test/suites/meson.build b/app/test/suites/meson.build
index 1010150eee..4c815ea097 100644
--- a/app/test/suites/meson.build
+++ b/app/test/suites/meson.build
@@ -85,11 +85,15 @@ foreach suite:test_suites
if nohuge
test_args += test_no_huge_args
elif not has_hugepage
- continue #skip this tests
+ continue # skip this test
endif
if not asan and get_option('b_sanitize').contains('address')
continue # skip this test
endif
+ if is_linux
+ # use unique file-prefix to allow parallel runs
+ test_args += ['--file-prefix=' + test_name.underscorify()]
+ endif
if get_option('default_library') == 'shared'
test_args += ['-d', dpdk_drivers_build_dir]
--
2.51.0
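
The per-test prefix is derived with meson's underscorify(), which
replaces every character that is not a letter or digit with an
underscore. A C sketch of the same transformation follows;
make_file_prefix_arg() is a hypothetical helper and the test name used
in the note after it is made up for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "--file-prefix=<name>" in out, with every non-alphanumeric
 * character of the name replaced by '_'.
 * Returns 0 on success, -1 if out is too small.
 */
static int make_file_prefix_arg(const char *test_name, char *out, size_t out_len)
{
	static const char opt[] = "--file-prefix=";
	int n = snprintf(out, out_len, "%s%s", opt, test_name);

	if (n < 0 || (size_t)n >= out_len)
		return -1;

	/* Only rewrite the name part, not the option itself. */
	for (char *p = out + strlen(opt); *p != '\0'; p++) {
		if (!isalnum((unsigned char)*p))
			*p = '_';
	}
	return 0;
}
```

A made-up test name such as "eal-flags.c_opt" becomes
"--file-prefix=eal_flags_c_opt", so each test gets its own runtime
directory as long as the test names themselves are distinct after the
substitution.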
* [PATCH v2 6/6] test: fix trace_autotest_with_traces parallel execution
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
` (4 preceding siblings ...)
2026-01-20 1:55 ` [PATCH v2 5/6] test: add file-prefix for all fast-tests on Linux Stephen Hemminger
@ 2026-01-20 1:55 ` Stephen Hemminger
2026-01-21 16:29 ` Bruce Richardson
2026-01-21 16:31 ` [PATCH v2 0/6] fix test failures on larger core systems Bruce Richardson
6 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2026-01-20 1:55 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable
The trace_autotest_with_traces test needs a unique file-prefix to avoid
collisions when running in parallel with other tests.
Rather than duplicating test argument construction, restructure to add
file-prefix as the last step. This allows reusing test_args for the
trace variant by concatenating the trace-specific arguments and a
different file-prefix at the end.
Fixes: 0aeaf75df879 ("test: define unit tests suites based on test types")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/suites/meson.build | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/app/test/suites/meson.build b/app/test/suites/meson.build
index 4c815ea097..fdc0b77149 100644
--- a/app/test/suites/meson.build
+++ b/app/test/suites/meson.build
@@ -90,26 +90,33 @@ foreach suite:test_suites
if not asan and get_option('b_sanitize').contains('address')
continue # skip this test
endif
- if is_linux
- # use unique file-prefix to allow parallel runs
- test_args += ['--file-prefix=' + test_name.underscorify()]
- endif
-
if get_option('default_library') == 'shared'
test_args += ['-d', dpdk_drivers_build_dir]
endif
+ # use unique file-prefix to allow parallel runs
+ if is_linux
+ file_prefix = ['--file-prefix=' + test_name.underscorify()]
+ else
+ file_prefix = []
+ endif
+
test(test_name, dpdk_test,
- args : test_args,
+ args : test_args + file_prefix,
env: ['DPDK_TEST=' + test_name],
timeout : timeout_seconds_fast,
is_parallel : false,
suite : 'fast-tests')
if not is_windows and test_name == 'trace_autotest'
- test_args += ['--trace=.*']
- test_args += ['--trace-dir=@0@'.format(meson.current_build_dir())]
+ trace_extra = ['--trace=.*',
+ '--trace-dir=@0@'.format(meson.current_build_dir())]
+ if is_linux
+ trace_prefix = ['--file-prefix=trace_autotest_with_traces']
+ else
+ trace_prefix = []
+ endif
test(test_name + '_with_traces', dpdk_test,
- args : test_args,
+ args : test_args + trace_extra + trace_prefix,
env: ['DPDK_TEST=' + test_name],
timeout : timeout_seconds_fast,
is_parallel : false,
--
2.51.0
* RE: [PATCH v2 3/6] test: fix error handling in ELF load tests
2026-01-20 1:55 ` [PATCH v2 3/6] test: fix error handling in ELF load tests Stephen Hemminger
@ 2026-01-20 12:08 ` Marat Khalili
0 siblings, 0 replies; 14+ messages in thread
From: Marat Khalili @ 2026-01-20 12:08 UTC (permalink / raw)
To: Stephen Hemminger, dev@dpdk.org; +Cc: stable@dpdk.org
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday 20 January 2026 01:55
> To: dev@dpdk.org
> Cc: Stephen Hemminger <stephen@networkplumber.org>; stable@dpdk.org
> Subject: [PATCH v2 3/6] test: fix error handling in ELF load tests
>
> Address related issues found during review
> - Add missing TEST_ASSERT for mempool creation in test_bpf_elf_tx_load
> - Initialize port variable in test_bpf_elf_rx_load to avoid undefined
> behavior in cleanup path if null_vdev_setup fails early
>
> Fixes: cf1e03f881af ("test/bpf: add ELF loading")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> app/test/test_bpf.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
> index a7d56f8d86..0e969f9f13 100644
> --- a/app/test/test_bpf.c
> +++ b/app/test/test_bpf.c
> @@ -3580,6 +3580,7 @@ test_bpf_elf_tx_load(void)
> mb_pool = rte_pktmbuf_pool_create("bpf_tx_test_pool", BPF_TEST_POOLSIZE,
> 0, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
> SOCKET_ID_ANY);
> + TEST_ASSERT(mb_pool != NULL, "failed to create mempool");
>
> ret = null_vdev_setup(null_dev, &port, mb_pool);
> if (ret != 0)
> @@ -3664,7 +3665,7 @@ test_bpf_elf_rx_load(void)
> static const char null_dev[] = "net_null_bpf0";
> struct rte_mempool *pool = NULL;
> char *tmpfile = NULL;
> - uint16_t port;
> + uint16_t port = UINT16_MAX;
> int ret;
>
> printf("%s start\n", __func__);
> --
> 2.51.0
>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
* Re: [PATCH v2 1/6] test: add pause to synchronization spinloops
2026-01-20 1:55 ` [PATCH v2 1/6] test: add pause to synchronization spinloops Stephen Hemminger
@ 2026-01-21 16:10 ` Bruce Richardson
0 siblings, 0 replies; 14+ messages in thread
From: Bruce Richardson @ 2026-01-21 16:10 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, stable
On Mon, Jan 19, 2026 at 05:55:04PM -0800, Stephen Hemminger wrote:
> The atomic and thread tests use tight spinloops to synchronize.
> These spinloops lack rte_pause() which causes problems on high core
> count systems, particularly AMD Zen architectures where:
>
> - Tight spinloops without pause can starve SMT sibling threads
> - Memory ordering and store-buffer forwarding behave differently
> - Higher core counts amplify timing windows for race conditions
>
> This manifests as sporadic test failures on systems with 32+ cores
> that don't reproduce on smaller core count systems.
>
> Add rte_pause() to all seven synchronization spinloops to allow
> proper CPU resource sharing and improve memory ordering behavior.
>
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> app/test/test_atomic.c | 15 ++++++++-------
> app/test/test_threads.c | 17 +++++++++--------
> 2 files changed, 17 insertions(+), 15 deletions(-)
>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* Re: [PATCH v2 2/6] test: fix timeout for atomic test on high core count systems
2026-01-20 1:55 ` [PATCH v2 2/6] test: fix timeout for atomic test on high core count systems Stephen Hemminger
@ 2026-01-21 16:11 ` Bruce Richardson
0 siblings, 0 replies; 14+ messages in thread
From: Bruce Richardson @ 2026-01-21 16:11 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, stable
On Mon, Jan 19, 2026 at 05:55:05PM -0800, Stephen Hemminger wrote:
> The atomic test uses tight spinloops to synchronize worker threads
> and performs a fixed 1,000,000 iterations per worker. This causes
> two problems on high core count systems:
>
> With many cores (e.g., 32), the massive contention on shared
> atomic variables causes the test to exceed the 10 second timeout.
>
> Scale iterations inversely with core count to maintain roughly
> constant test duration regardless of system size
>
> With 32 cores, iterations drop from 1,000,000 to 31,250 per worker,
> which keeps the test well within the timeout while still providing
> meaningful coverage.
>
> Bugzilla ID: 952
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> app/test/test_atomic.c | 52 ++++++++++++++++++++++++++----------------
> 1 file changed, 32 insertions(+), 20 deletions(-)
>
Tested-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Tested this on a system with 96 cores and the test no longer times out
or fails for me.
* Re: [PATCH v2 4/6] test: fix unsupported BPF instructions in elf load test
2026-01-20 1:55 ` [PATCH v2 4/6] test: fix unsupported BPF instructions in elf load test Stephen Hemminger
@ 2026-01-21 16:20 ` Bruce Richardson
0 siblings, 0 replies; 14+ messages in thread
From: Bruce Richardson @ 2026-01-21 16:20 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, stable, Marat Khalili
On Mon, Jan 19, 2026 at 05:55:07PM -0800, Stephen Hemminger wrote:
> The DPDK BPF library only handles the base BPF instructions.
> It does not handle JMP32 which would cause the bpf_elf_load
> test to fail on clang 20 or later.
>
> Bugzilla ID: 1844
> Fixes: cf1e03f881af ("test/bpf: add ELF loading")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Marat Khalili <marat.khalili@huawei.com>
> ---
> app/test/bpf/meson.build | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/app/test/bpf/meson.build b/app/test/bpf/meson.build
> index aaecfa7018..91c1b434f8 100644
> --- a/app/test/bpf/meson.build
> +++ b/app/test/bpf/meson.build
> @@ -24,7 +24,8 @@ if not xxd.found()
> endif
>
> # BPF compiler flags
> -bpf_cflags = [ '-O2', '-target', 'bpf', '-g', '-c']
> +# At present: DPDK BPF does not support v3 or later
> +bpf_cflags = [ '-O2', '-target', 'bpf', '-mcpu=v2', '-g', '-c']
>
> # Enable test in test_bpf.c
> cflags += '-DTEST_BPF_ELF_LOAD'
> --
One small additional thing in the bpf autotest is that the test fails if
the net/null driver is disabled. It would be good if it reported skipped
in that case.
/Bruce
* Re: [PATCH v2 5/6] test: add file-prefix for all fast-tests on Linux
2026-01-20 1:55 ` [PATCH v2 5/6] test: add file-prefix for all fast-tests on Linux Stephen Hemminger
@ 2026-01-21 16:22 ` Bruce Richardson
0 siblings, 0 replies; 14+ messages in thread
From: Bruce Richardson @ 2026-01-21 16:22 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, stable, Marat Khalili
On Mon, Jan 19, 2026 at 05:55:08PM -0800, Stephen Hemminger wrote:
> When running tests in parallel on systems with many cores, multiple test
> processes collide on the default "rte" file-prefix, causing EAL
> initialization failures:
>
> EAL: Cannot allocate memzone list: Device or resource busy
> EAL: Cannot init memzone
>
> This occurs because all DPDK tests (including --no-huge tests) use
> file-backed arrays for memzone tracking. These files are created at
> /var/run/dpdk/<prefix>/fbarray_memzone and require exclusive locking
> during initialization. When multiple tests run in parallel with the
> same file-prefix, they compete for this lock.
>
> The original implementation included --file-prefix for Linux to
> prevent this collision. This was later removed during test
> infrastructure refactoring.
>
> Restore the --file-prefix argument for all fast-tests on Linux,
> regardless of whether they use hugepages. Tests that exercise
> file-prefix functionality (like eal_flags_file_prefix_autotest)
> spawn child processes with their own hardcoded prefixes and use
> get_current_prefix() to verify the parent's resources, so they work
> correctly regardless of what prefix the parent process uses.
>
> Fixes: 50823f30f0c8 ("test: build using per-file dependencies")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Acked-by: Marat Khalili <marat.khalili@huawei.com>
> ---
> app/test/suites/meson.build | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/app/test/suites/meson.build b/app/test/suites/meson.build
> index 1010150eee..4c815ea097 100644
> --- a/app/test/suites/meson.build
> +++ b/app/test/suites/meson.build
> @@ -85,11 +85,15 @@ foreach suite:test_suites
> if nohuge
> test_args += test_no_huge_args
> elif not has_hugepage
> - continue #skip this tests
> + continue # skip this test
> endif
> if not asan and get_option('b_sanitize').contains('address')
> continue # skip this test
> endif
> + if is_linux
> + # use unique file-prefix to allow parallel runs
> + test_args += ['--file-prefix=' + test_name.underscorify()]
> + endif
>
No harm in this, even though I suspect parallel runs may hit other issues.
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
* Re: [PATCH v2 6/6] test: fix trace_autotest_with_traces parallel execution
2026-01-20 1:55 ` [PATCH v2 6/6] test: fix trace_autotest_with_traces parallel execution Stephen Hemminger
@ 2026-01-21 16:29 ` Bruce Richardson
0 siblings, 0 replies; 14+ messages in thread
From: Bruce Richardson @ 2026-01-21 16:29 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, stable
On Mon, Jan 19, 2026 at 05:55:09PM -0800, Stephen Hemminger wrote:
> The trace_autotest_with_traces test needs a unique file-prefix to avoid
> collisions when running in parallel with other tests.
>
> Rather than duplicating test argument construction, restructure to add
> file-prefix as the last step. This allows reusing test_args for the
> trace variant by concatenating the trace-specific arguments and a
> different file-prefix at the end.
>
> Fixes: 0aeaf75df879 ("test: define unit tests suites based on test types")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> app/test/suites/meson.build | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/app/test/suites/meson.build b/app/test/suites/meson.build
> index 4c815ea097..fdc0b77149 100644
> --- a/app/test/suites/meson.build
> +++ b/app/test/suites/meson.build
> @@ -90,26 +90,33 @@ foreach suite:test_suites
> if not asan and get_option('b_sanitize').contains('address')
> continue # skip this test
> endif
> - if is_linux
> - # use unique file-prefix to allow parallel runs
> - test_args += ['--file-prefix=' + test_name.underscorify()]
> - endif
> -
> if get_option('default_library') == 'shared'
> test_args += ['-d', dpdk_drivers_build_dir]
> endif
>
> + # use unique file-prefix to allow parallel runs
> + if is_linux
> + file_prefix = ['--file-prefix=' + test_name.underscorify()]
> + else
> + file_prefix = []
> + endif
> +
I would try to shorten this, and merge generating the trace prefix into it, to
avoid multiple if-else branches:
file_prefix = []
trace_file_prefix = []
if is_linux
file_prefix = ['--file-prefix=' + test_name.underscorify()]
trace_file_prefix = [file_prefix[0] + '_with_traces']
endif
> test(test_name, dpdk_test,
> - args : test_args,
> + args : test_args + file_prefix,
> env: ['DPDK_TEST=' + test_name],
> timeout : timeout_seconds_fast,
> is_parallel : false,
> suite : 'fast-tests')
> if not is_windows and test_name == 'trace_autotest'
> - test_args += ['--trace=.*']
> - test_args += ['--trace-dir=@0@'.format(meson.current_build_dir())]
> + trace_extra = ['--trace=.*',
> + '--trace-dir=@0@'.format(meson.current_build_dir())]
> + if is_linux
> + trace_prefix = ['--file-prefix=trace_autotest_with_traces']
> + else
> + trace_prefix = []
> + endif
> test(test_name + '_with_traces', dpdk_test,
> - args : test_args,
> + args : test_args + trace_extra + trace_prefix,
> env: ['DPDK_TEST=' + test_name],
> timeout : timeout_seconds_fast,
> is_parallel : false,
> --
> 2.51.0
>
* Re: [PATCH v2 0/6] fix test failures on larger core systems
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
` (5 preceding siblings ...)
2026-01-20 1:55 ` [PATCH v2 6/6] test: fix trace_autotest_with_traces parallel execution Stephen Hemminger
@ 2026-01-21 16:31 ` Bruce Richardson
6 siblings, 0 replies; 14+ messages in thread
From: Bruce Richardson @ 2026-01-21 16:31 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On Mon, Jan 19, 2026 at 05:55:03PM -0800, Stephen Hemminger wrote:
> This series addresses several test failures that occur sporadically on
> systems with many cores (32+), particularly on AMD Zen architectures.
> I think Ferruh may have addressed similar problems in earlier
> releases.
>
> The root causes fall into three categories:
>
> 1. Missing rte_pause() in synchronization spinloops (patch 1)
> Tight spinloops without pause cause SMT thread starvation and
> unpredictable timing behavior.
>
> 2. Fixed iteration counts that don't scale (patch 2)
> The atomic test performs 1M iterations per worker regardless of
> core count. With 32+ cores, contention causes timeout failures.
>
Testing on a 96-core part, I still see timeouts (with -t2, so 20 seconds
allowed) in the mcslock, stack, stack_lf and timer tests. Limiting core counts
for those makes the failures go away, so it's likely the same issue you
solved here.
/Bruce
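
One way the remaining tests could avoid a fixed per-worker iteration count is to divide a total work budget across the workers. This is a minimal sketch under stated assumptions: the function name, the 1M total budget, and the floor value are all hypothetical, and it is not the code from patch 2.

```c
#include <stdint.h>

/* Hypothetical budget: total iterations shared by all workers. */
#define TOTAL_ITERATIONS 1000000u
/* Hypothetical floor so each worker still does meaningful work. */
#define MIN_ITERATIONS 1000u

/* Split the total budget across n_workers, never going below the floor.
 * A worker count of zero is treated as one worker. */
static uint32_t iterations_per_worker(unsigned int n_workers)
{
	uint32_t n;

	if (n_workers == 0)
		n_workers = 1;
	n = TOTAL_ITERATIONS / n_workers;
	return n < MIN_ITERATIONS ? MIN_ITERATIONS : n;
}
```

With these assumed values a single worker runs the full 1M iterations, while 96 workers each run about 10K, so total contention-bound work stays roughly constant as the core count grows.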
Thread overview: 14+ messages
[not found] <0260118201223.323024-1-stephen@networkplumber.org>
2026-01-20 1:55 ` [PATCH v2 0/6] fix test failures on larger core systems Stephen Hemminger
2026-01-20 1:55 ` [PATCH v2 1/6] test: add pause to synchronization spinloops Stephen Hemminger
2026-01-21 16:10 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 2/6] test: fix timeout for atomic test on high core count systems Stephen Hemminger
2026-01-21 16:11 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 3/6] test: fix error handling in ELF load tests Stephen Hemminger
2026-01-20 12:08 ` Marat Khalili
2026-01-20 1:55 ` [PATCH v2 4/6] test: fix unsupported BPF instructions in elf load test Stephen Hemminger
2026-01-21 16:20 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 5/6] test: add file-prefix for all fast-tests on Linux Stephen Hemminger
2026-01-21 16:22 ` Bruce Richardson
2026-01-20 1:55 ` [PATCH v2 6/6] test: fix trace_autotest_with_traces parallel execution Stephen Hemminger
2026-01-21 16:29 ` Bruce Richardson
2026-01-21 16:31 ` [PATCH v2 0/6] fix test failures on larger core systems Bruce Richardson