From: Wander Lairson Costa <wander@redhat.com>
To: Clark Williams <williams@redhat.com>,
	John Kacur <jkacur@redhat.com>,
	linux-rt-users@vger.kernel.org
Cc: Juri Lelli <juri.lelli@redhat.com>,
	luffyluo@tencent.com, davidlt@rivosinc.com,
	Wander Lairson Costa <wander@redhat.com>
Subject: [[PATCH stalld] 00/33] Test suite hardening, correctness fixes, and BPF optimization
Date: Wed, 20 May 2026 11:00:27 -0300
Message-ID: <20260520140104.112142-1-wander@redhat.com>

The stalld functional test suite has accumulated significant
reliability and maintainability problems. Several tests were
passing silently because of missing assertions, shell scoping
bugs that discarded results computed in subshells, and grep
substring matches that produced false positives on systems with
ten or more CPUs (a pattern meant for CPU 1 also matches CPU 10).
The suite also carried heavy boilerplate duplication, and
timing-dependent sleeps made it flaky. In addition, a daemon bug
was found: --force_fifo combined with single-threaded mode
silently fell back to adaptive mode instead of erroring out.
Finally, the BPF queue_track backend performed an O(n) linear
scan of a per-CPU task array on every sched_wakeup event,
consuming a significant share of the real-time latency budget in
Telco workloads.
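
To illustrate the multi-digit false positive, here is a minimal C
sketch. The real tests are shell scripts using grep; the function
names and the sample log text below are hypothetical and are not
stalld's actual output. An unanchored substring search for "cpu1"
also matches "cpu10", while a whole-token match (similar in spirit
to grep -w) does not:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Characters grep -w treats as part of a word: letters, digits, '_'. */
static bool is_word_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/*
 * Unanchored match, comparable to `grep "cpu1"`: true whenever `token`
 * occurs anywhere in `line`, so "cpu1" also hits "cpu10", "cpu11", ...
 */
static bool contains_substring(const char *line, const char *token)
{
	return strstr(line, token) != NULL;
}

/*
 * Whole-token match, similar in spirit to `grep -w "cpu1"`: an
 * occurrence only counts if it is not glued to other word characters.
 */
static bool contains_token(const char *line, const char *token)
{
	size_t len;
	const char *p;

	assert(line != NULL && token != NULL);
	len = strlen(token);
	if (len == 0)
		return false;

	for (p = strstr(line, token); p != NULL; p = strstr(p + 1, token)) {
		/* p[len] is at most the terminating '\0', so it is in bounds. */
		bool left_ok = (p == line) || !is_word_char(p[-1]);
		bool right_ok = !is_word_char(p[len]);

		if (left_ok && right_ok)
			return true;
	}
	return false;
}
```

For example, contains_substring("boosting cpu10", "cpu1") returns
true, while contains_token() with the same arguments returns false.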

This series addresses these issues in three areas. First, the
daemon's --force_fifo validation is fixed to reject the
incompatible single-threaded mode at startup. Second, the test
suite is overhauled. Eight shared helpers (test_section,
cleanup_scenario, find_starved_child, init_functional_test,
assert_stalld_rejects, assert_log_contains, assert_success, and
the starvation/boost assertion helpers) replace validation
patterns that were duplicated across all functional tests. The
tests are then hardened with proper assertions and switched to
fail-fast semantics, where any failure immediately aborts the
run, and timing-dependent sleeps are replaced with event-driven
synchronization. The starvation_gen helper's signal handler is
made async-signal-safe. Legacy C test infrastructure, stale
documentation (including the .claude/ directory), and
unreachable code are removed. Third, the BPF queue_track backend
replaces its per-CPU linear array with a hash map keyed by
(cpu, pid), giving expected O(1) task lookups and reducing
thread latency from 5-6 us to 3 us on real-time systems.
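
For reference, the usual async-signal-safe handler pattern looks
roughly like the C sketch below. It is not the actual
starvation_gen change: the function names are hypothetical, and it
only assumes a POSIX system with sigaction(). The handler sets a
volatile sig_atomic_t flag and returns; logging, cleanup and exit
happen later in normal program flow:

```c
#define _POSIX_C_SOURCE 200809L	/* for sigaction() */

#include <assert.h>
#include <signal.h>
#include <string.h>

/*
 * Written only from the handler. volatile sig_atomic_t is the type
 * the C standard guarantees a handler may safely assign to.
 */
static volatile sig_atomic_t stop_requested;

/*
 * Async-signal-safe handler: it only sets a flag. No printf(),
 * malloc(), exit() or locking here; those are not async-signal-safe.
 */
static void on_stop_signal(int signo)
{
	(void)signo;
	stop_requested = 1;
}

/* Install on_stop_signal() for SIGINT and SIGTERM; 0 on success, -1 on error. */
static int install_stop_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_stop_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) == -1 ||
	    sigaction(SIGTERM, &sa, NULL) == -1)
		return -1;
	return 0;
}

/*
 * Hypothetical stand-in for a busy-looping starvation generator: spin
 * until a stop signal arrives. Any logging or cleanup belongs after
 * this returns, in normal (non-handler) context.
 */
static unsigned long spin_until_stopped(void)
{
	unsigned long iterations = 0;

	while (!stop_requested)
		iterations++;
	return iterations;
}
```

A caller would run install_stop_handler() once at startup and then
poll stop_requested (here via spin_until_stopped()) in its main
loop, so nothing unsafe ever runs inside the handler itself.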

As a result, approximately seven previously silent tests now
correctly validate daemon behavior, the suite runs more
deterministically using event-driven waits rather than fixed
sleeps, and the BPF backend no longer scans a per-CPU array on
every wakeup. The series yields a net reduction of roughly 9,850
lines (+874/-10,725).
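
The idea behind the (cpu, pid) keyed lookup can be sketched in
plain userspace C. This is not the BPF code from the patch: in a
BPF program the table would typically be a BPF_MAP_TYPE_HASH map
with a struct key, and struct task_key, task_update(),
task_lookup() and TABLE_SIZE are hypothetical. A lookup costs an
expected constant number of probes instead of a scan over every
task queued on the CPU. Deletion (which needs tombstones or
backward shifting under linear probing) is omitted for brevity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical composite key: one entry per (cpu, pid) pair. */
struct task_key {
	uint32_t cpu;
	uint32_t pid;
};

/* Hypothetical per-task payload, e.g. when the task was last woken. */
struct task_info {
	uint64_t wakeup_ns;
};

struct slot {
	bool used;
	struct task_key key;
	struct task_info info;
};

#define TABLE_SIZE 1024
static_assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0,
	      "TABLE_SIZE must be a power of two");

static struct slot table[TABLE_SIZE];	/* zero-initialized: all slots unused */

/* Pack (cpu, pid) into 64 bits and scramble with the splitmix64 finalizer. */
static uint64_t key_hash(struct task_key k)
{
	uint64_t x = ((uint64_t)k.cpu << 32) | k.pid;

	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

static bool key_eq(struct task_key a, struct task_key b)
{
	return a.cpu == b.cpu && a.pid == b.pid;
}

/*
 * Linear probing: return the slot holding `k`, or the first free slot
 * where it would go. Expected O(1) while the table is not nearly full;
 * NULL only if the table is full and `k` is absent.
 */
static struct slot *find_slot(struct task_key k)
{
	size_t i = (size_t)(key_hash(k) & (TABLE_SIZE - 1));

	for (size_t n = 0; n < TABLE_SIZE; n++) {
		if (!table[i].used || key_eq(table[i].key, k))
			return &table[i];
		i = (i + 1) & (TABLE_SIZE - 1);
	}
	return NULL;
}

/* Insert or update the entry for (cpu, pid); false if the table is full. */
static bool task_update(uint32_t cpu, uint32_t pid, uint64_t wakeup_ns)
{
	struct task_key k = { .cpu = cpu, .pid = pid };
	struct slot *s = find_slot(k);

	if (s == NULL)
		return false;
	s->used = true;
	s->key = k;
	s->info.wakeup_ns = wakeup_ns;
	return true;
}

/* Look up (cpu, pid) by hash instead of scanning the CPU's task list. */
static const struct task_info *task_lookup(uint32_t cpu, uint32_t pid)
{
	struct task_key k = { .cpu = cpu, .pid = pid };
	struct slot *s = find_slot(k);

	return (s != NULL && s->used) ? &s->info : NULL;
}
```

Because the CPU is part of the key, the same PID seen on two CPUs
occupies two independent entries, mirroring the per-CPU view the
old array provided.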

Wander Lairson Costa (33):
  stalld: Reject --force_fifo in single-threaded mode
  tests: Introduce test_section() helper
  tests: Introduce cleanup_scenario() helper
  tests: Introduce starvation and boost asserts
  tests: Introduce find_starved_child() helper
  tests: Fix task exit timing in test_boost_restoration
  tests: Consolidate and adopt init_functional_test()
  tests: Introduce assert_stalld_rejects() helper
  tests: Fix boost verification in runtime and duration tests
  tests: Fix subshell swallowing test results
  tests: Fix repeated log match finding same line
  chore: Remove legacy test infrastructure and stale docs
  tests: Add assertions to SCHED_OTHER restoration test
  tests: Fix CPU selection grep substring matches
  tests: Add idle CPU skipping assertion
  tests: Remove redundant pkill from cleanup
  tests: Introduce and adopt assert_log_contains() helper
  tests: Remove weak, redundant, and assertion-free test blocks
  tests: Introduce and adopt assert_success() helper
  tests: Replace wait conditionals with asserts
  tests: Remove if-wrappers around assert calls
  tests: Abort immediately on test failure
  tests: Remove dead code after making fail() fatal
  tests: Introduce and adopt process helpers
  tests: Extract wait_for_process_exit helper
  tests: Reduce default wait timeouts
  tests: Reduce starvation_gen durations
  tests: Replace init sleeps in test_affinity
  tests: Drop redundant sleeps in test_pidfile
  tests: Remove redundant sleeps after start_stalld
  tests: Reduce timing and replace sleeps with event waits
  tests: Fix async-signal-unsafe handler
  bpf: Replace linear task scan with hash map

 .claude/CLAUDE.md                             |  585 ---------
 .claude/agents/agent-prompt-engineer.md       |  135 --
 .claude/agents/c-expert.md                    |   53 -
 .claude/agents/code-reviewer.md               |  104 --
 .claude/agents/get-agent-hash                 |   99 --
 .claude/agents/git-scm-master.md              | 1154 -----------------
 .claude/agents/kernel-hacker.md               |  231 ----
 .claude/agents/plan-validator.md              |  130 --
 .claude/agents/project-historian.md           |  285 ----
 .claude/agents/project-librarian.md           |  604 ---------
 .claude/agents/project-manager.md             |  388 ------
 .claude/agents/project-scope-guardian.md      |  326 -----
 .claude/agents/python-expert.md               |   57 -
 .claude/agents/test-specialist.md             |  656 ----------
 .claude/agents/update-agent-hashes            |   96 --
 .claude/context-snapshot.json                 |  103 --
 .claude/rules                                 |   42 -
 .gitignore                                    |    5 +-
 bpf/stalld.bpf.c                              |  206 ++-
 src/queue_track.c                             |   95 +-
 src/queue_track.h                             |  107 +-
 src/utils.c                                   |    9 +-
 tests/BACKEND_USAGE.md                        |  269 ----
 tests/CONTEXT_SNAPSHOT_2025-10-31.md          |  231 ----
 tests/Makefile                                |   29 +-
 tests/README.md                               |   29 +-
 tests/TODO.md                                 |  727 -----------
 tests/functional/test_affinity.sh             |  178 +--
 tests/functional/test_backend_selection.sh    |   24 +-
 tests/functional/test_boost_duration.sh       |  164 +--
 tests/functional/test_boost_period.sh         |  151 +--
 tests/functional/test_boost_restoration.sh    |  310 +----
 tests/functional/test_boost_runtime.sh        |  179 +--
 tests/functional/test_cpu_selection.sh        |   76 +-
 tests/functional/test_deadline_boosting.sh    |  266 +---
 tests/functional/test_fifo_boosting.sh        |  269 +---
 .../test_fifo_priority_starvation.sh          |  197 +--
 tests/functional/test_force_fifo.sh           |  196 +--
 tests/functional/test_foreground.sh           |   65 +-
 tests/functional/test_idle_detection.sh       |  195 +--
 tests/functional/test_log_only.sh             |   30 +-
 tests/functional/test_logging_destinations.sh |   38 +-
 tests/functional/test_pidfile.sh              |  166 +--
 tests/functional/test_runqueue_parsing.sh     |  417 ------
 tests/functional/test_starvation_detection.sh |  281 +---
 tests/functional/test_starvation_threshold.sh |  138 +-
 tests/functional/test_task_merging.sh         |  161 +--
 tests/helpers/starvation_gen.c                |    2 +-
 tests/helpers/test_helpers.sh                 |  356 +++--
 tests/legacy/README.md                        |  169 ---
 tests/legacy/test01.c                         |  506 --------
 tests/legacy/test01_wrapper.sh                |  161 ---
 tests/run_tests.sh                            |  149 +--
 53 files changed, 874 insertions(+), 10725 deletions(-)
 delete mode 100644 .claude/CLAUDE.md
 delete mode 100644 .claude/agents/agent-prompt-engineer.md
 delete mode 100644 .claude/agents/c-expert.md
 delete mode 100644 .claude/agents/code-reviewer.md
 delete mode 100755 .claude/agents/get-agent-hash
 delete mode 100644 .claude/agents/git-scm-master.md
 delete mode 100644 .claude/agents/kernel-hacker.md
 delete mode 100644 .claude/agents/plan-validator.md
 delete mode 100644 .claude/agents/project-historian.md
 delete mode 100644 .claude/agents/project-librarian.md
 delete mode 100644 .claude/agents/project-manager.md
 delete mode 100644 .claude/agents/project-scope-guardian.md
 delete mode 100644 .claude/agents/python-expert.md
 delete mode 100644 .claude/agents/test-specialist.md
 delete mode 100755 .claude/agents/update-agent-hashes
 delete mode 100644 .claude/context-snapshot.json
 delete mode 100644 .claude/rules
 delete mode 100644 tests/BACKEND_USAGE.md
 delete mode 100644 tests/CONTEXT_SNAPSHOT_2025-10-31.md
 delete mode 100644 tests/TODO.md
 delete mode 100755 tests/functional/test_runqueue_parsing.sh
 delete mode 100644 tests/legacy/README.md
 delete mode 100644 tests/legacy/test01.c
 delete mode 100755 tests/legacy/test01_wrapper.sh

-- 
2.54.0



Thread overview: 34+ messages
2026-05-20 14:00 Wander Lairson Costa [this message]
2026-05-20 14:00 ` [[PATCH stalld] 01/33] stalld: Reject --force_fifo in single-threaded mode Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 02/33] tests: Introduce test_section() helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 03/33] tests: Introduce cleanup_scenario() helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 04/33] tests: Introduce starvation and boost asserts Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 05/33] tests: Introduce find_starved_child() helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 06/33] tests: Fix task exit timing in test_boost_restoration Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 07/33] tests: Consolidate and adopt init_functional_test() Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 08/33] tests: Introduce assert_stalld_rejects() helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 09/33] tests: Fix boost verification in runtime and duration tests Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 10/33] tests: Fix subshell swallowing test results Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 11/33] tests: Fix repeated log match finding same line Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 12/33] chore: Remove legacy test infrastructure and stale docs Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 13/33] tests: Add assertions to SCHED_OTHER restoration test Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 14/33] tests: Fix CPU selection grep substring matches Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 15/33] tests: Add idle CPU skipping assertion Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 16/33] tests: Remove redundant pkill from cleanup Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 17/33] tests: Introduce and adopt assert_log_contains() helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 18/33] tests: Remove weak, redundant, and assertion-free test blocks Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 19/33] tests: Introduce and adopt assert_success() helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 20/33] tests: Replace wait conditionals with asserts Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 21/33] tests: Remove if-wrappers around assert calls Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 22/33] tests: Abort immediately on test failure Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 23/33] tests: Remove dead code after making fail() fatal Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 24/33] tests: Introduce and adopt process helpers Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 25/33] tests: Extract wait_for_process_exit helper Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 26/33] tests: Reduce default wait timeouts Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 27/33] tests: Reduce starvation_gen durations Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 28/33] tests: Replace init sleeps in test_affinity Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 29/33] tests: Drop redundant sleeps in test_pidfile Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 30/33] tests: Remove redundant sleeps after start_stalld Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 31/33] tests: Reduce timing and replace sleeps with event waits Wander Lairson Costa
2026-05-20 14:00 ` [[PATCH stalld] 32/33] tests: Fix async-signal-unsafe handler Wander Lairson Costa
2026-05-20 14:01 ` [[PATCH stalld] 33/33] bpf: Replace linear task scan with hash map Wander Lairson Costa
