linux-rt-users.vger.kernel.org archive mirror
* [PATCH 00/12] stalld: sched_debug parser refactoring and stability fixes
@ 2025-10-17  3:15 Clark Williams
From: Clark Williams @ 2025-10-17  3:15 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Clark Williams, wander, debarbos, marco.chiappero, chris.friesen,
	luochunsheng

This patch series refactors the sched_debug backend parser to support
multiple kernel versions, fixes critical crashes, and enhances stalld's
robustness and testing capabilities. Note that if you aren't interested
in playing with patches, you can pull the 'parser' branch from:

   https://github.com/clrkwllms/stalld

I'd appreciate some review. 

The series is organized into three logical groups:

## sched_debug Parser Refactoring (Patches 1-3)

The first group modernizes the sched_debug task parser to handle format
variations across kernel versions from 3.x through 6.12+:

  - Patch 1 unifies the OLD/NEW parsing code paths into a single
    word-based parser with detected field offsets, eliminating code
    duplication and improving maintainability.

  - Patch 2 fixes critical parsing logic errors including state
    filtering for NEW_TASK_FORMAT ('>R', 'R', 'X' states), corrects
    the skip2word() function to properly position at word boundaries,
    and fixes buffer allocation to include null terminators.

  - Patch 3 resolves a double-free crash in fill_waiting_task() by
    explicitly setting cpu_info->starving = NULL on early returns,
    preventing freed memory from being accessed on subsequent iterations.
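
As a rough illustration of the "word-based parser with detected field
offsets" idea from Patch 1, here is a minimal sketch. It is not the code
from the series: the struct, the function names, and the simplified
header and task lines are all assumptions made for the example. The
header line is scanned once to learn which word index holds each column,
and task lines are then read by word index instead of by fixed position.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: word index of each column of interest, or -1 if absent. */
struct field_offsets {
	int comm;
	int pid;
	int switches;
};

/* Return the next whitespace-delimited word at or after p, or NULL. */
static const char *next_word(const char *p, size_t *len)
{
	p += strspn(p, " \t\n");
	*len = strcspn(p, " \t\n");
	return *len ? p : NULL;
}

/* Compare a (not NUL-terminated) word of length len against name. */
static int word_is(const char *w, size_t len, const char *name)
{
	return len == strlen(name) && !strncmp(w, name, len);
}

/* Learn column positions from the header. Returns 0 on success, -1 if any is missing. */
static int detect_offsets(const char *header, struct field_offsets *off)
{
	const char *w;
	size_t len;
	int idx = 0;

	off->comm = off->pid = off->switches = -1;

	for (w = next_word(header, &len); w; w = next_word(w + len, &len), idx++) {
		if (word_is(w, len, "task"))
			off->comm = idx;
		else if (word_is(w, len, "PID"))
			off->pid = idx;
		else if (word_is(w, len, "switches"))
			off->switches = idx;
	}

	return (off->comm < 0 || off->pid < 0 || off->switches < 0) ? -1 : 0;
}

/* Pull the wanted fields out of one task line using the detected offsets. */
static int parse_task_line(const char *line, const struct field_offsets *off,
			   char *comm, size_t comm_len, long *pid, long *switches)
{
	const char *w;
	size_t len;
	int idx = 0, found = 0;

	for (w = next_word(line, &len); w; w = next_word(w + len, &len), idx++) {
		if (idx == off->comm) {
			snprintf(comm, comm_len, "%.*s", (int)len, w);
			found++;
		} else if (idx == off->pid) {
			*pid = strtol(w, NULL, 10);	/* stops at the next blank */
			found++;
		} else if (idx == off->switches) {
			*switches = strtol(w, NULL, 10);
			found++;
		}
	}

	return found == 3 ? 0 : -1;
}
```

With a simplified header such as "S task PID tree-key switches prio",
detect_offsets() would record indices 1, 2, and 4, and a line like
">R stalld 1234 100.5 42 120" would then yield comm "stalld", pid 1234,
and 42 switches. Real sched_debug output differs between kernel
versions in ways this sketch does not model.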

## stalld Core Improvements (Patches 4-9)

The second group enhances stalld's logging, testing capabilities, and
runtime robustness:

  - Patches 4-6 improve logging quality: removing noisy idle reports,
    initializing idle_time to -1 to force initial queue checks, and
    removing misleading DL-server messages.

  - Patch 7 adds missing starvation logging in single-threaded log-only
    mode, ensuring consistent behavior across threading modes and fixing
    test_log_only.sh failures.

  - Patch 8 introduces the -N/--no_idle_detect flag to disable the idle
    CPU detection optimization, which is critical for controlled test
    environments.

  - Patch 9 adds defensive NULL checks and bounds validation in
    print_boosted_info() to prevent segfaults during task boosting.
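
The kind of defensive checking described for Patch 9 could look roughly
like the sketch below. The struct layouts, the field names, and the
format_boosted() helper are hypothetical and stand in for whatever
print_boosted_info() actually touches; only the pattern (validate
pointers and indices before dereferencing) is the point.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for stalld's per-task and per-CPU data. */
struct task_info {
	int pid;
	char comm[16];		/* may not be NUL-terminated */
};

struct cpu_info {
	int nr_waiting_tasks;
	struct task_info *starving;	/* NULL when nothing is tracked */
};

/*
 * Format a one-line description of task idx on this CPU into out.
 * Returns 0 on success, -1 if any input fails validation, so the
 * caller can skip the report instead of crashing.
 */
static int format_boosted(const struct cpu_info *cpu, int idx,
			  char *out, size_t out_len)
{
	const struct task_info *task;

	/* NULL checks: every pointer is validated before it is used */
	if (!cpu || !cpu->starving || !out || out_len == 0)
		return -1;

	/* bounds check: idx must refer to an existing entry */
	if (idx < 0 || idx >= cpu->nr_waiting_tasks)
		return -1;

	task = &cpu->starving[idx];

	/* "%.*s" caps the read at the size of comm even without a NUL */
	snprintf(out, out_len, "boosted pid %d (%.*s)",
		 task->pid, (int)sizeof(task->comm), task->comm);
	return 0;
}
```

A caller would treat a -1 return as "nothing to report" and move on to
the next CPU.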

## Build System and Final Fixes (Patches 10-12)

The final group adds legacy kernel support and resolves a remaining
crash in adaptive/aggressive modes:

  - Patch 10 adds comprehensive legacy 3.x kernel build support,
    detecting kernel version and adjusting compilation flags, disabling
    BPF/CET on legacy systems, and defaulting to sched_debug backend.

  - Patch 11 fixes a bash syntax error in run-local.sh for
    compatibility with bash 4.2.x on legacy systems.

  - Patch 12 fixes a critical segfault in adaptive/aggressive modes by
    guarding update_cpu_starving_vector() calls with config_single_threaded
    checks, since cpu_starving_vector is only allocated in single-threaded
    mode.
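
The guard described for Patch 12 can be pictured with the sketch below.
The names config_single_threaded, cpu_starving_vector, and
update_cpu_starving_vector() come from the description above, but their
types and signatures here are assumptions, and init_starving_vector()
and note_starving() are hypothetical helpers added for illustration.

```c
#include <stdlib.h>

/* Assumed types; only the names are taken from the cover letter. */
static int config_single_threaded;
static unsigned char *cpu_starving_vector;	/* allocated only in single-threaded mode */
static int nr_cpus;

/* Hypothetical setup: the vector exists only when single-threaded. */
static int init_starving_vector(int cpus, int single_threaded)
{
	free(cpu_starving_vector);
	cpu_starving_vector = NULL;
	nr_cpus = cpus;
	config_single_threaded = single_threaded;

	if (!single_threaded)
		return 0;

	cpu_starving_vector = calloc((size_t)cpus, sizeof(*cpu_starving_vector));
	return cpu_starving_vector ? 0 : -1;
}

/* Writes through the vector unconditionally: unsafe if it was never allocated. */
static void update_cpu_starving_vector(int cpu, int starving)
{
	cpu_starving_vector[cpu] = (unsigned char)(starving != 0);
}

/*
 * Guarded call site. Returns 1 if the vector was updated, 0 if the
 * update was skipped because the mode does not use the vector, and
 * -1 if cpu is out of range.
 */
static int note_starving(int cpu, int starving)
{
	if (!config_single_threaded)
		return 0;	/* adaptive/aggressive: no vector, nothing to do */

	if (cpu < 0 || cpu >= nr_cpus)
		return -1;

	update_cpu_starving_vector(cpu, starving);
	return 1;
}
```

Without the config_single_threaded check, the adaptive and aggressive
paths would write through a NULL cpu_starving_vector, which matches the
segfault described above.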

## Testing

This series has been tested across multiple kernel versions and
threading modes:
  - Legacy 3.x kernels with OLD_TASK_FORMAT
  - Modern 4.18+ kernels with NEW_TASK_FORMAT
  - 6.12+ kernels with EEVDF field additions
  - Single-threaded, adaptive, and aggressive modes
  - Log-only mode with starvation detection
  - Idle detection disabled (-N/--no_idle_detect) for testing

Clark Williams (10):
  sched_debug: Unify parsing methods for task_info
  sched_debug: Fix runqueue task parsing logic and state filtering
  sched_debug: Fix double-free crash in fill_waiting_task()
  stalld.c: remove noisy idle report and added report to
    should_skip_idle_cpus()
  stalld.c: initialize cpu_info->idle_time to be -1
  stalld.c: get rid of misleading print about DL-Server
  stalld.c: Add starvation logging in single-threaded log-only mode
  stalld: Add -N/--no_idle_detect flag to disable idle detection
  stalld: Add defensive checks in print_boosted_info
  Fix segfault in adaptive/aggressive modes

Derek Barbosa (2):
  Makefile: Add support for legacy kernels
  scripts: fix run-local if bashism

 Makefile             |  40 +++-
 scripts/run-local.sh |   2 +-
 src/sched_debug.c    | 437 ++++++++++++++++++++++---------------------
 src/sched_debug.h    |  65 ++++++-
 src/stalld.c         |  59 ++++--
 src/utils.c          |   8 +-
 6 files changed, 377 insertions(+), 234 deletions(-)
