public inbox for linux-kernel@vger.kernel.org
* [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics
@ 2026-04-29  8:23 Changwoo Min
From: Changwoo Min @ 2026-04-29  8:23 UTC (permalink / raw)
  To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel

When sched_ext is disabled by an error, the per-CPU state dump in the
exit info can get truncated on systems with many CPUs. If the CPU that
triggered the exit happens to be in the middle or end of the CPU list,
its state may never appear in the output, making it difficult to
diagnose the failure.

This series addresses that by always dumping the exit CPU first and
surfacing the same CPU id to BPF schedulers and userspace tools.

Patch 1 is a preparatory refactor that extracts the per-CPU dump logic
into a scx_dump_cpu() helper.

Patch 2 adds an exit_cpu field to scx_exit_info and threads it through
the exit path. The scx_exit() wrapper is reworked into a macro that
captures the calling CPU automatically for all error paths, while the
watchdog stall site records cpu_of(rq) explicitly. scx_dump_state()
reports the exit CPU in the dump header and dumps that CPU's state
before the rest of the per-CPU loop so it survives any output
truncation.
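
As a rough illustration of the idea in patch 2, here is a minimal
userspace sketch, not the kernel code. Every name in it (exit_info,
record_exit_here(), fake_current_cpu, dump_cpu(), dump_state(),
NR_CPUS_SKETCH) is hypothetical and only mimics the shape described
above: a wrapper macro captures the calling CPU, and the dump emits the
exit CPU before the rest of the loop so a small buffer still contains it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t s32;

#define NR_CPUS_SKETCH 8	/* hypothetical CPU count for this sketch */

struct exit_info {
	s32 exit_cpu;		/* -1 when the dump is not tied to an exit */
};

/* Stand-in for "the CPU running this code"; not a kernel API. */
static s32 fake_current_cpu;

static void record_exit(struct exit_info *ei, s32 cpu)
{
	ei->exit_cpu = cpu;
}

/* Wrapper macro: callers get the calling CPU recorded automatically. */
#define record_exit_here(ei)	record_exit((ei), fake_current_cpu)

/* Append one CPU's line. Returns 0, dropping the partial line, if full. */
static int dump_cpu(char *buf, size_t size, size_t *len, s32 cpu)
{
	int n = snprintf(buf + *len, size - *len, "CPU %d\n", (int)cpu);

	if (n < 0 || (size_t)n >= size - *len) {
		buf[*len] = '\0';
		return 0;
	}
	*len += (size_t)n;
	return 1;
}

/* Dump the exit CPU first, then every other CPU until the buffer fills. */
static void dump_state(const struct exit_info *ei, char *buf, size_t size)
{
	size_t len = 0;
	s32 cpu;

	assert(size > 0);
	buf[0] = '\0';

	if (ei->exit_cpu >= 0 && !dump_cpu(buf, size, &len, ei->exit_cpu))
		return;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (cpu == ei->exit_cpu)
			continue;	/* already emitted at the top */
		if (!dump_cpu(buf, size, &len, cpu))
			return;
	}
}
```

With a 16-byte buffer and the exit recorded on CPU 6, the sketch keeps
the CPU 6 line and truncates the later CPUs instead, which is the
property the reordering is meant to provide.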

Patch 3 propagates exit_cpu to struct user_exit_info, the BPF /
userspace shared exit record. UEI_RECORD() defaults the field to -1
before its CO-RE-gated copy so older kernels remain distinguishable
from "exit happened on CPU 0", and UEI_REPORT() appends "on CPU N" to
the EXIT line so scheduler authors see the most diagnostically useful
piece of exit info without cracking open the debug dump.
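
The default-then-copy pattern in patch 3 can be sketched in plain C as
below. This is an assumption-laden illustration rather than the actual
UEI_RECORD() / UEI_REPORT() macros: the struct names, record_exit(),
format_exit_line(), the has_exit_cpu flag (standing in for the
CO-RE field-existence check) and the exact wording of the EXIT line are
all hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

typedef int32_t s32;

/* Hypothetical stand-ins for the kernel-side and shared exit records. */
struct kernel_exit_info {
	int kind;
	s32 exit_cpu;
};

struct user_exit_record {
	int kind;
	s32 exit_cpu;
};

/*
 * Default exit_cpu to -1, then copy it only when the running kernel
 * provides the field. has_exit_cpu models the CO-RE gate.
 */
static void record_exit(struct user_exit_record *uei,
			const struct kernel_exit_info *ei, int has_exit_cpu)
{
	uei->kind = ei->kind;
	uei->exit_cpu = -1;
	if (has_exit_cpu)
		uei->exit_cpu = ei->exit_cpu;
}

/* Append "on CPU N" only when a CPU was actually recorded. */
static int format_exit_line(char *buf, size_t size,
			    const struct user_exit_record *uei,
			    const char *reason)
{
	if (uei->exit_cpu >= 0)
		return snprintf(buf, size, "EXIT: %s on CPU %d",
				reason, (int)uei->exit_cpu);
	return snprintf(buf, size, "EXIT: %s", reason);
}
```

Because the field starts at -1, a record taken on an older kernel
formats without a CPU suffix, while a genuine exit on CPU 0 still reads
"on CPU 0".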

Changes since v2:
- Use s32 (instead of int) for the new exit_cpu field and the
  __scx_exit() / scx_vexit() parameter, matching the convention for
  CPU ids in sched_ext.
- v2: https://lore.kernel.org/sched-ext/20260429060726.359024-1-changwoo@igalia.com/

Changes since v1:
- Generalized "stall CPU" to "exit CPU"; the scx_exit_info field is
  now exit_cpu and is populated for any path through scx_exit() /
  __scx_exit() / scx_vexit(), not just the watchdog stall path.
- Added patch 3 to expose exit_cpu via struct user_exit_info.
- SysRq-D initializes exit_cpu to -1 so debug dumps not tied to an
  exit don't arbitrarily promote CPU 0.
- Dump header now reports "on cpu N" alongside the exit kind.
- v1: https://lore.kernel.org/sched-ext/20260408031113.76005-1-changwoo@igalia.com/

Changwoo Min (3):
  sched_ext: Extract scx_dump_cpu() from scx_dump_state()
  sched_ext: Dump the exit CPU first
  sched_ext: Expose exit_cpu to BPF and userspace

 kernel/sched/ext.c                            | 221 ++++++++++--------
 kernel/sched/ext_internal.h                   |   6 +
 .../include/scx/user_exit_info.bpf.h          |   3 +
 tools/sched_ext/include/scx/user_exit_info.h  |   2 +
 .../include/scx/user_exit_info_common.h       |   5 +
 5 files changed, 142 insertions(+), 95 deletions(-)

-- 
2.54.0


Thread overview: 8+ messages
2026-04-29  8:23 [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Changwoo Min
2026-04-29  8:23 ` [PATCH v3 1/3] sched_ext: Extract scx_dump_cpu() from scx_dump_state() Changwoo Min
2026-04-29  8:23 ` [PATCH v3 2/3] sched_ext: Dump the exit CPU first Changwoo Min
2026-04-29  8:23 ` [PATCH v3 3/3] sched_ext: Expose exit_cpu to BPF and userspace Changwoo Min
2026-04-29  8:57 ` [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Tejun Heo
2026-04-29 11:29   ` Cheng-Yang Chou
2026-04-29 12:51     ` Changwoo Min
2026-04-29 15:16     ` Tejun Heo
