* [PATCH v3 1/3] sched_ext: Extract scx_dump_cpu() from scx_dump_state()
2026-04-29 8:23 [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Changwoo Min
@ 2026-04-29 8:23 ` Changwoo Min
2026-04-29 8:23 ` [PATCH v3 2/3] sched_ext: Dump the exit CPU first Changwoo Min
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Changwoo Min @ 2026-04-29 8:23 UTC (permalink / raw)
To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel
Factor out the per-CPU state dump logic from the for_each_possible_cpu
loop in scx_dump_state() into a new scx_dump_cpu() helper to improve
readability. No functional change.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 171 +++++++++++++++++++++++----------------------
1 file changed, 89 insertions(+), 82 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index f7b1b16e81a5..025bd8c6f429 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6256,6 +6256,94 @@ static void scx_dump_task(struct scx_sched *sch, struct seq_buf *s, struct scx_d
}
}
+static void scx_dump_cpu(struct scx_sched *sch, struct seq_buf *s,
+ struct scx_dump_ctx *dctx, int cpu,
+ bool dump_all_tasks)
+{
+ struct rq *rq = cpu_rq(cpu);
+ struct rq_flags rf;
+ struct task_struct *p;
+ struct seq_buf ns;
+ size_t avail, used;
+ char *buf;
+ bool idle;
+
+ rq_lock_irqsave(rq, &rf);
+
+ idle = list_empty(&rq->scx.runnable_list) &&
+ rq->curr->sched_class == &idle_sched_class;
+
+ if (idle && !SCX_HAS_OP(sch, dump_cpu))
+ goto next;
+
+ /*
+ * We don't yet know whether ops.dump_cpu() will produce output
+ * and we may want to skip the default CPU dump if it doesn't.
+ * Use a nested seq_buf to generate the standard dump so that we
+ * can decide whether to commit later.
+ */
+ avail = seq_buf_get_buf(s, &buf);
+ seq_buf_init(&ns, buf, avail);
+
+ dump_newline(&ns);
+ dump_line(&ns, "CPU %-4d: nr_run=%u flags=0x%x cpu_rel=%d ops_qseq=%lu ksync=%lu",
+ cpu, rq->scx.nr_running, rq->scx.flags,
+ rq->scx.cpu_released, rq->scx.ops_qseq,
+ rq->scx.kick_sync);
+ dump_line(&ns, " curr=%s[%d] class=%ps",
+ rq->curr->comm, rq->curr->pid,
+ rq->curr->sched_class);
+ if (!cpumask_empty(rq->scx.cpus_to_kick))
+ dump_line(&ns, " cpus_to_kick : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_kick));
+ if (!cpumask_empty(rq->scx.cpus_to_kick_if_idle))
+ dump_line(&ns, " idle_to_kick : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_kick_if_idle));
+ if (!cpumask_empty(rq->scx.cpus_to_preempt))
+ dump_line(&ns, " cpus_to_preempt: %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_preempt));
+ if (!cpumask_empty(rq->scx.cpus_to_wait))
+ dump_line(&ns, " cpus_to_wait : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_wait));
+ if (!cpumask_empty(rq->scx.cpus_to_sync))
+ dump_line(&ns, " cpus_to_sync : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_sync));
+
+ used = seq_buf_used(&ns);
+ if (SCX_HAS_OP(sch, dump_cpu)) {
+ ops_dump_init(&ns, " ");
+ SCX_CALL_OP(sch, dump_cpu, rq, dctx, cpu, idle);
+ ops_dump_exit();
+ }
+
+ /*
+ * If idle && nothing generated by ops.dump_cpu(), there's
+ * nothing interesting. Skip.
+ */
+ if (idle && used == seq_buf_used(&ns))
+ goto next;
+
+ /*
+ * $s may already have overflowed when $ns was created. If so,
+ * calling commit on it will trigger BUG.
+ */
+ if (avail) {
+ seq_buf_commit(s, seq_buf_used(&ns));
+ if (seq_buf_has_overflowed(&ns))
+ seq_buf_set_overflow(s);
+ }
+
+ if (rq->curr->sched_class == &ext_sched_class &&
+ (dump_all_tasks || scx_task_on_sched(sch, rq->curr)))
+ scx_dump_task(sch, s, dctx, rq, rq->curr, '*');
+
+ list_for_each_entry(p, &rq->scx.runnable_list, scx.runnable_node)
+ if (dump_all_tasks || scx_task_on_sched(sch, p))
+ scx_dump_task(sch, s, dctx, rq, p, ' ');
+next:
+ rq_unlock_irqrestore(rq, &rf);
+}
+
/*
* Dump scheduler state. If @dump_all_tasks is true, dump all tasks regardless
* of which scheduler they belong to. If false, only dump tasks owned by @sch.
@@ -6276,7 +6364,6 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
};
struct seq_buf s;
struct scx_event_stats events;
- char *buf;
int cpu;
guard(raw_spinlock_irqsave)(&scx_dump_lock);
@@ -6316,87 +6403,7 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
dump_line(&s, "----------");
for_each_possible_cpu(cpu) {
- struct rq *rq = cpu_rq(cpu);
- struct rq_flags rf;
- struct task_struct *p;
- struct seq_buf ns;
- size_t avail, used;
- bool idle;
-
- rq_lock_irqsave(rq, &rf);
-
- idle = list_empty(&rq->scx.runnable_list) &&
- rq->curr->sched_class == &idle_sched_class;
-
- if (idle && !SCX_HAS_OP(sch, dump_cpu))
- goto next;
-
- /*
- * We don't yet know whether ops.dump_cpu() will produce output
- * and we may want to skip the default CPU dump if it doesn't.
- * Use a nested seq_buf to generate the standard dump so that we
- * can decide whether to commit later.
- */
- avail = seq_buf_get_buf(&s, &buf);
- seq_buf_init(&ns, buf, avail);
-
- dump_newline(&ns);
- dump_line(&ns, "CPU %-4d: nr_run=%u flags=0x%x cpu_rel=%d ops_qseq=%lu ksync=%lu",
- cpu, rq->scx.nr_running, rq->scx.flags,
- rq->scx.cpu_released, rq->scx.ops_qseq,
- rq->scx.kick_sync);
- dump_line(&ns, " curr=%s[%d] class=%ps",
- rq->curr->comm, rq->curr->pid,
- rq->curr->sched_class);
- if (!cpumask_empty(rq->scx.cpus_to_kick))
- dump_line(&ns, " cpus_to_kick : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_kick));
- if (!cpumask_empty(rq->scx.cpus_to_kick_if_idle))
- dump_line(&ns, " idle_to_kick : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_kick_if_idle));
- if (!cpumask_empty(rq->scx.cpus_to_preempt))
- dump_line(&ns, " cpus_to_preempt: %*pb",
- cpumask_pr_args(rq->scx.cpus_to_preempt));
- if (!cpumask_empty(rq->scx.cpus_to_wait))
- dump_line(&ns, " cpus_to_wait : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_wait));
- if (!cpumask_empty(rq->scx.cpus_to_sync))
- dump_line(&ns, " cpus_to_sync : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_sync));
-
- used = seq_buf_used(&ns);
- if (SCX_HAS_OP(sch, dump_cpu)) {
- ops_dump_init(&ns, " ");
- SCX_CALL_OP(sch, dump_cpu, rq, &dctx, cpu, idle);
- ops_dump_exit();
- }
-
- /*
- * If idle && nothing generated by ops.dump_cpu(), there's
- * nothing interesting. Skip.
- */
- if (idle && used == seq_buf_used(&ns))
- goto next;
-
- /*
- * $s may already have overflowed when $ns was created. If so,
- * calling commit on it will trigger BUG.
- */
- if (avail) {
- seq_buf_commit(&s, seq_buf_used(&ns));
- if (seq_buf_has_overflowed(&ns))
- seq_buf_set_overflow(&s);
- }
-
- if (rq->curr->sched_class == &ext_sched_class &&
- (dump_all_tasks || scx_task_on_sched(sch, rq->curr)))
- scx_dump_task(sch, &s, &dctx, rq, rq->curr, '*');
-
- list_for_each_entry(p, &rq->scx.runnable_list, scx.runnable_node)
- if (dump_all_tasks || scx_task_on_sched(sch, p))
- scx_dump_task(sch, &s, &dctx, rq, p, ' ');
- next:
- rq_unlock_irqrestore(rq, &rf);
+ scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
}
dump_newline(&s);
--
2.54.0
^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 2/3] sched_ext: Dump the exit CPU first
2026-04-29 8:23 [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Changwoo Min
2026-04-29 8:23 ` [PATCH v3 1/3] sched_ext: Extract scx_dump_cpu() from scx_dump_state() Changwoo Min
@ 2026-04-29 8:23 ` Changwoo Min
2026-04-29 8:23 ` [PATCH v3 3/3] sched_ext: Expose exit_cpu to BPF and userspace Changwoo Min
2026-04-29 8:57 ` [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Tejun Heo
3 siblings, 0 replies; 8+ messages in thread
From: Changwoo Min @ 2026-04-29 8:23 UTC (permalink / raw)
To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel
When sched_ext is disabled by an error, the CPU that triggered the exit
is the most relevant piece of information for diagnosing the problem.
However, if there are many CPUs, the dump can get truncated and that
CPU's information may not appear in the output.
Add an exit_cpu field to scx_exit_info and thread it through scx_vexit()
/ __scx_exit(). For the watchdog stall path, populate it from cpu_of(rq)
in check_rq_for_timeouts(). For all other exit paths, define a scx_exit()
macro that wraps __scx_exit() with raw_smp_processor_id(), so the CPU
that initiated the exit is captured automatically, with no call-site
changes needed.
In scx_dump_state(), report the exit CPU in the dump header ("on cpu N")
and dump that CPU first, skipping it in the per-CPU loop, so the most
relevant CPU is never truncated out of the dump. The SysRq-D path
initializes exit_cpu to -1 so debug dumps not tied to an exit don't
arbitrarily promote CPU 0.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 52 +++++++++++++++++++++++++++----------
kernel/sched/ext_internal.h | 6 +++++
2 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 025bd8c6f429..46c2e395de03 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -234,24 +234,29 @@ static bool task_dead_and_done(struct task_struct *p);
static void scx_kick_cpu(struct scx_sched *sch, s32 cpu, u64 flags);
static void scx_disable(struct scx_sched *sch, enum scx_exit_kind kind);
static bool scx_vexit(struct scx_sched *sch, enum scx_exit_kind kind,
- s64 exit_code, const char *fmt, va_list args);
+ s64 exit_code, s32 exit_cpu, const char *fmt,
+ va_list args);
-static __printf(4, 5) bool scx_exit(struct scx_sched *sch,
- enum scx_exit_kind kind, s64 exit_code,
- const char *fmt, ...)
+static __printf(5, 6) bool __scx_exit(struct scx_sched *sch,
+ enum scx_exit_kind kind, s64 exit_code,
+ s32 exit_cpu, const char *fmt, ...)
{
va_list args;
bool ret;
va_start(args, fmt);
- ret = scx_vexit(sch, kind, exit_code, fmt, args);
+ ret = scx_vexit(sch, kind, exit_code, exit_cpu, fmt, args);
va_end(args);
return ret;
}
+#define scx_exit(sch, kind, exit_code, fmt, args...) \
+ __scx_exit(sch, kind, exit_code, raw_smp_processor_id(), fmt, ##args)
+
#define scx_error(sch, fmt, args...) scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
-#define scx_verror(sch, fmt, args) scx_vexit((sch), SCX_EXIT_ERROR, 0, fmt, args)
+#define scx_verror(sch, fmt, args) \
+ scx_vexit((sch), SCX_EXIT_ERROR, 0, raw_smp_processor_id(), fmt, args)
#define SCX_HAS_OP(sch, op) test_bit(SCX_OP_IDX(op), (sch)->has_op)
@@ -3389,9 +3394,10 @@ static bool check_rq_for_timeouts(struct rq *rq)
last_runnable + READ_ONCE(sch->watchdog_timeout)))) {
u32 dur_ms = jiffies_to_msecs(jiffies - last_runnable);
- scx_exit(sch, SCX_EXIT_ERROR_STALL, 0,
- "%s[%d] failed to run for %u.%03us",
- p->comm, p->pid, dur_ms / 1000, dur_ms % 1000);
+ __scx_exit(sch, SCX_EXIT_ERROR_STALL, 0, cpu_of(rq),
+ "%s[%d] failed to run for %u.%03us",
+ p->comm, p->pid, dur_ms / 1000,
+ dur_ms % 1000);
timed_out = true;
break;
}
@@ -5528,6 +5534,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)
if (!ei)
return NULL;
+ ei->exit_cpu = -1;
ei->bt = kzalloc_objs(ei->bt[0], SCX_EXIT_BT_LEN);
ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);
@@ -6384,8 +6391,13 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
if (ei->kind == SCX_EXIT_NONE) {
dump_line(&s, "Debug dump triggered by %s", ei->reason);
} else {
- dump_line(&s, "%s[%d] triggered exit kind %d:",
- current->comm, current->pid, ei->kind);
+ if (ei->exit_cpu >= 0)
+ dump_line(&s, "%s[%d] triggered exit kind %d on cpu %d:",
+ current->comm, current->pid, ei->kind,
+ ei->exit_cpu);
+ else
+ dump_line(&s, "%s[%d] triggered exit kind %d:",
+ current->comm, current->pid, ei->kind);
dump_line(&s, " %s (%s)", ei->reason, ei->msg);
dump_newline(&s);
dump_line(&s, "Backtrace:");
@@ -6402,8 +6414,15 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
dump_line(&s, "CPU states");
dump_line(&s, "----------");
+ /*
+ * Dump the exit CPU first so it isn't lost to dump truncation, then
+ * walk the rest in order, skipping the one already dumped.
+ */
+ if (ei->exit_cpu >= 0)
+ scx_dump_cpu(sch, &s, &dctx, ei->exit_cpu, dump_all_tasks);
for_each_possible_cpu(cpu) {
- scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
+ if (cpu != ei->exit_cpu)
+ scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
}
dump_newline(&s);
@@ -6442,7 +6461,7 @@ static void scx_disable_irq_workfn(struct irq_work *irq_work)
}
static bool scx_vexit(struct scx_sched *sch,
- enum scx_exit_kind kind, s64 exit_code,
+ enum scx_exit_kind kind, s64 exit_code, s32 exit_cpu,
const char *fmt, va_list args)
{
struct scx_exit_info *ei = sch->exit_info;
@@ -6465,6 +6484,7 @@ static bool scx_vexit(struct scx_sched *sch,
*/
ei->kind = kind;
ei->reason = scx_exit_reason(ei->kind);
+ ei->exit_cpu = exit_cpu;
irq_work_queue(&sch->disable_irq_work);
return true;
@@ -7730,7 +7750,11 @@ static const struct sysrq_key_op sysrq_sched_ext_reset_op = {
static void sysrq_handle_sched_ext_dump(u8 key)
{
- struct scx_exit_info ei = { .kind = SCX_EXIT_NONE, .reason = "SysRq-D" };
+ struct scx_exit_info ei = {
+ .kind = SCX_EXIT_NONE,
+ .exit_cpu = -1,
+ .reason = "SysRq-D",
+ };
struct scx_sched *sch;
list_for_each_entry_rcu(sch, &scx_sched_all, all)
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index a54903bb74b3..54c6ed43b6c7 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -97,6 +97,12 @@ struct scx_exit_info {
/* %SCX_EXIT_* - broad category of the exit reason */
enum scx_exit_kind kind;
+ /*
+ * CPU that initiated the exit, valid once @kind has been set.
+ * Negative if the exit path didn't identify a CPU.
+ */
+ s32 exit_cpu;
+
/* exit code if gracefully exiting */
s64 exit_code;
--
2.54.0
^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 3/3] sched_ext: Expose exit_cpu to BPF and userspace
2026-04-29 8:23 [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Changwoo Min
2026-04-29 8:23 ` [PATCH v3 1/3] sched_ext: Extract scx_dump_cpu() from scx_dump_state() Changwoo Min
2026-04-29 8:23 ` [PATCH v3 2/3] sched_ext: Dump the exit CPU first Changwoo Min
@ 2026-04-29 8:23 ` Changwoo Min
2026-04-29 8:57 ` [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Tejun Heo
3 siblings, 0 replies; 8+ messages in thread
From: Changwoo Min @ 2026-04-29 8:23 UTC (permalink / raw)
To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel
Extend struct user_exit_info with an exit_cpu field so BPF schedulers
and the userspace report path can see the CPU that triggered the exit,
matching the kernel-side dump.
UEI_RECORD() defaults the field to -1 before the CO-RE-gated copy so
that running against an older kernel without exit_cpu stays
distinguishable from "exit happened on CPU 0".
UEI_REPORT() appends "on CPU N" to the EXIT line when the value is
valid, surfacing the most diagnostically useful piece of exit info to
any sched_ext userspace tool without needing to crack open the debug
dump.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
tools/sched_ext/include/scx/user_exit_info.bpf.h | 3 +++
tools/sched_ext/include/scx/user_exit_info.h | 2 ++
tools/sched_ext/include/scx/user_exit_info_common.h | 5 +++++
3 files changed, 10 insertions(+)
diff --git a/tools/sched_ext/include/scx/user_exit_info.bpf.h b/tools/sched_ext/include/scx/user_exit_info.bpf.h
index e7ac6611a990..98cab643c8d9 100644
--- a/tools/sched_ext/include/scx/user_exit_info.bpf.h
+++ b/tools/sched_ext/include/scx/user_exit_info.bpf.h
@@ -32,6 +32,9 @@
__uei_name##_dump_len, (__ei)->dump); \
if (bpf_core_field_exists((__ei)->exit_code)) \
__uei_name.exit_code = (__ei)->exit_code; \
+ __uei_name.exit_cpu = -1; \
+ if (bpf_core_field_exists((__ei)->exit_cpu)) \
+ __uei_name.exit_cpu = (__ei)->exit_cpu; \
/* use __sync to force memory barrier */ \
__sync_val_compare_and_swap(&__uei_name.kind, __uei_name.kind, \
(__ei)->kind); \
diff --git a/tools/sched_ext/include/scx/user_exit_info.h b/tools/sched_ext/include/scx/user_exit_info.h
index 399697fa372f..56a02b549aef 100644
--- a/tools/sched_ext/include/scx/user_exit_info.h
+++ b/tools/sched_ext/include/scx/user_exit_info.h
@@ -39,6 +39,8 @@
fprintf(stderr, "EXIT: %s", __uei->reason); \
if (__uei->msg[0] != '\0') \
fprintf(stderr, " (%s)", __uei->msg); \
+ if (__uei->exit_cpu >= 0) \
+ fprintf(stderr, " on CPU %d", __uei->exit_cpu); \
fputs("\n", stderr); \
__uei->exit_code; \
})
diff --git a/tools/sched_ext/include/scx/user_exit_info_common.h b/tools/sched_ext/include/scx/user_exit_info_common.h
index 2d0981aedd89..76e2a055eb4b 100644
--- a/tools/sched_ext/include/scx/user_exit_info_common.h
+++ b/tools/sched_ext/include/scx/user_exit_info_common.h
@@ -22,6 +22,11 @@ enum uei_sizes {
struct user_exit_info {
int kind;
+ /*
+ * CPU that triggered the exit, or -1 if unset (e.g. running on an
+ * older kernel that does not expose this field).
+ */
+ s32 exit_cpu;
s64 exit_code;
char reason[UEI_REASON_LEN];
char msg[UEI_MSG_LEN];
--
2.54.0
^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics
2026-04-29 8:23 [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Changwoo Min
` (2 preceding siblings ...)
2026-04-29 8:23 ` [PATCH v3 3/3] sched_ext: Expose exit_cpu to BPF and userspace Changwoo Min
@ 2026-04-29 8:57 ` Tejun Heo
2026-04-29 11:29 ` Cheng-Yang Chou
3 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2026-04-29 8:57 UTC (permalink / raw)
To: Changwoo Min; +Cc: void, arighi, kernel-dev, sched-ext, linux-kernel
Hello,
> Changwoo Min (3):
> sched_ext: Extract scx_dump_cpu() from scx_dump_state()
> sched_ext: Dump the exit CPU first
> sched_ext: Expose exit_cpu to BPF and userspace
Applied 1-3 to sched_ext/for-7.2, thank you.
A few things I noticed that might be worth a follow-up:
1. scx_rcu_cpu_stall() takes no cpu, so the captured exit_cpu ends
up being the detector rather than the stalled one. We could
probably plumb it through from print_other_cpu_stall(), where
the stalled cpu is known.
2. scx_hardlockup_irq_workfn() already has the hung cpu locally, so
passing it via __scx_exit() might be a bit more robust than
relying on irq_work routing.
3. Minor: "on cpu N" (kernel) vs "on CPU N" (UEI) - the casing
could probably match.
Thanks.
--
tejun
^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics
2026-04-29 8:57 ` [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics Tejun Heo
@ 2026-04-29 11:29 ` Cheng-Yang Chou
2026-04-29 12:51 ` Changwoo Min
2026-04-29 15:16 ` Tejun Heo
0 siblings, 2 replies; 8+ messages in thread
From: Cheng-Yang Chou @ 2026-04-29 11:29 UTC (permalink / raw)
To: Tejun Heo
Cc: Changwoo Min, void, arighi, kernel-dev, sched-ext, linux-kernel,
Ching-Chun Huang, Chia-Ping Tsai
Hi Tejun,
On Tue, Apr 28, 2026 at 10:57:27PM -1000, Tejun Heo wrote:
> A few things I noticed that might be worth a follow-up:
>
> 1. scx_rcu_cpu_stall() takes no cpu, so the captured exit_cpu ends
> up being the detector rather than the stalled one. We could
> probably plumb it through from print_other_cpu_stall(), where
> the stalled cpu is known.
Do you mean we should change the function signatures to pass the stalled
CPU through, e.g. panic_on_rcu_stall(int stalled_cpu) and
scx_rcu_cpu_stall(int stalled_cpu)?
>
> 2. scx_hardlockup_irq_workfn() already has the hung cpu locally, so
> passing it via __scx_exit() might be a bit more robust than
> relying on irq_work routing.
>
> 3. Minor: "on cpu N" (kernel) vs "on CPU N" (UEI) - the casing
> could probably match.
>
I have a draft patch and can send it out. If Changwoo or anyone else is
already working on this, pls let me know!
--
Cheers,
Cheng-Yang
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics
2026-04-29 11:29 ` Cheng-Yang Chou
@ 2026-04-29 12:51 ` Changwoo Min
2026-04-29 15:16 ` Tejun Heo
1 sibling, 0 replies; 8+ messages in thread
From: Changwoo Min @ 2026-04-29 12:51 UTC (permalink / raw)
To: Cheng-Yang Chou, Tejun Heo
Cc: void, arighi, kernel-dev, sched-ext, linux-kernel,
Ching-Chun Huang, Chia-Ping Tsai
Hi Cheng-Yang,
On 4/29/26 8:29 PM, Cheng-Yang Chou wrote:
>> 2. scx_hardlockup_irq_workfn() already has the hung cpu locally, so
>> passing it via __scx_exit() might be a bit more robust than
>> relying on irq_work routing.
>>
>> 3. Minor: "on cpu N" (kernel) vs "on CPU N" (UEI) - the casing
>> could probably match.
>>
> I have a draft patch and can send it out. If Changwoo or anyone else is
> already working on this, pls let me know!
Feel free to go ahead. Thanks!
Regards,
Changwoo Min
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 0/3] sched_ext: Improve exit-time diagnostics
2026-04-29 11:29 ` Cheng-Yang Chou
2026-04-29 12:51 ` Changwoo Min
@ 2026-04-29 15:16 ` Tejun Heo
1 sibling, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2026-04-29 15:16 UTC (permalink / raw)
To: Cheng-Yang Chou
Cc: Changwoo Min, void, arighi, kernel-dev, sched-ext, linux-kernel,
Ching-Chun Huang, Chia-Ping Tsai
On Wed, Apr 29, 2026 at 07:29:30PM +0800, Cheng-Yang Chou wrote:
> Hi Tejun,
>
> On Tue, Apr 28, 2026 at 10:57:27PM -1000, Tejun Heo wrote:
> > A few things I noticed that might be worth a follow-up:
> >
> > 1. scx_rcu_cpu_stall() takes no cpu, so the captured exit_cpu ends
> > up being the detector rather than the stalled one. We could
> > probably plumb it through from print_other_cpu_stall(), where
> > the stalled cpu is known.
>
> Do you mean we should change the function signatures to pass the stalled
> CPU through, e.g. panic_on_rcu_stall(int stalled_cpu) and
> scx_rcu_cpu_stall(int stalled_cpu)?
Yeah.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 8+ messages in thread