* [PATCH 0/2] sched_ext: Improve watchdog stall diagnostics
@ 2026-04-08 3:11 Changwoo Min
2026-04-08 3:11 ` [PATCH 1/2] sched_ext: Extract scx_dump_cpu() from scx_dump_state() Changwoo Min
2026-04-08 3:11 ` [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit Changwoo Min
From: Changwoo Min @ 2026-04-08 3:11 UTC
To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel
When a watchdog timeout fires on a system with many CPUs, the per-CPU
state dump in the exit info can get truncated. If the stall CPU happens
to be in the middle or end of the CPU list, its state may never appear
in the output, making it difficult to diagnose the hang.
This series addresses that by always dumping the stall CPU first.
Patch 1 is a preparatory refactor that extracts the per-CPU dump logic
into a scx_dump_cpu() helper, making patch 2 straightforward.
Patch 2 adds a stall_cpu field to scx_exit_info, threads it through
the exit path, and reorders the dump loop to emit the stall CPU before
all others.
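The resulting ordering is simple; a minimal userspace sketch (hypothetical names, not the kernel code) of what patch 2 does:

```c
#include <stddef.h>

#define NR_CPUS 4	/* stand-in for the real CPU count */

/*
 * Illustrative only: the stall CPU is emitted first, then the
 * remaining CPUs in order, skipping the one already dumped.
 */
static int dump_order(int stall_cpu, int *order)
{
	int n = 0, cpu;

	order[n++] = stall_cpu;			/* stall CPU always first */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != stall_cpu)		/* don't dump it twice */
			order[n++] = cpu;
	return n;
}
```

With stall_cpu == 2 this yields 2, 0, 1, 3, so the most relevant CPU survives even if the tail of the dump buffer is truncated.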
Changwoo Min (2):
sched_ext: Extract scx_dump_cpu() from scx_dump_state()
sched_ext: Dump the stall CPU first in watchdog exit
kernel/sched/ext.c | 202 +++++++++++++++++++-----------------
kernel/sched/ext_internal.h | 3 +
2 files changed, 112 insertions(+), 93 deletions(-)
--
2.53.0
* [PATCH 1/2] sched_ext: Extract scx_dump_cpu() from scx_dump_state()
2026-04-08 3:11 [PATCH 0/2] sched_ext: Improve watchdog stall diagnostics Changwoo Min
@ 2026-04-08 3:11 ` Changwoo Min
2026-04-08 3:11 ` [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit Changwoo Min
From: Changwoo Min @ 2026-04-08 3:11 UTC
To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel
Factor out the per-CPU state dump logic from the for_each_possible_cpu
loop in scx_dump_state() into a new scx_dump_cpu() helper to improve
readability. No functional change.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 173 +++++++++++++++++++++++----------------------
1 file changed, 90 insertions(+), 83 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index b757b853b42b..8f7d5c1556be 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6190,6 +6190,95 @@ static void scx_dump_task(struct scx_sched *sch,
}
}
+static void scx_dump_cpu(struct scx_sched *sch, struct seq_buf *s,
+ struct scx_dump_ctx *dctx, int cpu,
+ bool dump_all_tasks)
+{
+ struct rq *rq = cpu_rq(cpu);
+ struct rq_flags rf;
+ struct task_struct *p;
+ struct seq_buf ns;
+ size_t avail, used;
+ char *buf;
+ bool idle;
+
+ rq_lock_irqsave(rq, &rf);
+
+ idle = list_empty(&rq->scx.runnable_list) &&
+ rq->curr->sched_class == &idle_sched_class;
+
+ if (idle && !SCX_HAS_OP(sch, dump_cpu))
+ goto next;
+
+ /*
+ * We don't yet know whether ops.dump_cpu() will produce output
+ * and we may want to skip the default CPU dump if it doesn't.
+ * Use a nested seq_buf to generate the standard dump so that we
+ * can decide whether to commit later.
+ */
+ avail = seq_buf_get_buf(s, &buf);
+ seq_buf_init(&ns, buf, avail);
+
+ dump_newline(&ns);
+ dump_line(&ns, "CPU %-4d: nr_run=%u flags=0x%x cpu_rel=%d ops_qseq=%lu ksync=%lu",
+ cpu, rq->scx.nr_running, rq->scx.flags,
+ rq->scx.cpu_released, rq->scx.ops_qseq,
+ rq->scx.kick_sync);
+ dump_line(&ns, " curr=%s[%d] class=%ps",
+ rq->curr->comm, rq->curr->pid,
+ rq->curr->sched_class);
+ if (!cpumask_empty(rq->scx.cpus_to_kick))
+ dump_line(&ns, " cpus_to_kick : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_kick));
+ if (!cpumask_empty(rq->scx.cpus_to_kick_if_idle))
+ dump_line(&ns, " idle_to_kick : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_kick_if_idle));
+ if (!cpumask_empty(rq->scx.cpus_to_preempt))
+ dump_line(&ns, " cpus_to_preempt: %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_preempt));
+ if (!cpumask_empty(rq->scx.cpus_to_wait))
+ dump_line(&ns, " cpus_to_wait : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_wait));
+ if (!cpumask_empty(rq->scx.cpus_to_sync))
+ dump_line(&ns, " cpus_to_sync : %*pb",
+ cpumask_pr_args(rq->scx.cpus_to_sync));
+
+ used = seq_buf_used(&ns);
+ if (SCX_HAS_OP(sch, dump_cpu)) {
+ ops_dump_init(&ns, " ");
+ SCX_CALL_OP(sch, SCX_KF_REST, dump_cpu, NULL,
+ dctx, cpu, idle);
+ ops_dump_exit();
+ }
+
+ /*
+ * If idle && nothing generated by ops.dump_cpu(), there's
+ * nothing interesting. Skip.
+ */
+ if (idle && used == seq_buf_used(&ns))
+ goto next;
+
+ /*
+ * $s may already have overflowed when $ns was created. If so,
+ * calling commit on it will trigger BUG.
+ */
+ if (avail) {
+ seq_buf_commit(s, seq_buf_used(&ns));
+ if (seq_buf_has_overflowed(&ns))
+ seq_buf_set_overflow(s);
+ }
+
+ if (rq->curr->sched_class == &ext_sched_class &&
+ (dump_all_tasks || scx_task_on_sched(sch, rq->curr)))
+ scx_dump_task(sch, s, dctx, rq->curr, '*');
+
+ list_for_each_entry(p, &rq->scx.runnable_list, scx.runnable_node)
+ if (dump_all_tasks || scx_task_on_sched(sch, p))
+ scx_dump_task(sch, s, dctx, p, ' ');
+next:
+ rq_unlock_irqrestore(rq, &rf);
+}
+
/*
* Dump scheduler state. If @dump_all_tasks is true, dump all tasks regardless
* of which scheduler they belong to. If false, only dump tasks owned by @sch.
@@ -6210,7 +6299,6 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
};
struct seq_buf s;
struct scx_event_stats events;
- char *buf;
int cpu;
guard(raw_spinlock_irqsave)(&scx_dump_lock);
@@ -6250,88 +6338,7 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
dump_line(&s, "----------");
for_each_possible_cpu(cpu) {
- struct rq *rq = cpu_rq(cpu);
- struct rq_flags rf;
- struct task_struct *p;
- struct seq_buf ns;
- size_t avail, used;
- bool idle;
-
- rq_lock_irqsave(rq, &rf);
-
- idle = list_empty(&rq->scx.runnable_list) &&
- rq->curr->sched_class == &idle_sched_class;
-
- if (idle && !SCX_HAS_OP(sch, dump_cpu))
- goto next;
-
- /*
- * We don't yet know whether ops.dump_cpu() will produce output
- * and we may want to skip the default CPU dump if it doesn't.
- * Use a nested seq_buf to generate the standard dump so that we
- * can decide whether to commit later.
- */
- avail = seq_buf_get_buf(&s, &buf);
- seq_buf_init(&ns, buf, avail);
-
- dump_newline(&ns);
- dump_line(&ns, "CPU %-4d: nr_run=%u flags=0x%x cpu_rel=%d ops_qseq=%lu ksync=%lu",
- cpu, rq->scx.nr_running, rq->scx.flags,
- rq->scx.cpu_released, rq->scx.ops_qseq,
- rq->scx.kick_sync);
- dump_line(&ns, " curr=%s[%d] class=%ps",
- rq->curr->comm, rq->curr->pid,
- rq->curr->sched_class);
- if (!cpumask_empty(rq->scx.cpus_to_kick))
- dump_line(&ns, " cpus_to_kick : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_kick));
- if (!cpumask_empty(rq->scx.cpus_to_kick_if_idle))
- dump_line(&ns, " idle_to_kick : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_kick_if_idle));
- if (!cpumask_empty(rq->scx.cpus_to_preempt))
- dump_line(&ns, " cpus_to_preempt: %*pb",
- cpumask_pr_args(rq->scx.cpus_to_preempt));
- if (!cpumask_empty(rq->scx.cpus_to_wait))
- dump_line(&ns, " cpus_to_wait : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_wait));
- if (!cpumask_empty(rq->scx.cpus_to_sync))
- dump_line(&ns, " cpus_to_sync : %*pb",
- cpumask_pr_args(rq->scx.cpus_to_sync));
-
- used = seq_buf_used(&ns);
- if (SCX_HAS_OP(sch, dump_cpu)) {
- ops_dump_init(&ns, " ");
- SCX_CALL_OP(sch, SCX_KF_REST, dump_cpu, NULL,
- &dctx, cpu, idle);
- ops_dump_exit();
- }
-
- /*
- * If idle && nothing generated by ops.dump_cpu(), there's
- * nothing interesting. Skip.
- */
- if (idle && used == seq_buf_used(&ns))
- goto next;
-
- /*
- * $s may already have overflowed when $ns was created. If so,
- * calling commit on it will trigger BUG.
- */
- if (avail) {
- seq_buf_commit(&s, seq_buf_used(&ns));
- if (seq_buf_has_overflowed(&ns))
- seq_buf_set_overflow(&s);
- }
-
- if (rq->curr->sched_class == &ext_sched_class &&
- (dump_all_tasks || scx_task_on_sched(sch, rq->curr)))
- scx_dump_task(sch, &s, &dctx, rq->curr, '*');
-
- list_for_each_entry(p, &rq->scx.runnable_list, scx.runnable_node)
- if (dump_all_tasks || scx_task_on_sched(sch, p))
- scx_dump_task(sch, &s, &dctx, p, ' ');
- next:
- rq_unlock_irqrestore(rq, &rf);
+ scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
}
dump_newline(&s);
--
2.53.0
* [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit
2026-04-08 3:11 [PATCH 0/2] sched_ext: Improve watchdog stall diagnostics Changwoo Min
2026-04-08 3:11 ` [PATCH 1/2] sched_ext: Extract scx_dump_cpu() from scx_dump_state() Changwoo Min
@ 2026-04-08 3:11 ` Changwoo Min
2026-04-09 1:19 ` Tejun Heo
2026-04-09 5:44 ` Andrea Righi
From: Changwoo Min @ 2026-04-08 3:11 UTC
To: tj, void, arighi, changwoo; +Cc: kernel-dev, sched-ext, linux-kernel
When a watchdog timeout fires, the CPU where the stalled task was
running is the most relevant piece of information for diagnosing the
hang. However, if there are many CPUs, the dump can get truncated and
the stall CPU's information may not appear in the output.
Add a stall_cpu field to scx_exit_info, thread it through scx_vexit()
and __scx_exit(), and populate it from cpu_of(rq) in
check_rq_for_timeouts(). In scx_dump_state(), dump the stall CPU
before iterating the rest so it always appears at the top of the output.
Introduce a scx_exit() macro that wraps __scx_exit() with stall_cpu=0
for all non-stall exit paths, keeping call sites unchanged.
Signed-off-by: Changwoo Min <changwoo@igalia.com>
---
kernel/sched/ext.c | 31 ++++++++++++++++++++-----------
kernel/sched/ext_internal.h | 3 +++
2 files changed, 23 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8f7d5c1556be..671a1713aedb 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -200,24 +200,28 @@ static bool task_dead_and_done(struct task_struct *p);
static void scx_kick_cpu(struct scx_sched *sch, s32 cpu, u64 flags);
static void scx_disable(struct scx_sched *sch, enum scx_exit_kind kind);
static bool scx_vexit(struct scx_sched *sch, enum scx_exit_kind kind,
- s64 exit_code, const char *fmt, va_list args);
+ s64 exit_code, int stall_cpu, const char *fmt,
+ va_list args);
-static __printf(4, 5) bool scx_exit(struct scx_sched *sch,
- enum scx_exit_kind kind, s64 exit_code,
- const char *fmt, ...)
+static __printf(5, 6) bool __scx_exit(struct scx_sched *sch,
+ enum scx_exit_kind kind, s64 exit_code,
+ int stall_cpu, const char *fmt, ...)
{
va_list args;
bool ret;
va_start(args, fmt);
- ret = scx_vexit(sch, kind, exit_code, fmt, args);
+ ret = scx_vexit(sch, kind, exit_code, stall_cpu, fmt, args);
va_end(args);
return ret;
}
+#define scx_exit(sch, kind, exit_code, fmt, args...) \
+ __scx_exit(sch, kind, exit_code, 0, fmt, ##args)
+
#define scx_error(sch, fmt, args...) scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
-#define scx_verror(sch, fmt, args) scx_vexit((sch), SCX_EXIT_ERROR, 0, fmt, args)
+#define scx_verror(sch, fmt, args) scx_vexit((sch), SCX_EXIT_ERROR, 0, 0, fmt, args)
#define SCX_HAS_OP(sch, op) test_bit(SCX_OP_IDX(op), (sch)->has_op)
@@ -3433,9 +3437,10 @@ static bool check_rq_for_timeouts(struct rq *rq)
last_runnable + READ_ONCE(sch->watchdog_timeout)))) {
u32 dur_ms = jiffies_to_msecs(jiffies - last_runnable);
- scx_exit(sch, SCX_EXIT_ERROR_STALL, 0,
- "%s[%d] failed to run for %u.%03us",
- p->comm, p->pid, dur_ms / 1000, dur_ms % 1000);
+ __scx_exit(sch, SCX_EXIT_ERROR_STALL, 0, cpu_of(rq),
+ "%s[%d] failed to run for %u.%03us",
+ p->comm, p->pid, dur_ms / 1000,
+ dur_ms % 1000);
timed_out = true;
break;
}
@@ -6337,8 +6342,11 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
dump_line(&s, "CPU states");
dump_line(&s, "----------");
+ /* Dump the stall CPU first, then dump the rest in order. */
+ scx_dump_cpu(sch, &s, &dctx, ei->stall_cpu, dump_all_tasks);
for_each_possible_cpu(cpu) {
- scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
+ if (cpu != ei->stall_cpu)
+ scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
}
dump_newline(&s);
@@ -6377,7 +6385,7 @@ static void scx_disable_irq_workfn(struct irq_work *irq_work)
}
static bool scx_vexit(struct scx_sched *sch,
- enum scx_exit_kind kind, s64 exit_code,
+ enum scx_exit_kind kind, s64 exit_code, int stall_cpu,
const char *fmt, va_list args)
{
struct scx_exit_info *ei = sch->exit_info;
@@ -6400,6 +6408,7 @@ static bool scx_vexit(struct scx_sched *sch,
*/
ei->kind = kind;
ei->reason = scx_exit_reason(ei->kind);
+ ei->stall_cpu = stall_cpu;
irq_work_queue(&sch->disable_irq_work);
return true;
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index b4f36d8b9c1d..a0a09e8f2ac2 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -93,6 +93,9 @@ struct scx_exit_info {
/* %SCX_EXIT_* - broad category of the exit reason */
enum scx_exit_kind kind;
+ /* CPU where a task stall happened. */
+ int stall_cpu;
+
/* exit code if gracefully exiting */
s64 exit_code;
--
2.53.0
* Re: [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit
2026-04-08 3:11 ` [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit Changwoo Min
@ 2026-04-09 1:19 ` Tejun Heo
2026-04-09 5:52 ` Andrea Righi
2026-04-09 5:44 ` Andrea Righi
From: Tejun Heo @ 2026-04-09 1:19 UTC
To: Changwoo Min; +Cc: void, arighi, kernel-dev, sched-ext, linux-kernel
On Wed, Apr 08, 2026 at 12:11:13PM +0900, Changwoo Min wrote:
> When a watchdog timeout fires, the CPU where the stalled task was
> running is the most relevant piece of information for diagnosing the
> hang. However, if there are many CPUs, the dump can get truncated and
> the stall CPU's information may not appear in the output.
>
> Add a stall_cpu field to scx_exit_info, thread it through scx_vexit()
> and __scx_exit(), and populate it from cpu_of(rq) in
> check_rq_for_timeouts(). In scx_dump_state(), dump the stall CPU
> before iterating the rest so it always appears at the top of the output.
>
> Introduce a scx_exit() macro that wraps __scx_exit() with stall_cpu=0
> for all non-stall exit paths, keeping call sites unchanged.
Would it make sense to generalize this so that the exit records the CPU the
exit is triggered on and we always dump that CPU first? That would include
the stall case, is likely useful for other cases too, and we wouldn't have
to add @stall_cpu to the exit functions.
Thanks.
--
tejun
* Re: [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit
2026-04-08 3:11 ` [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit Changwoo Min
2026-04-09 1:19 ` Tejun Heo
@ 2026-04-09 5:44 ` Andrea Righi
From: Andrea Righi @ 2026-04-09 5:44 UTC
To: Changwoo Min; +Cc: tj, void, kernel-dev, sched-ext, linux-kernel
Hi Changwoo,
On Wed, Apr 08, 2026 at 12:11:13PM +0900, Changwoo Min wrote:
> When a watchdog timeout fires, the CPU where the stalled task was
> running is the most relevant piece of information for diagnosing the
> hang. However, if there are many CPUs, the dump can get truncated and
> the stall CPU's information may not appear in the output.
>
> Add a stall_cpu field to scx_exit_info, thread it through scx_vexit()
> and __scx_exit(), and populate it from cpu_of(rq) in
> check_rq_for_timeouts(). In scx_dump_state(), dump the stall CPU
> before iterating the rest so it always appears at the top of the output.
>
> Introduce a scx_exit() macro that wraps __scx_exit() with stall_cpu=0
> for all non-stall exit paths, keeping call sites unchanged.
Should we use stall_cpu = -1 as a sentinel to represent "no stall"?
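In userspace terms (a hypothetical helper, just to illustrate the ambiguity): with 0 as the default, a non-stall exit is indistinguishable from a genuine stall on CPU 0, while -1 lets the dump path skip the pre-dump cleanly.

```c
#include <stdbool.h>

#define STALL_CPU_NONE (-1)	/* proposed sentinel, not in the patch */

/* Hypothetical check: should scx_dump_state() pre-dump a stall CPU? */
static bool should_predump(int stall_cpu)
{
	return stall_cpu != STALL_CPU_NONE;
}
```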
>
> Signed-off-by: Changwoo Min <changwoo@igalia.com>
> ---
> kernel/sched/ext.c | 31 ++++++++++++++++++++-----------
> kernel/sched/ext_internal.h | 3 +++
> 2 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 8f7d5c1556be..671a1713aedb 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -200,24 +200,28 @@ static bool task_dead_and_done(struct task_struct *p);
> static void scx_kick_cpu(struct scx_sched *sch, s32 cpu, u64 flags);
> static void scx_disable(struct scx_sched *sch, enum scx_exit_kind kind);
> static bool scx_vexit(struct scx_sched *sch, enum scx_exit_kind kind,
> - s64 exit_code, const char *fmt, va_list args);
> + s64 exit_code, int stall_cpu, const char *fmt,
> + va_list args);
>
> -static __printf(4, 5) bool scx_exit(struct scx_sched *sch,
> - enum scx_exit_kind kind, s64 exit_code,
> - const char *fmt, ...)
> +static __printf(5, 6) bool __scx_exit(struct scx_sched *sch,
> + enum scx_exit_kind kind, s64 exit_code,
> + int stall_cpu, const char *fmt, ...)
> {
> va_list args;
> bool ret;
>
> va_start(args, fmt);
> - ret = scx_vexit(sch, kind, exit_code, fmt, args);
> + ret = scx_vexit(sch, kind, exit_code, stall_cpu, fmt, args);
> va_end(args);
>
> return ret;
> }
>
> +#define scx_exit(sch, kind, exit_code, fmt, args...) \
> + __scx_exit(sch, kind, exit_code, 0, fmt, ##args)
> +
> #define scx_error(sch, fmt, args...) scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> -#define scx_verror(sch, fmt, args) scx_vexit((sch), SCX_EXIT_ERROR, 0, fmt, args)
> +#define scx_verror(sch, fmt, args) scx_vexit((sch), SCX_EXIT_ERROR, 0, 0, fmt, args)
>
> #define SCX_HAS_OP(sch, op) test_bit(SCX_OP_IDX(op), (sch)->has_op)
>
> @@ -3433,9 +3437,10 @@ static bool check_rq_for_timeouts(struct rq *rq)
> last_runnable + READ_ONCE(sch->watchdog_timeout)))) {
> u32 dur_ms = jiffies_to_msecs(jiffies - last_runnable);
>
> - scx_exit(sch, SCX_EXIT_ERROR_STALL, 0,
> - "%s[%d] failed to run for %u.%03us",
> - p->comm, p->pid, dur_ms / 1000, dur_ms % 1000);
> + __scx_exit(sch, SCX_EXIT_ERROR_STALL, 0, cpu_of(rq),
> + "%s[%d] failed to run for %u.%03us",
> + p->comm, p->pid, dur_ms / 1000,
> + dur_ms % 1000);
> timed_out = true;
> break;
> }
> @@ -6337,8 +6342,11 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
> dump_line(&s, "CPU states");
> dump_line(&s, "----------");
>
> + /* Dump the stall CPU first, then dump the rest in order. */
> + scx_dump_cpu(sch, &s, &dctx, ei->stall_cpu, dump_all_tasks);
And here we can skip this if ei->stall_cpu < 0.
> for_each_possible_cpu(cpu) {
> - scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
> + if (cpu != ei->stall_cpu)
> + scx_dump_cpu(sch, &s, &dctx, cpu, dump_all_tasks);
> }
>
> dump_newline(&s);
> @@ -6377,7 +6385,7 @@ static void scx_disable_irq_workfn(struct irq_work *irq_work)
> }
>
> static bool scx_vexit(struct scx_sched *sch,
> - enum scx_exit_kind kind, s64 exit_code,
> + enum scx_exit_kind kind, s64 exit_code, int stall_cpu,
> const char *fmt, va_list args)
> {
> struct scx_exit_info *ei = sch->exit_info;
> @@ -6400,6 +6408,7 @@ static bool scx_vexit(struct scx_sched *sch,
> */
> ei->kind = kind;
> ei->reason = scx_exit_reason(ei->kind);
> + ei->stall_cpu = stall_cpu;
>
> irq_work_queue(&sch->disable_irq_work);
> return true;
> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index b4f36d8b9c1d..a0a09e8f2ac2 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -93,6 +93,9 @@ struct scx_exit_info {
> /* %SCX_EXIT_* - broad category of the exit reason */
> enum scx_exit_kind kind;
>
> + /* CPU where a task stall happened. */
> + int stall_cpu;
> +
With CO-RE we shouldn't have any compatibility issue, but would it make sense to
move this to the end of the struct anyway?
> /* exit code if gracefully exiting */
> s64 exit_code;
>
> --
> 2.53.0
>
Thanks,
-Andrea
* Re: [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit
2026-04-09 1:19 ` Tejun Heo
@ 2026-04-09 5:52 ` Andrea Righi
2026-04-09 6:28 ` Tejun Heo
From: Andrea Righi @ 2026-04-09 5:52 UTC
To: Tejun Heo; +Cc: Changwoo Min, void, kernel-dev, sched-ext, linux-kernel
On Wed, Apr 08, 2026 at 03:19:33PM -1000, Tejun Heo wrote:
> On Wed, Apr 08, 2026 at 12:11:13PM +0900, Changwoo Min wrote:
> > When a watchdog timeout fires, the CPU where the stalled task was
> > running is the most relevant piece of information for diagnosing the
> > hang. However, if there are many CPUs, the dump can get truncated and
> > the stall CPU's information may not appear in the output.
> >
> > Add a stall_cpu field to scx_exit_info, thread it through scx_vexit()
> > and __scx_exit(), and populate it from cpu_of(rq) in
> > check_rq_for_timeouts(). In scx_dump_state(), dump the stall CPU
> > before iterating the rest so it always appears at the top of the output.
> >
> > Introduce a scx_exit() macro that wraps __scx_exit() with stall_cpu=0
> > for all non-stall exit paths, keeping call sites unchanged.
>
> Would it make sense to generalize this so that the exit records the CPU the
> exit is triggered on and we always dump that CPU first? That would include
> the stall case, is likely useful for other cases too, and we wouldn't have
> to add @stall_cpu to the exit functions.
But if we record the current CPU the exit is triggered on, then in
check_rq_for_timeouts() we would prioritize the watchdog worker's CPU instead
of the stalled one, right?
Thanks,
-Andrea
* Re: [PATCH 2/2] sched_ext: Dump the stall CPU first in watchdog exit
2026-04-09 5:52 ` Andrea Righi
@ 2026-04-09 6:28 ` Tejun Heo
From: Tejun Heo @ 2026-04-09 6:28 UTC
To: Andrea Righi; +Cc: Changwoo Min, void, kernel-dev, sched-ext, linux-kernel
Hello,
On Thu, Apr 09, 2026 at 07:52:23AM +0200, Andrea Righi wrote:
> > Would it make sense to generalize this so that the exit records the CPU the
> > exit is triggered on and we always dump that CPU first? That would include
> > the stall case, is likely useful for other cases too, and we wouldn't have
> > to add @stall_cpu to the exit functions.
>
> But if we record the current CPU the exit is triggered on, in
> check_rq_for_timeouts() we would prioritize the watchdog worker's CPU instead of
> the stalled one, right?
Ah, right. Can we at least generalize it to a "culprit" or "offending" CPU?
For regular scx_error() triggers, we'd want to see what that particular CPU
was doing too.
Thanks.
--
tejun