Linux RCU subsystem development
 help / color / mirror / Atom feed
* [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy
@ 2026-05-04 16:08 Cheng-Yang Chou
  2026-05-04 16:08 ` [PATCH v2 1/2] sched_ext: Normalize exit dump header to "on CPU N" Cheng-Yang Chou
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Cheng-Yang Chou @ 2026-05-04 16:08 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	Paul E . McKenney, rcu
  Cc: Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

Follow-up of Tejun's suggestion [1]:

Patch 1: Normalize "on cpu N" -> "on CPU N" in the exit dump header to
match UEI output style.

Patch 2: Fix exit_cpu accuracy for the hard lockup and RCU stall paths
which were recording the detector CPU instead of the actual hung/stalled
CPU. Touches kernel/rcu/tree_stall.h and kernel/rcu/tree_exp.h to thread
the stalled CPU through panic_on_rcu_stall().

Based on sched_ext/for-next (88588854c283).

Changes in v2:
- Use raw_smp_processor_id() in synchronize_rcu_expedited_wait() to
  avoid CONFIG_DEBUG_PREEMPT splat (preemption is enabled there after
  nbcon_cpu_emergency_exit()).
- Link to v1: 
  https://lore.kernel.org/r/20260501131521.161852-1-yphbchou0911@gmail.com/

[1]: https://lore.kernel.org/r/e7cbfc99b52b4b7059267bb81498179f@kernel.org/

Thanks,
Cheng-Yang

---

Cheng-Yang Chou (2):
  sched_ext: Normalize exit dump header to "on CPU N"
  sched_ext: Fix exit_cpu accuracy for lockup paths

 include/linux/sched/ext.h   |  4 ++--
 kernel/rcu/tree_exp.h       |  2 +-
 kernel/rcu/tree_stall.h     | 11 +++++++----
 kernel/sched/ext.c          | 16 +++++++++-------
 kernel/sched/ext_internal.h |  2 --
 5 files changed, 19 insertions(+), 16 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] sched_ext: Normalize exit dump header to "on CPU N"
  2026-05-04 16:08 [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Cheng-Yang Chou
@ 2026-05-04 16:08 ` Cheng-Yang Chou
  2026-05-04 16:08 ` [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths Cheng-Yang Chou
  2026-05-04 21:05 ` [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Tejun Heo
  2 siblings, 0 replies; 12+ messages in thread
From: Cheng-Yang Chou @ 2026-05-04 16:08 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	Paul E . McKenney, rcu
  Cc: Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

Unify to uppercase to match the UEI output.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 kernel/sched/ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 45a2668284bc..48c65ac8e230 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6486,7 +6486,7 @@ static void scx_dump_state(struct scx_sched *sch, struct scx_exit_info *ei,
 		dump_line(&s, "Debug dump triggered by %s", ei->reason);
 	} else {
 		if (ei->exit_cpu >= 0)
-			dump_line(&s, "%s[%d] triggered exit kind %d on cpu %d:",
+			dump_line(&s, "%s[%d] triggered exit kind %d on CPU %d:",
 				  current->comm, current->pid, ei->kind,
 				  ei->exit_cpu);
 		else
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-04 16:08 [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Cheng-Yang Chou
  2026-05-04 16:08 ` [PATCH v2 1/2] sched_ext: Normalize exit dump header to "on CPU N" Cheng-Yang Chou
@ 2026-05-04 16:08 ` Cheng-Yang Chou
  2026-05-05  0:59   ` Tejun Heo
  2026-05-05  3:44   ` Paul E. McKenney
  2026-05-04 21:05 ` [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Tejun Heo
  2 siblings, 2 replies; 12+ messages in thread
From: Cheng-Yang Chou @ 2026-05-04 16:08 UTC (permalink / raw)
  To: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	Paul E . McKenney, rcu
  Cc: Ching-Chun Huang, Chia-Ping Tsai, yphbchou0911

handle_lockup() uses raw_smp_processor_id() for exit_cpu, which is wrong
for two paths:

- scx_hardlockup_irq_workfn() has the hung CPU in a local variable but
  irq_work may run elsewhere; pass the local cpu explicitly.
- scx_rcu_cpu_stall() takes no CPU, recording the detector rather than
  the stalled one; add stalled_cpu to its signature and
  panic_on_rcu_stall(), threading it from print_cpu_stall() (current
  CPU) and print_other_cpu_stall() (first detected stalled CPU).

Add an exit_cpu parameter to handle_lockup() so callers pass the correct
CPU directly. Remove the now-unused scx_verror() macro.

Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 include/linux/sched/ext.h   |  4 ++--
 kernel/rcu/tree_exp.h       |  2 +-
 kernel/rcu/tree_stall.h     | 11 +++++++----
 kernel/sched/ext.c          | 14 ++++++++------
 kernel/sched/ext_internal.h |  2 --
 5 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index d05efcac794d..16bbf24089ca 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -259,7 +259,7 @@ void sched_ext_dead(struct task_struct *p);
 void print_scx_info(const char *log_lvl, struct task_struct *p);
 void scx_softlockup(u32 dur_s);
 bool scx_hardlockup(int cpu);
-bool scx_rcu_cpu_stall(void);
+bool scx_rcu_cpu_stall(int stalled_cpu);
 
 #else	/* !CONFIG_SCHED_CLASS_EXT */
 
@@ -267,7 +267,7 @@ static inline void sched_ext_dead(struct task_struct *p) {}
 static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
 static inline void scx_softlockup(u32 dur_s) {}
 static inline bool scx_hardlockup(int cpu) { return false; }
-static inline bool scx_rcu_cpu_stall(void) { return false; }
+static inline bool scx_rcu_cpu_stall(int stalled_cpu) { return false; }
 
 #endif	/* CONFIG_SCHED_CLASS_EXT */
 
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 82cada459e5d..fa83e273a648 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -675,7 +675,7 @@ static void synchronize_rcu_expedited_wait(void)
 
 		nbcon_cpu_emergency_exit();
 
-		panic_on_rcu_stall();
+		panic_on_rcu_stall(raw_smp_processor_id());
 	}
 }
 
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b67532cb8770..172ac08e673d 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -159,7 +159,7 @@ static int __init check_cpu_stall_init(void)
 early_initcall(check_cpu_stall_init);
 
 /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
-static void panic_on_rcu_stall(void)
+static void panic_on_rcu_stall(int stalled_cpu)
 {
 	static int cpu_stall;
 
@@ -167,7 +167,7 @@ static void panic_on_rcu_stall(void)
 	 * Attempt to kick out the BPF scheduler if it's installed and defer
 	 * the panic to give the system a chance to recover.
 	 */
-	if (scx_rcu_cpu_stall())
+	if (scx_rcu_cpu_stall(stalled_cpu))
 		return;
 
 	if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
@@ -631,6 +631,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
 static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 {
 	int cpu;
+	int first_stalled_cpu = -1;
 	unsigned long flags;
 	unsigned long gpa;
 	unsigned long j;
@@ -660,6 +661,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 			for_each_leaf_node_possible_cpu(rnp, cpu)
 				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
 					print_cpu_stall_info(cpu);
+					if (first_stalled_cpu < 0)
+						first_stalled_cpu = cpu;
 					ndetected++;
 				}
 		}
@@ -701,7 +704,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
 
 	nbcon_cpu_emergency_exit();
 
-	panic_on_rcu_stall();
+	panic_on_rcu_stall(first_stalled_cpu);
 
 	rcu_force_quiescent_state();  /* Kick them all. */
 }
@@ -754,7 +757,7 @@ static void print_cpu_stall(unsigned long gp_seq, unsigned long gps)
 
 	nbcon_cpu_emergency_exit();
 
-	panic_on_rcu_stall();
+	panic_on_rcu_stall(smp_processor_id());
 
 	/*
 	 * Attempt to revive the RCU machinery by forcing a context switch.
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 48c65ac8e230..8a0b1662a75a 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5069,6 +5069,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
 
 /**
  * handle_lockup - sched_ext common lockup handler
+ * @exit_cpu: CPU to record in exit_info; pass the stalled/hung CPU, not current
  * @fmt: format string
  *
  * Called on system stall or lockup condition and initiates abort of sched_ext
@@ -5078,7 +5079,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
  * resolve the lockup. %false if sched_ext is not enabled or abort was already
  * initiated by someone else.
  */
-static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
+static __printf(2, 3) bool handle_lockup(int exit_cpu, const char *fmt, ...)
 {
 	struct scx_sched *sch;
 	va_list args;
@@ -5094,7 +5095,7 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
 	case SCX_ENABLING:
 	case SCX_ENABLED:
 		va_start(args, fmt);
-		ret = scx_verror(sch, fmt, args);
+		ret = scx_vexit(sch, SCX_EXIT_ERROR, 0, exit_cpu, fmt, args);
 		va_end(args);
 		return ret;
 	default:
@@ -5114,9 +5115,9 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
  * resolve the reported RCU stall. %false if sched_ext is not enabled or someone
  * else already initiated abort.
  */
-bool scx_rcu_cpu_stall(void)
+bool scx_rcu_cpu_stall(int stalled_cpu)
 {
-	return handle_lockup("RCU CPU stall detected!");
+	return handle_lockup(stalled_cpu, "RCU CPU stall detected!");
 }
 
 /**
@@ -5131,7 +5132,8 @@ bool scx_rcu_cpu_stall(void)
  */
 void scx_softlockup(u32 dur_s)
 {
-	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
+	if (!handle_lockup(smp_processor_id(), "soft lockup - CPU %d stuck for %us",
+			   smp_processor_id(), dur_s))
 		return;
 
 	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
@@ -5150,7 +5152,7 @@ static void scx_hardlockup_irq_workfn(struct irq_work *work)
 {
 	int cpu = atomic_xchg(&scx_hardlockup_cpu, -1);
 
-	if (cpu >= 0 && handle_lockup("hard lockup - CPU %d", cpu))
+	if (cpu >= 0 && handle_lockup(cpu, "hard lockup - CPU %d", cpu))
 		printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
 				cpu);
 }
diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
index b4f5dd28855e..2d04f27cf7c5 100644
--- a/kernel/sched/ext_internal.h
+++ b/kernel/sched/ext_internal.h
@@ -1486,8 +1486,6 @@ __printf(5, 6) bool __scx_exit(struct scx_sched *sch, enum scx_exit_kind kind,
 	__scx_exit(sch, kind, exit_code, raw_smp_processor_id(), fmt, ##args)
 #define scx_error(sch, fmt, args...)						\
 	scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
-#define scx_verror(sch, fmt, args)						\
-	scx_vexit((sch), SCX_EXIT_ERROR, 0, raw_smp_processor_id(), fmt, args)
 
 /*
  * Return the rq currently locked from an scx callback, or NULL if no rq is
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy
  2026-05-04 16:08 [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Cheng-Yang Chou
  2026-05-04 16:08 ` [PATCH v2 1/2] sched_ext: Normalize exit dump header to "on CPU N" Cheng-Yang Chou
  2026-05-04 16:08 ` [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths Cheng-Yang Chou
@ 2026-05-04 21:05 ` Tejun Heo
  2 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2026-05-04 21:05 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, David Vernet, Andrea Righi, Changwoo Min,
	Paul E. McKenney, rcu, Ching-Chun Huang, Chia-Ping Tsai

Hello,

Applied 1/2 to sched_ext/for-7.2. Holding 2/2 to give the RCU side
time to weigh in on the smp_processor_id() / raw_smp_processor_id()
change in synchronize_rcu_expedited_wait().

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-04 16:08 ` [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths Cheng-Yang Chou
@ 2026-05-05  0:59   ` Tejun Heo
  2026-05-05  3:44   ` Paul E. McKenney
  1 sibling, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2026-05-05  0:59 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, David Vernet, Andrea Righi, Changwoo Min,
	Paul E. McKenney, rcu, Ching-Chun Huang, Chia-Ping Tsai

Hello,

> -		panic_on_rcu_stall();
> +		panic_on_rcu_stall(raw_smp_processor_id());

This one in synchronize_rcu_expedited_wait() - is raw_smp_processor_id()
the right CPU here? The wait runs in the grace-period worker context,
so the current CPU is the waiter, not a stuck one. The stuck CPUs are
the ones in rnp->expmask that synchronize_rcu_expedited_stall() just
iterated over.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-04 16:08 ` [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths Cheng-Yang Chou
  2026-05-05  0:59   ` Tejun Heo
@ 2026-05-05  3:44   ` Paul E. McKenney
  2026-05-05  8:20     ` Cheng-Yang Chou
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2026-05-05  3:44 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	rcu, Ching-Chun Huang, Chia-Ping Tsai

On Tue, May 05, 2026 at 12:08:20AM +0800, Cheng-Yang Chou wrote:
> handle_lockup() uses raw_smp_processor_id() for exit_cpu, which is wrong
> for two paths:
> 
> - scx_hardlockup_irq_workfn() has the hung CPU in a local variable but
>   irq_work may run elsewhere; pass the local cpu explicitly.
> - scx_rcu_cpu_stall() takes no CPU, recording the detector rather than
>   the stalled one; add stalled_cpu to its signature and
>   panic_on_rcu_stall(), threading it from print_cpu_stall() (current
>   CPU) and print_other_cpu_stall() (first detected stalled CPU).
> 
> Add an exit_cpu parameter to handle_lockup() so callers pass the correct
> CPU directly. Remove the now-unused scx_verror() macro.
> 
> Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> ---
>  include/linux/sched/ext.h   |  4 ++--
>  kernel/rcu/tree_exp.h       |  2 +-
>  kernel/rcu/tree_stall.h     | 11 +++++++----
>  kernel/sched/ext.c          | 14 ++++++++------
>  kernel/sched/ext_internal.h |  2 --
>  5 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> index d05efcac794d..16bbf24089ca 100644
> --- a/include/linux/sched/ext.h
> +++ b/include/linux/sched/ext.h
> @@ -259,7 +259,7 @@ void sched_ext_dead(struct task_struct *p);
>  void print_scx_info(const char *log_lvl, struct task_struct *p);
>  void scx_softlockup(u32 dur_s);
>  bool scx_hardlockup(int cpu);
> -bool scx_rcu_cpu_stall(void);
> +bool scx_rcu_cpu_stall(int stalled_cpu);
>  
>  #else	/* !CONFIG_SCHED_CLASS_EXT */
>  
> @@ -267,7 +267,7 @@ static inline void sched_ext_dead(struct task_struct *p) {}
>  static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
>  static inline void scx_softlockup(u32 dur_s) {}
>  static inline bool scx_hardlockup(int cpu) { return false; }
> -static inline bool scx_rcu_cpu_stall(void) { return false; }
> +static inline bool scx_rcu_cpu_stall(int stalled_cpu) { return false; }

There might be more than one stalled CPU.  You could pass in a cpumask_t
or similar to print all of them, or take the first or the last or
some such.

>  
>  #endif	/* CONFIG_SCHED_CLASS_EXT */
>  
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index 82cada459e5d..fa83e273a648 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -675,7 +675,7 @@ static void synchronize_rcu_expedited_wait(void)
>  
>  		nbcon_cpu_emergency_exit();
>  
> -		panic_on_rcu_stall();
> +		panic_on_rcu_stall(raw_smp_processor_id());

As Tejun pointed out, this gets you whatever random CPU was running
RCU's expedited grace-period processing, which is almost certainly not
the stalled CPU.

In addition, in preemptible kernels, it is possible to stall not on a CPU,
but on a task that was preempted within an RCU read-side critical section.
(It is also possible to stall on some CPUs and on some tasks.)

So suppose that the expedited grace period is stalled not on a CPU, but
instead on a task.  What do you want to do in that case?

If you only care about CPUs, then synchronize_rcu_expedited_stall()
could return the number of the first stalled CPU that it encountered,
or -1 if there were no CPUs doing any stalling.  If you wanted the first
CPU, you could capture it within the first rcu_for_each_leaf_node()
loop in that function.  Or you could pass in a cpumask_t and have
synchronize_rcu_expedited_stall() fill it in, getting all the CPUs.
And you could pass in an array (along with its size) to capture the PIDs
of any stalled tasks, which synchronize_rcu_expedited_stall() would in
turn pass down to rcu_print_task_exp_stall(), which could fill in the
PIDs of the first stalled tasks in the ->blkd_tasks lists.

Whatever you collect would then be passed down to panic_on_rcu_stall()
and from there to scx_rcu_cpu_stall().

Much depends on exactly what you are trying to accomplish with the
additional information.

More inline below.

						Thanx, Paul

>  	}
>  }
>  
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b67532cb8770..172ac08e673d 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -159,7 +159,7 @@ static int __init check_cpu_stall_init(void)
>  early_initcall(check_cpu_stall_init);
>  
>  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
> -static void panic_on_rcu_stall(void)
> +static void panic_on_rcu_stall(int stalled_cpu)
>  {
>  	static int cpu_stall;
>  
> @@ -167,7 +167,7 @@ static void panic_on_rcu_stall(void)
>  	 * Attempt to kick out the BPF scheduler if it's installed and defer
>  	 * the panic to give the system a chance to recover.
>  	 */
> -	if (scx_rcu_cpu_stall())
> +	if (scx_rcu_cpu_stall(stalled_cpu))
>  		return;
>  
>  	if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
> @@ -631,6 +631,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
>  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  {
>  	int cpu;
> +	int first_stalled_cpu = -1;
>  	unsigned long flags;
>  	unsigned long gpa;
>  	unsigned long j;
> @@ -660,6 +661,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  			for_each_leaf_node_possible_cpu(rnp, cpu)
>  				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
>  					print_cpu_stall_info(cpu);
> +					if (first_stalled_cpu < 0)
> +						first_stalled_cpu = cpu;

This gets the first stalled CPU.  Maybe that is also what you want for
the expedited stalls?

Again, there might not be any stalled CPUs, but only stalled tasks...

>  					ndetected++;
>  				}
>  		}
> @@ -701,7 +704,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  
>  	nbcon_cpu_emergency_exit();
>  
> -	panic_on_rcu_stall();
> +	panic_on_rcu_stall(first_stalled_cpu);

...in which case panic_on_rcu_stall() gets a -1.  Which might not be what
the user wants to see.

>  	rcu_force_quiescent_state();  /* Kick them all. */
>  }
> @@ -754,7 +757,7 @@ static void print_cpu_stall(unsigned long gp_seq, unsigned long gps)
>  
>  	nbcon_cpu_emergency_exit();
>  
> -	panic_on_rcu_stall();
> +	panic_on_rcu_stall(smp_processor_id());

In this case, we do just have the current CPU doing the stall, so this
does work fine.  (If there are more CPUs and tasks involved, there
could be additional RCU CPU stall warning messages.)

>  	/*
>  	 * Attempt to revive the RCU machinery by forcing a context switch.
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 48c65ac8e230..8a0b1662a75a 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5069,6 +5069,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
>  
>  /**
>   * handle_lockup - sched_ext common lockup handler
> + * @exit_cpu: CPU to record in exit_info; pass the stalled/hung CPU, not current
>   * @fmt: format string
>   *
>   * Called on system stall or lockup condition and initiates abort of sched_ext
> @@ -5078,7 +5079,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
>   * resolve the lockup. %false if sched_ext is not enabled or abort was already
>   * initiated by someone else.
>   */
> -static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> +static __printf(2, 3) bool handle_lockup(int exit_cpu, const char *fmt, ...)
>  {
>  	struct scx_sched *sch;
>  	va_list args;
> @@ -5094,7 +5095,7 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
>  	case SCX_ENABLING:
>  	case SCX_ENABLED:
>  		va_start(args, fmt);
> -		ret = scx_verror(sch, fmt, args);
> +		ret = scx_vexit(sch, SCX_EXIT_ERROR, 0, exit_cpu, fmt, args);
>  		va_end(args);
>  		return ret;
>  	default:
> @@ -5114,9 +5115,9 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
>   * resolve the reported RCU stall. %false if sched_ext is not enabled or someone
>   * else already initiated abort.
>   */
> -bool scx_rcu_cpu_stall(void)
> +bool scx_rcu_cpu_stall(int stalled_cpu)
>  {
> -	return handle_lockup("RCU CPU stall detected!");
> +	return handle_lockup(stalled_cpu, "RCU CPU stall detected!");
>  }
>  
>  /**
> @@ -5131,7 +5132,8 @@ bool scx_rcu_cpu_stall(void)
>   */
>  void scx_softlockup(u32 dur_s)
>  {
> -	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
> +	if (!handle_lockup(smp_processor_id(), "soft lockup - CPU %d stuck for %us",
> +			   smp_processor_id(), dur_s))

Repeating "smp_processor_id()" in the argument list seems a bit redundant?

>  		return;
>  
>  	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
> @@ -5150,7 +5152,7 @@ static void scx_hardlockup_irq_workfn(struct irq_work *work)
>  {
>  	int cpu = atomic_xchg(&scx_hardlockup_cpu, -1);
>  
> -	if (cpu >= 0 && handle_lockup("hard lockup - CPU %d", cpu))
> +	if (cpu >= 0 && handle_lockup(cpu, "hard lockup - CPU %d", cpu))

Repeating "cpu" in the argument list seems a bit redundant?

>  		printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
>  				cpu);
>  }
> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index b4f5dd28855e..2d04f27cf7c5 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -1486,8 +1486,6 @@ __printf(5, 6) bool __scx_exit(struct scx_sched *sch, enum scx_exit_kind kind,
>  	__scx_exit(sch, kind, exit_code, raw_smp_processor_id(), fmt, ##args)
>  #define scx_error(sch, fmt, args...)						\
>  	scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> -#define scx_verror(sch, fmt, args)						\
> -	scx_vexit((sch), SCX_EXIT_ERROR, 0, raw_smp_processor_id(), fmt, args)
>  
>  /*
>   * Return the rq currently locked from an scx callback, or NULL if no rq is
> -- 
> 2.48.1
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-05  3:44   ` Paul E. McKenney
@ 2026-05-05  8:20     ` Cheng-Yang Chou
  2026-05-05  8:34       ` Tejun Heo
  2026-05-05 15:10       ` Paul E. McKenney
  0 siblings, 2 replies; 12+ messages in thread
From: Cheng-Yang Chou @ 2026-05-05  8:20 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	rcu, Ching-Chun Huang, Chia-Ping Tsai

Hi Paul, Tejun,

On Mon, May 04, 2026 at 08:44:48PM -0700, Paul E. McKenney wrote:
> On Tue, May 05, 2026 at 12:08:20AM +0800, Cheng-Yang Chou wrote:
> > handle_lockup() uses raw_smp_processor_id() for exit_cpu, which is wrong
> > for two paths:
> > 
> > - scx_hardlockup_irq_workfn() has the hung CPU in a local variable but
> >   irq_work may run elsewhere; pass the local cpu explicitly.
> > - scx_rcu_cpu_stall() takes no CPU, recording the detector rather than
> >   the stalled one; add stalled_cpu to its signature and
> >   panic_on_rcu_stall(), threading it from print_cpu_stall() (current
> >   CPU) and print_other_cpu_stall() (first detected stalled CPU).
> > 
> > Add an exit_cpu parameter to handle_lockup() so callers pass the correct
> > CPU directly. Remove the now-unused scx_verror() macro.
> > 
> > Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> > ---
> >  include/linux/sched/ext.h   |  4 ++--
> >  kernel/rcu/tree_exp.h       |  2 +-
> >  kernel/rcu/tree_stall.h     | 11 +++++++----
> >  kernel/sched/ext.c          | 14 ++++++++------
> >  kernel/sched/ext_internal.h |  2 --
> >  5 files changed, 18 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> > index d05efcac794d..16bbf24089ca 100644
> > --- a/include/linux/sched/ext.h
> > +++ b/include/linux/sched/ext.h
> > @@ -259,7 +259,7 @@ void sched_ext_dead(struct task_struct *p);
> >  void print_scx_info(const char *log_lvl, struct task_struct *p);
> >  void scx_softlockup(u32 dur_s);
> >  bool scx_hardlockup(int cpu);
> > -bool scx_rcu_cpu_stall(void);
> > +bool scx_rcu_cpu_stall(int stalled_cpu);
> >  
> >  #else	/* !CONFIG_SCHED_CLASS_EXT */
> >  
> > @@ -267,7 +267,7 @@ static inline void sched_ext_dead(struct task_struct *p) {}
> >  static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
> >  static inline void scx_softlockup(u32 dur_s) {}
> >  static inline bool scx_hardlockup(int cpu) { return false; }
> > -static inline bool scx_rcu_cpu_stall(void) { return false; }
> > +static inline bool scx_rcu_cpu_stall(int stalled_cpu) { return false; }
> 
> There might be more than one stalled CPU.  You could pass in a cpumask_t
> or similar to print all of them, or take the first or the last or
> some such.

sched_ext's exit_info carries a single s32 exit_cpu, so we can only
record one CPU. Would taking the first stalled CPU be acceptable as a 
best-effort hint, or would you prefer we pass -1 and skip the CPU field
entirely for the RCU stall paths?

> >  
> >  #endif	/* CONFIG_SCHED_CLASS_EXT */
> >  
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index 82cada459e5d..fa83e273a648 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -675,7 +675,7 @@ static void synchronize_rcu_expedited_wait(void)
> >  
> >  		nbcon_cpu_emergency_exit();
> >  
> > -		panic_on_rcu_stall();
> > +		panic_on_rcu_stall(raw_smp_processor_id());
> 
> As Tejun pointed out, this gets you whatever random CPU was running
> RCU's expedited grace-period processing, which is almost certainly not
> the stalled CPU.
> 
> In addition, in preemptible kernels, it is possible to stall not on a CPU,
> but on a task that was preempted within an RCU read-side critical section.
> (It is also possible to stall on some CPUs and on some tasks.)
> 
> So suppose that the expedited grace period is stalled not on a CPU, but
> instead on a task.  What do you want to do in that case?
> 
> If you only care about CPUs, then synchronize_rcu_expedited_stall()
> could return the number of the first stalled CPU that it encountered,
> or -1 if there were no CPUs doing any stalling.  If you wanted the first
> CPU, you could capture it within the first rcu_for_each_leaf_node()
> loop in that function.  Or you could pass in a cpumask_t and have
> synchronize_rcu_expedited_stall() fill it in, getting all the CPUs.
> And you could pass in an array (along with its size) to capture the PIDs
> of any stalled tasks, which synchronize_rcu_expedited_stall() would in
> turn pass down to rcu_print_task_exp_stall(), which could fill in the
> PIDs of the first stalled tasks in the ->blkd_tasks lists.
> 
> Whatever you collect would then be passed down to panic_on_rcu_stall()
> and from there to scx_rcu_cpu_stall().
> 
> Much depends on exactly what you are trying to accomplish with the
> additional information.

So, the goal here is best-effort: when sched_ext is aborted due to an
RCU stall, record which CPU was involved for the dump header. There is
no task-based filed in exit_info, so task stalls would naturally fall
through to -1.

Would you prefer:
 (a) pass -1 unconditionally
 (b) have synchronize_rcu_expedited_stall() return the first stalled CPU
     (or -1 if only tasks are stalled)?

> 
> More inline below.
> 
> 						Thanx, Paul
> 
> >  	}
> >  }
> >  
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index b67532cb8770..172ac08e673d 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -159,7 +159,7 @@ static int __init check_cpu_stall_init(void)
> >  early_initcall(check_cpu_stall_init);
> >  
> >  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
> > -static void panic_on_rcu_stall(void)
> > +static void panic_on_rcu_stall(int stalled_cpu)
> >  {
> >  	static int cpu_stall;
> >  
> > @@ -167,7 +167,7 @@ static void panic_on_rcu_stall(void)
> >  	 * Attempt to kick out the BPF scheduler if it's installed and defer
> >  	 * the panic to give the system a chance to recover.
> >  	 */
> > -	if (scx_rcu_cpu_stall())
> > +	if (scx_rcu_cpu_stall(stalled_cpu))
> >  		return;
> >  
> >  	if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
> > @@ -631,6 +631,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
> >  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >  {
> >  	int cpu;
> > +	int first_stalled_cpu = -1;
> >  	unsigned long flags;
> >  	unsigned long gpa;
> >  	unsigned long j;
> > @@ -660,6 +661,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >  			for_each_leaf_node_possible_cpu(rnp, cpu)
> >  				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
> >  					print_cpu_stall_info(cpu);
> > +					if (first_stalled_cpu < 0)
> > +						first_stalled_cpu = cpu;
> 
> This gets the first stalled CPU.  Maybe that is also what you want for
> the expedited stalls?
> 
> Again, there might not be any stalled CPUs, but only stalled tasks...
> 

Understood. For print_other_cpu_stall() itself, first_stalled_cpu = -1
when only tasks are stalled seems acceptable since exit_cpu < 0 is
already handled gracefully.

Is there a convention I should follow here, or is -1 fine?

> >  					ndetected++;
> >  				}
> >  		}
> > @@ -701,7 +704,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >  
> >  	nbcon_cpu_emergency_exit();
> >  
> > -	panic_on_rcu_stall();
> > +	panic_on_rcu_stall(first_stalled_cpu);
> 
> ...in which case panic_on_rcu_stall() gets a -1.  Which might not be what
> the user wants to see.
> 
> >  	rcu_force_quiescent_state();  /* Kick them all. */
> >  }
> > @@ -754,7 +757,7 @@ static void print_cpu_stall(unsigned long gp_seq, unsigned long gps)
> >  
> >  	nbcon_cpu_emergency_exit();
> >  
> > -	panic_on_rcu_stall();
> > +	panic_on_rcu_stall(smp_processor_id());
> 
> In this case, we do just have the current CPU doing the stall, so this
> does work fine.  (If there are more CPUs and tasks involved, there
> could be additional RCU CPU stall warning messages.)

Good to know, thanks for confirming.

> 
> >  	/*
> >  	 * Attempt to revive the RCU machinery by forcing a context switch.
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index 48c65ac8e230..8a0b1662a75a 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -5069,6 +5069,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> >  
> >  /**
> >   * handle_lockup - sched_ext common lockup handler
> > + * @exit_cpu: CPU to record in exit_info; pass the stalled/hung CPU, not current
> >   * @fmt: format string
> >   *
> >   * Called on system stall or lockup condition and initiates abort of sched_ext
> > @@ -5078,7 +5079,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> >   * resolve the lockup. %false if sched_ext is not enabled or abort was already
> >   * initiated by someone else.
> >   */
> > -static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > +static __printf(2, 3) bool handle_lockup(int exit_cpu, const char *fmt, ...)
> >  {
> >  	struct scx_sched *sch;
> >  	va_list args;
> > @@ -5094,7 +5095,7 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> >  	case SCX_ENABLING:
> >  	case SCX_ENABLED:
> >  		va_start(args, fmt);
> > -		ret = scx_verror(sch, fmt, args);
> > +		ret = scx_vexit(sch, SCX_EXIT_ERROR, 0, exit_cpu, fmt, args);
> >  		va_end(args);
> >  		return ret;
> >  	default:
> > @@ -5114,9 +5115,9 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> >   * resolve the reported RCU stall. %false if sched_ext is not enabled or someone
> >   * else already initiated abort.
> >   */
> > -bool scx_rcu_cpu_stall(void)
> > +bool scx_rcu_cpu_stall(int stalled_cpu)
> >  {
> > -	return handle_lockup("RCU CPU stall detected!");
> > +	return handle_lockup(stalled_cpu, "RCU CPU stall detected!");
> >  }
> >  
> >  /**
> > @@ -5131,7 +5132,8 @@ bool scx_rcu_cpu_stall(void)
> >   */
> >  void scx_softlockup(u32 dur_s)
> >  {
> > -	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
> > +	if (!handle_lockup(smp_processor_id(), "soft lockup - CPU %d stuck for %us",
> > +			   smp_processor_id(), dur_s))
> 
> Repeating "smp_processor_id()" in the argument list seems a bit redundant?

Will save to a local variable in v3.

> 
> >  		return;
> >  
> >  	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
> > @@ -5150,7 +5152,7 @@ static void scx_hardlockup_irq_workfn(struct irq_work *work)
> >  {
> >  	int cpu = atomic_xchg(&scx_hardlockup_cpu, -1);
> >  
> > -	if (cpu >= 0 && handle_lockup("hard lockup - CPU %d", cpu))
> > +	if (cpu >= 0 && handle_lockup(cpu, "hard lockup - CPU %d", cpu))
> 
> Repeating "cpu" in the argument list seems a bit redundant?
> 

The first occurrence is exit_cpu (struct field), the second is the %d
format argument for the readable message. BOth serve different purposes
so both are needed.

Thanks.

> >  		printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> >  				cpu);
> >  }
> > diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> > index b4f5dd28855e..2d04f27cf7c5 100644
> > --- a/kernel/sched/ext_internal.h
> > +++ b/kernel/sched/ext_internal.h
> > @@ -1486,8 +1486,6 @@ __printf(5, 6) bool __scx_exit(struct scx_sched *sch, enum scx_exit_kind kind,
> >  	__scx_exit(sch, kind, exit_code, raw_smp_processor_id(), fmt, ##args)
> >  #define scx_error(sch, fmt, args...)						\
> >  	scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> > -#define scx_verror(sch, fmt, args)						\
> > -	scx_vexit((sch), SCX_EXIT_ERROR, 0, raw_smp_processor_id(), fmt, args)
> >  
> >  /*
> >   * Return the rq currently locked from an scx callback, or NULL if no rq is
> > -- 
> > 2.48.1
> > 
> > 

-- 
Cheers,
Cheng-Yang

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-05  8:20     ` Cheng-Yang Chou
@ 2026-05-05  8:34       ` Tejun Heo
  2026-05-06 10:18         ` Cheng-Yang Chou
  2026-05-05 15:10       ` Paul E. McKenney
  1 sibling, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2026-05-05  8:34 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: Paul E. McKenney, sched-ext, David Vernet, Andrea Righi,
	Changwoo Min, rcu, Ching-Chun Huang, Chia-Ping Tsai

On Tue, May 05, 2026 at 04:20:25PM +0800, Cheng-Yang Chou wrote:
...
> > There might be more than one stalled CPU.  You could pass in a cpumask_t
> > or similar to print all of them, or take the first or the last or
> > some such.
> 
> sched_ext's exit_info carries a single s32 exit_cpu, so we can only
> record one CPU. Would taking the first stalled CPU be acceptable as a 
> best-effort hint, or would you prefer we pass -1 and skip the CPU field
> entirely for the RCU stall paths?

We carry only one cpu in exit_info but we can easily dump the cpumask and
prioritize dumping all the cpus in the mask.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-05  8:20     ` Cheng-Yang Chou
  2026-05-05  8:34       ` Tejun Heo
@ 2026-05-05 15:10       ` Paul E. McKenney
  2026-05-06 10:54         ` Cheng-Yang Chou
  1 sibling, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2026-05-05 15:10 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	rcu, Ching-Chun Huang, Chia-Ping Tsai

On Tue, May 05, 2026 at 04:20:25PM +0800, Cheng-Yang Chou wrote:
> Hi Paul, Tejun,
> 
> On Mon, May 04, 2026 at 08:44:48PM -0700, Paul E. McKenney wrote:
> > On Tue, May 05, 2026 at 12:08:20AM +0800, Cheng-Yang Chou wrote:
> > > handle_lockup() uses raw_smp_processor_id() for exit_cpu, which is wrong
> > > for two paths:
> > > 
> > > - scx_hardlockup_irq_workfn() has the hung CPU in a local variable but
> > >   irq_work may run elsewhere; pass the local cpu explicitly.
> > > - scx_rcu_cpu_stall() takes no CPU, recording the detector rather than
> > >   the stalled one; add stalled_cpu to its signature and
> > >   panic_on_rcu_stall(), threading it from print_cpu_stall() (current
> > >   CPU) and print_other_cpu_stall() (first detected stalled CPU).
> > > 
> > > Add an exit_cpu parameter to handle_lockup() so callers pass the correct
> > > CPU directly. Remove the now-unused scx_verror() macro.
> > > 
> > > Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> > > ---
> > >  include/linux/sched/ext.h   |  4 ++--
> > >  kernel/rcu/tree_exp.h       |  2 +-
> > >  kernel/rcu/tree_stall.h     | 11 +++++++----
> > >  kernel/sched/ext.c          | 14 ++++++++------
> > >  kernel/sched/ext_internal.h |  2 --
> > >  5 files changed, 18 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> > > index d05efcac794d..16bbf24089ca 100644
> > > --- a/include/linux/sched/ext.h
> > > +++ b/include/linux/sched/ext.h
> > > @@ -259,7 +259,7 @@ void sched_ext_dead(struct task_struct *p);
> > >  void print_scx_info(const char *log_lvl, struct task_struct *p);
> > >  void scx_softlockup(u32 dur_s);
> > >  bool scx_hardlockup(int cpu);
> > > -bool scx_rcu_cpu_stall(void);
> > > +bool scx_rcu_cpu_stall(int stalled_cpu);
> > >  
> > >  #else	/* !CONFIG_SCHED_CLASS_EXT */
> > >  
> > > @@ -267,7 +267,7 @@ static inline void sched_ext_dead(struct task_struct *p) {}
> > >  static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
> > >  static inline void scx_softlockup(u32 dur_s) {}
> > >  static inline bool scx_hardlockup(int cpu) { return false; }
> > > -static inline bool scx_rcu_cpu_stall(void) { return false; }
> > > +static inline bool scx_rcu_cpu_stall(int stalled_cpu) { return false; }
> > 
> > There might be more than one stalled CPU.  You could pass in a cpumask_t
> > or similar to print all of them, or take the first or the last or
> > some such.
> 
> sched_ext's exit_info carries a single s32 exit_cpu, so we can only
> record one CPU. Would taking the first stalled CPU be acceptable as a 
> best-effort hint, or would you prefer we pass -1 and skip the CPU field
> entirely for the RCU stall paths?

I can tell you what information is availalble and how to collect it,
but I must defer to you guys as to what information can be passed along
and which of that is actually useful.  ;-)

> > >  #endif	/* CONFIG_SCHED_CLASS_EXT */
> > >  
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index 82cada459e5d..fa83e273a648 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -675,7 +675,7 @@ static void synchronize_rcu_expedited_wait(void)
> > >  
> > >  		nbcon_cpu_emergency_exit();
> > >  
> > > -		panic_on_rcu_stall();
> > > +		panic_on_rcu_stall(raw_smp_processor_id());
> > 
> > As Tejun pointed out, this gets you whatever random CPU was running
> > RCU's expedited grace-period processing, which is almost certainly not
> > the stalled CPU.
> > 
> > In addition, in preemptible kernels, it is possible to stall not on a CPU,
> > but on a task that was preempted within an RCU read-side critical section.
> > (It is also possible to stall on some CPUs and on some tasks.)
> > 
> > So suppose that the expedited grace period is stalled not on a CPU, but
> > instead on a task.  What do you want to do in that case?
> > 
> > If you only care about CPUs, then synchronize_rcu_expedited_stall()
> > could return the number of the first stalled CPU that it encountered,
> > or -1 if there were no CPUs doing any stalling.  If you wanted the first
> > CPU, you could capture it within the first rcu_for_each_leaf_node()
> > loop in that function.  Or you could pass in a cpumask_t and have
> > synchronize_rcu_expedited_stall() fill it in, getting all the CPUs.
> > And you could pass in an array (along with its size) to capture the PIDs
> > of any stalled tasks, which synchronize_rcu_expedited_stall() would in
> > turn pass down to rcu_print_task_exp_stall(), which could fill in the
> > PIDs of the first stalled tasks in the ->blkd_tasks lists.
> > 
> > Whatever you collect would then be passed down to panic_on_rcu_stall()
> > and from there to scx_rcu_cpu_stall().
> > 
> > Much depends on exactly what you are trying to accomplish with the
> > additional information.
> 
> So, the goal here is best-effort: when sched_ext is aborted due to an
> RCU stall, record which CPU was involved for the dump header. There is
> no task-based filed in exit_info, so task stalls would naturally fall
> through to -1.
> 
> Would you prefer:
>  (a) pass -1 unconditionally
>  (b) have synchronize_rcu_expedited_stall() return the first stalled CPU
>      (or -1 if only tasks are stalled)?

Or, as Tejun suggests:

  (c) add a cpumask_t.

I have no idea whether or not it makes sense to also add a small fixed
array for PIDs of stalled tasks.  But there can be a very large number
of stalled tasks, so if you choose to capture any of them, it should
be a small fixed number of them.  (In my experience, in practice, there
usually are not very many stalled tasks anyway.)

> > More inline below.
> > 
> > 						Thanx, Paul
> > 
> > >  	}
> > >  }
> > >  
> > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > > index b67532cb8770..172ac08e673d 100644
> > > --- a/kernel/rcu/tree_stall.h
> > > +++ b/kernel/rcu/tree_stall.h
> > > @@ -159,7 +159,7 @@ static int __init check_cpu_stall_init(void)
> > >  early_initcall(check_cpu_stall_init);
> > >  
> > >  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
> > > -static void panic_on_rcu_stall(void)
> > > +static void panic_on_rcu_stall(int stalled_cpu)
> > >  {
> > >  	static int cpu_stall;
> > >  
> > > @@ -167,7 +167,7 @@ static void panic_on_rcu_stall(void)
> > >  	 * Attempt to kick out the BPF scheduler if it's installed and defer
> > >  	 * the panic to give the system a chance to recover.
> > >  	 */
> > > -	if (scx_rcu_cpu_stall())
> > > +	if (scx_rcu_cpu_stall(stalled_cpu))
> > >  		return;
> > >  
> > >  	if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
> > > @@ -631,6 +631,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
> > >  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > >  {
> > >  	int cpu;
> > > +	int first_stalled_cpu = -1;
> > >  	unsigned long flags;
> > >  	unsigned long gpa;
> > >  	unsigned long j;
> > > @@ -660,6 +661,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > >  			for_each_leaf_node_possible_cpu(rnp, cpu)
> > >  				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
> > >  					print_cpu_stall_info(cpu);
> > > +					if (first_stalled_cpu < 0)
> > > +						first_stalled_cpu = cpu;
> > 
> > This gets the first stalled CPU.  Maybe that is also what you want for
> > the expedited stalls?
> > 
> > Again, there might not be any stalled CPUs, but only stalled tasks...
> > 
> 
> Understood. For print_other_cpu_stall() itself, first_stalled_cpu = -1
> when only tasks are stalled seems acceptable since exit_cpu < 0 is
> already handled gracefully.
> 
> Is there a convention I should follow here, or is -1 fine?

Using -1 for "no CPU" is fine, and that convention is already in use,
so you should be good.  Or a cpumask_t with all being zero for no CPUs
stalled.  ;-)

> > >  					ndetected++;
> > >  				}
> > >  		}
> > > @@ -701,7 +704,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > >  
> > >  	nbcon_cpu_emergency_exit();
> > >  
> > > -	panic_on_rcu_stall();
> > > +	panic_on_rcu_stall(first_stalled_cpu);
> > 
> > ...in which case panic_on_rcu_stall() gets a -1.  Which might not be what
> > the user wants to see.
> > 
> > >  	rcu_force_quiescent_state();  /* Kick them all. */
> > >  }
> > > @@ -754,7 +757,7 @@ static void print_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > >  
> > >  	nbcon_cpu_emergency_exit();
> > >  
> > > -	panic_on_rcu_stall();
> > > +	panic_on_rcu_stall(smp_processor_id());
> > 
> > In this case, we do just have the current CPU doing the stall, so this
> > does work fine.  (If there are more CPUs and tasks involved, there
> > could be additional RCU CPU stall warning messages.)
> 
> Good to know, thanks for confirming.
> 
> > 
> > >  	/*
> > >  	 * Attempt to revive the RCU machinery by forcing a context switch.
> > > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > > index 48c65ac8e230..8a0b1662a75a 100644
> > > --- a/kernel/sched/ext.c
> > > +++ b/kernel/sched/ext.c
> > > @@ -5069,6 +5069,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> > >  
> > >  /**
> > >   * handle_lockup - sched_ext common lockup handler
> > > + * @exit_cpu: CPU to record in exit_info; pass the stalled/hung CPU, not current
> > >   * @fmt: format string
> > >   *
> > >   * Called on system stall or lockup condition and initiates abort of sched_ext
> > > @@ -5078,7 +5079,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> > >   * resolve the lockup. %false if sched_ext is not enabled or abort was already
> > >   * initiated by someone else.
> > >   */
> > > -static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > > +static __printf(2, 3) bool handle_lockup(int exit_cpu, const char *fmt, ...)
> > >  {
> > >  	struct scx_sched *sch;
> > >  	va_list args;
> > > @@ -5094,7 +5095,7 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > >  	case SCX_ENABLING:
> > >  	case SCX_ENABLED:
> > >  		va_start(args, fmt);
> > > -		ret = scx_verror(sch, fmt, args);
> > > +		ret = scx_vexit(sch, SCX_EXIT_ERROR, 0, exit_cpu, fmt, args);
> > >  		va_end(args);
> > >  		return ret;
> > >  	default:
> > > @@ -5114,9 +5115,9 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > >   * resolve the reported RCU stall. %false if sched_ext is not enabled or someone
> > >   * else already initiated abort.
> > >   */
> > > -bool scx_rcu_cpu_stall(void)
> > > +bool scx_rcu_cpu_stall(int stalled_cpu)
> > >  {
> > > -	return handle_lockup("RCU CPU stall detected!");
> > > +	return handle_lockup(stalled_cpu, "RCU CPU stall detected!");
> > >  }
> > >  
> > >  /**
> > > @@ -5131,7 +5132,8 @@ bool scx_rcu_cpu_stall(void)
> > >   */
> > >  void scx_softlockup(u32 dur_s)
> > >  {
> > > -	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
> > > +	if (!handle_lockup(smp_processor_id(), "soft lockup - CPU %d stuck for %us",
> > > +			   smp_processor_id(), dur_s))
> > 
> > Repeating "smp_processor_id()" in the argument list seems a bit redundant?
> 
> Will save to a local variable in v3.

Fair enough, though I was wondering whether adding the CPU number as
an argument means that it should be removed from the formatted print.
But maybe you have backwards compatibility constraints?

From your words below, I am guessing compatibility constraints.

> > >  		return;
> > >  
> > >  	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
> > > @@ -5150,7 +5152,7 @@ static void scx_hardlockup_irq_workfn(struct irq_work *work)
> > >  {
> > >  	int cpu = atomic_xchg(&scx_hardlockup_cpu, -1);
> > >  
> > > -	if (cpu >= 0 && handle_lockup("hard lockup - CPU %d", cpu))
> > > +	if (cpu >= 0 && handle_lockup(cpu, "hard lockup - CPU %d", cpu))
> > 
> > Repeating "cpu" in the argument list seems a bit redundant?
> 
> The first occurrence is exit_cpu (struct field), the second is the %d
> format argument for the readable message. BOth serve different purposes
> so both are needed.

Got it, and fair enough!

							Thanx, Paul

> Thanks.
> 
> > >  		printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> > >  				cpu);
> > >  }
> > > diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> > > index b4f5dd28855e..2d04f27cf7c5 100644
> > > --- a/kernel/sched/ext_internal.h
> > > +++ b/kernel/sched/ext_internal.h
> > > @@ -1486,8 +1486,6 @@ __printf(5, 6) bool __scx_exit(struct scx_sched *sch, enum scx_exit_kind kind,
> > >  	__scx_exit(sch, kind, exit_code, raw_smp_processor_id(), fmt, ##args)
> > >  #define scx_error(sch, fmt, args...)						\
> > >  	scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> > > -#define scx_verror(sch, fmt, args)						\
> > > -	scx_vexit((sch), SCX_EXIT_ERROR, 0, raw_smp_processor_id(), fmt, args)
> > >  
> > >  /*
> > >   * Return the rq currently locked from an scx callback, or NULL if no rq is
> > > -- 
> > > 2.48.1
> > > 
> > > 
> 
> -- 
> Cheers,
> Cheng-Yang

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-05  8:34       ` Tejun Heo
@ 2026-05-06 10:18         ` Cheng-Yang Chou
  2026-05-07 21:39           ` Tejun Heo
  0 siblings, 1 reply; 12+ messages in thread
From: Cheng-Yang Chou @ 2026-05-06 10:18 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Paul E. McKenney, sched-ext, David Vernet, Andrea Righi,
	Changwoo Min, rcu, Ching-Chun Huang, Chia-Ping Tsai

Hi Tejun,

On Mon, May 04, 2026 at 10:34:09PM -1000, Tejun Heo wrote:
> On Tue, May 05, 2026 at 04:20:25PM +0800, Cheng-Yang Chou wrote:
> ...
> > > There might be more than one stalled CPU.  You could pass in a cpumask_t
> > > or similar to print all of them, or take the first or the last or
> > > some such.
> > 
> > sched_ext's exit_info carries a single s32 exit_cpu, so we can only
> > record one CPU. Would taking the first stalled CPU be acceptable as a 
> > best-effort hint, or would you prefer we pass -1 and skip the CPU field
> > entirely for the RCU stall paths?
> 
> We carry only one cpu in exit_info but we can easily dump the cpumask and
> prioritize dumping all the cpus in the mask.

To confirm: are you suggesting scx_rcu_cpu_stall() take a cpumask, use
cpumask_first() for exit_cpu, and dump the full mask in the exit message?

Thanks.

-- 
Cheers,
Cheng-Yang

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-05 15:10       ` Paul E. McKenney
@ 2026-05-06 10:54         ` Cheng-Yang Chou
  0 siblings, 0 replies; 12+ messages in thread
From: Cheng-Yang Chou @ 2026-05-06 10:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: sched-ext, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	rcu, Ching-Chun Huang, Chia-Ping Tsai

Hi Paul,

On Tue, May 05, 2026 at 08:10:45AM -0700, Paul E. McKenney wrote:
> On Tue, May 05, 2026 at 04:20:25PM +0800, Cheng-Yang Chou wrote:
> > Hi Paul, Tejun,
> > 
> > On Mon, May 04, 2026 at 08:44:48PM -0700, Paul E. McKenney wrote:
> > > On Tue, May 05, 2026 at 12:08:20AM +0800, Cheng-Yang Chou wrote:
> > > > handle_lockup() uses raw_smp_processor_id() for exit_cpu, which is wrong
> > > > for two paths:
> > > > 
> > > > - scx_hardlockup_irq_workfn() has the hung CPU in a local variable but
> > > >   irq_work may run elsewhere; pass the local cpu explicitly.
> > > > - scx_rcu_cpu_stall() takes no CPU, recording the detector rather than
> > > >   the stalled one; add stalled_cpu to its signature and
> > > >   panic_on_rcu_stall(), threading it from print_cpu_stall() (current
> > > >   CPU) and print_other_cpu_stall() (first detected stalled CPU).
> > > > 
> > > > Add an exit_cpu parameter to handle_lockup() so callers pass the correct
> > > > CPU directly. Remove the now-unused scx_verror() macro.
> > > > 
> > > > Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
> > > > ---
> > > >  include/linux/sched/ext.h   |  4 ++--
> > > >  kernel/rcu/tree_exp.h       |  2 +-
> > > >  kernel/rcu/tree_stall.h     | 11 +++++++----
> > > >  kernel/sched/ext.c          | 14 ++++++++------
> > > >  kernel/sched/ext_internal.h |  2 --
> > > >  5 files changed, 18 insertions(+), 15 deletions(-)
> > > > 
> > > > diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> > > > index d05efcac794d..16bbf24089ca 100644
> > > > --- a/include/linux/sched/ext.h
> > > > +++ b/include/linux/sched/ext.h
> > > > @@ -259,7 +259,7 @@ void sched_ext_dead(struct task_struct *p);
> > > >  void print_scx_info(const char *log_lvl, struct task_struct *p);
> > > >  void scx_softlockup(u32 dur_s);
> > > >  bool scx_hardlockup(int cpu);
> > > > -bool scx_rcu_cpu_stall(void);
> > > > +bool scx_rcu_cpu_stall(int stalled_cpu);
> > > >  
> > > >  #else	/* !CONFIG_SCHED_CLASS_EXT */
> > > >  
> > > > @@ -267,7 +267,7 @@ static inline void sched_ext_dead(struct task_struct *p) {}
> > > >  static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
> > > >  static inline void scx_softlockup(u32 dur_s) {}
> > > >  static inline bool scx_hardlockup(int cpu) { return false; }
> > > > -static inline bool scx_rcu_cpu_stall(void) { return false; }
> > > > +static inline bool scx_rcu_cpu_stall(int stalled_cpu) { return false; }
> > > 
> > > There might be more than one stalled CPU.  You could pass in a cpumask_t
> > > or similar to print all of them, or take the first or the last or
> > > some such.
> > 
> > sched_ext's exit_info carries a single s32 exit_cpu, so we can only
> > record one CPU. Would taking the first stalled CPU be acceptable as a 
> > best-effort hint, or would you prefer we pass -1 and skip the CPU field
> > entirely for the RCU stall paths?
> 
> I can tell you what information is availalble and how to collect it,
> but I must defer to you guys as to what information can be passed along
> and which of that is actually useful.  ;-)

Yeah, waiting for Tejun and the sched_ext folks to weigh in!

> 
> > > >  #endif	/* CONFIG_SCHED_CLASS_EXT */
> > > >  
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index 82cada459e5d..fa83e273a648 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -675,7 +675,7 @@ static void synchronize_rcu_expedited_wait(void)
> > > >  
> > > >  		nbcon_cpu_emergency_exit();
> > > >  
> > > > -		panic_on_rcu_stall();
> > > > +		panic_on_rcu_stall(raw_smp_processor_id());
> > > 
> > > As Tejun pointed out, this gets you whatever random CPU was running
> > > RCU's expedited grace-period processing, which is almost certainly not
> > > the stalled CPU.
> > > 
> > > In addition, in preemptible kernels, it is possible to stall not on a CPU,
> > > but on a task that was preempted within an RCU read-side critical section.
> > > (It is also possible to stall on some CPUs and on some tasks.)
> > > 
> > > So suppose that the expedited grace period is stalled not on a CPU, but
> > > instead on a task.  What do you want to do in that case?
> > > 
> > > If you only care about CPUs, then synchronize_rcu_expedited_stall()
> > > could return the number of the first stalled CPU that it encountered,
> > > or -1 if there were no CPUs doing any stalling.  If you wanted the first
> > > CPU, you could capture it within the first rcu_for_each_leaf_node()
> > > loop in that function.  Or you could pass in a cpumask_t and have
> > > synchronize_rcu_expedited_stall() fill it in, getting all the CPUs.
> > > And you could pass in an array (along with its size) to capture the PIDs
> > > of any stalled tasks, which synchronize_rcu_expedited_stall() would in
> > > turn pass down to rcu_print_task_exp_stall(), which could fill in the
> > > PIDs of the first stalled tasks in the ->blkd_tasks lists.
> > > 
> > > Whatever you collect would then be passed down to panic_on_rcu_stall()
> > > and from there to scx_rcu_cpu_stall().
> > > 
> > > Much depends on exactly what you are trying to accomplish with the
> > > additional information.
> > 
> > So, the goal here is best-effort: when sched_ext is aborted due to an
> > RCU stall, record which CPU was involved for the dump header. There is
> > no task-based filed in exit_info, so task stalls would naturally fall
> > through to -1.
> > 
> > Would you prefer:
> >  (a) pass -1 unconditionally
> >  (b) have synchronize_rcu_expedited_stall() return the first stalled CPU
> >      (or -1 if only tasks are stalled)?
> 
> Or, as Tejun suggests:
> 
>   (c) add a cpumask_t.

(c) sounds good!

> 
> I have no idea whether or not it makes sense to also add a small fixed
> array for PIDs of stalled tasks.  But there can be a very large number
> of stalled tasks, so if you choose to capture any of them, it should
> be a small fixed number of them.  (In my experience, in practice, there
> usually are not very many stalled tasks anyway.)
> 

I'd like to skip task PIDs for now, out of scope for this patch.

> > > More inline below.
> > > 
> > > 						Thanx, Paul
> > > 
> > > >  	}
> > > >  }
> > > >  
> > > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > > > index b67532cb8770..172ac08e673d 100644
> > > > --- a/kernel/rcu/tree_stall.h
> > > > +++ b/kernel/rcu/tree_stall.h
> > > > @@ -159,7 +159,7 @@ static int __init check_cpu_stall_init(void)
> > > >  early_initcall(check_cpu_stall_init);
> > > >  
> > > >  /* If so specified via sysctl, panic, yielding cleaner stall-warning output. */
> > > > -static void panic_on_rcu_stall(void)
> > > > +static void panic_on_rcu_stall(int stalled_cpu)
> > > >  {
> > > >  	static int cpu_stall;
> > > >  
> > > > @@ -167,7 +167,7 @@ static void panic_on_rcu_stall(void)
> > > >  	 * Attempt to kick out the BPF scheduler if it's installed and defer
> > > >  	 * the panic to give the system a chance to recover.
> > > >  	 */
> > > > -	if (scx_rcu_cpu_stall())
> > > > +	if (scx_rcu_cpu_stall(stalled_cpu))
> > > >  		return;
> > > >  
> > > >  	if (++cpu_stall < sysctl_max_rcu_stall_to_panic)
> > > > @@ -631,6 +631,7 @@ static void rcu_check_gp_kthread_expired_fqs_timer(void)
> > > >  static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > > >  {
> > > >  	int cpu;
> > > > +	int first_stalled_cpu = -1;
> > > >  	unsigned long flags;
> > > >  	unsigned long gpa;
> > > >  	unsigned long j;
> > > > @@ -660,6 +661,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > > >  			for_each_leaf_node_possible_cpu(rnp, cpu)
> > > >  				if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
> > > >  					print_cpu_stall_info(cpu);
> > > > +					if (first_stalled_cpu < 0)
> > > > +						first_stalled_cpu = cpu;
> > > 
> > > This gets the first stalled CPU.  Maybe that is also what you want for
> > > the expedited stalls?
> > > 
> > > Again, there might not be any stalled CPUs, but only stalled tasks...
> > > 
> > 
> > Understood. For print_other_cpu_stall() itself, first_stalled_cpu = -1
> > when only tasks are stalled seems acceptable since exit_cpu < 0 is
> > already handled gracefully.
> > 
> > Is there a convention I should follow here, or is -1 fine?
> 
> Using -1 for "no CPU" is fine, and that convention is already in use,
> so you should be good.  Or a cpumask_t with all being zero for no CPUs
> stalled.  ;-)
> 

Thanks for confirming, and yes, cpumask_t also sounds good!

> > > >  					ndetected++;
> > > >  				}
> > > >  		}
> > > > @@ -701,7 +704,7 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > > >  
> > > >  	nbcon_cpu_emergency_exit();
> > > >  
> > > > -	panic_on_rcu_stall();
> > > > +	panic_on_rcu_stall(first_stalled_cpu);
> > > 
> > > ...in which case panic_on_rcu_stall() gets a -1.  Which might not be what
> > > the user wants to see.

Sorry for missing this. With cpumask_t, an empty mask covers that case
more cleanly.

> > > 
> > > >  	rcu_force_quiescent_state();  /* Kick them all. */
> > > >  }
> > > > @@ -754,7 +757,7 @@ static void print_cpu_stall(unsigned long gp_seq, unsigned long gps)
> > > >  
> > > >  	nbcon_cpu_emergency_exit();
> > > >  
> > > > -	panic_on_rcu_stall();
> > > > +	panic_on_rcu_stall(smp_processor_id());
> > > 
> > > In this case, we do just have the current CPU doing the stall, so this
> > > does work fine.  (If there are more CPUs and tasks involved, there
> > > could be additional RCU CPU stall warning messages.)
> > 
> > Good to know, thanks for confirming.
> > 
> > > 
> > > >  	/*
> > > >  	 * Attempt to revive the RCU machinery by forcing a context switch.
> > > > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > > > index 48c65ac8e230..8a0b1662a75a 100644
> > > > --- a/kernel/sched/ext.c
> > > > +++ b/kernel/sched/ext.c
> > > > @@ -5069,6 +5069,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> > > >  
> > > >  /**
> > > >   * handle_lockup - sched_ext common lockup handler
> > > > + * @exit_cpu: CPU to record in exit_info; pass the stalled/hung CPU, not current
> > > >   * @fmt: format string
> > > >   *
> > > >   * Called on system stall or lockup condition and initiates abort of sched_ext
> > > > @@ -5078,7 +5079,7 @@ bool scx_allow_ttwu_queue(const struct task_struct *p)
> > > >   * resolve the lockup. %false if sched_ext is not enabled or abort was already
> > > >   * initiated by someone else.
> > > >   */
> > > > -static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > > > +static __printf(2, 3) bool handle_lockup(int exit_cpu, const char *fmt, ...)
> > > >  {
> > > >  	struct scx_sched *sch;
> > > >  	va_list args;
> > > > @@ -5094,7 +5095,7 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > > >  	case SCX_ENABLING:
> > > >  	case SCX_ENABLED:
> > > >  		va_start(args, fmt);
> > > > -		ret = scx_verror(sch, fmt, args);
> > > > +		ret = scx_vexit(sch, SCX_EXIT_ERROR, 0, exit_cpu, fmt, args);
> > > >  		va_end(args);
> > > >  		return ret;
> > > >  	default:
> > > > @@ -5114,9 +5115,9 @@ static __printf(1, 2) bool handle_lockup(const char *fmt, ...)
> > > >   * resolve the reported RCU stall. %false if sched_ext is not enabled or someone
> > > >   * else already initiated abort.
> > > >   */
> > > > -bool scx_rcu_cpu_stall(void)
> > > > +bool scx_rcu_cpu_stall(int stalled_cpu)
> > > >  {
> > > > -	return handle_lockup("RCU CPU stall detected!");
> > > > +	return handle_lockup(stalled_cpu, "RCU CPU stall detected!");
> > > >  }
> > > >  
> > > >  /**
> > > > @@ -5131,7 +5132,8 @@ bool scx_rcu_cpu_stall(void)
> > > >   */
> > > >  void scx_softlockup(u32 dur_s)
> > > >  {
> > > > -	if (!handle_lockup("soft lockup - CPU %d stuck for %us", smp_processor_id(), dur_s))
> > > > +	if (!handle_lockup(smp_processor_id(), "soft lockup - CPU %d stuck for %us",
> > > > +			   smp_processor_id(), dur_s))
> > > 
> > > Repeating "smp_processor_id()" in the argument list seems a bit redundant?
> > 
> > Will save to a local variable in v3.
> 
> Fair enough, though I was wondering whether adding the CPU number as
> an argument means that it should be removed from the formatted print.
> But maybe you have backwards compatibility constraints?
> 
> From your words below, I am guessing compatibility constraints.
> 

Yes, though not because of backwards compatibility. Same reason as
below, I prefer keeping as-is for readability.

> > > >  		return;
> > > >  
> > > >  	printk_deferred(KERN_ERR "sched_ext: Soft lockup - CPU %d stuck for %us, disabling BPF scheduler\n",
> > > > @@ -5150,7 +5152,7 @@ static void scx_hardlockup_irq_workfn(struct irq_work *work)
> > > >  {
> > > >  	int cpu = atomic_xchg(&scx_hardlockup_cpu, -1);
> > > >  
> > > > -	if (cpu >= 0 && handle_lockup("hard lockup - CPU %d", cpu))
> > > > +	if (cpu >= 0 && handle_lockup(cpu, "hard lockup - CPU %d", cpu))
> > > 
> > > Repeating "cpu" in the argument list seems a bit redundant?
> > 
> > The first occurrence is exit_cpu (struct field), the second is the %d
> > format argument for the readable message. BOth serve different purposes
> > so both are needed.
> 
> Got it, and fair enough!

Thanks for the review!

> 
> 							Thanx, Paul
> 
> > Thanks.
> > 
> > > >  		printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> > > >  				cpu);
> > > >  }
> > > > diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> > > > index b4f5dd28855e..2d04f27cf7c5 100644
> > > > --- a/kernel/sched/ext_internal.h
> > > > +++ b/kernel/sched/ext_internal.h
> > > > @@ -1486,8 +1486,6 @@ __printf(5, 6) bool __scx_exit(struct scx_sched *sch, enum scx_exit_kind kind,
> > > >  	__scx_exit(sch, kind, exit_code, raw_smp_processor_id(), fmt, ##args)
> > > >  #define scx_error(sch, fmt, args...)						\
> > > >  	scx_exit((sch), SCX_EXIT_ERROR, 0, fmt, ##args)
> > > > -#define scx_verror(sch, fmt, args)						\
> > > > -	scx_vexit((sch), SCX_EXIT_ERROR, 0, raw_smp_processor_id(), fmt, args)
> > > >  
> > > >  /*
> > > >   * Return the rq currently locked from an scx callback, or NULL if no rq is

-- 
Cheers,
Cheng-Yang

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths
  2026-05-06 10:18         ` Cheng-Yang Chou
@ 2026-05-07 21:39           ` Tejun Heo
  0 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2026-05-07 21:39 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: Paul E. McKenney, sched-ext, David Vernet, Andrea Righi,
	Changwoo Min, rcu, Ching-Chun Huang, Chia-Ping Tsai

On Wed, May 06, 2026 at 06:18:11PM +0800, Cheng-Yang Chou wrote:
> Hi Tejun,
> 
> On Mon, May 04, 2026 at 10:34:09PM -1000, Tejun Heo wrote:
> > On Tue, May 05, 2026 at 04:20:25PM +0800, Cheng-Yang Chou wrote:
> > We carry only one cpu in exit_info but we can easily dump the cpumask and
> > prioritize dumping all the cpus in the mask.
> 
> To confirm: are you suggesting scx_rcu_cpu_stall() take a cpumask, use
> cpumask_first() for exit_cpu, and dump the full mask in the exit message?

Yes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-05-07 21:39 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-04 16:08 [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Cheng-Yang Chou
2026-05-04 16:08 ` [PATCH v2 1/2] sched_ext: Normalize exit dump header to "on CPU N" Cheng-Yang Chou
2026-05-04 16:08 ` [PATCH v2 2/2] sched_ext: Fix exit_cpu accuracy for lockup paths Cheng-Yang Chou
2026-05-05  0:59   ` Tejun Heo
2026-05-05  3:44   ` Paul E. McKenney
2026-05-05  8:20     ` Cheng-Yang Chou
2026-05-05  8:34       ` Tejun Heo
2026-05-06 10:18         ` Cheng-Yang Chou
2026-05-07 21:39           ` Tejun Heo
2026-05-05 15:10       ` Paul E. McKenney
2026-05-06 10:54         ` Cheng-Yang Chou
2026-05-04 21:05 ` [PATCH v2 sched_ext/for-7.2 0/2] sched_ext: Follow-up fixes for exit_cpu accuracy Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox