Linux RCU subsystem development
* [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu
@ 2026-04-17  1:38 Jiazi Li
  2026-04-21 23:20 ` Paul E. McKenney
  0 siblings, 1 reply; 3+ messages in thread
From: Jiazi Li @ 2026-04-17  1:38 UTC
  To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
  Cc: Jiazi Li, rcu, mingzhu.wang

sched_show_task() cannot dump a useful backtrace of a task on the
blkd_tasks list that is currently running on another CPU:
[117421.286553][    C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[117421.286579][    C0] rcu:    Tasks blocked on level-0 rcu_node (CPUs 0-7): P2280
[117421.286595][    C0] rcu:    (detected by 0, t=5252 jiffies, g=751845, q=66318 ncpus=8)
[117421.286604][    C0] task:android.imms2   state:R  running task     stack:0     ...
[117421.286617][    C0] Call trace:
[117421.286622][    C0]  __switch_to+0x1a0/0x318
[117421.286636][    C0]  0x0

So use an NMI backtrace for such tasks instead:
[  390.584143] rcub/0: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  390.585156] rcub/0: rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-7): P6816
[  390.586207] rcub/0: rcu:     (detected by 5, t=52532 jiffies, g=7405, q=63942 ncpus=8)
[  390.587320] rcub/0: Sending NMI from CPU 5 to CPUs 4:
[  390.588111] rcu_stall_threa: NMI backtrace for cpu 4
[  390.588116] rcu_stall_threa: CPU: 4 UID: 0 PID: 6816 Comm: rcu_stall_threa Tainted: P...
[  390.588120] rcu_stall_threa: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
[  390.588122] rcu_stall_threa: Hardware name: MT6858 (DT)
[  390.588123] rcu_stall_threa: pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  390.588125] rcu_stall_threa: pc : _raw_spin_unlock_irqrestore+0x1c/0x44
[  390.588131] rcu_stall_threa: lr : ___ratelimit+0xd4/0x110
[  390.588134] rcu_stall_threa: sp : ffffffc08464bdf0
[  390.588135] rcu_stall_threa: x29: ffffffc08464bdf0 x28: 0000000000000000 x27: 0000000000000000
[  390.588138] rcu_stall_threa: x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000004e2
[  390.588140] rcu_stall_threa: x23: ffffffd82ae77000 x22: ffffffd82af1fae8 x21: 000000000000000a
[  390.588142] rcu_stall_threa: x20: 0000000000000000 x19: 0000000000000000 x18: ffffffc08456d020
[  390.588144] rcu_stall_threa: x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000010
[  390.588146] rcu_stall_threa: x14: 0000000000000100 x13: ffffffc084648000 x12: ffffffc08464c000
[  390.588148] rcu_stall_threa: x11: 5e2da9f91a08d800 x10: ffffffd8299b39fc x9 : 0000000100005874
[  390.588150] rcu_stall_threa: x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffebea2b0a0
[  390.588152] rcu_stall_threa: x5 : 0000000000000000 x4 : 0000000000000402 x3 : 0000000000000000
[  390.588154] rcu_stall_threa: x2 : ffffff81ca8d9680 x1 : 0000000000000000 x0 : 0000000000000001
[  390.588156] rcu_stall_threa: Call trace:
[  390.588157] rcu_stall_threa:  _raw_spin_unlock_irqrestore+0x1c/0x44
[  390.588159] rcu_stall_threa:  ___ratelimit+0xd4/0x110
[  390.588161] rcu_stall_threa:  rcu_thread_func+0x90/0xa8
[  390.588164] rcu_stall_threa:  kthread+0x110/0x1a4
[  390.588167] rcu_stall_threa:  ret_from_fork+0x10/0x20

Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
Tested-by: mingzhu.wang <mingzhu.wang@transsion.com>
---
 kernel/rcu/tree_stall.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b67532cb8770..5806f9a43579 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -289,7 +289,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
 		 * Avoid triggering hard lockup.
 		 */
 		touch_nmi_watchdog();
-		sched_show_task(t);
+		if (unlikely(t->on_cpu && t != current) &&
+				trigger_single_cpu_backtrace(task_cpu(t))) {
+			/*Successfully triggered remote backtrace*/
+		} else {
+			sched_show_task(t);
+		}
 	}
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
-- 
2.49.0



* Re: [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu
  2026-04-17  1:38 [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu Jiazi Li
@ 2026-04-21 23:20 ` Paul E. McKenney
  2026-04-22  6:45   ` Jiazi Li
  0 siblings, 1 reply; 3+ messages in thread
From: Paul E. McKenney @ 2026-04-21 23:20 UTC
  To: Jiazi Li
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, rcu, mingzhu.wang

On Fri, Apr 17, 2026 at 09:38:13AM +0800, Jiazi Li wrote:
> sched_show_task cannot dump backtrace of blkd-task running on other
> cpu:
> [117421.286553][    C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [117421.286579][    C0] rcu:    Tasks blocked on level-0 rcu_node (CPUs 0-7): P2280
> [117421.286595][    C0] rcu:    (detected by 0, t=5252 jiffies, g=751845, q=66318 ncpus=8)
> [117421.286604][    C0] task:android.imms2   state:R  running task     stack:0     ...
> [117421.286617][    C0] Call trace:
> [117421.286622][    C0]  __switch_to+0x1a0/0x318
> [117421.286636][    C0]  0x0
> 
> So use NMI to dump backtrace:
> [  390.584143] rcub/0: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  390.585156] rcub/0: rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-7): P6816
> [  390.586207] rcub/0: rcu:     (detected by 5, t=52532 jiffies, g=7405, q=63942 ncpus=8)
> [  390.587320] rcub/0: Sending NMI from CPU 5 to CPUs 4:
> [  390.588111] rcu_stall_threa: NMI backtrace for cpu 4
> [  390.588116] rcu_stall_threa: CPU: 4 UID: 0 PID: 6816 Comm: rcu_stall_threa Tainted: P...
> [  390.588120] rcu_stall_threa: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
> [  390.588122] rcu_stall_threa: Hardware name: MT6858 (DT)
> [  390.588123] rcu_stall_threa: pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  390.588125] rcu_stall_threa: pc : _raw_spin_unlock_irqrestore+0x1c/0x44
> [  390.588131] rcu_stall_threa: lr : ___ratelimit+0xd4/0x110
> [  390.588134] rcu_stall_threa: sp : ffffffc08464bdf0
> [  390.588135] rcu_stall_threa: x29: ffffffc08464bdf0 x28: 0000000000000000 x27: 0000000000000000
> [  390.588138] rcu_stall_threa: x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000004e2
> [  390.588140] rcu_stall_threa: x23: ffffffd82ae77000 x22: ffffffd82af1fae8 x21: 000000000000000a
> [  390.588142] rcu_stall_threa: x20: 0000000000000000 x19: 0000000000000000 x18: ffffffc08456d020
> [  390.588144] rcu_stall_threa: x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000010
> [  390.588146] rcu_stall_threa: x14: 0000000000000100 x13: ffffffc084648000 x12: ffffffc08464c000
> [  390.588148] rcu_stall_threa: x11: 5e2da9f91a08d800 x10: ffffffd8299b39fc x9 : 0000000100005874
> [  390.588150] rcu_stall_threa: x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffebea2b0a0
> [  390.588152] rcu_stall_threa: x5 : 0000000000000000 x4 : 0000000000000402 x3 : 0000000000000000
> [  390.588154] rcu_stall_threa: x2 : ffffff81ca8d9680 x1 : 0000000000000000 x0 : 0000000000000001
> [  390.588156] rcu_stall_threa: Call trace:
> [  390.588157] rcu_stall_threa:  _raw_spin_unlock_irqrestore+0x1c/0x44
> [  390.588159] rcu_stall_threa:  ___ratelimit+0xd4/0x110
> [  390.588161] rcu_stall_threa:  rcu_thread_func+0x90/0xa8
> [  390.588164] rcu_stall_threa:  kthread+0x110/0x1a4
> [  390.588167] rcu_stall_threa:  ret_from_fork+0x10/0x20
> 
> Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
> Tested-by: mingzhu.wang <mingzhu.wang@transsion.com>

This looks like an arm64 stack trace.  Are there any arm64 systems in
production that do real NMIs?  (Don't get me wrong, it would be nice if
there are!)

> ---
>  kernel/rcu/tree_stall.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b67532cb8770..5806f9a43579 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -289,7 +289,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
>  		 * Avoid triggering hard lockup.
>  		 */
>  		touch_nmi_watchdog();
> -		sched_show_task(t);
> +		if (unlikely(t->on_cpu && t != current) &&

What if task t blocks or migrates to some other CPU at this point?

> +				trigger_single_cpu_backtrace(task_cpu(t))) {
> +			/*Successfully triggered remote backtrace*/

Wouldn't inverting the condition save a couple of lines of code here?
And make it a bit more straightforward?

							Thanx, Paul

> +		} else {
> +			sched_show_task(t);
> +		}
>  	}
>  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  }
> -- 
> 2.49.0
> 


* Re: [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu
  2026-04-21 23:20 ` Paul E. McKenney
@ 2026-04-22  6:45   ` Jiazi Li
  0 siblings, 0 replies; 3+ messages in thread
From: Jiazi Li @ 2026-04-22  6:45 UTC
  To: Paul E. McKenney
  Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
	Josh Triplett, Boqun Feng, Uladzislau Rezki, rcu, mingzhu.wang

On Tue, Apr 21, 2026 at 04:20:54PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 17, 2026 at 09:38:13AM +0800, Jiazi Li wrote:
> > sched_show_task cannot dump backtrace of blkd-task running on other
> > cpu:
> > [117421.286553][    C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [117421.286579][    C0] rcu:    Tasks blocked on level-0 rcu_node (CPUs 0-7): P2280
> > [117421.286595][    C0] rcu:    (detected by 0, t=5252 jiffies, g=751845, q=66318 ncpus=8)
> > [117421.286604][    C0] task:android.imms2   state:R  running task     stack:0     ...
> > [117421.286617][    C0] Call trace:
> > [117421.286622][    C0]  __switch_to+0x1a0/0x318
> > [117421.286636][    C0]  0x0
> > 
> > So use NMI to dump backtrace:
> > [  390.584143] rcub/0: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  390.585156] rcub/0: rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-7): P6816
> > [  390.586207] rcub/0: rcu:     (detected by 5, t=52532 jiffies, g=7405, q=63942 ncpus=8)
> > [  390.587320] rcub/0: Sending NMI from CPU 5 to CPUs 4:
> > [  390.588111] rcu_stall_threa: NMI backtrace for cpu 4
> > [  390.588116] rcu_stall_threa: CPU: 4 UID: 0 PID: 6816 Comm: rcu_stall_threa Tainted: P...
> > [  390.588120] rcu_stall_threa: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
> > [  390.588122] rcu_stall_threa: Hardware name: MT6858 (DT)
> > [  390.588123] rcu_stall_threa: pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [  390.588125] rcu_stall_threa: pc : _raw_spin_unlock_irqrestore+0x1c/0x44
> > [  390.588131] rcu_stall_threa: lr : ___ratelimit+0xd4/0x110
> > [  390.588134] rcu_stall_threa: sp : ffffffc08464bdf0
> > [  390.588135] rcu_stall_threa: x29: ffffffc08464bdf0 x28: 0000000000000000 x27: 0000000000000000
> > [  390.588138] rcu_stall_threa: x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000004e2
> > [  390.588140] rcu_stall_threa: x23: ffffffd82ae77000 x22: ffffffd82af1fae8 x21: 000000000000000a
> > [  390.588142] rcu_stall_threa: x20: 0000000000000000 x19: 0000000000000000 x18: ffffffc08456d020
> > [  390.588144] rcu_stall_threa: x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000010
> > [  390.588146] rcu_stall_threa: x14: 0000000000000100 x13: ffffffc084648000 x12: ffffffc08464c000
> > [  390.588148] rcu_stall_threa: x11: 5e2da9f91a08d800 x10: ffffffd8299b39fc x9 : 0000000100005874
> > [  390.588150] rcu_stall_threa: x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffebea2b0a0
> > [  390.588152] rcu_stall_threa: x5 : 0000000000000000 x4 : 0000000000000402 x3 : 0000000000000000
> > [  390.588154] rcu_stall_threa: x2 : ffffff81ca8d9680 x1 : 0000000000000000 x0 : 0000000000000001
> > [  390.588156] rcu_stall_threa: Call trace:
> > [  390.588157] rcu_stall_threa:  _raw_spin_unlock_irqrestore+0x1c/0x44
> > [  390.588159] rcu_stall_threa:  ___ratelimit+0xd4/0x110
> > [  390.588161] rcu_stall_threa:  rcu_thread_func+0x90/0xa8
> > [  390.588164] rcu_stall_threa:  kthread+0x110/0x1a4
> > [  390.588167] rcu_stall_threa:  ret_from_fork+0x10/0x20
> > 
> > Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
> > Tested-by: mingzhu.wang <mingzhu.wang@transsion.com>
> 
> This looks like an arm64 stack trace.  Are there any arm64 systems in
> production that do real NMIs?  (Don't get me wrong, it would be nice if
> there are!)
> 
Since commit 331a1b3a836c ("arm64: smp: Add arch support for backtrace
using pseudo-NMI"), arm64 supports this through pseudo-NMI, which is
actually an IPI.

> > ---
> >  kernel/rcu/tree_stall.h | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index b67532cb8770..5806f9a43579 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -289,7 +289,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
> >  		 * Avoid triggering hard lockup.
> >  		 */
> >  		touch_nmi_watchdog();
> > -		sched_show_task(t);
> > +		if (unlikely(t->on_cpu && t != current) &&
> 
> What if task t blocks or migrates to some other CPU at this point?
> 
Yes, that's indeed a concern. We can identify such scenarios by checking
whether the PID reported by RCU matches the PID captured in the NMI
backtrace.
Do you have any suggestions?
> > +				trigger_single_cpu_backtrace(task_cpu(t))) {
> > +			/*Successfully triggered remote backtrace*/
> 
> Wouldn't inverting the condition save a couple of lines of code here?
> And make it a bit more straightforward?
> 
> 							Thanx, Paul
> 
Do you mean something like the following code?
		if (!unlikely(t->on_cpu && t != current) ||
			!trigger_single_cpu_backtrace(task_cpu(t)))
			sched_show_task(t);

> > +		} else {
> > +			sched_show_task(t);
> > +		}
> >  	}
> >  	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> >  }
> > -- 
> > 2.49.0
> > 


end of thread (~2026-04-22  6:46 UTC)

Thread overview: 3+ messages
2026-04-17  1:38 [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu Jiazi Li
2026-04-21 23:20 ` Paul E. McKenney
2026-04-22  6:45   ` Jiazi Li
