* [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu
From: Jiazi Li @ 2026-04-17 1:38 UTC
To: Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay,
Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki
Cc: Jiazi Li, rcu, mingzhu.wang
sched_show_task cannot dump backtrace of blkd-task running on other
cpu:
[117421.286553][ C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[117421.286579][ C0] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P2280
[117421.286595][ C0] rcu: (detected by 0, t=5252 jiffies, g=751845, q=66318 ncpus=8)
[117421.286604][ C0] task:android.imms2 state:R running task stack:0 ...
[117421.286617][ C0] Call trace:
[117421.286622][ C0] __switch_to+0x1a0/0x318
[117421.286636][ C0] 0x0
So use NMI to dump backtrace:
[ 390.584143] rcub/0: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 390.585156] rcub/0: rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P6816
[ 390.586207] rcub/0: rcu: (detected by 5, t=52532 jiffies, g=7405, q=63942 ncpus=8)
[ 390.587320] rcub/0: Sending NMI from CPU 5 to CPUs 4:
[ 390.588111] rcu_stall_threa: NMI backtrace for cpu 4
[ 390.588116] rcu_stall_threa: CPU: 4 UID: 0 PID: 6816 Comm: rcu_stall_threa Tainted: P...
[ 390.588120] rcu_stall_threa: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
[ 390.588122] rcu_stall_threa: Hardware name: MT6858 (DT)
[ 390.588123] rcu_stall_threa: pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 390.588125] rcu_stall_threa: pc : _raw_spin_unlock_irqrestore+0x1c/0x44
[ 390.588131] rcu_stall_threa: lr : ___ratelimit+0xd4/0x110
[ 390.588134] rcu_stall_threa: sp : ffffffc08464bdf0
[ 390.588135] rcu_stall_threa: x29: ffffffc08464bdf0 x28: 0000000000000000 x27: 0000000000000000
[ 390.588138] rcu_stall_threa: x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000004e2
[ 390.588140] rcu_stall_threa: x23: ffffffd82ae77000 x22: ffffffd82af1fae8 x21: 000000000000000a
[ 390.588142] rcu_stall_threa: x20: 0000000000000000 x19: 0000000000000000 x18: ffffffc08456d020
[ 390.588144] rcu_stall_threa: x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000010
[ 390.588146] rcu_stall_threa: x14: 0000000000000100 x13: ffffffc084648000 x12: ffffffc08464c000
[ 390.588148] rcu_stall_threa: x11: 5e2da9f91a08d800 x10: ffffffd8299b39fc x9 : 0000000100005874
[ 390.588150] rcu_stall_threa: x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffebea2b0a0
[ 390.588152] rcu_stall_threa: x5 : 0000000000000000 x4 : 0000000000000402 x3 : 0000000000000000
[ 390.588154] rcu_stall_threa: x2 : ffffff81ca8d9680 x1 : 0000000000000000 x0 : 0000000000000001
[ 390.588156] rcu_stall_threa: Call trace:
[ 390.588157] rcu_stall_threa: _raw_spin_unlock_irqrestore+0x1c/0x44
[ 390.588159] rcu_stall_threa: ___ratelimit+0xd4/0x110
[ 390.588161] rcu_stall_threa: rcu_thread_func+0x90/0xa8
[ 390.588164] rcu_stall_threa: kthread+0x110/0x1a4
[ 390.588167] rcu_stall_threa: ret_from_fork+0x10/0x20
Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
Tested-by: mingzhu.wang <mingzhu.wang@transsion.com>
---
 kernel/rcu/tree_stall.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index b67532cb8770..5806f9a43579 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -289,7 +289,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
 		 * Avoid triggering hard lockup.
 		 */
 		touch_nmi_watchdog();
-		sched_show_task(t);
+		if (unlikely(t->on_cpu && t != current) &&
+		    trigger_single_cpu_backtrace(task_cpu(t))) {
+			/*Successfully triggered remote backtrace*/
+		} else {
+			sched_show_task(t);
+		}
 	}
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 }
--
2.49.0
* Re: [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu
From: Paul E. McKenney @ 2026-04-21 23:20 UTC
To: Jiazi Li
Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, rcu, mingzhu.wang
On Fri, Apr 17, 2026 at 09:38:13AM +0800, Jiazi Li wrote:
> sched_show_task cannot dump backtrace of blkd-task running on other
> cpu:
> [117421.286553][ C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [117421.286579][ C0] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P2280
> [117421.286595][ C0] rcu: (detected by 0, t=5252 jiffies, g=751845, q=66318 ncpus=8)
> [117421.286604][ C0] task:android.imms2 state:R running task stack:0 ...
> [117421.286617][ C0] Call trace:
> [117421.286622][ C0] __switch_to+0x1a0/0x318
> [117421.286636][ C0] 0x0
>
> So use NMI to dump backtrace:
> [ 390.584143] rcub/0: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 390.585156] rcub/0: rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P6816
> [ 390.586207] rcub/0: rcu: (detected by 5, t=52532 jiffies, g=7405, q=63942 ncpus=8)
> [ 390.587320] rcub/0: Sending NMI from CPU 5 to CPUs 4:
> [ 390.588111] rcu_stall_threa: NMI backtrace for cpu 4
> [ 390.588116] rcu_stall_threa: CPU: 4 UID: 0 PID: 6816 Comm: rcu_stall_threa Tainted: P...
> [ 390.588120] rcu_stall_threa: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
> [ 390.588122] rcu_stall_threa: Hardware name: MT6858 (DT)
> [ 390.588123] rcu_stall_threa: pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 390.588125] rcu_stall_threa: pc : _raw_spin_unlock_irqrestore+0x1c/0x44
> [ 390.588131] rcu_stall_threa: lr : ___ratelimit+0xd4/0x110
> [ 390.588134] rcu_stall_threa: sp : ffffffc08464bdf0
> [ 390.588135] rcu_stall_threa: x29: ffffffc08464bdf0 x28: 0000000000000000 x27: 0000000000000000
> [ 390.588138] rcu_stall_threa: x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000004e2
> [ 390.588140] rcu_stall_threa: x23: ffffffd82ae77000 x22: ffffffd82af1fae8 x21: 000000000000000a
> [ 390.588142] rcu_stall_threa: x20: 0000000000000000 x19: 0000000000000000 x18: ffffffc08456d020
> [ 390.588144] rcu_stall_threa: x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000010
> [ 390.588146] rcu_stall_threa: x14: 0000000000000100 x13: ffffffc084648000 x12: ffffffc08464c000
> [ 390.588148] rcu_stall_threa: x11: 5e2da9f91a08d800 x10: ffffffd8299b39fc x9 : 0000000100005874
> [ 390.588150] rcu_stall_threa: x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffebea2b0a0
> [ 390.588152] rcu_stall_threa: x5 : 0000000000000000 x4 : 0000000000000402 x3 : 0000000000000000
> [ 390.588154] rcu_stall_threa: x2 : ffffff81ca8d9680 x1 : 0000000000000000 x0 : 0000000000000001
> [ 390.588156] rcu_stall_threa: Call trace:
> [ 390.588157] rcu_stall_threa: _raw_spin_unlock_irqrestore+0x1c/0x44
> [ 390.588159] rcu_stall_threa: ___ratelimit+0xd4/0x110
> [ 390.588161] rcu_stall_threa: rcu_thread_func+0x90/0xa8
> [ 390.588164] rcu_stall_threa: kthread+0x110/0x1a4
> [ 390.588167] rcu_stall_threa: ret_from_fork+0x10/0x20
>
> Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
> Tested-by: mingzhu.wang <mingzhu.wang@transsion.com>

This looks like an arm64 stack trace. Are there any arm64 systems in
production that do real NMIs? (Don't get me wrong, it would be nice if
there are!)

> ---
> kernel/rcu/tree_stall.h | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index b67532cb8770..5806f9a43579 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -289,7 +289,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
> * Avoid triggering hard lockup.
> */
> touch_nmi_watchdog();
> - sched_show_task(t);
> + if (unlikely(t->on_cpu && t != current) &&

What if task t blocks or migrates to some other CPU at this point?

> + trigger_single_cpu_backtrace(task_cpu(t))) {
> + /*Successfully triggered remote backtrace*/

Wouldn't inverting the condition save a couple of lines of code here?
And make it a bit more straightforward?

							Thanx, Paul

> + } else {
> + sched_show_task(t);
> + }
> }
> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> }
> --
> 2.49.0
>
* Re: [PATCH] rcu: use NMI to dump backtrace of blkd-task running on other cpu
From: Jiazi Li @ 2026-04-22 6:45 UTC
To: Paul E. McKenney
Cc: Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes,
Josh Triplett, Boqun Feng, Uladzislau Rezki, rcu, mingzhu.wang
On Tue, Apr 21, 2026 at 04:20:54PM -0700, Paul E. McKenney wrote:
> On Fri, Apr 17, 2026 at 09:38:13AM +0800, Jiazi Li wrote:
> > sched_show_task cannot dump backtrace of blkd-task running on other
> > cpu:
> > [117421.286553][ C0] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [117421.286579][ C0] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P2280
> > [117421.286595][ C0] rcu: (detected by 0, t=5252 jiffies, g=751845, q=66318 ncpus=8)
> > [117421.286604][ C0] task:android.imms2 state:R running task stack:0 ...
> > [117421.286617][ C0] Call trace:
> > [117421.286622][ C0] __switch_to+0x1a0/0x318
> > [117421.286636][ C0] 0x0
> >
> > So use NMI to dump backtrace:
> > [ 390.584143] rcub/0: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [ 390.585156] rcub/0: rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P6816
> > [ 390.586207] rcub/0: rcu: (detected by 5, t=52532 jiffies, g=7405, q=63942 ncpus=8)
> > [ 390.587320] rcub/0: Sending NMI from CPU 5 to CPUs 4:
> > [ 390.588111] rcu_stall_threa: NMI backtrace for cpu 4
> > [ 390.588116] rcu_stall_threa: CPU: 4 UID: 0 PID: 6816 Comm: rcu_stall_threa Tainted: P...
> > [ 390.588120] rcu_stall_threa: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE
> > [ 390.588122] rcu_stall_threa: Hardware name: MT6858 (DT)
> > [ 390.588123] rcu_stall_threa: pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 390.588125] rcu_stall_threa: pc : _raw_spin_unlock_irqrestore+0x1c/0x44
> > [ 390.588131] rcu_stall_threa: lr : ___ratelimit+0xd4/0x110
> > [ 390.588134] rcu_stall_threa: sp : ffffffc08464bdf0
> > [ 390.588135] rcu_stall_threa: x29: ffffffc08464bdf0 x28: 0000000000000000 x27: 0000000000000000
> > [ 390.588138] rcu_stall_threa: x26: 0000000000000000 x25: 0000000000000000 x24: 00000000000004e2
> > [ 390.588140] rcu_stall_threa: x23: ffffffd82ae77000 x22: ffffffd82af1fae8 x21: 000000000000000a
> > [ 390.588142] rcu_stall_threa: x20: 0000000000000000 x19: 0000000000000000 x18: ffffffc08456d020
> > [ 390.588144] rcu_stall_threa: x17: 000000008c623181 x16: 000000008c623181 x15: 0000000000000010
> > [ 390.588146] rcu_stall_threa: x14: 0000000000000100 x13: ffffffc084648000 x12: ffffffc08464c000
> > [ 390.588148] rcu_stall_threa: x11: 5e2da9f91a08d800 x10: ffffffd8299b39fc x9 : 0000000100005874
> > [ 390.588150] rcu_stall_threa: x8 : 0000000000000000 x7 : 0000000000000001 x6 : fffffffebea2b0a0
> > [ 390.588152] rcu_stall_threa: x5 : 0000000000000000 x4 : 0000000000000402 x3 : 0000000000000000
> > [ 390.588154] rcu_stall_threa: x2 : ffffff81ca8d9680 x1 : 0000000000000000 x0 : 0000000000000001
> > [ 390.588156] rcu_stall_threa: Call trace:
> > [ 390.588157] rcu_stall_threa: _raw_spin_unlock_irqrestore+0x1c/0x44
> > [ 390.588159] rcu_stall_threa: ___ratelimit+0xd4/0x110
> > [ 390.588161] rcu_stall_threa: rcu_thread_func+0x90/0xa8
> > [ 390.588164] rcu_stall_threa: kthread+0x110/0x1a4
> > [ 390.588167] rcu_stall_threa: ret_from_fork+0x10/0x20
> >
> > Signed-off-by: Jiazi Li <jqqlijiazi@gmail.com>
> > Tested-by: mingzhu.wang <mingzhu.wang@transsion.com>
>
> This looks like an arm64 stack trace. Are there any arm64 systems in
> production that do real NMIs? (Don't get me wrong, it would be nice if
> there are!)
>

Since commit 331a1b3a836c ("arm64: smp: Add arch support for backtrace
using pseudo-NMI"), arm64 supports this backtrace through pseudo-NMI,
which is actually an IPI.

> > ---
> > kernel/rcu/tree_stall.h | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index b67532cb8770..5806f9a43579 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -289,7 +289,12 @@ static void rcu_print_detail_task_stall_rnp(struct rcu_node *rnp)
> > * Avoid triggering hard lockup.
> > */
> > touch_nmi_watchdog();
> > - sched_show_task(t);
> > + if (unlikely(t->on_cpu && t != current) &&
>
> What if task t blocks or migrates to some other CPU at this point?
>

Yes, that is indeed a concern. We could detect such cases by checking
whether the PID reported by RCU matches the PID captured in the NMI
backtrace.

Do you have any suggestions?

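The PID comparison described above can be illustrated with a minimal userspace sketch. Everything in it is an assumption made for illustration: struct stalled_task, struct backtrace_report and backtrace_matches_task() are hypothetical names, not kernel data structures or APIs, and the sketch says nothing about how the kernel would collect these values. It only shows the check itself: a backtrace is attributed to the stalled task when the CPU and PID it reports agree with what RCU printed.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are not kernel structures. */
struct stalled_task {
	int pid;	/* PID printed by RCU, e.g. 6816 from "P6816" */
	int cpu;	/* CPU the task was believed to be running on */
};

struct backtrace_report {
	int cpu;	/* CPU that produced the backtrace */
	int pid;	/* PID found running there, e.g. "PID: 6816" */
};

/*
 * Returns true when the backtrace can be attributed to the stalled task.
 * A mismatch suggests that the task blocked or migrated before the
 * backtrace was taken, so the trace describes some other task.
 */
bool backtrace_matches_task(const struct stalled_task *t,
			    const struct backtrace_report *bt)
{
	return bt->cpu == t->cpu && bt->pid == t->pid;
}
```

With the values from the log in the patch description (P6816 reported by RCU, "PID: 6816" in the backtrace from CPU 4), the check succeeds; a backtrace from CPU 4 that shows a different PID would be flagged as not belonging to the stalled task.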
> > + trigger_single_cpu_backtrace(task_cpu(t))) {
> > + /*Successfully triggered remote backtrace*/
>
> Wouldn't inverting the condition save a couple of lines of code here?
> And make it a bit more straightforward?
>
> Thanx, Paul
>

Do you mean something like the following code?

	if (!unlikely(t->on_cpu && t != current) ||
	    !trigger_single_cpu_backtrace(task_cpu(t)))
		sched_show_task(t);

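As a sanity check on the inverted condition, here is a minimal userspace sketch, not kernel code, that compares the two shapes over every input combination. The names mock_task, mock_trigger() and mock_show_task() are hypothetical stand-ins for the task state, trigger_single_cpu_backtrace() and sched_show_task(); the only assumption about the real function is that it returns true when the remote backtrace was triggered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct mock_task {
	bool on_cpu;		/* task is currently executing on some CPU */
	bool is_current;	/* models "t == current" */
};

static int shown;	/* number of fallback calls to mock_show_task() */
static int triggered;	/* number of remote backtrace requests */

/* Models trigger_single_cpu_backtrace(): returns true on success. */
static bool mock_trigger(bool nmi_ok)
{
	triggered++;
	return nmi_ok;
}

static void mock_show_task(void)
{
	shown++;
}

/* Shape of the original patch: empty success branch, fallback in else. */
static void original_form(const struct mock_task *t, bool nmi_ok)
{
	if ((t->on_cpu && !t->is_current) && mock_trigger(nmi_ok)) {
		/* remote backtrace triggered, nothing more to do */
	} else {
		mock_show_task();
	}
}

/* Inverted shape: fall back unless the remote backtrace was triggered. */
static void inverted_form(const struct mock_task *t, bool nmi_ok)
{
	if (!(t->on_cpu && !t->is_current) || !mock_trigger(nmi_ok))
		mock_show_task();
}

/* Returns the number of input combinations on which the two forms differ. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int i = 0; i < 8; i++) {
		struct mock_task t = { .on_cpu = i & 1, .is_current = i & 2 };
		bool nmi_ok = i & 4;
		int shown_a, triggered_a;

		shown = triggered = 0;
		original_form(&t, nmi_ok);
		shown_a = shown;
		triggered_a = triggered;

		shown = triggered = 0;
		inverted_form(&t, nmi_ok);
		if (shown != shown_a || triggered != triggered_a)
			mismatches++;
	}
	return mismatches;
}

/* Aborts if the two forms ever disagree. */
void self_check(void)
{
	assert(count_mismatches() == 0);
}
```

Because both `&&` and `||` short-circuit, each form requests the remote backtrace only for a task that is running on another CPU, and each falls back to the local dump in exactly the same cases.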
> > + } else {
> > + sched_show_task(t);
> > + }
> > }
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > }
> > --
> > 2.49.0
> >