[BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1

public inbox for rcu@vger.kernel.org
 help / color / mirror / Atom feed

* [BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1
@ 2026-03-02 11:26 Zw Tang
  2026-03-02 13:36 ` Sasha Levin
  0 siblings, 1 reply; 4+ messages in thread
From: Zw Tang @ 2026-03-02 11:26 UTC (permalink / raw)
  To: paulmck, peterz, mhiramat
  Cc: jiangshanlai, mingo, acme, namhyung, rcu, linux-perf-users,
	linux-trace-kernel, linux-kernel

Hi,

I am reporting an RCU stall / hung task issue triggered by a syzkaller
reproducer on Linux 7.0.0-rc1.

The system gets stuck with RCU stalls and multiple hung tasks. The
rcu_gp workqueue shows in-flight “process_srcu”, and kworkers running
“process_srcu” are blocked for >143s. At the same time, a repro task
blocks during perf event teardown (close()), waiting on SRCU.

Key log excerpts:

“Showing busy workqueues and worker pools: workqueue rcu_gp ...
in-flight: process_srcu”

“rcu: INFO: rcu_preempt detected stalls on CPUs/tasks ... Tasks
blocked on level-0 rcu_node ...”

hung tasks:

kworker/... Workqueue: rcu_gp process_srcu
process_srcu -> try_check_zero -> synchronize_rcu_normal -> wait_for_completion

repro2 task blocked in close():
__x64_sys_close -> __fput -> perf_release -> perf_event_release_kernel
-> __free_event
-> perf_trace_destroy -> perf_trace_event_unreg -> __synchronize_srcu
-> wait_for_completion

My understanding / suspected wait chain:

repro task closes a perf fd and enters perf trace teardown:
perf_trace_event_unreg() waits for SRCU via __synchronize_srcu().

rcu_gp workers running process_srcu attempt to advance SRCU and end up calling
synchronize_rcu_normal(), waiting for a normal RCU grace period to complete.

RCU grace period does not complete (RCU stall reported on CPUs/tasks), causing:
__synchronize_srcu() to wait indefinitely, process_srcu to remain
blocked, and the system to stall.

This looks like a circular wait involving perf trace teardown + SRCU +
RCU GP progress (possibly aggravated by timing/RT scheduling).

Reproducer:
C reproducer:  https://pastebin.com/raw/L9hDPCrP
kernel config:  https://pastebin.com/raw/jCq8qdq7
console output : https://pastebin.com/raw/D8XETkeH

Kernel:
git tree: torvalds/linux
commit: 4d349ee5c7782f8b27f6cb550f112c5e26fff38d
kernel version: 7.0.0-rc1
hardware: QEMU Ubuntu 24.10


[  170.750583] Showing busy workqueues and worker pools:
[  170.750595] workqueue rcu_gp: flags=0x108
[  170.750609]   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=2 refcnt=3
[  170.750635]     in-flight: 9:process_srcu ,11:process_srcu
[  170.750723] pool 2: cpus=0 node=0 flags=0x0 nice=0 hung=0s
workers=4 idle: 220
[  173.609623] perf: interrupt took too long (69138 > 69021), lowering
kernel.pe0
[  173.918434] perf: interrupt took too long (86584 > 86422), lowering
kernel.pe0
[  174.394436] perf: interrupt took too long (109613 > 108230),
lowering kernel.0
[  174.633565] perf: interrupt took too long (137226 > 137016),
lowering kernel.0
[  174.652442] perf: interrupt took too long (171805 > 171532),
lowering kernel.0
[  174.679472] perf: interrupt took too long (214868 > 214756),
lowering kernel.0
[  174.728383] perf: interrupt took too long (268722 > 268585),
lowering kernel.0
[  174.882802] perf: interrupt took too long (336002 > 335902),
lowering kernel.0
[  271.443907] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  271.443951] rcu:     Tasks blocked on level-0 rcu_node (CPUs 0-1): P4093
[  271.443972] rcu:     (detected by 1, t=105002 jiffies, g=33573,
q=4342 ncpus=2)
[  271.443986] task:repro2          state:R  running task
stack:28792 pid:4093  tgid:4092  ppid:300    task_flags:0x400140
flags:0x00080012
[  271.444042] Call Trace:
[  271.444051]  <IRQ>
[  271.444058]  sched_show_task+0x3b5/0x650
[  271.444088]  ? trace_event_raw_event_sched_pi_setprio+0x440/0x440
[  271.444121]  ? rcu_dump_cpu_stacks+0x337/0x4b0
[  271.444146]  ? write_comp_data+0x1f/0x70
[  271.444164]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444182]  ? wq_watchdog_touch+0xec/0x170
[  271.444198]  rcu_sched_clock_irq+0x297b/0x31c0
[  271.444221]  ? tmigr_requires_handle_remote_up+0x143/0x1c0
[  271.444245]  ? rcu_momentary_eqs+0x40/0x40
[  271.444261]  ? __tmigr_cpu_deactivate+0x1a0/0x1a0
[  271.444283]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444301]  ? tmigr_requires_handle_remote+0x1cd/0x2a0
[  271.444324]  ? tmigr_handle_remote+0x320/0x320
[  271.444348]  ? write_comp_data+0x1f/0x70
[  271.444366]  ? write_comp_data+0x1f/0x70
[  271.444383]  ? __cgroup_account_cputime_field+0xb9/0x160
[  271.444406]  ? write_comp_data+0x1f/0x70
[  271.444424]  ? hrtimer_run_queues+0x64/0x450
[  271.444447]  update_process_times+0xfa/0x200
[  271.444469]  tick_nohz_handler+0x504/0x720
[  271.444487]  ? find_held_lock+0x2b/0x80
[  271.444505]  ? tick_do_update_jiffies64+0x380/0x380
[  271.444523]  ? lock_release+0xc9/0x2a0
[  271.444548]  __hrtimer_run_queues+0x771/0xb20
[  271.444568]  ? tick_do_update_jiffies64+0x380/0x380
[  271.444589]  ? write_comp_data+0x1f/0x70
[  271.444607]  ? enqueue_hrtimer+0x360/0x360
[  271.444626]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444649]  hrtimer_interrupt+0x36e/0x820
[  271.444676]  __sysvec_apic_timer_interrupt+0xb5/0x3b0
[  271.444695]  sysvec_apic_timer_interrupt+0x6b/0x80
[  271.444720]  </IRQ>
[  271.444724]  <TASK>
[  271.444730]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  271.444748] RIP: 0010:finish_task_switch+0x126/0x5c0
[  271.444771] Code: f6 4d 8d 7d 48 e8 8a 45 0a 00 31 f6 4c 89 ef e8
b0 fb ff ff 4c 89 ff e8 88 db b9 02 e8 93 c6 34 00 fb 65 4c 8b 3d 5a
34 cb 04 <49> 8d bf c8 14 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa
48 c1
[  271.444786] RSP: 0018:ffff888009db79b0 EFLAGS: 00000202
[  271.444798] RAX: 00000000001a071f RBX: ffff88800d6b4c80 RCX: 0000000000000006
[  271.444808] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81467bbd
[  271.444817] RBP: ffff888009db79f8 R08: 0000000000000001 R09: 0000000000000001
[  271.444826] R10: fffffbfff0ae32ea R11: ffffffff85719757 R12: ffff8880073f9a80
[  271.444836] R13: ffff88806d338fc0 R14: 0000000000000001 R15: ffff888009d19a80
[  271.444852]  ? finish_task_switch+0x11d/0x5c0
[  271.444876]  ? finish_task_switch+0xe0/0x5c0
[  271.444897]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.444914]  ? __switch_to+0x854/0x1270
[  271.444936]  __schedule+0x1198/0x4190
[  271.444957]  ? io_schedule_timeout+0x80/0x80
[  271.444974]  ? mark_held_locks+0x49/0x70
[  271.444998]  preempt_schedule_irq+0x4e/0x90
[  271.445015]  irqentry_exit+0x17b/0x6c0
[  271.445037]  ? irqentry_enter+0x2a/0xd0
[  271.445058]  ? trace_hardirqs_off_finish+0x12f/0x160
[  271.445080]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  271.445096] RIP: 0010:__sanitizer_cov_trace_pc+0xd/0x40
[  271.445127] Code: 00 00 ff 00 09 c2 75 07 4c 8b 81 e0 14 00 00 4c
89 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa bf 02 00 00 00 4c
8b 0c 24 <65> 48 8b 35 9b a7 9e 04 e8 46 ff ff ff 84 c0 74 20 48 8b 96
d0 14
[  271.445141] RSP: 0018:ffff888009db7c28 EFLAGS: 00000246
[  271.445152] RAX: dffffc0000000000 RBX: ffff88800df47b58 RCX: 0000000000000000
[  271.445161] RDX: 1ffff11001be8f71 RSI: ffff888009d19a80 RDI: 0000000000000002
[  271.445171] RBP: ffff888009d18000 R08: 0000000000000000 R09: ffffffff8167adf7
[  271.445180] R10: ffffed1001d87110 R11: ffff88800ec38883 R12: 0000000000000001
[  271.445190] R13: ffff888009db7cd0 R14: ffff888009d18028 R15: 00000000000000ec
[  271.445203]  ? __futex_wake_mark+0xb7/0xe0
[  271.445222]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.445240]  __futex_wake_mark+0xb7/0xe0
[  271.445255]  futex_wake_mark+0xa4/0x190
[  271.445272]  futex_wake+0x441/0x540
[  271.445289]  ? futex_wake_mark+0x190/0x190
[  271.445306]  ? percpu_counter_add_batch+0x11b/0x260
[  271.445329]  ? write_comp_data+0x1f/0x70
[  271.445348]  do_futex+0x26b/0x360
[  271.445363]  ? __ia32_sys_get_robust_list+0x140/0x140
[  271.445380]  ? lock_is_held_type+0x9b/0x110
[  271.445404]  __x64_sys_futex+0x1c9/0x480
[  271.445420]  ? do_futex+0x360/0x360
[  271.445433]  ? write_comp_data+0x1f/0x70
[  271.445450]  ? fd_install+0x1ec/0x4e0
[  271.445471]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  271.445488]  ? __sys_socket+0x4b/0x130
[  271.445508]  do_syscall_64+0x115/0x690
[  271.445529]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  271.445544] RIP: 0033:0x7fdfffbe4fc9
[  271.445568] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 8e 0d 00 f7 d8 64 89
01 48
[  271.445582] RSP: 002b:00007fdfffaebde8 EFLAGS: 00000246 ORIG_RAX:
00000000000000ca
[  271.445595] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdfffbe4fc9
[  271.445605] RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 000055b24d1850ec
[  271.445614] RBP: 00007fdfffaebe00 R08: 0000000000000000 R09: 0000000000000000
[  271.445623] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffb7d2901e
[  271.445632] R13: 00007fffb7d2901f R14: 00007fdfffaebf00 R15: 0000000000022000
[  271.445650]  </TASK>
[  318.431885] INFO: task kworker/0:0:9 blocked for more than 143 seconds.
[  318.431913]       Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[  318.431921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  318.431928] task:kworker/0:0     state:D stack:27912 pid:9
tgid:9     ppid:2      task_flags:0x4208060 flags:0x00080000
[  318.431983] Workqueue: rcu_gp process_srcu
[  318.432032] Call Trace:
[  318.432038]  <TASK>
[  318.432049]  __schedule+0x1190/0x4190
[  318.432078]  ? io_schedule_timeout+0x80/0x80
[  318.432095]  ? lock_release+0xc9/0x2a0
[  318.432119]  schedule+0xd1/0x260
[  318.432134]  schedule_timeout+0x240/0x280
[  318.432157]  ? hrtimer_nanosleep_restart+0x340/0x340
[  318.432182]  ? mark_held_locks+0x49/0x70
[  318.432203]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.432224]  ? _raw_spin_unlock_irq+0x23/0x40
[  318.432245]  ? trace_hardirqs_on+0x18/0x170
[  318.432267]  wait_for_completion+0x169/0x320
[  318.432286]  ? wait_for_completion_killable+0x410/0x410
[  318.432304]  ? mark_held_locks+0x49/0x70
[  318.432324]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.432348]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.432372]  ? trace_hardirqs_on+0x18/0x170
[  318.432390]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.432414]  synchronize_rcu_normal+0x208/0x5f0
[  318.432433]  ? start_poll_synchronize_rcu_full+0x90/0x90
[  318.432453]  ? do_raw_spin_lock+0x123/0x290
[  318.432481]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.432504]  ? rt_mutex_slowunlock+0x824/0xaf0
[  318.432526]  ? lock_is_held_type+0x9b/0x110
[  318.432555]  try_check_zero+0x429/0x630
[  318.432583]  process_srcu+0x4c1/0x16c0
[  318.432607]  ? lock_acquire+0x187/0x2e0
[  318.432626]  ? process_scheduled_works+0x4ca/0x1ac0
[  318.432650]  ? lock_release+0xc9/0x2a0
[  318.432674]  process_scheduled_works+0x553/0x1ac0
[  318.432704]  ? insert_work+0x190/0x190
[  318.432723]  ? write_comp_data+0x1f/0x70
[  318.432741]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.432761]  ? lock_is_held_type+0x9b/0x110
[  318.432784]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.432805]  worker_thread+0x5a9/0xd10
[  318.432830]  ? bh_worker+0x740/0x740
[  318.435937]  kthread+0x3f9/0x530
[  318.435991]  ? kthread_affine_node+0x2a0/0x2a0
[  318.436060]  ret_from_fork+0x666/0xab0
[  318.436098]  ? native_tss_update_io_bitmap+0x6c0/0x6c0
[  318.436139]  ? write_comp_data+0x1f/0x70
[  318.436175]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.436213]  ? __switch_to+0x854/0x1270
[  318.436255]  ? kthread_affine_node+0x2a0/0x2a0
[  318.436301]  ret_from_fork_asm+0x11/0x20
[  318.436360]  </TASK>
[  318.436373] INFO: task kworker/0:1:11 blocked for more than 143 seconds.
[  318.436393]       Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[  318.436411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  318.436422] task:kworker/0:1     state:D stack:26592 pid:11
tgid:11    ppid:2      task_flags:0x4208060 flags:0x00080000
[  318.436528] Workqueue: rcu_gp process_srcu
[  318.436571] Call Trace:
[  318.436580]  <TASK>
[  318.436595]  __schedule+0x1190/0x4190
[  318.436639]  ? io_schedule_timeout+0x80/0x80
[  318.436676]  ? lock_release+0xc9/0x2a0
[  318.436726]  schedule+0xd1/0x260
[  318.436759]  schedule_timeout+0x240/0x280
[  318.436804]  ? hrtimer_nanosleep_restart+0x340/0x340
[  318.522895]  ? mark_held_locks+0x49/0x70
[  318.522930]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.522952]  ? _raw_spin_unlock_irq+0x23/0x40
[  318.522977]  ? trace_hardirqs_on+0x18/0x170
[  318.523008]  wait_for_completion+0x169/0x320
[  318.523029]  ? wait_for_completion_killable+0x410/0x410
[  318.523047]  ? mark_held_locks+0x49/0x70
[  318.523068]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.523088]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.523110]  ? trace_hardirqs_on+0x18/0x170
[  318.523128]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.523152]  synchronize_rcu_normal+0x208/0x5f0
[  318.523171]  ? start_poll_synchronize_rcu_full+0x90/0x90
[  318.523190]  ? do_raw_spin_lock+0x123/0x290
[  318.523219]  ? _raw_spin_unlock_irqrestore+0x2c/0x50
[  318.523242]  ? rt_mutex_slowunlock+0x824/0xaf0
[  318.523264]  ? lock_is_held_type+0x9b/0x110
[  318.523294]  try_check_zero+0x429/0x630
[  318.523321]  process_srcu+0x4c1/0x16c0
[  318.523346]  ? lock_acquire+0x187/0x2e0
[  318.523365]  ? process_scheduled_works+0x4ca/0x1ac0
[  318.523389]  ? lock_release+0xc9/0x2a0
[  318.523412]  process_scheduled_works+0x553/0x1ac0
[  318.523442]  ? insert_work+0x190/0x190
[  318.523461]  ? write_comp_data+0x1f/0x70
[  318.523478]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.523498]  ? lock_is_held_type+0x9b/0x110
[  318.523521]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.523541]  worker_thread+0x5a9/0xd10
[  318.523566]  ? bh_worker+0x740/0x740
[  318.523587]  kthread+0x3f9/0x530
[  318.523605]  ? kthread_affine_node+0x2a0/0x2a0
[  318.523627]  ret_from_fork+0x666/0xab0
[  318.523645]  ? native_tss_update_io_bitmap+0x6c0/0x6c0
[  318.523664]  ? write_comp_data+0x1f/0x70
[  318.523680]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.523698]  ? __switch_to+0x854/0x1270
[  318.523717]  ? kthread_affine_node+0x2a0/0x2a0
[  318.523738]  ret_from_fork_asm+0x11/0x20
[  318.523764]  </TASK>
[  318.526244] INFO: task repro2:4086 blocked for more than 143 seconds.
[  318.526258]       Not tainted 7.0.0-rc1-00301-g4d349ee5c778 #1
[  318.526267] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[  318.526273] task:repro2          state:D stack:25512 pid:4086
tgid:4086  ppid:298    task_flags:0x400040 flags:0x00080002
executing program[  318.526322] Call Trace:
[  318.526327]  <TASK>

[  318.526334]  __schedule+0x1190/0x4190
[  318.526356]  ? io_schedule_timeout+0x80/0x80
[  318.526373]  ? lock_release+0xc9/0x2a0
[  318.526396]  schedule+0xd1/0x260
[  318.526411]  schedule_timeout+0x240/0x280
[  318.526432]  ? hrtimer_nanosleep_restart+0x340/0x340
[  318.526458]  ? mark_held_locks+0x49/0x70
[  318.526478]  ? lockdep_hardirqs_on_prepare+0xd7/0x170
[  318.526499]  ? _raw_spin_unlock_irq+0x23/0x40
[  318.526520]  ? trace_hardirqs_on+0x18/0x170
[  318.526539]  wait_for_completion+0x169/0x320
[  318.526556]  ? __lock_acquire+0x55a/0x1ef0
[  318.526577]  ? wait_for_completion_killable+0x410/0x410
[  318.526597]  ? lockdep_init_map_type+0x4b/0x210
[  318.526621]  __synchronize_srcu+0x143/0x230
[  318.526643]  ? start_poll_synchronize_srcu+0x10/0x10
[  318.526666]  ? rcu_tasks_pregp_step+0x10/0x10
[  318.526687]  ? kvm_clock_read+0x3b/0x60
[  318.526707]  ? write_comp_data+0x1f/0x70
[  318.526723]  ? __sanitizer_cov_trace_pc+0x1a/0x40
[  318.526741]  ? ktime_get_mono_fast_ns+0x19d/0x2b0
[  318.526766]  ? synchronize_srcu+0x53/0x260
[  318.526789]  perf_trace_event_unreg.isra.0+0xb8/0x1e0
[  318.526807]  perf_trace_destroy+0xc3/0x1c0
[  318.526822]  ? perf_tp_event_init+0x120/0x120
[  318.529171]  __free_event+0x257/0xc10
[  318.529200]  ? perf_event_release_kernel+0x460/0x460
[  318.529215]  put_event+0x3c/0x90
[  318.529230]  perf_event_release_kernel+0x357/0x460
[  318.529248]  ? perf_event_release_kernel+0x460/0x460
[  318.529263]  perf_release+0x37/0x50
[  318.529277]  __fput+0x420/0xb80
[  318.529303]  fput_close_sync+0x10f/0x230
[  318.529325]  ? alloc_file_clone+0x110/0x110
[  318.529345]  ? dnotify_flush+0x7f/0x4c0
[  318.529372]  __x64_sys_close+0x8f/0x120
[  318.529390]  do_syscall_64+0x115/0x690
[  318.529410]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
[  318.529425] RIP: 0033:0x7fdfffcd611b
[  318.529436] RSP: 002b:00007fffb7d290d0 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[  318.529450] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdfffcd611b
[  318.529460] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[  318.529469] RBP: 00007fffb7d29100 R08: 0000000000000000 R09: 0000000000000032
[  318.529478] R10: 0000000000000000 R11: 0000000000000293 R12: 000055b24d16e3f0
[  318.529486] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  318.529503]  </TASK>


Could you please advise whether this is a known issue/regression, and
what additional debug data would be most helpful? I can rerun with
extra debugging options if needed.

Thanks,
Zw Tang

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1
  2026-03-02 11:26 [BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1 Zw Tang
@ 2026-03-02 13:36 ` Sasha Levin
  2026-03-02 16:25   ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Sasha Levin @ 2026-03-02 13:36 UTC (permalink / raw)
  To: Zw Tang, paulmck, peterz, mhiramat
  Cc: Sasha Levin, jiangshanlai, mingo, acme, namhyung, rcu,
	linux-perf-users, linux-trace-kernel, linux-kernel, rostedt,
	mathieu.desnoyers, josh, bigeasy, ast, boqun.feng, mark.rutland

This response was AI-generated by bug-bot. The analysis may contain errors — please verify independently.

## Bug Summary

This is an RCU stall and hung task deadlock on 7.0.0-rc1, triggered by perf trace teardown under perf interrupt storm conditions. The perf subsystem's tracepoint unregistration path now blocks on SRCU (tracepoint_srcu), which in turn blocks on RCU grace period completion, creating a cascading stall when RCU progress is delayed by perf NMI interrupt storms. Severity: system hang (multiple tasks blocked >143s, eventual complete stall).

## Stack Trace Analysis

The bug involves three interacting blocked entities. Here are the decoded stack traces:

**1. repro2 (pid 4086) - blocked in perf trace teardown (close()):**
```
__x64_sys_close
  fput_close_sync
    __fput
      perf_release
        perf_event_release_kernel
          put_event
            __free_event
              perf_trace_destroy
                perf_trace_event_unreg         [kernel/trace/trace_event_perf.c:154]
                  tracepoint_synchronize_unregister [include/linux/tracepoint.h:116]
                    synchronize_srcu(&tracepoint_srcu)
                      __synchronize_srcu
                        wait_for_completion    ← BLOCKED
```

**2. kworker/0:0 (pid 9) and kworker/0:1 (pid 11) - SRCU grace period workers:**
```
Workqueue: rcu_gp process_srcu
process_srcu                                   [kernel/rcu/srcutree.c:1304]
  srcu_advance_state                           [kernel/rcu/srcutree.c:1161]
    try_check_zero                             [kernel/rcu/srcutree.c:1171]
      srcu_readers_active_idx_check            [kernel/rcu/srcutree.c:544]
        synchronize_rcu()                      ← SRCU-fast path, line 569
          synchronize_rcu_normal
            wait_for_completion                ← BLOCKED
```

**3. repro2 (pid 4093) - RCU stall source:**
```
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P4093
task:repro2   state:R  running task
  (running in futex_wake syscall, interrupted by timer IRQ)
  asm_sysvec_apic_timer_interrupt
    irqentry_exit → preempt_schedule_irq → __schedule
      finish_task_switch
```

The trace shows process context for the hung tasks and interrupt context (timer IRQ) for the RCU stall detection. The kworkers are in D (uninterruptible sleep) state, blocked in wait_for_completion() within the SRCU grace period state machine.

## Root Cause Analysis

This is a regression introduced by commit a46023d5616ed ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast"), which switched tracepoint read-side protection from preempt_disable()+RCU to SRCU-fast via DEFINE_SRCU_FAST(tracepoint_srcu).

The root cause is a new coupling between SRCU grace period processing and RCU grace period completion that did not exist before. The deadlock chain is:

1. The reproducer creates perf events using tracepoints, then closes them while generating heavy perf interrupt load. The perf NMI interrupt storms ("perf: interrupt took too long" messages escalating from 69ms to 336ms) consume most CPU time, starving RCU quiescent state detection.

2. When the perf fd is closed, perf_trace_event_unreg() (kernel/trace/trace_event_perf.c:154) calls tracepoint_synchronize_unregister() (include/linux/tracepoint.h:116), which now calls synchronize_srcu(&tracepoint_srcu) instead of synchronize_rcu().

3. The SRCU grace period for tracepoint_srcu is processed by process_srcu() running in the rcu_gp workqueue. Because tracepoint_srcu is DEFINE_SRCU_FAST, its srcu_reader_flavor includes SRCU_READ_FLAVOR_FAST, which is part of SRCU_READ_FLAVOR_SLOWGP.

4. In srcu_readers_active_idx_check() (kernel/rcu/srcutree.c:544), when SRCU_READ_FLAVOR_SLOWGP is detected, the function calls synchronize_rcu() (line 569) instead of smp_mb() (line 301 in non-fast path). This is the key design tradeoff of SRCU-fast: faster readers (no smp_mb() on read side) at the cost of slower grace periods (synchronize_rcu() on update side).

5. synchronize_rcu() → synchronize_rcu_normal() → wait_for_completion(), waiting for an RCU grace period to complete. But the RCU grace period is stalled because the perf interrupt storms are preventing CPUs from passing through quiescent states quickly enough.

6. Since process_srcu is blocked waiting for synchronize_rcu(), the tracepoint_srcu SRCU grace period cannot advance, so synchronize_srcu(&tracepoint_srcu) in the perf teardown path also blocks indefinitely.

The pre-existing condition (perf NMI storms causing RCU stalls) was previously tolerable because the perf teardown path used synchronize_rcu() directly (via the old tracepoint_synchronize_unregister()), which would eventually complete once the RCU stall resolved. Now, with SRCU-fast, there is an additional layer of indirection: perf teardown waits on SRCU, SRCU processing waits on RCU, and both the SRCU workqueue threads and the perf teardown task are stuck.

## Affected Versions

This is a regression in v7.0-rc1. The bug was introduced by commit a46023d5616ed ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast"), which was merged via the trace-v7.0 merge (3c6e577d5ae70). The underlying SRCU-fast infrastructure was added by commit c4020620528e4 ("srcu: Add SRCU-fast readers") and 4d86b1e7e1e98 ("srcu: Add SRCU_READ_FLAVOR_SLOWGP to flag need for synchronize_rcu()"), but the regression became triggerable only when a46023d5616ed applied SRCU-fast to the tracepoint_srcu used in the perf event teardown path.

Kernels before v7.0-rc1 (i.e., v6.x and earlier) are not affected, as they used preempt_disable()+RCU for tracepoint protection, and tracepoint_synchronize_unregister() called synchronize_rcu() directly without SRCU involvement.

## Relevant Commits and Fixes

Key commits in the causal chain:

- a46023d5616ed ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") - the commit that introduced the regression by switching tracepoints to SRCU-fast
- a77cb6a867667 ("srcu: Fix warning to permit SRCU-fast readers in NMI handlers") - immediate predecessor fix
- c4020620528e4 ("srcu: Add SRCU-fast readers") - added the SRCU-fast reader API
- 4d86b1e7e1e98 ("srcu: Add SRCU_READ_FLAVOR_SLOWGP to flag need for synchronize_rcu()") - added the synchronize_rcu()-instead-of-smp_mb() logic in SRCU grace period processing
- 16718274ee75d ("tracing: perf: Have perf tracepoint callbacks always disable preemption") - preparatory commit for the SRCU-fast switch

No fix for this specific issue was found in mainline or in any -next branches as of today.

## Prior Discussions

No prior reports of this specific RCU stall / SRCU deadlock triggered via perf trace teardown with SRCU-fast were found on lore.kernel.org. The original SRCU-fast tracepoint series was posted at https://lore.kernel.org/all/20260126231256.499701982@kernel.org/ (linked from the commit message), motivated by enabling preemptible BPF on tracepoints for RT systems (https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/). No discussion of the synchronize_rcu()-from-workqueue stall scenario appears to have taken place in those threads.

## Suggested Actions

1. Confirm the regression by testing with the parent commit a77cb6a867667 (immediately before a46023d5616ed). If the issue disappears, this confirms the SRCU-fast tracepoint switch as the cause.

2. As a quick workaround, reverting a46023d5616ed (and its preparatory commits a77cb6a867667, f7d327654b886, 16718274ee75d if needed) should eliminate the deadlock, at the cost of losing preemptible BPF tracepoint support.

3. The fundamental issue is that process_srcu() for SRCU-fast structures calls synchronize_rcu() synchronously from workqueue context. Possible fixes include:
   - Using an asynchronous mechanism (e.g., call_rcu() with a callback to resume SRCU GP processing) instead of blocking synchronize_rcu() within the SRCU state machine.
   - Having srcu_readers_active_idx_check() use poll_state_synchronize_rcu() and defer retrying instead of blocking.
   - Bounding the perf interrupt rate escalation to prevent the RCU stall in the first place (though this would only mask the underlying SRCU↔RCU coupling issue).

4. If you can reproduce reliably, adding the following debug options would provide more information: CONFIG_RCU_TRACE=y, CONFIG_PROVE_RCU=y, and booting with rcutree.rcu_kick_kthreads=1 to see if kicking the RCU threads helps break the stall.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1
  2026-03-02 13:36 ` Sasha Levin
@ 2026-03-02 16:25   ` Steven Rostedt
  2026-03-02 17:37     ` Sasha Levin
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2026-03-02 16:25 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Zw Tang, paulmck, peterz, mhiramat, jiangshanlai, mingo, acme,
	namhyung, rcu, linux-perf-users, linux-trace-kernel, linux-kernel,
	mathieu.desnoyers, josh, bigeasy, ast, boqun.feng, mark.rutland

On Mon,  2 Mar 2026 08:36:13 -0500
Sasha Levin <sashal@kernel.org> wrote:

> This response was AI-generated by bug-bot. The analysis may contain errors — please verify independently.
> 
> ## Bug Summary
> 
> This is an RCU stall and hung task deadlock on 7.0.0-rc1, triggered by perf trace teardown under perf interrupt storm conditions. The perf subsystem's tracepoint unregistration path now blocks on SRCU (tracepoint_srcu), which in turn blocks on RCU grace period completion, creating a cascading stall when RCU progress is delayed by perf NMI interrupt storms. Severity: system hang (multiple tasks blocked >143s, eventual complete stall).

Hmm, this analysis corresponds nicely to what I was thinking when looking
at the stack dumps, but it gives a bit more details than I would have.

> ## Root Cause Analysis
> 
> This is a regression introduced by commit a46023d5616ed ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast"), which switched tracepoint read-side protection from preempt_disable()+RCU to SRCU-fast via DEFINE_SRCU_FAST(tracepoint_srcu).

Yes, as soon as I saw the report, I knew it had to do with this commit.

> 
> The root cause is a new coupling between SRCU grace period processing and RCU grace period completion that did not exist before. The deadlock chain is:
> 
> 1. The reproducer creates perf events using tracepoints, then closes them while generating heavy perf interrupt load. The perf NMI interrupt storms ("perf: interrupt took too long" messages escalating from 69ms to 336ms) consume most CPU time, starving RCU quiescent state detection.

The real bug is the NMI interrupt storms. The commit above only makes it
more of an issue, but that commit itself is not the bug.


> ## Suggested Actions
> 
> 1. Confirm the regression by testing with the parent commit a77cb6a867667 (immediately before a46023d5616ed). If the issue disappears, this confirms the SRCU-fast tracepoint switch as the cause.
> 
> 2. As a quick workaround, reverting a46023d5616ed (and its preparatory commits a77cb6a867667, f7d327654b886, 16718274ee75d if needed) should eliminate the deadlock, at the cost of losing preemptible BPF tracepoint support.

That is not the answer, as the above only made the current bug (interrupt
storms) visible.

> 
> 3. The fundamental issue is that process_srcu() for SRCU-fast structures calls synchronize_rcu() synchronously from workqueue context. Possible fixes include:
>    - Using an asynchronous mechanism (e.g., call_rcu() with a callback to resume SRCU GP processing) instead of blocking synchronize_rcu() within the SRCU state machine.
>    - Having srcu_readers_active_idx_check() use poll_state_synchronize_rcu() and defer retrying instead of blocking.
>    - Bounding the perf interrupt rate escalation to prevent the RCU stall in the first place (though this would only mask the underlying SRCU↔RCU coupling issue).
> 
> 4. If you can reproduce reliably, adding the following debug options would provide more information: CONFIG_RCU_TRACE=y, CONFIG_PROVE_RCU=y, and booting with rcutree.rcu_kick_kthreads=1 to see if kicking the RCU threads helps break the stall.


The real fix is to find a way to disable the perf interrupt storms *before*
unregistering the tracepoint.

-- Steve

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1
  2026-03-02 16:25   ` Steven Rostedt
@ 2026-03-02 17:37     ` Sasha Levin
  0 siblings, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2026-03-02 17:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Zw Tang, paulmck, peterz, mhiramat, jiangshanlai, mingo, acme,
	namhyung, rcu, linux-perf-users, linux-trace-kernel, linux-kernel,
	mathieu.desnoyers, josh, bigeasy, ast, boqun.feng, mark.rutland

On Mon, Mar 02, 2026 at 11:25:45AM -0500, Steven Rostedt wrote:
>On Mon,  2 Mar 2026 08:36:13 -0500
>Sasha Levin <sashal@kernel.org> wrote:
>
>> This response was AI-generated by bug-bot. The analysis may contain errors — please verify independently.
>>
>> ## Bug Summary
>>
>> This is an RCU stall and hung task deadlock on 7.0.0-rc1, triggered by perf trace teardown under perf interrupt storm conditions. The perf subsystem's tracepoint unregistration path now blocks on SRCU (tracepoint_srcu), which in turn blocks on RCU grace period completion, creating a cascading stall when RCU progress is delayed by perf NMI interrupt storms. Severity: system hang (multiple tasks blocked >143s, eventual complete stall).
>
>Hmm, this analysis corresponds nicely to what I was thinking when looking
>at the stack dumps, but it gives a bit more details than I would have.

Thanks for the feedback!

I'll finish fine-tuning this script, and publish it in a day or two :)

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-03-02 17:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-02 11:26 [BUG] RCU stall / hung rcu_gp: process_srcu blocked in synchronize_rcu_normal triggered by perf trace teardown on 7.0.0-rc1 Zw Tang
2026-03-02 13:36 ` Sasha Levin
2026-03-02 16:25   ` Steven Rostedt
2026-03-02 17:37     ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox