* [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq
@ 2024-04-02 18:05 syzbot
2024-04-16 7:05 ` syzbot
2024-04-16 8:10 ` Z qiang
0 siblings, 2 replies; 18+ messages in thread
From: syzbot @ 2024-04-02 18:05 UTC (permalink / raw)
To: linux-kernel, peterz, syzkaller-bugs, tglx
Hello,
syzbot found the following issue on:
HEAD commit: c0b832517f62 Add linux-next specific files for 20240402
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000
kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c
dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz
kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com
=============================
WARNING: suspicious RCU usage
6.9.0-rc2-next-20240402-syzkaller #0 Not tainted
-----------------------------
kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ksoftirqd/0/16:
#0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
#0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline]
#0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline]
#0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65
stack backtrace:
CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712
rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273
__do_softirq+0x5fd/0x980 kernel/softirq.c:568
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320
Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0
RSP: 0018:ffffc90000157a50 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001
RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0
RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4
R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578
R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0
rcu_read_lock_sched include/linux/rcupdate.h:934 [inline]
pfn_valid include/linux/mmzone.h:2019 [inline]
__virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65
kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37
__kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526
__call_rcu_common kernel/rcu/tree.c:3096 [inline]
call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200
context_switch kernel/sched/core.c:5412 [inline]
__schedule+0x17f0/0x4a50 kernel/sched/core.c:6746
__schedule_loop kernel/sched/core.c:6823 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6838
smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
----------------
Code disassembly (best guess):
0: f5 cmc
1: 90 nop
2: 0f 0b ud2
4: 90 nop
5: 90 nop
6: 90 nop
7: eb c6 jmp 0xffffffcf
9: 0f 1f 40 00 nopl 0x0(%rax)
d: 90 nop
e: 90 nop
f: 90 nop
10: 90 nop
11: 90 nop
12: 90 nop
13: 90 nop
14: 90 nop
15: 90 nop
16: 90 nop
17: 90 nop
18: 90 nop
19: 90 nop
1a: 90 nop
1b: 90 nop
1c: 90 nop
1d: f3 0f 1e fa endbr64
21: 31 c0 xor %eax,%eax
23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1
* 2a: 74 1e je 0x4a <-- trapping instruction
2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259
33: 74 15 je 0x4a
35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx
3c: 03 00
3e: 31 c0 xor %eax,%eax
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-02 18:05 [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq syzbot @ 2024-04-16 7:05 ` syzbot 2024-04-16 8:10 ` Z qiang 1 sibling, 0 replies; 18+ messages in thread From: syzbot @ 2024-04-16 7:05 UTC (permalink / raw) To: linux-kernel, peterz, syzkaller-bugs, tglx syzbot has found a reproducer for the following issue on: HEAD commit: 6bd343537461 Add linux-next specific files for 20240415 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=132b923b180000 kernel config: https://syzkaller.appspot.com/x/.config?x=c247afaa437e6409 dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11e7ff77180000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=125e7767180000 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/aaf63ca280e5/disk-6bd34353.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/353fdc5b9ee6/vmlinux-6bd34353.xz kernel image: https://storage.googleapis.com/syzbot-assets/25a7a362b023/bzImage-6bd34353.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com ============================= WARNING: suspicious RCU usage 6.9.0-rc4-next-20240415-syzkaller #0 Not tainted ----------------------------- kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ksoftirqd/0/16: #0: ffffffff8e334160 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 stack backtrace: CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc4-next-20240415-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 __do_softirq+0x5fd/0x980 kernel/softirq.c:568 invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 RIP: 0010:lock_acquire+0x264/0x550 kernel/locking/lockdep.c:5758 Code: 2b 00 74 08 4c 89 f7 e8 8a fd 88 00 f6 44 24 61 02 0f 85 85 01 00 00 41 f7 c7 00 02 00 00 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 25 00 00 00 00 00 43 c7 44 25 09 00 00 00 00 43 c7 44 25 RSP: 0018:ffffc90000157820 EFLAGS: 00000206 RAX: 0000000000000001 RBX: 1ffff9200002af10 RCX: 0000000000000001 RDX: dffffc0000000000 RSI: ffffffff8bcac4e0 RDI: ffffffff8c1f8ee0 RBP: ffffc90000157980 R08: ffffffff92f765a7 R09: 1ffffffff25eecb4 R10: dffffc0000000000 R11: fffffbfff25eecb5 R12: 1ffff9200002af0c R13: dffffc0000000000 R14: ffffc90000157880 R15: 0000000000000246 rcu_lock_acquire include/linux/rcupdate.h:329 [inline] rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] pfn_valid include/linux/mmzone.h:2019 [inline] __virt_addr_valid+0x1a0/0x520 arch/x86/mm/physaddr.c:65 kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 __call_rcu_common kernel/rcu/tree.c:3102 [inline] call_rcu+0x167/0xa70 kernel/rcu/tree.c:3206 context_switch kernel/sched/core.c:5412 [inline] __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 preempt_schedule_common+0x84/0xd0 kernel/sched/core.c:6925 preempt_schedule+0xe1/0xf0 kernel/sched/core.c:6949 preempt_schedule_thunk+0x1a/0x30 arch/x86/entry/thunk.S:12 smpboot_thread_fn+0x65b/0xa30 kernel/smpboot.c:163 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> ---------------- Code disassembly (best guess): 0: 2b 00 sub (%rax),%eax 2: 74 08 je 0xc 4: 4c 89 f7 mov %r14,%rdi 7: e8 8a fd 88 00 call 0x88fd96 c: f6 44 24 61 02 testb $0x2,0x61(%rsp) 11: 0f 85 85 01 00 00 jne 0x19c 17: 41 f7 c7 00 02 00 00 test $0x200,%r15d 1e: 74 01 je 0x21 20: fb sti 21: 48 c7 44 24 40 0e 36 movq $0x45e0360e,0x40(%rsp) 28: e0 45 * 2a: 4b c7 44 25 00 00 00 movq $0x0,0x0(%r13,%r12,1) <-- trapping instruction 31: 00 00 33: 43 c7 44 25 09 00 00 movl $0x0,0x9(%r13,%r12,1) 3a: 00 00 3c: 43 rex.XB 3d: c7 .byte 0xc7 3e: 44 rex.R 3f: 25 .byte 0x25 --- If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach or paste a git patch, syzbot will apply it before testing. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-02 18:05 [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq syzbot 2024-04-16 7:05 ` syzbot @ 2024-04-16 8:10 ` Z qiang 2024-04-16 8:44 ` Z qiang 1 sibling, 1 reply; 18+ messages in thread From: Z qiang @ 2024-04-16 8:10 UTC (permalink / raw) To: syzbot, Paul E. McKenney; +Cc: linux-kernel, peterz, syzkaller-bugs, tglx Cc: Paul > > Hello, > > syzbot found the following issue on: > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > ============================= > WARNING: suspicious RCU usage > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > ----------------------------- > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > other info that might help us debug this: > > > rcu_scheduler_active = 2, debug_locks = 1 > 1 lock held by ksoftirqd/0/16: > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > stack backtrace: > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > Call Trace: > <IRQ> > __dump_stack lib/dump_stack.c:88 [inline] > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > invoke_softirq kernel/softirq.c:428 [inline] > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > </IRQ> > <TASK> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > pfn_valid include/linux/mmzone.h:2019 [inline] > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 This should be caused by the following commit: d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? Thanks Zqiang > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > context_switch kernel/sched/core.c:5412 [inline] > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > __schedule_loop kernel/sched/core.c:6823 [inline] > schedule+0x14b/0x320 kernel/sched/core.c:6838 > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > kthread+0x2f0/0x390 kernel/kthread.c:388 > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > </TASK> > ---------------- > Code disassembly (best guess): > 0: f5 cmc > 1: 90 nop > 2: 0f 0b ud2 > 4: 90 nop > 5: 90 nop > 6: 90 nop > 7: eb c6 jmp 0xffffffcf > 9: 0f 1f 40 00 nopl 0x0(%rax) > d: 90 nop > e: 90 nop > f: 90 nop > 10: 90 nop > 11: 90 nop > 12: 90 nop > 13: 90 nop > 14: 90 nop > 15: 90 nop > 16: 90 nop > 17: 90 nop > 18: 90 nop > 19: 90 nop > 1a: 90 nop > 1b: 90 nop > 1c: 90 nop > 1d: f3 0f 1e fa endbr64 > 21: 31 c0 xor %eax,%eax > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > * 2a: 74 1e je 0x4a <-- trapping instruction > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > 33: 74 15 je 0x4a > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > 3c: 03 00 > 3e: 31 c0 xor %eax,%eax > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > If the report is already addressed, let syzbot know by replying with: > #syz fix: exact-commit-title > > If you want syzbot to run the reproducer, reply with: > #syz test: git://repo/address.git branch-or-commit-hash > If you attach or paste a git patch, syzbot will apply it before testing. > > If you want to overwrite report's subsystems, reply with: > #syz set subsystems: new-subsystem > (See the list of subsystem names on the web dashboard) > > If the report is a duplicate of another one, reply with: > #syz dup: exact-subject-of-another-report > > If you want to undo deduplication, reply with: > #syz undup > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-16 8:10 ` Z qiang @ 2024-04-16 8:44 ` Z qiang 2024-04-17 1:13 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Z qiang @ 2024-04-16 8:44 UTC (permalink / raw) To: syzbot, Paul E. McKenney; +Cc: linux-kernel, peterz, syzkaller-bugs, tglx On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > Cc: Paul > > > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > git tree: linux-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > Downloadable assets: > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > ============================= > > WARNING: suspicious RCU usage > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > ----------------------------- > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > other info that might help us debug this: > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > 1 lock held by ksoftirqd/0/16: > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > stack backtrace: > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > Call Trace: > > <IRQ> > > __dump_stack lib/dump_stack.c:88 [inline] > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > invoke_softirq kernel/softirq.c:428 [inline] > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > </IRQ> > > <TASK> > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > pfn_valid include/linux/mmzone.h:2019 [inline] > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > This should be caused by the following commit: > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > Thanks > Zqiang > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > context_switch kernel/sched/core.c:5412 [inline] > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > __schedule_loop kernel/sched/core.c:6823 [inline] > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > </TASK> > > ---------------- > > Code disassembly (best guess): > > 0: f5 cmc > > 1: 90 nop > > 2: 0f 0b ud2 > > 4: 90 nop > > 5: 90 nop > > 6: 90 nop > > 7: eb c6 jmp 0xffffffcf > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > d: 90 nop > > e: 90 nop > > f: 90 nop > > 10: 90 nop > > 11: 90 nop > > 12: 90 nop > > 13: 90 nop > > 14: 90 nop > > 15: 90 nop > > 16: 90 nop > > 17: 90 nop > > 18: 90 nop > > 19: 90 nop > > 1a: 90 nop > > 1b: 90 nop > > 1c: 90 nop > > 1d: f3 0f 1e fa endbr64 > > 21: 31 c0 xor %eax,%eax > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > * 2a: 74 1e je 0x4a <-- trapping instruction > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > 33: 74 15 je 0x4a > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > 3c: 03 00 > > 3e: 31 c0 xor %eax,%eax > > > > > > --- > > This report is generated by a bot. It may contain errors. > > See https://goo.gl/tpsmEJ for more information about syzbot. > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > syzbot will keep track of this issue. See: > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > If the report is already addressed, let syzbot know by replying with: > > #syz fix: exact-commit-title > > > > If you want syzbot to run the reproducer, reply with: > > #syz test: git://repo/address.git branch-or-commit-hash > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > If you want to overwrite report's subsystems, reply with: > > #syz set subsystems: new-subsystem > > (See the list of subsystem names on the web dashboard) > > > > If the report is a duplicate of another one, reply with: > > #syz dup: exact-subject-of-another-report > > > > If you want to undo deduplication, reply with: > > #syz undup > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-16 8:44 ` Z qiang @ 2024-04-17 1:13 ` Paul E. McKenney 2024-04-17 2:25 ` Z qiang 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2024-04-17 1:13 UTC (permalink / raw) To: Z qiang; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > Cc: Paul > > > > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > git tree: linux-next > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > Downloadable assets: > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > ============================= > > > WARNING: suspicious RCU usage > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > ----------------------------- > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > other info that might help us debug this: > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > 1 lock held by ksoftirqd/0/16: > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > stack backtrace: > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > Call Trace: > > > <IRQ> > > > __dump_stack lib/dump_stack.c:88 [inline] > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 Huh. This statement is supposed to prevent this call to __do_softirq() from interrupt exit:: if (!IS_ENABLED(CONFIG_PREEMPT_RT) && __this_cpu_read(ksoftirqd) == current) So was the ksoftirqd kthread interrupted at a point where it happens to have softirq enabled? Thanx, Paul > > > invoke_softirq kernel/softirq.c:428 [inline] > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > </IRQ> > > > <TASK> > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > This should be caused by the following commit: > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > Thanks > > Zqiang > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > context_switch kernel/sched/core.c:5412 [inline] > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > </TASK> > > > ---------------- > > > Code disassembly (best guess): > > > 0: f5 cmc > > > 1: 90 nop > > > 2: 0f 0b ud2 > > > 4: 90 nop > > > 5: 90 nop > > > 6: 90 nop > > > 7: eb c6 jmp 0xffffffcf > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > d: 90 nop > > > e: 90 nop > > > f: 90 nop > > > 10: 90 nop > > > 11: 90 nop > > > 12: 90 nop > > > 13: 90 nop > > > 14: 90 nop > > > 15: 90 nop > > > 16: 90 nop > > > 17: 90 nop > > > 18: 90 nop > > > 19: 90 nop > > > 1a: 90 nop > > > 1b: 90 nop > > > 1c: 90 nop > > > 1d: f3 0f 1e fa endbr64 > > > 21: 31 c0 xor %eax,%eax > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > 33: 74 15 je 0x4a > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > 3c: 03 00 > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > --- > > > This report is generated by a bot. It may contain errors. > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > syzbot will keep track of this issue. See: > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > If the report is already addressed, let syzbot know by replying with: > > > #syz fix: exact-commit-title > > > > > > If you want syzbot to run the reproducer, reply with: > > > #syz test: git://repo/address.git branch-or-commit-hash > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > If you want to overwrite report's subsystems, reply with: > > > #syz set subsystems: new-subsystem > > > (See the list of subsystem names on the web dashboard) > > > > > > If the report is a duplicate of another one, reply with: > > > #syz dup: exact-subject-of-another-report > > > > > > If you want to undo deduplication, reply with: > > > #syz undup > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-17 1:13 ` Paul E. McKenney @ 2024-04-17 2:25 ` Z qiang 2024-04-17 22:39 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Z qiang @ 2024-04-17 2:25 UTC (permalink / raw) To: paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > Cc: Paul > > > > > > > > Hello, > > > > > > > > syzbot found the following issue on: > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > git tree: linux-next > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > Downloadable assets: > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > ============================= > > > > WARNING: suspicious RCU usage > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > ----------------------------- > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > 1 lock held by ksoftirqd/0/16: > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > stack backtrace: > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > Call Trace: > > > > <IRQ> > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > Huh. This statement is supposed to prevent this call to __do_softirq() > from interrupt exit:: > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > __this_cpu_read(ksoftirqd) == current) > > So was the ksoftirqd kthread interrupted at a point where it happens to > have softirq enabled? It should look like this: schedule() switch_to ksoftirqd/0 finish_task_switch ->put_task_struct_rcu_user ->call_rcu(&task->rcu, delayed_put_task_struct) ->__kasan_record_aux_stack ->pfn_valid ->rcu_read_lock_sched <interrupt> __irq_exit_rcu ->__do_softirq -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && __this_cpu_read(ksoftirqd) == current) rcu_softirq_qs -> RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) Thanks Zqiang > > Thanx, Paul > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > </IRQ> > > > > <TASK> > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > This should be caused by the following commit: > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > Thanks > > > Zqiang > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > </TASK> > > > > ---------------- > > > > Code disassembly (best guess): > > > > 0: f5 cmc > > > > 1: 90 nop > > > > 2: 0f 0b ud2 > > > > 4: 90 nop > > > > 5: 90 nop > > > > 6: 90 nop > > > > 7: eb c6 jmp 0xffffffcf > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > d: 90 nop > > > > e: 90 nop > > > > f: 90 nop > > > > 10: 90 nop > > > > 11: 90 nop > > > > 12: 90 nop > > > > 13: 90 nop > > > > 14: 90 nop > > > > 15: 90 nop > > > > 16: 90 nop > > > > 17: 90 nop > > > > 18: 90 nop > > > > 19: 90 nop > > > > 1a: 90 nop > > > > 1b: 90 nop > > > > 1c: 90 nop > > > > 1d: f3 0f 1e fa endbr64 > > > > 21: 31 c0 xor %eax,%eax > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > 33: 74 15 je 0x4a > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > 3c: 03 00 > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > --- > > > > This report is generated by a bot. It may contain errors. > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > syzbot will keep track of this issue. See: > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > #syz fix: exact-commit-title > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > #syz set subsystems: new-subsystem > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > If the report is a duplicate of another one, reply with: > > > > #syz dup: exact-subject-of-another-report > > > > > > > > If you want to undo deduplication, reply with: > > > > #syz undup > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-17 2:25 ` Z qiang @ 2024-04-17 22:39 ` Paul E. McKenney 2024-04-18 9:49 ` Z qiang 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2024-04-17 22:39 UTC (permalink / raw) To: Z qiang; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > Cc: Paul > > > > > > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > git tree: linux-next > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > Downloadable assets: > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > ============================= > > > > > WARNING: suspicious RCU usage > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > ----------------------------- > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > 1 lock held by ksoftirqd/0/16: > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > stack backtrace: > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > Call Trace: > > > > > <IRQ> > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > from interrupt exit:: > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > __this_cpu_read(ksoftirqd) == current) > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > have softirq enabled? > > It should look like this: > schedule() > switch_to ksoftirqd/0 > finish_task_switch So this CPU's ksoftirqd task is running. > ->put_task_struct_rcu_user > ->call_rcu(&task->rcu, delayed_put_task_struct) > ->__kasan_record_aux_stack > ->pfn_valid > ->rcu_read_lock_sched > <interrupt> > __irq_exit_rcu > ->__do_softirq > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > __this_cpu_read(ksoftirqd) == current) But we are also running __do_softirq() from return from interrupt. While running in this mode, we are not supposed to invoke rcu_softirq_qs(). But the "__this_cpu_read(ksoftirqd) == current" check yields "true", so we do call rcu_softirq_qs() anyway. That is a bug. We need to upgrade or replace that check to something that returns true only if called at process level from ksoftirqd. Any thoughts on a good way to do that? For example, would adding "&& in_task()" do the trick, or are there other unfortunate corner cases? (And good job tracking this down, by the way!) Thanx, Paul > rcu_softirq_qs > -> > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > Thanks > Zqiang > > > > > > Thanx, Paul > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > </IRQ> > > > > > <TASK> > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > This should be caused by the following commit: > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > Thanks > > > > Zqiang > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > </TASK> > > > > > ---------------- > > > > > Code disassembly (best guess): > > > > > 0: f5 cmc > > > > > 1: 90 nop > > > > > 2: 0f 0b ud2 > > > > > 4: 90 nop > > > > > 5: 90 nop > > > > > 6: 90 nop > > > > > 7: eb c6 jmp 0xffffffcf > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > d: 90 nop > > > > > e: 90 nop > > > > > f: 90 nop > > > > > 10: 90 nop > > > > > 11: 90 nop > > > > > 12: 90 nop > > > > > 13: 90 nop > > > > > 14: 90 nop > > > > > 15: 90 nop > > > > > 16: 90 nop > > > > > 17: 90 nop > > > > > 18: 90 nop > > > > > 19: 90 nop > > > > > 1a: 90 nop > > > > > 1b: 90 nop > > > > > 1c: 90 nop > > > > > 1d: f3 0f 1e fa endbr64 > > > > > 21: 31 c0 xor %eax,%eax > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > 33: 74 15 je 0x4a > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > 3c: 03 00 > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > --- > > > > > This report is generated by a bot. It may contain errors. > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > #syz fix: exact-commit-title > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > #syz set subsystems: new-subsystem > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > #syz undup > > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-17 22:39 ` Paul E. McKenney @ 2024-04-18 9:49 ` Z qiang 2024-04-18 22:49 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Z qiang @ 2024-04-18 9:49 UTC (permalink / raw) To: paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx > > On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > > > Cc: Paul > > > > > > > > > > > > Hello, > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > > git tree: linux-next > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > > > Downloadable assets: > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > > > ============================= > > > > > > WARNING: suspicious RCU usage > > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > > ----------------------------- > > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > > 1 lock held by ksoftirqd/0/16: > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > > > stack backtrace: > > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > > Call Trace: > > > > > > <IRQ> > > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > > from interrupt exit:: > > > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > __this_cpu_read(ksoftirqd) == current) > > > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > > have softirq enabled? > > > > It should look like this: > > schedule() > > switch_to ksoftirqd/0 > > finish_task_switch > > So this CPU's ksoftirqd task is running. > > > ->put_task_struct_rcu_user > > ->call_rcu(&task->rcu, delayed_put_task_struct) > > ->__kasan_record_aux_stack > > ->pfn_valid > > ->rcu_read_lock_sched > > <interrupt> > > __irq_exit_rcu > > ->__do_softirq > > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > __this_cpu_read(ksoftirqd) == current) > > But we are also running __do_softirq() from return from interrupt. While > running in this mode, we are not supposed to invoke rcu_softirq_qs(). > But the "__this_cpu_read(ksoftirqd) == current" check yields "true", > so we do call rcu_softirq_qs() anyway. That is a bug. > > We need to upgrade or replace that check to something that returns true > only if called at process level from ksoftirqd. > > Any thoughts on a good way to do that? For example, would adding "&& > in_task()" do the trick, or are there other unfortunate corner cases? The rcu_softirq_qs() is invoked in softirq_handle_begin/end() critical section, in softirqd/0 task context, the "in_task()" should return false, will miss qs report in softirqd/0 task context. diff --git a/kernel/softirq.c b/kernel/softirq.c index b315b21fb28c..9b8f0c0f7675 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -563,10 +563,6 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) pending >>= softirq_bit; } - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && - __this_cpu_read(ksoftirqd) == current) - rcu_softirq_qs(); - local_irq_disable(); pending = local_softirq_pending(); @@ -915,6 +911,8 @@ static int ksoftirqd_should_run(unsigned int cpu) static void run_ksoftirqd(unsigned int cpu) { + unsigned long last_qs = jiffies; + ksoftirqd_run_begin(); if (local_softirq_pending()) { /* @@ -923,6 +921,7 @@ static void run_ksoftirqd(unsigned int cpu) */ __do_softirq(); ksoftirqd_run_end(); + rcu_softirq_qs_periodic(last_qs); cond_resched(); return; } Any thoughts? Thanks Zqiang > > (And good job tracking this down, by the way!) > > Thanx, Paul > > > rcu_softirq_qs > > -> > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > > > Thanks > > Zqiang > > > > > > > > > > Thanx, Paul > > > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > > </IRQ> > > > > > > <TASK> > > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > > > > This should be caused by the following commit: > > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > > > > Thanks > > > > > Zqiang > > > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > > </TASK> > > > > > > ---------------- > > > > > > Code disassembly (best guess): > > > > > > 0: f5 cmc > > > > > > 1: 90 nop > > > > > > 2: 0f 0b ud2 > > > > > > 4: 90 nop > > > > > > 5: 90 nop > > > > > > 6: 90 nop > > > > > > 7: eb c6 jmp 0xffffffcf > > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > > d: 90 nop > > > > > > e: 90 nop > > > > > > f: 90 nop > > > > > > 10: 90 nop > > > > > > 11: 90 nop > > > > > > 12: 90 nop > > > > > > 13: 90 nop > > > > > > 14: 90 nop > > > > > > 15: 90 nop > > > > > > 16: 90 nop > > > > > > 17: 90 nop > > > > > > 18: 90 nop > > > > > > 19: 90 nop > > > > > > 1a: 90 nop > > > > > > 1b: 90 nop > > > > > > 1c: 90 nop > > > > > > 1d: f3 0f 1e fa endbr64 > > > > > > 21: 31 c0 xor %eax,%eax > > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > > 33: 74 15 je 0x4a > > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > > 3c: 03 00 > > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > > > > --- > > > > > > This report is generated by a bot. It may contain errors. > > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > > #syz fix: exact-commit-title > > > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > > #syz set subsystems: new-subsystem > > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > > #syz undup > > > > > > ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-18 9:49 ` Z qiang @ 2024-04-18 22:49 ` Paul E. McKenney 2024-04-19 5:50 ` Z qiang 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2024-04-18 22:49 UTC (permalink / raw) To: Z qiang; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > > On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > > > > > Cc: Paul > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > > > git tree: linux-next > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > > > > > Downloadable assets: > > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > > > > > ============================= > > > > > > > WARNING: suspicious RCU usage > > > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > > > ----------------------------- > > > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > > > 1 lock held by ksoftirqd/0/16: > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > > > > > stack backtrace: > > > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > > > Call Trace: > > > > > > > <IRQ> > > > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > > > from interrupt exit:: > > > > > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > > > have softirq enabled? > > > > > > It should look like this: > > > schedule() > > > switch_to ksoftirqd/0 > > > finish_task_switch > > > > So this CPU's ksoftirqd task is running. > > > > > ->put_task_struct_rcu_user > > > ->call_rcu(&task->rcu, delayed_put_task_struct) > > > ->__kasan_record_aux_stack > > > ->pfn_valid > > > ->rcu_read_lock_sched > > > <interrupt> > > > __irq_exit_rcu > > > ->__do_softirq > > > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > __this_cpu_read(ksoftirqd) == current) > > > > But we are also running __do_softirq() from return from interrupt. While > > running in this mode, we are not supposed to invoke rcu_softirq_qs(). > > But the "__this_cpu_read(ksoftirqd) == current" check yields "true", > > so we do call rcu_softirq_qs() anyway. That is a bug. > > > > We need to upgrade or replace that check to something that returns true > > only if called at process level from ksoftirqd. > > > > Any thoughts on a good way to do that? For example, would adding "&& > > in_task()" do the trick, or are there other unfortunate corner cases? > > The rcu_softirq_qs() is invoked in softirq_handle_begin/end() critical section, > in softirqd/0 task context, the "in_task()" should return false, will miss > qs report in softirqd/0 task context. > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index b315b21fb28c..9b8f0c0f7675 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -563,10 +563,6 @@ asmlinkage __visible void __softirq_entry > __do_softirq(void) > pending >>= softirq_bit; > } > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > - __this_cpu_read(ksoftirqd) == current) > - rcu_softirq_qs(); > - > local_irq_disable(); > > pending = local_softirq_pending(); > @@ -915,6 +911,8 @@ static int ksoftirqd_should_run(unsigned int cpu) > > static void run_ksoftirqd(unsigned int cpu) > { > + unsigned long last_qs = jiffies; > + > ksoftirqd_run_begin(); > if (local_softirq_pending()) { > /* > @@ -923,6 +921,7 @@ static void run_ksoftirqd(unsigned int cpu) > */ > __do_softirq(); > ksoftirqd_run_end(); > + rcu_softirq_qs_periodic(last_qs); Unfortunately, we need the quiescent state to be within __do_softirq(). > cond_resched(); > return; > } > > Any thoughts? Can we mask the return value from preempt_count(), for example, as shown in the CONFIG_PREEMPTION=n version of rcu_flavor_sched_clock_irq()? The trick is that we should be able to ignore SOFTIRQ_MASK because __do_softirq() should not be invoked when softirqs are disabled. Emphasis on "should". ;-) Thanx, Paul > Thanks > Zqiang > > > > > > (And good job tracking this down, by the way!) > > > > Thanx, Paul > > > > > rcu_softirq_qs > > > -> > > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > > > > > Thanks > > > Zqiang > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > > > </IRQ> > > > > > > > <TASK> > > > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > > > > > > > This should be caused by the following commit: > > > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > > > > > > > Thanks > > > > > > Zqiang > > > > > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > > > </TASK> > > > > > > > ---------------- > > > > > > > Code disassembly (best guess): > > > > > > > 0: f5 cmc > > > > > > > 1: 90 nop > > > > > > > 2: 0f 0b ud2 > > > > > > > 4: 90 nop > > > > > > > 5: 90 nop > > > > > > > 6: 90 nop > > > > > > > 7: eb c6 jmp 0xffffffcf > > > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > > > d: 90 nop > > > > > > > e: 90 nop > > > > > > > f: 90 nop > > > > > > > 10: 90 nop > > > > > > > 11: 90 nop > > > > > > > 12: 90 nop > > > > > > > 13: 90 nop > > > > > > > 14: 90 nop > > > > > > > 15: 90 nop > > > > > > > 16: 90 nop > > > > > > > 17: 90 nop > > > > > > > 18: 90 nop > > > > > > > 19: 90 nop > > > > > > > 1a: 90 nop > > > > > > > 1b: 90 nop > > > > > > > 1c: 90 nop > > > > > > > 1d: f3 0f 1e fa endbr64 > > > > > > > 21: 31 c0 xor %eax,%eax > > > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > > > 33: 74 15 je 0x4a > > > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > > > 3c: 03 00 > > > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > This report is generated by a bot. It may contain errors. > > > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > > > #syz fix: exact-commit-title > > > > > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > > > #syz set subsystems: new-subsystem > > > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > > > #syz undup > > > > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-18 22:49 ` Paul E. McKenney @ 2024-04-19 5:50 ` Z qiang 2024-04-19 6:11 ` Z qiang ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Z qiang @ 2024-04-19 5:50 UTC (permalink / raw) To: paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx > > On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > > > > On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > > > > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > > > > > > > Cc: Paul > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > > > > git tree: linux-next > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > > > > > > > Downloadable assets: > > > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > ============================= > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > > > > ----------------------------- > > > > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > > > > 1 lock held by ksoftirqd/0/16: > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > > > > > > > stack backtrace: > > > > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > > > > Call Trace: > > > > > > > > <IRQ> > > > > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > > > > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > > > > from interrupt exit:: > > > > > > > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > > > > have softirq enabled? > > > > > > > > It should look like this: > > > > schedule() > > > > switch_to ksoftirqd/0 > > > > finish_task_switch > > > > > > So this CPU's ksoftirqd task is running. > > > > > > > ->put_task_struct_rcu_user > > > > ->call_rcu(&task->rcu, delayed_put_task_struct) > > > > ->__kasan_record_aux_stack > > > > ->pfn_valid > > > > ->rcu_read_lock_sched > > > > <interrupt> > > > > __irq_exit_rcu > > > > ->__do_softirq > > > > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > But we are also running __do_softirq() from return from interrupt. While > > > running in this mode, we are not supposed to invoke rcu_softirq_qs(). > > > But the "__this_cpu_read(ksoftirqd) == current" check yields "true", > > > so we do call rcu_softirq_qs() anyway. That is a bug. > > > > > > We need to upgrade or replace that check to something that returns true > > > only if called at process level from ksoftirqd. > > > > > > Any thoughts on a good way to do that? For example, would adding "&& > > > in_task()" do the trick, or are there other unfortunate corner cases? > > > > The rcu_softirq_qs() is invoked in softirq_handle_begin/end() critical section, > > in softirqd/0 task context, the "in_task()" should return false, will miss > > qs report in softirqd/0 task context. > > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > index b315b21fb28c..9b8f0c0f7675 100644 > > --- a/kernel/softirq.c > > +++ b/kernel/softirq.c > > @@ -563,10 +563,6 @@ asmlinkage __visible void __softirq_entry > > __do_softirq(void) > > pending >>= softirq_bit; > > } > > > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > - __this_cpu_read(ksoftirqd) == current) > > - rcu_softirq_qs(); > > - > > local_irq_disable(); > > > > pending = local_softirq_pending(); > > @@ -915,6 +911,8 @@ static int ksoftirqd_should_run(unsigned int cpu) > > > > static void run_ksoftirqd(unsigned int cpu) > > { > > + unsigned long last_qs = jiffies; > > + > > ksoftirqd_run_begin(); > > if (local_softirq_pending()) { > > /* > > @@ -923,6 +921,7 @@ static void run_ksoftirqd(unsigned int cpu) > > */ > > __do_softirq(); > > ksoftirqd_run_end(); > > + rcu_softirq_qs_periodic(last_qs); > > Unfortunately, we need the quiescent state to be within __do_softirq(). > > > cond_resched(); > > return; > > } > > > > Any thoughts? > > Can we mask the return value from preempt_count(), for example, as shown > in the CONFIG_PREEMPTION=n version of rcu_flavor_sched_clock_irq()? > > The trick is that we should be able to ignore SOFTIRQ_MASK because > __do_softirq() should not be invoked when softirqs are disabled. > Emphasis on "should". ;-) Does it mean this? diff --git a/kernel/softirq.c b/kernel/softirq.c index b315b21fb28c..315b717ec944 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -517,6 +517,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) bool in_hardirq; __u32 pending; int softirq_bit; + bool preempt_enabled = (IS_ENABLED(CONFIG_PREEMPT_COUNT) && + !(preempt_count() & PREEMPT_MASK)); /* * Mask out PF_MEMALLOC as the current task context is borrowed for the @@ -564,7 +566,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) } if (!IS_ENABLED(CONFIG_PREEMPT_RT) && - __this_cpu_read(ksoftirqd) == current) + __this_cpu_read(ksoftirqd) == current && + preempt_enabled && !rcu_preempt_depth()) rcu_softirq_qs(); local_irq_disable(); For built with CONFIG_PREEMPTION=n and CONFIG_COUNT=n kernels, the preempt_enabled is always false. or how about this? diff --git a/kernel/softirq.c b/kernel/softirq.c index b315b21fb28c..80b9fb4ee562 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -66,6 +66,8 @@ const char * const softirq_to_name[NR_SOFTIRQS] = { "TASKLET", "SCHED", "HRTIMER", "RCU" }; +static DEFINE_PER_CPU(bool, ksoftirqd_work); + /* * we cannot loop indefinitely here to avoid userspace starvation, * but we also don't want to introduce a worst case 1/HZ latency @@ -404,10 +406,12 @@ static inline void softirq_handle_end(void) static inline void ksoftirqd_run_begin(void) { local_irq_disable(); + per_cpu(ksoftirqd_work, smp_processor_id()) = true; } static inline void ksoftirqd_run_end(void) { + per_cpu(ksoftirqd_work, smp_processor_id()) = false; local_irq_enable(); } @@ -564,7 +568,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) } if (!IS_ENABLED(CONFIG_PREEMPT_RT) && - __this_cpu_read(ksoftirqd) == current) + __this_cpu_read(ksoftirqd) == current && ksoftirqd_work) rcu_softirq_qs(); local_irq_disable(); @@ -970,6 +974,10 @@ static struct smp_hotplug_thread softirq_threads = { static __init int spawn_ksoftirqd(void) { + int cpu; + + for_each_possible_cpu(cpu) + per_cpu(ksoftirqd_work, cpu) = false; cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL, takeover_tasklets); BUG_ON(smpboot_register_percpu_thread(&softirq_threads)); Thanks Zqiang > > Thanx, Paul > > > Thanks > > Zqiang > > > > > > > > > > (And good job tracking this down, by the way!) > > > > > > Thanx, Paul > > > > > > > rcu_softirq_qs > > > > -> > > > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > > > > > > > Thanks > > > > Zqiang > > > > > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > > > > </IRQ> > > > > > > > > <TASK> > > > > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > > > > > > > > > > This should be caused by the following commit: > > > > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > Zqiang > > > > > > > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > > > > </TASK> > > > > > > > > ---------------- > > > > > > > > Code disassembly (best guess): > > > > > > > > 0: f5 cmc > > > > > > > > 1: 90 nop > > > > > > > > 2: 0f 0b ud2 > > > > > > > > 4: 90 nop > > > > > > > > 5: 90 nop > > > > > > > > 6: 90 nop > > > > > > > > 7: eb c6 jmp 0xffffffcf > > > > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > > > > d: 90 nop > > > > > > > > e: 90 nop > > > > > > > > f: 90 nop > > > > > > > > 10: 90 nop > > > > > > > > 11: 90 nop > > > > > > > > 12: 90 nop > > > > > > > > 13: 90 nop > > > > > > > > 14: 90 nop > > > > > > > > 15: 90 nop > > > > > > > > 16: 90 nop > > > > > > > > 17: 90 nop > > > > > > > > 18: 90 nop > > > > > > > > 19: 90 nop > > > > > > > > 1a: 90 nop > > > > > > > > 1b: 90 nop > > > > > > > > 1c: 90 nop > > > > > > > > 1d: f3 0f 1e fa endbr64 > > > > > > > > 21: 31 c0 xor %eax,%eax > > > > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > > > > 33: 74 15 je 0x4a > > > > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > > > > 3c: 03 00 > > > > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > This report is generated by a bot. It may contain errors. > > > > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > > > > #syz fix: exact-commit-title > > > > > > > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > > > > #syz set subsystems: new-subsystem > > > > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > > > > #syz undup > > > > > > > > ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-19 5:50 ` Z qiang @ 2024-04-19 6:11 ` Z qiang 2024-04-19 7:12 ` Paul E. McKenney 2024-04-22 21:15 ` Thomas Gleixner 2 siblings, 0 replies; 18+ messages in thread From: Z qiang @ 2024-04-19 6:11 UTC (permalink / raw) To: paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx > > > > > On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > > > > > > On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > > > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > > > > > > > > > Cc: Paul > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > > > > > git tree: linux-next > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > > > > > > > > > Downloadable assets: > > > > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > > > ============================= > > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > > > > > ----------------------------- > > > > > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > > > > > 1 lock held by ksoftirqd/0/16: > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > > > > > > > > > stack backtrace: > > > > > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > > > > > Call Trace: > > > > > > > > > <IRQ> > > > > > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > > > > > > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > > > > > from interrupt exit:: > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > > > > > have softirq enabled? > > > > > > > > > > It should look like this: > > > > > schedule() > > > > > switch_to ksoftirqd/0 > > > > > finish_task_switch > > > > > > > > So this CPU's ksoftirqd task is running. > > > > > > > > > ->put_task_struct_rcu_user > > > > > ->call_rcu(&task->rcu, delayed_put_task_struct) > > > > > ->__kasan_record_aux_stack > > > > > ->pfn_valid > > > > > ->rcu_read_lock_sched > > > > > <interrupt> > > > > > __irq_exit_rcu > > > > > ->__do_softirq > > > > > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > But we are also running __do_softirq() from return from interrupt. While > > > > running in this mode, we are not supposed to invoke rcu_softirq_qs(). > > > > But the "__this_cpu_read(ksoftirqd) == current" check yields "true", > > > > so we do call rcu_softirq_qs() anyway. That is a bug. > > > > > > > > We need to upgrade or replace that check to something that returns true > > > > only if called at process level from ksoftirqd. > > > > > > > > Any thoughts on a good way to do that? For example, would adding "&& > > > > in_task()" do the trick, or are there other unfortunate corner cases? > > > > > > The rcu_softirq_qs() is invoked in softirq_handle_begin/end() critical section, > > > in softirqd/0 task context, the "in_task()" should return false, will miss > > > qs report in softirqd/0 task context. > > > > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > > index b315b21fb28c..9b8f0c0f7675 100644 > > > --- a/kernel/softirq.c > > > +++ b/kernel/softirq.c > > > @@ -563,10 +563,6 @@ asmlinkage __visible void __softirq_entry > > > __do_softirq(void) > > > pending >>= softirq_bit; > > > } > > > > > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > - __this_cpu_read(ksoftirqd) == current) > > > - rcu_softirq_qs(); > > > - > > > local_irq_disable(); > > > > > > pending = local_softirq_pending(); > > > @@ -915,6 +911,8 @@ static int ksoftirqd_should_run(unsigned int cpu) > > > > > > static void run_ksoftirqd(unsigned int cpu) > > > { > > > + unsigned long last_qs = jiffies; > > > + > > > ksoftirqd_run_begin(); > > > if (local_softirq_pending()) { > > > /* > > > @@ -923,6 +921,7 @@ static void run_ksoftirqd(unsigned int cpu) > > > */ > > > __do_softirq(); > > > ksoftirqd_run_end(); > > > + rcu_softirq_qs_periodic(last_qs); > > > > Unfortunately, we need the quiescent state to be within __do_softirq(). > > > > > cond_resched(); > > > return; > > > } > > > > > > Any thoughts? > > > > Can we mask the return value from preempt_count(), for example, as shown > > in the CONFIG_PREEMPTION=n version of rcu_flavor_sched_clock_irq()? > > > > The trick is that we should be able to ignore SOFTIRQ_MASK because > > __do_softirq() should not be invoked when softirqs are disabled. > > Emphasis on "should". ;-) > > > > Does it mean this? > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index b315b21fb28c..315b717ec944 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -517,6 +517,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > bool in_hardirq; > __u32 pending; > int softirq_bit; > + bool preempt_enabled = (IS_ENABLED(CONFIG_PREEMPT_COUNT) && > + !(preempt_count() & PREEMPT_MASK)); > > /* > * Mask out PF_MEMALLOC as the current task context is borrowed for the > @@ -564,7 +566,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > } > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > - __this_cpu_read(ksoftirqd) == current) > + __this_cpu_read(ksoftirqd) == current && > + preempt_enabled && !rcu_preempt_depth()) > rcu_softirq_qs(); > > local_irq_disable(); > > For built with CONFIG_PREEMPTION=n and CONFIG_COUNT=n kernels, > the preempt_enabled is always false. > > or how about this? > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index b315b21fb28c..80b9fb4ee562 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -66,6 +66,8 @@ const char * const softirq_to_name[NR_SOFTIRQS] = { > "TASKLET", "SCHED", "HRTIMER", "RCU" > }; > > +static DEFINE_PER_CPU(bool, ksoftirqd_work); > + > /* > * we cannot loop indefinitely here to avoid userspace starvation, > * but we also don't want to introduce a worst case 1/HZ latency > @@ -404,10 +406,12 @@ static inline void softirq_handle_end(void) > static inline void ksoftirqd_run_begin(void) > { > local_irq_disable(); > + per_cpu(ksoftirqd_work, smp_processor_id()) = true; > } > > static inline void ksoftirqd_run_end(void) > { > + per_cpu(ksoftirqd_work, smp_processor_id()) = false; > local_irq_enable(); > } > > @@ -564,7 +568,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > } > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > - __this_cpu_read(ksoftirqd) == current) > + __this_cpu_read(ksoftirqd) == current && ksoftirqd_work) use __this_cpu_read(ksoftirqd_work) > rcu_softirq_qs(); > > local_irq_disable(); > @@ -970,6 +974,10 @@ static struct smp_hotplug_thread softirq_threads = { > > static __init int spawn_ksoftirqd(void) > { > + int cpu; > + > + for_each_possible_cpu(cpu) > + per_cpu(ksoftirqd_work, cpu) = false; > cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL, > takeover_tasklets); > BUG_ON(smpboot_register_percpu_thread(&softirq_threads)); > > > Thanks > Zqiang > > > > > > > Thanx, Paul > > > > > Thanks > > > Zqiang > > > > > > > > > > > > > > (And good job tracking this down, by the way!) > > > > > > > > Thanx, Paul > > > > > > > > > rcu_softirq_qs > > > > > -> > > > > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > > > > > > > > > Thanks > > > > > Zqiang > > > > > > > > > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > > > > > </IRQ> > > > > > > > > > <TASK> > > > > > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > > > > > > > > > > > > > This should be caused by the following commit: > > > > > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > > > > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > Zqiang > > > > > > > > > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > > > > > </TASK> > > > > > > > > > ---------------- > > > > > > > > > Code disassembly (best guess): > > > > > > > > > 0: f5 cmc > > > > > > > > > 1: 90 nop > > > > > > > > > 2: 0f 0b ud2 > > > > > > > > > 4: 90 nop > > > > > > > > > 5: 90 nop > > > > > > > > > 6: 90 nop > > > > > > > > > 7: eb c6 jmp 0xffffffcf > > > > > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > > > > > d: 90 nop > > > > > > > > > e: 90 nop > > > > > > > > > f: 90 nop > > > > > > > > > 10: 90 nop > > > > > > > > > 11: 90 nop > > > > > > > > > 12: 90 nop > > > > > > > > > 13: 90 nop > > > > > > > > > 14: 90 nop > > > > > > > > > 15: 90 nop > > > > > > > > > 16: 90 nop > > > > > > > > > 17: 90 nop > > > > > > > > > 18: 90 nop > > > > > > > > > 19: 90 nop > > > > > > > > > 1a: 90 nop > > > > > > > > > 1b: 90 nop > > > > > > > > > 1c: 90 nop > > > > > > > > > 1d: f3 0f 1e fa endbr64 > > > > > > > > > 21: 31 c0 xor %eax,%eax > > > > > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > > > > > 33: 74 15 je 0x4a > > > > > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > > > > > 3c: 03 00 > > > > > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > > This report is generated by a bot. It may contain errors. > > > > > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > > > > > #syz fix: exact-commit-title > > > > > > > > > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > > > > > #syz set subsystems: new-subsystem > > > > > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > > > > > #syz undup > > > > > > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-19 5:50 ` Z qiang 2024-04-19 6:11 ` Z qiang @ 2024-04-19 7:12 ` Paul E. McKenney 2024-04-19 7:29 ` Z qiang 2024-04-22 21:15 ` Thomas Gleixner 2 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2024-04-19 7:12 UTC (permalink / raw) To: Z qiang; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx On Fri, Apr 19, 2024 at 01:50:02PM +0800, Z qiang wrote: > > > > On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > > > > > > On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > > > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > > > > > > > > > Cc: Paul > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > > > > > git tree: linux-next > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > > > > > > > > > Downloadable assets: > > > > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > > > ============================= > > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > > > > > ----------------------------- > > > > > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > > > > > 1 lock held by ksoftirqd/0/16: > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > > > > > > > > > stack backtrace: > > > > > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > > > > > Call Trace: > > > > > > > > > <IRQ> > > > > > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > > > > > > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > > > > > from interrupt exit:: > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > > > > > have softirq enabled? > > > > > > > > > > It should look like this: > > > > > schedule() > > > > > switch_to ksoftirqd/0 > > > > > finish_task_switch > > > > > > > > So this CPU's ksoftirqd task is running. > > > > > > > > > ->put_task_struct_rcu_user > > > > > ->call_rcu(&task->rcu, delayed_put_task_struct) > > > > > ->__kasan_record_aux_stack > > > > > ->pfn_valid > > > > > ->rcu_read_lock_sched > > > > > <interrupt> > > > > > __irq_exit_rcu > > > > > ->__do_softirq > > > > > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > But we are also running __do_softirq() from return from interrupt. While > > > > running in this mode, we are not supposed to invoke rcu_softirq_qs(). > > > > But the "__this_cpu_read(ksoftirqd) == current" check yields "true", > > > > so we do call rcu_softirq_qs() anyway. That is a bug. > > > > > > > > We need to upgrade or replace that check to something that returns true > > > > only if called at process level from ksoftirqd. > > > > > > > > Any thoughts on a good way to do that? For example, would adding "&& > > > > in_task()" do the trick, or are there other unfortunate corner cases? > > > > > > The rcu_softirq_qs() is invoked in softirq_handle_begin/end() critical section, > > > in softirqd/0 task context, the "in_task()" should return false, will miss > > > qs report in softirqd/0 task context. > > > > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > > index b315b21fb28c..9b8f0c0f7675 100644 > > > --- a/kernel/softirq.c > > > +++ b/kernel/softirq.c > > > @@ -563,10 +563,6 @@ asmlinkage __visible void __softirq_entry > > > __do_softirq(void) > > > pending >>= softirq_bit; > > > } > > > > > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > - __this_cpu_read(ksoftirqd) == current) > > > - rcu_softirq_qs(); > > > - > > > local_irq_disable(); > > > > > > pending = local_softirq_pending(); > > > @@ -915,6 +911,8 @@ static int ksoftirqd_should_run(unsigned int cpu) > > > > > > static void run_ksoftirqd(unsigned int cpu) > > > { > > > + unsigned long last_qs = jiffies; > > > + > > > ksoftirqd_run_begin(); > > > if (local_softirq_pending()) { > > > /* > > > @@ -923,6 +921,7 @@ static void run_ksoftirqd(unsigned int cpu) > > > */ > > > __do_softirq(); > > > ksoftirqd_run_end(); > > > + rcu_softirq_qs_periodic(last_qs); > > > > Unfortunately, we need the quiescent state to be within __do_softirq(). > > > > > cond_resched(); > > > return; > > > } > > > > > > Any thoughts? > > > > Can we mask the return value from preempt_count(), for example, as shown > > in the CONFIG_PREEMPTION=n version of rcu_flavor_sched_clock_irq()? > > > > The trick is that we should be able to ignore SOFTIRQ_MASK because > > __do_softirq() should not be invoked when softirqs are disabled. > > Emphasis on "should". ;-) > > > > Does it mean this? > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index b315b21fb28c..315b717ec944 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -517,6 +517,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > bool in_hardirq; > __u32 pending; > int softirq_bit; > + bool preempt_enabled = (IS_ENABLED(CONFIG_PREEMPT_COUNT) && > + !(preempt_count() & PREEMPT_MASK)); > > /* > * Mask out PF_MEMALLOC as the current task context is borrowed for the > @@ -564,7 +566,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > } > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > - __this_cpu_read(ksoftirqd) == current) > + __this_cpu_read(ksoftirqd) == current && > + preempt_enabled && !rcu_preempt_depth()) > rcu_softirq_qs(); > > local_irq_disable(); > > For built with CONFIG_PREEMPTION=n and CONFIG_COUNT=n kernels, > the preempt_enabled is always false. Assuming that this works, it is preferred over the next one. Thanx, Paul > or how about this? > > diff --git a/kernel/softirq.c b/kernel/softirq.c > index b315b21fb28c..80b9fb4ee562 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -66,6 +66,8 @@ const char * const softirq_to_name[NR_SOFTIRQS] = { > "TASKLET", "SCHED", "HRTIMER", "RCU" > }; > > +static DEFINE_PER_CPU(bool, ksoftirqd_work); > + > /* > * we cannot loop indefinitely here to avoid userspace starvation, > * but we also don't want to introduce a worst case 1/HZ latency > @@ -404,10 +406,12 @@ static inline void softirq_handle_end(void) > static inline void ksoftirqd_run_begin(void) > { > local_irq_disable(); > + per_cpu(ksoftirqd_work, smp_processor_id()) = true; > } > > static inline void ksoftirqd_run_end(void) > { > + per_cpu(ksoftirqd_work, smp_processor_id()) = false; > local_irq_enable(); > } > > @@ -564,7 +568,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > } > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > - __this_cpu_read(ksoftirqd) == current) > + __this_cpu_read(ksoftirqd) == current && ksoftirqd_work) > rcu_softirq_qs(); > > local_irq_disable(); > @@ -970,6 +974,10 @@ static struct smp_hotplug_thread softirq_threads = { > > static __init int spawn_ksoftirqd(void) > { > + int cpu; > + > + for_each_possible_cpu(cpu) > + per_cpu(ksoftirqd_work, cpu) = false; > cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL, > takeover_tasklets); > BUG_ON(smpboot_register_percpu_thread(&softirq_threads)); > > > Thanks > Zqiang > > > > > > > Thanx, Paul > > > > > Thanks > > > Zqiang > > > > > > > > > > > > > > (And good job tracking this down, by the way!) > > > > > > > > Thanx, Paul > > > > > > > > > rcu_softirq_qs > > > > > -> > > > > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > > > > > > > > > Thanks > > > > > Zqiang > > > > > > > > > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > > > > > </IRQ> > > > > > > > > > <TASK> > > > > > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > > > > > > > > > > > > > This should be caused by the following commit: > > > > > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > > > > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > Zqiang > > > > > > > > > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > > > > > </TASK> > > > > > > > > > ---------------- > > > > > > > > > Code disassembly (best guess): > > > > > > > > > 0: f5 cmc > > > > > > > > > 1: 90 nop > > > > > > > > > 2: 0f 0b ud2 > > > > > > > > > 4: 90 nop > > > > > > > > > 5: 90 nop > > > > > > > > > 6: 90 nop > > > > > > > > > 7: eb c6 jmp 0xffffffcf > > > > > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > > > > > d: 90 nop > > > > > > > > > e: 90 nop > > > > > > > > > f: 90 nop > > > > > > > > > 10: 90 nop > > > > > > > > > 11: 90 nop > > > > > > > > > 12: 90 nop > > > > > > > > > 13: 90 nop > > > > > > > > > 14: 90 nop > > > > > > > > > 15: 90 nop > > > > > > > > > 16: 90 nop > > > > > > > > > 17: 90 nop > > > > > > > > > 18: 90 nop > > > > > > > > > 19: 90 nop > > > > > > > > > 1a: 90 nop > > > > > > > > > 1b: 90 nop > > > > > > > > > 1c: 90 nop > > > > > > > > > 1d: f3 0f 1e fa endbr64 > > > > > > > > > 21: 31 c0 xor %eax,%eax > > > > > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > > > > > 33: 74 15 je 0x4a > > > > > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > > > > > 3c: 03 00 > > > > > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > > This report is generated by a bot. It may contain errors. > > > > > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > > > > > #syz fix: exact-commit-title > > > > > > > > > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > > > > > #syz set subsystems: new-subsystem > > > > > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > > > > > #syz undup > > > > > > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-19 7:12 ` Paul E. McKenney @ 2024-04-19 7:29 ` Z qiang 0 siblings, 0 replies; 18+ messages in thread From: Z qiang @ 2024-04-19 7:29 UTC (permalink / raw) To: paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs, tglx On Fri, Apr 19, 2024 at 3:12 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Fri, Apr 19, 2024 at 01:50:02PM +0800, Z qiang wrote: > > > > > > On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > > > > > > > > On Wed, Apr 17, 2024 at 10:25:01AM +0800, Z qiang wrote: > > > > > > > > > > > > > > On Tue, Apr 16, 2024 at 04:44:54PM +0800, Z qiang wrote: > > > > > > > > On Tue, Apr 16, 2024 at 4:10 PM Z qiang <qiang.zhang1211@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Cc: Paul > > > > > > > > > > > > > > > > > > > > Hello, > > > > > > > > > > > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > > > > > > > > > > > HEAD commit: c0b832517f62 Add linux-next specific files for 20240402 > > > > > > > > > > git tree: linux-next > > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15f64776180000 > > > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=afcaf46d374cec8c > > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=dce04ed6d1438ad69656 > > > > > > > > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 > > > > > > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10f00471180000 > > > > > > > > > > > > > > > > > > > > Downloadable assets: > > > > > > > > > > disk image: https://storage.googleapis.com/syzbot-assets/0d36ec76edc7/disk-c0b83251.raw.xz > > > > > > > > > > vmlinux: https://storage.googleapis.com/syzbot-assets/6f9bb4e37dd0/vmlinux-c0b83251.xz > > > > > > > > > > kernel image: https://storage.googleapis.com/syzbot-assets/2349287b14b7/bzImage-c0b83251.xz > > > > > > > > > > > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > > > > > > Reported-by: syzbot+dce04ed6d1438ad69656@syzkaller.appspotmail.com > > > > > > > > > > > > > > > > > > > > ============================= > > > > > > > > > > WARNING: suspicious RCU usage > > > > > > > > > > 6.9.0-rc2-next-20240402-syzkaller #0 Not tainted > > > > > > > > > > ----------------------------- > > > > > > > > > > kernel/rcu/tree.c:276 Illegal rcu_softirq_qs() in RCU read-side critical section! > > > > > > > > > > > > > > > > > > > > other info that might help us debug this: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 1 > > > > > > > > > > 1 lock held by ksoftirqd/0/16: > > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline] > > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: rcu_read_lock_sched include/linux/rcupdate.h:933 [inline] > > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > > > #0: ffffffff8e334d20 (rcu_read_lock_sched){....}-{1:2}, at: __virt_addr_valid+0x183/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > > > > > > > > > > > stack backtrace: > > > > > > > > > > CPU: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.9.0-rc2-next-20240402-syzkaller #0 > > > > > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 > > > > > > > > > > Call Trace: > > > > > > > > > > <IRQ> > > > > > > > > > > __dump_stack lib/dump_stack.c:88 [inline] > > > > > > > > > > dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 > > > > > > > > > > lockdep_rcu_suspicious+0x221/0x340 kernel/locking/lockdep.c:6712 > > > > > > > > > > rcu_softirq_qs+0xd9/0x370 kernel/rcu/tree.c:273 > > > > > > > > > > __do_softirq+0x5fd/0x980 kernel/softirq.c:568 > > > > > > > > > > > > > > Huh. This statement is supposed to prevent this call to __do_softirq() > > > > > > > from interrupt exit:: > > > > > > > > > > > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > > > > > > > So was the ksoftirqd kthread interrupted at a point where it happens to > > > > > > > have softirq enabled? > > > > > > > > > > > > It should look like this: > > > > > > schedule() > > > > > > switch_to ksoftirqd/0 > > > > > > finish_task_switch > > > > > > > > > > So this CPU's ksoftirqd task is running. > > > > > > > > > > > ->put_task_struct_rcu_user > > > > > > ->call_rcu(&task->rcu, delayed_put_task_struct) > > > > > > ->__kasan_record_aux_stack > > > > > > ->pfn_valid > > > > > > ->rcu_read_lock_sched > > > > > > <interrupt> > > > > > > __irq_exit_rcu > > > > > > ->__do_softirq > > > > > > -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > > > __this_cpu_read(ksoftirqd) == current) > > > > > > > > > > But we are also running __do_softirq() from return from interrupt. While > > > > > running in this mode, we are not supposed to invoke rcu_softirq_qs(). > > > > > But the "__this_cpu_read(ksoftirqd) == current" check yields "true", > > > > > so we do call rcu_softirq_qs() anyway. That is a bug. > > > > > > > > > > We need to upgrade or replace that check to something that returns true > > > > > only if called at process level from ksoftirqd. > > > > > > > > > > Any thoughts on a good way to do that? For example, would adding "&& > > > > > in_task()" do the trick, or are there other unfortunate corner cases? > > > > > > > > The rcu_softirq_qs() is invoked in softirq_handle_begin/end() critical section, > > > > in softirqd/0 task context, the "in_task()" should return false, will miss > > > > qs report in softirqd/0 task context. > > > > > > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > > > index b315b21fb28c..9b8f0c0f7675 100644 > > > > --- a/kernel/softirq.c > > > > +++ b/kernel/softirq.c > > > > @@ -563,10 +563,6 @@ asmlinkage __visible void __softirq_entry > > > > __do_softirq(void) > > > > pending >>= softirq_bit; > > > > } > > > > > > > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > > - __this_cpu_read(ksoftirqd) == current) > > > > - rcu_softirq_qs(); > > > > - > > > > local_irq_disable(); > > > > > > > > pending = local_softirq_pending(); > > > > @@ -915,6 +911,8 @@ static int ksoftirqd_should_run(unsigned int cpu) > > > > > > > > static void run_ksoftirqd(unsigned int cpu) > > > > { > > > > + unsigned long last_qs = jiffies; > > > > + > > > > ksoftirqd_run_begin(); > > > > if (local_softirq_pending()) { > > > > /* > > > > @@ -923,6 +921,7 @@ static void run_ksoftirqd(unsigned int cpu) > > > > */ > > > > __do_softirq(); > > > > ksoftirqd_run_end(); > > > > + rcu_softirq_qs_periodic(last_qs); > > > > > > Unfortunately, we need the quiescent state to be within __do_softirq(). > > > > > > > cond_resched(); > > > > return; > > > > } > > > > > > > > Any thoughts? > > > > > > Can we mask the return value from preempt_count(), for example, as shown > > > in the CONFIG_PREEMPTION=n version of rcu_flavor_sched_clock_irq()? > > > > > > The trick is that we should be able to ignore SOFTIRQ_MASK because > > > __do_softirq() should not be invoked when softirqs are disabled. > > > Emphasis on "should". ;-) > > > > > > > > Does it mean this? > > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > index b315b21fb28c..315b717ec944 100644 > > --- a/kernel/softirq.c > > +++ b/kernel/softirq.c > > @@ -517,6 +517,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > bool in_hardirq; > > __u32 pending; > > int softirq_bit; > > + bool preempt_enabled = (IS_ENABLED(CONFIG_PREEMPT_COUNT) && > > + !(preempt_count() & PREEMPT_MASK)); > > > > /* > > * Mask out PF_MEMALLOC as the current task context is borrowed for the > > @@ -564,7 +566,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > } > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > - __this_cpu_read(ksoftirqd) == current) > > + __this_cpu_read(ksoftirqd) == current && > > + preempt_enabled && !rcu_preempt_depth()) > > rcu_softirq_qs(); > > > > local_irq_disable(); > > > > For built with CONFIG_PREEMPTION=n and CONFIG_COUNT=n kernels, > > the preempt_enabled is always false. > > Assuming that this works, it is preferred over the next one. But for built with CONFIG_PREEMPTION=n and CONFIG_COUNT=n kernels, will miss invoke rcu_softirq_qs(), even if in the ksoftirqd task context. Thanks Zqiang > > Thanx, Paul > > > or how about this? > > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > index b315b21fb28c..80b9fb4ee562 100644 > > --- a/kernel/softirq.c > > +++ b/kernel/softirq.c > > @@ -66,6 +66,8 @@ const char * const softirq_to_name[NR_SOFTIRQS] = { > > "TASKLET", "SCHED", "HRTIMER", "RCU" > > }; > > > > +static DEFINE_PER_CPU(bool, ksoftirqd_work); > > + > > /* > > * we cannot loop indefinitely here to avoid userspace starvation, > > * but we also don't want to introduce a worst case 1/HZ latency > > @@ -404,10 +406,12 @@ static inline void softirq_handle_end(void) > > static inline void ksoftirqd_run_begin(void) > > { > > local_irq_disable(); > > + per_cpu(ksoftirqd_work, smp_processor_id()) = true; > > } > > > > static inline void ksoftirqd_run_end(void) > > { > > + per_cpu(ksoftirqd_work, smp_processor_id()) = false; > > local_irq_enable(); > > } > > > > @@ -564,7 +568,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > } > > > > if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > - __this_cpu_read(ksoftirqd) == current) > > + __this_cpu_read(ksoftirqd) == current && ksoftirqd_work) > > rcu_softirq_qs(); > > > > local_irq_disable(); > > @@ -970,6 +974,10 @@ static struct smp_hotplug_thread softirq_threads = { > > > > static __init int spawn_ksoftirqd(void) > > { > > + int cpu; > > + > > + for_each_possible_cpu(cpu) > > + per_cpu(ksoftirqd_work, cpu) = false; > > cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL, > > takeover_tasklets); > > BUG_ON(smpboot_register_percpu_thread(&softirq_threads)); > > > > > > Thanks > > Zqiang > > > > > > > > > > > > Thanx, Paul > > > > > > > Thanks > > > > Zqiang > > > > > > > > > > > > > > > > > > (And good job tracking this down, by the way!) > > > > > > > > > > Thanx, Paul > > > > > > > > > > > rcu_softirq_qs > > > > > > -> > > > > > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map)) > > > > > > > > > > > > Thanks > > > > > > Zqiang > > > > > > > > > > > > > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > > > > > > > > > invoke_softirq kernel/softirq.c:428 [inline] > > > > > > > > > > __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633 > > > > > > > > > > irq_exit_rcu+0x9/0x30 kernel/softirq.c:645 > > > > > > > > > > instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] > > > > > > > > > > sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 > > > > > > > > > > </IRQ> > > > > > > > > > > <TASK> > > > > > > > > > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 > > > > > > > > > > RIP: 0010:debug_lockdep_rcu_enabled+0xd/0x40 kernel/rcu/update.c:320 > > > > > > > > > > Code: f5 90 0f 0b 90 90 90 eb c6 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 31 c0 83 3d c7 0f 28 04 00 <74> 1e 83 3d 26 42 28 04 00 74 15 65 48 8b 0c 25 c0 d3 03 00 31 c0 > > > > > > > > > > RSP: 0018:ffffc90000157a50 EFLAGS: 00000202 > > > > > > > > > > RAX: 0000000000000000 RBX: 00000000000000a0 RCX: 0000000000000001 > > > > > > > > > > RDX: dffffc0000000000 RSI: ffffffff8bcae740 RDI: ffffffff8c1f7ec0 > > > > > > > > > > RBP: dffffc0000000000 R08: ffffffff92f3a527 R09: 1ffffffff25e74a4 > > > > > > > > > > R10: dffffc0000000000 R11: fffffbfff25e74a5 R12: 0000000029373578 > > > > > > > > > > R13: 1ffff9200002af64 R14: ffffffff814220f3 R15: ffff88813fff90a0 > > > > > > > > > > rcu_read_lock_sched include/linux/rcupdate.h:934 [inline] > > > > > > > > > > pfn_valid include/linux/mmzone.h:2019 [inline] > > > > > > > > > > __virt_addr_valid+0x1a9/0x520 arch/x86/mm/physaddr.c:65 > > > > > > > > > > kasan_addr_to_slab+0xd/0x80 mm/kasan/common.c:37 > > > > > > > > > > __kasan_record_aux_stack+0x11/0xc0 mm/kasan/generic.c:526 > > > > > > > > > > > > > > > > > > > > > > > > > > > This should be caused by the following commit: > > > > > > > > > d818cc76e2b4 ("kasan: Record work creation stack trace with interrupts enabled") > > > > > > > > > > > > > > > > > > Is it possible to make the rcu_softirq_qs() run only in ksoftirqd task? > > > > > > > > > > > > > > > > use rcu_softirq_qs_periodic() in run_ksoftirqd(). > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > Zqiang > > > > > > > > > > > > > > > > > > > __call_rcu_common kernel/rcu/tree.c:3096 [inline] > > > > > > > > > > call_rcu+0x167/0xa70 kernel/rcu/tree.c:3200 > > > > > > > > > > context_switch kernel/sched/core.c:5412 [inline] > > > > > > > > > > __schedule+0x17f0/0x4a50 kernel/sched/core.c:6746 > > > > > > > > > > __schedule_loop kernel/sched/core.c:6823 [inline] > > > > > > > > > > schedule+0x14b/0x320 kernel/sched/core.c:6838 > > > > > > > > > > smpboot_thread_fn+0x61e/0xa30 kernel/smpboot.c:160 > > > > > > > > > > kthread+0x2f0/0x390 kernel/kthread.c:388 > > > > > > > > > > ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 > > > > > > > > > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 > > > > > > > > > > </TASK> > > > > > > > > > > ---------------- > > > > > > > > > > Code disassembly (best guess): > > > > > > > > > > 0: f5 cmc > > > > > > > > > > 1: 90 nop > > > > > > > > > > 2: 0f 0b ud2 > > > > > > > > > > 4: 90 nop > > > > > > > > > > 5: 90 nop > > > > > > > > > > 6: 90 nop > > > > > > > > > > 7: eb c6 jmp 0xffffffcf > > > > > > > > > > 9: 0f 1f 40 00 nopl 0x0(%rax) > > > > > > > > > > d: 90 nop > > > > > > > > > > e: 90 nop > > > > > > > > > > f: 90 nop > > > > > > > > > > 10: 90 nop > > > > > > > > > > 11: 90 nop > > > > > > > > > > 12: 90 nop > > > > > > > > > > 13: 90 nop > > > > > > > > > > 14: 90 nop > > > > > > > > > > 15: 90 nop > > > > > > > > > > 16: 90 nop > > > > > > > > > > 17: 90 nop > > > > > > > > > > 18: 90 nop > > > > > > > > > > 19: 90 nop > > > > > > > > > > 1a: 90 nop > > > > > > > > > > 1b: 90 nop > > > > > > > > > > 1c: 90 nop > > > > > > > > > > 1d: f3 0f 1e fa endbr64 > > > > > > > > > > 21: 31 c0 xor %eax,%eax > > > > > > > > > > 23: 83 3d c7 0f 28 04 00 cmpl $0x0,0x4280fc7(%rip) # 0x4280ff1 > > > > > > > > > > * 2a: 74 1e je 0x4a <-- trapping instruction > > > > > > > > > > 2c: 83 3d 26 42 28 04 00 cmpl $0x0,0x4284226(%rip) # 0x4284259 > > > > > > > > > > 33: 74 15 je 0x4a > > > > > > > > > > 35: 65 48 8b 0c 25 c0 d3 mov %gs:0x3d3c0,%rcx > > > > > > > > > > 3c: 03 00 > > > > > > > > > > 3e: 31 c0 xor %eax,%eax > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > > > This report is generated by a bot. It may contain errors. > > > > > > > > > > See https://goo.gl/tpsmEJ for more information about syzbot. > > > > > > > > > > syzbot engineers can be reached at syzkaller@googlegroups.com. > > > > > > > > > > > > > > > > > > > > syzbot will keep track of this issue. See: > > > > > > > > > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > > > > > > > > > > > > > > > > > > > If the report is already addressed, let syzbot know by replying with: > > > > > > > > > > #syz fix: exact-commit-title > > > > > > > > > > > > > > > > > > > > If you want syzbot to run the reproducer, reply with: > > > > > > > > > > #syz test: git://repo/address.git branch-or-commit-hash > > > > > > > > > > If you attach or paste a git patch, syzbot will apply it before testing. > > > > > > > > > > > > > > > > > > > > If you want to overwrite report's subsystems, reply with: > > > > > > > > > > #syz set subsystems: new-subsystem > > > > > > > > > > (See the list of subsystem names on the web dashboard) > > > > > > > > > > > > > > > > > > > > If the report is a duplicate of another one, reply with: > > > > > > > > > > #syz dup: exact-subject-of-another-report > > > > > > > > > > > > > > > > > > > > If you want to undo deduplication, reply with: > > > > > > > > > > #syz undup > > > > > > > > > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-19 5:50 ` Z qiang 2024-04-19 6:11 ` Z qiang 2024-04-19 7:12 ` Paul E. McKenney @ 2024-04-22 21:15 ` Thomas Gleixner 2024-04-23 2:20 ` Z qiang 2 siblings, 1 reply; 18+ messages in thread From: Thomas Gleixner @ 2024-04-22 21:15 UTC (permalink / raw) To: Z qiang, paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs On Fri, Apr 19 2024 at 13:50, Z qiang wrote: >> On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > static __init int spawn_ksoftirqd(void) > { > + int cpu; > + > + for_each_possible_cpu(cpu) > + per_cpu(ksoftirqd_work, cpu) = false; First of all that initialization is pointless, but why making all of this complex as hell? Thanks, tglx --- diff --git a/kernel/softirq.c b/kernel/softirq.c index b315b21fb28c..e991d735be0d 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -508,7 +508,7 @@ static inline bool lockdep_softirq_start(void) { return false; } static inline void lockdep_softirq_end(bool in_hardirq) { } #endif -asmlinkage __visible void __softirq_entry __do_softirq(void) +static void handle_softirqs(bool kirqd) { unsigned long end = jiffies + MAX_SOFTIRQ_TIME; unsigned long old_flags = current->flags; @@ -563,8 +563,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) pending >>= softirq_bit; } - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && - __this_cpu_read(ksoftirqd) == current) + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kirqd) rcu_softirq_qs(); local_irq_disable(); @@ -584,6 +583,11 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) current_restore_flags(old_flags, PF_MEMALLOC); } +asmlinkage __visible void __softirq_entry __do_softirq(void) +{ + handle_softirqs(false); +} + /** * irq_enter_rcu - Enter an interrupt context with RCU watching */ @@ -921,7 +925,7 @@ static void run_ksoftirqd(unsigned int cpu) * We can safely run softirq on inline stack, as we are not deep * in the task stack here. */ - __do_softirq(); + handle_softirqs(true); ksoftirqd_run_end(); cond_resched(); return; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-22 21:15 ` Thomas Gleixner @ 2024-04-23 2:20 ` Z qiang 2024-04-23 22:28 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Z qiang @ 2024-04-23 2:20 UTC (permalink / raw) To: Thomas Gleixner; +Cc: paulmck, syzbot, linux-kernel, peterz, syzkaller-bugs > > On Fri, Apr 19 2024 at 13:50, Z qiang wrote: > >> On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > static __init int spawn_ksoftirqd(void) > > { > > + int cpu; > > + > > + for_each_possible_cpu(cpu) > > + per_cpu(ksoftirqd_work, cpu) = false; > > First of all that initialization is pointless, but why making all of > this complex as hell? > > Thanks, > > tglx > --- > diff --git a/kernel/softirq.c b/kernel/softirq.c > index b315b21fb28c..e991d735be0d 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -508,7 +508,7 @@ static inline bool lockdep_softirq_start(void) { return false; } > static inline void lockdep_softirq_end(bool in_hardirq) { } > #endif > > -asmlinkage __visible void __softirq_entry __do_softirq(void) > +static void handle_softirqs(bool kirqd) > { > unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > unsigned long old_flags = current->flags; > @@ -563,8 +563,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > pending >>= softirq_bit; > } > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > - __this_cpu_read(ksoftirqd) == current) > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kirqd) > rcu_softirq_qs(); > > local_irq_disable(); > @@ -584,6 +583,11 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > current_restore_flags(old_flags, PF_MEMALLOC); > } > > +asmlinkage __visible void __softirq_entry __do_softirq(void) > +{ > + handle_softirqs(false); > +} > + > /** > * irq_enter_rcu - Enter an interrupt context with RCU watching > */ > @@ -921,7 +925,7 @@ static void run_ksoftirqd(unsigned int cpu) > * We can safely run softirq on inline stack, as we are not deep > * in the task stack here. > */ > - __do_softirq(); > + handle_softirqs(true); Thanks, this is good for me :), Paul, what do you think? Thanks Zqiang > ksoftirqd_run_end(); > cond_resched(); > return; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-23 2:20 ` Z qiang @ 2024-04-23 22:28 ` Paul E. McKenney 2024-04-26 2:09 ` Z qiang 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2024-04-23 22:28 UTC (permalink / raw) To: Z qiang; +Cc: Thomas Gleixner, syzbot, linux-kernel, peterz, syzkaller-bugs On Tue, Apr 23, 2024 at 10:20:49AM +0800, Z qiang wrote: > > > > On Fri, Apr 19 2024 at 13:50, Z qiang wrote: > > >> On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > static __init int spawn_ksoftirqd(void) > > > { > > > + int cpu; > > > + > > > + for_each_possible_cpu(cpu) > > > + per_cpu(ksoftirqd_work, cpu) = false; > > > > First of all that initialization is pointless, but why making all of > > this complex as hell? > > > > Thanks, > > > > tglx > > --- > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > index b315b21fb28c..e991d735be0d 100644 > > --- a/kernel/softirq.c > > +++ b/kernel/softirq.c > > @@ -508,7 +508,7 @@ static inline bool lockdep_softirq_start(void) { return false; } > > static inline void lockdep_softirq_end(bool in_hardirq) { } > > #endif > > > > -asmlinkage __visible void __softirq_entry __do_softirq(void) > > +static void handle_softirqs(bool kirqd) > > { > > unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > > unsigned long old_flags = current->flags; > > @@ -563,8 +563,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > pending >>= softirq_bit; > > } > > > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > - __this_cpu_read(ksoftirqd) == current) > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kirqd) > > rcu_softirq_qs(); > > > > local_irq_disable(); > > @@ -584,6 +583,11 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > current_restore_flags(old_flags, PF_MEMALLOC); > > } > > > > +asmlinkage __visible void __softirq_entry __do_softirq(void) > > +{ > > + handle_softirqs(false); > > +} > > + > > /** > > * irq_enter_rcu - Enter an interrupt context with RCU watching > > */ > > @@ -921,7 +925,7 @@ static void run_ksoftirqd(unsigned int cpu) > > * We can safely run softirq on inline stack, as we are not deep > > * in the task stack here. > > */ > > - __do_softirq(); > > + handle_softirqs(true); > > Thanks, this is good for me :), > Paul, what do you think? This looks quite nice to me, especially given that it avoids changing all the other calls to __do_softirq(). Some architectures might want to directly call handle_softirqs(), but if so, they can send the patches. Thanx, Paul > Thanks > Zqiang > > > ksoftirqd_run_end(); > > cond_resched(); > > return; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-23 22:28 ` Paul E. McKenney @ 2024-04-26 2:09 ` Z qiang 2024-04-26 17:23 ` Thomas Gleixner 0 siblings, 1 reply; 18+ messages in thread From: Z qiang @ 2024-04-26 2:09 UTC (permalink / raw) To: paulmck; +Cc: Thomas Gleixner, syzbot, linux-kernel, peterz, syzkaller-bugs > > On Tue, Apr 23, 2024 at 10:20:49AM +0800, Z qiang wrote: > > > > > > On Fri, Apr 19 2024 at 13:50, Z qiang wrote: > > > >> On Thu, Apr 18, 2024 at 05:49:38PM +0800, Z qiang wrote: > > > > static __init int spawn_ksoftirqd(void) > > > > { > > > > + int cpu; > > > > + > > > > + for_each_possible_cpu(cpu) > > > > + per_cpu(ksoftirqd_work, cpu) = false; > > > > > > First of all that initialization is pointless, but why making all of > > > this complex as hell? > > > > > > Thanks, > > > > > > tglx > > > --- > > > diff --git a/kernel/softirq.c b/kernel/softirq.c > > > index b315b21fb28c..e991d735be0d 100644 > > > --- a/kernel/softirq.c > > > +++ b/kernel/softirq.c > > > @@ -508,7 +508,7 @@ static inline bool lockdep_softirq_start(void) { return false; } > > > static inline void lockdep_softirq_end(bool in_hardirq) { } > > > #endif > > > > > > -asmlinkage __visible void __softirq_entry __do_softirq(void) > > > +static void handle_softirqs(bool kirqd) > > > { > > > unsigned long end = jiffies + MAX_SOFTIRQ_TIME; > > > unsigned long old_flags = current->flags; > > > @@ -563,8 +563,7 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > > pending >>= softirq_bit; > > > } > > > > > > - if (!IS_ENABLED(CONFIG_PREEMPT_RT) && > > > - __this_cpu_read(ksoftirqd) == current) > > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kirqd) > > > rcu_softirq_qs(); > > > > > > local_irq_disable(); > > > @@ -584,6 +583,11 @@ asmlinkage __visible void __softirq_entry __do_softirq(void) > > > current_restore_flags(old_flags, PF_MEMALLOC); > > > } > > > > > > +asmlinkage __visible void __softirq_entry __do_softirq(void) > > > +{ > > > + handle_softirqs(false); > > > +} > > > + > > > /** > > > * irq_enter_rcu - Enter an interrupt context with RCU watching > > > */ > > > @@ -921,7 +925,7 @@ static void run_ksoftirqd(unsigned int cpu) > > > * We can safely run softirq on inline stack, as we are not deep > > > * in the task stack here. > > > */ > > > - __do_softirq(); > > > + handle_softirqs(true); > > > > Thanks, this is good for me :), > > Paul, what do you think? > > This looks quite nice to me, especially given that it avoids changing > all the other calls to __do_softirq(). Some architectures might want > to directly call handle_softirqs(), but if so, they can send the patches. Hello, Thomas Can you send an official patch? or I will send and add you to Co-developed-by tags. Thanks Zqiang > > Thanx, Paul > > > Thanks > > Zqiang > > > > > ksoftirqd_run_end(); > > > cond_resched(); > > > return; ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq 2024-04-26 2:09 ` Z qiang @ 2024-04-26 17:23 ` Thomas Gleixner 0 siblings, 0 replies; 18+ messages in thread From: Thomas Gleixner @ 2024-04-26 17:23 UTC (permalink / raw) To: Z qiang, paulmck; +Cc: syzbot, linux-kernel, peterz, syzkaller-bugs On Fri, Apr 26 2024 at 10:09, Z qiang wrote: >> This looks quite nice to me, especially given that it avoids changing >> all the other calls to __do_softirq(). Some architectures might want >> to directly call handle_softirqs(), but if so, they can send the patches. > > > Can you send an official patch? or I will send and add you to > Co-developed-by tags. Just send it along with a nice changelog and add: Suggested-by: Thomas Gleixner <tglx@linutronix.de> That's good enough. ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-04-26 17:23 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-04-02 18:05 [syzbot] [kernel?] WARNING: suspicious RCU usage in __do_softirq syzbot 2024-04-16 7:05 ` syzbot 2024-04-16 8:10 ` Z qiang 2024-04-16 8:44 ` Z qiang 2024-04-17 1:13 ` Paul E. McKenney 2024-04-17 2:25 ` Z qiang 2024-04-17 22:39 ` Paul E. McKenney 2024-04-18 9:49 ` Z qiang 2024-04-18 22:49 ` Paul E. McKenney 2024-04-19 5:50 ` Z qiang 2024-04-19 6:11 ` Z qiang 2024-04-19 7:12 ` Paul E. McKenney 2024-04-19 7:29 ` Z qiang 2024-04-22 21:15 ` Thomas Gleixner 2024-04-23 2:20 ` Z qiang 2024-04-23 22:28 ` Paul E. McKenney 2024-04-26 2:09 ` Z qiang 2024-04-26 17:23 ` Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox