* [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
@ 2025-08-22 4:15 syzbot
2025-08-22 12:08 ` Lorenzo Stoakes
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: syzbot @ 2025-08-22 4:15 UTC (permalink / raw)
To: Liam.Howlett, akpm, jannh, linux-kernel, linux-mm,
lorenzo.stoakes, pfalcato, syzkaller-bugs, vbabka
Hello,
syzbot found the following issue on:
HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1190/0x5de0 kernel/sched/core.c:6961
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
irqentry_exit+0x36/0x90 kernel/entry/common.c:197
asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
slab_free_hook mm/slub.c:2378 [inline]
slab_free mm/slub.c:4680 [inline]
kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
__vm_munmap+0x19a/0x390 mm/vma.c:3155
__do_sys_munmap mm/mmap.c:1080 [inline]
__se_sys_munmap mm/mmap.c:1077 [inline]
__x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb13ec2f2e7
RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
</TASK>
task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1190/0x5de0 kernel/sched/core.c:6961
preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
__raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
_raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
spin_unlock include/linux/spinlock.h:391 [inline]
filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
do_fault_around mm/memory.c:5531 [inline]
do_read_fault mm/memory.c:5564 [inline]
do_fault mm/memory.c:5707 [inline]
do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
handle_pte_fault mm/memory.c:6052 [inline]
__handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
handle_mm_fault+0x589/0xd10 mm/memory.c:6364
do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
handle_page_fault arch/x86/mm/fault.c:1476 [inline]
exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
RIP: 0033:0x7f54cd7177c7
RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
</TASK>
task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
Workqueue: events_power_efficient gc_worker
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1190/0x5de0 kernel/sched/core.c:6961
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
irqentry_exit+0x36/0x90 kernel/entry/common.c:197
asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
process_scheduled_works kernel/workqueue.c:3319 [inline]
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
kthread+0x3c5/0x780 kernel/kthread.c:463
ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1190/0x5de0 kernel/sched/core.c:6961
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
irqentry_exit+0x36/0x90 kernel/entry/common.c:197
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
RSP: 0018:ffffc90003337648 EFLAGS: 00000202
RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
kmalloc_noprof include/linux/slab.h:905 [inline]
slab_free_hook mm/slub.c:2369 [inline]
slab_free mm/slub.c:4680 [inline]
kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
__vm_munmap+0x19a/0x390 mm/vma.c:3155
__do_sys_munmap mm/mmap.c:1080 [inline]
__se_sys_munmap mm/mmap.c:1077 [inline]
__x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fb13ec2f2e7
RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-22 4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
@ 2025-08-22 12:08 ` Lorenzo Stoakes
2025-08-22 13:55 ` Harry Yoo
2025-08-28 2:05 ` Liam R. Howlett
2025-08-28 2:20 ` Liam R. Howlett
2 siblings, 1 reply; 10+ messages in thread
From: Lorenzo Stoakes @ 2025-08-22 12:08 UTC (permalink / raw)
To: syzbot
Cc: Liam.Howlett, akpm, jannh, linux-kernel, linux-mm, pfalcato,
syzkaller-bugs, vbabka, Sebastian Andrzej Siewior, Harry Yoo
+cc Sebastian for RCU ORC change...
+cc Harry for slab side.
Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? It seems the
stack is within KASAN, but there's no KASAN report, so maybe it's KASAN itself
that's having an issue?
Though I'm thinking maybe it's the ORC unwinder itself that could be problematic
here (albeit invoked via CONFIG_SLUB_RCU_DEBUG)... and yeah, kinda suspicious
because:
- We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
- CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
makes us do an unwind via ORC, which then takes an RCU read lock in
unwind_next_frame(), and both are doing this unwinding at the time of the report.
- ???
- Somehow things get locked up?
I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
in a stall, but it's suspicious.
On Thu, Aug 21, 2025 at 09:15:37PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
lockdep (CONFIG_PROVE_LOCKING) is on, so I'm guessing there's no deadlock here.
CONFIG_DEBUG_VM_MAPLE_TREE is enabled, which will cause a _major_ slowdown on VMA
operations as the tree is constantly being fully validated.
This may explain the stalls...
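For context, the expensive bit is validate_mm() in mm/vma.c - on each mapping
change it re-validates the entire maple tree and walks every VMA. Roughly
(heavily trimmed sketch, the real thing also checks anon_vma chains etc.):

	void validate_mm(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;
		VMA_ITERATOR(vmi, mm, 0);

		mt_validate(&mm->mm_mt);	/* full maple tree consistency check */
		for_each_vma(vmi, vma) {
			/* per-VMA sanity checks */
		}
	}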
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
No C repro yet...
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
So 105s, or 1m45s, that's pretty long...
> task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
Hmm, while the line number is not pertinent, I notice unwind_next_frame() has:

	guard(rcu)()

in it, from commit 14daa3bca217 ("x86: Use RCU in all users of
__module_address().") - though that's from Jan 2025...
This is defined (took me a while to track down!!) in include/linux/rcupdate.h:
DEFINE_LOCK_GUARD_0(rcu,
	do {
		rcu_read_lock();
		/*
		 * sparse doesn't call the cleanup function,
		 * so just release immediately and don't track
		 * the context. We don't need to anyway, since
		 * the whole point of the guard is to not need
		 * the explicit unlock.
		 */
		__release(RCU);
	} while (0),
	rcu_read_unlock())
Meaning it's equivalent to a scoped rcu_read_lock() / rcu_read_unlock().
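In other words, the effect is just (sketch only, not the literal macro expansion):

	rcu_read_lock();
	/* ... entire body of unwind_next_frame() ... */
	rcu_read_unlock();	/* implicitly, on every return path */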
But since there's no C repro, this is likely a race of some kind that might be very hard to hit.
> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> slab_free_hook mm/slub.c:2378 [inline]
Invokes the CONFIG_SLUB_RCU_DEBUG stack trace saving stuff
> slab_free mm/slub.c:4680 [inline]
> kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
Note that VMAs are SLAB_TYPESAFE_BY_RCU so maybe that's somehow playing a role
here?
In free_slab():

	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
		call_rcu(&slab->rcu_head, rcu_free_slab);
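And the callback is just the deferred free of the slab page (roughly, from
mm/slub.c), so the backing page only goes back to the allocator after a grace
period has elapsed:

	static void rcu_free_slab(struct rcu_head *h)
	{
		struct slab *slab = container_of(h, struct slab, rcu_head);

		__free_slab(slab->slab_cache, slab);
	}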
> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> __vm_munmap+0x19a/0x390 mm/vma.c:3155
> __do_sys_munmap mm/mmap.c:1080 [inline]
> __se_sys_munmap mm/mmap.c:1077 [inline]
> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
Seems a normal trace for an unmap, note (inlining removes stuff here) it's:
vms_complete_munmap_vmas() -> remove_vma() -> vm_area_free() -> kmem_cache_free()
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> </TASK>
> task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> spin_unlock include/linux/spinlock.h:391 [inline]
> filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> do_fault_around mm/memory.c:5531 [inline]
> do_read_fault mm/memory.c:5564 [inline]
> do_fault mm/memory.c:5707 [inline]
> do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> handle_pte_fault mm/memory.c:6052 [inline]
> __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
Faulting path being context switched on unlock of PTE spinlock...
> RIP: 0033:0x7f54cd7177c7
> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
> Workqueue: events_power_efficient gc_worker
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> process_scheduled_works kernel/workqueue.c:3319 [inline]
> worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> kthread+0x3c5/0x780 kernel/kthread.c:463
> ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
> task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
This is also RCU-read locked.
> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> kmalloc_noprof include/linux/slab.h:905 [inline]
> slab_free_hook mm/slub.c:2369 [inline]
> slab_free mm/slub.c:4680 [inline]
> kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> __vm_munmap+0x19a/0x390 mm/vma.c:3155
Simultaneous unmap?
> __do_sys_munmap mm/mmap.c:1080 [inline]
> __se_sys_munmap mm/mmap.c:1077 [inline]
> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
Cheers, Lorenzo
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-22 12:08 ` Lorenzo Stoakes
@ 2025-08-22 13:55 ` Harry Yoo
2025-08-28 0:29 ` Josh Poimboeuf
0 siblings, 1 reply; 10+ messages in thread
From: Harry Yoo @ 2025-08-22 13:55 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: syzbot, Liam.Howlett, akpm, jannh, linux-kernel, linux-mm,
pfalcato, syzkaller-bugs, vbabka, Sebastian Andrzej Siewior,
jpoimboe, peterz
On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> +cc Sebastian for RCU ORC change...
>
> +cc Harry for slab side.
+cc Josh and Peter for stack unwinding stuff.
> Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
>
> Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? As it seems to
> the stack is within KASAN, but no KASAN report so maybe it's KASAN itself that's
> having an issue?
>
> Though I'm thinking maybe it's the orc unwinder itself that could be problematic
> here (yet invoked by CONFIG_SLUB_RCU_DEBUG though)... and yeah kinda suspcious
> because:
>
> - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> makes us do an unwind via ORC, which then takes an RCU read lock on
> unwind_next_frame(), and both are doing this unwinding at the time of report.
> - ???
> - Somehow things get locked up?
>
> I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> in a stall, but it's suspicious.
Could this be because of misleading ORC data or a logical error in the ORC
unwinder that makes it fall into an infinite loop (unwind_done() never
returning true in arch_stack_walk())?
...because the reported line number doesn't really make sense as a cause of
stalls.
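For reference, the loop I have in mind is roughly this (trimmed sketch of
arch_stack_walk() in arch/x86/kernel/stacktrace.c) - if unwind_done() never
flips to true, the only other way out is the consume callback saying stop:

	void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
			     struct task_struct *task, struct pt_regs *regs)
	{
		struct unwind_state state;
		unsigned long addr;

		for (unwind_start(&state, task, regs, NULL); !unwind_done(&state);
		     unwind_next_frame(&state)) {
			addr = unwind_get_return_address(&state);
			if (!addr || !consume_entry(cookie, addr))
				break;
		}
	}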
--
Cheers,
Harry / Hyeonggon
> On Thu, Aug 21, 2025 at 09:15:37PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
>
> lockdep (CONFIG_PROVE_LOCKING) is on, so I'm guessing there's no deadlock here.
>
> CONFIG_DEBUG_VM_MAPLE_TREE is enabled, which will cause _major_ slowdown on VMA
> operations as the tree is constantly being fully validated.
>
> This may explain the stalls...
>
> > dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> > compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
>
> No C repro yet...
>
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
> >
> > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> > rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
>
> So 105s, or 1m45s, that's pretty long...
>
> > task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> > irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> > asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
>
> Hmm, while the line number is not pertinent, I notice unwind_next_frame() has:
>
> guard(rcu)()
>
> In it from commit 14daa3bca217 ("x86: Use RCU in all users of
> __module_address().") though from Jan 2025...
>
> This is defined (took me a while to track down!!) in include/linux/rcupdate.h:
>
> DEFINE_LOCK_GUARD_0(rcu,
> do {
> rcu_read_lock();
> /*
> * sparse doesn't call the cleanup function,
> * so just release immediately and don't track
> * the context. We don't need to anyway, since
> * the whole point of the guard is to not need
> * the explicit unlock.
> */
> __release(RCU);
> } while (0),
> rcu_read_unlock())
>
> Meaning it's equivalent to a scoped rcu_read_lock() / rcu_read_unlock().
>
> But since no C repro this is likely a race of some kind that might be very hard to hit.
>
> > Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> > RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> > RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> > RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> > R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> > R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> > arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> > stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> > kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> > kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> > slab_free_hook mm/slub.c:2378 [inline]
>
> Invokes the CONFIG_SLUB_RCU_DEBUG stack trace saving stuff
>
> > slab_free mm/slub.c:4680 [inline]
> > kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
>
> Note that VMAs are SLAB_TYPESAFE_BY_RCU so maybe that's somehow playing a role
> here?
>
> In free_slab():
>
> if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
> call_rcu(&slab->rcu_head, rcu_free_slab);
>
> > vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> > do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> > do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> > __vm_munmap+0x19a/0x390 mm/vma.c:3155
> > __do_sys_munmap mm/mmap.c:1080 [inline]
> > __se_sys_munmap mm/mmap.c:1077 [inline]
> > __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Seems a normal trace for an unmap, note (inlining removes stuff here) it's:
>
> vms_complete_munmap_vmas() -> remove_vma() -> vm_area_free() -> kmem_cache_free()
>
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> > RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> > R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> > R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> > </TASK>
> > task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> > preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> > __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> > _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> > spin_unlock include/linux/spinlock.h:391 [inline]
> > filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> > do_fault_around mm/memory.c:5531 [inline]
> > do_read_fault mm/memory.c:5564 [inline]
> > do_fault mm/memory.c:5707 [inline]
> > do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> > handle_pte_fault mm/memory.c:6052 [inline]
> > __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> > handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> > do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> > handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> > exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> > asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>
> Faulting path being context switched on unlock of PTE spinlock...
>
> > RIP: 0033:0x7f54cd7177c7
> > RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> > RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> > RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> > R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> > </TASK>
> > task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
> > Workqueue: events_power_efficient gc_worker
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> > irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> > asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> > RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> > Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> > RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> > RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> > RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> > RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> > R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> > R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> > gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> > process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> > process_scheduled_works kernel/workqueue.c:3319 [inline]
> > worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> > kthread+0x3c5/0x780 kernel/kthread.c:463
> > ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> > </TASK>
> > task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5357 [inline]
> > __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> > preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> > irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> > asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> > RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> > RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> > Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> > RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> > RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> > RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> > RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> > R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> > R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> > orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> > unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
>
> This is also RCU-read locked.
>
> > arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> > stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> > kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> > kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> > poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> > __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> > kmalloc_noprof include/linux/slab.h:905 [inline]
> > slab_free_hook mm/slub.c:2369 [inline]
> > slab_free mm/slub.c:4680 [inline]
> > kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> > vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> > do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> > do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> > __vm_munmap+0x19a/0x390 mm/vma.c:3155
>
> Simultaneous unmap?
>
> > __do_sys_munmap mm/mmap.c:1080 [inline]
> > __se_sys_munmap mm/mmap.c:1077 [inline]
> > __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> > do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> > do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> > entry_SYSCALL_64_after_hwframe+0x77/0x7f
> > RIP: 0033:0x7fb13ec2f2e7
> > RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> > RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> > RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> > RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> > R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> > R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> > </TASK>
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >
> > If the report is already addressed, let syzbot know by replying with:
> > #syz fix: exact-commit-title
> >
> > If you want syzbot to run the reproducer, reply with:
> > #syz test: git://repo/address.git branch-or-commit-hash
> > If you attach or paste a git patch, syzbot will apply it before testing.
> >
> > If you want to overwrite report's subsystems, reply with:
> > #syz set subsystems: new-subsystem
> > (See the list of subsystem names on the web dashboard)
> >
> > If the report is a duplicate of another one, reply with:
> > #syz dup: exact-subject-of-another-report
> >
> > If you want to undo deduplication, reply with:
> > #syz undup
>
> Cheers, Lorenzo
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-22 13:55 ` Harry Yoo
@ 2025-08-28 0:29 ` Josh Poimboeuf
2025-08-28 1:57 ` Liam R. Howlett
0 siblings, 1 reply; 10+ messages in thread
From: Josh Poimboeuf @ 2025-08-28 0:29 UTC (permalink / raw)
To: Harry Yoo
Cc: Lorenzo Stoakes, syzbot, Liam.Howlett, akpm, jannh, linux-kernel,
linux-mm, pfalcato, syzkaller-bugs, vbabka,
Sebastian Andrzej Siewior, peterz
On Fri, Aug 22, 2025 at 10:55:10PM +0900, Harry Yoo wrote:
> On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> > +cc Sebastian for RCU ORC change...
> >
> > +cc Harry for slab side.
>
> +cc Josh and Peter for stack unwinding stuff.
>
> > Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> >
> > Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? As it seems to
> > the stack is within KASAN, but no KASAN report so maybe it's KASAN itself that's
> > having an issue?
> >
> > Though I'm thinking maybe it's the orc unwinder itself that could be problematic
> > here (yet invoked by CONFIG_SLUB_RCU_DEBUG though)... and yeah kinda suspcious
> > because:
> >
> > - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> > - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> > makes us do an unwind via ORC, which then takes an RCU read lock on
> > unwind_next_frame(), and both are doing this unwinding at the time of report.
> > - ???
> > - Somehow things get locked up?
> >
> > I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> > in a stall, but it's suspicious.
>
> Can this be because of misleading ORC data or logical error in ORC unwinder
> that makes it fall into an infinite loop (unwind_done() never returning
> true in arch_stack_walk())?
>
> ...because the reported line number reported doesn't really make sense
> as a cause of stalls.
There shouldn't be any way for ORC to hit an infinite loop. Worst case
it would stop after the caller's buffer fills up. ORC has always been
solid, and the RCU usage looks fine to me. I tend to doubt ORC is at
fault here.
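To be clear, the cap I mean is just the generic consume callback refusing
entries once the caller's array is full - roughly (trimmed sketch from
kernel/stacktrace.c):

	static bool stack_trace_consume_entry(void *cookie, unsigned long addr)
	{
		struct stacktrace_cookie *c = cookie;

		if (c->len >= c->size)
			return false;	/* buffer full, arch_stack_walk() stops */
		c->store[c->len++] = addr;
		return c->len < c->size;
	}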
Maybe some interaction higher up the stack is causing things to run in a
tight loop.
All those debugging options (e.g., DEBUG_VM_MAPLE_TREE, LOCKDEP, KASAN,
SLUB_RCU_DEBUG...) could be a factor in slowing things down to a crawl.
--
Josh
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-28 0:29 ` Josh Poimboeuf
@ 2025-08-28 1:57 ` Liam R. Howlett
2025-08-28 3:35 ` Liam R. Howlett
0 siblings, 1 reply; 10+ messages in thread
From: Liam R. Howlett @ 2025-08-28 1:57 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Harry Yoo, Lorenzo Stoakes, syzbot, akpm, jannh, linux-kernel,
linux-mm, pfalcato, syzkaller-bugs, vbabka,
Sebastian Andrzej Siewior, peterz
* Josh Poimboeuf <jpoimboe@kernel.org> [250827 20:29]:
> On Fri, Aug 22, 2025 at 10:55:10PM +0900, Harry Yoo wrote:
> > On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> > > +cc Sebastian for RCU ORC change...
> > >
> > > +cc Harry for slab side.
> >
> > +cc Josh and Peter for stack unwinding stuff.
> >
> > > Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> > >
> > > Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? As it seems to
> > > the stack is within KASAN, but no KASAN report so maybe it's KASAN itself that's
> > > having an issue?
> > >
> > > Though I'm thinking maybe it's the orc unwinder itself that could be problematic
> > > here (yet invoked by CONFIG_SLUB_RCU_DEBUG though)... and yeah kinda suspcious
> > > because:
> > >
> > > - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> > > - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> > > makes us do an unwind via ORC, which then takes an RCU read lock on
> > > unwind_next_frame(), and both are doing this unwinding at the time of report.
> > > - ???
> > > - Somehow things get locked up?
> > >
> > > I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> > > in a stall, but it's suspicious.
> >
> > Can this be because of misleading ORC data or logical error in ORC unwinder
> > that makes it fall into an infinite loop (unwind_done() never returning
> > true in arch_stack_walk())?
> >
> > ...because the reported line number reported doesn't really make sense
> > as a cause of stalls.
>
> There shouldn't be any way for ORC to hit an infinite loop. Worst case
> it would stop after the caller's buffer fills up. ORC has always been
> solid, and the RCU usage looks fine to me. I tend to doubt ORC is at
> fault here.
>
> Maybe some interaction higher up the stack is causing things to run in a
> tight loop.
>
> All those debugging options (e.g., DEBUG_VM_MAPLE_TREE, LOCKDEP, KASAN,
> SLUB_RCU_DEBUG...) could be a factor in slowing things down to a crawl.
DEBUG_VM_MAPLE_TREE is super heavy, but that comes from validate_mm(),
which would usually be the last thing to happen before returning.
I mean, surely that would show up in the logs.
Okay, it's in the second log on the dashboard...
Yeah, I think it's the debug options eventually causing the failure. Apparently
there's a syz reproducer now, but without the validate_mm().
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-22 4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
2025-08-22 12:08 ` Lorenzo Stoakes
@ 2025-08-28 2:05 ` Liam R. Howlett
2025-08-28 2:05 ` syzbot
2025-08-28 2:20 ` Liam R. Howlett
2 siblings, 1 reply; 10+ messages in thread
From: Liam R. Howlett @ 2025-08-28 2:05 UTC (permalink / raw)
To: syzbot
Cc: akpm, jannh, linux-kernel, linux-mm, lorenzo.stoakes, pfalcato,
syzkaller-bugs, vbabka
* syzbot <syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com> [250822 00:15]:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
> task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> slab_free_hook mm/slub.c:2378 [inline]
> slab_free mm/slub.c:4680 [inline]
> kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> __vm_munmap+0x19a/0x390 mm/vma.c:3155
> __do_sys_munmap mm/mmap.c:1080 [inline]
> __se_sys_munmap mm/mmap.c:1077 [inline]
> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> </TASK>
> task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> spin_unlock include/linux/spinlock.h:391 [inline]
> filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> do_fault_around mm/memory.c:5531 [inline]
> do_read_fault mm/memory.c:5564 [inline]
> do_fault mm/memory.c:5707 [inline]
> do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> handle_pte_fault mm/memory.c:6052 [inline]
> __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> RIP: 0033:0x7f54cd7177c7
> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
> Workqueue: events_power_efficient gc_worker
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> process_scheduled_works kernel/workqueue.c:3319 [inline]
> worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> kthread+0x3c5/0x780 kernel/kthread.c:463
> ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
> task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> kmalloc_noprof include/linux/slab.h:905 [inline]
> slab_free_hook mm/slub.c:2369 [inline]
> slab_free mm/slub.c:4680 [inline]
> kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> __vm_munmap+0x19a/0x390 mm/vma.c:3155
> __do_sys_munmap mm/mmap.c:1080 [inline]
> __se_sys_munmap mm/mmap.c:1077 [inline]
> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
Let's see if speeding up the debug helps.
#syz test:
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -648,6 +648,7 @@ void validate_mm(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	VMA_ITERATOR(vmi, mm, 0);
 
+	return;
 	mt_validate(&mm->mm_mt);
 	for_each_vma(vmi, vma) {
 #ifdef CONFIG_DEBUG_VM_RB
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-28 2:05 ` Liam R. Howlett
@ 2025-08-28 2:05 ` syzbot
0 siblings, 0 replies; 10+ messages in thread
From: syzbot @ 2025-08-28 2:05 UTC (permalink / raw)
To: liam.howlett
Cc: akpm, jannh, liam.howlett, linux-kernel, linux-mm,
lorenzo.stoakes, pfalcato, syzkaller-bugs, vbabka
> * syzbot <syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com> [250822 00:15]:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
>> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
>> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
>> rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
>> task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
>> Call Trace:
>> <TASK>
>> context_switch kernel/sched/core.c:5357 [inline]
>> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
>> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
>> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
>> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
>> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
>> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
>> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
>> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
>> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
>> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>> kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
>> slab_free_hook mm/slub.c:2378 [inline]
>> slab_free mm/slub.c:4680 [inline]
>> kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
>> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>> __vm_munmap+0x19a/0x390 mm/vma.c:3155
>> __do_sys_munmap mm/mmap.c:1080 [inline]
>> __se_sys_munmap mm/mmap.c:1077 [inline]
>> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fb13ec2f2e7
>> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
>> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
>> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
>> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
>> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
>> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
>> </TASK>
>> task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
>> Call Trace:
>> <TASK>
>> context_switch kernel/sched/core.c:5357 [inline]
>> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>> preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
>> preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
>> __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
>> _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
>> spin_unlock include/linux/spinlock.h:391 [inline]
>> filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
>> do_fault_around mm/memory.c:5531 [inline]
>> do_read_fault mm/memory.c:5564 [inline]
>> do_fault mm/memory.c:5707 [inline]
>> do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
>> handle_pte_fault mm/memory.c:6052 [inline]
>> __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
>> handle_mm_fault+0x589/0xd10 mm/memory.c:6364
>> do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
>> handle_page_fault arch/x86/mm/fault.c:1476 [inline]
>> exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
>> asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
>> RIP: 0033:0x7f54cd7177c7
>> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
>> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
>> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
>> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
>> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
>> </TASK>
>> task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
>> Workqueue: events_power_efficient gc_worker
>> Call Trace:
>> <TASK>
>> context_switch kernel/sched/core.c:5357 [inline]
>> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
>> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
>> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
>> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
>> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
>> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
>> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
>> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
>> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
>> gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
>> process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
>> process_scheduled_works kernel/workqueue.c:3319 [inline]
>> worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
>> kthread+0x3c5/0x780 kernel/kthread.c:463
>> ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>> </TASK>
>> task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
>> Call Trace:
>> <TASK>
>> context_switch kernel/sched/core.c:5357 [inline]
>> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
>> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
>> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
>> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
>> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
>> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
>> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
>> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
>> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
>> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
>> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
>> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
>> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
>> orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
>> unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
>> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
>> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
>> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
>> kasan_save_track+0x14/0x30 mm/kasan/common.c:68
>> poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
>> __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
>> kmalloc_noprof include/linux/slab.h:905 [inline]
>> slab_free_hook mm/slub.c:2369 [inline]
>> slab_free mm/slub.c:4680 [inline]
>> kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
>> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
>> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
>> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
>> __vm_munmap+0x19a/0x390 mm/vma.c:3155
>> __do_sys_munmap mm/mmap.c:1080 [inline]
>> __se_sys_munmap mm/mmap.c:1077 [inline]
>> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
>> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> RIP: 0033:0x7fb13ec2f2e7
>> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
>> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
>> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
>> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
>> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
>> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
>> </TASK>
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkaller@googlegroups.com.
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>
>> If the report is already addressed, let syzbot know by replying with:
>> #syz fix: exact-commit-title
>>
>> If you want syzbot to run the reproducer, reply with:
>> #syz test: git://repo/address.git branch-or-commit-hash
>> If you attach or paste a git patch, syzbot will apply it before testing.
>>
>> If you want to overwrite report's subsystems, reply with:
>> #syz set subsystems: new-subsystem
>> (See the list of subsystem names on the web dashboard)
>>
>> If the report is a duplicate of another one, reply with:
>> #syz dup: exact-subject-of-another-report
>>
>> If you want to undo deduplication, reply with:
>> #syz undup
>
> Let's see if speeding up the debug helps.
>
> #syz test:
"---" does not look like a valid git repo address.
>
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -648,6 +648,7 @@ void validate_mm(struct mm_struct *mm)
> struct vm_area_struct *vma;
> VMA_ITERATOR(vmi, mm, 0);
>
> + return;
> mt_validate(&mm->mm_mt);
> for_each_vma(vmi, vma) {
> #ifdef CONFIG_DEBUG_VM_RB
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-22 4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
2025-08-22 12:08 ` Lorenzo Stoakes
2025-08-28 2:05 ` Liam R. Howlett
@ 2025-08-28 2:20 ` Liam R. Howlett
2025-08-28 3:08 ` syzbot
2 siblings, 1 reply; 10+ messages in thread
From: Liam R. Howlett @ 2025-08-28 2:20 UTC (permalink / raw)
To: syzbot
Cc: akpm, jannh, linux-kernel, linux-mm, lorenzo.stoakes, pfalcato,
syzkaller-bugs, vbabka
* syzbot <syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com> [250822 00:15]:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: be48bcf004f9 Merge tag 'for-6.17-rc2-tag' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=136dfba2580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=142508fb116c212f
> dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
> compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=107a43bc580000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/24fd400c6842/disk-be48bcf0.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/59146305635d/vmlinux-be48bcf0.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/b3e5f65cbcc8/bzImage-be48bcf0.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+8785aaf121cfb2141e0d@syzkaller.appspotmail.com
Apparently I have no idea how to do this... let's try again.
v6.17-rc2 + skipping validate_mm().
#syz test git://git.infradead.org/users/jedix/linux-maple.git no_validate
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6029/1:b..l P1208/1:b..l P6031/3:b..l P6030/1:b..l
> rcu: (detected by 1, t=10502 jiffies, g=6285, q=421 ncpus=2)
> task:dhcpcd state:R running task stack:28896 pid:6030 tgid:6030 ppid:5513 task_flags:0x400040 flags:0x00004002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:unwind_next_frame+0xfe7/0x20a0 arch/x86/kernel/unwind_orc.c:664
> Code: 85 80 0c 00 00 49 89 6d 40 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 56 10 00 00 <41> 39 5d 00 0f 84 10 06 00 00 bd 01 00 00 00 e9 de f3 ff ff 48 b8
> RSP: 0018:ffffc90003cdf6a8 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffc90003ce0000
> RDX: 1ffff9200079bee3 RSI: ffffc90003cdfa70 RDI: ffffc90003cdf758
> RBP: ffffc90003cdfae0 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc90003cdf718 R11: 00000000000121e6 R12: ffffc90003cdf768
> R13: ffffc90003cdf718 R14: ffffc90003cdfa80 R15: ffffc90003cdf74c
> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> kasan_record_aux_stack+0xa7/0xc0 mm/kasan/generic.c:548
> slab_free_hook mm/slub.c:2378 [inline]
> slab_free mm/slub.c:4680 [inline]
> kmem_cache_free+0x15a/0x4d0 mm/slub.c:4782
> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> __vm_munmap+0x19a/0x390 mm/vma.c:3155
> __do_sys_munmap mm/mmap.c:1080 [inline]
> __se_sys_munmap mm/mmap.c:1077 [inline]
> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443510 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000002 RSI: 0000000000004028 RDI: 00007fb13ea1b000
> RBP: 00007fffe10faf80 R08: 0000562bd1432470 R09: 0000000000000001
> R10: 00007fffe10fadb0 R11: 0000000000000206 R12: 00007fffe10faea0
> R13: 00007fb13ec42000 R14: 0000562bd1443510 R15: 0000000000000000
> </TASK>
> task:syz-executor state:R running task stack:27632 pid:6031 tgid:6031 ppid:5870 task_flags:0x400000 flags:0x00004000
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7145
> preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12
> __raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline]
> _raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186
> spin_unlock include/linux/spinlock.h:391 [inline]
> filemap_map_pages+0xe15/0x1670 mm/filemap.c:3791
> do_fault_around mm/memory.c:5531 [inline]
> do_read_fault mm/memory.c:5564 [inline]
> do_fault mm/memory.c:5707 [inline]
> do_pte_missing+0xe39/0x3ba0 mm/memory.c:4234
> handle_pte_fault mm/memory.c:6052 [inline]
> __handle_mm_fault+0x152a/0x2a50 mm/memory.c:6195
> handle_mm_fault+0x589/0xd10 mm/memory.c:6364
> do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336
> handle_page_fault arch/x86/mm/fault.c:1476 [inline]
> exc_page_fault+0x5c/0xb0 arch/x86/mm/fault.c:1532
> asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> RIP: 0033:0x7f54cd7177c7
> RSP: 002b:00007fffb79a5b40 EFLAGS: 00010246
> RAX: 00007f54ce525000 RBX: 0000000000000000 RCX: 0000000000000064
> RDX: 00007fffb79a5de9 RSI: 0000000000000002 RDI: 00007fffb79a5dd8
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
> R13: 00007fffb79a5c48 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> task:kworker/0:3 state:R running task stack:25368 pid:1208 tgid:1208 ppid:2 task_flags:0x4208060 flags:0x00004000
> Workqueue: events_power_efficient gc_worker
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
> RIP: 0010:write_comp_data+0x0/0x90 kernel/kcov.c:240
> Code: 48 8b 05 db b4 1a 12 48 8b 80 30 16 00 00 e9 97 05 db 09 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <49> 89 d2 49 89 f8 49 89 f1 65 48 8b 15 a7 b4 1a 12 65 8b 05 b8 b4
> RSP: 0018:ffffc9000441fb50 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000040000 RCX: ffffffff89ba2a52
> RDX: 0000000000040000 RSI: 0000000000000433 RDI: 0000000000000004
> RBP: ffffffff9b2c41ec R08: 0000000000000004 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffffff9b030610 R12: ffff888031800000
> R13: 0000000000000433 R14: dffffc0000000000 R15: 0000000000001770
> gc_worker+0x342/0x16e0 net/netfilter/nf_conntrack_core.c:1549
> process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3236
> process_scheduled_works kernel/workqueue.c:3319 [inline]
> worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400
> kthread+0x3c5/0x780 kernel/kthread.c:463
> ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
> task:dhcpcd state:R running task stack:26072 pid:6029 tgid:6029 ppid:5513 task_flags:0x400040 flags:0x00004002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5357 [inline]
> __schedule+0x1190/0x5de0 kernel/sched/core.c:6961
> preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
> irqentry_exit+0x36/0x90 kernel/entry/common.c:197
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
> RIP: 0010:orc_ip arch/x86/kernel/unwind_orc.c:80 [inline]
> RIP: 0010:__orc_find+0x7e/0xf0 arch/x86/kernel/unwind_orc.c:102
> Code: ea 3f 48 c1 fe 02 48 01 f2 48 d1 fa 48 8d 5c 95 00 48 89 da 48 c1 ea 03 0f b6 34 0a 48 89 da 83 e2 07 83 c2 03 40 38 f2 7c 05 <40> 84 f6 75 4b 48 63 13 48 01 da 49 39 d5 73 af 4c 8d 63 fc 49 39
> RSP: 0018:ffffc90003337648 EFLAGS: 00000202
> RAX: ffffffff914e0dd8 RBX: ffffffff90c5215c RCX: dffffc0000000000
> RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff90c52148
> RBP: ffffffff90c52148 R08: ffffffff914e0e1a R09: 0000000000000000
> R10: ffffc900033376f8 R11: 0000000000011271 R12: ffffffff90c52170
> R13: ffffffff82127173 R14: ffffffff90c52148 R15: ffffffff90c52148
> orc_find arch/x86/kernel/unwind_orc.c:227 [inline]
> unwind_next_frame+0x2ec/0x20a0 arch/x86/kernel/unwind_orc.c:494
> arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
> stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
> kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
> kasan_save_track+0x14/0x30 mm/kasan/common.c:68
> poison_kmalloc_redzone mm/kasan/common.c:388 [inline]
> __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:405
> kmalloc_noprof include/linux/slab.h:905 [inline]
> slab_free_hook mm/slub.c:2369 [inline]
> slab_free mm/slub.c:4680 [inline]
> kmem_cache_free+0x142/0x4d0 mm/slub.c:4782
> vms_complete_munmap_vmas+0x573/0x970 mm/vma.c:1293
> do_vmi_align_munmap+0x43b/0x7d0 mm/vma.c:1536
> do_vmi_munmap+0x204/0x3e0 mm/vma.c:1584
> __vm_munmap+0x19a/0x390 mm/vma.c:3155
> __do_sys_munmap mm/mmap.c:1080 [inline]
> __se_sys_munmap mm/mmap.c:1077 [inline]
> __x64_sys_munmap+0x59/0x80 mm/mmap.c:1077
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fb13ec2f2e7
> RSP: 002b:00007fffe10fae78 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
> RAX: ffffffffffffffda RBX: 0000562bd1443f00 RCX: 00007fb13ec2f2e7
> RDX: 0000000000000001 RSI: 000000000002f6d0 RDI: 00007fb13e9c1000
> RBP: 00007fffe10faf80 R08: 00000000000004f0 R09: 0000000000000002
> R10: 00007fffe10fadb0 R11: 0000000000000202 R12: 00007fffe10faec0
> R13: 00007fb13ec42000 R14: 0000562bd1443f00 R15: 0000000000000000
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want syzbot to run the reproducer, reply with:
> #syz test: git://repo/address.git branch-or-commit-hash
> If you attach or paste a git patch, syzbot will apply it before testing.
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-28 2:20 ` Liam R. Howlett
@ 2025-08-28 3:08 ` syzbot
0 siblings, 0 replies; 10+ messages in thread
From: syzbot @ 2025-08-28 3:08 UTC (permalink / raw)
To: akpm, jannh, liam.howlett, linux-kernel, linux-mm,
lorenzo.stoakes, pfalcato, syzkaller-bugs, vbabka
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P5218/1:b..l
rcu: (detected by 0, t=10502 jiffies, g=10417, q=327 ncpus=2)
task:udevd state:R running task stack:26640 pid:5218 tgid:5218 ppid:1 task_flags:0x400140 flags:0x00004002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1190/0x5de0 kernel/sched/core.c:6961
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7288
irqentry_exit+0x36/0x90 kernel/entry/common.c:197
asm_sysvec_reschedule_ipi+0x1a/0x20 arch/x86/include/asm/idtentry.h:707
RIP: 0010:lock_acquire+0x30/0x350 kernel/locking/lockdep.c:5828
Code: 4d 89 cf 41 56 41 89 f6 41 55 41 89 d5 41 54 45 89 c4 55 89 cd 53 48 89 fb 48 83 ec 38 65 48 8b 05 0d 79 3e 12 48 89 44 24 30 <31> c0 66 90 65 8b 05 29 79 3e 12 83 f8 07 0f 87 bc 02 00 00 89 c0
RSP: 0018:ffffc90003d0f530 EFLAGS: 00000286
RAX: 4b548df46ee33600 RBX: ffffffff8e5c11e0 RCX: 0000000000000002
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8e5c11e0
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc90003d0f618 R11: 00000000000135a3 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
rcu_read_lock include/linux/rcupdate.h:841 [inline]
class_rcu_constructor include/linux/rcupdate.h:1155 [inline]
unwind_next_frame+0xd1/0x20a0 arch/x86/kernel/unwind_orc.c:479
arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:243 [inline]
__kasan_slab_free+0x60/0x70 mm/kasan/common.c:275
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2417 [inline]
slab_free mm/slub.c:4680 [inline]
kfree+0x2b4/0x4d0 mm/slub.c:4879
tomoyo_realpath_from_path+0x19f/0x6e0 security/tomoyo/realpath.c:286
tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
tomoyo_path_perm+0x274/0x460 security/tomoyo/file.c:822
security_inode_getattr+0x116/0x290 security/security.c:2377
vfs_getattr fs/stat.c:259 [inline]
vfs_statx_path fs/stat.c:299 [inline]
vfs_statx+0x121/0x3f0 fs/stat.c:356
vfs_fstatat+0x7b/0xf0 fs/stat.c:375
__do_sys_newfstatat+0x97/0x120 fs/stat.c:542
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f47f6d11b0a
RSP: 002b:00007ffd84c35818 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
RAX: ffffffffffffffda RBX: 0000559b20c5e418 RCX: 00007f47f6d11b0a
RDX: 00007ffd84c35820 RSI: 0000559b20c4cef3 RDI: 00000000ffffff9c
RBP: 0000559b5aa6d668 R08: 00063d641a57c867 R09: 00007f47f7457000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffd84c35820 R14: 0000000000000000 R15: 00063d641a57c867
</TASK>
rcu: rcu_preempt kthread starved for 966 jiffies! g10417 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28936 pid:16 tgid:16 ppid:2 task_flags:0x208040 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5357 [inline]
__schedule+0x1190/0x5de0 kernel/sched/core.c:6961
__schedule_loop kernel/sched/core.c:7043 [inline]
schedule+0xe7/0x3a0 kernel/sched/core.c:7058
schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99
rcu_gp_fqs_loop+0x1ea/0xb00 kernel/rcu/tree.c:2083
rcu_gp_kthread+0x270/0x380 kernel/rcu/tree.c:2285
kthread+0x3c2/0x780 kernel/kthread.c:463
ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 6556 Comm: syz.1.23 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:arch_static_branch arch/x86/include/asm/jump_label.h:36 [inline]
RIP: 0010:native_write_msr arch/x86/include/asm/msr.h:139 [inline]
RIP: 0010:wrmsrq arch/x86/include/asm/msr.h:199 [inline]
RIP: 0010:native_apic_msr_write arch/x86/include/asm/apic.h:212 [inline]
RIP: 0010:native_apic_msr_write+0x28/0x40 arch/x86/include/asm/apic.h:206
Code: 90 90 f3 0f 1e fa 8d 87 30 ff ff ff 83 e0 ef 74 20 89 f8 83 e0 ef 83 f8 20 74 16 c1 ef 04 31 d2 89 f0 8d 8f 00 08 00 00 0f 30 <66> 90 c3 cc cc cc cc c3 cc cc cc cc 89 f6 31 d2 89 cf e9 b1 4d ae
RSP: 0018:ffffc900031479f0 EFLAGS: 00000046
RAX: 000000000000003e RBX: ffff8880b8523a00 RCX: 0000000000000838
RDX: 0000000000000000 RSI: 000000000000003e RDI: 0000000000000038
RBP: 000000000000003e R08: 0000000000000005 R09: 000000000000003f
R10: 0000000000000020 R11: ffffffff9b0d2580 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000000020 R15: ffffed10170a4745
FS: 0000555558775500(0000) GS:ffff8881247bc000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00732e000 CR3: 0000000077210000 CR4: 00000000003526f0
Call Trace:
<TASK>
apic_write arch/x86/include/asm/apic.h:405 [inline]
lapic_next_event+0x10/0x20 arch/x86/kernel/apic/apic.c:416
clockevents_program_min_delta+0x173/0x3a0 kernel/time/clockevents.c:248
clockevents_program_event+0x2a6/0x380 kernel/time/clockevents.c:336
tick_program_event+0xa9/0x140 kernel/time/tick-oneshot.c:44
__hrtimer_reprogram kernel/time/hrtimer.c:685 [inline]
__hrtimer_reprogram kernel/time/hrtimer.c:659 [inline]
hrtimer_reprogram+0x27b/0x450 kernel/time/hrtimer.c:868
hrtimer_start_range_ns+0x9d4/0xfc0 kernel/time/hrtimer.c:1330
__posixtimer_deliver_signal kernel/time/posix-timers.c:322 [inline]
posixtimer_deliver_signal+0x30d/0x6b0 kernel/time/posix-timers.c:348
dequeue_signal+0x307/0x520 kernel/signal.c:660
get_signal+0x602/0x26d0 kernel/signal.c:2914
arch_do_signal_or_restart+0x8f/0x7d0 arch/x86/kernel/signal.c:337
exit_to_user_mode_loop+0x84/0x110 kernel/entry/common.c:40
exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
do_syscall_64+0x3f6/0x4c0 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5114d8ebe9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcc5e18098 EFLAGS: 00000246
RAX: fffffffffffffffc RBX: 00000000000231ee RCX: 00007f5114d8ebe9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f5114fb5fac
RBP: 0000000000000032 R08: 00007f5115bce000 R09: 00000012c5e1838f
R10: 00007ffcc5e18190 R11: 0000000000000246 R12: 00007f5114fb5fac
R13: 00007ffcc5e18190 R14: 0000000000023220 R15: 00007ffcc5e181b0
</TASK>
Tested on:
commit: a1617343 skip the validate_mm() for stall test
git tree: git://git.infradead.org/users/jedix/linux-maple.git no_validate
console output: https://syzkaller.appspot.com/x/log.txt?x=13441fbc580000
kernel config: https://syzkaller.appspot.com/x/.config?x=d4703ac89d9e185a
dashboard link: https://syzkaller.appspot.com/bug?extid=8785aaf121cfb2141e0d
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
Note: no patches were applied.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2)
2025-08-28 1:57 ` Liam R. Howlett
@ 2025-08-28 3:35 ` Liam R. Howlett
0 siblings, 0 replies; 10+ messages in thread
From: Liam R. Howlett @ 2025-08-28 3:35 UTC (permalink / raw)
To: Josh Poimboeuf, Harry Yoo, Lorenzo Stoakes, syzbot, akpm, jannh,
linux-kernel, linux-mm, pfalcato, syzkaller-bugs, vbabka,
Sebastian Andrzej Siewior, peterz
* Liam R. Howlett <Liam.Howlett@oracle.com> [250827 21:58]:
> * Josh Poimboeuf <jpoimboe@kernel.org> [250827 20:29]:
> > On Fri, Aug 22, 2025 at 10:55:10PM +0900, Harry Yoo wrote:
> > > On Fri, Aug 22, 2025 at 01:08:02PM +0100, Lorenzo Stoakes wrote:
> > > > +cc Sebastian for RCU ORC change...
> > > >
> > > > +cc Harry for slab side.
> > >
> > > +cc Josh and Peter for stack unwinding stuff.
> > >
> > > > Pinging Jann for the CONFIG_SLUB_RCU_DEBUG element.
> > > >
> > > > Jann - could this possibly be related to CONFIG_SLUB_RCU_DEBUG? As it seems to
> > > > the stack is within KASAN, but no KASAN report so maybe it's KASAN itself that's
> > > > having an issue?
> > > >
> > > > Though I'm thinking maybe it's the orc unwinder itself that could be problematic
> > > > here (yet invoked by CONFIG_SLUB_RCU_DEBUG though)... and yeah kinda suspicious
> > > > because:
> > > >
> > > > - We have two threads freeing VMAs using SLAB_TYPESAFE_BY_RCU
> > > > - CONFIG_SLUB_RCU_DEBUG means that we use KASAN to save an aux stack, which
> > > > makes us do an unwind via ORC, which then takes an RCU read lock on
> > > > unwind_next_frame(), and both are doing this unwinding at the time of report.
> > > > - ???
> > > > - Somehow things get locked up?
> > > >
> > > > I'm not an RCU expert (clearly :) so I'm not sure exactly how this could result
> > > > in a stall, but it's suspicious.
> > >
> > > Can this be because of misleading ORC data or logical error in ORC unwinder
> > > that makes it fall into an infinite loop (unwind_done() never returning
> > > true in arch_stack_walk())?
> > >
> > > ...because the line number reported doesn't really make sense
> > > as a cause of stalls.
> >
> > There shouldn't be any way for ORC to hit an infinite loop. Worst case
> > it would stop after the caller's buffer fills up. ORC has always been
> > solid, and the RCU usage looks fine to me. I tend to doubt ORC is at
> > fault here.
> >
> > Maybe some interaction higher up the stack is causing things to run in a
> > tight loop.
> >
> > All those debugging options (e.g., DEBUG_VM_MAPLE_TREE, LOCKDEP, KASAN,
> > SLUB_RCU_DEBUG...) could be a factor in slowing things down to a crawl.
>
> DEBUG_VM_MAPLE_TREE is super heavy, but that comes from validate_mm()
> which would be the last thing to happen before returning, usually.
>
> I mean surely that would show up in the logs.
>
> Okay it's in the second log on the dashboard..
>
> Yeah, I think it's debug options eventually causing failure. Apparently
> there's a reproducer for syz now but without the validate_mm().
I don't think it's the debugging options, as removing the validate_mm()
call did not help.
We may want to wait for a C reproducer.
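(To make the cost discussion above concrete, here is a minimal sketch of the
debug work that the traces show running for every vm_area_struct freed by
munmap() when KASAN and CONFIG_SLUB_RCU_DEBUG are enabled. The function name
is made up and this is not upstream code; only the callees named in the
comments are taken from the traces.)

static void per_vma_free_debug_cost_sketch(void)
{
	unsigned long entries[64];
	unsigned int nr;

	/*
	 * slab_free_hook() -> kasan_record_aux_stack() saves a "freed by"
	 * stack for the object, which internally amounts to:
	 */
	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);

	/*
	 * stack_trace_save() -> arch_stack_walk() -> unwind_next_frame()
	 * is an ORC unwind: a table lookup per frame, plus an RCU
	 * read-side critical section per frame since the ORC/RCU change.
	 * The walk is bounded (it ends at unwind_done() or when 'entries'
	 * fills up), so the concern raised above is aggregate cost across
	 * many frees rather than an unwinder infinite loop.
	 */
	(void)nr;
}

(Each munmap() that tears down many VMAs pays this unwind once per VMA, on top
of lockdep, KASAN poisoning and the other debug options in the syzbot config.)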
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-08-28 3:36 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-22 4:15 [syzbot] [mm?] INFO: rcu detected stall in sys_munmap (2) syzbot
2025-08-22 12:08 ` Lorenzo Stoakes
2025-08-22 13:55 ` Harry Yoo
2025-08-28 0:29 ` Josh Poimboeuf
2025-08-28 1:57 ` Liam R. Howlett
2025-08-28 3:35 ` Liam R. Howlett
2025-08-28 2:05 ` Liam R. Howlett
2025-08-28 2:05 ` syzbot
2025-08-28 2:20 ` Liam R. Howlett
2025-08-28 3:08 ` syzbot