From: syzbot <syzbot+d8d4c31d40f868eaea30@syzkaller.appspotmail.com>
To: linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com
Subject: Forwarded: Private message regarding: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
Date: Mon, 12 Jan 2026 01:39:03 -0800
Message-ID: <6964c137.050a0220.eaf7.0097.GAE@google.com>
In-Reply-To: <696487a4.050a0220.eaf7.0085.GAE@google.com>

For archival purposes, forwarding an incoming command email to
linux-kernel@vger.kernel.org, syzkaller-bugs@googlegroups.com.

***

Subject: Private message regarding: [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node
Author: kapoorarnav43@gmail.com

From: Arnav Kapoor <kapoorarnav43@gmail.com>
Date: Mon, 12 Jan 2026 15:30:00 +0000
Subject: [PATCH] mm/kasan: add cond_resched() in shadow page table walk

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Syzbot reported RCU stalls during vmalloc cleanup:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks
task:kworker/0:17 state:R running task
purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299

When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding via save_stack() ->
unwind_next_frame(), which takes the RCU read lock. Processing a large
vmalloc region can free thousands of shadow pages without yielding, so
the worker monopolizes the CPU for 10+ seconds, leading to RCU stalls
and potentially OOM.

The issue occurs in this call chain:
  purge_vmap_node()
    -> kasan_release_vmalloc_node()
      -> kasan_release_vmalloc() [for each vmap_area]
        -> __kasan_release_vmalloc()
          -> apply_to_existing_page_range()
            -> kasan_depopulate_vmalloc_pte() [for each PTE]
              -> __free_page()
                -> __reset_page_owner() [CONFIG_PAGE_OWNER]
                  -> save_stack()
                    -> unwind_next_frame() [RCU read lock held]

Each shadow page free triggers stack unwinding under the RCU read lock.
A single large vmalloc region can have thousands of shadow pages, so the
walk can run for an unbounded time without reaching a scheduling point.
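
To put a rough number on "thousands of shadow pages", here is a minimal
sketch. It assumes generic KASAN's 1:8 shadow mapping and 4 KiB pages;
neither is stated in the report, and shadow_pages() is a made-up helper
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumptions for illustration (not taken from the report):
 * generic KASAN keeps 1 shadow byte per 8 bytes of memory,
 * and pages are 4 KiB.
 */
#define SHADOW_SCALE_SHIFT	3u
#define PAGE_SIZE_BYTES		4096u

/*
 * Hypothetical helper: upper bound on the number of shadow pages
 * backing a vmalloc region of region_bytes bytes, rounded up to
 * whole pages.
 */
static uint64_t shadow_pages(uint64_t region_bytes)
{
	uint64_t shadow_bytes = region_bytes >> SHADOW_SCALE_SHIFT;

	return (shadow_bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}
```

Under those assumptions, shadow_pages(128ULL << 20) evaluates to 4096:
a single 128 MiB region could mean up to 4096 trips through the PTE
callback, each with its own stack unwind.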

The previous attempt to fix this added cond_resched() between
processing each vmap_area in kasan_release_vmalloc_node(), but that's
insufficient because a single vmap_area can still contain many pages.

Fix this by adding cond_resched() in the page table walk callback
kasan_depopulate_vmalloc_pte() after every 32 pages. This ensures
regular scheduling points during large shadow region depopulation while
minimizing overhead for typical cases.
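
The batching idea can be sketched in userspace C. This is only an
illustration under stated assumptions: yield_point() is a hypothetical
stand-in for cond_resched() that merely counts calls, and the counter is
a plain local variable rather than the per-CPU variable used in the diff.

```c
#include <assert.h>
#include <stddef.h>

#define YIELD_BATCH 32u	/* batch size from the patch description */

/* Hypothetical stand-in for cond_resched(): counts calls, does not yield. */
static unsigned int yield_calls;

static void yield_point(void)
{
	yield_calls++;
}

/*
 * Walk npages items and hit a yield point after every YIELD_BATCH of
 * them. Returns how many yield points were reached during the walk.
 */
static unsigned int depopulate_pages(size_t npages)
{
	unsigned int batch = 0;

	yield_calls = 0;
	for (size_t i = 0; i < npages; i++) {
		/* per-page work (PTE clear, page free) would go here */
		if (++batch >= YIELD_BATCH) {
			batch = 0;
			yield_point();
		}
	}
	return yield_calls;
}
```

With this sketch, a walk over 4096 pages reaches 128 yield points, while
a walk over fewer than 32 pages reaches none.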

The batch size of 32 is chosen to:
- Amortize cond_resched() overhead (typically ~100ns) over multiple pages
- Limit worst-case non-preemptible time to ~3ms on typical hardware
  (32 pages × ~100μs per stack unwind)
- Match common TLB and cache behavior
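
The arithmetic behind the first two points, as a small sketch. The ~100ns
and ~100μs inputs are the estimates quoted in the list, not measurements,
and both function names are hypothetical.

```c
#include <assert.h>

/*
 * Worst-case time between two scheduling points, in microseconds,
 * when each page costs unwind_us and we yield every `batch` pages.
 */
static double worst_case_gap_us(unsigned int batch, double unwind_us)
{
	return batch * unwind_us;
}

/*
 * Cost of one cond_resched() call spread over the pages of a batch,
 * in nanoseconds per page.
 */
static double amortized_overhead_ns(unsigned int batch, double resched_ns)
{
	assert(batch > 0);
	return resched_ns / batch;
}
```

Plugging in the quoted estimates, worst_case_gap_us(32, 100.0) gives
3200 μs (the "~3ms" figure), and amortized_overhead_ns(32, 100.0) gives
3.125 ns per page.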

Note: We can't use need_resched() alone because under light CPU load,
need_resched() may remain false while RCU grace periods starve. The
batch count provides a guaranteed upper bound.

Reported-by: syzbot+d8d4c31d40f868eaea30@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30
Signed-off-by: Arnav Kapoor <kapoorarnav43@gmail.com>
---
mm/kasan/shadow.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 000000000000..111111111111 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -468,9 +468,23 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, unsigned long addr,
 					void *unused)
 {
 	pte_t pte;
 	int none;
+	static DEFINE_PER_CPU(unsigned int, depopulate_batch_count);
+	unsigned int *batch = this_cpu_ptr(&depopulate_batch_count);
 	arch_leave_lazy_mmu_mode();
+	/*
+	 * With CONFIG_PAGE_OWNER, each page free triggers expensive stack
+	 * unwinding under RCU lock. Yield periodically to prevent RCU stalls
+	 * when processing large vmalloc regions with thousands of shadow pages.
+	 */
+	if (++(*batch) >= 32) {
+		*batch = 0;
+		cond_resched();
+		arch_enter_lazy_mmu_mode();
+	}
+
 	spin_lock(&init_mm.page_table_lock);
 	pte = ptep_get(ptep);
 	none = pte_none(pte);




On Monday, 12 January 2026 at 14:10:07 UTC+5:30 syzbot wrote:

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in unwind_next_frame

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6892/1:b..l P6893/3:b..l P6746/1:b..l
rcu: (detected by 1, t=10502 jiffies, g=16737, q=586 ncpus=2)
task:kworker/u8:18 state:R running task stack:24088 pid:6746 tgid:6746 ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: kvfree_rcu_reclaim kfree_rcu_monitor 
Call Trace: 
<TASK> 
context_switch kernel/sched/core.c:5256 [inline] 
__schedule+0x1139/0x6150 kernel/sched/core.c:6863 
preempt_schedule_irq+0x51/0x90 kernel/sched/core.c:7190 
irqentry_exit+0x1d8/0x8c0 kernel/entry/common.c:216 
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
RIP: 0010:lock_acquire+0x62/0x330 kernel/locking/lockdep.c:5872
Code: b4 18 12 83 f8 07 0f 87 a2 02 00 00 89 c0 48 0f a3 05 e2 c1 ee 0e 0f 82 74 02 00 00 8b 35 7a f2 ee 0e 85 f6 0f 85 8d 00 00 00 <48> 8b 44 24 30 65 48 2b 05 f9 b3 18 12 0f 85 ad 02 00 00 48 83 c4
RSP: 0018:ffffc90003fbf5b8 EFLAGS: 00000206 
RAX: 0000000000000046 RBX: ffffffff8e3c96a0 RCX: 00000000993b8195 
RDX: 0000000000000000 RSI: ffffffff8daa8a1d RDI: ffffffff8bf2b400 
RBP: 0000000000000002 R08: 00000000e61a05bb R09: 00000000be61a05b 
R10: 0000000000000002 R11: ffff888029058b30 R12: 0000000000000000 
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 
rcu_lock_acquire include/linux/rcupdate.h:331 [inline] 
rcu_read_lock include/linux/rcupdate.h:867 [inline] 
class_rcu_constructor include/linux/rcupdate.h:1195 [inline] 
unwind_next_frame+0xd1/0x20b0 arch/x86/kernel/unwind_orc.c:495 
arch_stack_walk+0x94/0x100 arch/x86/kernel/stacktrace.c:25 
stack_trace_save+0x8e/0xc0 kernel/stacktrace.c:122 
kasan_save_stack+0x33/0x60 mm/kasan/common.c:57 
kasan_save_track+0x14/0x30 mm/kasan/common.c:78 
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:584 
poison_slab_object mm/kasan/common.c:253 [inline] 
__kasan_slab_free+0x5f/0x80 mm/kasan/common.c:285 
kasan_slab_free include/linux/kasan.h:235 [inline] 
slab_free_hook mm/slub.c:2540 [inline] 
slab_free_freelist_hook mm/slub.c:2569 [inline] 
slab_free_bulk mm/slub.c:6703 [inline] 
kmem_cache_free_bulk mm/slub.c:7390 [inline] 
kmem_cache_free_bulk+0x2bf/0x680 mm/slub.c:7369 
kfree_bulk include/linux/slab.h:830 [inline] 
kvfree_rcu_bulk+0x1b7/0x1e0 mm/slab_common.c:1523 
kvfree_rcu_drain_ready mm/slab_common.c:1728 [inline] 
kfree_rcu_monitor+0x1d0/0x2f0 mm/slab_common.c:1801 
process_one_work+0x9ba/0x1b20 kernel/workqueue.c:3257 
process_scheduled_works kernel/workqueue.c:3340 [inline] 
worker_thread+0x6c8/0xf10 kernel/workqueue.c:3421 
kthread+0x3c5/0x780 kernel/kthread.c:463 
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 
</TASK> 
task:sed state:R running task stack:25800 pid:6893 tgid:6893 ppid:6890 task_flags:0x400000 flags:0x00080000
Call Trace: 
<TASK> 
context_switch kernel/sched/core.c:5256 [inline] 
__schedule+0x1139/0x6150 kernel/sched/core.c:6863 
preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7047 
preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12 
__raw_spin_unlock include/linux/spinlock_api_smp.h:143 [inline] 
_raw_spin_unlock+0x3e/0x50 kernel/locking/spinlock.c:186 
spin_unlock include/linux/spinlock.h:391 [inline] 
filemap_map_pages+0x1194/0x1e00 mm/filemap.c:3931 
do_fault_around mm/memory.c:5713 [inline] 
do_read_fault mm/memory.c:5746 [inline] 
do_fault+0x9cd/0x1ad0 mm/memory.c:5889 
do_pte_missing mm/memory.c:4401 [inline] 
handle_pte_fault mm/memory.c:6273 [inline] 
__handle_mm_fault+0x1919/0x2bb0 mm/memory.c:6411 
handle_mm_fault+0x3fe/0xad0 mm/memory.c:6580 
do_user_addr_fault+0x60c/0x1370 arch/x86/mm/fault.c:1336 
handle_page_fault arch/x86/mm/fault.c:1476 [inline] 
exc_page_fault+0x64/0xc0 arch/x86/mm/fault.c:1532 
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:618 
RIP: 0033:0x7fc6d7657c50 
RSP: 002b:00007ffe6008c528 EFLAGS: 00010246 
RAX: 0000000000000000 RBX: 00007fc6d768e490 RCX: 00007ffe6008c560 
RDX: 00007fc6d7689d63 RSI: 00007fc6d7689d36 RDI: 00007ffe6008c748 
RBP: 0000000000000041 R08: 00007ffe6008c550 R09: 00007ffe6008c558 
R10: 0000000000000004 R11: 0000000000000246 R12: 00007ffe6008c748 
R13: 00007ffe6008c550 R14: 00007fc6d76ce000 R15: 00005602504c1d98 
</TASK> 
task:udevd state:R running task stack:28152 pid:6892 tgid:6892 ppid:5186 task_flags:0x400140 flags:0x00080000
Call Trace: 
<TASK> 
context_switch kernel/sched/core.c:5256 [inline] 
__schedule+0x1139/0x6150 kernel/sched/core.c:6863 
preempt_schedule_common+0x44/0xc0 kernel/sched/core.c:7047 
preempt_schedule_thunk+0x16/0x30 arch/x86/entry/thunk.S:12 
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline] 
_raw_spin_unlock_irqrestore+0x61/0x80 kernel/locking/spinlock.c:194 
sock_def_readable+0x15b/0x5d0 net/core/sock.c:3611 
unix_dgram_sendmsg+0xcbd/0x1830 net/unix/af_unix.c:2286 
sock_sendmsg_nosec net/socket.c:727 [inline] 
__sock_sendmsg net/socket.c:742 [inline] 
sock_write_iter+0x566/0x610 net/socket.c:1195 
new_sync_write fs/read_write.c:593 [inline] 
vfs_write+0x7d3/0x11d0 fs/read_write.c:686 
ksys_write+0x1f8/0x250 fs/read_write.c:738 
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] 
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94 
entry_SYSCALL_64_after_hwframe+0x77/0x7f 
RIP: 0033:0x7f6502bdf407 
RSP: 002b:00007ffc4b535850 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 
RAX: ffffffffffffffda RBX: 00007f6502b53880 RCX: 00007f6502bdf407 
RDX: 0000000000000000 RSI: 00007ffc4b5358f7 RDI: 000000000000000a 
RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000000 
R10: 0000000000000000 R11: 0000000000000202 R12: 00007f6502b536e8 
R13: 0000000000000000 R14: 0000000000000000 R15: 000055685a0a1150 
</TASK> 
rcu: rcu_preempt kthread starved for 10572 jiffies! g16737 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28120 pid:16 tgid:16 ppid:2 task_flags:0x208040 flags:0x00080000
Call Trace: 
<TASK> 
context_switch kernel/sched/core.c:5256 [inline] 
__schedule+0x1139/0x6150 kernel/sched/core.c:6863 
__schedule_loop kernel/sched/core.c:6945 [inline] 
schedule+0xe7/0x3a0 kernel/sched/core.c:6960 
schedule_timeout+0x123/0x290 kernel/time/sleep_timeout.c:99 
rcu_gp_fqs_loop+0x1ea/0xaf0 kernel/rcu/tree.c:2083 
rcu_gp_kthread+0x26d/0x380 kernel/rcu/tree.c:2285 
kthread+0x3c5/0x780 kernel/kthread.c:463 
ret_from_fork+0x983/0xb10 arch/x86/kernel/process.c:158 
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246 
</TASK> 
rcu: Stack dump where RCU GP kthread last ran: 
Sending NMI from CPU 1 to CPUs 0: 
NMI backtrace for cpu 0 
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:pv_native_safe_halt+0xf/0x20 arch/x86/kernel/paravirt.c:82
Code: a6 5f 02 c3 cc cc cc cc 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 13 19 12 00 fb f4 <e9> cc 35 03 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
RSP: 0018:ffffffff8e007df8 EFLAGS: 000002c6 
RAX: 0000000000186859 RBX: 0000000000000000 RCX: ffffffff8b7846d9 
RDX: 0000000000000000 RSI: ffffffff8daceab2 RDI: ffffffff8bf2b400 
RBP: fffffbfff1c12f68 R08: 0000000000000001 R09: ffffed101708673d 
R10: ffff8880b84339eb R11: ffffffff8e098670 R12: 0000000000000000 
R13: ffffffff8e097b40 R14: ffffffff9088bdd0 R15: 0000000000000000 
FS: 0000000000000000(0000) GS:ffff8881248f5000(0000) knlGS:0000000000000000 
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
CR2: 000055555eaef588 CR3: 00000000522b3000 CR4: 00000000003526f0 
Call Trace: 
<TASK> 
arch_safe_halt arch/x86/include/asm/paravirt.h:107 [inline] 
default_idle+0x13/0x20 arch/x86/kernel/process.c:767 
default_idle_call+0x6c/0xb0 kernel/sched/idle.c:122 
cpuidle_idle_call kernel/sched/idle.c:191 [inline] 
do_idle+0x38d/0x510 kernel/sched/idle.c:332 
cpu_startup_entry+0x4f/0x60 kernel/sched/idle.c:430 
rest_init+0x16b/0x2b0 init/main.c:757 
start_kernel+0x3ef/0x4d0 init/main.c:1206 
x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:310 
x86_64_start_kernel+0x130/0x190 arch/x86/kernel/head64.c:291 
common_startup_64+0x13e/0x148 
</TASK> 


Tested on: 

commit: 0f61b186 Linux 6.19-rc5 
git tree: upstream 
console output: https://syzkaller.appspot.com/x/log.txt?x=1397199a580000 
kernel config: https://syzkaller.appspot.com/x/.config?x=1859476832863c41 
dashboard link: https://syzkaller.appspot.com/bug?extid=d8d4c31d40f868eaea30 
compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1704399a580000 


Thread overview: 3+ messages
2026-01-12  5:33 [syzbot] [mm?] INFO: rcu detected stall in purge_vmap_node syzbot
2026-01-12  7:56 ` Forwarded: [PATCH] mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node syzbot
2026-01-12  9:39 ` syzbot [this message]
