* [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU
@ 2025-12-10 6:10 kernel test robot
2025-12-10 23:30 ` Vishal Moola (Oracle)
0 siblings, 1 reply; 8+ messages in thread
From: kernel test robot @ 2025-12-10 6:10 UTC (permalink / raw)
To: Vishal Moola
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Uladzislau Rezki,
linux-mm, oliver.sang
Hello,
kernel test robot noticed "BUG:spinlock_trylock_failure_on_UP_on_CPU" on:
commit: a0615780439938e8e61343f1f92a4c54a71dc6a5 ("mm/vmalloc: request large order pages from buddy allocator")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821]
[test failed on linux-next/master c75caf76ed86bbc15a72808f48f8df1608a0886c]
in testcase: trinity
version:
with following parameters:
runtime: 300s
group: group-03
nr_groups: 5
config: x86_64-randconfig-011-20251207
compiler: clang-20
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G
(please refer to attached dmesg/kmsg for entire log/backtrace)
the issue show randomly (~50%) in tests.
645a3c4243473d5c a0615780439938e8e61343f1f92
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:60 50% 29:60 dmesg.BUG:spinlock_trylock_failure_on_UP_on_CPU
:60 50% 29:60 dmesg.RIP:_raw_spin_unlock_irqrestore
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com
[ 1046.632156][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28
[ 1046.633368][ C0] lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0
[ 1046.634872][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470
[ 1046.637019][ C0] Call Trace:
[ 1046.637563][ C0] <IRQ>
[ 1046.638038][ C0] __dump_stack (lib/dump_stack.c:95)
[ 1046.638781][ C0] dump_stack_lvl (lib/dump_stack.c:123)
[ 1046.639512][ C0] dump_stack (lib/dump_stack.c:130)
[ 1046.640168][ C0] spin_dump (kernel/locking/spinlock_debug.c:71)
[ 1046.640853][ C0] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?)
[ 1046.641678][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138)
[ 1046.642473][ C0] __free_frozen_pages (mm/page_alloc.c:2973)
[ 1046.643279][ C0] ___free_pages (mm/page_alloc.c:5295)
[ 1046.643956][ C0] __free_pages (mm/page_alloc.c:5334)
[ 1046.644624][ C0] tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290)
[ 1046.645520][ C0] ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289)
[ 1046.646384][ C0] ? rcu_core (kernel/rcu/tree.c:?)
[ 1046.647092][ C0] rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861)
[ 1046.647774][ C0] rcu_core_si (kernel/rcu/tree.c:2879)
[ 1046.648439][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623)
[ 1046.649202][ C0] __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725)
[ 1046.649919][ C0] irq_exit_rcu (kernel/softirq.c:741)
[ 1046.650593][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052)
[ 1046.651520][ C0] </IRQ>
[ 1046.651984][ C0] <TASK>
[ 1046.652466][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697)
[ 1046.653389][ C0] RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
[ 1046.654391][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 e0 f2 7e 86 31 d2 31 c9 e8 e8 dd 80 fd 4d 85 f6 74 05 e8 de e5 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d 2f dc 6f 01 0f 95 c3 40 0f 94 c6 48 c7 c7 10 f3 7e 86 31 d2
All code
========
0: 00 44 89 f6 add %al,-0xa(%rcx,%rcx,4)
4: c1 ee 09 shr $0x9,%esi
7: 48 c7 c7 e0 f2 7e 86 mov $0xffffffff867ef2e0,%rdi
e: 31 d2 xor %edx,%edx
10: 31 c9 xor %ecx,%ecx
12: e8 e8 dd 80 fd call 0xfffffffffd80ddff
17: 4d 85 f6 test %r14,%r14
1a: 74 05 je 0x21
1c: e8 de e5 fd ff call 0xfffffffffffde5ff
21: 0f ba e3 09 bt $0x9,%ebx
25: 73 01 jae 0x28
27: fb sti
28: 31 f6 xor %esi,%esi
2a:* ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc5f <-- trapping instruction
30: 0f 95 c3 setne %bl
33: 40 0f 94 c6 sete %sil
37: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi
3e: 31 d2 xor %edx,%edx
Code starting with the faulting instruction
===========================================
0: ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc35
6: 0f 95 c3 setne %bl
9: 40 0f 94 c6 sete %sil
d: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi
14: 31 d2 xor %edx,%edx
[ 1046.657511][ C0] RSP: 0000:ffffc900001cfb50 EFLAGS: 00000246
[ 1046.658482][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000
[ 1046.659740][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1046.660979][ C0] RBP: ffffc900001cfb68 R08: 0000000000000000 R09: 0000000000000000
[ 1046.662239][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e35f50
[ 1046.663505][ C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1046.664741][ C0] free_pcppages_bulk (mm/page_alloc.c:1494)
[ 1046.665618][ C0] drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632)
[ 1046.666374][ C0] __drain_all_pages (mm/page_alloc.c:2731)
[ 1046.667171][ C0] drain_all_pages (mm/page_alloc.c:2747)
[ 1046.667908][ C0] kcompactd (mm/compaction.c:3115)
[ 1046.668625][ C0] kthread (kernel/kthread.c:465)
[ 1046.669299][ C0] ? __cfi_kcompactd (mm/compaction.c:3166)
[ 1046.670046][ C0] ? __cfi_kthread (kernel/kthread.c:412)
[ 1046.670764][ C0] ret_from_fork (arch/x86/kernel/process.c:164)
[ 1046.671483][ C0] ? __cfi_kthread (kernel/kthread.c:412)
[ 1046.672174][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 1046.672936][ C0] </TASK>
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-10 6:10 [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU kernel test robot @ 2025-12-10 23:30 ` Vishal Moola (Oracle) 2025-12-11 10:09 ` Uladzislau Rezki 2025-12-11 20:47 ` Andrew Morton 0 siblings, 2 replies; 8+ messages in thread From: Vishal Moola (Oracle) @ 2025-12-10 23:30 UTC (permalink / raw) To: kernel test robot Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Uladzislau Rezki, linux-mm On Wed, Dec 10, 2025 at 02:10:28PM +0800, kernel test robot wrote: > > > Hello, > > kernel test robot noticed "BUG:spinlock_trylock_failure_on_UP_on_CPU" on: > > commit: a0615780439938e8e61343f1f92a4c54a71dc6a5 ("mm/vmalloc: request large order pages from buddy allocator") > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > [test failed on linux-next/master c75caf76ed86bbc15a72808f48f8df1608a0886c] > > in testcase: trinity > version: > with following parameters: > > runtime: 300s > group: group-03 > nr_groups: 5 > > > > config: x86_64-randconfig-011-20251207 > compiler: clang-20 > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > the issue show randomly (~50%) in tests. > > 645a3c4243473d5c a0615780439938e8e61343f1f92 > ---------------- --------------------------- > fail:runs %reproduction fail:runs > | | | > :60 50% 29:60 dmesg.BUG:spinlock_trylock_failure_on_UP_on_CPU > :60 50% 29:60 dmesg.RIP:_raw_spin_unlock_irqrestore > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <oliver.sang@intel.com> > | Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com > > > [ 1046.632156][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > [ 1046.633368][ C0] lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > [ 1046.634872][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 > [ 1046.637019][ C0] Call Trace: > [ 1046.637563][ C0] <IRQ> > [ 1046.638038][ C0] __dump_stack (lib/dump_stack.c:95) > [ 1046.638781][ C0] dump_stack_lvl (lib/dump_stack.c:123) > [ 1046.639512][ C0] dump_stack (lib/dump_stack.c:130) > [ 1046.640168][ C0] spin_dump (kernel/locking/spinlock_debug.c:71) > [ 1046.640853][ C0] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) > [ 1046.641678][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) > [ 1046.642473][ C0] __free_frozen_pages (mm/page_alloc.c:2973) > [ 1046.643279][ C0] ___free_pages (mm/page_alloc.c:5295) > [ 1046.643956][ C0] __free_pages (mm/page_alloc.c:5334) > [ 1046.644624][ C0] tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) > [ 1046.645520][ C0] ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) > [ 1046.646384][ C0] ? rcu_core (kernel/rcu/tree.c:?) > [ 1046.647092][ C0] rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) > [ 1046.647774][ C0] rcu_core_si (kernel/rcu/tree.c:2879) > [ 1046.648439][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) > [ 1046.649202][ C0] __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) > [ 1046.649919][ C0] irq_exit_rcu (kernel/softirq.c:741) > [ 1046.650593][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) > [ 1046.651520][ C0] </IRQ> > [ 1046.651984][ C0] <TASK> > [ 1046.652466][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697) > [ 1046.653389][ C0] RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) > [ 1046.654391][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 e0 f2 7e 86 31 d2 31 c9 e8 e8 dd 80 fd 4d 85 f6 74 05 e8 de e5 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d 2f dc 6f 01 0f 95 c3 40 0f 94 c6 48 c7 c7 10 f3 7e 86 31 d2 > All code > ======== > 0: 00 44 89 f6 add %al,-0xa(%rcx,%rcx,4) > 4: c1 ee 09 shr $0x9,%esi > 7: 48 c7 c7 e0 f2 7e 86 mov $0xffffffff867ef2e0,%rdi > e: 31 d2 xor %edx,%edx > 10: 31 c9 xor %ecx,%ecx > 12: e8 e8 dd 80 fd call 0xfffffffffd80ddff > 17: 4d 85 f6 test %r14,%r14 > 1a: 74 05 je 0x21 > 1c: e8 de e5 fd ff call 0xfffffffffffde5ff > 21: 0f ba e3 09 bt $0x9,%ebx > 25: 73 01 jae 0x28 > 27: fb sti > 28: 31 f6 xor %esi,%esi > 2a:* ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc5f <-- trapping instruction > 30: 0f 95 c3 setne %bl > 33: 40 0f 94 c6 sete %sil > 37: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > 3e: 31 d2 xor %edx,%edx > > Code starting with the faulting instruction > =========================================== > 0: ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc35 > 6: 0f 95 c3 setne %bl > 9: 40 0f 94 c6 sete %sil > d: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > 14: 31 d2 xor %edx,%edx > [ 1046.657511][ C0] RSP: 0000:ffffc900001cfb50 EFLAGS: 00000246 > [ 1046.658482][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > [ 1046.659740][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [ 1046.660979][ C0] RBP: ffffc900001cfb68 R08: 0000000000000000 R09: 0000000000000000 > [ 1046.662239][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e35f50 > [ 1046.663505][ C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 1046.664741][ C0] free_pcppages_bulk (mm/page_alloc.c:1494) > [ 1046.665618][ C0] drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) > [ 1046.666374][ C0] __drain_all_pages (mm/page_alloc.c:2731) > [ 1046.667171][ C0] drain_all_pages (mm/page_alloc.c:2747) > [ 1046.667908][ C0] kcompactd (mm/compaction.c:3115) > [ 1046.668625][ C0] kthread (kernel/kthread.c:465) > [ 1046.669299][ C0] ? __cfi_kcompactd (mm/compaction.c:3166) > [ 1046.670046][ C0] ? __cfi_kthread (kernel/kthread.c:412) > [ 1046.670764][ C0] ret_from_fork (arch/x86/kernel/process.c:164) > [ 1046.671483][ C0] ? __cfi_kthread (kernel/kthread.c:412) > [ 1046.672174][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) > [ 1046.672936][ C0] </TASK> > > > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > Hmmm. This looks like a race condition tied to reclaim. I'm assuming we fail to allocate a page and kick off kswapd. Then when we fall back to the bulk allocator which tries to remove a pcp page at the same time as kswapd tries to reclaim it. Maybe? Does something like this fix it? diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ecbac900c35f..0d1480723ddc 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, struct page *page; int i; unsigned int large_order = ilog2(nr_remaining); - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; large_order = min(max_attempt_order, large_order); ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-10 23:30 ` Vishal Moola (Oracle) @ 2025-12-11 10:09 ` Uladzislau Rezki 2025-12-11 13:40 ` Oliver Sang 2025-12-11 20:47 ` Andrew Morton 1 sibling, 1 reply; 8+ messages in thread From: Uladzislau Rezki @ 2025-12-11 10:09 UTC (permalink / raw) To: Vishal Moola (Oracle) Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Andrew Morton, Uladzislau Rezki, linux-mm On Wed, Dec 10, 2025 at 03:30:51PM -0800, Vishal Moola (Oracle) wrote: > On Wed, Dec 10, 2025 at 02:10:28PM +0800, kernel test robot wrote: > > > > > > Hello, > > > > kernel test robot noticed "BUG:spinlock_trylock_failure_on_UP_on_CPU" on: > > > > commit: a0615780439938e8e61343f1f92a4c54a71dc6a5 ("mm/vmalloc: request large order pages from buddy allocator") > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > > [test failed on linux-next/master c75caf76ed86bbc15a72808f48f8df1608a0886c] > > > > in testcase: trinity > > version: > > with following parameters: > > > > runtime: 300s > > group: group-03 > > nr_groups: 5 > > > > > > > > config: x86_64-randconfig-011-20251207 > > compiler: clang-20 > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > the issue show randomly (~50%) in tests. > > > > 645a3c4243473d5c a0615780439938e8e61343f1f92 > > ---------------- --------------------------- > > fail:runs %reproduction fail:runs > > | | | > > :60 50% 29:60 dmesg.BUG:spinlock_trylock_failure_on_UP_on_CPU > > :60 50% 29:60 dmesg.RIP:_raw_spin_unlock_irqrestore > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > the same patch/commit), kindly add following tags > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > | Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com > > > > > > [ 1046.632156][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > > [ 1046.633368][ C0] lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > > [ 1046.634872][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 > > [ 1046.637019][ C0] Call Trace: > > [ 1046.637563][ C0] <IRQ> > > [ 1046.638038][ C0] __dump_stack (lib/dump_stack.c:95) > > [ 1046.638781][ C0] dump_stack_lvl (lib/dump_stack.c:123) > > [ 1046.639512][ C0] dump_stack (lib/dump_stack.c:130) > > [ 1046.640168][ C0] spin_dump (kernel/locking/spinlock_debug.c:71) > > [ 1046.640853][ C0] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) > > [ 1046.641678][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) > > [ 1046.642473][ C0] __free_frozen_pages (mm/page_alloc.c:2973) > > [ 1046.643279][ C0] ___free_pages (mm/page_alloc.c:5295) > > [ 1046.643956][ C0] __free_pages (mm/page_alloc.c:5334) > > [ 1046.644624][ C0] tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) > > [ 1046.645520][ C0] ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) > > [ 1046.646384][ C0] ? rcu_core (kernel/rcu/tree.c:?) > > [ 1046.647092][ C0] rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) > > [ 1046.647774][ C0] rcu_core_si (kernel/rcu/tree.c:2879) > > [ 1046.648439][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) > > [ 1046.649202][ C0] __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) > > [ 1046.649919][ C0] irq_exit_rcu (kernel/softirq.c:741) > > [ 1046.650593][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) > > [ 1046.651520][ C0] </IRQ> > > [ 1046.651984][ C0] <TASK> > > [ 1046.652466][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697) > > [ 1046.653389][ C0] RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) > > [ 1046.654391][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 e0 f2 7e 86 31 d2 31 c9 e8 e8 dd 80 fd 4d 85 f6 74 05 e8 de e5 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d 2f dc 6f 01 0f 95 c3 40 0f 94 c6 48 c7 c7 10 f3 7e 86 31 d2 > > All code > > ======== > > 0: 00 44 89 f6 add %al,-0xa(%rcx,%rcx,4) > > 4: c1 ee 09 shr $0x9,%esi > > 7: 48 c7 c7 e0 f2 7e 86 mov $0xffffffff867ef2e0,%rdi > > e: 31 d2 xor %edx,%edx > > 10: 31 c9 xor %ecx,%ecx > > 12: e8 e8 dd 80 fd call 0xfffffffffd80ddff > > 17: 4d 85 f6 test %r14,%r14 > > 1a: 74 05 je 0x21 > > 1c: e8 de e5 fd ff call 0xfffffffffffde5ff > > 21: 0f ba e3 09 bt $0x9,%ebx > > 25: 73 01 jae 0x28 > > 27: fb sti > > 28: 31 f6 xor %esi,%esi > > 2a:* ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc5f <-- trapping instruction > > 30: 0f 95 c3 setne %bl > > 33: 40 0f 94 c6 sete %sil > > 37: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > 3e: 31 d2 xor %edx,%edx > > > > Code starting with the faulting instruction > > =========================================== > > 0: ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc35 > > 6: 0f 95 c3 setne %bl > > 9: 40 0f 94 c6 sete %sil > > d: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > 14: 31 d2 xor %edx,%edx > > [ 1046.657511][ C0] RSP: 0000:ffffc900001cfb50 EFLAGS: 00000246 > > [ 1046.658482][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > > [ 1046.659740][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > [ 1046.660979][ C0] RBP: ffffc900001cfb68 R08: 0000000000000000 R09: 0000000000000000 > > [ 1046.662239][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e35f50 > > [ 1046.663505][ C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > [ 1046.664741][ C0] free_pcppages_bulk (mm/page_alloc.c:1494) > > [ 1046.665618][ C0] drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) > > [ 1046.666374][ C0] __drain_all_pages (mm/page_alloc.c:2731) > > [ 1046.667171][ C0] drain_all_pages (mm/page_alloc.c:2747) > > [ 1046.667908][ C0] kcompactd (mm/compaction.c:3115) > > [ 1046.668625][ C0] kthread (kernel/kthread.c:465) > > [ 1046.669299][ C0] ? __cfi_kcompactd (mm/compaction.c:3166) > > [ 1046.670046][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > [ 1046.670764][ C0] ret_from_fork (arch/x86/kernel/process.c:164) > > [ 1046.671483][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > [ 1046.672174][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) > > [ 1046.672936][ C0] </TASK> > > > > > > > > The kernel config and materials to reproduce are available at: > > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > > > > Hmmm. This looks like a race condition tied to reclaim. I'm assuming > we fail to allocate a page and kick off kswapd. Then when we fall back > to the bulk allocator which tries to remove a pcp page at the same time as > kswapd tries to reclaim it. Maybe? > > Does something like this fix it? > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index ecbac900c35f..0d1480723ddc 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, > struct page *page; > int i; > unsigned int large_order = ilog2(nr_remaining); > - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; > + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; > > large_order = min(max_attempt_order, large_order); > Some thoughts. That trace is not easy to follow. As for this change, probably it is worth to go with anyway. Because, if we are under a low memory condition, it is quite common for Android devices, we do not want to kick frequently kswapd without good reason. We have a fallback to single page allocator. -- Uladzislau Rezki ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-11 10:09 ` Uladzislau Rezki @ 2025-12-11 13:40 ` Oliver Sang 2025-12-11 13:53 ` Oliver Sang 2025-12-11 20:52 ` Vishal Moola (Oracle) 0 siblings, 2 replies; 8+ messages in thread From: Oliver Sang @ 2025-12-11 13:40 UTC (permalink / raw) To: Uladzislau Rezki Cc: Vishal Moola (Oracle), oe-lkp, lkp, linux-kernel, Andrew Morton, linux-mm, oliver.sang [-- Attachment #1: Type: text/plain, Size: 12332 bytes --] hi, Uladzislau Rezki, On Thu, Dec 11, 2025 at 11:09:43AM +0100, Uladzislau Rezki wrote: > On Wed, Dec 10, 2025 at 03:30:51PM -0800, Vishal Moola (Oracle) wrote: > > On Wed, Dec 10, 2025 at 02:10:28PM +0800, kernel test robot wrote: > > > > > > > > > Hello, > > > > > > kernel test robot noticed "BUG:spinlock_trylock_failure_on_UP_on_CPU" on: > > > > > > commit: a0615780439938e8e61343f1f92a4c54a71dc6a5 ("mm/vmalloc: request large order pages from buddy allocator") > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > > > [test failed on linux-next/master c75caf76ed86bbc15a72808f48f8df1608a0886c] > > > > > > in testcase: trinity > > > version: > > > with following parameters: > > > > > > runtime: 300s > > > group: group-03 > > > nr_groups: 5 > > > > > > > > > > > > config: x86_64-randconfig-011-20251207 > > > compiler: clang-20 > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G > > > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > > > > the issue show randomly (~50%) in tests. > > > > > > 645a3c4243473d5c a0615780439938e8e61343f1f92 > > > ---------------- --------------------------- > > > fail:runs %reproduction fail:runs > > > | | | > > > :60 50% 29:60 dmesg.BUG:spinlock_trylock_failure_on_UP_on_CPU > > > :60 50% 29:60 dmesg.RIP:_raw_spin_unlock_irqrestore > > > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > the same patch/commit), kindly add following tags > > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > > | Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > > > [ 1046.632156][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > > > [ 1046.633368][ C0] lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > > > [ 1046.634872][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 > > > [ 1046.637019][ C0] Call Trace: > > > [ 1046.637563][ C0] <IRQ> > > > [ 1046.638038][ C0] __dump_stack (lib/dump_stack.c:95) > > > [ 1046.638781][ C0] dump_stack_lvl (lib/dump_stack.c:123) > > > [ 1046.639512][ C0] dump_stack (lib/dump_stack.c:130) > > > [ 1046.640168][ C0] spin_dump (kernel/locking/spinlock_debug.c:71) > > > [ 1046.640853][ C0] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) > > > [ 1046.641678][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) > > > [ 1046.642473][ C0] __free_frozen_pages (mm/page_alloc.c:2973) > > > [ 1046.643279][ C0] ___free_pages (mm/page_alloc.c:5295) > > > [ 1046.643956][ C0] __free_pages (mm/page_alloc.c:5334) > > > [ 1046.644624][ C0] tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) > > > [ 1046.645520][ C0] ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) > > > [ 1046.646384][ C0] ? rcu_core (kernel/rcu/tree.c:?) > > > [ 1046.647092][ C0] rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) > > > [ 1046.647774][ C0] rcu_core_si (kernel/rcu/tree.c:2879) > > > [ 1046.648439][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) > > > [ 1046.649202][ C0] __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) > > > [ 1046.649919][ C0] irq_exit_rcu (kernel/softirq.c:741) > > > [ 1046.650593][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) > > > [ 1046.651520][ C0] </IRQ> > > > [ 1046.651984][ C0] <TASK> > > > [ 1046.652466][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697) > > > [ 1046.653389][ C0] RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) > > > [ 1046.654391][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 e0 f2 7e 86 31 d2 31 c9 e8 e8 dd 80 fd 4d 85 f6 74 05 e8 de e5 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d 2f dc 6f 01 0f 95 c3 40 0f 94 c6 48 c7 c7 10 f3 7e 86 31 d2 > > > All code > > > ======== > > > 0: 00 44 89 f6 add %al,-0xa(%rcx,%rcx,4) > > > 4: c1 ee 09 shr $0x9,%esi > > > 7: 48 c7 c7 e0 f2 7e 86 mov $0xffffffff867ef2e0,%rdi > > > e: 31 d2 xor %edx,%edx > > > 10: 31 c9 xor %ecx,%ecx > > > 12: e8 e8 dd 80 fd call 0xfffffffffd80ddff > > > 17: 4d 85 f6 test %r14,%r14 > > > 1a: 74 05 je 0x21 > > > 1c: e8 de e5 fd ff call 0xfffffffffffde5ff > > > 21: 0f ba e3 09 bt $0x9,%ebx > > > 25: 73 01 jae 0x28 > > > 27: fb sti > > > 28: 31 f6 xor %esi,%esi > > > 2a:* ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc5f <-- trapping instruction > > > 30: 0f 95 c3 setne %bl > > > 33: 40 0f 94 c6 sete %sil > > > 37: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > > 3e: 31 d2 xor %edx,%edx > > > > > > Code starting with the faulting instruction > > > =========================================== > > > 0: ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc35 > > > 6: 0f 95 c3 setne %bl > > > 9: 40 0f 94 c6 sete %sil > > > d: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > > 14: 31 d2 xor %edx,%edx > > > [ 1046.657511][ C0] RSP: 0000:ffffc900001cfb50 EFLAGS: 00000246 > > > [ 1046.658482][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > > > [ 1046.659740][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > [ 1046.660979][ C0] RBP: ffffc900001cfb68 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1046.662239][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e35f50 > > > [ 1046.663505][ C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > > [ 1046.664741][ C0] free_pcppages_bulk (mm/page_alloc.c:1494) > > > [ 1046.665618][ C0] drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) > > > [ 1046.666374][ C0] __drain_all_pages (mm/page_alloc.c:2731) > > > [ 1046.667171][ C0] drain_all_pages (mm/page_alloc.c:2747) > > > [ 1046.667908][ C0] kcompactd (mm/compaction.c:3115) > > > [ 1046.668625][ C0] kthread (kernel/kthread.c:465) > > > [ 1046.669299][ C0] ? __cfi_kcompactd (mm/compaction.c:3166) > > > [ 1046.670046][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > > [ 1046.670764][ C0] ret_from_fork (arch/x86/kernel/process.c:164) > > > [ 1046.671483][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > > [ 1046.672174][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) > > > [ 1046.672936][ C0] </TASK> > > > > > > > > > > > > The kernel config and materials to reproduce are available at: > > > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > Hmmm. This looks like a race condition tied to reclaim. I'm assuming > > we fail to allocate a page and kick off kswapd. Then when we fall back > > to the bulk allocator which tries to remove a pcp page at the same time as > > kswapd tries to reclaim it. Maybe? > > > > Does something like this fix it? below patch is failed to applied upon a061578043. since before reporting, we also tested on latest mainline tip at that time: [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] which still can reproduce the issue, we applied below patch upon this commit. we found the issue still exists, though the rate seems drop, we only reproduced the issue 4 times out of 100 runs. one dmesg attached FYI. [ 448.229620][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 [ 448.230037][ C0] lock: 0xffff888807e75f08, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 [ 448.230474][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-12693-g5c827f68ce27 #1 PREEMPT b3721ff206516c93c37a896ae4d3df039c2d04c2 [ 448.231106][ C0] Call Trace: [ 448.231264][ C0] <IRQ> [ 448.231403][ C0] __dump_stack+0x19/0x40 [ 448.231627][ C0] dump_stack_lvl+0x36/0xb8 [ 448.231842][ C0] dump_stack+0x10/0x38 [ 448.232101][ C0] spin_dump+0x11c/0x1c0 [ 448.232404][ C0] do_raw_spin_trylock+0xe4/0x138 [ 448.232758][ C0] _raw_spin_trylock+0x1a/0xb8 [ 448.233104][ C0] __free_frozen_pages+0x750/0xa80 [ 448.233456][ C0] ___free_pages+0x67/0x140 [ 448.233765][ C0] __free_pages+0x10/0x38 [ 448.234066][ C0] __tlb_remove_table+0x16a/0x200 [ 448.234412][ C0] tlb_remove_table_rcu+0x77/0xf8 [ 448.234771][ C0] ? __cfi_tlb_remove_table_rcu+0x8/0x8 [ 448.235153][ C0] ? rcu_core+0x8af/0x1740 [ 448.235456][ C0] rcu_core+0xa25/0x1740 [ 448.235751][ C0] rcu_core_si+0xe/0x38 [ 448.236038][ C0] handle_softirqs+0x1db/0x700 [ 448.236371][ C0] __irq_exit_rcu+0x44/0xc0 [ 448.236678][ C0] irq_exit_rcu+0xe/0x38 [ 448.236968][ C0] sysvec_apic_timer_interrupt+0x78/0xb8 [ 448.237382][ C0] </IRQ> [ 448.237588][ C0] <TASK> [ 448.237798][ C0] asm_sysvec_apic_timer_interrupt+0x1b/0x40 [ 448.238214][ C0] RIP: 0010:_raw_spin_unlock_irqrestore+0x6b/0xb8 [ 448.238575][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 60 91 80 86 31 d2 31 c9 e8 e8 b4 86 fd 4d 85 f6 74 05 e8 9e e9 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d ef b8 75 01 0f 95 c3 40 0f 94 c6 48 c7 c7 90 91 80 86 31 d2 [ 448.239787][ C0] RSP: 0000:ffffc900001cfb48 EFLAGS: 00000246 [ 448.240187][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 [ 448.240700][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 448.241230][ C0] RBP: ffffc900001cfb60 R08: 0000000000000000 R09: 0000000000000000 [ 448.241758][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e75fc8 [ 448.242289][ C0] R13: 0000000000000006 R14: 0000000000000000 R15: ffffea000cee6410 [ 448.242840][ C0] ? _raw_spin_unlock_irqrestore+0x58/0xb8 [ 448.243241][ C0] free_pcppages_bulk+0x4a2/0x500 [ 448.243591][ C0] drain_pages_zone+0xa5/0x140 [ 448.243936][ C0] __drain_all_pages+0x1ab/0x240 [ 448.244277][ C0] drain_all_pages+0x10/0x38 [ 448.244587][ C0] kcompactd+0x61f/0xaf8 [ 448.244900][ C0] kthread+0x586/0x678 [ 448.245183][ C0] ? __cfi_kcompactd+0x8/0x8 [ 448.245504][ C0] ? __cfi_kthread+0x8/0x8 [ 448.245815][ C0] ret_from_fork+0x24a/0x478 [ 448.246138][ C0] ? __cfi_kthread+0x8/0x8 [ 448.246453][ C0] ret_from_fork_asm+0x11/0x40 [ 448.246795][ C0] </TASK> [ 693.283069][ T328] hwclock: can't open '/dev/misc/rtc': No such file or directory LKP: ttyS0: 289: LKP: tbox cant kexec and rebooting forcely [ 701.346273][ T289] sysrq: Emergency Sync [ 701.346778][ T10] Emergency Sync complete [ 701.347270][ T289] sysrq: Resetting > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index ecbac900c35f..0d1480723ddc 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, > > struct page *page; > > int i; > > unsigned int large_order = ilog2(nr_remaining); > > - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; > > + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; > > > > large_order = min(max_attempt_order, large_order); > > > Some thoughts. > > That trace is not easy to follow. As for this change, probably it > is worth to go with anyway. Because, if we are under a low memory > condition, it is quite common for Android devices, we do not want > to kick frequently kswapd without good reason. We have a fallback > to single page allocator. > > -- > Uladzislau Rezki [-- Attachment #2: dmesg.xz --] [-- Type: application/x-xz, Size: 15780 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-11 13:40 ` Oliver Sang @ 2025-12-11 13:53 ` Oliver Sang 2025-12-11 20:52 ` Vishal Moola (Oracle) 1 sibling, 0 replies; 8+ messages in thread From: Oliver Sang @ 2025-12-11 13:53 UTC (permalink / raw) To: Uladzislau Rezki Cc: Vishal Moola (Oracle), oe-lkp, lkp, linux-kernel, Andrew Morton, linux-mm, oliver.sang Sorry for wrong reply. I should reply Vishal Moola who wrote the patch. On Thu, Dec 11, 2025 at 09:40:07PM +0800, Oliver Sang wrote: > hi, Uladzislau Rezki, > > On Thu, Dec 11, 2025 at 11:09:43AM +0100, Uladzislau Rezki wrote: > > On Wed, Dec 10, 2025 at 03:30:51PM -0800, Vishal Moola (Oracle) wrote: > > > On Wed, Dec 10, 2025 at 02:10:28PM +0800, kernel test robot wrote: > > > > > > > > > > > > Hello, > > > > > > > > kernel test robot noticed "BUG:spinlock_trylock_failure_on_UP_on_CPU" on: > > > > > > > > commit: a0615780439938e8e61343f1f92a4c54a71dc6a5 ("mm/vmalloc: request large order pages from buddy allocator") > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > > > > [test failed on linux-next/master c75caf76ed86bbc15a72808f48f8df1608a0886c] > > > > > > > > in testcase: trinity > > > > version: > > > > with following parameters: > > > > > > > > runtime: 300s > > > > group: group-03 > > > > nr_groups: 5 > > > > > > > > > > > > > > > > config: x86_64-randconfig-011-20251207 > > > > compiler: clang-20 > > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G > > > > > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > > > > > > > the issue show randomly (~50%) in tests. > > > > > > > > 645a3c4243473d5c a0615780439938e8e61343f1f92 > > > > ---------------- --------------------------- > > > > fail:runs %reproduction fail:runs > > > > | | | > > > > :60 50% 29:60 dmesg.BUG:spinlock_trylock_failure_on_UP_on_CPU > > > > :60 50% 29:60 dmesg.RIP:_raw_spin_unlock_irqrestore > > > > > > > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > > the same patch/commit), kindly add following tags > > > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > > > | Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > > > > > > [ 1046.632156][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > > > > [ 1046.633368][ C0] lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > > > > [ 1046.634872][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 > > > > [ 1046.637019][ C0] Call Trace: > > > > [ 1046.637563][ C0] <IRQ> > > > > [ 1046.638038][ C0] __dump_stack (lib/dump_stack.c:95) > > > > [ 1046.638781][ C0] dump_stack_lvl (lib/dump_stack.c:123) > > > > [ 1046.639512][ C0] dump_stack (lib/dump_stack.c:130) > > > > [ 1046.640168][ C0] spin_dump (kernel/locking/spinlock_debug.c:71) > > > > [ 1046.640853][ C0] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) > > > > [ 1046.641678][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) > > > > [ 1046.642473][ C0] __free_frozen_pages (mm/page_alloc.c:2973) > > > > [ 1046.643279][ C0] ___free_pages (mm/page_alloc.c:5295) > > > > [ 1046.643956][ C0] __free_pages (mm/page_alloc.c:5334) > > > > [ 1046.644624][ C0] tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) > > > > [ 1046.645520][ C0] ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) > > > > [ 1046.646384][ C0] ? rcu_core (kernel/rcu/tree.c:?) > > > > [ 1046.647092][ C0] rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) > > > > [ 1046.647774][ C0] rcu_core_si (kernel/rcu/tree.c:2879) > > > > [ 1046.648439][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) > > > > [ 1046.649202][ C0] __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) > > > > [ 1046.649919][ C0] irq_exit_rcu (kernel/softirq.c:741) > > > > [ 1046.650593][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) > > > > [ 1046.651520][ C0] </IRQ> > > > > [ 1046.651984][ C0] <TASK> > > > > [ 1046.652466][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697) > > > > [ 1046.653389][ C0] RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) > > > > [ 1046.654391][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 e0 f2 7e 86 31 d2 31 c9 e8 e8 dd 80 fd 4d 85 f6 74 05 e8 de e5 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d 2f dc 6f 01 0f 95 c3 40 0f 94 c6 48 c7 c7 10 f3 7e 86 31 d2 > > > > All code > > > > ======== > > > > 0: 00 44 89 f6 add %al,-0xa(%rcx,%rcx,4) > > > > 4: c1 ee 09 shr $0x9,%esi > > > > 7: 48 c7 c7 e0 f2 7e 86 mov $0xffffffff867ef2e0,%rdi > > > > e: 31 d2 xor %edx,%edx > > > > 10: 31 c9 xor %ecx,%ecx > > > > 12: e8 e8 dd 80 fd call 0xfffffffffd80ddff > > > > 17: 4d 85 f6 test %r14,%r14 > > > > 1a: 74 05 je 0x21 > > > > 1c: e8 de e5 fd ff call 0xfffffffffffde5ff > > > > 21: 0f ba e3 09 bt $0x9,%ebx > > > > 25: 73 01 jae 0x28 > > > > 27: fb sti > > > > 28: 31 f6 xor %esi,%esi > > > > 2a:* ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc5f <-- trapping instruction > > > > 30: 0f 95 c3 setne %bl > > > > 33: 40 0f 94 c6 sete %sil > > > > 37: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > > > 3e: 31 d2 xor %edx,%edx > > > > > > > > Code starting with the faulting instruction > > > > =========================================== > > > > 0: ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc35 > > > > 6: 0f 95 c3 setne %bl > > > > 9: 40 0f 94 c6 sete %sil > > > > d: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > > > 14: 31 d2 xor %edx,%edx > > > > [ 1046.657511][ C0] RSP: 0000:ffffc900001cfb50 EFLAGS: 00000246 > > > > [ 1046.658482][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > > > > [ 1046.659740][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > [ 1046.660979][ C0] RBP: ffffc900001cfb68 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1046.662239][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e35f50 > > > > [ 1046.663505][ C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > > > [ 1046.664741][ C0] free_pcppages_bulk (mm/page_alloc.c:1494) > > > > [ 1046.665618][ C0] drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) > > > > [ 1046.666374][ C0] __drain_all_pages (mm/page_alloc.c:2731) > > > > [ 1046.667171][ C0] drain_all_pages (mm/page_alloc.c:2747) > > > > [ 1046.667908][ C0] kcompactd (mm/compaction.c:3115) > > > > [ 1046.668625][ C0] kthread (kernel/kthread.c:465) > > > > [ 1046.669299][ C0] ? __cfi_kcompactd (mm/compaction.c:3166) > > > > [ 1046.670046][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > > > [ 1046.670764][ C0] ret_from_fork (arch/x86/kernel/process.c:164) > > > > [ 1046.671483][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > > > [ 1046.672174][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) > > > > [ 1046.672936][ C0] </TASK> > > > > > > > > > > > > > > > > The kernel config and materials to reproduce are available at: > > > > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > > > > Hmmm. This looks like a race condition tied to reclaim. I'm assuming > > > we fail to allocate a page and kick off kswapd. Then when we fall back > > > to the bulk allocator which tries to remove a pcp page at the same time as > > > kswapd tries to reclaim it. Maybe? > > > > > > Does something like this fix it? > > below patch is failed to applied upon a061578043. > > since before reporting, we also tested on latest mainline tip at that time: > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > which still can reproduce the issue, we applied below patch upon this commit. > > > we found the issue still exists, though the rate seems drop, we only reproduced > the issue 4 times out of 100 runs. > > one dmesg attached FYI. > > > [ 448.229620][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > [ 448.230037][ C0] lock: 0xffff888807e75f08, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > [ 448.230474][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-12693-g5c827f68ce27 #1 PREEMPT b3721ff206516c93c37a896ae4d3df039c2d04c2 > [ 448.231106][ C0] Call Trace: > [ 448.231264][ C0] <IRQ> > [ 448.231403][ C0] __dump_stack+0x19/0x40 > [ 448.231627][ C0] dump_stack_lvl+0x36/0xb8 > [ 448.231842][ C0] dump_stack+0x10/0x38 > [ 448.232101][ C0] spin_dump+0x11c/0x1c0 > [ 448.232404][ C0] do_raw_spin_trylock+0xe4/0x138 > [ 448.232758][ C0] _raw_spin_trylock+0x1a/0xb8 > [ 448.233104][ C0] __free_frozen_pages+0x750/0xa80 > [ 448.233456][ C0] ___free_pages+0x67/0x140 > [ 448.233765][ C0] __free_pages+0x10/0x38 > [ 448.234066][ C0] __tlb_remove_table+0x16a/0x200 > [ 448.234412][ C0] tlb_remove_table_rcu+0x77/0xf8 > [ 448.234771][ C0] ? __cfi_tlb_remove_table_rcu+0x8/0x8 > [ 448.235153][ C0] ? rcu_core+0x8af/0x1740 > [ 448.235456][ C0] rcu_core+0xa25/0x1740 > [ 448.235751][ C0] rcu_core_si+0xe/0x38 > [ 448.236038][ C0] handle_softirqs+0x1db/0x700 > [ 448.236371][ C0] __irq_exit_rcu+0x44/0xc0 > [ 448.236678][ C0] irq_exit_rcu+0xe/0x38 > [ 448.236968][ C0] sysvec_apic_timer_interrupt+0x78/0xb8 > [ 448.237382][ C0] </IRQ> > [ 448.237588][ C0] <TASK> > [ 448.237798][ C0] asm_sysvec_apic_timer_interrupt+0x1b/0x40 > [ 448.238214][ C0] RIP: 0010:_raw_spin_unlock_irqrestore+0x6b/0xb8 > [ 448.238575][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 60 91 80 86 31 d2 31 c9 e8 e8 b4 86 fd 4d 85 f6 74 05 e8 9e e9 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d ef b8 75 01 0f 95 c3 40 0f 94 c6 48 c7 c7 90 91 80 86 31 d2 > [ 448.239787][ C0] RSP: 0000:ffffc900001cfb48 EFLAGS: 00000246 > [ 448.240187][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > [ 448.240700][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [ 448.241230][ C0] RBP: ffffc900001cfb60 R08: 0000000000000000 R09: 0000000000000000 > [ 448.241758][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e75fc8 > [ 448.242289][ C0] R13: 0000000000000006 R14: 0000000000000000 R15: ffffea000cee6410 > [ 448.242840][ C0] ? _raw_spin_unlock_irqrestore+0x58/0xb8 > [ 448.243241][ C0] free_pcppages_bulk+0x4a2/0x500 > [ 448.243591][ C0] drain_pages_zone+0xa5/0x140 > [ 448.243936][ C0] __drain_all_pages+0x1ab/0x240 > [ 448.244277][ C0] drain_all_pages+0x10/0x38 > [ 448.244587][ C0] kcompactd+0x61f/0xaf8 > [ 448.244900][ C0] kthread+0x586/0x678 > [ 448.245183][ C0] ? __cfi_kcompactd+0x8/0x8 > [ 448.245504][ C0] ? __cfi_kthread+0x8/0x8 > [ 448.245815][ C0] ret_from_fork+0x24a/0x478 > [ 448.246138][ C0] ? __cfi_kthread+0x8/0x8 > [ 448.246453][ C0] ret_from_fork_asm+0x11/0x40 > [ 448.246795][ C0] </TASK> > [ 693.283069][ T328] hwclock: can't open '/dev/misc/rtc': No such file or directory > LKP: ttyS0: 289: LKP: tbox cant kexec and rebooting forcely > [ 701.346273][ T289] sysrq: Emergency Sync > [ 701.346778][ T10] Emergency Sync complete > [ 701.347270][ T289] sysrq: Resetting > > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > > index ecbac900c35f..0d1480723ddc 100644 > > > --- a/mm/vmalloc.c > > > +++ b/mm/vmalloc.c > > > @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, > > > struct page *page; > > > int i; > > > unsigned int large_order = ilog2(nr_remaining); > > > - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; > > > + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; > > > > > > large_order = min(max_attempt_order, large_order); > > > > > Some thoughts. > > > > That trace is not easy to follow. As for this change, probably it > > is worth to go with anyway. Because, if we are under a low memory > > condition, it is quite common for Android devices, we do not want > > to kick frequently kswapd without good reason. We have a fallback > > to single page allocator. > > > > -- > > Uladzislau Rezki ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-11 13:40 ` Oliver Sang 2025-12-11 13:53 ` Oliver Sang @ 2025-12-11 20:52 ` Vishal Moola (Oracle) 1 sibling, 0 replies; 8+ messages in thread From: Vishal Moola (Oracle) @ 2025-12-11 20:52 UTC (permalink / raw) To: Oliver Sang Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton, linux-mm On Thu, Dec 11, 2025 at 09:40:07PM +0800, Oliver Sang wrote: > hi, Uladzislau Rezki, > > On Thu, Dec 11, 2025 at 11:09:43AM +0100, Uladzislau Rezki wrote: > > On Wed, Dec 10, 2025 at 03:30:51PM -0800, Vishal Moola (Oracle) wrote: > > > On Wed, Dec 10, 2025 at 02:10:28PM +0800, kernel test robot wrote: > > > > > > > > > > > > Hello, > > > > > > > > kernel test robot noticed "BUG:spinlock_trylock_failure_on_UP_on_CPU" on: > > > > > > > > commit: a0615780439938e8e61343f1f92a4c54a71dc6a5 ("mm/vmalloc: request large order pages from buddy allocator") > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > > > > [test failed on linux-next/master c75caf76ed86bbc15a72808f48f8df1608a0886c] > > > > > > > > in testcase: trinity > > > > version: > > > > with following parameters: > > > > > > > > runtime: 300s > > > > group: group-03 > > > > nr_groups: 5 > > > > > > > > > > > > > > > > config: x86_64-randconfig-011-20251207 > > > > compiler: clang-20 > > > > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G > > > > > > > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > > > > > > > > > > the issue show randomly (~50%) in tests. > > > > > > > > 645a3c4243473d5c a0615780439938e8e61343f1f92 > > > > ---------------- --------------------------- > > > > fail:runs %reproduction fail:runs > > > > | | | > > > > :60 50% 29:60 dmesg.BUG:spinlock_trylock_failure_on_UP_on_CPU > > > > :60 50% 29:60 dmesg.RIP:_raw_spin_unlock_irqrestore > > > > > > > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > > the same patch/commit), kindly add following tags > > > > | Reported-by: kernel test robot <oliver.sang@intel.com> > > > > | Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > > > > > > [ 1046.632156][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > > > > [ 1046.633368][ C0] lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > > > > [ 1046.634872][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 > > > > [ 1046.637019][ C0] Call Trace: > > > > [ 1046.637563][ C0] <IRQ> > > > > [ 1046.638038][ C0] __dump_stack (lib/dump_stack.c:95) > > > > [ 1046.638781][ C0] dump_stack_lvl (lib/dump_stack.c:123) > > > > [ 1046.639512][ C0] dump_stack (lib/dump_stack.c:130) > > > > [ 1046.640168][ C0] spin_dump (kernel/locking/spinlock_debug.c:71) > > > > [ 1046.640853][ C0] do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) > > > > [ 1046.641678][ C0] _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) > > > > [ 1046.642473][ C0] __free_frozen_pages (mm/page_alloc.c:2973) > > > > [ 1046.643279][ C0] ___free_pages (mm/page_alloc.c:5295) > > > > [ 1046.643956][ C0] __free_pages (mm/page_alloc.c:5334) > > > > [ 1046.644624][ C0] tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) > > > > [ 1046.645520][ C0] ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) > > > > [ 1046.646384][ C0] ? rcu_core (kernel/rcu/tree.c:?) > > > > [ 1046.647092][ C0] rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) > > > > [ 1046.647774][ C0] rcu_core_si (kernel/rcu/tree.c:2879) > > > > [ 1046.648439][ C0] handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) > > > > [ 1046.649202][ C0] __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) > > > > [ 1046.649919][ C0] irq_exit_rcu (kernel/softirq.c:741) > > > > [ 1046.650593][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) > > > > [ 1046.651520][ C0] </IRQ> > > > > [ 1046.651984][ C0] <TASK> > > > > [ 1046.652466][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:697) > > > > [ 1046.653389][ C0] RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) > > > > [ 1046.654391][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 e0 f2 7e 86 31 d2 31 c9 e8 e8 dd 80 fd 4d 85 f6 74 05 e8 de e5 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d 2f dc 6f 01 0f 95 c3 40 0f 94 c6 48 c7 c7 10 f3 7e 86 31 d2 > > > > All code > > > > ======== > > > > 0: 00 44 89 f6 add %al,-0xa(%rcx,%rcx,4) > > > > 4: c1 ee 09 shr $0x9,%esi > > > > 7: 48 c7 c7 e0 f2 7e 86 mov $0xffffffff867ef2e0,%rdi > > > > e: 31 d2 xor %edx,%edx > > > > 10: 31 c9 xor %ecx,%ecx > > > > 12: e8 e8 dd 80 fd call 0xfffffffffd80ddff > > > > 17: 4d 85 f6 test %r14,%r14 > > > > 1a: 74 05 je 0x21 > > > > 1c: e8 de e5 fd ff call 0xfffffffffffde5ff > > > > 21: 0f ba e3 09 bt $0x9,%ebx > > > > 25: 73 01 jae 0x28 > > > > 27: fb sti > > > > 28: 31 f6 xor %esi,%esi > > > > 2a:* ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc5f <-- trapping instruction > > > > 30: 0f 95 c3 setne %bl > > > > 33: 40 0f 94 c6 sete %sil > > > > 37: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > > > 3e: 31 d2 xor %edx,%edx > > > > > > > > Code starting with the faulting instruction > > > > =========================================== > > > > 0: ff 0d 2f dc 6f 01 decl 0x16fdc2f(%rip) # 0x16fdc35 > > > > 6: 0f 95 c3 setne %bl > > > > 9: 40 0f 94 c6 sete %sil > > > > d: 48 c7 c7 10 f3 7e 86 mov $0xffffffff867ef310,%rdi > > > > 14: 31 d2 xor %edx,%edx > > > > [ 1046.657511][ C0] RSP: 0000:ffffc900001cfb50 EFLAGS: 00000246 > > > > [ 1046.658482][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > > > > [ 1046.659740][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > [ 1046.660979][ C0] RBP: ffffc900001cfb68 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1046.662239][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e35f50 > > > > [ 1046.663505][ C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > > > > [ 1046.664741][ C0] free_pcppages_bulk (mm/page_alloc.c:1494) > > > > [ 1046.665618][ C0] drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) > > > > [ 1046.666374][ C0] __drain_all_pages (mm/page_alloc.c:2731) > > > > [ 1046.667171][ C0] drain_all_pages (mm/page_alloc.c:2747) > > > > [ 1046.667908][ C0] kcompactd (mm/compaction.c:3115) > > > > [ 1046.668625][ C0] kthread (kernel/kthread.c:465) > > > > [ 1046.669299][ C0] ? __cfi_kcompactd (mm/compaction.c:3166) > > > > [ 1046.670046][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > > > [ 1046.670764][ C0] ret_from_fork (arch/x86/kernel/process.c:164) > > > > [ 1046.671483][ C0] ? __cfi_kthread (kernel/kthread.c:412) > > > > [ 1046.672174][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:255) > > > > [ 1046.672936][ C0] </TASK> > > > > > > > > > > > > > > > > The kernel config and materials to reproduce are available at: > > > > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > > > > Hmmm. This looks like a race condition tied to reclaim. I'm assuming > > > we fail to allocate a page and kick off kswapd. Then when we fall back > > > to the bulk allocator which tries to remove a pcp page at the same time as > > > kswapd tries to reclaim it. Maybe? > > > > > > Does something like this fix it? > > below patch is failed to applied upon a061578043. > > since before reporting, we also tested on latest mainline tip at that time: > [test failed on linus/master cb015814f8b6eebcbb8e46e111d108892c5e6821] > which still can reproduce the issue, we applied below patch upon this commit. > > > we found the issue still exists, though the rate seems drop, we only reproduced > the issue 4 times out of 100 runs. > > one dmesg attached FYI. > > > [ 448.229620][ C0] BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 > [ 448.230037][ C0] lock: 0xffff888807e75f08, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 > [ 448.230474][ C0] CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-12693-g5c827f68ce27 #1 PREEMPT b3721ff206516c93c37a896ae4d3df039c2d04c2 > [ 448.231106][ C0] Call Trace: > [ 448.231264][ C0] <IRQ> > [ 448.231403][ C0] __dump_stack+0x19/0x40 > [ 448.231627][ C0] dump_stack_lvl+0x36/0xb8 > [ 448.231842][ C0] dump_stack+0x10/0x38 > [ 448.232101][ C0] spin_dump+0x11c/0x1c0 > [ 448.232404][ C0] do_raw_spin_trylock+0xe4/0x138 > [ 448.232758][ C0] _raw_spin_trylock+0x1a/0xb8 > [ 448.233104][ C0] __free_frozen_pages+0x750/0xa80 > [ 448.233456][ C0] ___free_pages+0x67/0x140 > [ 448.233765][ C0] __free_pages+0x10/0x38 > [ 448.234066][ C0] __tlb_remove_table+0x16a/0x200 > [ 448.234412][ C0] tlb_remove_table_rcu+0x77/0xf8 > [ 448.234771][ C0] ? __cfi_tlb_remove_table_rcu+0x8/0x8 > [ 448.235153][ C0] ? rcu_core+0x8af/0x1740 > [ 448.235456][ C0] rcu_core+0xa25/0x1740 > [ 448.235751][ C0] rcu_core_si+0xe/0x38 > [ 448.236038][ C0] handle_softirqs+0x1db/0x700 > [ 448.236371][ C0] __irq_exit_rcu+0x44/0xc0 > [ 448.236678][ C0] irq_exit_rcu+0xe/0x38 > [ 448.236968][ C0] sysvec_apic_timer_interrupt+0x78/0xb8 > [ 448.237382][ C0] </IRQ> > [ 448.237588][ C0] <TASK> > [ 448.237798][ C0] asm_sysvec_apic_timer_interrupt+0x1b/0x40 > [ 448.238214][ C0] RIP: 0010:_raw_spin_unlock_irqrestore+0x6b/0xb8 > [ 448.238575][ C0] Code: 00 44 89 f6 c1 ee 09 48 c7 c7 60 91 80 86 31 d2 31 c9 e8 e8 b4 86 fd 4d 85 f6 74 05 e8 9e e9 fd ff 0f ba e3 09 73 01 fb 31 f6 <ff> 0d ef b8 75 01 0f 95 c3 40 0f 94 c6 48 c7 c7 90 91 80 86 31 d2 > [ 448.239787][ C0] RSP: 0000:ffffc900001cfb48 EFLAGS: 00000246 > [ 448.240187][ C0] RAX: 0000000000000000 RBX: 0000000000000206 RCX: 0000000000000000 > [ 448.240700][ C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [ 448.241230][ C0] RBP: ffffc900001cfb60 R08: 0000000000000000 R09: 0000000000000000 > [ 448.241758][ C0] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888807e75fc8 > [ 448.242289][ C0] R13: 0000000000000006 R14: 0000000000000000 R15: ffffea000cee6410 > [ 448.242840][ C0] ? _raw_spin_unlock_irqrestore+0x58/0xb8 > [ 448.243241][ C0] free_pcppages_bulk+0x4a2/0x500 > [ 448.243591][ C0] drain_pages_zone+0xa5/0x140 > [ 448.243936][ C0] __drain_all_pages+0x1ab/0x240 > [ 448.244277][ C0] drain_all_pages+0x10/0x38 > [ 448.244587][ C0] kcompactd+0x61f/0xaf8 > [ 448.244900][ C0] kthread+0x586/0x678 > [ 448.245183][ C0] ? __cfi_kcompactd+0x8/0x8 > [ 448.245504][ C0] ? __cfi_kthread+0x8/0x8 > [ 448.245815][ C0] ret_from_fork+0x24a/0x478 > [ 448.246138][ C0] ? __cfi_kthread+0x8/0x8 > [ 448.246453][ C0] ret_from_fork_asm+0x11/0x40 > [ 448.246795][ C0] </TASK> > [ 693.283069][ T328] hwclock: can't open '/dev/misc/rtc': No such file or directory > LKP: ttyS0: 289: LKP: tbox cant kexec and rebooting forcely > [ 701.346273][ T289] sysrq: Emergency Sync > [ 701.346778][ T10] Emergency Sync complete > [ 701.347270][ T289] sysrq: Resetting > > > > > > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > > index ecbac900c35f..0d1480723ddc 100644 > > > --- a/mm/vmalloc.c > > > +++ b/mm/vmalloc.c > > > @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, > > > struct page *page; > > > int i; > > > unsigned int large_order = ilog2(nr_remaining); > > > - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; > > > + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; > > > > > > large_order = min(max_attempt_order, large_order); > > > > > Some thoughts. > > > > That trace is not easy to follow. As for this change, probably it > > is worth to go with anyway. Because, if we are under a low memory > > condition, it is quite common for Android devices, we do not want > > to kick frequently kswapd without good reason. We have a fallback > > to single page allocator. Yeah. AFAICT, This trace doesn't seem to point at this patch: __free_frozen_pages+0x750/0xa80 is: pcp = pcp_spin_trylock(zone->per_cpu_pageset, UP_flags); if (pcp) { if (!free_frozen_page_commit(zone, pcp, page, migratetype, order, fpi_flags, &UP_flags)) return; ------------> pcp_spin_unlock(pcp, UP_flags); free_pcppages_bulk+0x4a2/0x500 is: spin_unlock_irqrestore(&zone->lock, flags); Seeing how masking off reclaim reduced the occurence from 50% -> 5%, Maybe its some underlying race condition that is encouraged by the patch? I'm not sure how often people are kicking of kswapd then immediately calling the bulk allocator... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-10 23:30 ` Vishal Moola (Oracle) 2025-12-11 10:09 ` Uladzislau Rezki @ 2025-12-11 20:47 ` Andrew Morton 2025-12-11 21:01 ` Vishal Moola (Oracle) 1 sibling, 1 reply; 8+ messages in thread From: Andrew Morton @ 2025-12-11 20:47 UTC (permalink / raw) To: Vishal Moola (Oracle) Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Uladzislau Rezki, linux-mm On Wed, 10 Dec 2025 15:30:51 -0800 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote: > > The kernel config and materials to reproduce are available at: > > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > > > > Hmmm. This looks like a race condition tied to reclaim. I'm assuming > we fail to allocate a page and kick off kswapd. Then when we fall back > to the bulk allocator which tries to remove a pcp page at the same time as > kswapd tries to reclaim it. Maybe? > > Does something like this fix it? Concerned. If pcp page removal is racing with kswapd reclaim then that's a problem. Altering a caller so it doesn't tickle this bug doesn't fix the bug? > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index ecbac900c35f..0d1480723ddc 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, > struct page *page; > int i; > unsigned int large_order = ilog2(nr_remaining); > - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; > + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; > > large_order = min(max_attempt_order, large_order); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU 2025-12-11 20:47 ` Andrew Morton @ 2025-12-11 21:01 ` Vishal Moola (Oracle) 0 siblings, 0 replies; 8+ messages in thread From: Vishal Moola (Oracle) @ 2025-12-11 21:01 UTC (permalink / raw) To: Andrew Morton Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Uladzislau Rezki, linux-mm On Thu, Dec 11, 2025 at 12:47:10PM -0800, Andrew Morton wrote: > On Wed, 10 Dec 2025 15:30:51 -0800 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote: > > > > The kernel config and materials to reproduce are available at: > > > https://download.01.org/0day-ci/archive/20251210/202512101320.e2f2dd6f-lkp@intel.com > > > > > > > Hmmm. This looks like a race condition tied to reclaim. I'm assuming > > we fail to allocate a page and kick off kswapd. Then when we fall back > > to the bulk allocator which tries to remove a pcp page at the same time as > > kswapd tries to reclaim it. Maybe? > > > > Does something like this fix it? > > Concerned. If pcp page removal is racing with kswapd reclaim then > that's a problem. Altering a caller so it doesn't tickle this bug > doesn't fix the bug? Agreed. I'll look into it. > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index ecbac900c35f..0d1480723ddc 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -3634,7 +3634,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid, > > struct page *page; > > int i; > > unsigned int large_order = ilog2(nr_remaining); > > - gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_DIRECT_RECLAIM; > > + gfp_t large_gfp = vmalloc_gfp_adjust(gfp, large_order) & ~__GFP_RECLAIM; > > > > large_order = min(max_attempt_order, large_order); ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-12-11 21:01 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-12-10 6:10 [linus:master] [mm/vmalloc] a061578043: BUG:spinlock_trylock_failure_on_UP_on_CPU kernel test robot 2025-12-10 23:30 ` Vishal Moola (Oracle) 2025-12-11 10:09 ` Uladzislau Rezki 2025-12-11 13:40 ` Oliver Sang 2025-12-11 13:53 ` Oliver Sang 2025-12-11 20:52 ` Vishal Moola (Oracle) 2025-12-11 20:47 ` Andrew Morton 2025-12-11 21:01 ` Vishal Moola (Oracle)
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).