* [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception
@ 2025-06-18 6:25 kernel test robot
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
` (4 more replies)
0 siblings, 5 replies; 28+ messages in thread
From: kernel test robot @ 2025-06-18 6:25 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Baoquan He,
Adrian Huang, Christop Hellwig, Mateusz Guzik, linux-mm,
oliver.sang
Hello,
for this change, we reported
"[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
in
https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
at that time, we ran some tests with an x86_64 config, which ran well.
now we notice the commit is in mainline.
the config still has expected diff with parent:
--- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
+++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
@@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
CONFIG_TEST_MISC_MINOR=m
# CONFIG_TEST_LKM is not set
CONFIG_TEST_BITOPS=m
-CONFIG_TEST_VMALLOC=m
+CONFIG_TEST_VMALLOC=y
# CONFIG_TEST_BPF is not set
CONFIG_FIND_BIT_BENCHMARK=m
# CONFIG_TEST_FIRMWARE is not set
then we noticed similar random issue with x86_64 randconfig this time.
7a73348e5d4715b5 2d76e79315e403aab595d4c8830
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :199          34%          67:200  dmesg.KASAN:null-ptr-deref_in_range[#-#]
           :199          34%          67:200  dmesg.Kernel_panic-not_syncing:Fatal_exception
           :199          34%          67:200  dmesg.Mem-Info
           :199          34%          67:200  dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
           :199          34%          67:200  dmesg.RIP:down_read_trylock
we don't have enough knowledge to understand the relationship between code
change and the random issues. we just report what we observed in our tests, FYI.
below is full report.
kernel test robot noticed "Kernel_panic-not_syncing:Fatal_exception" on:
commit: 2d76e79315e403aab595d4c8830b7a46c19f0f3b ("lib/test_vmalloc.c: allow built-in execution")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master e04c78d86a9699d136910cfc0bdcf01087e3267e]
[test failed on linux-next/master 050f8ad7b58d9079455af171ac279c4b9b828c11]
in testcase: boot
config: x86_64-randconfig-161-20250614
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
[ 36.902716][ T60] vmalloc_node_range for size 8192 failed: Address range restricted to 0xffffc90000000000 - 0xffffe8ffffffffff
[ 36.903981][ T60] vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
[ 36.905195][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
[ 36.905201][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 36.905203][ T60] Call Trace:
[ 36.905206][ T60] <TASK>
[ 36.905209][ T60] dump_stack_lvl+0x87/0xd6
[ 36.905223][ T60] warn_alloc+0x15e/0x291
[ 36.905230][ T60] ? has_managed_dma+0x37/0x37
[ 36.905237][ T60] ? __get_vm_area_node+0x33a/0x3c0
[ 36.905244][ T60] ? __get_vm_area_node+0x33a/0x3c0
[ 36.905250][ T60] __vmalloc_node_range_noprof+0x170/0x306
[ 36.905255][ T60] ? __vmalloc_area_node+0x460/0x460
[ 36.905260][ T60] ? test_func+0x2ae/0x469
[ 36.905264][ T60] __vmalloc_node_noprof+0xb8/0xd9
[ 36.905267][ T60] ? test_func+0x2ae/0x469
[ 36.905272][ T60] align_shift_alloc_test+0xa8/0x165
[ 36.905277][ T60] test_func+0x2ae/0x469
[ 36.905281][ T60] ? pcpu_alloc_test+0x31b/0x31b
[ 36.905286][ T60] ? __kthread_parkme+0xcb/0x1a3
[ 36.905293][ T60] ? pcpu_alloc_test+0x31b/0x31b
[ 36.905297][ T60] kthread+0x452/0x464
[ 36.905301][ T60] ? kthread_is_per_cpu+0x51/0x51
[ 36.905304][ T60] ? _raw_spin_unlock_irq+0x23/0x35
[ 36.905308][ T60] ? kthread_is_per_cpu+0x51/0x51
[ 36.905311][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
[ 36.905314][ T60] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/kernel/process.c:153)
[ 36.905318][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
[ 36.905321][ T60] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/entry/entry_64.S:255)
[ 36.905330][ T60] </TASK>
[ 36.905332][ T60] Mem-Info:
[ 36.919941][ T60] active_anon:0 inactive_anon:0 isolated_anon:0
[ 36.919941][ T60] active_file:0 inactive_file:0 isolated_file:0
[ 36.919941][ T60] unevictable:41612 dirty:0 writeback:0
[ 36.919941][ T60] slab_reclaimable:7429 slab_unreclaimable:145259
[ 36.919941][ T60] mapped:0 shmem:0 pagetables:145
[ 36.919941][ T60] sec_pagetables:0 bounce:0
[ 36.919941][ T60] kernel_misc_reclaimable:0
[ 36.919941][ T60] free:3233392 free_pcp:1185 free_cma:0
[ 36.923830][ T60] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:1952kB pagetables:580kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[ 36.926265][ T60] DMA free:15360kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 36.928855][ T60] lowmem_reserve[]: 0 2991 13741 13741
[ 36.929411][ T60] DMA32 free:3060560kB boost:0kB min:3224kB low:6244kB high:9264kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:3120kB local_pcp:3120kB free_cma:0kB
[ 36.932080][ T60] lowmem_reserve[]: 0 0 10749 10749
[ 36.932604][ T60] Normal free:9857648kB boost:0kB min:11744kB low:22748kB high:33752kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB writepending:0kB present:13631488kB managed:11007884kB mlocked:0kB bounce:0kB free_pcp:1620kB local_pcp:740kB free_cma:0kB
[ 36.935336][ T60] lowmem_reserve[]: 0 0 0 0
[ 36.935802][ T60] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
[ 36.936931][ T60] DMA32: 0*4kB 0*8kB 1*16kB (M) 2*32kB (M) 2*64kB (M) 1*128kB (M) 2*256kB (M) 2*512kB (M) 1*1024kB (M) 1*2048kB (M) 746*4096kB (M) = 3060560kB
[ 36.938318][ T60] Normal: 6*4kB (ME) 2*8kB (ME) 7*16kB (UME) 5*32kB (M) 3*64kB (ME) 4*128kB (M) 6*256kB (UME) 2*512kB (M) 1*1024kB (M) 3*2048kB (UME) 2404*4096kB (M) = 9857528kB
[ 36.939849][ T60] 41618 total pagecache pages
[ 36.940324][ T60] 4194174 pages RAM
[ 36.940721][ T60] 0 pages HighMem/MovableOnly
[ 36.941188][ T60] 672443 pages reserved
[ 36.941626][ T60] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] SMP KASAN
[ 36.942185][ T60] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
[ 36.942185][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
[ 36.942185][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 36.942185][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
[ 36.942185][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
[ 36.942185][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
[ 36.942185][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
[ 36.942185][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
[ 36.942185][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
[ 36.942185][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
[ 36.942185][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
[ 36.942185][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
[ 36.942185][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 36.942185][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
[ 36.942185][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 36.942185][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 36.942185][ T60] Call Trace:
[ 36.942185][ T60] <TASK>
[ 36.942185][ T60] ? clear_nonspinnable+0x32/0x32
[ 36.942185][ T60] ? vprintk_emit+0x165/0x194
[ 36.942185][ T60] codetag_trylock_module_list+0xd/0x19
[ 36.942185][ T60] alloc_tag_top_users+0x95/0x216
[ 36.942185][ T60] ? _printk+0xad/0xdf
[ 36.942185][ T60] ? reserve_module_tags+0x308/0x308
[ 36.942185][ T60] __show_mem+0x167/0x54b
[ 36.942185][ T60] ? _printk+0xad/0xdf
[ 36.942185][ T60] ? printk_get_console_flush_type+0x272/0x272
[ 36.942185][ T60] ? show_free_areas+0x115d/0x115d
[ 36.942185][ T60] ? tracer_hardirqs_on+0x1b/0x28d
[ 36.942185][ T60] ? dump_stack_lvl+0x91/0xd6
[ 36.942185][ T60] ? warn_alloc+0x251/0x291
[ 36.942185][ T60] warn_alloc+0x251/0x291
[ 36.942185][ T60] ? has_managed_dma+0x37/0x37
[ 36.942185][ T60] ? __get_vm_area_node+0x33a/0x3c0
[ 36.942185][ T60] __vmalloc_node_range_noprof+0x170/0x306
[ 36.942185][ T60] ? __vmalloc_area_node+0x460/0x460
[ 36.942185][ T60] ? test_func+0x2ae/0x469
[ 36.942185][ T60] __vmalloc_node_noprof+0xb8/0xd9
[ 36.942185][ T60] ? test_func+0x2ae/0x469
[ 36.942185][ T60] align_shift_alloc_test+0xa8/0x165
[ 36.942185][ T60] test_func+0x2ae/0x469
[ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
[ 36.942185][ T60] ? __kthread_parkme+0xcb/0x1a3
[ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
[ 36.942185][ T60] kthread+0x452/0x464
[ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
[ 36.942185][ T60] ? _raw_spin_unlock_irq+0x23/0x35
[ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
[ 36.942185][ T60] ret_from_fork+0x20/0x54
[ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
[ 36.942185][ T60] ret_from_fork_asm+0x11/0x20
[ 36.942185][ T60] </TASK>
[ 36.942185][ T60] Modules linked in:
[ 37.000652][ T60] ---[ end trace 0000000000000000 ]---
[ 37.001188][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
[ 37.001731][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
[ 37.003488][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
[ 37.004072][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
[ 37.004848][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
[ 37.005610][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
[ 37.006381][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
[ 37.007178][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
[ 37.007940][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
[ 37.008792][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.009411][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
[ 37.010175][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 37.010950][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 37.011716][ T60] Kernel panic - not syncing: Fatal exception
[ 37.012397][ T60] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 28+ messages in thread
* Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-18 6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
@ 2025-06-19 14:10 ` Harry Yoo
2025-06-19 15:04 ` Harry Yoo
2025-06-19 15:08 ` David Wang
2025-06-20 0:40 ` [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled Harry Yoo
` (3 subsequent siblings)
4 siblings, 2 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-19 14:10 UTC (permalink / raw)
To: kernel test robot
Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton,
Baoquan He, Adrian Huang, Christop Hellwig, Mateusz Guzik,
linux-mm, Suren Baghdasaryan, Kent Overstreet
On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>
> Hello,
>
> for this change, we reported
> "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> in
> https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>
> at that time, we ran some tests with an x86_64 config, which ran well.
>
> now we notice the commit is in mainline.
(Re-sending due to not Ccing people and the list...)
Hi, I'm facing the same error in my test environment.
I think this is related to memory allocation profiling & code tagging
subsystems rather than vmalloc, so let's add related folks to Cc.
After quickly skimming the code, it seems the conditions to trigger this
are that 1) MEM_ALLOC_PROFILING is compiled in but 2) not enabled by
default, and 3) an allocation somehow fails, which ends up calling
alloc_tag_top_users().
I see "Memory allocation profiling is not supported!" in the dmesg,
which means it did not allocate & initialize alloc_tag_cttype properly,
but alloc_tag_top_users() still tries to acquire the semaphore.
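As a cross-check against the oops in the report (this is not from the
original analysis): with generic KASAN on x86_64, the shadow byte for an
address lives at (addr >> 3) + 0xdffffc0000000000, and the register dump
shows RAX = dffffc0000000000, RDX = 0x1b and RBP = 0xd8. The sketch below
reproduces that arithmetic in userspace C. The sketch_* names are made up
for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Generic KASAN on x86_64: one shadow byte covers 8 bytes of memory. */
#define SKETCH_KASAN_SHADOW_SCALE_SHIFT 3
#define SKETCH_KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: shadow address that is checked for addr. */
static uint64_t sketch_kasan_shadow_addr(uint64_t addr)
{
	return (addr >> SKETCH_KASAN_SHADOW_SCALE_SHIFT) +
	       SKETCH_KASAN_SHADOW_OFFSET;
}

/* Inverse: lowest address covered by a given shadow byte. */
static uint64_t sketch_kasan_shadow_to_addr(uint64_t shadow)
{
	return (shadow - SKETCH_KASAN_SHADOW_OFFSET) <<
	       SKETCH_KASAN_SHADOW_SCALE_SHIFT;
}
```

Feeding 0xd8 (RBP in the dump) into sketch_kasan_shadow_addr() gives
0xdffffc000000001b, the non-canonical address in the oops line, and the
reported range [0xd8-0xdf] is the 8 bytes covered by that one shadow byte.
An access that close to zero is what one would expect when a field is
reached through a NULL base pointer.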
I think the kernel should not call alloc_tag_top_users() at all (or it
should return an error) if mem_profiling_support == false?
Does the following work in your test environment?
(I only did very light testing in QEMU, but it seems to fix the issue for me.)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d48b80f3f007..57d4d5673855 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
struct codetag_bytes n;
unsigned int i, nr = 0;
- if (can_sleep)
+ if (!mem_profiling_support)
+ return 0;
+ else if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
return 0;
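To illustrate why the ordering of the checks matters, here is a small
userspace model of the logic in the patch above. It is only a sketch under
stated assumptions: the model_* names are hypothetical stand-ins rather
than the kernel's definitions, the "lock" is a plain flag instead of an
rwsem, and the return value simply echoes the requested count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the code tag type; not the kernel struct. */
struct model_cttype {
	int lock_held;	/* models the module-list lock */
};

/* Stays NULL when profiling is unsupported, as in the reported crash. */
static struct model_cttype *model_alloc_tag_cttype;
static bool model_mem_profiling_support;

static bool model_trylock(struct model_cttype *ct)
{
	/* Dereferences ct, so calling this with ct == NULL would fault. */
	if (ct->lock_held)
		return false;
	ct->lock_held = 1;
	return true;
}

static void model_unlock(struct model_cttype *ct)
{
	ct->lock_held = 0;
}

/* Models the non-sleeping path: bail out before touching the lock. */
static size_t model_top_users(size_t count)
{
	if (!model_mem_profiling_support)
		return 0;	/* nothing was initialized, nothing to lock */
	if (!model_trylock(model_alloc_tag_cttype))
		return 0;
	/* The real function would collect the top users here. */
	model_unlock(model_alloc_tag_cttype);
	return count;
}
```

With model_mem_profiling_support left false, model_top_users() returns 0
without ever dereferencing the NULL model_alloc_tag_cttype. Without the
first check, the same call would go straight into model_trylock() with a
NULL pointer, which mirrors the down_read_trylock() fault in the trace.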
> the config still has expected diff with parent:
>
> --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> CONFIG_TEST_MISC_MINOR=m
> # CONFIG_TEST_LKM is not set
> CONFIG_TEST_BITOPS=m
> -CONFIG_TEST_VMALLOC=m
> +CONFIG_TEST_VMALLOC=y
> # CONFIG_TEST_BPF is not set
> CONFIG_FIND_BIT_BENCHMARK=m
> # CONFIG_TEST_FIRMWARE is not set
>
>
> then we noticed similar random issue with x86_64 randconfig this time.
>
> 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> :199 34% 67:200 dmesg.Mem-Info
> :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> :199 34% 67:200 dmesg.RIP:down_read_trylock
>
> we don't have enough knowledge to understand the relationship between code
> change and the random issues. we just report what we observed in our tests, FYI.
>
> below is full report.
>
>
>
> kernel test robot noticed "Kernel_panic-not_syncing:Fatal_exception" on:
>
> commit: 2d76e79315e403aab595d4c8830b7a46c19f0f3b ("lib/test_vmalloc.c: allow built-in execution")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master e04c78d86a9699d136910cfc0bdcf01087e3267e]
> [test failed on linux-next/master 050f8ad7b58d9079455af171ac279c4b9b828c11]
>
> in testcase: boot
>
> config: x86_64-randconfig-161-20250614
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>
>
> [ 36.902716][ T60] vmalloc_node_range for size 8192 failed: Address range restricted to 0xffffc90000000000 - 0xffffe8ffffffffff
> [ 36.903981][ T60] vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> [ 36.905195][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
> [ 36.905201][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 36.905203][ T60] Call Trace:
> [ 36.905206][ T60] <TASK>
> [ 36.905209][ T60] dump_stack_lvl+0x87/0xd6
> [ 36.905223][ T60] warn_alloc+0x15e/0x291
> [ 36.905230][ T60] ? has_managed_dma+0x37/0x37
> [ 36.905237][ T60] ? __get_vm_area_node+0x33a/0x3c0
> [ 36.905244][ T60] ? __get_vm_area_node+0x33a/0x3c0
> [ 36.905250][ T60] __vmalloc_node_range_noprof+0x170/0x306
> [ 36.905255][ T60] ? __vmalloc_area_node+0x460/0x460
> [ 36.905260][ T60] ? test_func+0x2ae/0x469
> [ 36.905264][ T60] __vmalloc_node_noprof+0xb8/0xd9
> [ 36.905267][ T60] ? test_func+0x2ae/0x469
> [ 36.905272][ T60] align_shift_alloc_test+0xa8/0x165
> [ 36.905277][ T60] test_func+0x2ae/0x469
> [ 36.905281][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.905286][ T60] ? __kthread_parkme+0xcb/0x1a3
> [ 36.905293][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.905297][ T60] kthread+0x452/0x464
> [ 36.905301][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.905304][ T60] ? _raw_spin_unlock_irq+0x23/0x35
> [ 36.905308][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.905311][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
> [ 36.905314][ T60] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/kernel/process.c:153)
> [ 36.905318][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
> [ 36.905321][ T60] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/entry/entry_64.S:255)
> [ 36.905330][ T60] </TASK>
> [ 36.905332][ T60] Mem-Info:
> [ 36.919941][ T60] active_anon:0 inactive_anon:0 isolated_anon:0
> [ 36.919941][ T60] active_file:0 inactive_file:0 isolated_file:0
> [ 36.919941][ T60] unevictable:41612 dirty:0 writeback:0
> [ 36.919941][ T60] slab_reclaimable:7429 slab_unreclaimable:145259
> [ 36.919941][ T60] mapped:0 shmem:0 pagetables:145
> [ 36.919941][ T60] sec_pagetables:0 bounce:0
> [ 36.919941][ T60] kernel_misc_reclaimable:0
> [ 36.919941][ T60] free:3233392 free_pcp:1185 free_cma:0
> [ 36.923830][ T60] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:1952kB pagetables:580kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> [ 36.926265][ T60] DMA free:15360kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> [ 36.928855][ T60] lowmem_reserve[]: 0 2991 13741 13741
> [ 36.929411][ T60] DMA32 free:3060560kB boost:0kB min:3224kB low:6244kB high:9264kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:3120kB local_pcp:3120kB free_cma:0kB
> [ 36.932080][ T60] lowmem_reserve[]: 0 0 10749 10749
> [ 36.932604][ T60] Normal free:9857648kB boost:0kB min:11744kB low:22748kB high:33752kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB writepending:0kB present:13631488kB managed:11007884kB mlocked:0kB bounce:0kB free_pcp:1620kB local_pcp:740kB free_cma:0kB
> [ 36.935336][ T60] lowmem_reserve[]: 0 0 0 0
> [ 36.935802][ T60] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
> [ 36.936931][ T60] DMA32: 0*4kB 0*8kB 1*16kB (M) 2*32kB (M) 2*64kB (M) 1*128kB (M) 2*256kB (M) 2*512kB (M) 1*1024kB (M) 1*2048kB (M) 746*4096kB (M) = 3060560kB
> [ 36.938318][ T60] Normal: 6*4kB (ME) 2*8kB (ME) 7*16kB (UME) 5*32kB (M) 3*64kB (ME) 4*128kB (M) 6*256kB (UME) 2*512kB (M) 1*1024kB (M) 3*2048kB (UME) 2404*4096kB (M) = 9857528kB
> [ 36.939849][ T60] 41618 total pagecache pages
> [ 36.940324][ T60] 4194174 pages RAM
> [ 36.940721][ T60] 0 pages HighMem/MovableOnly
> [ 36.941188][ T60] 672443 pages reserved
> [ 36.941626][ T60] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] SMP KASAN
> [ 36.942185][ T60] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> [ 36.942185][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
> [ 36.942185][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 36.942185][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> [ 36.942185][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> [ 36.942185][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> [ 36.942185][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> [ 36.942185][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> [ 36.942185][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> [ 36.942185][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> [ 36.942185][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> [ 36.942185][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> [ 36.942185][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 36.942185][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> [ 36.942185][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 36.942185][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 36.942185][ T60] Call Trace:
> [ 36.942185][ T60] <TASK>
> [ 36.942185][ T60] ? clear_nonspinnable+0x32/0x32
> [ 36.942185][ T60] ? vprintk_emit+0x165/0x194
> [ 36.942185][ T60] codetag_trylock_module_list+0xd/0x19
> [ 36.942185][ T60] alloc_tag_top_users+0x95/0x216
> [ 36.942185][ T60] ? _printk+0xad/0xdf
> [ 36.942185][ T60] ? reserve_module_tags+0x308/0x308
> [ 36.942185][ T60] __show_mem+0x167/0x54b
> [ 36.942185][ T60] ? _printk+0xad/0xdf
> [ 36.942185][ T60] ? printk_get_console_flush_type+0x272/0x272
> [ 36.942185][ T60] ? show_free_areas+0x115d/0x115d
> [ 36.942185][ T60] ? tracer_hardirqs_on+0x1b/0x28d
> [ 36.942185][ T60] ? dump_stack_lvl+0x91/0xd6
> [ 36.942185][ T60] ? warn_alloc+0x251/0x291
> [ 36.942185][ T60] warn_alloc+0x251/0x291
> [ 36.942185][ T60] ? has_managed_dma+0x37/0x37
> [ 36.942185][ T60] ? __get_vm_area_node+0x33a/0x3c0
> [ 36.942185][ T60] __vmalloc_node_range_noprof+0x170/0x306
> [ 36.942185][ T60] ? __vmalloc_area_node+0x460/0x460
> [ 36.942185][ T60] ? test_func+0x2ae/0x469
> [ 36.942185][ T60] __vmalloc_node_noprof+0xb8/0xd9
> [ 36.942185][ T60] ? test_func+0x2ae/0x469
> [ 36.942185][ T60] align_shift_alloc_test+0xa8/0x165
> [ 36.942185][ T60] test_func+0x2ae/0x469
> [ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.942185][ T60] ? __kthread_parkme+0xcb/0x1a3
> [ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
> [ 36.942185][ T60] kthread+0x452/0x464
> [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.942185][ T60] ? _raw_spin_unlock_irq+0x23/0x35
> [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.942185][ T60] ret_from_fork+0x20/0x54
> [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> [ 36.942185][ T60] ret_from_fork_asm+0x11/0x20
> [ 36.942185][ T60] </TASK>
> [ 36.942185][ T60] Modules linked in:
> [ 37.000652][ T60] ---[ end trace 0000000000000000 ]---
> [ 37.001188][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> [ 37.001731][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> [ 37.003488][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> [ 37.004072][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> [ 37.004848][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> [ 37.005610][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> [ 37.006381][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> [ 37.007178][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> [ 37.007940][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> [ 37.008792][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 37.009411][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> [ 37.010175][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 37.010950][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 37.011716][ T60] Kernel panic - not syncing: Fatal exception
> [ 37.012397][ T60] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
@ 2025-06-19 15:04 ` Harry Yoo
2025-06-20 8:47 ` Uladzislau Rezki
2025-06-19 15:08 ` David Wang
1 sibling, 1 reply; 28+ messages in thread
From: Harry Yoo @ 2025-06-19 15:04 UTC (permalink / raw)
To: kernel test robot
Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton,
Baoquan He, Adrian Huang, Christop Hellwig, Mateusz Guzik,
linux-mm, Suren Baghdasaryan, Kent Overstreet
On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >
> > Hello,
> >
> > for this change, we reported
> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > in
> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >
> > at that time, we ran some tests with an x86_64 config, which ran well.
> >
> > now we notice the commit is in mainline.
>
> (Re-sending due to not Ccing people and the list...)
>
> Hi, I'm facing the same error in my test environment.
I should have clarified that the kernel failed to allocate memory on my
machine because it ran out of memory, not because of the vmalloc test
module.
But based on the fact that the test case (align_shift_alloc_test) is
expected to fail, the issue here is not memory allocation failure
itself, but rather that the kernel crashes when the allocation fails.
So I expect the fix below will work for you as well.
> I think this is related to memory allocation profiling & code tagging
> subsystems rather than vmalloc, so let's add related folks to Cc.
>
> After quickly skimming the code, it seems the conditions to trigger this
> are that 1) MEM_ALLOC_PROFILING is compiled in but 2) not enabled by
> default, and 3) an allocation somehow fails, which ends up calling
> alloc_tag_top_users().
>
> I see "Memory allocation profiling is not supported!" in the dmesg,
> which means it did not allocate & initialize alloc_tag_cttype properly,
> but alloc_tag_top_users() still tries to acquire the semaphore.
>
> I think the kernel should not call alloc_tag_top_users() at all (or it
> should return an error) if mem_profiling_support == false?
>
> Does the following work in your test environment?
>
> (I only did very light testing in QEMU, but it seems to fix the issue for me.)
>
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index d48b80f3f007..57d4d5673855 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
> - if (can_sleep)
> + if (!mem_profiling_support)
> + return 0;
> + else if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
> return 0;
>
> > the config still has expected diff with parent:
> >
> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> > CONFIG_TEST_MISC_MINOR=m
> > # CONFIG_TEST_LKM is not set
> > CONFIG_TEST_BITOPS=m
> > -CONFIG_TEST_VMALLOC=m
> > +CONFIG_TEST_VMALLOC=y
> > # CONFIG_TEST_BPF is not set
> > CONFIG_FIND_BIT_BENCHMARK=m
> > # CONFIG_TEST_FIRMWARE is not set
> >
> >
> > then we noticed similar random issue with x86_64 randconfig this time.
> >
> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> > :199 34% 67:200 dmesg.Mem-Info
> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> > :199 34% 67:200 dmesg.RIP:down_read_trylock
> >
> > we don't have enough knowledge to understand the relationship between code
> > change and the random issues. we just report what we observed in our tests, FYI.
> >
> > below is full report.
> >
> >
> >
> > kernel test robot noticed "Kernel_panic-not_syncing:Fatal_exception" on:
> >
> > commit: 2d76e79315e403aab595d4c8830b7a46c19f0f3b ("lib/test_vmalloc.c: allow built-in execution")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > [test failed on linus/master e04c78d86a9699d136910cfc0bdcf01087e3267e]
> > [test failed on linux-next/master 050f8ad7b58d9079455af171ac279c4b9b828c11]
> >
> > in testcase: boot
> >
> > config: x86_64-randconfig-161-20250614
> > compiler: gcc-12
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> >
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> >
> >
> > [ 36.902716][ T60] vmalloc_node_range for size 8192 failed: Address range restricted to 0xffffc90000000000 - 0xffffe8ffffffffff
> > [ 36.903981][ T60] vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> > [ 36.905195][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
> > [ 36.905201][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 36.905203][ T60] Call Trace:
> > [ 36.905206][ T60] <TASK>
> > [ 36.905209][ T60] dump_stack_lvl+0x87/0xd6
> > [ 36.905223][ T60] warn_alloc+0x15e/0x291
> > [ 36.905230][ T60] ? has_managed_dma+0x37/0x37
> > [ 36.905237][ T60] ? __get_vm_area_node+0x33a/0x3c0
> > [ 36.905244][ T60] ? __get_vm_area_node+0x33a/0x3c0
> > [ 36.905250][ T60] __vmalloc_node_range_noprof+0x170/0x306
> > [ 36.905255][ T60] ? __vmalloc_area_node+0x460/0x460
> > [ 36.905260][ T60] ? test_func+0x2ae/0x469
> > [ 36.905264][ T60] __vmalloc_node_noprof+0xb8/0xd9
> > [ 36.905267][ T60] ? test_func+0x2ae/0x469
> > [ 36.905272][ T60] align_shift_alloc_test+0xa8/0x165
> > [ 36.905277][ T60] test_func+0x2ae/0x469
> > [ 36.905281][ T60] ? pcpu_alloc_test+0x31b/0x31b
> > [ 36.905286][ T60] ? __kthread_parkme+0xcb/0x1a3
> > [ 36.905293][ T60] ? pcpu_alloc_test+0x31b/0x31b
> > [ 36.905297][ T60] kthread+0x452/0x464
> > [ 36.905301][ T60] ? kthread_is_per_cpu+0x51/0x51
> > [ 36.905304][ T60] ? _raw_spin_unlock_irq+0x23/0x35
> > [ 36.905308][ T60] ? kthread_is_per_cpu+0x51/0x51
> > [ 36.905311][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
> > [ 36.905314][ T60] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/kernel/process.c:153)
> > [ 36.905318][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413)
> > [ 36.905321][ T60] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/entry/entry_64.S:255)
> > [ 36.905330][ T60] </TASK>
> > [ 36.905332][ T60] Mem-Info:
> > [ 36.919941][ T60] active_anon:0 inactive_anon:0 isolated_anon:0
> > [ 36.919941][ T60] active_file:0 inactive_file:0 isolated_file:0
> > [ 36.919941][ T60] unevictable:41612 dirty:0 writeback:0
> > [ 36.919941][ T60] slab_reclaimable:7429 slab_unreclaimable:145259
> > [ 36.919941][ T60] mapped:0 shmem:0 pagetables:145
> > [ 36.919941][ T60] sec_pagetables:0 bounce:0
> > [ 36.919941][ T60] kernel_misc_reclaimable:0
> > [ 36.919941][ T60] free:3233392 free_pcp:1185 free_cma:0
> > [ 36.923830][ T60] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:1952kB pagetables:580kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> > [ 36.926265][ T60] DMA free:15360kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [ 36.928855][ T60] lowmem_reserve[]: 0 2991 13741 13741
> > [ 36.929411][ T60] DMA32 free:3060560kB boost:0kB min:3224kB low:6244kB high:9264kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:3120kB local_pcp:3120kB free_cma:0kB
> > [ 36.932080][ T60] lowmem_reserve[]: 0 0 10749 10749
> > [ 36.932604][ T60] Normal free:9857648kB boost:0kB min:11744kB low:22748kB high:33752kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB writepending:0kB present:13631488kB managed:11007884kB mlocked:0kB bounce:0kB free_pcp:1620kB local_pcp:740kB free_cma:0kB
> > [ 36.935336][ T60] lowmem_reserve[]: 0 0 0 0
> > [ 36.935802][ T60] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
> > [ 36.936931][ T60] DMA32: 0*4kB 0*8kB 1*16kB (M) 2*32kB (M) 2*64kB (M) 1*128kB (M) 2*256kB (M) 2*512kB (M) 1*1024kB (M) 1*2048kB (M) 746*4096kB (M) = 3060560kB
> > [ 36.938318][ T60] Normal: 6*4kB (ME) 2*8kB (ME) 7*16kB (UME) 5*32kB (M) 3*64kB (ME) 4*128kB (M) 6*256kB (UME) 2*512kB (M) 1*1024kB (M) 3*2048kB (UME) 2404*4096kB (M) = 9857528kB
> > [ 36.939849][ T60] 41618 total pagecache pages
> > [ 36.940324][ T60] 4194174 pages RAM
> > [ 36.940721][ T60] 0 pages HighMem/MovableOnly
> > [ 36.941188][ T60] 672443 pages reserved
> > [ 36.941626][ T60] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] SMP KASAN
> > [ 36.942185][ T60] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > [ 36.942185][ T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY
> > [ 36.942185][ T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 36.942185][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> > [ 36.942185][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> > [ 36.942185][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> > [ 36.942185][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> > [ 36.942185][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > [ 36.942185][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> > [ 36.942185][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> > [ 36.942185][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> > [ 36.942185][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> > [ 36.942185][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 36.942185][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> > [ 36.942185][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 36.942185][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 36.942185][ T60] Call Trace:
> > [ 36.942185][ T60] <TASK>
> > [ 36.942185][ T60] ? clear_nonspinnable+0x32/0x32
> > [ 36.942185][ T60] ? vprintk_emit+0x165/0x194
> > [ 36.942185][ T60] codetag_trylock_module_list+0xd/0x19
> > [ 36.942185][ T60] alloc_tag_top_users+0x95/0x216
> > [ 36.942185][ T60] ? _printk+0xad/0xdf
> > [ 36.942185][ T60] ? reserve_module_tags+0x308/0x308
> > [ 36.942185][ T60] __show_mem+0x167/0x54b
> > [ 36.942185][ T60] ? _printk+0xad/0xdf
> > [ 36.942185][ T60] ? printk_get_console_flush_type+0x272/0x272
> > [ 36.942185][ T60] ? show_free_areas+0x115d/0x115d
> > [ 36.942185][ T60] ? tracer_hardirqs_on+0x1b/0x28d
> > [ 36.942185][ T60] ? dump_stack_lvl+0x91/0xd6
> > [ 36.942185][ T60] ? warn_alloc+0x251/0x291
> > [ 36.942185][ T60] warn_alloc+0x251/0x291
> > [ 36.942185][ T60] ? has_managed_dma+0x37/0x37
> > [ 36.942185][ T60] ? __get_vm_area_node+0x33a/0x3c0
> > [ 36.942185][ T60] __vmalloc_node_range_noprof+0x170/0x306
> > [ 36.942185][ T60] ? __vmalloc_area_node+0x460/0x460
> > [ 36.942185][ T60] ? test_func+0x2ae/0x469
> > [ 36.942185][ T60] __vmalloc_node_noprof+0xb8/0xd9
> > [ 36.942185][ T60] ? test_func+0x2ae/0x469
> > [ 36.942185][ T60] align_shift_alloc_test+0xa8/0x165
> > [ 36.942185][ T60] test_func+0x2ae/0x469
> > [ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
> > [ 36.942185][ T60] ? __kthread_parkme+0xcb/0x1a3
> > [ 36.942185][ T60] ? pcpu_alloc_test+0x31b/0x31b
> > [ 36.942185][ T60] kthread+0x452/0x464
> > [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> > [ 36.942185][ T60] ? _raw_spin_unlock_irq+0x23/0x35
> > [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> > [ 36.942185][ T60] ret_from_fork+0x20/0x54
> > [ 36.942185][ T60] ? kthread_is_per_cpu+0x51/0x51
> > [ 36.942185][ T60] ret_from_fork_asm+0x11/0x20
> > [ 36.942185][ T60] </TASK>
> > [ 36.942185][ T60] Modules linked in:
> > [ 37.000652][ T60] ---[ end trace 0000000000000000 ]---
> > [ 37.001188][ T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> > [ 37.001731][ T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> > [ 37.003488][ T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> > [ 37.004072][ T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> > [ 37.004848][ T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > [ 37.005610][ T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> > [ 37.006381][ T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> > [ 37.007178][ T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> > [ 37.007940][ T60] FS: 0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> > [ 37.008792][ T60] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 37.009411][ T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> > [ 37.010175][ T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 37.010950][ T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 37.011716][ T60] Kernel panic - not syncing: Fatal exception
> > [ 37.012397][ T60] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >
> >
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com
> >
> >
> >
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> >
> >
>
> --
> Cheers,
> Harry / Hyeonggon
--
Cheers,
Harry / Hyeonggon
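For readers following the oops quoted above, the faulting address can be
reproduced from the register dump. This is only a minimal userspace sketch,
assuming the generic KASAN mapping on x86_64 (shadow = (addr >> 3) +
0xdffffc0000000000). The helper names kasan_shadow_addr() and
kasan_first_covered_addr() are made up for illustration. That RBX = 0x70 is
the address of ->mod_lock computed from a NULL alloc_tag_cttype is an
inference from the dump, not something checked against the struct layout.

```c
#include <stdint.h>

/* Assumed constants for generic KASAN on x86_64. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Shadow byte address that the instrumented code reads before touching addr. */
uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* First real address covered by a given shadow byte (inverse mapping). */
uint64_t kasan_first_covered_addr(uint64_t shadow)
{
	return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```

With RBX = 0x70 and the "lea 0x68(%rbx),%rbp" visible in the Code: bytes,
the checked address is 0x70 + 0x68 = 0xd8 (RBP), 0xd8 >> 3 = 0x1b (RDX), and
the shadow address is 0xdffffc000000001b, which is the non-canonical address
in the Oops line. The inverse gives 0xd8, the start of the range KASAN prints.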
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
2025-06-19 15:04 ` Harry Yoo
@ 2025-06-19 15:08 ` David Wang
2025-06-20 1:14 ` Harry Yoo
1 sibling, 1 reply; 28+ messages in thread
From: David Wang @ 2025-06-19 15:08 UTC (permalink / raw)
To: harry.yoo, surenb, cachen
Cc: ahuang12, akpm, bhe, hch, kent.overstreet, linux-kernel, linux-mm,
lkp, mjguzik, oe-lkp, oliver.sang, urezki
> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >
> > Hello,
> >
> > for this change, we reported
> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > in
> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >
> > at that time, we made some tests with x86_64 config which runs well.
> >
> > now we noticed the commit is in mainline now.
>
> (Re-sending due to not Ccing people and the list...)
>
> Hi, I'm facing the same error on my testing environment.
>
> I think this is related to memory allocation profiling & code tagging
> subsystems rather than vmalloc, so let's add related folks to Cc.
>
> After a quick skimming of the code, it seems the condition
> to trigger this is that on 1) MEM_ALLOC_PROFILING is compiled but
> 2) not enabled by default. and 3) allocation somehow failed, calling
> alloc_tag_top_users().
>
> I see "Memory allocation profiling is not supported!" in the dmesg,
> which means it did not alloc & initialize alloc_tag_cttype properly,
> but alloc_tag_top_users() tries to acquire the semaphore.
>
> I think the kernel should not call alloc_tag_top_users() at all (or it
> should return an error) if mem_profiling_support == false?
>
> Does the following work on your testing environment?
>
> (Only did very light testing on my QEMU, but seems to fix the issue for me.)
>
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index d48b80f3f007..57d4d5673855 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
> - if (can_sleep)
> + if (!mem_profiling_support)
> + return 0;
> + else if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
> return 0;
I think you are correct, this was introduced/exposed by
commit 780138b1 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
(Before the commit, the BUG would only be triggered when alloc_tag_init failed)
David
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled
2025-06-18 6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
@ 2025-06-20 0:40 ` Harry Yoo
2025-06-20 3:09 ` David Wang
2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
` (2 subsequent siblings)
4 siblings, 1 reply; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 0:40 UTC (permalink / raw)
To: akpm, surenb, kent.overstreet
Cc: oliver.sang, 00107082, cachen, linux-mm, oe-lkp, Harry Yoo
alloc_tag_top_users() attempts to acquire alloc_tag_cttype->mod_lock
even when the memory allocation profiling feature is disabled at runtime.
If the feature is compiled in but not enabled at boot, alloc_tag_init()
does not allocate and initialize the alloc_tag_cttype variable.
This leads to a crash on memory allocation failure, because the kernel
attempts to acquire a semaphore that does not exist:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:down_read_trylock+0xaa/0x3b0
Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
codetag_trylock_module_list+0xd/0x20
alloc_tag_top_users+0x369/0x4b0
__show_mem+0x1cd/0x6e0
warn_alloc+0x2b1/0x390
__alloc_frozen_pages_noprof+0x12b9/0x21a0
alloc_pages_mpol+0x135/0x3e0
alloc_slab_page+0x82/0xe0
new_slab+0x212/0x240
___slab_alloc+0x82a/0xe00
</TASK>
As David Wang points out, this issue was introduced by commit
780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
Before that commit, the alloc tagging subsystem unconditionally allocated
the semaphore.
After that commit, alloc_tag_top_users() must check whether the semaphore
was actually initialized. Fix it by adding the appropriate check in
alloc_tag_top_users().
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
I manually confirmed that the crash in the vmalloc test module no longer
occurs with this patch when the memory profiling feature is compiled
but not enabled at boot.
No Cc: stable because the offending commit was introduced in v6.16-rc1.
lib/alloc_tag.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 66a4628185f7..20c627191d3e 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
struct codetag_bytes n;
unsigned int i, nr = 0;
- if (can_sleep)
+ if (!mem_profiling_support)
+ return 0;
+ else if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
return 0;
--
2.43.0
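The guard added by the patch above can be illustrated outside the kernel.
The following is only a userspace sketch under stated assumptions: struct
fake_cttype, fake_trylock(), fake_unlock() and top_users() are hypothetical
stand-ins (an int replaces the rw_semaphore), and only the names
mem_profiling_support and alloc_tag_cttype are taken from the patch. It
shows why the flag has to be tested before the lock is reached through a
pointer that may still be NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct codetag_type. */
struct fake_cttype {
	int mod_lock;		/* stands in for the rw_semaphore, 0 = unlocked */
	size_t nr_tags;
};

bool mem_profiling_support;		/* false: profiling disabled at boot */
struct fake_cttype *alloc_tag_cttype;	/* stays NULL when init is skipped */

static bool fake_trylock(struct fake_cttype *ct)
{
	if (ct->mod_lock)	/* dereferences ct, so ct must not be NULL */
		return false;
	ct->mod_lock = 1;
	return true;
}

static void fake_unlock(struct fake_cttype *ct)
{
	ct->mod_lock = 0;
}

/* Returns how many entries would be reported, at most count. */
size_t top_users(size_t count)
{
	size_t nr;

	if (!mem_profiling_support)	/* the check the patch adds */
		return 0;
	if (!fake_trylock(alloc_tag_cttype))
		return 0;

	nr = alloc_tag_cttype->nr_tags < count ? alloc_tag_cttype->nr_tags : count;
	fake_unlock(alloc_tag_cttype);
	return nr;
}
```

Without the first check, top_users() would call fake_trylock(NULL) whenever
profiling is disabled, which is the userspace analogue of the
down_read_trylock() fault in the trace.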
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-19 15:08 ` David Wang
@ 2025-06-20 1:14 ` Harry Yoo
0 siblings, 0 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 1:14 UTC (permalink / raw)
To: David Wang
Cc: surenb, cachen, ahuang12, akpm, bhe, hch, kent.overstreet,
linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, oliver.sang, urezki
On Thu, Jun 19, 2025 at 11:08:09PM +0800, David Wang wrote:
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > >
> > > Hello,
> > >
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > > in
> > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > >
> > > at that time, we made some tests with x86_64 config which runs well.
> > >
> > > now we noticed the commit is in mainline now.
> >
> > (Re-sending due to not Ccing people and the list...)
> >
> > Hi, I'm facing the same error on my testing environment.
> >
> > I think this is related to memory allocation profiling & code tagging
> > subsystems rather than vmalloc, so let's add related folks to Cc.
> >
> > After a quick skimming of the code, it seems the condition
> > to trigger this is that on 1) MEM_ALLOC_PROFILING is compiled but
> > 2) not enabled by default. and 3) allocation somehow failed, calling
> > alloc_tag_top_users().
> >
> > I see "Memory allocation profiling is not supported!" in the dmesg,
> > which means it did not alloc & initialize alloc_tag_cttype properly,
> > but alloc_tag_top_users() tries to acquire the semaphore.
> >
> > I think the kernel should not call alloc_tag_top_users() at all (or it
> > should return an error) if mem_profiling_support == false?
> >
> > Does the following work on your testing environment?
> >
> > (Only did very light testing on my QEMU, but seems to fix the issue for me.)
> >
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index d48b80f3f007..57d4d5673855 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> >
> > - if (can_sleep)
> > + if (!mem_profiling_support)
> > + return 0;
> > + else if (can_sleep)
> > codetag_lock_module_list(alloc_tag_cttype, true);
> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > return 0;
>
> I think you are correct, this was introduced/exposed by
> commit 780138b1 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Oh, I wasn't aware of that commit.
Thanks for pointing it out!
Indeed, prior to 780138b1, it was unconditionally allocated,
so it shouldn't have been a problem unless the allocation fails.
I've sent a formal patch to help testing.
> (Before the commit, the BUG would only be triggered when alloc_tag_init failed)
That is nearly impossible to trigger, as the allocation is too small
to fail and is done at boot time,
so it shouldn't fail in practice.
Or should we be more paranoid and fix it in v6.12 stable?
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re:[PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled
2025-06-20 0:40 ` [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled Harry Yoo
@ 2025-06-20 3:09 ` David Wang
2025-06-20 10:40 ` [PATCH] " Harry Yoo
0 siblings, 1 reply; 28+ messages in thread
From: David Wang @ 2025-06-20 3:09 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp
At 2025-06-20 08:40:32, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>alloc_tag_top_users() attempts to acquire alloc_tag_cttype->mod_lock
>even when memory allocation profiling feature is disabled at runtime.
>If the feature is compiled in but not enabled at boot, alloc_tag_init()
>does not properly allocate and initialize the alloc_tag_cttype variable.
>
>This leads to a crash on memory allocation failure by attempting to
>acquire a semaphore that does not exist:
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> Tainted: [D]=DIE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:down_read_trylock+0xaa/0x3b0
> Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> codetag_trylock_module_list+0xd/0x20
> alloc_tag_top_users+0x369/0x4b0
> __show_mem+0x1cd/0x6e0
> warn_alloc+0x2b1/0x390
> __alloc_frozen_pages_noprof+0x12b9/0x21a0
> alloc_pages_mpol+0x135/0x3e0
> alloc_slab_page+0x82/0xe0
> new_slab+0x212/0x240
> ___slab_alloc+0x82a/0xe00
> </TASK>
>
>As David Wang points out, this issue was introduced by commit
>780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>Before the commit, alloc tagging subsystem unconditionally allocates
>the semaphore.
>
>After the commit, alloc_tag_top_users() must check whether it was
>actually initialized. Fix it by adding the appropriate check in
>alloc_tag_top_users().
>
>Reported-by: kernel test robot <oliver.sang@intel.com>
>Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
I am not quite sure this can be closed, according to the config file
https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com/config-6.15.0-rc6-00142-g2d76e79315e4
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y <---
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
mem_profiling_support is true on boot, and alloc_tag_cttype is properly initialized.
Maybe there is another issue lurking somewhere...
>Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
This one should not be closed, because "# CONFIG_MEM_ALLOC_PROFILING is not set".
https://download.01.org/0day-ci/archive/20250507/202505071555.e757f1e0-lkp@intel.com/config-6.15.0-rc2-00491-g7fc85b92db96
>Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>---
>
>I manually confirmed that the crash in the vmalloc test module no longer
>occurs with this patch when the memory profiling feature is compiled
>but not enabled at boot.
>
>No Cc: stable because the offending commit was introduced in v6.16-rc1.
>
> lib/alloc_tag.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>index 66a4628185f7..20c627191d3e 100644
>--- a/lib/alloc_tag.c
>+++ b/lib/alloc_tag.c
>@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
>- if (can_sleep)
>+ if (!mem_profiling_support)
>+ return 0;
>+ else if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
> return 0;
>--
>2.43.0
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-19 15:04 ` Harry Yoo
@ 2025-06-20 8:47 ` Uladzislau Rezki
2025-06-22 22:54 ` Suren Baghdasaryan
0 siblings, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2025-06-20 8:47 UTC (permalink / raw)
To: Harry Yoo
Cc: kernel test robot, Uladzislau Rezki, oe-lkp, lkp, linux-kernel,
Andrew Morton, Baoquan He, Adrian Huang, Christop Hellwig,
Mateusz Guzik, linux-mm, Suren Baghdasaryan, Kent Overstreet
On Fri, Jun 20, 2025 at 12:04:50AM +0900, Harry Yoo wrote:
> On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > >
> > > Hello,
> > >
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > > in
> > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > >
> > > at that time, we made some tests with x86_64 config which runs well.
> > >
> > > now we noticed the commit is in mainline now.
> >
> > (Re-sending due to not Ccing people and the list...)
> >
> > Hi, I'm facing the same error on my testing environment.
>
> I should have clarified that the reason the kernel failed to allocate
> memory on my machine was due to running out of memory, not because of the
> vmalloc test module.
>
> But based on the fact that the test case (align_shift_alloc_test) is
> expected to fail, the issue here is not memory allocation failure
> itself, but rather that the kernel crashes when the allocation fails.
>
It looks like someone is running the test cases with CONFIG_TEST_VMALLOC=y,
i.e. built in. Yes, that will trigger a lot of kernel warnings, because some
test cases are expected to fail. The test robot or people may take those
warnings for a problem.
In this case I can exclude those test cases, or even not run the test at all
when built in unless the boot parameters are set properly.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 28+ messages in thread
* CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-18 6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
2025-06-20 0:40 ` [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled Harry Yoo
@ 2025-06-20 10:02 ` David Wang
2025-06-22 22:50 ` Suren Baghdasaryan
2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
2025-06-20 19:53 ` [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
4 siblings, 1 reply; 28+ messages in thread
From: David Wang @ 2025-06-20 10:02 UTC (permalink / raw)
To: oliver.sang, urezki
Cc: ahuang12, akpm, bhe, hch, linux-kernel, linux-mm, lkp, mjguzik,
oe-lkp, harry.yoo, kent.overstreet, surenb
On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>
> Hello,
>
> for this change, we reported
> "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> in
> https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>
> at that time, we made some tests with x86_64 config which runs well.
>
> now we noticed the commit is in mainline now.
> the config still has expected diff with parent:
>
> --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> CONFIG_TEST_MISC_MINOR=m
> # CONFIG_TEST_LKM is not set
> CONFIG_TEST_BITOPS=m
> -CONFIG_TEST_VMALLOC=m
> +CONFIG_TEST_VMALLOC=y
> # CONFIG_TEST_BPF is not set
> CONFIG_FIND_BIT_BENCHMARK=m
> # CONFIG_TEST_FIRMWARE is not set
>
>
> then we noticed similar random issue with x86_64 randconfig this time.
>
> 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> :199 34% 67:200 dmesg.Mem-Info
> :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> :199 34% 67:200 dmesg.RIP:down_read_trylock
>
> we don't have enough knowledge to understand the relationship between code
> change and the random issues. just report what we observed in our tests FYI.
>
I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
memory allocation fails show_mem() would invoke alloc_tag_top_users.
With the following configuration:
CONFIG_TEST_VMALLOC=y
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
a NULL dereference because alloc_tag_cttype was not initialized yet.
I added some debug output to confirm this theory:
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d48b80f3f007..9b8e7501010f 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
struct codetag *ct;
struct codetag_bytes n;
unsigned int i, nr = 0;
+ pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
+ return 0;
if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
@@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
shutdown_mem_profiling(true);
return PTR_ERR(alloc_tag_cttype);
}
+ pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
return 0;
}
When booting the kernel, the log shows:
$ sudo dmesg -T | grep profiling
[Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
[Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
or show_mem() should check whether alloc_tag has finished initializing before
calling alloc_tag_top_users.
David
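The ordering problem described above can be modelled in userspace. This is
only a sketch under stated assumptions: the *_sim functions, run_initcalls()
and the two order tables are invented for illustration and do not reflect
the real initcall levels or link order. It counts how often the consumer
observes alloc_tag_cttype == NULL, which is the state the debug print caught.

```c
#include <stddef.h>

struct fake_cttype { int dummy; };

static struct fake_cttype storage;
static struct fake_cttype *alloc_tag_cttype;	/* NULL until the init below runs */
static int null_seen;

/* Stand-in for alloc_tag_init(): publishes the codetag type. */
static int alloc_tag_init_sim(void)
{
	alloc_tag_cttype = &storage;
	return 0;
}

/*
 * Stand-in for vmalloc_test_init(): an expected allocation failure ends up
 * in show_mem() -> alloc_tag_top_users(), which needs alloc_tag_cttype.
 */
static int vmalloc_test_init_sim(void)
{
	if (!alloc_tag_cttype)
		null_seen++;	/* the real kernel would dereference NULL here */
	return 0;
}

typedef int (*initcall_t)(void);

/* Two possible orders of the same pair of init functions. */
const initcall_t test_first[] = { vmalloc_test_init_sim, alloc_tag_init_sim };
const initcall_t tag_first[] = { alloc_tag_init_sim, vmalloc_test_init_sim };

/* Runs the calls in order and returns how often NULL was observed. */
int run_initcalls(const initcall_t *calls, size_t n)
{
	size_t i;

	alloc_tag_cttype = NULL;
	null_seen = 0;
	for (i = 0; i < n; i++)
		calls[i]();
	return null_seen;
}
```

Either of the two remedies suggested above removes the NULL observation in
this model: run the consumer later (the tag_first order), or have the
consumer check the pointer before using it.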
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled
2025-06-20 3:09 ` David Wang
@ 2025-06-20 10:40 ` Harry Yoo
2025-06-20 11:33 ` Harry Yoo
2025-06-20 12:47 ` Harry Yoo
0 siblings, 2 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 10:40 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp
On Fri, Jun 20, 2025 at 11:09:16AM +0800, David Wang wrote:
>
>
> At 2025-06-20 08:40:32, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >alloc_tag_top_users() attempts to acquire alloc_tag_ctype->mod_lock
> >even when memory allocation profiling feature is disabled at runtime.
> >If the feature is compiled in but not enabled at boot, alloc_tag_init()
> >does not properly allocate and initialize the alloc_tag_cttype variable.
> >
> >This leads to a crash on memory allocation failure by attempting to
> >acquire a semaphore that does not exist:
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > Tainted: [D]=DIE
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > codetag_trylock_module_list+0xd/0x20
> > alloc_tag_top_users+0x369/0x4b0
> > __show_mem+0x1cd/0x6e0
> > warn_alloc+0x2b1/0x390
> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > alloc_pages_mpol+0x135/0x3e0
> > alloc_slab_page+0x82/0xe0
> > new_slab+0x212/0x240
> > ___slab_alloc+0x82a/0xe00
> > </TASK>
> >
> >As David Wang points out, this issue was introduced by commit
> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >Before the commit, alloc tagging subsystem unconditionally allocates
> >the semaphore.
> >
> >After the commit, alloc_tag_top_users() must check whether it was
> >actually initialized. Fix it by adding the appropriate check in
> >alloc_tag_top_users().
> >
> >Reported-by: kernel test robot <oliver.sang@intel.com>
> > Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>
> I am not quite sure this can be closed, according to the config file
> https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com/config-6.15.0-rc6-00142-g2d76e79315e4
>
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y <---
> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>
> mem_profiling_support is true on boot, and alloc_tag_ctype is properly initialized.
>
> Maybe there is other issue lurking somewhere....
Oops, I thought they were all the same issue.
I should have been more thorough and checked the config.
Thank you for pointing it out!
I think you're right. mem_profiling_support == true doesn't necessarily
mean it's allocated and initialized, as you demonstrated it in the other
email.
I think it'd be more robust to set mem_profiling_support to false,
disable mem_alloc_profiling_key at boot and enable it later when
it is properly allocated.
> >Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
>
> This one should not be closed, because "# CONFIG_MEM_ALLOC_PROFILING is not set".
> https://download.01.org/0day-ci/archive/20250507/202505071555.e757f1e0-lkp@intel.com/config-6.15.0-rc2-00491-g7fc85b92db96
I assumed it was mem profiling that caused the crash, since it happened
while printing memory info. Pretty weird coincidence...
I'll try to reproduce it and figure out why it crashed.
> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >---
> >
> >I manually confirmed that the crash in the vmalloc test module no longer
> >occurs with this patch when the memory profiling feature is compiled
> >but not enabled at boot.
> >
> >No Cc: stable because the offending commit was introduced in v6.16-rc1.
> >
> > lib/alloc_tag.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >index 66a4628185f7..20c627191d3e 100644
> >--- a/lib/alloc_tag.c
> >+++ b/lib/alloc_tag.c
> >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> >
> >- if (can_sleep)
> >+ if (!mem_profiling_support)
> >+ return 0;
> >+ else if (can_sleep)
> > codetag_lock_module_list(alloc_tag_cttype, true);
> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > return 0;
> >--
> >2.43.0
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled
2025-06-20 10:40 ` [PATCH] " Harry Yoo
@ 2025-06-20 11:33 ` Harry Yoo
2025-06-20 13:59 ` David Wang
2025-06-20 12:47 ` Harry Yoo
1 sibling, 1 reply; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 11:33 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp
On Fri, Jun 20, 2025 at 07:40:21PM +0900, Harry Yoo wrote:
> On Fri, Jun 20, 2025 at 11:09:16AM +0800, David Wang wrote:
> >
> >
> > At 2025-06-20 08:40:32, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >alloc_tag_top_users() attempts to acquire alloc_tag_ctype->mod_lock
> > >even when memory allocation profiling feature is disabled at runtime.
> > >If the feature is compiled in but not enabled at boot, alloc_tag_init()
> > >does not properly allocate and initialize the alloc_tag_cttype variable.
> > >
> > >This leads to a crash on memory allocation failure by attempting to
> > >acquire a semaphore that does not exist:
> > >
> > > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > > Tainted: [D]=DIE
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > > Call Trace:
> > > <TASK>
> > > codetag_trylock_module_list+0xd/0x20
> > > alloc_tag_top_users+0x369/0x4b0
> > > __show_mem+0x1cd/0x6e0
> > > warn_alloc+0x2b1/0x390
> > > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > > alloc_pages_mpol+0x135/0x3e0
> > > alloc_slab_page+0x82/0xe0
> > > new_slab+0x212/0x240
> > > ___slab_alloc+0x82a/0xe00
> > > </TASK>
> > >
> > >As David Wang points out, this issue was introduced by commit
> > >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > >Before the commit, alloc tagging subsystem unconditionally allocates
> > >the semaphore.
> > >
> > >After the commit, alloc_tag_top_users() must check whether it was
> > >actually initialized. Fix it by adding the appropriate check in
> > >alloc_tag_top_users().
> > >
> > >Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> >
> > I am not quite sure this can be closed, according to the config file
> > https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com/config-6.15.0-rc6-00142-g2d76e79315e4
> >
> > CONFIG_MEM_ALLOC_PROFILING=y
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y <---
> > CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >
> > mem_profiling_support is true on boot, and alloc_tag_ctype is properly initialized.
> >
> > Maybe there is other issue lurking somewhere....
>
> Oops, I thought they are all the same issues.
> I should have been more thorough and checked the config.
> Thank you for pointing it out!
>
> I think you're right. mem_profiling_support == true doesn't necessarily
> mean it's allocated and initialized, as you demonstrated it in the other
> email.
>
> I think it'd be more robust to set mem_profiling_support to false,
> disable mem_alloc_profiling_key at boot and enable it later when
> it is properly allocated.
Actually, we need something a bit more sophisticated than that.
IIUC memory allocation is accounted even before alloc_tag_init(),
and the logic depends on mem_alloc_profiling_key being enabled.
If we change that, some allocations during the early boot stage won't be
accounted.
I think we need to introduce a separate variable to indicate whether
alloc_tag_init() has completed its initialization and check that in
alloc_tag_top_users().
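
A minimal userspace sketch of that idea, assuming a hypothetical flag
named alloc_tag_init_done_sketch. Every name ending in _sketch is
illustrative and is not a kernel symbol. Early accounting is left
untouched; only the report path checks whether initialization finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the codetag type; not the kernel's struct. */
struct cttype_sketch {
	int lock_depth;
};

static struct cttype_sketch cttype_storage_sketch;
static struct cttype_sketch *cttype_sketch;	/* NULL until init runs */
static bool alloc_tag_init_done_sketch;		/* hypothetical "init completed" flag */

/* Models the init path: publish the object first, then set the flag. */
static void alloc_tag_init_sketch(void)
{
	cttype_sketch = &cttype_storage_sketch;
	alloc_tag_init_done_sketch = true;
}

/* Models the report path: returns how many entries it would report. */
static size_t top_users_sketch(size_t count)
{
	if (!alloc_tag_init_done_sketch)
		return 0;	/* nothing to lock yet, so report nothing */

	assert(cttype_sketch != NULL);
	cttype_sketch->lock_depth++;	/* stands in for taking mod_lock */
	/* ... the real code would walk the tags here ... */
	cttype_sketch->lock_depth--;	/* stands in for releasing mod_lock */
	return count;
}
```

Calling top_users_sketch() before alloc_tag_init_sketch() returns 0
instead of touching the unset pointer. Real kernel code would also have
to consider memory ordering between the two stores, which this sketch
ignores.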
> > >Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> >
> > This one should not be closed, because "# CONFIG_MEM_ALLOC_PROFILING is not set".
> > https://download.01.org/0day-ci/archive/20250507/202505071555.e757f1e0-lkp@intel.com/config-6.15.0-rc2-00491-g7fc85b92db96
>
> I assumed it was mem profiling that caused the crash, since it happened
> while printing memory info. Pretty weird coincidence...
>
> I'll try to reproduce it and figure out why it crashed.
>
> > >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > >---
> > >
> > >I manually confirmed that the crash in the vmalloc test module no longer
> > >occurs with this patch when the memory profiling feature is compiled
> > >but not enabled at boot.
> > >
> > >No Cc: stable because the offending commit was introduced in v6.16-rc1.
> > >
> > > lib/alloc_tag.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > >index 66a4628185f7..20c627191d3e 100644
> > >--- a/lib/alloc_tag.c
> > >+++ b/lib/alloc_tag.c
> > >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > struct codetag_bytes n;
> > > unsigned int i, nr = 0;
> > >
> > >- if (can_sleep)
> > >+ if (!mem_profiling_support)
> > >+ return 0;
> > >+ else if (can_sleep)
> > > codetag_lock_module_list(alloc_tag_cttype, true);
> > > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > > return 0;
> > >--
> > >2.43.0
>
> --
> Cheers,
> Harry / Hyeonggon
>
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled
2025-06-20 10:40 ` [PATCH] " Harry Yoo
2025-06-20 11:33 ` Harry Yoo
@ 2025-06-20 12:47 ` Harry Yoo
1 sibling, 0 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 12:47 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp
On Fri, Jun 20, 2025 at 07:40:21PM +0900, Harry Yoo wrote:
> On Fri, Jun 20, 2025 at 11:09:16AM +0800, David Wang wrote:
> >
> >
> > At 2025-06-20 08:40:32, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >alloc_tag_top_users() attempts to acquire alloc_tag_ctype->mod_lock
> > >even when memory allocation profiling feature is disabled at runtime.
> > >If the feature is compiled in but not enabled at boot, alloc_tag_init()
> > >does not properly allocate and initialize the alloc_tag_cttype variable.
> > >
> > >This leads to a crash on memory allocation failure by attempting to
> > >acquire a semaphore that does not exist:
> > >
> > > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > > Tainted: [D]=DIE
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > > Call Trace:
> > > <TASK>
> > > codetag_trylock_module_list+0xd/0x20
> > > alloc_tag_top_users+0x369/0x4b0
> > > __show_mem+0x1cd/0x6e0
> > > warn_alloc+0x2b1/0x390
> > > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > > alloc_pages_mpol+0x135/0x3e0
> > > alloc_slab_page+0x82/0xe0
> > > new_slab+0x212/0x240
> > > ___slab_alloc+0x82a/0xe00
> > > </TASK>
> > >
> > >As David Wang points out, this issue was introduced by commit
> > >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > >Before the commit, alloc tagging subsystem unconditionally allocates
> > >the semaphore.
> > >
> > >After the commit, alloc_tag_top_users() must check whether it was
> > >actually initialized. Fix it by adding the appropriate check in
> > >alloc_tag_top_users().
> > >
> > >Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> >
> > I am not quite sure this can be closed, according to the config file
> > https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com/config-6.15.0-rc6-00142-g2d76e79315e4
> >
> > CONFIG_MEM_ALLOC_PROFILING=y
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y <---
> > CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >
> > mem_profiling_support is true on boot, and alloc_tag_ctype is properly initialized.
> >
> > Maybe there is other issue lurking somewhere....
>
> Oops, I thought they are all the same issues.
> I should have been more thorough and checked the config.
> Thank you for pointing it out!
>
> I think you're right. mem_profiling_support == true doesn't necessarily
> mean it's allocated and initialized, as you demonstrated it in the other
> email.
>
> I think it'd be more robust to set mem_profiling_support to false,
> disable mem_alloc_profiling_key at boot and enable it later when
> it is properly allocated.
>
> > >Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> >
> > This one should not be closed, because "# CONFIG_MEM_ALLOC_PROFILING is not set".
> > https://download.01.org/0day-ci/archive/20250507/202505071555.e757f1e0-lkp@intel.com/config-6.15.0-rc2-00491-g7fc85b92db96
>
> I assumed it was mem profiling that caused the crash, since it happened
> while printing memory info. Pretty weird coincidence...
>
> I'll try to reproduce it and figure out why it crashed.
I think this one is not a bug :)
I reproduced it with the config provided and it just takes ~900 seconds
to boot because the vmalloc test module does performance testing...
and I think the boot test just timed out.
--
Cheers,
Harry / Hyeonggon
> > >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > >---
> > >
> > >I manually confirmed that the crash in the vmalloc test module no longer
> > >occurs with this patch when the memory profiling feature is compiled
> > >but not enabled at boot.
> > >
> > >No Cc: stable because the offending commit was introduced in v6.16-rc1.
> > >
> > > lib/alloc_tag.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > >index 66a4628185f7..20c627191d3e 100644
> > >--- a/lib/alloc_tag.c
> > >+++ b/lib/alloc_tag.c
> > >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > struct codetag_bytes n;
> > > unsigned int i, nr = 0;
> > >
> > >- if (can_sleep)
> > >+ if (!mem_profiling_support)
> > >+ return 0;
> > >+ else if (can_sleep)
> > > codetag_lock_module_list(alloc_tag_cttype, true);
> > > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > > return 0;
> > >--
> > >2.43.0
>
> --
> Cheers,
> Harry / Hyeonggon
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled
2025-06-20 11:33 ` Harry Yoo
@ 2025-06-20 13:59 ` David Wang
0 siblings, 0 replies; 28+ messages in thread
From: David Wang @ 2025-06-20 13:59 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp
At 2025-06-20 19:33:31, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>On Fri, Jun 20, 2025 at 07:40:21PM +0900, Harry Yoo wrote:
>> On Fri, Jun 20, 2025 at 11:09:16AM +0800, David Wang wrote:
>> >
>> >
>> > At 2025-06-20 08:40:32, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> > >alloc_tag_top_users() attempts to acquire alloc_tag_ctype->mod_lock
>> > >even when memory allocation profiling feature is disabled at runtime.
>> > >If the feature is compiled in but not enabled at boot, alloc_tag_init()
>> > >does not properly allocate and initialize the alloc_tag_cttype variable.
>> > >
>> > >This leads to a crash on memory allocation failure by attempting to
>> > >acquire a semaphore that does not exist:
>> > >
>> > > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> > > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> > > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> > > Tainted: [D]=DIE
>> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> > > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> > > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> > > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> > > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> > > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> > > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> > > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> > > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> > > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> > > Call Trace:
>> > > <TASK>
>> > > codetag_trylock_module_list+0xd/0x20
>> > > alloc_tag_top_users+0x369/0x4b0
>> > > __show_mem+0x1cd/0x6e0
>> > > warn_alloc+0x2b1/0x390
>> > > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> > > alloc_pages_mpol+0x135/0x3e0
>> > > alloc_slab_page+0x82/0xe0
>> > > new_slab+0x212/0x240
>> > > ___slab_alloc+0x82a/0xe00
>> > > </TASK>
>> > >
>> > >As David Wang points out, this issue was introduced by commit
>> > >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> > >Before the commit, alloc tagging subsystem unconditionally allocates
>> > >the semaphore.
>> > >
>> > >After the commit, alloc_tag_top_users() must check whether it was
>> > >actually initialized. Fix it by adding the appropriate check in
>> > >alloc_tag_top_users().
>> > >
>> > >Reported-by: kernel test robot <oliver.sang@intel.com>
>> > > Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>> >
>> > I am not quite sure this can be closed, according to the config file
>> > https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com/config-6.15.0-rc6-00142-g2d76e79315e4
>> >
>> > CONFIG_MEM_ALLOC_PROFILING=y
>> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y <---
>> > CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>> >
>> > mem_profiling_support is true on boot, and alloc_tag_ctype is properly initialized.
>> >
>> > Maybe there is other issue lurking somewhere....
>>
>> Oops, I thought they are all the same issues.
>> I should have been more thorough and checked the config.
>> Thank you for pointing it out!
>>
>> I think you're right. mem_profiling_support == true doesn't necessarily
>> mean it's allocated and initialized, as you demonstrated it in the other
>> email.
>>
>> I think it'd be more robust to set mem_profiling_support to false,
>> disable mem_alloc_profiling_key at boot and enable it later when
>> it is properly allocated.
>
>Actually, we need something a bit more sophiscated than that.
>IIUC memory allocation is accounted even before alloc_tag_init(),
>and the logic depends on mem_alloc_profiling_key being enabled.
>If we change that, some allocations during early boot stage won't be
>accounted.
>
>I think we need to introduce a separate variable to indicate whether
>alloc_tag_init() has completed its initialization and check that in
>alloc_tag_top_users().
An easy way out would be to demote vmalloc_test_init to late_initcall; I feel that is
reasonable for a test module.
(I am preparing a patch for that and will send it out after some tests.)
Together with this patch, we would be in good shape in most cases.
As for an OOM (show_mem()) during boot before alloc_tag_init, I am
not sure whether that should raise concern; an OOM during boot already puts the system
in bad shape.
>
>> > >Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
>> >
>> > This one should not be closed, because "# CONFIG_MEM_ALLOC_PROFILING is not set".
>> > https://download.01.org/0day-ci/archive/20250507/202505071555.e757f1e0-lkp@intel.com/config-6.15.0-rc2-00491-g7fc85b92db96
>>
>> I assumed it was mem profiling that caused the crash, since it happened
>> while printing memory info. Pretty weird coincidence...
>>
>> I'll try to reproduce it and figure out why it crashed.
>>
>> > >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> > >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> > >---
>> > >
>> > >I manually confirmed that the crash in the vmalloc test module no longer
>> > >occurs with this patch when the memory profiling feature is compiled
>> > >but not enabled at boot.
>> > >
>> > >No Cc: stable because the offending commit was introduced in v6.16-rc1.
>> > >
>> > > lib/alloc_tag.c | 4 +++-
>> > > 1 file changed, 3 insertions(+), 1 deletion(-)
>> > >
>> > >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> > >index 66a4628185f7..20c627191d3e 100644
>> > >--- a/lib/alloc_tag.c
>> > >+++ b/lib/alloc_tag.c
>> > >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> > > struct codetag_bytes n;
>> > > unsigned int i, nr = 0;
>> > >
>> > >- if (can_sleep)
>> > >+ if (!mem_profiling_support)
>> > >+ return 0;
>> > >+ else if (can_sleep)
>> > > codetag_lock_module_list(alloc_tag_cttype, true);
>> > > else if (!codetag_trylock_module_list(alloc_tag_cttype))
>> > > return 0;
>> > >--
>> > >2.43.0
>>
>> --
>> Cheers,
>> Harry / Hyeonggon
>>
>
>--
>Cheers,
>Harry / Hyeonggon
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall
2025-06-18 6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
` (2 preceding siblings ...)
2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
@ 2025-06-20 14:24 ` David Wang
2025-06-20 19:59 ` Harry Yoo
2025-06-20 19:53 ` [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
4 siblings, 1 reply; 28+ messages in thread
From: David Wang @ 2025-06-20 14:24 UTC (permalink / raw)
To: akpm, urezki
Cc: linux-mm, linux-kernel, harry.yoo, kent.overstreet, surenb,
David Wang, kernel test robot
Commit 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
enables the test_vmalloc module to be built into the kernel directly, but
vmalloc_test_init depends on the alloc_tag module via alloc_tag_top_users().
Consider a kernel built with the following config:
CONFIG_TEST_VMALLOC=y
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
If vmalloc_test_init() runs before alloc_tag_init(), the memory
failure tests invoke alloc_tag_top_users(), which is not yet
ready to use, and cause a kernel BUG:
[ 135.116045] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 135.116063] #PF: supervisor read access in kernel mode
[ 135.116074] #PF: error_code(0x0000) - not-present page
[ 135.116085] PGD 0 P4D 0
[ 135.116094] Oops: Oops: 0000 [#1] SMP NOPTI
[ 135.116123] Tainted: [E]=UNSIGNED_MODULE
[ 135.116132] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 135.116148] RIP: 0010:down_read_trylock+0x1d/0x80
[ 135.116188] RSP: 0000:ffffb5e481a9b8f8 EFLAGS: 00010246
[ 135.116200] RAX: ffff93dc8a5ac700 RBX: 0000000000000030 RCX: 8000000000000007
[ 135.116214] RDX: 0000000000000001 RSI: 000000000000000a RDI: ffffffff93d2e733
[ 135.116228] RBP: ffffb5e481a9b9a0 R08: 0000000000000000 R09: 0000000000000003
[ 135.116241] R10: ffffb5e481a9b860 R11: ffffffff94ec6328 R12: ffffb5e481a9b9b0
[ 135.116255] R13: 0000000000000003 R14: 0000000000000001 R15: ffffffff94e0c580
[ 135.116271] FS: 00007fd41947e540(0000) GS:ffff93dd6654a000(0000) knlGS:0000000000000000
[ 135.116286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 135.116298] CR2: 0000000000000030 CR3: 00000001099f8000 CR4: 0000000000350ef0
[ 135.116314] Call Trace:
[ 135.116321] <TASK>
[ 135.116328] codetag_trylock_module_list+0x9/0x20
[ 135.116342] alloc_tag_top_users+0x153/0x1b0
[ 135.116354] ? srso_return_thunk+0x5/0x5f
[ 135.116365] ? _printk+0x57/0x80
[ 135.116378] __show_mem+0xeb/0x210
[ 135.116394] ? dump_header+0x2ce/0x3e0
[ 135.116405] dump_header+0x2ce/0x3e0
Demoting vmalloc_test_init to late_initcall ensures that the alloc_tag
module is initialized before the test_vmalloc module.
Link: https://lore.kernel.org/lkml/20250620100258.595495-1-00107082@163.com/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Fixes: 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
Signed-off-by: David Wang <00107082@163.com>
---
lib/test_vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 1b0b59549aaf..5af009df56ad 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -598,7 +598,7 @@ static int __init vmalloc_test_init(void)
return IS_BUILTIN(CONFIG_TEST_VMALLOC) ? 0:-EAGAIN;
}
-module_init(vmalloc_test_init)
+late_initcall(vmalloc_test_init)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Uladzislau Rezki");
--
2.39.2
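
The ordering argument in the patch above can be modelled in userspace.
This is only a sketch: it assumes alloc_tag_init() is registered at the
default module_init level (device_initcall for built-in code), which the
thread implies but does not quote. All names ending in _sketch are
hypothetical. Calls run by ascending level, and link order breaks ties
within a level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Level numbers mirror the kernel's device (6) and late (7) initcall levels. */
enum { LEVEL_DEVICE_SKETCH = 6, LEVEL_LATE_SKETCH = 7 };

struct initcall_sketch {
	int level;		/* lower levels run first */
	int link_order;		/* tie-breaker within a level */
	int (*fn)(void);
};

static int cttype_ready_sketch;
static int test_saw_ready_sketch = -1;	/* -1: the test has not run yet */

static int alloc_tag_init_sketch(void)
{
	cttype_ready_sketch = 1;
	return 0;
}

/* Records whether the dependency was ready when the test started. */
static int vmalloc_test_init_sketch(void)
{
	test_saw_ready_sketch = cttype_ready_sketch;
	return 0;
}

static int cmp_initcall_sketch(const void *a, const void *b)
{
	const struct initcall_sketch *x = a, *y = b;

	if (x->level != y->level)
		return x->level - y->level;
	return x->link_order - y->link_order;
}

/* Sorts by (level, link_order) and runs every call in that order. */
static void run_initcalls_sketch(struct initcall_sketch *calls, size_t n)
{
	size_t i;

	qsort(calls, n, sizeof(*calls), cmp_initcall_sketch);
	for (i = 0; i < n; i++) {
		assert(calls[i].fn != NULL);
		calls[i].fn();
	}
}
```

With the test registered at the late level, it observes an initialized
dependency even if it is linked first. With both calls at the same
level, the outcome depends on link order alone, which is the race
described in this thread.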
^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-18 6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
` (3 preceding siblings ...)
2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
@ 2025-06-20 19:53 ` Harry Yoo
2025-06-21 3:43 ` David Wang
4 siblings, 1 reply; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 19:53 UTC (permalink / raw)
To: akpm, surenb, kent.overstreet
Cc: oliver.sang, 00107082, cachen, linux-mm, oe-lkp, Harry Yoo,
stable
alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
even when the alloc_tag_cttype is not allocated because:
1) alloc tagging is disabled because mem profiling is disabled
(!alloc_tag_cttype)
2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
3) alloc tagging is enabled, but failed initialization
(!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
In all cases, alloc_tag_cttype is not allocated, and therefore
alloc_tag_top_users() should not attempt to acquire the semaphore.
This leads to a crash on memory allocation failure by attempting to
acquire a non-existent semaphore:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:down_read_trylock+0xaa/0x3b0
Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
codetag_trylock_module_list+0xd/0x20
alloc_tag_top_users+0x369/0x4b0
__show_mem+0x1cd/0x6e0
warn_alloc+0x2b1/0x390
__alloc_frozen_pages_noprof+0x12b9/0x21a0
alloc_pages_mpol+0x135/0x3e0
alloc_slab_page+0x82/0xe0
new_slab+0x212/0x240
___slab_alloc+0x82a/0xe00
</TASK>
As David Wang points out, this issue became easier to trigger after commit
780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
Before the commit, the issue occurred only when it failed to allocate
and initialize alloc_tag_cttype or if a memory allocation fails before
alloc_tag_init() is called. After the commit, it can be easily triggered
when memory profiling is compiled but disabled at boot.
To properly determine whether alloc_tag_init() has been called and
its data structures initialized, verify that alloc_tag_cttype is a valid
pointer before acquiring the semaphore. If the variable is NULL or an error
value, it has not been properly initialized. In such a case, just skip
and do not attempt to acquire the semaphore.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
v1 -> v2:
- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
I didn't expect alloc_tag_cttype to be NULL when
mem_profiling_support is true, but as David points out (Thanks David!)
if a memory allocation fails before alloc_tag_init(), it can be NULL.
So instead of indirectly checking mem_profiling_support, just directly
check if alloc_tag_cttype is allocated.
- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
tag was removed because it was not a crash and not relevant to this
patch.
- Added Cc: stable because, if an allocation fails before
alloc_tag_init(), it can be triggered even prior to 780138b12381.
I verified that the bug can be triggered in v6.12 and fixed by this
patch.
It should be quite difficult to trigger in practice, though.
Maybe I'm a bit paranoid?
lib/alloc_tag.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 66a4628185f7..d8ec4c03b7d2 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
struct codetag_bytes n;
unsigned int i, nr = 0;
- if (can_sleep)
+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
+ return 0;
+ else if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
return 0;
--
2.43.0
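
[Editor's sketch, not part of the patch] The guard in the diff above can be modeled in userspace to see which pointer states it rejects. This is a minimal sketch under stated assumptions: err_ptr(), is_err_or_null(), struct cttype_model and top_users_model() are hypothetical stand-ins written for illustration, mirroring the kernel's ERR_PTR()/IS_ERR_OR_NULL() convention (error codes live in the top 4095 addresses). It is not the code in lib/alloc_tag.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-in for the kernel's ERR_PTR(). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Stand-in for IS_ERR_OR_NULL(): NULL, or one of the top MAX_ERRNO addresses. */
static inline bool is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical type; it only records whether the "lock" was touched. */
struct cttype_model {
	int lock_taken;
};

/* Model of the guarded entry of alloc_tag_top_users(). */
size_t top_users_model(struct cttype_model *cttype, size_t count)
{
	if (is_err_or_null(cttype))
		return 0;		/* nothing to lock, nothing to report */

	cttype->lock_taken = 1;		/* safe: cttype is a real object */
	return count;
}

/* Usage: the pointer states listed in the commit message. */
void example_usage(void)
{
	struct cttype_model ok = { 0 };

	assert(top_users_model(NULL, 3) == 0);			/* disabled or not yet initialized */
	assert(top_users_model(err_ptr(-ENOMEM), 3) == 0);	/* initialization failed */
	assert(top_users_model(&ok, 3) == 3 && ok.lock_taken == 1);
}
```

The point of checking the pointer itself, rather than a separate "enabled" flag, is that one test covers all three cases without the caller needing to know why the type is missing.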
* Re: [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall
2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
@ 2025-06-20 19:59 ` Harry Yoo
0 siblings, 0 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-20 19:59 UTC (permalink / raw)
To: David Wang
Cc: akpm, urezki, linux-mm, linux-kernel, kent.overstreet, surenb,
kernel test robot
On Fri, Jun 20, 2025 at 10:24:48PM +0800, David Wang wrote:
> Commit 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
> enable test_vmalloc module to be built into kernel directly, but
> vmalloc_test_init depends on alloc_tag module via alloc_tag_top_users().
>
> When a kernel build with following config:
>
> CONFIG_TEST_VMALLOC=y
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>
> If vmalloc_test_init() run before alloc_tag_init(), memory
> failure tests would invoke alloc_tag_top_users() which is not
> ready to use and cause kernel BUG:
>
> [ 135.116045] BUG: kernel NULL pointer dereference, address: 0000000000000030
> [ 135.116063] #PF: supervisor read access in kernel mode
> [ 135.116074] #PF: error_code(0x0000) - not-present page
> [ 135.116085] PGD 0 P4D 0
> [ 135.116094] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 135.116123] Tainted: [E]=UNSIGNED_MODULE
> [ 135.116132] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 135.116148] RIP: 0010:down_read_trylock+0x1d/0x80
> [ 135.116188] RSP: 0000:ffffb5e481a9b8f8 EFLAGS: 00010246
> [ 135.116200] RAX: ffff93dc8a5ac700 RBX: 0000000000000030 RCX: 8000000000000007
> [ 135.116214] RDX: 0000000000000001 RSI: 000000000000000a RDI: ffffffff93d2e733
> [ 135.116228] RBP: ffffb5e481a9b9a0 R08: 0000000000000000 R09: 0000000000000003
> [ 135.116241] R10: ffffb5e481a9b860 R11: ffffffff94ec6328 R12: ffffb5e481a9b9b0
> [ 135.116255] R13: 0000000000000003 R14: 0000000000000001 R15: ffffffff94e0c580
> [ 135.116271] FS: 00007fd41947e540(0000) GS:ffff93dd6654a000(0000) knlGS:0000000000000000
> [ 135.116286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 135.116298] CR2: 0000000000000030 CR3: 00000001099f8000 CR4: 0000000000350ef0
> [ 135.116314] Call Trace:
> [ 135.116321] <TASK>
> [ 135.116328] codetag_trylock_module_list+0x9/0x20
> [ 135.116342] alloc_tag_top_users+0x153/0x1b0
> [ 135.116354] ? srso_return_thunk+0x5/0x5f
> [ 135.116365] ? _printk+0x57/0x80
> [ 135.116378] __show_mem+0xeb/0x210
> [ 135.116394] ? dump_header+0x2ce/0x3e0
> [ 135.116405] dump_header+0x2ce/0x3e0
>
> Demote vmalloc_test_init to late_initcall can make sure alloc_tag
> module got initialized before test_vmalloc module.
I'm not sure this is the right place to fix it.
The bug can be triggered by any early memory allocation failure,
before alloc_tag_init() is called (yeah, that's not that likely).
There is nothing specific to vmalloc that triggers the bug.
--
Cheers,
Harry / Hyeonggon
> Link: https://lore.kernel.org/lkml/20250620100258.595495-1-00107082@163.com/
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> Fixes: 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
> Signed-off-by: David Wang <00107082@163.com>
> ---
> lib/test_vmalloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
> index 1b0b59549aaf..5af009df56ad 100644
> --- a/lib/test_vmalloc.c
> +++ b/lib/test_vmalloc.c
> @@ -598,7 +598,7 @@ static int __init vmalloc_test_init(void)
> return IS_BUILTIN(CONFIG_TEST_VMALLOC) ? 0:-EAGAIN;
> }
>
> -module_init(vmalloc_test_init)
> +late_initcall(vmalloc_test_init)
>
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Uladzislau Rezki");
> --
> 2.39.2
>
>
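
[Editor's sketch, not part of the thread] The ordering argument behind the late_initcall() change can be illustrated with a small userspace model. For built-in code, module_init() registers at the device initcall level (6) and late_initcall() at level 7; within one level, calls run in link order. The sketch assumes alloc_tag_init() is registered at the default module_init() level, and the link_order values are invented for illustration; struct initcall_model and runs_before() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Built-in initcall levels relevant here. */
enum initcall_level {
	LEVEL_DEVICE = 6,	/* device_initcall(), also built-in module_init() */
	LEVEL_LATE   = 7,	/* late_initcall() */
};

struct initcall_model {
	const char *name;
	enum initcall_level level;
	int link_order;		/* hypothetical position within one level */
};

/* True if a runs before b: lower level first, then link order. */
bool runs_before(const struct initcall_model *a, const struct initcall_model *b)
{
	if (a->level != b->level)
		return a->level < b->level;
	return a->link_order < b->link_order;
}

void example_order(void)
{
	struct initcall_model tag    = { "alloc_tag_init",    LEVEL_DEVICE, 2 };
	struct initcall_model vm_old = { "vmalloc_test_init", LEVEL_DEVICE, 1 };
	struct initcall_model vm_new = { "vmalloc_test_init", LEVEL_LATE,   1 };

	assert(runs_before(&vm_old, &tag));	/* same level: link order decides */
	assert(runs_before(&tag, &vm_new));	/* late level always runs after */
}
```

The model also shows the limit Harry raises: moving one caller to a later level orders that caller only, and does nothing for any other allocation failure that happens before alloc_tag_init() runs.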
* Re:[PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-20 19:53 ` [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
@ 2025-06-21 3:43 ` David Wang
2025-06-22 22:24 ` [PATCH " Suren Baghdasaryan
0 siblings, 1 reply; 28+ messages in thread
From: David Wang @ 2025-06-21 3:43 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-21 03:53:05, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>even when the alloc_tag_cttype is not allocated because:
>
> 1) alloc tagging is disabled because mem profiling is disabled
> (!alloc_tag_cttype)
> 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> 3) alloc tagging is enabled, but failed initialization
> (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>
>In all cases, alloc_tag_cttype is not allocated, and therefore
>alloc_tag_top_users() should not attempt to acquire the semaphore.
>
>This leads to a crash on memory allocation failure by attempting to
>acquire a non-existent semaphore:
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> Tainted: [D]=DIE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:down_read_trylock+0xaa/0x3b0
> Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> codetag_trylock_module_list+0xd/0x20
> alloc_tag_top_users+0x369/0x4b0
> __show_mem+0x1cd/0x6e0
> warn_alloc+0x2b1/0x390
> __alloc_frozen_pages_noprof+0x12b9/0x21a0
> alloc_pages_mpol+0x135/0x3e0
> alloc_slab_page+0x82/0xe0
> new_slab+0x212/0x240
> ___slab_alloc+0x82a/0xe00
> </TASK>
>
>As David Wang points out, this issue became easier to trigger after commit
>780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>
>Before the commit, the issue occurred only when it failed to allocate
>and initialize alloc_tag_cttype or if a memory allocation fails before
>alloc_tag_init() is called. After the commit, it can be easily triggered
>when memory profiling is compiled but disabled at boot.
>
>To properly determine whether alloc_tag_init() has been called and
>its data structures initialized, verify that alloc_tag_cttype is a valid
>pointer before acquiring the semaphore. If the variable is NULL or an error
>value, it has not been properly initialized. In such a case, just skip
>and do not attempt acquire the semaphore.
>
>Reported-by: kernel test robot <oliver.sang@intel.com>
>Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>Cc: stable@vger.kernel.org
>Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Just noticed that another thread can be closed as well:
https://lore.kernel.org/all/202506131711.5b41931c-lkp@intel.com/
This coincides with scenario #1, where an OOM happened with
CONFIG_MEM_ALLOC_PROFILING=y
# CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
# CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set
>---
>
>v1 -> v2:
>
>- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
>
> v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
> I didn't expect alloc_tag_cttype to be NULL when
> mem_profiling_support is true, but as David points out (Thanks David!)
> if a memory allocation fails before alloc_tag_init(), it can be NULL.
>
> So instead of indirectly checking mem_profiling_support, just directly
> check if alloc_tag_cttype is allocated.
>
>- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> tag was removed because it was not a crash and not relevant to this
> patch.
>
>- Added Cc: stable because, if an allocation fails before
> alloc_tag_init(), it can be triggered even prior-780138b12381.
> I verified that the bug can be triggered in v6.12 and fixed by this
> patch.
>
> It should be quite difficult to trigger in practice, though.
> Maybe I'm a bit paranoid?
>
> lib/alloc_tag.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>index 66a4628185f7..d8ec4c03b7d2 100644
>--- a/lib/alloc_tag.c
>+++ b/lib/alloc_tag.c
>@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
>- if (can_sleep)
>+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>+ return 0;
>+ else if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
> return 0;
>--
>2.43.0
* Re: [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-21 3:43 ` David Wang
@ 2025-06-22 22:24 ` Suren Baghdasaryan
2025-06-23 2:01 ` Harry Yoo
0 siblings, 1 reply; 28+ messages in thread
From: Suren Baghdasaryan @ 2025-06-22 22:24 UTC (permalink / raw)
To: David Wang
Cc: Harry Yoo, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Fri, Jun 20, 2025 at 8:43 PM David Wang <00107082@163.com> wrote:
>
>
> At 2025-06-21 03:53:05, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >even when the alloc_tag_cttype is not allocated because:
> >
> > 1) alloc tagging is disabled because mem profiling is disabled
> > (!alloc_tag_cttype)
> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > 3) alloc tagging is enabled, but failed initialization
> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >
> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >
> >This leads to a crash on memory allocation failure by attempting to
> >acquire a non-existent semaphore:
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > Tainted: [D]=DIE
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > codetag_trylock_module_list+0xd/0x20
> > alloc_tag_top_users+0x369/0x4b0
> > __show_mem+0x1cd/0x6e0
> > warn_alloc+0x2b1/0x390
> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > alloc_pages_mpol+0x135/0x3e0
> > alloc_slab_page+0x82/0xe0
> > new_slab+0x212/0x240
> > ___slab_alloc+0x82a/0xe00
> > </TASK>
> >
> >As David Wang points out, this issue became easier to trigger after commit
> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >
> >Before the commit, the issue occurred only when it failed to allocate
> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >when memory profiling is compiled but disabled at boot.
Thanks for the fix and sorry about the delay with reviewing it.
> >
> >To properly determine whether alloc_tag_init() has been called and
> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >value, it has not been properly initialized. In such a case, just skip
> >and do not attempt acquire the semaphore.
nit: s/attempt acquire/attempt to acquire
> >
> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >Cc: stable@vger.kernel.org
> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>
> Just notice another thread can be closed as well:
> https://lore.kernel.org/all/202506131711.5b41931c-lkp@intel.com/
> This coincide with scenario #1, where OOM happened with
> CONFIG_MEM_ALLOC_PROFILING=y
> # CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
> # CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set
>
> >---
> >
> >v1 -> v2:
> >
> >- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
> >
> > v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
> > I didn't expect alloc_tag_cttype to be NULL when
> > mem_profiling_support is true, but as David points out (Thanks David!)
> > if a memory allocation fails before alloc_tag_init(), it can be NULL.
> >
> > So instead of indirectly checking mem_profiling_support, just directly
> > check if alloc_tag_cttype is allocated.
> >
> >- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> > tag was removed because it was not a crash and not relevant to this
> > patch.
> >
> >- Added Cc: stable because, if an allocation fails before
> > alloc_tag_init(), it can be triggered even prior-780138b12381.
> > I verified that the bug can be triggered in v6.12 and fixed by this
> > patch.
> >
> > It should be quite difficult to trigger in practice, though.
> > Maybe I'm a bit paranoid?
> >
> > lib/alloc_tag.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >index 66a4628185f7..d8ec4c03b7d2 100644
> >--- a/lib/alloc_tag.c
> >+++ b/lib/alloc_tag.c
> >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> >
> >- if (can_sleep)
> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >+ return 0;
So, AFAICT alloc_tag_cttype will be NULL when memory profiling is
disabled and it will be ENOMEM if codetag_register_type() fails. I
think it would be good to add a pr_warn() in alloc_tag_init() when
codetag_register_type() fails so that the user can determine the
reason why the show_mem() report is missing allocation tag information.
> >+ else if (can_sleep)
nit: the above extra "else" is not really needed. The following should
work just fine, is more readable and produces less churn:
+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
+ return 0;
+
if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
return 0;
> > codetag_lock_module_list(alloc_tag_cttype, true);
> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > return 0;
> >--
> >2.43.0
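
[Editor's sketch, not part of the thread] The review suggestion to log a warning when type registration fails can be modeled in userspace as follows. Everything here is an assumption made for illustration: register_type_model() stands in for codetag_register_type(), init_model() for alloc_tag_init(), fprintf(stderr, ...) for pr_warn(), and the message text is invented. The helpers mirror the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-4095;
}

/* Models the global type pointer; stays NULL until init runs. */
void *cttype_state;

/* Hypothetical registration step that can be forced to fail. */
void *register_type_model(bool fail)
{
	static int registered_type;

	return fail ? err_ptr(-ENOMEM) : (void *)&registered_type;
}

/* Model of an init path that says why tag data will be missing later. */
int init_model(bool fail)
{
	cttype_state = register_type_model(fail);
	if (is_err(cttype_state)) {
		fprintf(stderr, "alloc_tag: type registration failed (%ld)\n",
			ptr_err(cttype_state));
		return (int)ptr_err(cttype_state);
	}
	return 0;
}
```

With a message like this in the log, a report that silently omits allocation tag information (because the guard returned 0) can be traced back to the failed registration.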
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
@ 2025-06-22 22:50 ` Suren Baghdasaryan
2025-06-23 2:04 ` Harry Yoo
2025-06-23 2:45 ` David Wang
0 siblings, 2 replies; 28+ messages in thread
From: Suren Baghdasaryan @ 2025-06-22 22:50 UTC (permalink / raw)
To: David Wang
Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet
On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>
> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >
> > Hello,
> >
> > for this change, we reported
> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > in
> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >
> > at that time, we made some tests with x86_64 config which runs well.
> >
> > now we noticed the commit is in mainline now.
>
> > the config still has expected diff with parent:
> >
> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> > CONFIG_TEST_MISC_MINOR=m
> > # CONFIG_TEST_LKM is not set
> > CONFIG_TEST_BITOPS=m
> > -CONFIG_TEST_VMALLOC=m
> > +CONFIG_TEST_VMALLOC=y
> > # CONFIG_TEST_BPF is not set
> > CONFIG_FIND_BIT_BENCHMARK=m
> > # CONFIG_TEST_FIRMWARE is not set
> >
> >
> > then we noticed similar random issue with x86_64 randconfig this time.
> >
> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> > :199 34% 67:200 dmesg.Mem-Info
> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> > :199 34% 67:200 dmesg.RIP:down_read_trylock
> >
> > we don't have enough knowledge to understand the relationship between code
> > change and the random issues. just report what we obsverved in our tests FYI.
> >
>
> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>
> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>
> With following configuration:
>
> CONFIG_TEST_VMALLOC=y
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>
> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> a NULL deference because alloc_tag_cttype was not init yet.
>
> I add some debug to confirm this theory
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index d48b80f3f007..9b8e7501010f 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag *ct;
> struct codetag_bytes n;
> unsigned int i, nr = 0;
> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> + return 0;
>
> if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
> shutdown_mem_profiling(true);
> return PTR_ERR(alloc_tag_cttype);
> }
> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>
> return 0;
> }
>
> When bootup the kernel, the log shows:
>
> $ sudo dmesg -T | grep profiling
> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>
>
> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> or mem_show() should check whether alloc_tag is done initialized when calling
> alloc_tag_top_users
Thanks for reporting!
So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
will address this issue as well. Is that correct?
>
>
>
> David
>
* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-20 8:47 ` Uladzislau Rezki
@ 2025-06-22 22:54 ` Suren Baghdasaryan
2025-06-23 11:29 ` Uladzislau Rezki
0 siblings, 1 reply; 28+ messages in thread
From: Suren Baghdasaryan @ 2025-06-22 22:54 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Harry Yoo, kernel test robot, oe-lkp, lkp, linux-kernel,
Andrew Morton, Baoquan He, Adrian Huang, Christop Hellwig,
Mateusz Guzik, linux-mm, Kent Overstreet
On Fri, Jun 20, 2025 at 1:47 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Fri, Jun 20, 2025 at 12:04:50AM +0900, Harry Yoo wrote:
> > On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> > > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > > >
> > > > Hello,
> > > >
> > > > for this change, we reported
> > > > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > > > in
> > > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > > >
> > > > at that time, we made some tests with x86_64 config which runs well.
> > > >
> > > > now we noticed the commit is in mainline now.
> > >
> > > (Re-sending due to not Ccing people and the list...)
> > >
> > > Hi, I'm facing the same error on my testing environment.
> >
> > I should have clarified that the reason the kernel failed to allocate
> > memory on my machine was due to running out of memory, not because of the
> > vmalloc test module.
> >
> > But based on the fact that the test case (align_shift_alloc_test) is
> > expected to fail, the issue here is not memory allocation failure
> > itself, but rather that the kernel crashes when the allocation fails.
> >
> It looks someone tries to test the CONFIG_TEST_VMALLOC=y as built-in
> approach test-cases. Yes, it will trigger a lot of warnings as some
> use cases are supposed to be failed. This will trigger a lot of kernel
> warnings which can be considered by test-robot or people as problem.
>
> In this case i can exclude those use cases or even not run at all unless
> boot-parameters properly sets if built-in.
Sorry, I'm catching up on my email backlog. IIUC
https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
addresses this issue. Is my understanding correct?
>
> --
> Uladzislau Rezki
* Re: [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-22 22:24 ` [PATCH " Suren Baghdasaryan
@ 2025-06-23 2:01 ` Harry Yoo
0 siblings, 0 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-23 2:01 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Wang, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Sun, Jun 22, 2025 at 03:24:08PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 20, 2025 at 8:43 PM David Wang <00107082@163.com> wrote:
> >
> >
> > At 2025-06-21 03:53:05, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> > >even when the alloc_tag_cttype is not allocated because:
> > >
> > > 1) alloc tagging is disabled because mem profiling is disabled
> > > (!alloc_tag_cttype)
> > > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > > 3) alloc tagging is enabled, but failed initialization
> > > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> > >
> > >In all cases, alloc_tag_cttype is not allocated, and therefore
> > >alloc_tag_top_users() should not attempt to acquire the semaphore.
> > >
> > >This leads to a crash on memory allocation failure by attempting to
> > >acquire a non-existent semaphore:
> > >
> > > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > > Tainted: [D]=DIE
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > > Call Trace:
> > > <TASK>
> > > codetag_trylock_module_list+0xd/0x20
> > > alloc_tag_top_users+0x369/0x4b0
> > > __show_mem+0x1cd/0x6e0
> > > warn_alloc+0x2b1/0x390
> > > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > > alloc_pages_mpol+0x135/0x3e0
> > > alloc_slab_page+0x82/0xe0
> > > new_slab+0x212/0x240
> > > ___slab_alloc+0x82a/0xe00
> > > </TASK>
> > >
> > >As David Wang points out, this issue became easier to trigger after commit
> > >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > >
> > >Before the commit, the issue occurred only when it failed to allocate
> > >and initialize alloc_tag_cttype or if a memory allocation fails before
> > >alloc_tag_init() is called. After the commit, it can be easily triggered
> > >when memory profiling is compiled but disabled at boot.
>
> Thanks for the fix and sorry about the delay with reviewing it.
No problem ;)
> > >
> > >To properly determine whether alloc_tag_init() has been called and
> > >its data structures initialized, verify that alloc_tag_cttype is a valid
> > >pointer before acquiring the semaphore. If the variable is NULL or an error
> > >value, it has not been properly initialized. In such a case, just skip
> > >and do not attempt acquire the semaphore.
>
> nit: s/attempt acquire/attempt to acquire
Will fix the typo.
> > >
> > >Reported-by: kernel test robot <oliver.sang@intel.com>
> > >Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> > >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> > >Cc: stable@vger.kernel.org
> > >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >
> > Just notice another thread can be closed as well:
> > https://lore.kernel.org/all/202506131711.5b41931c-lkp@intel.com/
> > This coincide with scenario #1, where OOM happened with
> > CONFIG_MEM_ALLOC_PROFILING=y
> > # CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
> > # CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set
> >
> > >---
> > >
> > >v1 -> v2:
> > >
> > >- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
> > >
> > > v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
> > > I didn't expect alloc_tag_cttype to be NULL when
> > > mem_profiling_support is true, but as David points out (Thanks David!)
> > > if a memory allocation fails before alloc_tag_init(), it can be NULL.
> > >
> > > So instead of indirectly checking mem_profiling_support, just directly
> > > check if alloc_tag_cttype is allocated.
> > >
> > >- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> > > tag was removed because it was not a crash and not relevant to this
> > > patch.
> > >
> > >- Added Cc: stable because, if an allocation fails before
> > > alloc_tag_init(), it can be triggered even prior-780138b12381.
> > > I verified that the bug can be triggered in v6.12 and fixed by this
> > > patch.
> > >
> > > It should be quite difficult to trigger in practice, though.
> > > Maybe I'm a bit paranoid?
> > >
> > > lib/alloc_tag.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > >index 66a4628185f7..d8ec4c03b7d2 100644
> > >--- a/lib/alloc_tag.c
> > >+++ b/lib/alloc_tag.c
> > >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > struct codetag_bytes n;
> > > unsigned int i, nr = 0;
> > >
> > >- if (can_sleep)
> > >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> > >+ return 0;
>
> So, AFAIKT alloc_tag_cttype will be NULL when memory profiling is
> disabled and it will be ENOMEM if codetag_register_type() fails.
Yes.
Or when memory profiling is enabled, but a memory allocation fails
before alloc_tag_init().
> I think it would be good to add a pr_warn() in the alloc_tag_init() when
> codetag_register_type() fails so that the user can determine the
> reason why show_mem() report is missing allocation tag information.
Will do.
> > >+ else if (can_sleep)
>
> nit: the above extra "else" is not really needed. The following should
> work just fine, is more readable and produces less churn:
>
> + if (IS_ERR_OR_NULL(alloc_tag_cttype))
> + return 0;
> +
> if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
> return 0;
Will do, thanks!
>
> > > codetag_lock_module_list(alloc_tag_cttype, true);
> > > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > > return 0;
> > >--
> > >2.43.0
--
Cheers,
Harry / Hyeonggon
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-22 22:50 ` Suren Baghdasaryan
@ 2025-06-23 2:04 ` Harry Yoo
2025-06-23 2:45 ` David Wang
1 sibling, 0 replies; 28+ messages in thread
From: Harry Yoo @ 2025-06-23 2:04 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Wang, oliver.sang, urezki, ahuang12, akpm, bhe, hch,
linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, kent.overstreet
On Sun, Jun 22, 2025 at 03:50:44PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
> >
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > >
> > > Hello,
> > >
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > > in
> > > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > >
> > > at that time, we made some tests with x86_64 config which runs well.
> > >
> > > now we noticed the commit is in mainline now.
> >
> > > the config still has expected diff with parent:
> > >
> > > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> > > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> > > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> > > CONFIG_TEST_MISC_MINOR=m
> > > # CONFIG_TEST_LKM is not set
> > > CONFIG_TEST_BITOPS=m
> > > -CONFIG_TEST_VMALLOC=m
> > > +CONFIG_TEST_VMALLOC=y
> > > # CONFIG_TEST_BPF is not set
> > > CONFIG_FIND_BIT_BENCHMARK=m
> > > # CONFIG_TEST_FIRMWARE is not set
> > >
> > >
> > > then we noticed similar random issue with x86_64 randconfig this time.
> > >
> > > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> > > ---------------- ---------------------------
> > > fail:runs %reproduction fail:runs
> > > | | |
> > > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> > > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> > > :199 34% 67:200 dmesg.Mem-Info
> > > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> > > :199 34% 67:200 dmesg.RIP:down_read_trylock
> > >
> > > we don't have enough knowledge to understand the relationship between code
> > > change and the random issues. just report what we obsverved in our tests FYI.
> > >
> >
> > I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
> >
> > vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> > memory allocation fails show_mem() would invoke alloc_tag_top_users.
> >
> > With following configuration:
> >
> > CONFIG_TEST_VMALLOC=y
> > CONFIG_MEM_ALLOC_PROFILING=y
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> > CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >
> > If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> > a NULL deference because alloc_tag_cttype was not init yet.
> >
> > I add some debug to confirm this theory
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index d48b80f3f007..9b8e7501010f 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag *ct;
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> > + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> > + return 0;
> >
> > if (can_sleep)
> > codetag_lock_module_list(alloc_tag_cttype, true);
> > @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
> > shutdown_mem_profiling(true);
> > return PTR_ERR(alloc_tag_cttype);
> > }
> > + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >
> > return 0;
> > }
> >
> > When bootup the kernel, the log shows:
> >
> > $ sudo dmesg -T | grep profiling
> > [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
> > [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
> >
> >
> > vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> > or mem_show() should check whether alloc_tag is done initialized when calling
> > alloc_tag_top_users
>
> Thanks for reporting!
> So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> will address this issue as well. Is that correct?
Yes, I verified that it addresses this issue.
> >
> > David
> >
--
Cheers,
Harry / Hyeonggon
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-22 22:50 ` Suren Baghdasaryan
2025-06-23 2:04 ` Harry Yoo
@ 2025-06-23 2:45 ` David Wang
2025-06-23 3:16 ` David Wang
2025-06-23 11:36 ` Uladzislau Rezki
1 sibling, 2 replies; 28+ messages in thread
From: David Wang @ 2025-06-23 2:45 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet
At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>>
>> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>> >
>> > Hello,
>> >
>> > for this change, we reported
>> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
>> > in
>> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>> >
>> > at that time, we made some tests with x86_64 config which runs well.
>> >
>> > now we noticed the commit is in mainline now.
>>
>> > the config still has expected diff with parent:
>> >
>> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
>> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
>> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>> > CONFIG_TEST_MISC_MINOR=m
>> > # CONFIG_TEST_LKM is not set
>> > CONFIG_TEST_BITOPS=m
>> > -CONFIG_TEST_VMALLOC=m
>> > +CONFIG_TEST_VMALLOC=y
>> > # CONFIG_TEST_BPF is not set
>> > CONFIG_FIND_BIT_BENCHMARK=m
>> > # CONFIG_TEST_FIRMWARE is not set
>> >
>> >
>> > then we noticed similar random issue with x86_64 randconfig this time.
>> >
>> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>> > ---------------- ---------------------------
>> > fail:runs %reproduction fail:runs
>> > | | |
>> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
>> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
>> > :199 34% 67:200 dmesg.Mem-Info
>> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>> > :199 34% 67:200 dmesg.RIP:down_read_trylock
>> >
>> > we don't have enough knowledge to understand the relationship between code
>> > change and the random issues. just report what we obsverved in our tests FYI.
>> >
>>
>> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>>
>> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>>
>> With following configuration:
>>
>> CONFIG_TEST_VMALLOC=y
>> CONFIG_MEM_ALLOC_PROFILING=y
>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>>
>> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>> a NULL deference because alloc_tag_cttype was not init yet.
>>
>> I add some debug to confirm this theory
>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> index d48b80f3f007..9b8e7501010f 100644
>> --- a/lib/alloc_tag.c
>> +++ b/lib/alloc_tag.c
>> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> struct codetag *ct;
>> struct codetag_bytes n;
>> unsigned int i, nr = 0;
>> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>> + return 0;
>>
>> if (can_sleep)
>> codetag_lock_module_list(alloc_tag_cttype, true);
>> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>> shutdown_mem_profiling(true);
>> return PTR_ERR(alloc_tag_cttype);
>> }
>> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>
>> return 0;
>> }
>>
>> When bootup the kernel, the log shows:
>>
>> $ sudo dmesg -T | grep profiling
>> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
>> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>>
>>
>> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>> or mem_show() should check whether alloc_tag is done initialized when calling
>> alloc_tag_top_users
>
>Thanks for reporting!
>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>will address this issue as well. Is that correct?
Yes, the panic can be fixed by that patch.
I still feel it would be better to delay vmalloc_test_init so that it runs after alloc_tag_init.
Or maybe we can promote alloc_tag_init to some earlier init stage? I remember reporting some allocations
that were not registered by memory profiling during boot:
https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
I will run some tests and report back later.
David
>
>>
>>
>>
>> David
>>
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-23 2:45 ` David Wang
@ 2025-06-23 3:16 ` David Wang
2025-06-23 4:39 ` David Wang
2025-06-23 11:36 ` Uladzislau Rezki
1 sibling, 1 reply; 28+ messages in thread
From: David Wang @ 2025-06-23 3:16 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet
At 2025-06-23 10:45:31, "David Wang" <00107082@163.com> wrote:
>
>At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>>On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>>>
>>> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>>> >
>>> > Hello,
>>> >
>>> > for this change, we reported
>>> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
>>> > in
>>> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>>> >
>>> > at that time, we made some tests with x86_64 config which runs well.
>>> >
>>> > now we noticed the commit is in mainline now.
>>>
>>> > the config still has expected diff with parent:
>>> >
>>> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
>>> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
>>> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>>> > CONFIG_TEST_MISC_MINOR=m
>>> > # CONFIG_TEST_LKM is not set
>>> > CONFIG_TEST_BITOPS=m
>>> > -CONFIG_TEST_VMALLOC=m
>>> > +CONFIG_TEST_VMALLOC=y
>>> > # CONFIG_TEST_BPF is not set
>>> > CONFIG_FIND_BIT_BENCHMARK=m
>>> > # CONFIG_TEST_FIRMWARE is not set
>>> >
>>> >
>>> > then we noticed similar random issue with x86_64 randconfig this time.
>>> >
>>> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>>> > ---------------- ---------------------------
>>> > fail:runs %reproduction fail:runs
>>> > | | |
>>> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
>>> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
>>> > :199 34% 67:200 dmesg.Mem-Info
>>> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>>> > :199 34% 67:200 dmesg.RIP:down_read_trylock
>>> >
>>> > we don't have enough knowledge to understand the relationship between code
>>> > change and the random issues. just report what we obsverved in our tests FYI.
>>> >
>>>
>>> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>>>
>>> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>>> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>>>
>>> With following configuration:
>>>
>>> CONFIG_TEST_VMALLOC=y
>>> CONFIG_MEM_ALLOC_PROFILING=y
>>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>>>
>>> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>>> a NULL deference because alloc_tag_cttype was not init yet.
>>>
>>> I add some debug to confirm this theory
>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>>> index d48b80f3f007..9b8e7501010f 100644
>>> --- a/lib/alloc_tag.c
>>> +++ b/lib/alloc_tag.c
>>> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>>> struct codetag *ct;
>>> struct codetag_bytes n;
>>> unsigned int i, nr = 0;
>>> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>> + return 0;
>>>
>>> if (can_sleep)
>>> codetag_lock_module_list(alloc_tag_cttype, true);
>>> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>>> shutdown_mem_profiling(true);
>>> return PTR_ERR(alloc_tag_cttype);
>>> }
>>> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>>
>>> return 0;
>>> }
>>>
>>> When bootup the kernel, the log shows:
>>>
>>> $ sudo dmesg -T | grep profiling
>>> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
>>> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>>>
>>>
>>> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>>> or mem_show() should check whether alloc_tag is done initialized when calling
>>> alloc_tag_top_users
>>
>>Thanks for reporting!
>>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>>will address this issue as well. Is that correct?
>
>Yes, the panic can be fixed by that patch.
>
>I still feel it would be better to delay vmalloc_test_init so that it runs after alloc_tag_init.
>Or maybe we can promote alloc_tag_init to some earlier init stage? I remember reporting some allocations
>that were not registered by memory profiling during boot:
>https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
>
>I will run some tests and report back later.
The memory allocations in sched_init_domains happen quite early, maybe at core_initcall, while
alloc_tag_init needs rootfs and so has to run after rootfs_initcall, so there is no reasonable place to promote it to.
But I think this explains why some allocation counters were missed during boot: the allocations happened before alloc_tag_init.
Thanks
David
>
>
>David
>
>
>>
>>>
>>>
>>>
>>> David
>>>
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-23 3:16 ` David Wang
@ 2025-06-23 4:39 ` David Wang
0 siblings, 0 replies; 28+ messages in thread
From: David Wang @ 2025-06-23 4:39 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet
At 2025-06-23 11:16:15, "David Wang" <00107082@163.com> wrote:
>
>At 2025-06-23 10:45:31, "David Wang" <00107082@163.com> wrote:
>>
>>At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>>>On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>>>>
>>>> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > for this change, we reported
>>>> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
>>>> > in
>>>> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>>>> >
>>>> > at that time, we made some tests with x86_64 config which runs well.
>>>> >
>>>> > now we noticed the commit is in mainline now.
>>>>
>>>> > the config still has expected diff with parent:
>>>> >
>>>> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
>>>> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
>>>> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>>>> > CONFIG_TEST_MISC_MINOR=m
>>>> > # CONFIG_TEST_LKM is not set
>>>> > CONFIG_TEST_BITOPS=m
>>>> > -CONFIG_TEST_VMALLOC=m
>>>> > +CONFIG_TEST_VMALLOC=y
>>>> > # CONFIG_TEST_BPF is not set
>>>> > CONFIG_FIND_BIT_BENCHMARK=m
>>>> > # CONFIG_TEST_FIRMWARE is not set
>>>> >
>>>> >
>>>> > then we noticed similar random issue with x86_64 randconfig this time.
>>>> >
>>>> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>>>> > ---------------- ---------------------------
>>>> > fail:runs %reproduction fail:runs
>>>> > | | |
>>>> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
>>>> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
>>>> > :199 34% 67:200 dmesg.Mem-Info
>>>> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>>>> > :199 34% 67:200 dmesg.RIP:down_read_trylock
>>>> >
>>>> > we don't have enough knowledge to understand the relationship between code
>>>> > change and the random issues. just report what we obsverved in our tests FYI.
>>>> >
>>>>
>>>> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>>>>
>>>> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>>>> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>>>>
>>>> With following configuration:
>>>>
>>>> CONFIG_TEST_VMALLOC=y
>>>> CONFIG_MEM_ALLOC_PROFILING=y
>>>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>>>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>>>>
>>>> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>>>> a NULL deference because alloc_tag_cttype was not init yet.
>>>>
>>>> I add some debug to confirm this theory
>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>>>> index d48b80f3f007..9b8e7501010f 100644
>>>> --- a/lib/alloc_tag.c
>>>> +++ b/lib/alloc_tag.c
>>>> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>>>> struct codetag *ct;
>>>> struct codetag_bytes n;
>>>> unsigned int i, nr = 0;
>>>> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>>> + return 0;
>>>>
>>>> if (can_sleep)
>>>> codetag_lock_module_list(alloc_tag_cttype, true);
>>>> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>>>> shutdown_mem_profiling(true);
>>>> return PTR_ERR(alloc_tag_cttype);
>>>> }
>>>> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> When bootup the kernel, the log shows:
>>>>
>>>> $ sudo dmesg -T | grep profiling
>>>> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
>>>> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>>>>
>>>>
>>>> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>>>> or mem_show() should check whether alloc_tag is done initialized when calling
>>>> alloc_tag_top_users
>>>
>>>Thanks for reporting!
>>>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>>>will address this issue as well. Is that correct?
>>
>>Yes, the panic can be fixed by that patch.
>>
>>I still feel it would be better to delay vmalloc_test_init so that it runs after alloc_tag_init.
>>Or maybe we can promote alloc_tag_init to some earlier init stage? I remember reporting some allocations
>>that were not registered by memory profiling during boot:
>>https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
>>
>>I will run some tests and report back later.
>
>The memory allocations in sched_init_domains happen quite early, maybe at core_initcall, while
>alloc_tag_init needs rootfs and so has to run after rootfs_initcall, so there is no reasonable place to promote it to.
>But I think this explains why some allocation counters were missed during boot: the allocations happened before alloc_tag_init.
Sorry, I think I was wrong. The counters do not need alloc_tag_init.
Sorry for the noise; please ignore my mumbo jumbo.
David
>
>
>Thanks
>David
>
>>
>>
>>David
>>
>>
>>>
>>>>
>>>>
>>>>
>>>> David
>>>>
* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
2025-06-22 22:54 ` Suren Baghdasaryan
@ 2025-06-23 11:29 ` Uladzislau Rezki
0 siblings, 0 replies; 28+ messages in thread
From: Uladzislau Rezki @ 2025-06-23 11:29 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Uladzislau Rezki, Harry Yoo, kernel test robot, oe-lkp, lkp,
linux-kernel, Andrew Morton, Baoquan He, Adrian Huang,
Christop Hellwig, Mateusz Guzik, linux-mm, Kent Overstreet
On Sun, Jun 22, 2025 at 03:54:51PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 20, 2025 at 1:47 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Fri, Jun 20, 2025 at 12:04:50AM +0900, Harry Yoo wrote:
> > > On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> > > > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > for this change, we reported
> > > > > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> > > > > in
> > > > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > > > >
> > > > > at that time, we made some tests with x86_64 config which runs well.
> > > > >
> > > > > now we noticed the commit is in mainline now.
> > > >
> > > > (Re-sending due to not Ccing people and the list...)
> > > >
> > > > Hi, I'm facing the same error on my testing environment.
> > >
> > > I should have clarified that the reason the kernel failed to allocate
> > > memory on my machine was due to running out of memory, not because of the
> > > vmalloc test module.
> > >
> > > But based on the fact that the test case (align_shift_alloc_test) is
> > > expected to fail, the issue here is not memory allocation failure
> > > itself, but rather that the kernel crashes when the allocation fails.
> > >
> > It looks someone tries to test the CONFIG_TEST_VMALLOC=y as built-in
> > approach test-cases. Yes, it will trigger a lot of warnings as some
> > use cases are supposed to be failed. This will trigger a lot of kernel
> > warnings which can be considered by test-robot or people as problem.
> >
> > In this case i can exclude those use cases or even not run at all unless
> > boot-parameters properly sets if built-in.
>
> Sorry, I'm catching up on my email backlog. IIUC
> https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> addresses this issue. Is my understanding correct?
>
I tested the .config from the test robot to try to reproduce the kernel
crash. Unfortunately, I cannot trigger it. But people in the other
thread have already confirmed that the patch fixes the crash.
--
Uladzislau Rezki
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-23 2:45 ` David Wang
2025-06-23 3:16 ` David Wang
@ 2025-06-23 11:36 ` Uladzislau Rezki
2025-06-23 13:20 ` David Wang
1 sibling, 1 reply; 28+ messages in thread
From: Uladzislau Rezki @ 2025-06-23 11:36 UTC (permalink / raw)
To: David Wang
Cc: Suren Baghdasaryan, oliver.sang, urezki, ahuang12, akpm, bhe, hch,
linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, harry.yoo,
kent.overstreet
On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
>
> At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
> >>
> >> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >> >
> >> > Hello,
> >> >
> >> > for this change, we reported
> >> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
> >> > in
> >> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >> >
> >> > at that time, we made some tests with x86_64 config which runs well.
> >> >
> >> > now we noticed the commit is in mainline now.
> >>
> >> > the config still has expected diff with parent:
> >> >
> >> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
> >> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
> >> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> >> > CONFIG_TEST_MISC_MINOR=m
> >> > # CONFIG_TEST_LKM is not set
> >> > CONFIG_TEST_BITOPS=m
> >> > -CONFIG_TEST_VMALLOC=m
> >> > +CONFIG_TEST_VMALLOC=y
> >> > # CONFIG_TEST_BPF is not set
> >> > CONFIG_FIND_BIT_BENCHMARK=m
> >> > # CONFIG_TEST_FIRMWARE is not set
> >> >
> >> >
> >> > then we noticed similar random issue with x86_64 randconfig this time.
> >> >
> >> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> >> > ---------------- ---------------------------
> >> > fail:runs %reproduction fail:runs
> >> > | | |
> >> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
> >> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
> >> > :199 34% 67:200 dmesg.Mem-Info
> >> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> >> > :199 34% 67:200 dmesg.RIP:down_read_trylock
> >> >
> >> > we don't have enough knowledge to understand the relationship between code
> >> > change and the random issues. just report what we obsverved in our tests FYI.
> >> >
> >>
> >> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
> >>
> >> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> >> memory allocation fails show_mem() would invoke alloc_tag_top_users.
> >>
> >> With following configuration:
> >>
> >> CONFIG_TEST_VMALLOC=y
> >> CONFIG_MEM_ALLOC_PROFILING=y
> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> >> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >>
> >> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> >> a NULL deference because alloc_tag_cttype was not init yet.
> >>
> >> I add some debug to confirm this theory
> >> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> index d48b80f3f007..9b8e7501010f 100644
> >> --- a/lib/alloc_tag.c
> >> +++ b/lib/alloc_tag.c
> >> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >> struct codetag *ct;
> >> struct codetag_bytes n;
> >> unsigned int i, nr = 0;
> >> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >> + return 0;
> >>
> >> if (can_sleep)
> >> codetag_lock_module_list(alloc_tag_cttype, true);
> >> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
> >> shutdown_mem_profiling(true);
> >> return PTR_ERR(alloc_tag_cttype);
> >> }
> >> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >>
> >> return 0;
> >> }
> >>
> >> When bootup the kernel, the log shows:
> >>
> >> $ sudo dmesg -T | grep profiling
> >> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
> >> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
> >>
> >>
> >> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> >> or mem_show() should check whether alloc_tag is done initialized when calling
> >> alloc_tag_top_users
> >
> >Thanks for reporting!
> >So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> >will address this issue as well. Is that correct?
>
> Yes, the panic can be fixed by that patch.
>
> I still feel it would be better to delay vmalloc_test_init so that it runs after alloc_tag_init.
>
We can, but then we would not have noticed the bug in question :)
At the very least, I think we should exclude the tests that trigger warnings
when the test suite runs with its default configuration, i.e. only run the
tests that are not supposed to fail.
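
To make the idea concrete, here is a minimal user-space sketch of such a
default selection. It is an assumption about how it could look, not what
lib/test_vmalloc.c does today: the may_fail flag, the explicit_mask
argument and all demo_* names are hypothetical, and only
align_shift_alloc_test is a test name taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_test_case {
	const char *name;
	bool may_fail;	/* hypothetical flag: test is expected to hit allocation failures */
};

/* Placeholder table; only align_shift_alloc_test is named in the thread. */
static const struct demo_test_case demo_tests[] = {
	{ "demo_small_alloc_test",  false },
	{ "align_shift_alloc_test", true  },
	{ "demo_large_alloc_test",  false },
};

#define DEMO_NR_TESTS (sizeof(demo_tests) / sizeof(demo_tests[0]))

/*
 * Return the bitmask of tests to run. An explicit_mask of 0 means the user
 * gave no mask, in which case tests flagged may_fail are skipped. A non-zero
 * explicit_mask is honoured as-is, clipped to the tests that exist.
 */
static unsigned int demo_select_tests(unsigned int explicit_mask)
{
	unsigned int mask = 0;
	size_t i;

	if (explicit_mask)
		return explicit_mask & ((1u << DEMO_NR_TESTS) - 1);

	for (i = 0; i < DEMO_NR_TESTS; i++)
		if (!demo_tests[i].may_fail)
			mask |= 1u << i;

	return mask;
}

/* Default run skips bit 1; an explicit request can still select it. */
static void demo_mask_usage(void)
{
	assert(demo_select_tests(0) == 0x5);
	assert(demo_select_tests(0x2) == 0x2);
}
```

With a scheme like this, a built-in run with no parameters would stay
quiet, while someone who wants the failing cases can still ask for them
explicitly.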
--
Uladzislau Rezki
* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
2025-06-23 11:36 ` Uladzislau Rezki
@ 2025-06-23 13:20 ` David Wang
0 siblings, 0 replies; 28+ messages in thread
From: David Wang @ 2025-06-23 13:20 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Suren Baghdasaryan, oliver.sang, ahuang12, akpm, bhe, hch,
linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, harry.yoo,
kent.overstreet
At 2025-06-23 19:36:03, "Uladzislau Rezki" <urezki@gmail.com> wrote:
>On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
>>
>> At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>> >On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>> >>
>> >> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> > for this change, we reported
>> >> > "[linux-next:master] [lib/test_vmalloc.c] 7fc85b92db: Mem-Info"
>> >> > in
>> >> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>> >> >
>> >> > at that time, we made some tests with x86_64 config which runs well.
>> >> >
>> >> > now we noticed the commit is in mainline now.
>> >>
>> >> > the config still has expected diff with parent:
>> >> >
>> >> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config 2025-06-17 14:40:29.481052101 +0800
>> >> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config 2025-06-17 14:41:18.448543738 +0800
>> >> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>> >> > CONFIG_TEST_MISC_MINOR=m
>> >> > # CONFIG_TEST_LKM is not set
>> >> > CONFIG_TEST_BITOPS=m
>> >> > -CONFIG_TEST_VMALLOC=m
>> >> > +CONFIG_TEST_VMALLOC=y
>> >> > # CONFIG_TEST_BPF is not set
>> >> > CONFIG_FIND_BIT_BENCHMARK=m
>> >> > # CONFIG_TEST_FIRMWARE is not set
>> >> >
>> >> >
>> >> > then we noticed similar random issue with x86_64 randconfig this time.
>> >> >
>> >> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>> >> > ---------------- ---------------------------
>> >> > fail:runs %reproduction fail:runs
>> >> > | | |
>> >> > :199 34% 67:200 dmesg.KASAN:null-ptr-deref_in_range[#-#]
>> >> > :199 34% 67:200 dmesg.Kernel_panic-not_syncing:Fatal_exception
>> >> > :199 34% 67:200 dmesg.Mem-Info
>> >> > :199 34% 67:200 dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>> >> > :199 34% 67:200 dmesg.RIP:down_read_trylock
>> >> >
>> >> > we don't have enough knowledge to understand the relationship between code
>> >> > change and the random issues. just report what we obsverved in our tests FYI.
>> >> >
>> >>
>> >> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>> >>
>> >> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>> >> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>> >>
>> >> With following configuration:
>> >>
>> >> CONFIG_TEST_VMALLOC=y
>> >> CONFIG_MEM_ALLOC_PROFILING=y
>> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> >> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>> >>
>> >> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>> >> a NULL dereference because alloc_tag_cttype is not initialized yet.
>> >>
>> >> I added some debug output to confirm this theory:
>> >> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >> index d48b80f3f007..9b8e7501010f 100644
>> >> --- a/lib/alloc_tag.c
>> >> +++ b/lib/alloc_tag.c
>> >> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> >> struct codetag *ct;
>> >> struct codetag_bytes n;
>> >> unsigned int i, nr = 0;
>> >> + pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>> >> + return 0;
>> >>
>> >> if (can_sleep)
>> >> codetag_lock_module_list(alloc_tag_cttype, true);
>> >> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>> >> shutdown_mem_profiling(true);
>> >> return PTR_ERR(alloc_tag_cttype);
>> >> }
>> >> + pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>> >>
>> >> return 0;
>> >> }
>> >>
>> >> When the kernel boots, the log shows:
>> >>
>> >> $ sudo dmesg -T | grep profiling
>> >> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0 <--- alloc_tag_cttype == NULL
>> >> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>> >>
>> >>
>> >> vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>> >> or show_mem() should check whether alloc_tag has finished initializing before
>> >> calling alloc_tag_top_users.
>> >
>> >Thanks for reporting!
>> >So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>> >will address this issue as well. Is that correct?
>>
>> Yes, the panic can be fixed by that patch.
>>
>> I still feel it is better to delay vmalloc_test_init so that it happens after alloc_tag_init.
>>
>We can, but then we would not notice the bug that is in question :)
Yes, strangely lucky here~ :)
I was thinking: if some vmalloc tests fail, is alloc_tag_top_users helpful for debugging?
Considering this bug has already been caught, if alloc_tag_top_users is helpful for vmalloc test analysis,
maybe it is still reasonable to delay vmalloc_test_init?... ☺︎
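
For illustration only, the ordering argument can be modelled in userspace. The sketch below is not kernel code: fake_cttype stands in for alloc_tag_cttype, and the struct fake_initcall table, simulate_boot() and the fake_* functions are hypothetical names. Levels 6 and 7 are meant to mirror the device and late initcall levels; which level each real init function uses, and the link order within a level, are assumptions of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for alloc_tag_cttype: NULL until the fake init runs. */
static int fake_cttype_storage;
static int *fake_cttype;

/* What the fake vmalloc test observed: 1 = initialized, 0 = NULL, -1 = not run. */
static int test_saw_cttype = -1;

static void fake_alloc_tag_init(void)
{
	fake_cttype = &fake_cttype_storage;
}

static void fake_vmalloc_test_init(void)
{
	/* The real crash path would dereference the pointer; here we only record it. */
	test_saw_cttype = (fake_cttype != NULL);
}

struct fake_initcall {
	int level;       /* lower levels run first */
	int link_order;  /* tie-break within a level (assumed link order) */
	void (*fn)(void);
};

static int cmp_initcall(const void *a, const void *b)
{
	const struct fake_initcall *x = a, *y = b;

	if (x->level != y->level)
		return x->level < y->level ? -1 : 1;
	if (x->link_order != y->link_order)
		return x->link_order < y->link_order ? -1 : 1;
	return 0;
}

/* Run both fake initcalls with the test placed at test_level; return what it saw. */
int simulate_boot(int test_level)
{
	struct fake_initcall calls[] = {
		{ test_level, 0, fake_vmalloc_test_init },
		{ 6,          1, fake_alloc_tag_init },
	};
	size_t i, n = sizeof(calls) / sizeof(calls[0]);

	fake_cttype = NULL;
	test_saw_cttype = -1;

	qsort(calls, n, sizeof(calls[0]), cmp_initcall);
	for (i = 0; i < n; i++) {
		assert(calls[i].fn != NULL);
		calls[i].fn();
	}
	return test_saw_cttype;
}
```

In this model simulate_boot(6) puts the test at the same level as the fake alloc_tag init but earlier in link order, so it sees a NULL pointer, while simulate_boot(7) moves it to a later level and it sees the initialized pointer.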
>
>At least we should, I think, exclude the tests which trigger warnings
>when the test suite is run with the default configuration, i.e. run only
>the tests which are not supposed to fail.
>
>--
>Uladzislau Rezki
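
The other option raised in the thread, having the reporting path check whether alloc_tag is initialized, can be sketched the same way. This is a userspace model under stated assumptions, not the patch linked above: struct fake_cttype, fake_tags_init() and fake_top_users() are hypothetical names, and the byte counts are made-up sample data.

```c
#include <stddef.h>

/* Hypothetical stand-in for the codetag type: a flat array of byte counts. */
struct fake_cttype {
	size_t nr_tags;
	const size_t *bytes;
};

static const size_t sample_bytes[] = { 10, 50, 30 };  /* made-up data */
static struct fake_cttype sample_cttype = { 3, sample_bytes };

/* NULL until fake_tags_init() runs, like alloc_tag_cttype before alloc_tag_init. */
static struct fake_cttype *fake_alloc_tag_cttype;

void fake_tags_init(void)
{
	fake_alloc_tag_cttype = &sample_cttype;
}

/*
 * Fill top[] with up to count largest byte counts, in descending order.
 * Returns the number of entries written, or 0 if called before init.
 */
size_t fake_top_users(size_t *top, size_t count)
{
	size_t i, j, nr = 0;

	/* The guard: report nothing instead of dereferencing a NULL pointer. */
	if (!fake_alloc_tag_cttype || count == 0)
		return 0;

	for (i = 0; i < fake_alloc_tag_cttype->nr_tags; i++) {
		size_t v = fake_alloc_tag_cttype->bytes[i];
		size_t pos = nr;
		size_t last;

		/* Find where v belongs among the entries kept so far. */
		while (pos > 0 && top[pos - 1] < v)
			pos--;
		if (pos >= count)
			continue;

		/* Shift smaller entries down, dropping the last one if full. */
		last = (nr < count) ? nr : count - 1;
		for (j = last; j > pos; j--)
			top[j] = top[j - 1];
		top[pos] = v;
		if (nr < count)
			nr++;
	}
	return nr;
}
```

With this guard, a caller that runs too early gets an empty report rather than a crash, which is why the ordering problem would go unnoticed if only the init order were changed.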
Thread overview: 28+ messages
2025-06-18 6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
2025-06-19 15:04 ` Harry Yoo
2025-06-20 8:47 ` Uladzislau Rezki
2025-06-22 22:54 ` Suren Baghdasaryan
2025-06-23 11:29 ` Uladzislau Rezki
2025-06-19 15:08 ` David Wang
2025-06-20 1:14 ` Harry Yoo
2025-06-20 0:40 ` [PATCH] lib/alloc_tag: do not acquire nonexistent lock when mem profiling is disabled Harry Yoo
2025-06-20 3:09 ` David Wang
2025-06-20 10:40 ` [PATCH] " Harry Yoo
2025-06-20 11:33 ` Harry Yoo
2025-06-20 13:59 ` David Wang
2025-06-20 12:47 ` Harry Yoo
2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
2025-06-22 22:50 ` Suren Baghdasaryan
2025-06-23 2:04 ` Harry Yoo
2025-06-23 2:45 ` David Wang
2025-06-23 3:16 ` David Wang
2025-06-23 4:39 ` David Wang
2025-06-23 11:36 ` Uladzislau Rezki
2025-06-23 13:20 ` David Wang
2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
2025-06-20 19:59 ` Harry Yoo
2025-06-20 19:53 ` [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
2025-06-21 3:43 ` David Wang
2025-06-22 22:24 ` [PATCH " Suren Baghdasaryan
2025-06-23 2:01 ` Harry Yoo