linux-kernel.vger.kernel.org archive mirror
* [linus:master] [lib/test_vmalloc.c]  2d76e79315: Kernel_panic-not_syncing:Fatal_exception
@ 2025-06-18  6:25 kernel test robot
  2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: kernel test robot @ 2025-06-18  6:25 UTC
  To: Uladzislau Rezki
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Baoquan He,
	Adrian Huang, Christop Hellwig, Mateusz Guzik, linux-mm,
	oliver.sang


Hello,

For this change, we previously reported
"[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
in
https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/

At that time, we ran some tests with an x86_64 config, which ran well.

We have now noticed that the commit is in mainline.

The config still has the expected diff from the parent:

--- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
+++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
@@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
 CONFIG_TEST_MISC_MINOR=m
 # CONFIG_TEST_LKM is not set
 CONFIG_TEST_BITOPS=m
-CONFIG_TEST_VMALLOC=m
+CONFIG_TEST_VMALLOC=y
 # CONFIG_TEST_BPF is not set
 CONFIG_FIND_BIT_BENCHMARK=m
 # CONFIG_TEST_FIRMWARE is not set


This time, we noticed a similar random issue with an x86_64 randconfig.

7a73348e5d4715b5 2d76e79315e403aab595d4c8830
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
           :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
           :199         34%          67:200   dmesg.Mem-Info
           :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
           :199         34%          67:200   dmesg.RIP:down_read_trylock

We don't have enough knowledge to understand the relationship between the code
change and the random issues, so we are just reporting what we observed in our
tests, FYI.

The full report is below.



kernel test robot noticed "Kernel_panic-not_syncing:Fatal_exception" on:

commit: 2d76e79315e403aab595d4c8830b7a46c19f0f3b ("lib/test_vmalloc.c: allow built-in execution")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master      e04c78d86a9699d136910cfc0bdcf01087e3267e]
[test failed on linux-next/master 050f8ad7b58d9079455af171ac279c4b9b828c11]

in testcase: boot

config: x86_64-randconfig-161-20250614
compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com


[   36.902716][   T60] vmalloc_node_range for size 8192 failed: Address range restricted to 0xffffc90000000000 - 0xffffe8ffffffffff
[   36.903981][   T60] vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
[   36.905195][   T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY 
[   36.905201][   T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   36.905203][   T60] Call Trace:
[   36.905206][   T60]  <TASK>
[   36.905209][   T60]  dump_stack_lvl+0x87/0xd6
[   36.905223][   T60]  warn_alloc+0x15e/0x291
[   36.905230][   T60]  ? has_managed_dma+0x37/0x37
[   36.905237][   T60]  ? __get_vm_area_node+0x33a/0x3c0
[   36.905244][   T60]  ? __get_vm_area_node+0x33a/0x3c0
[   36.905250][   T60]  __vmalloc_node_range_noprof+0x170/0x306
[   36.905255][   T60]  ? __vmalloc_area_node+0x460/0x460
[   36.905260][   T60]  ? test_func+0x2ae/0x469
[   36.905264][   T60]  __vmalloc_node_noprof+0xb8/0xd9
[   36.905267][   T60]  ? test_func+0x2ae/0x469
[   36.905272][   T60]  align_shift_alloc_test+0xa8/0x165
[   36.905277][   T60]  test_func+0x2ae/0x469
[   36.905281][   T60]  ? pcpu_alloc_test+0x31b/0x31b
[   36.905286][   T60]  ? __kthread_parkme+0xcb/0x1a3
[   36.905293][   T60]  ? pcpu_alloc_test+0x31b/0x31b
[   36.905297][   T60]  kthread+0x452/0x464
[   36.905301][   T60]  ? kthread_is_per_cpu+0x51/0x51
[   36.905304][   T60]  ? _raw_spin_unlock_irq+0x23/0x35
[   36.905308][   T60]  ? kthread_is_per_cpu+0x51/0x51
[ 36.905311][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413) 
[ 36.905314][ T60] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/kernel/process.c:153) 
[ 36.905318][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413) 
[ 36.905321][ T60] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/entry/entry_64.S:255) 
[   36.905330][   T60]  </TASK>
[   36.905332][   T60] Mem-Info:
[   36.919941][   T60] active_anon:0 inactive_anon:0 isolated_anon:0
[   36.919941][   T60]  active_file:0 inactive_file:0 isolated_file:0
[   36.919941][   T60]  unevictable:41612 dirty:0 writeback:0
[   36.919941][   T60]  slab_reclaimable:7429 slab_unreclaimable:145259
[   36.919941][   T60]  mapped:0 shmem:0 pagetables:145
[   36.919941][   T60]  sec_pagetables:0 bounce:0
[   36.919941][   T60]  kernel_misc_reclaimable:0
[   36.919941][   T60]  free:3233392 free_pcp:1185 free_cma:0
[   36.923830][   T60] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:1952kB pagetables:580kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
[   36.926265][   T60] DMA free:15360kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[   36.928855][   T60] lowmem_reserve[]: 0 2991 13741 13741
[   36.929411][   T60] DMA32 free:3060560kB boost:0kB min:3224kB low:6244kB high:9264kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:3120kB local_pcp:3120kB free_cma:0kB
[   36.932080][   T60] lowmem_reserve[]: 0 0 10749 10749
[   36.932604][   T60] Normal free:9857648kB boost:0kB min:11744kB low:22748kB high:33752kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB writepending:0kB present:13631488kB managed:11007884kB mlocked:0kB bounce:0kB free_pcp:1620kB local_pcp:740kB free_cma:0kB
[   36.935336][   T60] lowmem_reserve[]: 0 0 0 0
[   36.935802][   T60] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
[   36.936931][   T60] DMA32: 0*4kB 0*8kB 1*16kB (M) 2*32kB (M) 2*64kB (M) 1*128kB (M) 2*256kB (M) 2*512kB (M) 1*1024kB (M) 1*2048kB (M) 746*4096kB (M) = 3060560kB
[   36.938318][   T60] Normal: 6*4kB (ME) 2*8kB (ME) 7*16kB (UME) 5*32kB (M) 3*64kB (ME) 4*128kB (M) 6*256kB (UME) 2*512kB (M) 1*1024kB (M) 3*2048kB (UME) 2404*4096kB (M) = 9857528kB
[   36.939849][   T60] 41618 total pagecache pages
[   36.940324][   T60] 4194174 pages RAM
[   36.940721][   T60] 0 pages HighMem/MovableOnly
[   36.941188][   T60] 672443 pages reserved
[   36.941626][   T60] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] SMP KASAN
[   36.942185][   T60] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
[   36.942185][   T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY 
[   36.942185][   T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   36.942185][   T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
[   36.942185][   T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
[   36.942185][   T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
[   36.942185][   T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
[   36.942185][   T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
[   36.942185][   T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
[   36.942185][   T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
[   36.942185][   T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
[   36.942185][   T60] FS:  0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
[   36.942185][   T60] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   36.942185][   T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
[   36.942185][   T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   36.942185][   T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   36.942185][   T60] Call Trace:
[   36.942185][   T60]  <TASK>
[   36.942185][   T60]  ? clear_nonspinnable+0x32/0x32
[   36.942185][   T60]  ? vprintk_emit+0x165/0x194
[   36.942185][   T60]  codetag_trylock_module_list+0xd/0x19
[   36.942185][   T60]  alloc_tag_top_users+0x95/0x216
[   36.942185][   T60]  ? _printk+0xad/0xdf
[   36.942185][   T60]  ? reserve_module_tags+0x308/0x308
[   36.942185][   T60]  __show_mem+0x167/0x54b
[   36.942185][   T60]  ? _printk+0xad/0xdf
[   36.942185][   T60]  ? printk_get_console_flush_type+0x272/0x272
[   36.942185][   T60]  ? show_free_areas+0x115d/0x115d
[   36.942185][   T60]  ? tracer_hardirqs_on+0x1b/0x28d
[   36.942185][   T60]  ? dump_stack_lvl+0x91/0xd6
[   36.942185][   T60]  ? warn_alloc+0x251/0x291
[   36.942185][   T60]  warn_alloc+0x251/0x291
[   36.942185][   T60]  ? has_managed_dma+0x37/0x37
[   36.942185][   T60]  ? __get_vm_area_node+0x33a/0x3c0
[   36.942185][   T60]  __vmalloc_node_range_noprof+0x170/0x306
[   36.942185][   T60]  ? __vmalloc_area_node+0x460/0x460
[   36.942185][   T60]  ? test_func+0x2ae/0x469
[   36.942185][   T60]  __vmalloc_node_noprof+0xb8/0xd9
[   36.942185][   T60]  ? test_func+0x2ae/0x469
[   36.942185][   T60]  align_shift_alloc_test+0xa8/0x165
[   36.942185][   T60]  test_func+0x2ae/0x469
[   36.942185][   T60]  ? pcpu_alloc_test+0x31b/0x31b
[   36.942185][   T60]  ? __kthread_parkme+0xcb/0x1a3
[   36.942185][   T60]  ? pcpu_alloc_test+0x31b/0x31b
[   36.942185][   T60]  kthread+0x452/0x464
[   36.942185][   T60]  ? kthread_is_per_cpu+0x51/0x51
[   36.942185][   T60]  ? _raw_spin_unlock_irq+0x23/0x35
[   36.942185][   T60]  ? kthread_is_per_cpu+0x51/0x51
[   36.942185][   T60]  ret_from_fork+0x20/0x54
[   36.942185][   T60]  ? kthread_is_per_cpu+0x51/0x51
[   36.942185][   T60]  ret_from_fork_asm+0x11/0x20
[   36.942185][   T60]  </TASK>
[   36.942185][   T60] Modules linked in:
[   37.000652][   T60] ---[ end trace 0000000000000000 ]---
[   37.001188][   T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
[   37.001731][   T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
[   37.003488][   T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
[   37.004072][   T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
[   37.004848][   T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
[   37.005610][   T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
[   37.006381][   T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
[   37.007178][   T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
[   37.007940][   T60] FS:  0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
[   37.008792][   T60] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.009411][   T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
[   37.010175][   T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   37.010950][   T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   37.011716][   T60] Kernel panic - not syncing: Fatal exception
[   37.012397][   T60] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-18  6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
@ 2025-06-19 14:10 ` Harry Yoo
  2025-06-19 15:04   ` Harry Yoo
  2025-06-19 15:08   ` David Wang
  2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
  2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
  2 siblings, 2 replies; 18+ messages in thread
From: Harry Yoo @ 2025-06-19 14:10 UTC
  To: kernel test robot
  Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton,
	Baoquan He, Adrian Huang, Christop Hellwig, Mateusz Guzik,
	linux-mm, Suren Baghdasaryan, Kent Overstreet

On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> 
> Hello,
> 
> For this change, we previously reported
> "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> in
> https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> 
> At that time, we ran some tests with an x86_64 config, which ran well.
> 
> We have now noticed that the commit is in mainline.

(Re-sending due to not Ccing people and the list...)

Hi, I'm facing the same error in my testing environment.

I think this is related to memory allocation profiling & code tagging
subsystems rather than vmalloc, so let's add related folks to Cc.

After quickly skimming the code, it seems the conditions to trigger this
are that 1) MEM_ALLOC_PROFILING is compiled in but 2) not enabled by
default, and 3) an allocation somehow fails, calling
alloc_tag_top_users().

I see "Memory allocation profiling is not supported!" in the dmesg,
which means it did not allocate & initialize alloc_tag_cttype properly,
but alloc_tag_top_users() still tries to acquire the semaphore.
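
The register dump in the report is consistent with a lock being taken
through a NULL alloc_tag_cttype. As a rough sketch (assuming the usual
x86_64 generic KASAN mapping, shadow = (addr >> 3) + 0xdffffc0000000000,
which matches RAX/R14 in the dump), the reported fault address can be
re-derived from RBX = 0x70 and the 0x68 displacement in the "Code:" bytes
(48 8d 6b 68). The helper names below are illustrative only, not kernel
code:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: x86_64 generic KASAN shadow offset (same value as RAX/R14 in the dump). */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Maps an address to the shadow byte that KASAN checks before the access. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Re-derives the values printed in the oops, starting from RBX alone. */
static void check_against_dump(void)
{
	uint64_t rbx = 0x70;        /* RBX/RDI: the semaphore pointer passed in */
	uint64_t rbp = rbx + 0x68;  /* lea 0x68(%rbx),%rbp */

	assert(rbp == 0xd8);                               /* RBP in the dump */
	assert((rbp >> KASAN_SHADOW_SCALE_SHIFT) == 0x1b); /* RDX in the dump */
	assert(kasan_shadow_addr(rbp) == 0xdffffc000000001bULL); /* fault address */
}
```

A semaphore pointer of 0x70 is what taking the address of a member of a
NULL struct pointer would produce, which fits alloc_tag_cttype never
having been set up. The 0x70 member offset itself is inferred from the
dump, not checked against the struct definition.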

I think the kernel should not call alloc_tag_top_users() at all (or it
should return an error) if mem_profiling_support == false?

Does the following work in your testing environment?

(I only did very light testing in my QEMU setup, but it seems to fix the issue for me.)

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d48b80f3f007..57d4d5673855 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
 	struct codetag_bytes n;
 	unsigned int i, nr = 0;
 
-	if (can_sleep)
+	if (!mem_profiling_support)
+		return 0;
+	else if (can_sleep)
 		codetag_lock_module_list(alloc_tag_cttype, true);
 	else if (!codetag_trylock_module_list(alloc_tag_cttype))
 		return 0;
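
To illustrate the intent of the guard, here is a minimal user-space
sketch, not the kernel implementation. Every model_* name and the struct
layout are hypothetical stand-ins; only the control flow mirrors the diff
above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct codetag_type; only the lock matters here. */
struct model_cttype {
	int mod_lock;	/* 0 = free, 1 = held; stands in for the rw_semaphore */
};

/* Models the state after "Memory allocation profiling is not supported!". */
static struct model_cttype *model_alloc_tag_cttype = NULL;
static bool model_mem_profiling_support = false;

/* Dereferences ct unconditionally, so a NULL ct would crash here. */
static bool model_trylock(struct model_cttype *ct)
{
	if (ct->mod_lock)
		return false;
	ct->mod_lock = 1;
	return true;
}

static void model_unlock(struct model_cttype *ct)
{
	ct->mod_lock = 0;
}

/* Returns how many entries were reported; 0 when profiling is unavailable. */
static size_t model_top_users(size_t count)
{
	if (!model_mem_profiling_support)
		return 0;	/* the proposed guard: never touch the NULL cttype */
	if (!model_trylock(model_alloc_tag_cttype))
		return 0;
	/* The real function walks the tags here; the model just echoes count. */
	model_unlock(model_alloc_tag_cttype);
	return count;
}
```

With the guard in place, the unsupported case returns 0 before any lock
is touched, so the caller simply prints no top users.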

-- 
Cheers,
Harry / Hyeonggon


* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
@ 2025-06-19 15:04   ` Harry Yoo
  2025-06-20  8:47     ` Uladzislau Rezki
  2025-06-19 15:08   ` David Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Harry Yoo @ 2025-06-19 15:04 UTC
  To: kernel test robot
  Cc: Uladzislau Rezki, oe-lkp, lkp, linux-kernel, Andrew Morton,
	Baoquan He, Adrian Huang, Christop Hellwig, Mateusz Guzik,
	linux-mm, Suren Baghdasaryan, Kent Overstreet

On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > 
> > Hello,
> > 
> > For this change, we previously reported
> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > in
> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > 
> > At that time, we ran some tests with an x86_64 config, which ran well.
> > 
> > We have now noticed that the commit is in mainline.
> 
> (Re-sending due to not Ccing people and the list...)
> 
> Hi, I'm facing the same error in my testing environment.

I should have clarified that the kernel failed to allocate memory on my
machine because it ran out of memory, not because of the vmalloc test
module.

But based on the fact that the test case (align_shift_alloc_test) is
expected to fail, the issue here is not the memory allocation failure
itself, but rather that the kernel crashes when the allocation fails.
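
As a rough illustration of why such a failure is expected: assuming the
test requests page-sized allocations with alignment 1 << shift for
increasing shift values (my reading of the test, not verified here), some
alignment eventually cannot fit anywhere in the vmalloc range printed in
the report (0xffffc90000000000 - 0xffffe8ffffffffff). The helper below is
a hypothetical user-space model that ignores existing allocations, so the
real allocator may fail earlier:

```c
#include <stdint.h>

/*
 * Returns the smallest shift for which no block of `size` bytes aligned to
 * (1 << shift) fits inside [start, end], or -1 if every shift fits.
 */
static int first_unsatisfiable_shift(uint64_t start, uint64_t end, uint64_t size)
{
	for (int shift = 0; shift < 64; shift++) {
		uint64_t align = (uint64_t)1 << shift;
		/* Round start up to the alignment; unsigned wrap-around is well defined. */
		uint64_t aligned = (start + align - 1) & ~(align - 1);

		/* aligned < start means the round-up wrapped past 2^64. */
		if (aligned < start || aligned > end || end - aligned < size - 1)
			return shift;
	}
	return -1;
}
```

For the range and the 8192-byte size in the first log line, this model
stops fitting at shift 46: the only multiple of 2^46 near the range lies
below its start, and the next one wraps past the top of the address space.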

So I expect the fix below will work for you as well.

> I think this is related to memory allocation profiling & code tagging
> subsystems rather than vmalloc, so let's add related folks to Cc.
> 
> After quickly skimming the code, it seems the conditions to trigger this
> are that 1) MEM_ALLOC_PROFILING is compiled in but 2) not enabled by
> default, and 3) an allocation somehow fails, calling
> alloc_tag_top_users().
> 
> I see "Memory allocation profiling is not supported!" in the dmesg,
> which means it did not allocate & initialize alloc_tag_cttype properly,
> but alloc_tag_top_users() still tries to acquire the semaphore.
> 
> I think the kernel should not call alloc_tag_top_users() at all (or it
> should return an error) if mem_profiling_support == false?
> 
> Does the following work in your testing environment?
> 
> (I only did very light testing in my QEMU setup, but it seems to fix the issue for me.)
> 
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index d48b80f3f007..57d4d5673855 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>  	struct codetag_bytes n;
>  	unsigned int i, nr = 0;
>  
> -	if (can_sleep)
> +	if (!mem_profiling_support)
> +		return 0;
> +	else if (can_sleep)
>  		codetag_lock_module_list(alloc_tag_cttype, true);
>  	else if (!codetag_trylock_module_list(alloc_tag_cttype))
>  		return 0;
> 
> > The config still has the expected diff from the parent:
> > 
> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> >  CONFIG_TEST_MISC_MINOR=m
> >  # CONFIG_TEST_LKM is not set
> >  CONFIG_TEST_BITOPS=m
> > -CONFIG_TEST_VMALLOC=m
> > +CONFIG_TEST_VMALLOC=y
> >  # CONFIG_TEST_BPF is not set
> >  CONFIG_FIND_BIT_BENCHMARK=m
> >  # CONFIG_TEST_FIRMWARE is not set
> > 
> > 
> > then we noticed similar random issue with x86_64 randconfig this time.
> > 
> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
> >            :199         34%          67:200   dmesg.Mem-Info
> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
> > 
> > we don't have enough knowledge to understand the relationship between code
> > change and the random issues. just report what we observed in our tests FYI.
> > 
> > below is full report.
> > 
> > 
> > 
> > kernel test robot noticed "Kernel_panic-not_syncing:Fatal_exception" on:
> > 
> > commit: 2d76e79315e403aab595d4c8830b7a46c19f0f3b ("lib/test_vmalloc.c: allow built-in execution")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > [test failed on linus/master      e04c78d86a9699d136910cfc0bdcf01087e3267e]
> > [test failed on linux-next/master 050f8ad7b58d9079455af171ac279c4b9b828c11]
> > 
> > in testcase: boot
> > 
> > config: x86_64-randconfig-161-20250614
> > compiler: gcc-12
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > 
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> > 
> > 
> > [   36.902716][   T60] vmalloc_node_range for size 8192 failed: Address range restricted to 0xffffc90000000000 - 0xffffe8ffffffffff
> > [   36.903981][   T60] vmalloc_test/0: vmalloc error: size 4096, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null)
> > [   36.905195][   T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY 
> > [   36.905201][   T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [   36.905203][   T60] Call Trace:
> > [   36.905206][   T60]  <TASK>
> > [   36.905209][   T60]  dump_stack_lvl+0x87/0xd6
> > [   36.905223][   T60]  warn_alloc+0x15e/0x291
> > [   36.905230][   T60]  ? has_managed_dma+0x37/0x37
> > [   36.905237][   T60]  ? __get_vm_area_node+0x33a/0x3c0
> > [   36.905244][   T60]  ? __get_vm_area_node+0x33a/0x3c0
> > [   36.905250][   T60]  __vmalloc_node_range_noprof+0x170/0x306
> > [   36.905255][   T60]  ? __vmalloc_area_node+0x460/0x460
> > [   36.905260][   T60]  ? test_func+0x2ae/0x469
> > [   36.905264][   T60]  __vmalloc_node_noprof+0xb8/0xd9
> > [   36.905267][   T60]  ? test_func+0x2ae/0x469
> > [   36.905272][   T60]  align_shift_alloc_test+0xa8/0x165
> > [   36.905277][   T60]  test_func+0x2ae/0x469
> > [   36.905281][   T60]  ? pcpu_alloc_test+0x31b/0x31b
> > [   36.905286][   T60]  ? __kthread_parkme+0xcb/0x1a3
> > [   36.905293][   T60]  ? pcpu_alloc_test+0x31b/0x31b
> > [   36.905297][   T60]  kthread+0x452/0x464
> > [   36.905301][   T60]  ? kthread_is_per_cpu+0x51/0x51
> > [   36.905304][   T60]  ? _raw_spin_unlock_irq+0x23/0x35
> > [   36.905308][   T60]  ? kthread_is_per_cpu+0x51/0x51
> > [ 36.905311][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413) 
> > [ 36.905314][ T60] ret_from_fork (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/kernel/process.c:153) 
> > [ 36.905318][ T60] ? kthread_is_per_cpu (kbuild/obj/consumer/x86_64-randconfig-161-20250614/kernel/kthread.c:413) 
> > [ 36.905321][ T60] ret_from_fork_asm (kbuild/obj/consumer/x86_64-randconfig-161-20250614/arch/x86/entry/entry_64.S:255) 
> > [   36.905330][   T60]  </TASK>
> > [   36.905332][   T60] Mem-Info:
> > [   36.919941][   T60] active_anon:0 inactive_anon:0 isolated_anon:0
> > [   36.919941][   T60]  active_file:0 inactive_file:0 isolated_file:0
> > [   36.919941][   T60]  unevictable:41612 dirty:0 writeback:0
> > [   36.919941][   T60]  slab_reclaimable:7429 slab_unreclaimable:145259
> > [   36.919941][   T60]  mapped:0 shmem:0 pagetables:145
> > [   36.919941][   T60]  sec_pagetables:0 bounce:0
> > [   36.919941][   T60]  kernel_misc_reclaimable:0
> > [   36.919941][   T60]  free:3233392 free_pcp:1185 free_cma:0
> > [   36.923830][   T60] Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB writeback_tmp:0kB kernel_stack:1952kB pagetables:580kB sec_pagetables:0kB all_unreclaimable? no Balloon:0kB
> > [   36.926265][   T60] DMA free:15360kB boost:0kB min:16kB low:28kB high:40kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
> > [   36.928855][   T60] lowmem_reserve[]: 0 2991 13741 13741
> > [   36.929411][   T60] DMA32 free:3060560kB boost:0kB min:3224kB low:6244kB high:9264kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:3063680kB mlocked:0kB bounce:0kB free_pcp:3120kB local_pcp:3120kB free_cma:0kB
> > [   36.932080][   T60] lowmem_reserve[]: 0 0 10749 10749
> > [   36.932604][   T60] Normal free:9857648kB boost:0kB min:11744kB low:22748kB high:33752kB reserved_highatomic:0KB free_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:166448kB writepending:0kB present:13631488kB managed:11007884kB mlocked:0kB bounce:0kB free_pcp:1620kB local_pcp:740kB free_cma:0kB
> > [   36.935336][   T60] lowmem_reserve[]: 0 0 0 0
> > [   36.935802][   T60] DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
> > [   36.936931][   T60] DMA32: 0*4kB 0*8kB 1*16kB (M) 2*32kB (M) 2*64kB (M) 1*128kB (M) 2*256kB (M) 2*512kB (M) 1*1024kB (M) 1*2048kB (M) 746*4096kB (M) = 3060560kB
> > [   36.938318][   T60] Normal: 6*4kB (ME) 2*8kB (ME) 7*16kB (UME) 5*32kB (M) 3*64kB (ME) 4*128kB (M) 6*256kB (UME) 2*512kB (M) 1*1024kB (M) 3*2048kB (UME) 2404*4096kB (M) = 9857528kB
> > [   36.939849][   T60] 41618 total pagecache pages
> > [   36.940324][   T60] 4194174 pages RAM
> > [   36.940721][   T60] 0 pages HighMem/MovableOnly
> > [   36.941188][   T60] 672443 pages reserved
> > [   36.941626][   T60] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#1] SMP KASAN
> > [   36.942185][   T60] KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > [   36.942185][   T60] CPU: 1 UID: 0 PID: 60 Comm: vmalloc_test/0 Not tainted 6.15.0-rc6-00142-g2d76e79315e4 #1 VOLUNTARY 
> > [   36.942185][   T60] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [   36.942185][   T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> > [   36.942185][   T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> > [   36.942185][   T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> > [   36.942185][   T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> > [   36.942185][   T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > [   36.942185][   T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> > [   36.942185][   T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> > [   36.942185][   T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> > [   36.942185][   T60] FS:  0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> > [   36.942185][   T60] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   36.942185][   T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> > [   36.942185][   T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   36.942185][   T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   36.942185][   T60] Call Trace:
> > [   36.942185][   T60]  <TASK>
> > [   36.942185][   T60]  ? clear_nonspinnable+0x32/0x32
> > [   36.942185][   T60]  ? vprintk_emit+0x165/0x194
> > [   36.942185][   T60]  codetag_trylock_module_list+0xd/0x19
> > [   36.942185][   T60]  alloc_tag_top_users+0x95/0x216
> > [   36.942185][   T60]  ? _printk+0xad/0xdf
> > [   36.942185][   T60]  ? reserve_module_tags+0x308/0x308
> > [   36.942185][   T60]  __show_mem+0x167/0x54b
> > [   36.942185][   T60]  ? _printk+0xad/0xdf
> > [   36.942185][   T60]  ? printk_get_console_flush_type+0x272/0x272
> > [   36.942185][   T60]  ? show_free_areas+0x115d/0x115d
> > [   36.942185][   T60]  ? tracer_hardirqs_on+0x1b/0x28d
> > [   36.942185][   T60]  ? dump_stack_lvl+0x91/0xd6
> > [   36.942185][   T60]  ? warn_alloc+0x251/0x291
> > [   36.942185][   T60]  warn_alloc+0x251/0x291
> > [   36.942185][   T60]  ? has_managed_dma+0x37/0x37
> > [   36.942185][   T60]  ? __get_vm_area_node+0x33a/0x3c0
> > [   36.942185][   T60]  __vmalloc_node_range_noprof+0x170/0x306
> > [   36.942185][   T60]  ? __vmalloc_area_node+0x460/0x460
> > [   36.942185][   T60]  ? test_func+0x2ae/0x469
> > [   36.942185][   T60]  __vmalloc_node_noprof+0xb8/0xd9
> > [   36.942185][   T60]  ? test_func+0x2ae/0x469
> > [   36.942185][   T60]  align_shift_alloc_test+0xa8/0x165
> > [   36.942185][   T60]  test_func+0x2ae/0x469
> > [   36.942185][   T60]  ? pcpu_alloc_test+0x31b/0x31b
> > [   36.942185][   T60]  ? __kthread_parkme+0xcb/0x1a3
> > [   36.942185][   T60]  ? pcpu_alloc_test+0x31b/0x31b
> > [   36.942185][   T60]  kthread+0x452/0x464
> > [   36.942185][   T60]  ? kthread_is_per_cpu+0x51/0x51
> > [   36.942185][   T60]  ? _raw_spin_unlock_irq+0x23/0x35
> > [   36.942185][   T60]  ? kthread_is_per_cpu+0x51/0x51
> > [   36.942185][   T60]  ret_from_fork+0x20/0x54
> > [   36.942185][   T60]  ? kthread_is_per_cpu+0x51/0x51
> > [   36.942185][   T60]  ret_from_fork_asm+0x11/0x20
> > [   36.942185][   T60]  </TASK>
> > [   36.942185][   T60] Modules linked in:
> > [   37.000652][   T60] ---[ end trace 0000000000000000 ]---
> > [   37.001188][   T60] RIP: 0010:down_read_trylock+0xa7/0x2b9
> > [   37.001731][   T60] Code: b0 ef 25 91 e8 57 16 40 00 83 3d 9c e6 a7 09 00 0f 85 2c 01 00 00 48 8d 6b 68 b8 ff ff 37 00 48 89 ea 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 08 48 89 ef e8 3c 16 40 00 48 3b 5b 68 0f 84 00 01
> > [   37.003488][   T60] RSP: 0000:ffff88814657f848 EFLAGS: 00010206
> > [   37.004072][   T60] RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 1ffffffff224bdf6
> > [   37.004848][   T60] RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > [   37.005610][   T60] RBP: 00000000000000d8 R08: 0000000000000000 R09: 0000000000000000
> > [   37.006381][   T60] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11028caff0a
> > [   37.007178][   T60] R13: ffff88814657fa30 R14: dffffc0000000000 R15: 0000000000000000
> > [   37.007940][   T60] FS:  0000000000000000(0000) GS:ffff88841c1f0000(0000) knlGS:0000000000000000
> > [   37.008792][   T60] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   37.009411][   T60] CR2: 0000000000000000 CR3: 00000001636e0000 CR4: 00000000000406b0
> > [   37.010175][   T60] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   37.010950][   T60] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   37.011716][   T60] Kernel panic - not syncing: Fatal exception
> > [   37.012397][   T60] Kernel Offset: 0x6200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250618/202506181351.bba867dd-lkp@intel.com
> > 
> > 
> > 
> > -- 
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> > 
> > 
> 
> -- 
> Cheers,
> Harry / Hyeonggon

-- 
Cheers,
Harry / Hyeonggon

* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
  2025-06-19 15:04   ` Harry Yoo
@ 2025-06-19 15:08   ` David Wang
  2025-06-20  1:14     ` Harry Yoo
  1 sibling, 1 reply; 18+ messages in thread
From: David Wang @ 2025-06-19 15:08 UTC (permalink / raw)
  To: harry.yoo, surenb, cachen
  Cc: ahuang12, akpm, bhe, hch, kent.overstreet, linux-kernel, linux-mm,
	lkp, mjguzik, oe-lkp, oliver.sang, urezki

> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > 
> > Hello,
> > 
> > for this change, we reported
> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > in
> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > 
> > at that time, we made some tests with x86_64 config which runs well.
> > 
> > now we noticed the commit is in mainline now.
> 
> (Re-sending due to not Ccing people and the list...)
> 
> Hi, I'm facing the same error on my testing environment.
> 
> I think this is related to memory allocation profiling & code tagging
> subsystems rather than vmalloc, so let's add related folks to Cc.
> 
> After a quick skimming of the code, it seems the condition
> to trigger this is that 1) MEM_ALLOC_PROFILING is compiled in but
> 2) not enabled by default, and 3) an allocation somehow fails, calling
> alloc_tag_top_users().
> 
> I see "Memory allocation profiling is not supported!" in the dmesg,
> which means it did not alloc & initialize alloc_tag_cttype properly,
> but alloc_tag_top_users() tries to acquire the semaphore.
> 
> I think the kernel should not call alloc_tag_top_users() at all (or it
> should return an error) if mem_profiling_support == false?
> 
> Does the following work on your testing environment?
> 
> (Only did very light testing on my QEMU, but seems to fix the issue for me.)
> 
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index d48b80f3f007..57d4d5673855 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>  	struct codetag_bytes n;
>  	unsigned int i, nr = 0;
>  
> -	if (can_sleep)
> +	if (!mem_profiling_support)
> +		return 0;
> +	else if (can_sleep)
>  		codetag_lock_module_list(alloc_tag_cttype, true);
>  	else if (!codetag_trylock_module_list(alloc_tag_cttype))
>  		return 0;

I think you are correct; this was introduced/exposed by
commit 780138b1 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
(Before that commit, the BUG would only be triggered when alloc_tag_init failed.)
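
For illustration only, the early return in the diff quoted above can be
modelled in plain userspace C. This is a sketch, not kernel code: the
*_model names are hypothetical stand-ins, and the assert() merely marks
the spot where the kernel would dereference alloc_tag_cttype.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct codetag_type; not the real layout. */
struct codetag_type_model {
	int lock_taken;		/* counts how often the "lock" was taken */
};

static bool mem_profiling_support_model;	/* false: profiling unsupported */
static struct codetag_type_model *cttype_model;	/* NULL until init has run */

/* Models alloc_tag_top_users() with the proposed early return. */
static size_t top_users_model(size_t count)
{
	if (!mem_profiling_support_model)
		return 0;	/* never touches cttype_model */

	/* The kernel would oops here if the type were still NULL. */
	assert(cttype_model != NULL);
	cttype_model->lock_taken++;
	return count;
}
```

With the flag cleared, the model returns 0 without looking at the
pointer, which is the behaviour the diff aims for.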


David


* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-19 15:08   ` David Wang
@ 2025-06-20  1:14     ` Harry Yoo
  0 siblings, 0 replies; 18+ messages in thread
From: Harry Yoo @ 2025-06-20  1:14 UTC (permalink / raw)
  To: David Wang
  Cc: surenb, cachen, ahuang12, akpm, bhe, hch, kent.overstreet,
	linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, oliver.sang, urezki

On Thu, Jun 19, 2025 at 11:08:09PM +0800, David Wang wrote:
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > > 
> > > Hello,
> > > 
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > > in
> > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > > 
> > > at that time, we made some tests with x86_64 config which runs well.
> > > 
> > > now we noticed the commit is in mainline now.
> > 
> > (Re-sending due to not Ccing people and the list...)
> > 
> > Hi, I'm facing the same error on my testing environment.
> > 
> > I think this is related to memory allocation profiling & code tagging
> > subsystems rather than vmalloc, so let's add related folks to Cc.
> > 
> > After a quick skimming of the code, it seems the condition
> > to trigger this is that 1) MEM_ALLOC_PROFILING is compiled in but
> > 2) not enabled by default, and 3) an allocation somehow fails, calling
> > alloc_tag_top_users().
> > 
> > I see "Memory allocation profiling is not supported!" in the dmesg,
> > which means it did not alloc & initialize alloc_tag_cttype properly,
> > but alloc_tag_top_users() tries to acquire the semaphore.
> > 
> > I think the kernel should not call alloc_tag_top_users() at all (or it
> > should return an error) if mem_profiling_support == false?
> > 
> > Does the following work on your testing environment?
> > 
> > (Only did very light testing on my QEMU, but seems to fix the issue for me.)
> > 
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index d48b80f3f007..57d4d5673855 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -134,7 +134,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >  	struct codetag_bytes n;
> >  	unsigned int i, nr = 0;
> >  
> > -	if (can_sleep)
> > +	if (!mem_profiling_support)
> > +		return 0;
> > +	else if (can_sleep)
> >  		codetag_lock_module_list(alloc_tag_cttype, true);
> >  	else if (!codetag_trylock_module_list(alloc_tag_cttype))
> >  		return 0;
> 
> I think you are correct, this was introduced/exposed by
> commit 780138b1 ("alloc_tag: check mem_profiling_support in alloc_tag_init")

Oh, I wasn't aware of that commit.
Thanks for pointing it out!

Indeed, prior to 780138b1, alloc_tag_cttype was allocated unconditionally,
so this shouldn't have been a problem unless the allocation failed.

I've sent a formal patch to help testing.

> (Before the commit, the BUG would only be triggered when alloc_tag_init failed)

That is nearly impossible to trigger, as the allocation size is
too small to fail and the allocation is done at boot time,
so it shouldn't fail in practice.

Or should we be more paranoid and fix it in v6.12 stable?

-- 
Cheers,
Harry / Hyeonggon

* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-19 15:04   ` Harry Yoo
@ 2025-06-20  8:47     ` Uladzislau Rezki
  2025-06-22 22:54       ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Uladzislau Rezki @ 2025-06-20  8:47 UTC (permalink / raw)
  To: Harry Yoo
  Cc: kernel test robot, Uladzislau Rezki, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Baoquan He, Adrian Huang, Christop Hellwig,
	Mateusz Guzik, linux-mm, Suren Baghdasaryan, Kent Overstreet

On Fri, Jun 20, 2025 at 12:04:50AM +0900, Harry Yoo wrote:
> On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > > 
> > > Hello,
> > > 
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > > in
> > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > > 
> > > at that time, we made some tests with x86_64 config which runs well.
> > > 
> > > now we noticed the commit is in mainline now.
> > 
> > (Re-sending due to not Ccing people and the list...)
> > 
> > Hi, I'm facing the same error on my testing environment.
> 
> I should have clarified that the reason the kernel failed to allocate
> memory on my machine was due to running out of memory, not because of the
> vmalloc test module.
> 
> But based on the fact that the test case (align_shift_alloc_test) is
> expected to fail, the issue here is not memory allocation failure
> itself, but rather that the kernel crashes when the allocation fails.
> 
It looks like someone is running the test cases with CONFIG_TEST_VMALLOC=y,
i.e. built in. Yes, that will trigger a lot of kernel warnings, as some
test cases are expected to fail, and the test robot or people may
consider those warnings a problem.

In this case I can exclude those test cases, or even not run the tests at
all when built in unless the boot parameters are set properly.

--
Uladzislau Rezki

* CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-18  6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
  2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
@ 2025-06-20 10:02 ` David Wang
  2025-06-22 22:50   ` Suren Baghdasaryan
  2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
  2 siblings, 1 reply; 18+ messages in thread
From: David Wang @ 2025-06-20 10:02 UTC (permalink / raw)
  To: oliver.sang, urezki
  Cc: ahuang12, akpm, bhe, hch, linux-kernel, linux-mm, lkp, mjguzik,
	oe-lkp, harry.yoo, kent.overstreet, surenb

On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> 
> Hello,
> 
> for this change, we reported
> "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> in
> https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> 
> at that time, we made some tests with x86_64 config which runs well.
> 
> now we noticed the commit is in mainline now.

> the config still has expected diff with parent:
> 
> --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
> +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
> @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>  CONFIG_TEST_MISC_MINOR=m
>  # CONFIG_TEST_LKM is not set
>  CONFIG_TEST_BITOPS=m
> -CONFIG_TEST_VMALLOC=m
> +CONFIG_TEST_VMALLOC=y
>  # CONFIG_TEST_BPF is not set
>  CONFIG_FIND_BIT_BENCHMARK=m
>  # CONFIG_TEST_FIRMWARE is not set
> 
> 
> then we noticed similar random issue with x86_64 randconfig this time.
> 
> 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
>            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
>            :199         34%          67:200   dmesg.Mem-Info
>            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>            :199         34%          67:200   dmesg.RIP:down_read_trylock
> 
> we don't have enough knowledge to understand the relationship between code
> change and the random issues. just report what we observed in our tests FYI.
> 

I think this is caused by a race between vmalloc_test_init and alloc_tag_init.

vmalloc_test actually depends on alloc_tag via alloc_tag_top_users(), because when
a memory allocation fails, show_mem() invokes alloc_tag_top_users().

With the following configuration:

CONFIG_TEST_VMALLOC=y
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y

If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
a NULL dereference because alloc_tag_cttype is not initialized yet.

I added some debug output to confirm this theory:
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index d48b80f3f007..9b8e7501010f 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
        struct codetag *ct;
        struct codetag_bytes n;
        unsigned int i, nr = 0;
+       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
+       return 0;
 
        if (can_sleep)
                codetag_lock_module_list(alloc_tag_cttype, true);
@@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
                shutdown_mem_profiling(true);
                return PTR_ERR(alloc_tag_cttype);
        }
+       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
 
        return 0;
 }

After booting the kernel, the log shows:

$ sudo dmesg -T | grep profiling
[Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
[Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0


vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
or show_mem() should check whether alloc_tag has finished initializing before
calling alloc_tag_top_users().
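
To make the ordering argument concrete, here is a small userspace model
of how built-in initcalls are sequenced: by level first, then by link
order within a level. For built-in code module_init() maps to
device_initcall (level 6), and late_initcall is level 7. The struct, the
helper names and the link_order values are hypothetical, and the sketch
assumes alloc_tag_init() sits at the device level, as the race suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Initcall levels: device_initcall is 6, late_initcall is 7. */
enum { LEVEL_DEVICE = 6, LEVEL_LATE = 7 };

/* Hypothetical description of one built-in initcall. */
struct initcall_model {
	int level;		/* initcall level */
	int link_order;		/* position within the level (assumed) */
	const char *name;
};

/* Lower level runs first; within a level, earlier link order runs first. */
static int cmp_initcall(const void *a, const void *b)
{
	const struct initcall_model *x = a, *y = b;

	if (x->level != y->level)
		return x->level - y->level;
	return x->link_order - y->link_order;
}

/* Sorts into boot order and returns the index at which 'name' runs. */
static int run_position(struct initcall_model *calls, size_t n, const char *name)
{
	assert(calls != NULL && name != NULL);
	qsort(calls, n, sizeof(*calls), cmp_initcall);
	for (size_t i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

When both calls share a level, the result depends only on link order;
moving one of them to LEVEL_LATE makes it run after everything at
LEVEL_DEVICE regardless of link order.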



David


* [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall
  2025-06-18  6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
  2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
  2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
@ 2025-06-20 14:24 ` David Wang
  2025-06-20 19:59   ` Harry Yoo
  2 siblings, 1 reply; 18+ messages in thread
From: David Wang @ 2025-06-20 14:24 UTC (permalink / raw)
  To: akpm, urezki
  Cc: linux-mm, linux-kernel, harry.yoo, kent.overstreet, surenb,
	David Wang, kernel test robot

Commit 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
enables the test_vmalloc module to be built into the kernel directly, but
vmalloc_test_init depends on the alloc_tag module via alloc_tag_top_users().

Consider a kernel built with the following config:

CONFIG_TEST_VMALLOC=y
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
CONFIG_MEM_ALLOC_PROFILING_DEBUG=y

If vmalloc_test_init() runs before alloc_tag_init(), the memory
failure tests invoke alloc_tag_top_users(), which is not yet
ready for use, and cause a kernel BUG:

 [  135.116045] BUG: kernel NULL pointer dereference, address: 0000000000000030
 [  135.116063] #PF: supervisor read access in kernel mode
 [  135.116074] #PF: error_code(0x0000) - not-present page
 [  135.116085] PGD 0 P4D 0
 [  135.116094] Oops: Oops: 0000 [#1] SMP NOPTI
 [  135.116123] Tainted: [E]=UNSIGNED_MODULE
 [  135.116132] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 [  135.116148] RIP: 0010:down_read_trylock+0x1d/0x80
 [  135.116188] RSP: 0000:ffffb5e481a9b8f8 EFLAGS: 00010246
 [  135.116200] RAX: ffff93dc8a5ac700 RBX: 0000000000000030 RCX: 8000000000000007
 [  135.116214] RDX: 0000000000000001 RSI: 000000000000000a RDI: ffffffff93d2e733
 [  135.116228] RBP: ffffb5e481a9b9a0 R08: 0000000000000000 R09: 0000000000000003
 [  135.116241] R10: ffffb5e481a9b860 R11: ffffffff94ec6328 R12: ffffb5e481a9b9b0
 [  135.116255] R13: 0000000000000003 R14: 0000000000000001 R15: ffffffff94e0c580
 [  135.116271] FS:  00007fd41947e540(0000) GS:ffff93dd6654a000(0000) knlGS:0000000000000000
 [  135.116286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  135.116298] CR2: 0000000000000030 CR3: 00000001099f8000 CR4: 0000000000350ef0
 [  135.116314] Call Trace:
 [  135.116321]  <TASK>
 [  135.116328]  codetag_trylock_module_list+0x9/0x20
 [  135.116342]  alloc_tag_top_users+0x153/0x1b0
 [  135.116354]  ? srso_return_thunk+0x5/0x5f
 [  135.116365]  ? _printk+0x57/0x80
 [  135.116378]  __show_mem+0xeb/0x210
 [  135.116394]  ? dump_header+0x2ce/0x3e0
 [  135.116405]  dump_header+0x2ce/0x3e0

Demoting vmalloc_test_init to late_initcall makes sure the alloc_tag
module is initialized before the test_vmalloc module.

Link: https://lore.kernel.org/lkml/20250620100258.595495-1-00107082@163.com/
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Fixes: 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
Signed-off-by: David Wang <00107082@163.com>
---
 lib/test_vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 1b0b59549aaf..5af009df56ad 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -598,7 +598,7 @@ static int __init vmalloc_test_init(void)
 	return IS_BUILTIN(CONFIG_TEST_VMALLOC) ? 0:-EAGAIN;
 }
 
-module_init(vmalloc_test_init)
+late_initcall(vmalloc_test_init)
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Uladzislau Rezki");
-- 
2.39.2


* Re: [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall
  2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
@ 2025-06-20 19:59   ` Harry Yoo
  0 siblings, 0 replies; 18+ messages in thread
From: Harry Yoo @ 2025-06-20 19:59 UTC (permalink / raw)
  To: David Wang
  Cc: akpm, urezki, linux-mm, linux-kernel, kent.overstreet, surenb,
	kernel test robot

On Fri, Jun 20, 2025 at 10:24:48PM +0800, David Wang wrote:
> Commit 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
> enables the test_vmalloc module to be built into the kernel directly, but
> vmalloc_test_init depends on the alloc_tag module via alloc_tag_top_users().
>
> Consider a kernel built with the following config:
> 
> CONFIG_TEST_VMALLOC=y
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> 
> If vmalloc_test_init() runs before alloc_tag_init(), the memory
> failure tests invoke alloc_tag_top_users(), which is not yet
> ready for use, and cause a kernel BUG:
> 
>  [  135.116045] BUG: kernel NULL pointer dereference, address: 0000000000000030
>  [  135.116063] #PF: supervisor read access in kernel mode
>  [  135.116074] #PF: error_code(0x0000) - not-present page
>  [  135.116085] PGD 0 P4D 0
>  [  135.116094] Oops: Oops: 0000 [#1] SMP NOPTI
>  [  135.116123] Tainted: [E]=UNSIGNED_MODULE
>  [  135.116132] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>  [  135.116148] RIP: 0010:down_read_trylock+0x1d/0x80
>  [  135.116188] RSP: 0000:ffffb5e481a9b8f8 EFLAGS: 00010246
>  [  135.116200] RAX: ffff93dc8a5ac700 RBX: 0000000000000030 RCX: 8000000000000007
>  [  135.116214] RDX: 0000000000000001 RSI: 000000000000000a RDI: ffffffff93d2e733
>  [  135.116228] RBP: ffffb5e481a9b9a0 R08: 0000000000000000 R09: 0000000000000003
>  [  135.116241] R10: ffffb5e481a9b860 R11: ffffffff94ec6328 R12: ffffb5e481a9b9b0
>  [  135.116255] R13: 0000000000000003 R14: 0000000000000001 R15: ffffffff94e0c580
>  [  135.116271] FS:  00007fd41947e540(0000) GS:ffff93dd6654a000(0000) knlGS:0000000000000000
>  [  135.116286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [  135.116298] CR2: 0000000000000030 CR3: 00000001099f8000 CR4: 0000000000350ef0
>  [  135.116314] Call Trace:
>  [  135.116321]  <TASK>
>  [  135.116328]  codetag_trylock_module_list+0x9/0x20
>  [  135.116342]  alloc_tag_top_users+0x153/0x1b0
>  [  135.116354]  ? srso_return_thunk+0x5/0x5f
>  [  135.116365]  ? _printk+0x57/0x80
>  [  135.116378]  __show_mem+0xeb/0x210
>  [  135.116394]  ? dump_header+0x2ce/0x3e0
>  [  135.116405]  dump_header+0x2ce/0x3e0
> 
> Demoting vmalloc_test_init() to late_initcall makes sure the alloc_tag
> module is initialized before the test_vmalloc module.

I'm not sure this is the right place to fix it.

The bug can be triggered by any early memory allocation failure,
before alloc_tag_init() is called (yeah, that's not that likely).

There is nothing specific to vmalloc that triggers the bug.
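
To illustrate the point, here is a minimal userspace sketch (not kernel
code, and not the actual fix) of guarding the reporting path itself, so
that any early allocation failure is safe regardless of who triggers it.
All names below (tag_registry, model_alloc_tag_init, model_top_users) are
hypothetical stand-ins; the only thing taken from this thread is that the
type pointer is still NULL until alloc_tag_init() has run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's codetag type object. */
struct tag_registry {
	size_t nr_tags;
};

/* NULL until the modelled init routine runs, like alloc_tag_cttype. */
static struct tag_registry *registry;
static struct tag_registry registry_storage;

/* Models alloc_tag_init(): publishes the registry. */
static void model_alloc_tag_init(size_t nr_tags)
{
	registry_storage.nr_tags = nr_tags;
	registry = &registry_storage;
}

/*
 * Models alloc_tag_top_users(): returns how many entries were written.
 * The early return is the point of the sketch: a caller that arrives
 * before init gets "no data" instead of dereferencing NULL.
 */
static size_t model_top_users(size_t *out, size_t count)
{
	size_t i, n;

	assert(out != NULL || count == 0);

	if (!registry)
		return 0;

	n = registry->nr_tags < count ? registry->nr_tags : count;
	for (i = 0; i < n; i++)
		out[i] = i;	/* placeholder payload: the tag index */
	return n;
}
```

In this model, calling model_top_users() before model_alloc_tag_init()
simply reports zero entries, and the same call after init reports up to
count entries. The caller does not need to know anything about init order.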

-- 
Cheers,
Harry / Hyeonggon

> Link: https://lore.kernel.org/lkml/20250620100258.595495-1-00107082@163.com/
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> Fixes: 2d76e79315e4 ("lib/test_vmalloc.c: allow built-in execution")
> Signed-off-by: David Wang <00107082@163.com>
> ---
>  lib/test_vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
> index 1b0b59549aaf..5af009df56ad 100644
> --- a/lib/test_vmalloc.c
> +++ b/lib/test_vmalloc.c
> @@ -598,7 +598,7 @@ static int __init vmalloc_test_init(void)
>  	return IS_BUILTIN(CONFIG_TEST_VMALLOC) ? 0:-EAGAIN;
>  }
>  
> -module_init(vmalloc_test_init)
> +late_initcall(vmalloc_test_init)
>  
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Uladzislau Rezki");
> -- 
> 2.39.2
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
@ 2025-06-22 22:50   ` Suren Baghdasaryan
  2025-06-23  2:04     ` Harry Yoo
  2025-06-23  2:45     ` David Wang
  0 siblings, 2 replies; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-06-22 22:50 UTC (permalink / raw)
  To: David Wang
  Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
	linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet

On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>
> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >
> > Hello,
> >
> > for this change, we reported
> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > in
> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >
> > at that time, we made some tests with x86_64 config which runs well.
> >
> > now we noticed the commit is in mainline now.
>
> > the config still has expected diff with parent:
> >
> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> >  CONFIG_TEST_MISC_MINOR=m
> >  # CONFIG_TEST_LKM is not set
> >  CONFIG_TEST_BITOPS=m
> > -CONFIG_TEST_VMALLOC=m
> > +CONFIG_TEST_VMALLOC=y
> >  # CONFIG_TEST_BPF is not set
> >  CONFIG_FIND_BIT_BENCHMARK=m
> >  # CONFIG_TEST_FIRMWARE is not set
> >
> >
> > then we noticed similar random issue with x86_64 randconfig this time.
> >
> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> > ---------------- ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >            |             |             |
> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
> >            :199         34%          67:200   dmesg.Mem-Info
> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
> >
> > we don't have enough knowledge to understand the relationship between code
> > change and the random issues. just report what we obsverved in our tests FYI.
> >
>
> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>
> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>
> With following configuration:
>
> CONFIG_TEST_VMALLOC=y
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>
> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> a NULL dereference because alloc_tag_cttype was not initialized yet.
>
> I added some debug output to confirm this theory:
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index d48b80f3f007..9b8e7501010f 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>         struct codetag *ct;
>         struct codetag_bytes n;
>         unsigned int i, nr = 0;
> +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> +       return 0;
>
>         if (can_sleep)
>                 codetag_lock_module_list(alloc_tag_cttype, true);
> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>                 shutdown_mem_profiling(true);
>                 return PTR_ERR(alloc_tag_cttype);
>         }
> +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>
>         return 0;
>  }
>
> When booting the kernel, the log shows:
>
> $ sudo dmesg -T | grep profiling
> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>
>
> vmalloc_test_init should happen after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> or show_mem() should check whether alloc_tag has finished initializing before
> calling alloc_tag_top_users.

Thanks for reporting!
So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
will address this issue as well. Is that correct?

>
>
>
> David
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-20  8:47     ` Uladzislau Rezki
@ 2025-06-22 22:54       ` Suren Baghdasaryan
  2025-06-23 11:29         ` Uladzislau Rezki
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-06-22 22:54 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Harry Yoo, kernel test robot, oe-lkp, lkp, linux-kernel,
	Andrew Morton, Baoquan He, Adrian Huang, Christop Hellwig,
	Mateusz Guzik, linux-mm, Kent Overstreet

On Fri, Jun 20, 2025 at 1:47 AM Uladzislau Rezki <urezki@gmail.com> wrote:
>
> On Fri, Jun 20, 2025 at 12:04:50AM +0900, Harry Yoo wrote:
> > On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> > > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > > >
> > > > Hello,
> > > >
> > > > for this change, we reported
> > > > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > > > in
> > > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > > >
> > > > at that time, we made some tests with x86_64 config which runs well.
> > > >
> > > > now we noticed the commit is in mainline now.
> > >
> > > (Re-sending due to not Ccing people and the list...)
> > >
> > > Hi, I'm facing the same error on my testing environment.
> >
> > I should have clarified that the reason the kernel failed to allocate
> > memory on my machine was due to running out of memory, not because of the
> > vmalloc test module.
> >
> > But based on the fact that the test case (align_shift_alloc_test) is
> > expected to fail, the issue here is not memory allocation failure
> > itself, but rather that the kernel crashes when the allocation fails.
> >
> It looks like someone is running the test cases with CONFIG_TEST_VMALLOC=y,
> i.e. the built-in approach. Yes, it will trigger a lot of warnings, as some
> use cases are supposed to fail. The test robot or people may take these
> kernel warnings for a problem.
>
> In this case I can exclude those use cases, or even not run the tests at all
> when built in unless the boot parameters are set properly.

Sorry, I'm catching up on my email backlog. IIUC
https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
addresses this issue. Is my understanding correct?

>
> --
> Uladzislau Rezki

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-22 22:50   ` Suren Baghdasaryan
@ 2025-06-23  2:04     ` Harry Yoo
  2025-06-23  2:45     ` David Wang
  1 sibling, 0 replies; 18+ messages in thread
From: Harry Yoo @ 2025-06-23  2:04 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: David Wang, oliver.sang, urezki, ahuang12, akpm, bhe, hch,
	linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, kent.overstreet

On Sun, Jun 22, 2025 at 03:50:44PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
> >
> > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > >
> > > Hello,
> > >
> > > for this change, we reported
> > > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > > in
> > > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > >
> > > at that time, we made some tests with x86_64 config which runs well.
> > >
> > > now we noticed the commit is in mainline now.
> >
> > > the config still has expected diff with parent:
> > >
> > > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
> > > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
> > > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> > >  CONFIG_TEST_MISC_MINOR=m
> > >  # CONFIG_TEST_LKM is not set
> > >  CONFIG_TEST_BITOPS=m
> > > -CONFIG_TEST_VMALLOC=m
> > > +CONFIG_TEST_VMALLOC=y
> > >  # CONFIG_TEST_BPF is not set
> > >  CONFIG_FIND_BIT_BENCHMARK=m
> > >  # CONFIG_TEST_FIRMWARE is not set
> > >
> > >
> > > then we noticed similar random issue with x86_64 randconfig this time.
> > >
> > > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> > > ---------------- ---------------------------
> > >        fail:runs  %reproduction    fail:runs
> > >            |             |             |
> > >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
> > >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
> > >            :199         34%          67:200   dmesg.Mem-Info
> > >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> > >            :199         34%          67:200   dmesg.RIP:down_read_trylock
> > >
> > > we don't have enough knowledge to understand the relationship between code
> > > change and the random issues. just report what we obsverved in our tests FYI.
> > >
> >
> > I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
> >
> > vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> > memory allocation fails show_mem() would invoke alloc_tag_top_users.
> >
> > With following configuration:
> >
> > CONFIG_TEST_VMALLOC=y
> > CONFIG_MEM_ALLOC_PROFILING=y
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> > CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >
> > If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> > a NULL deference because alloc_tag_cttype was not init yet.
> >
> > I add some debug to confirm this theory
> > diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > index d48b80f3f007..9b8e7501010f 100644
> > --- a/lib/alloc_tag.c
> > +++ b/lib/alloc_tag.c
> > @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >         struct codetag *ct;
> >         struct codetag_bytes n;
> >         unsigned int i, nr = 0;
> > +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> > +       return 0;
> >
> >         if (can_sleep)
> >                 codetag_lock_module_list(alloc_tag_cttype, true);
> > @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
> >                 shutdown_mem_profiling(true);
> >                 return PTR_ERR(alloc_tag_cttype);
> >         }
> > +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >
> >         return 0;
> >  }
> >
> > When bootup the kernel, the log shows:
> >
> > $ sudo dmesg -T | grep profiling
> > [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
> > [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
> >
> >
> > vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> > or mem_show() should check whether alloc_tag is done initialized when calling
> > alloc_tag_top_users
> 
> Thanks for reporting!
> So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> will address this issue as well. Is that correct?

Yes, I verified that it addresses this issue.

> >
> > David
> >

-- 
Cheers,
Harry / Hyeonggon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-22 22:50   ` Suren Baghdasaryan
  2025-06-23  2:04     ` Harry Yoo
@ 2025-06-23  2:45     ` David Wang
  2025-06-23  3:16       ` David Wang
  2025-06-23 11:36       ` Uladzislau Rezki
  1 sibling, 2 replies; 18+ messages in thread
From: David Wang @ 2025-06-23  2:45 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
	linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet


At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>>
>> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>> >
>> > Hello,
>> >
>> > for this change, we reported
>> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
>> > in
>> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>> >
>> > at that time, we made some tests with x86_64 config which runs well.
>> >
>> > now we noticed the commit is in mainline now.
>>
>> > the config still has expected diff with parent:
>> >
>> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
>> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
>> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>> >  CONFIG_TEST_MISC_MINOR=m
>> >  # CONFIG_TEST_LKM is not set
>> >  CONFIG_TEST_BITOPS=m
>> > -CONFIG_TEST_VMALLOC=m
>> > +CONFIG_TEST_VMALLOC=y
>> >  # CONFIG_TEST_BPF is not set
>> >  CONFIG_FIND_BIT_BENCHMARK=m
>> >  # CONFIG_TEST_FIRMWARE is not set
>> >
>> >
>> > then we noticed similar random issue with x86_64 randconfig this time.
>> >
>> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>> > ---------------- ---------------------------
>> >        fail:runs  %reproduction    fail:runs
>> >            |             |             |
>> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
>> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
>> >            :199         34%          67:200   dmesg.Mem-Info
>> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
>> >
>> > we don't have enough knowledge to understand the relationship between code
>> > change and the random issues. just report what we obsverved in our tests FYI.
>> >
>>
>> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>>
>> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>>
>> With following configuration:
>>
>> CONFIG_TEST_VMALLOC=y
>> CONFIG_MEM_ALLOC_PROFILING=y
>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>>
>> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>> a NULL deference because alloc_tag_cttype was not init yet.
>>
>> I add some debug to confirm this theory
>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> index d48b80f3f007..9b8e7501010f 100644
>> --- a/lib/alloc_tag.c
>> +++ b/lib/alloc_tag.c
>> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>>         struct codetag *ct;
>>         struct codetag_bytes n;
>>         unsigned int i, nr = 0;
>> +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>> +       return 0;
>>
>>         if (can_sleep)
>>                 codetag_lock_module_list(alloc_tag_cttype, true);
>> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>>                 shutdown_mem_profiling(true);
>>                 return PTR_ERR(alloc_tag_cttype);
>>         }
>> +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>
>>         return 0;
>>  }
>>
>> When bootup the kernel, the log shows:
>>
>> $ sudo dmesg -T | grep profiling
>> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
>> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>>
>>
>> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>> or mem_show() should check whether alloc_tag is done initialized when calling
>> alloc_tag_top_users
>
>Thanks for reporting!
>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>will address this issue as well. Is that correct?

Yes, the panic can be fixed by that patch.

I still feel it would be better to delay vmalloc_test_init so that it runs after alloc_tag_init.
Or maybe we can promote alloc_tag_init to some earlier init level? I remember reporting that some
allocations were not registered by memory profiling during boot:
https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
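
To make the ordering argument concrete, here is a small userspace model
(a sketch, not kernel code) of how built-in initcalls are ordered: first
by level, then by link order within a level. The level names follow
include/linux/init.h, where module_init() of built-in code is a
device_initcall. The struct, the helper names, and the assumption that
alloc_tag_init is registered at the device level are mine, for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified subset of the initcall levels, in the order they run. */
enum model_level {
	MODEL_CORE,	/* core_initcall */
	MODEL_ROOTFS,	/* rootfs_initcall */
	MODEL_DEVICE,	/* device_initcall; module_init() of built-in code */
	MODEL_LATE,	/* late_initcall */
};

/* One registered initcall; the array index models link order. */
struct model_initcall {
	enum model_level level;
	const char *name;
};

/* Returns the index of the named call, or n if it is not registered. */
static size_t find_call(const struct model_initcall *calls, size_t n,
			const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return i;
	return n;
}

/* Counts the calls that run before calls[idx]: lower level, or same level and linked earlier. */
static size_t run_position(const struct model_initcall *calls, size_t n,
			   size_t idx)
{
	size_t i, pos = 0;

	for (i = 0; i < n; i++)
		if (calls[i].level < calls[idx].level ||
		    (calls[i].level == calls[idx].level && i < idx))
			pos++;
	return pos;
}

/* Returns nonzero if "first" is guaranteed to run before "second". */
static int runs_before(const struct model_initcall *calls, size_t n,
		       const char *first, const char *second)
{
	size_t a = find_call(calls, n, first);
	size_t b = find_call(calls, n, second);

	assert(a < n && b < n);
	return run_position(calls, n, a) < run_position(calls, n, b);
}
```

With both calls at the device level, the result depends only on which
one is linked first, which is the fragile case. Moving vmalloc_test_init
to the late level makes alloc_tag_init run first regardless of link
order in this model.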

I will run some tests and update later.


David


>
>>
>>
>>
>> David
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-23  2:45     ` David Wang
@ 2025-06-23  3:16       ` David Wang
  2025-06-23  4:39         ` David Wang
  2025-06-23 11:36       ` Uladzislau Rezki
  1 sibling, 1 reply; 18+ messages in thread
From: David Wang @ 2025-06-23  3:16 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
	linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet


At 2025-06-23 10:45:31, "David Wang" <00107082@163.com> wrote:
>
>At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>>On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>>>
>>> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>>> >
>>> > Hello,
>>> >
>>> > for this change, we reported
>>> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
>>> > in
>>> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>>> >
>>> > at that time, we made some tests with x86_64 config which runs well.
>>> >
>>> > now we noticed the commit is in mainline now.
>>>
>>> > the config still has expected diff with parent:
>>> >
>>> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
>>> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
>>> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>>> >  CONFIG_TEST_MISC_MINOR=m
>>> >  # CONFIG_TEST_LKM is not set
>>> >  CONFIG_TEST_BITOPS=m
>>> > -CONFIG_TEST_VMALLOC=m
>>> > +CONFIG_TEST_VMALLOC=y
>>> >  # CONFIG_TEST_BPF is not set
>>> >  CONFIG_FIND_BIT_BENCHMARK=m
>>> >  # CONFIG_TEST_FIRMWARE is not set
>>> >
>>> >
>>> > then we noticed similar random issue with x86_64 randconfig this time.
>>> >
>>> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>>> > ---------------- ---------------------------
>>> >        fail:runs  %reproduction    fail:runs
>>> >            |             |             |
>>> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
>>> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
>>> >            :199         34%          67:200   dmesg.Mem-Info
>>> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>>> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
>>> >
>>> > we don't have enough knowledge to understand the relationship between code
>>> > change and the random issues. just report what we obsverved in our tests FYI.
>>> >
>>>
>>> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>>>
>>> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>>> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>>>
>>> With following configuration:
>>>
>>> CONFIG_TEST_VMALLOC=y
>>> CONFIG_MEM_ALLOC_PROFILING=y
>>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>>>
>>> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>>> a NULL deference because alloc_tag_cttype was not init yet.
>>>
>>> I add some debug to confirm this theory
>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>>> index d48b80f3f007..9b8e7501010f 100644
>>> --- a/lib/alloc_tag.c
>>> +++ b/lib/alloc_tag.c
>>> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>>>         struct codetag *ct;
>>>         struct codetag_bytes n;
>>>         unsigned int i, nr = 0;
>>> +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>> +       return 0;
>>>
>>>         if (can_sleep)
>>>                 codetag_lock_module_list(alloc_tag_cttype, true);
>>> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>>>                 shutdown_mem_profiling(true);
>>>                 return PTR_ERR(alloc_tag_cttype);
>>>         }
>>> +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>>
>>>         return 0;
>>>  }
>>>
>>> When bootup the kernel, the log shows:
>>>
>>> $ sudo dmesg -T | grep profiling
>>> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
>>> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>>>
>>>
>>> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>>> or mem_show() should check whether alloc_tag is done initialized when calling
>>> alloc_tag_top_users
>>
>>Thanks for reporting!
>>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>>will address this issue as well. Is that correct?
>
>Yes, the panic can be fix by that patch.
>
>I still feel it better to delay vmalloc_test_init, make it happen after alloc_tag_init.
>Or, maybe we can promote alloc_tag_init to some early init? I remember reporting some allocation
>not registered by memory profiling during boot,  
>https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
>
>I will make some tests, and update later

The memory allocations in sched_init_domains happen quite early, maybe at core_initcall, while
alloc_tag_init needs rootfs, so it has to run after rootfs_initcall; there is no reasonable place to promote it to.
But I think this explains why some allocation counters are missed during boot: the allocations happen before alloc_tag_init.


Thanks
David

>
>
>David
>
>
>>
>>>
>>>
>>>
>>> David
>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-23  3:16       ` David Wang
@ 2025-06-23  4:39         ` David Wang
  0 siblings, 0 replies; 18+ messages in thread
From: David Wang @ 2025-06-23  4:39 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: oliver.sang, urezki, ahuang12, akpm, bhe, hch, linux-kernel,
	linux-mm, lkp, mjguzik, oe-lkp, harry.yoo, kent.overstreet


At 2025-06-23 11:16:15, "David Wang" <00107082@163.com> wrote:
>
>At 2025-06-23 10:45:31, "David Wang" <00107082@163.com> wrote:
>>
>>At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>>>On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>>>>
>>>> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > for this change, we reported
>>>> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
>>>> > in
>>>> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>>>> >
>>>> > at that time, we made some tests with x86_64 config which runs well.
>>>> >
>>>> > now we noticed the commit is in mainline now.
>>>>
>>>> > the config still has expected diff with parent:
>>>> >
>>>> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
>>>> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
>>>> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>>>> >  CONFIG_TEST_MISC_MINOR=m
>>>> >  # CONFIG_TEST_LKM is not set
>>>> >  CONFIG_TEST_BITOPS=m
>>>> > -CONFIG_TEST_VMALLOC=m
>>>> > +CONFIG_TEST_VMALLOC=y
>>>> >  # CONFIG_TEST_BPF is not set
>>>> >  CONFIG_FIND_BIT_BENCHMARK=m
>>>> >  # CONFIG_TEST_FIRMWARE is not set
>>>> >
>>>> >
>>>> > then we noticed similar random issue with x86_64 randconfig this time.
>>>> >
>>>> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>>>> > ---------------- ---------------------------
>>>> >        fail:runs  %reproduction    fail:runs
>>>> >            |             |             |
>>>> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
>>>> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
>>>> >            :199         34%          67:200   dmesg.Mem-Info
>>>> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>>>> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
>>>> >
>>>> > we don't have enough knowledge to understand the relationship between code
>>>> > change and the random issues. just report what we obsverved in our tests FYI.
>>>> >
>>>>
>>>> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>>>>
>>>> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>>>> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>>>>
>>>> With following configuration:
>>>>
>>>> CONFIG_TEST_VMALLOC=y
>>>> CONFIG_MEM_ALLOC_PROFILING=y
>>>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>>>> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>>>>
>>>> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>>>> a NULL deference because alloc_tag_cttype was not init yet.
>>>>
>>>> I add some debug to confirm this theory
>>>> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>>>> index d48b80f3f007..9b8e7501010f 100644
>>>> --- a/lib/alloc_tag.c
>>>> +++ b/lib/alloc_tag.c
>>>> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>>>>         struct codetag *ct;
>>>>         struct codetag_bytes n;
>>>>         unsigned int i, nr = 0;
>>>> +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>>> +       return 0;
>>>>
>>>>         if (can_sleep)
>>>>                 codetag_lock_module_list(alloc_tag_cttype, true);
>>>> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>>>>                 shutdown_mem_profiling(true);
>>>>                 return PTR_ERR(alloc_tag_cttype);
>>>>         }
>>>> +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>>>>
>>>>         return 0;
>>>>  }
>>>>
>>>> When bootup the kernel, the log shows:
>>>>
>>>> $ sudo dmesg -T | grep profiling
>>>> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
>>>> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>>>>
>>>>
>>>> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>>>> or mem_show() should check whether alloc_tag is done initialized when calling
>>>> alloc_tag_top_users
>>>
>>>Thanks for reporting!
>>>So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>>>will address this issue as well. Is that correct?
>>
>>Yes, the panic can be fix by that patch.
>>
>>I still feel it better to delay vmalloc_test_init, make it happen after alloc_tag_init.
>>Or, maybe we can promote alloc_tag_init to some early init? I remember reporting some allocation
>>not registered by memory profiling during boot,  
>>https://lore.kernel.org/all/213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com/
>>
>>I will make some tests, and update later
>
>The memory allocations in sched_init_domains happened quite early, maybe it is core_initcall, while
> alloc_tag_init needs rootfs, it needs to be after rootfs_initcall, so no reasonable place to promote.......
>But I think this explain why some allocation counter missed during boot: the allocation happened before alloc_tag_init

Sorry, I think I was wrong. The counters do not need alloc_tag_init.

sorry for bothering, please ignore my mumbo jumbo.

David

>
>
>Thanks
>David
>
>>
>>
>>David
>>
>>
>>>
>>>>
>>>>
>>>>
>>>> David
>>>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support?
  2025-06-22 22:54       ` Suren Baghdasaryan
@ 2025-06-23 11:29         ` Uladzislau Rezki
  0 siblings, 0 replies; 18+ messages in thread
From: Uladzislau Rezki @ 2025-06-23 11:29 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Uladzislau Rezki, Harry Yoo, kernel test robot, oe-lkp, lkp,
	linux-kernel, Andrew Morton, Baoquan He, Adrian Huang,
	Christop Hellwig, Mateusz Guzik, linux-mm, Kent Overstreet

On Sun, Jun 22, 2025 at 03:54:51PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 20, 2025 at 1:47 AM Uladzislau Rezki <urezki@gmail.com> wrote:
> >
> > On Fri, Jun 20, 2025 at 12:04:50AM +0900, Harry Yoo wrote:
> > > On Thu, Jun 19, 2025 at 11:10:43PM +0900, Harry Yoo wrote:
> > > > On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > for this change, we reported
> > > > > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> > > > > in
> > > > > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> > > > >
> > > > > at that time, we made some tests with x86_64 config which runs well.
> > > > >
> > > > > now we noticed the commit is in mainline now.
> > > >
> > > > (Re-sending due to not Ccing people and the list...)
> > > >
> > > > Hi, I'm facing the same error on my testing environment.
> > >
> > > I should have clarified that the reason the kernel failed to allocate
> > > memory on my machine was due to running out of memory, not because of the
> > > vmalloc test module.
> > >
> > > But based on the fact that the test case (align_shift_alloc_test) is
> > > expected to fail, the issue here is not memory allocation failure
> > > itself, but rather that the kernel crashes when the allocation fails.
> > >
> > It looks someone tries to test the CONFIG_TEST_VMALLOC=y as built-in
> > approach test-cases. Yes, it will trigger a lot of warnings as some
> > use cases are supposed to be failed. This will trigger a lot of kernel
> > warnings which can be considered by test-robot or people as problem.
> >
> > In this case i can exclude those use cases or even not run at all unless
> > boot-parameters properly sets if built-in.
> 
> Sorry, I'm catching up on my email backlog. IIUC
> https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> addresses this issue. Is my understanding correct?
> 
I tested the .config from the test robot in order to reproduce the
kernel crash. Unfortunately I cannot trigger it. But people in the
other thread have already confirmed that the patch fixes the crash.
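
For readers following along, the kind of guard discussed in this thread
can be modelled in userspace roughly as below. This is only a sketch under
stated assumptions, not the patch itself: cttype_model, init_model() and
top_users_model() are hypothetical stand-ins for alloc_tag_cttype,
alloc_tag_init() and alloc_tag_top_users(). It shows a reporting helper
that returns zero entries while the type pointer is still NULL instead of
dereferencing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for the codetag type that the init step sets up. */
struct codetag_type_model {
	size_t nr_tags;
};

static struct codetag_type_model model_storage = { .nr_tags = 3 };

/* Stays NULL until init_model() runs, like an uninitialized cttype. */
static struct codetag_type_model *cttype_model;

/* Models the init step that publishes the pointer. */
static void init_model(void)
{
	cttype_model = &model_storage;
}

/*
 * Models a "top users" report: returns how many entries it would fill in.
 * The early return is the guard. Without it, a caller that runs before
 * init_model() would dereference a NULL pointer.
 */
static size_t top_users_model(size_t count)
{
	if (!cttype_model)
		return 0;

	return count < cttype_model->nr_tags ? count : cttype_model->nr_tags;
}
```

Calling top_users_model() before init_model() yields 0, which corresponds
to the "report nothing" outcome rather than a crash.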

--
Uladzislau Rezki


* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-23  2:45     ` David Wang
  2025-06-23  3:16       ` David Wang
@ 2025-06-23 11:36       ` Uladzislau Rezki
  2025-06-23 13:20         ` David Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Uladzislau Rezki @ 2025-06-23 11:36 UTC (permalink / raw)
  To: David Wang
  Cc: Suren Baghdasaryan, oliver.sang, urezki, ahuang12, akpm, bhe, hch,
	linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, harry.yoo,
	kent.overstreet

On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
> 
> At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
> >>
> >> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
> >> >
> >> > Hello,
> >> >
> >> > for this change, we reported
> >> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
> >> > in
> >> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
> >> >
> >> > at that time, we made some tests with x86_64 config which runs well.
> >> >
> >> > now we noticed the commit is in mainline now.
> >>
> >> > the config still has expected diff with parent:
> >> >
> >> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
> >> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
> >> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
> >> >  CONFIG_TEST_MISC_MINOR=m
> >> >  # CONFIG_TEST_LKM is not set
> >> >  CONFIG_TEST_BITOPS=m
> >> > -CONFIG_TEST_VMALLOC=m
> >> > +CONFIG_TEST_VMALLOC=y
> >> >  # CONFIG_TEST_BPF is not set
> >> >  CONFIG_FIND_BIT_BENCHMARK=m
> >> >  # CONFIG_TEST_FIRMWARE is not set
> >> >
> >> >
> >> > then we noticed similar random issue with x86_64 randconfig this time.
> >> >
> >> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
> >> > ---------------- ---------------------------
> >> >        fail:runs  %reproduction    fail:runs
> >> >            |             |             |
> >> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
> >> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
> >> >            :199         34%          67:200   dmesg.Mem-Info
> >> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
> >> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
> >> >
> >> > we don't have enough knowledge to understand the relationship between code
> >> > change and the random issues. just report what we obsverved in our tests FYI.
> >> >
> >>
> >> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
> >>
> >> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
> >> memory allocation fails show_mem() would invoke alloc_tag_top_users.
> >>
> >> With following configuration:
> >>
> >> CONFIG_TEST_VMALLOC=y
> >> CONFIG_MEM_ALLOC_PROFILING=y
> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> >> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
> >>
> >> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
> >> a NULL deference because alloc_tag_cttype was not init yet.
> >>
> >> I add some debug to confirm this theory
> >> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> index d48b80f3f007..9b8e7501010f 100644
> >> --- a/lib/alloc_tag.c
> >> +++ b/lib/alloc_tag.c
> >> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >>         struct codetag *ct;
> >>         struct codetag_bytes n;
> >>         unsigned int i, nr = 0;
> >> +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >> +       return 0;
> >>
> >>         if (can_sleep)
> >>                 codetag_lock_module_list(alloc_tag_cttype, true);
> >> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
> >>                 shutdown_mem_profiling(true);
> >>                 return PTR_ERR(alloc_tag_cttype);
> >>         }
> >> +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
> >>
> >>         return 0;
> >>  }
> >>
> >> When bootup the kernel, the log shows:
> >>
> >> $ sudo dmesg -T | grep profiling
> >> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
> >> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
> >>
> >>
> >> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
> >> or mem_show() should check whether alloc_tag is done initialized when calling
> >> alloc_tag_top_users
> >
> >Thanks for reporting!
> >So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
> >will address this issue as well. Is that correct?
> 
> Yes, the panic can be fix by that patch.
> 
> I still feel it better to delay vmalloc_test_init, make it happen after alloc_tag_init.
>
We can, but then we would not have noticed the bug in question :)

At the least, I think we should exclude the tests that trigger warnings
when the test suite is run with the default configuration, i.e. run only
the tests that are not supposed to fail.
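
A rough userspace sketch of that idea follows. It is an illustration only:
struct test_case_model, cases_model[] and count_selected_model() are
hypothetical names, and the two "passing" entries are placeholders; only
align_shift_alloc_test is taken from this thread as a case that is expected
to fail. Each case carries a flag, and flagged cases are skipped unless the
caller explicitly asks for them.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test descriptor; the flag marks cases that fail by design. */
struct test_case_model {
	const char *name;
	bool expected_to_fail;
};

static const struct test_case_model cases_model[] = {
	{ "passing_case_a",         false },	/* placeholder name */
	{ "align_shift_alloc_test", true  },	/* expected to fail, per this thread */
	{ "passing_case_b",         false },	/* placeholder name */
};

/*
 * Returns how many cases would run. By default (false) the cases that are
 * expected to fail are filtered out, so a default run stays free of the
 * warnings they produce.
 */
static size_t count_selected_model(bool include_expected_failures)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(cases_model) / sizeof(cases_model[0]); i++) {
		if (cases_model[i].expected_to_fail && !include_expected_failures)
			continue;
		n++;
	}
	return n;
}
```

In a real module the boolean would more likely come from a boot or module
parameter. That wiring is left out here on purpose.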

--
Uladzislau Rezki


* Re: CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init
  2025-06-23 11:36       ` Uladzislau Rezki
@ 2025-06-23 13:20         ` David Wang
  0 siblings, 0 replies; 18+ messages in thread
From: David Wang @ 2025-06-23 13:20 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Suren Baghdasaryan, oliver.sang, ahuang12, akpm, bhe, hch,
	linux-kernel, linux-mm, lkp, mjguzik, oe-lkp, harry.yoo,
	kent.overstreet


At 2025-06-23 19:36:03, "Uladzislau Rezki" <urezki@gmail.com> wrote:
>On Mon, Jun 23, 2025 at 10:45:31AM +0800, David Wang wrote:
>> 
>> At 2025-06-23 06:50:44, "Suren Baghdasaryan" <surenb@google.com> wrote:
>> >On Fri, Jun 20, 2025 at 3:03 AM David Wang <00107082@163.com> wrote:
>> >>
>> >> On Wed, Jun 18, 2025 at 02:25:37PM +0800, kernel test robot wrote:
>> >> >
>> >> > Hello,
>> >> >
>> >> > for this change, we reported
>> >> > "[linux-next:master] [lib/test_vmalloc.c]  7fc85b92db: Mem-Info"
>> >> > in
>> >> > https://lore.kernel.org/all/202505071555.e757f1e0-lkp@intel.com/
>> >> >
>> >> > at that time, we made some tests with x86_64 config which runs well.
>> >> >
>> >> > now we noticed the commit is in mainline now.
>> >>
>> >> > the config still has expected diff with parent:
>> >> >
>> >> > --- /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/7a73348e5d4715b5565a53f21c01ea7b54e46cbd/.config   2025-06-17 14:40:29.481052101 +0800
>> >> > +++ /pkg/linux/x86_64-randconfig-161-20250614/gcc-12/2d76e79315e403aab595d4c8830b7a46c19f0f3b/.config   2025-06-17 14:41:18.448543738 +0800
>> >> > @@ -7551,7 +7551,7 @@ CONFIG_TEST_IDA=m
>> >> >  CONFIG_TEST_MISC_MINOR=m
>> >> >  # CONFIG_TEST_LKM is not set
>> >> >  CONFIG_TEST_BITOPS=m
>> >> > -CONFIG_TEST_VMALLOC=m
>> >> > +CONFIG_TEST_VMALLOC=y
>> >> >  # CONFIG_TEST_BPF is not set
>> >> >  CONFIG_FIND_BIT_BENCHMARK=m
>> >> >  # CONFIG_TEST_FIRMWARE is not set
>> >> >
>> >> >
>> >> > then we noticed similar random issue with x86_64 randconfig this time.
>> >> >
>> >> > 7a73348e5d4715b5 2d76e79315e403aab595d4c8830
>> >> > ---------------- ---------------------------
>> >> >        fail:runs  %reproduction    fail:runs
>> >> >            |             |             |
>> >> >            :199         34%          67:200   dmesg.KASAN:null-ptr-deref_in_range[#-#]
>> >> >            :199         34%          67:200   dmesg.Kernel_panic-not_syncing:Fatal_exception
>> >> >            :199         34%          67:200   dmesg.Mem-Info
>> >> >            :199         34%          67:200   dmesg.Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]SMP_KASAN
>> >> >            :199         34%          67:200   dmesg.RIP:down_read_trylock
>> >> >
>> >> > we don't have enough knowledge to understand the relationship between code
>> >> > change and the random issues. just report what we obsverved in our tests FYI.
>> >> >
>> >>
>> >> I think this is caused by a race between vmalloc_test_init and alloc_tag_init.
>> >>
>> >> vmalloc_test actually depends on alloc_tag via alloc_tag_top_users, because when
>> >> memory allocation fails show_mem() would invoke alloc_tag_top_users.
>> >>
>> >> With following configuration:
>> >>
>> >> CONFIG_TEST_VMALLOC=y
>> >> CONFIG_MEM_ALLOC_PROFILING=y
>> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> >> CONFIG_MEM_ALLOC_PROFILING_DEBUG=y
>> >>
>> >> If vmalloc_test_init starts before alloc_tag_init, show_mem() would cause
>> >> a NULL deference because alloc_tag_cttype was not init yet.
>> >>
>> >> I add some debug to confirm this theory
>> >> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >> index d48b80f3f007..9b8e7501010f 100644
>> >> --- a/lib/alloc_tag.c
>> >> +++ b/lib/alloc_tag.c
>> >> @@ -133,6 +133,8 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> >>         struct codetag *ct;
>> >>         struct codetag_bytes n;
>> >>         unsigned int i, nr = 0;
>> >> +       pr_info("memory profiling alloc top %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>> >> +       return 0;
>> >>
>> >>         if (can_sleep)
>> >>                 codetag_lock_module_list(alloc_tag_cttype, true);
>> >> @@ -831,6 +833,7 @@ static int __init alloc_tag_init(void)
>> >>                 shutdown_mem_profiling(true);
>> >>                 return PTR_ERR(alloc_tag_cttype);
>> >>         }
>> >> +       pr_info("memory profiling ready %d: %llx\n", mem_profiling_support, (long long)alloc_tag_cttype);
>> >>
>> >>         return 0;
>> >>  }
>> >>
>> >> When bootup the kernel, the log shows:
>> >>
>> >> $ sudo dmesg -T | grep profiling
>> >> [Fri Jun 20 17:29:35 2025] memory profiling alloc top 1: 0  <--- alloc_tag_cttype == NULL
>> >> [Fri Jun 20 17:30:24 2025] memory profiling ready 1: ffff9b1641aa06c0
>> >>
>> >>
>> >> vmalloc_test_init should happened after alloc_tag_init if CONFIG_TEST_VMALLOC=y,
>> >> or mem_show() should check whether alloc_tag is done initialized when calling
>> >> alloc_tag_top_users
>> >
>> >Thanks for reporting!
>> >So, IIUC https://lore.kernel.org/all/20250620195305.1115151-1-harry.yoo@oracle.com/
>> >will address this issue as well. Is that correct?
>> 
>> Yes, the panic can be fix by that patch.
>> 
>> I still feel it better to delay vmalloc_test_init, make it happen after alloc_tag_init.
>>
>We can, but then we would not notice the bag that is in question :)

Yes, we were strangely lucky here :)
I was wondering: if some vmalloc tests fail, is the alloc_tag_top_users output helpful for debugging?
Since this bug has already been caught, if alloc_tag_top_users is helpful for analysing vmalloc test failures,
maybe it is still reasonable to delay vmalloc_test_init?
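
To make the ordering argument concrete, here is a small userspace model of
initcall levels. It is a sketch, not kernel code: struct initcall_model and
run_position_model() are hypothetical, and it assumes that both init
functions currently sit at the device level (where a built-in module_init()
lands) and that order inside one level follows table order, standing in for
link order.

```c
#include <stddef.h>
#include <string.h>

/* Two of the kernel's initcall levels; lower levels run earlier at boot. */
enum initcall_level_model {
	LEVEL_DEVICE_MODEL = 6,	/* built-in module_init() lands here */
	LEVEL_LATE_MODEL   = 7,	/* late_initcall() */
};

struct initcall_model {
	const char *name;
	int level;
};

/*
 * Returns the 0-based position at which `name` would run when entries are
 * executed level by level, keeping table order inside a level.
 * Returns -1 if `name` is not in the table.
 */
static int run_position_model(const struct initcall_model *calls, size_t n,
			      const char *name)
{
	int level, pos = 0;
	size_t i;

	for (level = 0; level <= LEVEL_LATE_MODEL; level++) {
		for (i = 0; i < n; i++) {
			if (calls[i].level != level)
				continue;
			if (strcmp(calls[i].name, name) == 0)
				return pos;
			pos++;
		}
	}
	return -1;
}
```

With both entries at LEVEL_DEVICE_MODEL the result depends only on table
order. Moving vmalloc_test_init to LEVEL_LATE_MODEL makes it run after
alloc_tag_init regardless of that order.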

>
>At least we should, i think, to exclude the tests which trigger warnings
>when the test-suite is run with default configurations, i.e. run the tests
>which are not supposed to fail.



>
>--
>Uladzislau Rezki


end of thread, other threads:[~2025-06-23 13:21 UTC | newest]

Thread overview: 18+ messages
2025-06-18  6:25 [linus:master] [lib/test_vmalloc.c] 2d76e79315: Kernel_panic-not_syncing:Fatal_exception kernel test robot
2025-06-19 14:10 ` Kernel crash due to alloc_tag_top_users() being called when !mem_profiling_support? Harry Yoo
2025-06-19 15:04   ` Harry Yoo
2025-06-20  8:47     ` Uladzislau Rezki
2025-06-22 22:54       ` Suren Baghdasaryan
2025-06-23 11:29         ` Uladzislau Rezki
2025-06-19 15:08   ` David Wang
2025-06-20  1:14     ` Harry Yoo
2025-06-20 10:02 ` CONFIG_TEST_VMALLOC=y conflict/race with alloc_tag_init David Wang
2025-06-22 22:50   ` Suren Baghdasaryan
2025-06-23  2:04     ` Harry Yoo
2025-06-23  2:45     ` David Wang
2025-06-23  3:16       ` David Wang
2025-06-23  4:39         ` David Wang
2025-06-23 11:36       ` Uladzislau Rezki
2025-06-23 13:20         ` David Wang
2025-06-20 14:24 ` [PATCH] lib/test_vmalloc.c: demote vmalloc_test_init to late_initcall David Wang
2025-06-20 19:59   ` Harry Yoo
