* [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
@ 2025-06-24 7:25 Harry Yoo
2025-06-24 8:21 ` David Wang
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 7:25 UTC (permalink / raw)
To: akpm, surenb, kent.overstreet
Cc: oliver.sang, 00107082, cachen, linux-mm, oe-lkp, Harry Yoo,
stable
alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
even when the alloc_tag_cttype is not allocated because:
1) alloc tagging is disabled because mem profiling is disabled
(!alloc_tag_cttype)
2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
3) alloc tagging is enabled, but failed initialization
(!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
In all cases, alloc_tag_cttype is not allocated, and therefore
alloc_tag_top_users() should not attempt to acquire the semaphore.
This leads to a crash on memory allocation failure by attempting to
acquire a non-existent semaphore:
Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
Tainted: [D]=DIE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:down_read_trylock+0xaa/0x3b0
Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
Call Trace:
<TASK>
codetag_trylock_module_list+0xd/0x20
alloc_tag_top_users+0x369/0x4b0
__show_mem+0x1cd/0x6e0
warn_alloc+0x2b1/0x390
__alloc_frozen_pages_noprof+0x12b9/0x21a0
alloc_pages_mpol+0x135/0x3e0
alloc_slab_page+0x82/0xe0
new_slab+0x212/0x240
___slab_alloc+0x82a/0xe00
</TASK>
As David Wang points out, this issue became easier to trigger after commit
780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
Before the commit, the issue occurred only when it failed to allocate
and initialize alloc_tag_cttype or if a memory allocation fails before
alloc_tag_init() is called. After the commit, it can be easily triggered
when memory profiling is compiled but disabled at boot.
To properly determine whether alloc_tag_init() has been called and
its data structures initialized, verify that alloc_tag_cttype is a valid
pointer before acquiring the semaphore. If the variable is NULL or an error
value, it has not been properly initialized. In such a case, just skip
and do not attempt to acquire the semaphore.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Closes: https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com
Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---
@Suren: I did not add another pr_warn() because every error path in
alloc_tag_init() already has pr_err().
v2 -> v3:
- Added another Closes: tag (David)
- Moved the condition into a standalone if block for better readability
(Suren)
- Typo fix (Suren)
lib/alloc_tag.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 41ccfb035b7b..e9b33848700a 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
struct codetag_bytes n;
unsigned int i, nr = 0;
+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
+ return 0;
+
if (can_sleep)
codetag_lock_module_list(alloc_tag_cttype, true);
else if (!codetag_trylock_module_list(alloc_tag_cttype))
--
2.43.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re:[PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 7:25 [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
@ 2025-06-24 8:21 ` David Wang
2025-06-24 9:09 ` [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y Harry Yoo
2025-06-24 13:25 ` Re:[PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() David Wang
2025-06-30 6:25 ` Raghavendra K T
2 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 8:21 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>even when the alloc_tag_cttype is not allocated because:
>
> 1) alloc tagging is disabled because mem profiling is disabled
> (!alloc_tag_cttype)
> 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> 3) alloc tagging is enabled, but failed initialization
> (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>
>In all cases, alloc_tag_cttype is not allocated, and therefore
>alloc_tag_top_users() should not attempt to acquire the semaphore.
>
>This leads to a crash on memory allocation failure by attempting to
>acquire a non-existent semaphore:
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> Tainted: [D]=DIE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:down_read_trylock+0xaa/0x3b0
> Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> codetag_trylock_module_list+0xd/0x20
> alloc_tag_top_users+0x369/0x4b0
> __show_mem+0x1cd/0x6e0
> warn_alloc+0x2b1/0x390
> __alloc_frozen_pages_noprof+0x12b9/0x21a0
> alloc_pages_mpol+0x135/0x3e0
> alloc_slab_page+0x82/0xe0
> new_slab+0x212/0x240
> ___slab_alloc+0x82a/0xe00
> </TASK>
>
>As David Wang points out, this issue became easier to trigger after commit
>780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>
>Before the commit, the issue occurred only when it failed to allocate
>and initialize alloc_tag_cttype or if a memory allocation fails before
>alloc_tag_init() is called. After the commit, it can be easily triggered
>when memory profiling is compiled but disabled at boot.
>
>To properly determine whether alloc_tag_init() has been called and
>its data structures initialized, verify that alloc_tag_cttype is a valid
>pointer before acquiring the semaphore. If the variable is NULL or an error
>value, it has not been properly initialized. In such a case, just skip
>and do not attempt to acquire the semaphore.
>
>Reported-by: kernel test robot <oliver.sang@intel.com>
>Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>Closes: https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com
>Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>Cc: stable@vger.kernel.org
>Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>---
>
>@Suren: I did not add another pr_warn() because every error path in
>alloc_tag_init() already has pr_err().
>
>v2 -> v3:
>- Added another Closes: tag (David)
>- Moved the condition into a standalone if block for better readability
> (Suren)
>- Typo fix (Suren)
>
> lib/alloc_tag.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>index 41ccfb035b7b..e9b33848700a 100644
>--- a/lib/alloc_tag.c
>+++ b/lib/alloc_tag.c
>@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
>+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
if (mem_profiling_support) pr_warn("...
>+ return 0;
>+
> if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
>--
>2.43.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 8:21 ` David Wang
@ 2025-06-24 9:09 ` Harry Yoo
2025-06-24 9:30 ` David Wang
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 9:09 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >even when the alloc_tag_cttype is not allocated because:
> >
> > 1) alloc tagging is disabled because mem profiling is disabled
> > (!alloc_tag_cttype)
> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > 3) alloc tagging is enabled, but failed initialization
> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >
> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >
> >This leads to a crash on memory allocation failure by attempting to
> >acquire a non-existent semaphore:
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > Tainted: [D]=DIE
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > codetag_trylock_module_list+0xd/0x20
> > alloc_tag_top_users+0x369/0x4b0
> > __show_mem+0x1cd/0x6e0
> > warn_alloc+0x2b1/0x390
> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > alloc_pages_mpol+0x135/0x3e0
> > alloc_slab_page+0x82/0xe0
> > new_slab+0x212/0x240
> > ___slab_alloc+0x82a/0xe00
> > </TASK>
> >
> >As David Wang points out, this issue became easier to trigger after commit
> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >
> >Before the commit, the issue occurred only when it failed to allocate
> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >when memory profiling is compiled but disabled at boot.
> >
> >To properly determine whether alloc_tag_init() has been called and
> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >value, it has not been properly initialized. In such a case, just skip
> >and do not attempt to acquire the semaphore.
> >
> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >Cc: stable@vger.kernel.org
> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >---
> >
> >@Suren: I did not add another pr_warn() because every error path in
> >alloc_tag_init() already has pr_err().
> >
> >v2 -> v3:
> >- Added another Closes: tag (David)
> >- Moved the condition into a standalone if block for better readability
> > (Suren)
> >- Typo fix (Suren)
> >
> > lib/alloc_tag.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >index 41ccfb035b7b..e9b33848700a 100644
> >--- a/lib/alloc_tag.c
> >+++ b/lib/alloc_tag.c
> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> >
> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>
> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
> if (mem_profiling_support) pr_warn("...
I think you're saying we need to print a warning when alloc tagging
can't provide "top users".
And there can be three different reasons why it can't provide them:
1) alloc_tag_cttype is not ready yet or mem profiling is disabled.
2) the context can't sleep and trylock failed.
3) alloc tags do not exist.
I think that makes sense, but it should be a new feature (as a separate
patch) and not a -stable material?
If you're interested in doing this, please feel free to proceed.
It will look like this:
sh invoked oom-killer: gfp_mask=0x140dca(GFP_HIGHUSER_MOVABLE|_0
[... snip ...]
Mem-Info:
active_anon:467412 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:872 slab_unreclaimable:4769
mapped:833 shmem:49252 pagetables:930
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:15416 free_pcp:7714 free_cma:0
[... snip ...]
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
524158 pages RAM
0 pages HighMem/MovableOnly
22064 pages reserved
0 pages cma reserved
0 pages hwpoisoned
No memory allocation info available: Alloc tagging subsystem not initialized || Context cannot sleep || No alloc tags recorded
[ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[ 105] 0 105 3065 1003 402 0 601 65536 0 0 systemd-udevd
[ 171] 0 171 775610 418124 417661 0 463 3416064 0 0 sh
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/,task=sh,pid=171,uid=0
Out of memory: Killed process 171 (sh) total-vm:3102440kB, anon-rss:1670644kB, file-rss:0kB, shmem-rss:1852kB, UID:0 pgtables:3336kB oom_score_adj:0
> >+ return 0;
> >+
> > if (can_sleep)
> > codetag_lock_module_list(alloc_tag_cttype, true);
> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> >--
> >2.43.0
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 9:09 ` [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y Harry Yoo
@ 2025-06-24 9:30 ` David Wang
2025-06-24 10:59 ` Harry Yoo
0 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 9:30 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
>> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>> >even when the alloc_tag_cttype is not allocated because:
>> >
>> > 1) alloc tagging is disabled because mem profiling is disabled
>> > (!alloc_tag_cttype)
>> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>> > 3) alloc tagging is enabled, but failed initialization
>> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>> >
>> >In all cases, alloc_tag_cttype is not allocated, and therefore
>> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>> >
>> >This leads to a crash on memory allocation failure by attempting to
>> >acquire a non-existent semaphore:
>> >
>> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> > Tainted: [D]=DIE
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> > Call Trace:
>> > <TASK>
>> > codetag_trylock_module_list+0xd/0x20
>> > alloc_tag_top_users+0x369/0x4b0
>> > __show_mem+0x1cd/0x6e0
>> > warn_alloc+0x2b1/0x390
>> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> > alloc_pages_mpol+0x135/0x3e0
>> > alloc_slab_page+0x82/0xe0
>> > new_slab+0x212/0x240
>> > ___slab_alloc+0x82a/0xe00
>> > </TASK>
>> >
>> >As David Wang points out, this issue became easier to trigger after commit
>> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> >
>> >Before the commit, the issue occurred only when it failed to allocate
>> >and initialize alloc_tag_cttype or if a memory allocation fails before
>> >alloc_tag_init() is called. After the commit, it can be easily triggered
>> >when memory profiling is compiled but disabled at boot.
>> >
>> >To properly determine whether alloc_tag_init() has been called and
>> >its data structures initialized, verify that alloc_tag_cttype is a valid
>> >pointer before acquiring the semaphore. If the variable is NULL or an error
>> >value, it has not been properly initialized. In such a case, just skip
>> >and do not attempt to acquire the semaphore.
>> >
>> >Reported-by: kernel test robot <oliver.sang@intel.com>
>> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
>> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
>> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>> >Cc: stable@vger.kernel.org
>> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> >---
>> >
>> >@Suren: I did not add another pr_warn() because every error path in
>> >alloc_tag_init() already has pr_err().
>> >
>> >v2 -> v3:
>> >- Added another Closes: tag (David)
>> >- Moved the condition into a standalone if block for better readability
>> > (Suren)
>> >- Typo fix (Suren)
>> >
>> > lib/alloc_tag.c | 3 +++
>> > 1 file changed, 3 insertions(+)
>> >
>> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >index 41ccfb035b7b..e9b33848700a 100644
>> >--- a/lib/alloc_tag.c
>> >+++ b/lib/alloc_tag.c
>> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> > struct codetag_bytes n;
>> > unsigned int i, nr = 0;
>> >
>> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>>
>> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
>> if (mem_profiling_support) pr_warn("...
>
>I think you're saying we need to print a warning when alloc tagging
>can't provide "top users".
I just meant printing a warning when show_mem is needed before codetag module initialized,
as reported in https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/
where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
>
>And there can be three different reasons why it can't provide them:
>
>1) alloc_tag_cttype is not ready yet or mem profiling is disabled.
>2) the context can't sleep and trylock failed.
This case is not just about warning, it is a bug if possible.
>3) alloc tags do not exist.
>
>--
>Cheers,
>Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 9:30 ` David Wang
@ 2025-06-24 10:59 ` Harry Yoo
2025-06-24 11:28 ` David Wang
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 10:59 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 05:30:02PM +0800, David Wang wrote:
>
> At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >> >even when the alloc_tag_cttype is not allocated because:
> >> >
> >> > 1) alloc tagging is disabled because mem profiling is disabled
> >> > (!alloc_tag_cttype)
> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> >> > 3) alloc tagging is enabled, but failed initialization
> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >> >
> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >> >
> >> >This leads to a crash on memory allocation failure by attempting to
> >> >acquire a non-existent semaphore:
> >> >
> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> >> > Tainted: [D]=DIE
> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> >> > Call Trace:
> >> > <TASK>
> >> > codetag_trylock_module_list+0xd/0x20
> >> > alloc_tag_top_users+0x369/0x4b0
> >> > __show_mem+0x1cd/0x6e0
> >> > warn_alloc+0x2b1/0x390
> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> >> > alloc_pages_mpol+0x135/0x3e0
> >> > alloc_slab_page+0x82/0xe0
> >> > new_slab+0x212/0x240
> >> > ___slab_alloc+0x82a/0xe00
> >> > </TASK>
> >> >
> >> >As David Wang points out, this issue became easier to trigger after commit
> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >> >
> >> >Before the commit, the issue occurred only when it failed to allocate
> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >> >when memory profiling is compiled but disabled at boot.
> >> >
> >> >To properly determine whether alloc_tag_init() has been called and
> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >> >value, it has not been properly initialized. In such a case, just skip
> >> >and do not attempt to acquire the semaphore.
> >> >
> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >> >Cc: stable@vger.kernel.org
> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >> >---
> >> >
> >> >@Suren: I did not add another pr_warn() because every error path in
> >> >alloc_tag_init() already has pr_err().
> >> >
> >> >v2 -> v3:
> >> >- Added another Closes: tag (David)
> >> >- Moved the condition into a standalone if block for better readability
> >> > (Suren)
> >> >- Typo fix (Suren)
> >> >
> >> > lib/alloc_tag.c | 3 +++
> >> > 1 file changed, 3 insertions(+)
> >> >
> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> >index 41ccfb035b7b..e9b33848700a 100644
> >> >--- a/lib/alloc_tag.c
> >> >+++ b/lib/alloc_tag.c
> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >> > struct codetag_bytes n;
> >> > unsigned int i, nr = 0;
> >> >
> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >>
> >> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
> >> if (mem_profiling_support) pr_warn("...
> >
> >I think you're saying we need to print a warning when alloc tagging
> >can't provide "top users".
>
> I just meant printing a warning when show_mem is needed before codetag module initialized,
> as reported in https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/__;!!ACWV5N9M2RV99hQ!J2waTUro8owaYlpAZJ6fnrHZvcGMbY6qAO5QvvIGZzUv-ryWjCjhO-maTOolfpPvPSr6CpqOgkRalCwJow$
> where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
> This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
Ok. You didn't mean that.
But still I think it's better to handle all cases and print distinct
warnings, rather than handling only the specific case where memory profiling
is enabled but not yet initialized.
Users will want to know why allocation information is not available,
and there can be multiple reasons including the one you mentioned.
What do you think?
> >And there can be three different reasons why it can't provide them:
> >
> >1) alloc_tag_cttype is not ready yet or mem profiling is disabled.
> >2) the context can't sleep and trylock failed.
>
> This case is not just about warning, it is a bug if possible.
Why do you think it is a bug? If trylock fails, alloc_tag_top_users()
returns 0 and do nothing.
> >3) alloc tags do not exist.
> >
>
> >--
> >Cheers,
> >Harry / Hyeonggon
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 10:59 ` Harry Yoo
@ 2025-06-24 11:28 ` David Wang
2025-06-24 13:15 ` David Wang
0 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 11:28 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 18:59:52, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>On Tue, Jun 24, 2025 at 05:30:02PM +0800, David Wang wrote:
>>
>> At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
>> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>> >> >even when the alloc_tag_cttype is not allocated because:
>> >> >
>> >> > 1) alloc tagging is disabled because mem profiling is disabled
>> >> > (!alloc_tag_cttype)
>> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>> >> > 3) alloc tagging is enabled, but failed initialization
>> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>> >> >
>> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
>> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>> >> >
>> >> >This leads to a crash on memory allocation failure by attempting to
>> >> >acquire a non-existent semaphore:
>> >> >
>> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> >> > Tainted: [D]=DIE
>> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> >> > Call Trace:
>> >> > <TASK>
>> >> > codetag_trylock_module_list+0xd/0x20
>> >> > alloc_tag_top_users+0x369/0x4b0
>> >> > __show_mem+0x1cd/0x6e0
>> >> > warn_alloc+0x2b1/0x390
>> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> >> > alloc_pages_mpol+0x135/0x3e0
>> >> > alloc_slab_page+0x82/0xe0
>> >> > new_slab+0x212/0x240
>> >> > ___slab_alloc+0x82a/0xe00
>> >> > </TASK>
>> >> >
>> >> >As David Wang points out, this issue became easier to trigger after commit
>> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> >> >
>> >> >Before the commit, the issue occurred only when it failed to allocate
>> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
>> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
>> >> >when memory profiling is compiled but disabled at boot.
>> >> >
>> >> >To properly determine whether alloc_tag_init() has been called and
>> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
>> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
>> >> >value, it has not been properly initialized. In such a case, just skip
>> >> >and do not attempt to acquire the semaphore.
>> >> >
>> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
>> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>> >> >Cc: stable@vger.kernel.org
>> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> >> >---
>> >> >
>> >> >@Suren: I did not add another pr_warn() because every error path in
>> >> >alloc_tag_init() already has pr_err().
>> >> >
>> >> >v2 -> v3:
>> >> >- Added another Closes: tag (David)
>> >> >- Moved the condition into a standalone if block for better readability
>> >> > (Suren)
>> >> >- Typo fix (Suren)
>> >> >
>> >> > lib/alloc_tag.c | 3 +++
>> >> > 1 file changed, 3 insertions(+)
>> >> >
>> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >> >index 41ccfb035b7b..e9b33848700a 100644
>> >> >--- a/lib/alloc_tag.c
>> >> >+++ b/lib/alloc_tag.c
>> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> >> > struct codetag_bytes n;
>> >> > unsigned int i, nr = 0;
>> >> >
>> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>> >>
>> >> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
>> >> if (mem_profiling_support) pr_warn("...
>> >
>> >I think you're saying we need to print a warning when alloc tagging
>> >can't provide "top users".
>>
>> I just meant printing a warning when show_mem is needed before codetag module initialized,
>> as reported in https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/__;!!ACWV5N9M2RV99hQ!J2waTUro8owaYlpAZJ6fnrHZvcGMbY6qAO5QvvIGZzUv-ryWjCjhO-maTOolfpPvPSr6CpqOgkRalCwJow$
>> where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
>> This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
>
>Ok. You didn't mean that.
>
>But still I think it's better to handle all cases and print distinct
>warnings, rather than handling only the specific case where memory profiling
>is enabled but not yet initialized.
>
>Users will want to know why allocation information is not available,
>and there can be multiple reasons including the one you mentioned.
>
>What do you think?
I am not sure....
I think most cases you mentioned is just a pr_info, those are expected behavior or designed that way.
But when mem_profiling_support==1 && alloc_tag_cttype==NULL, this is an unexpected behavior, which is a pr_warn.
>
>> >And there can be three different reasons why it can't provide them:
>> >
>> >1) alloc_tag_cttype is not ready yet or mem profiling is disabled.
>> >2) the context can't sleep and trylock failed.
>>
>> This case is not just about warning, it is a bug if possible.
>
>Why do you think it is a bug? If trylock fails, alloc_tag_top_users()
>returns 0 and do nothing.
Oh, I was wrong, I thought this is about failure to acquire a lock. (not trylock.)
>
>> >3) alloc tags do not exist.
>> >
>>
>> >--
>> >Cheers,
>> >Harry / Hyeonggon
>
>--
>Cheers,
>Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 11:28 ` David Wang
@ 2025-06-24 13:15 ` David Wang
2025-06-24 14:24 ` Harry Yoo
0 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 13:15 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 19:28:18, "David Wang" <00107082@163.com> wrote:
>
>At 2025-06-24 18:59:52, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>>On Tue, Jun 24, 2025 at 05:30:02PM +0800, David Wang wrote:
>>>
>>> At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>>> >On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
>>> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>>> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>>> >> >even when the alloc_tag_cttype is not allocated because:
>>> >> >
>>> >> > 1) alloc tagging is disabled because mem profiling is disabled
>>> >> > (!alloc_tag_cttype)
>>> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>>> >> > 3) alloc tagging is enabled, but failed initialization
>>> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>>> >> >
>>> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
>>> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>>> >> >
>>> >> >This leads to a crash on memory allocation failure by attempting to
>>> >> >acquire a non-existent semaphore:
>>> >> >
>>> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>>> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>>> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>>> >> > Tainted: [D]=DIE
>>> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>>> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>>> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>>> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>>> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>>> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>>> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>>> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>>> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>>> >> > Call Trace:
>>> >> > <TASK>
>>> >> > codetag_trylock_module_list+0xd/0x20
>>> >> > alloc_tag_top_users+0x369/0x4b0
>>> >> > __show_mem+0x1cd/0x6e0
>>> >> > warn_alloc+0x2b1/0x390
>>> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>>> >> > alloc_pages_mpol+0x135/0x3e0
>>> >> > alloc_slab_page+0x82/0xe0
>>> >> > new_slab+0x212/0x240
>>> >> > ___slab_alloc+0x82a/0xe00
>>> >> > </TASK>
>>> >> >
>>> >> >As David Wang points out, this issue became easier to trigger after commit
>>> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>>> >> >
>>> >> >Before the commit, the issue occurred only when it failed to allocate
>>> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
>>> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
>>> >> >when memory profiling is compiled but disabled at boot.
>>> >> >
>>> >> >To properly determine whether alloc_tag_init() has been called and
>>> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
>>> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
>>> >> >value, it has not been properly initialized. In such a case, just skip
>>> >> >and do not attempt to acquire the semaphore.
>>> >> >
>>> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
>>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
>>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
>>> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>>> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>>> >> >Cc: stable@vger.kernel.org
>>> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>>> >> >---
>>> >> >
>>> >> >@Suren: I did not add another pr_warn() because every error path in
>>> >> >alloc_tag_init() already has pr_err().
>>> >> >
>>> >> >v2 -> v3:
>>> >> >- Added another Closes: tag (David)
>>> >> >- Moved the condition into a standalone if block for better readability
>>> >> > (Suren)
>>> >> >- Typo fix (Suren)
>>> >> >
>>> >> > lib/alloc_tag.c | 3 +++
>>> >> > 1 file changed, 3 insertions(+)
>>> >> >
>>> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>>> >> >index 41ccfb035b7b..e9b33848700a 100644
>>> >> >--- a/lib/alloc_tag.c
>>> >> >+++ b/lib/alloc_tag.c
>>> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>>> >> > struct codetag_bytes n;
>>> >> > unsigned int i, nr = 0;
>>> >> >
>>> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>>> >>
>>> >> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
>>> >> if (mem_profiling_support) pr_warn("...
>>> >
>>> >I think you're saying we need to print a warning when alloc tagging
>>> >can't provide "top users".
>>>
>>> I just meant printing a warning when show_mem is needed before codetag module initialized,
>>> as reported in https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/__;!!ACWV5N9M2RV99hQ!J2waTUro8owaYlpAZJ6fnrHZvcGMbY6qAO5QvvIGZzUv-ryWjCjhO-maTOolfpPvPSr6CpqOgkRalCwJow$
>>> where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
>>> This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
>>
>>Ok. You didn't mean that.
>>
>>But still I think it's better to handle all cases and print distinct
>>warnings, rather than handling only the specific case where memory profiling
>>is enabled but not yet initialized.
>>
>>Users will want to know why allocation information is not available,
>>and there can be multiple reasons including the one you mentioned.
>>
>>What do you think?
>
>I am not sure....
>I think most cases you mentioned is just a pr_info, those are expected behavior or designed that way.
>But when mem_profiling_support==1 && alloc_tag_cttype==NULL, this is an unexpected behavior, which is a pr_warn.
Put it in a clearer way, so far we have identified two "error" conditions:
1. mem_profiling_support=1 but initialization for alloc_tag_cttype failed, "alloc_tag_init() already has pr_err()", as you mentioned.
2. mem_profiling_support=1 , but codetag module have not been init yet. I suggested adding a pr_warn here.
Other situations are designed that way, we can add pr_info if it is really necessary....
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re:[PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 7:25 [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
2025-06-24 8:21 ` David Wang
@ 2025-06-24 13:25 ` David Wang
2025-06-24 13:50 ` [PATCH " Harry Yoo
2025-06-30 6:25 ` Raghavendra K T
2 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 13:25 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>even when the alloc_tag_cttype is not allocated because:
>
> 1) alloc tagging is disabled because mem profiling is disabled
> (!alloc_tag_cttype)
> 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> 3) alloc tagging is enabled, but failed initialization
> (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>
>In all cases, alloc_tag_cttype is not allocated, and therefore
>alloc_tag_top_users() should not attempt to acquire the semaphore.
>
>This leads to a crash on memory allocation failure by attempting to
>acquire a non-existent semaphore:
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> Tainted: [D]=DIE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:down_read_trylock+0xaa/0x3b0
> Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> codetag_trylock_module_list+0xd/0x20
> alloc_tag_top_users+0x369/0x4b0
> __show_mem+0x1cd/0x6e0
> warn_alloc+0x2b1/0x390
> __alloc_frozen_pages_noprof+0x12b9/0x21a0
> alloc_pages_mpol+0x135/0x3e0
> alloc_slab_page+0x82/0xe0
> new_slab+0x212/0x240
> ___slab_alloc+0x82a/0xe00
> </TASK>
>
>As David Wang points out, this issue became easier to trigger after commit
>780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>
>Before the commit, the issue occurred only when it failed to allocate
>and initialize alloc_tag_cttype or if a memory allocation fails before
>alloc_tag_init() is called. After the commit, it can be easily triggered
>when memory profiling is compiled but disabled at boot.
>
>To properly determine whether alloc_tag_init() has been called and
>its data structures initialized, verify that alloc_tag_cttype is a valid
>pointer before acquiring the semaphore. If the variable is NULL or an error
>value, it has not been properly initialized. In such a case, just skip
>and do not attempt to acquire the semaphore.
>
>Reported-by: kernel test robot <oliver.sang@intel.com>
>Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>Closes: https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com
>Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>Cc: stable@vger.kernel.org
>Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>---
>
>@Suren: I did not add another pr_warn() because every error path in
>alloc_tag_init() already has pr_err().
>
>v2 -> v3:
>- Added another Closes: tag (David)
>- Moved the condition into a standalone if block for better readability
> (Suren)
>- Typo fix (Suren)
>
> lib/alloc_tag.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>index 41ccfb035b7b..e9b33848700a 100644
>--- a/lib/alloc_tag.c
>+++ b/lib/alloc_tag.c
>@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
>+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>+ return 0;
What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
alloc_tag_cttype != NULL && mem_profiling_support==0
I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
>+
> if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
>--
>2.43.0
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 13:25 ` Re:[PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() David Wang
@ 2025-06-24 13:50 ` Harry Yoo
2025-06-24 14:00 ` David Wang
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 13:50 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
>
> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >even when the alloc_tag_cttype is not allocated because:
> >
> > 1) alloc tagging is disabled because mem profiling is disabled
> > (!alloc_tag_cttype)
> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > 3) alloc tagging is enabled, but failed initialization
> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >
> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >
> >This leads to a crash on memory allocation failure by attempting to
> >acquire a non-existent semaphore:
> >
> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > Tainted: [D]=DIE
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > Call Trace:
> > <TASK>
> > codetag_trylock_module_list+0xd/0x20
> > alloc_tag_top_users+0x369/0x4b0
> > __show_mem+0x1cd/0x6e0
> > warn_alloc+0x2b1/0x390
> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > alloc_pages_mpol+0x135/0x3e0
> > alloc_slab_page+0x82/0xe0
> > new_slab+0x212/0x240
> > ___slab_alloc+0x82a/0xe00
> > </TASK>
> >
> >As David Wang points out, this issue became easier to trigger after commit
> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >
> >Before the commit, the issue occurred only when it failed to allocate
> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >when memory profiling is compiled but disabled at boot.
> >
> >To properly determine whether alloc_tag_init() has been called and
> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >value, it has not been properly initialized. In such a case, just skip
> >and do not attempt to acquire the semaphore.
> >
> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >Cc: stable@vger.kernel.org
> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >---
> >
> >@Suren: I did not add another pr_warn() because every error path in
> >alloc_tag_init() already has pr_err().
> >
> >v2 -> v3:
> >- Added another Closes: tag (David)
> >- Moved the condition into a standalone if block for better readability
> > (Suren)
> >- Typo fix (Suren)
> >
> > lib/alloc_tag.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >index 41ccfb035b7b..e9b33848700a 100644
> >--- a/lib/alloc_tag.c
> >+++ b/lib/alloc_tag.c
> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> >
> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >+ return 0;
>
> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> alloc_tag_cttype != NULL && mem_profiling_support==0
>
> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
After commit 780138b12381, alloc_tag_cttype is not allocated if
!mem_profiling_support. (And that's why this bug showed up)
> >+
> > if (can_sleep)
> > codetag_lock_module_list(alloc_tag_cttype, true);
> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> >--
> >2.43.0
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 13:50 ` [PATCH " Harry Yoo
@ 2025-06-24 14:00 ` David Wang
2025-06-24 14:13 ` Harry Yoo
0 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 14:00 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
>>
>> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>> >even when the alloc_tag_cttype is not allocated because:
>> >
>> > 1) alloc tagging is disabled because mem profiling is disabled
>> > (!alloc_tag_cttype)
>> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>> > 3) alloc tagging is enabled, but failed initialization
>> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>> >
>> >In all cases, alloc_tag_cttype is not allocated, and therefore
>> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>> >
>> >This leads to a crash on memory allocation failure by attempting to
>> >acquire a non-existent semaphore:
>> >
>> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> > Tainted: [D]=DIE
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> > Call Trace:
>> > <TASK>
>> > codetag_trylock_module_list+0xd/0x20
>> > alloc_tag_top_users+0x369/0x4b0
>> > __show_mem+0x1cd/0x6e0
>> > warn_alloc+0x2b1/0x390
>> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> > alloc_pages_mpol+0x135/0x3e0
>> > alloc_slab_page+0x82/0xe0
>> > new_slab+0x212/0x240
>> > ___slab_alloc+0x82a/0xe00
>> > </TASK>
>> >
>> >As David Wang points out, this issue became easier to trigger after commit
>> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> >
>> >Before the commit, the issue occurred only when it failed to allocate
>> >and initialize alloc_tag_cttype or if a memory allocation fails before
>> >alloc_tag_init() is called. After the commit, it can be easily triggered
>> >when memory profiling is compiled but disabled at boot.
>> >
>> >To properly determine whether alloc_tag_init() has been called and
>> >its data structures initialized, verify that alloc_tag_cttype is a valid
>> >pointer before acquiring the semaphore. If the variable is NULL or an error
>> >value, it has not been properly initialized. In such a case, just skip
>> >and do not attempt to acquire the semaphore.
>> >
>> >Reported-by: kernel test robot <oliver.sang@intel.com>
>> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
>> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
>> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>> >Cc: stable@vger.kernel.org
>> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> >---
>> >
>> >@Suren: I did not add another pr_warn() because every error path in
>> >alloc_tag_init() already has pr_err().
>> >
>> >v2 -> v3:
>> >- Added another Closes: tag (David)
>> >- Moved the condition into a standalone if block for better readability
>> > (Suren)
>> >- Typo fix (Suren)
>> >
>> > lib/alloc_tag.c | 3 +++
>> > 1 file changed, 3 insertions(+)
>> >
>> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >index 41ccfb035b7b..e9b33848700a 100644
>> >--- a/lib/alloc_tag.c
>> >+++ b/lib/alloc_tag.c
>> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> > struct codetag_bytes n;
>> > unsigned int i, nr = 0;
>> >
>> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>> >+ return 0;
>>
>> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
>> alloc_tag_cttype != NULL && mem_profiling_support==0
>>
>> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
>
>After commit 780138b12381, alloc_tag_cttype is not allocated if
>!mem_profiling_support. (And that's why this bug showed up)
There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
>
>> >+
>> > if (can_sleep)
>> > codetag_lock_module_list(alloc_tag_cttype, true);
>> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
>> >--
>> >2.43.0
>
>--
>Cheers,
>Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 14:00 ` David Wang
@ 2025-06-24 14:13 ` Harry Yoo
2025-06-24 14:28 ` David Wang
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 14:13 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
>
> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
> >>
> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >> >even when the alloc_tag_cttype is not allocated because:
> >> >
> >> > 1) alloc tagging is disabled because mem profiling is disabled
> >> > (!alloc_tag_cttype)
> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> >> > 3) alloc tagging is enabled, but failed initialization
> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >> >
> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >> >
> >> >This leads to a crash on memory allocation failure by attempting to
> >> >acquire a non-existent semaphore:
> >> >
> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> >> > Tainted: [D]=DIE
> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> >> > Call Trace:
> >> > <TASK>
> >> > codetag_trylock_module_list+0xd/0x20
> >> > alloc_tag_top_users+0x369/0x4b0
> >> > __show_mem+0x1cd/0x6e0
> >> > warn_alloc+0x2b1/0x390
> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> >> > alloc_pages_mpol+0x135/0x3e0
> >> > alloc_slab_page+0x82/0xe0
> >> > new_slab+0x212/0x240
> >> > ___slab_alloc+0x82a/0xe00
> >> > </TASK>
> >> >
> >> >As David Wang points out, this issue became easier to trigger after commit
> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >> >
> >> >Before the commit, the issue occurred only when it failed to allocate
> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >> >when memory profiling is compiled but disabled at boot.
> >> >
> >> >To properly determine whether alloc_tag_init() has been called and
> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >> >value, it has not been properly initialized. In such a case, just skip
> >> >and do not attempt to acquire the semaphore.
> >> >
> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >> >Cc: stable@vger.kernel.org
> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >> >---
> >> >
> >> >@Suren: I did not add another pr_warn() because every error path in
> >> >alloc_tag_init() already has pr_err().
> >> >
> >> >v2 -> v3:
> >> >- Added another Closes: tag (David)
> >> >- Moved the condition into a standalone if block for better readability
> >> > (Suren)
> >> >- Typo fix (Suren)
> >> >
> >> > lib/alloc_tag.c | 3 +++
> >> > 1 file changed, 3 insertions(+)
> >> >
> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> >index 41ccfb035b7b..e9b33848700a 100644
> >> >--- a/lib/alloc_tag.c
> >> >+++ b/lib/alloc_tag.c
> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >> > struct codetag_bytes n;
> >> > unsigned int i, nr = 0;
> >> >
> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >> >+ return 0;
> >>
> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> >> alloc_tag_cttype != NULL && mem_profiling_support==0
> >>
> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
> >
> >After commit 780138b12381, alloc_tag_cttype is not allocated if
> >!mem_profiling_support. (And that's why this bug showed up)
>
> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
Ok. Maybe it shouldn't report memory allocation information that is
collected before mem profiling was disabled. (I'm not sure why it disabling
at runtime is allowed, though)
That's a good thing to have, but I think that's a behavioral change in
mem profiling, irrelevant to this bug and not a -stable thing.
Maybe as a follow-up patch?
> >> >+
> >> > if (can_sleep)
> >> > codetag_lock_module_list(alloc_tag_cttype, true);
> >> > else if (!codetag_trylock_module_list(alloc_tag_cttype))
> >> >--
> >> >2.43.0
> >
> >--
> >Cheers,
> >Harry / Hyeonggon
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 13:15 ` David Wang
@ 2025-06-24 14:24 ` Harry Yoo
2025-06-24 14:47 ` David Wang
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 14:24 UTC (permalink / raw)
To: David Wang
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 09:15:55PM +0800, David Wang wrote:
>
> At 2025-06-24 19:28:18, "David Wang" <00107082@163.com> wrote:
> >
> >At 2025-06-24 18:59:52, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >>On Tue, Jun 24, 2025 at 05:30:02PM +0800, David Wang wrote:
> >>>
> >>> At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >>> >On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
> >>> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >>> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >>> >> >even when the alloc_tag_cttype is not allocated because:
> >>> >> >
> >>> >> > 1) alloc tagging is disabled because mem profiling is disabled
> >>> >> > (!alloc_tag_cttype)
> >>> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> >>> >> > 3) alloc tagging is enabled, but failed initialization
> >>> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >>> >> >
> >>> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >>> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >>> >> >
> >>> >> >This leads to a crash on memory allocation failure by attempting to
> >>> >> >acquire a non-existent semaphore:
> >>> >> >
> >>> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> >>> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> >>> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> >>> >> > Tainted: [D]=DIE
> >>> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >>> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> >>> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> >>> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> >>> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> >>> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> >>> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> >>> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> >>> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> >>> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> >>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> >>> >> > Call Trace:
> >>> >> > <TASK>
> >>> >> > codetag_trylock_module_list+0xd/0x20
> >>> >> > alloc_tag_top_users+0x369/0x4b0
> >>> >> > __show_mem+0x1cd/0x6e0
> >>> >> > warn_alloc+0x2b1/0x390
> >>> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> >>> >> > alloc_pages_mpol+0x135/0x3e0
> >>> >> > alloc_slab_page+0x82/0xe0
> >>> >> > new_slab+0x212/0x240
> >>> >> > ___slab_alloc+0x82a/0xe00
> >>> >> > </TASK>
> >>> >> >
> >>> >> >As David Wang points out, this issue became easier to trigger after commit
> >>> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >>> >> >
> >>> >> >Before the commit, the issue occurred only when it failed to allocate
> >>> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >>> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >>> >> >when memory profiling is compiled but disabled at boot.
> >>> >> >
> >>> >> >To properly determine whether alloc_tag_init() has been called and
> >>> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >>> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >>> >> >value, it has not been properly initialized. In such a case, just skip
> >>> >> >and do not attempt to acquire the semaphore.
> >>> >> >
> >>> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
> >>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
> >>> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >>> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >>> >> >Cc: stable@vger.kernel.org
> >>> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >>> >> >---
> >>> >> >
> >>> >> >@Suren: I did not add another pr_warn() because every error path in
> >>> >> >alloc_tag_init() already has pr_err().
> >>> >> >
> >>> >> >v2 -> v3:
> >>> >> >- Added another Closes: tag (David)
> >>> >> >- Moved the condition into a standalone if block for better readability
> >>> >> > (Suren)
> >>> >> >- Typo fix (Suren)
> >>> >> >
> >>> >> > lib/alloc_tag.c | 3 +++
> >>> >> > 1 file changed, 3 insertions(+)
> >>> >> >
> >>> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >>> >> >index 41ccfb035b7b..e9b33848700a 100644
> >>> >> >--- a/lib/alloc_tag.c
> >>> >> >+++ b/lib/alloc_tag.c
> >>> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >>> >> > struct codetag_bytes n;
> >>> >> > unsigned int i, nr = 0;
> >>> >> >
> >>> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >>> >>
> >>> >> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
> >>> >> if (mem_profiling_support) pr_warn("...
> >>> >
> >>> >I think you're saying we need to print a warning when alloc tagging
> >>> >can't provide "top users".
> >>>
> >>> I just meant printing a warning when show_mem is needed before codetag module initialized,
> >>> as reported in https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/__;!!ACWV5N9M2RV99hQ!J2waTUro8owaYlpAZJ6fnrHZvcGMbY6qAO5QvvIGZzUv-ryWjCjhO-maTOolfpPvPSr6CpqOgkRalCwJow$
> >>> where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
> >>> This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
> >>
> >>Ok. You didn't mean that.
> >>
> >>But still I think it's better to handle all cases and print distinct
> >>warnings, rather than handling only the specific case where memory profiling
> >>is enabled but not yet initialized.
> >>
> >>Users will want to know why allocation information is not available,
> >>and there can be multiple reasons including the one you mentioned.
> >>
> >>What do you think?
> >
> >I am not sure....
> >I think most cases you mentioned is just a pr_info, those are expected behavior or designed that way.
> >But when mem_profiling_support==1 && alloc_tag_cttype==NULL, this is an unexpected behavior, which is a pr_warn.
>
> Put it in a clearer way, so far we have identified two "error" conditions:
> 1. mem_profiling_support=1 but initialization for alloc_tag_cttype failed, "alloc_tag_init() already has pr_err()", as you mentioned.
Yes, and this is helpful because it is not expected to fail.
> 2. mem_profiling_support=1 , but codetag module have not been init yet. I suggested adding a pr_warn here.
But in this case, I'm not sure what's the point of the pr_warn() is.
"Memory allocations are not expected fail before alloc_tag_init()"?
That's a weird assumption to write as code. I'd rather handle it
silently without informing the user.
Yes, we've identified the error condition, but it’s not an error anymore
because this patch fixes it. If it's not an error, users don't need to
be aware of the case.
I don't understand what makes this case special that the user needs to
be specifically informed about it, while they aren't informed when
memory allocation info is unavailable for other reasons.
As a user, I only care why there is no memory allocation info available.
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 14:13 ` Harry Yoo
@ 2025-06-24 14:28 ` David Wang
2025-06-24 14:57 ` Suren Baghdasaryan
0 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 14:28 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
>>
>> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
>> >>
>> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>> >> >even when the alloc_tag_cttype is not allocated because:
>> >> >
>> >> > 1) alloc tagging is disabled because mem profiling is disabled
>> >> > (!alloc_tag_cttype)
>> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>> >> > 3) alloc tagging is enabled, but failed initialization
>> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>> >> >
>> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
>> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>> >> >
>> >> >This leads to a crash on memory allocation failure by attempting to
>> >> >acquire a non-existent semaphore:
>> >> >
>> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> >> > Tainted: [D]=DIE
>> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> >> > Call Trace:
>> >> > <TASK>
>> >> > codetag_trylock_module_list+0xd/0x20
>> >> > alloc_tag_top_users+0x369/0x4b0
>> >> > __show_mem+0x1cd/0x6e0
>> >> > warn_alloc+0x2b1/0x390
>> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> >> > alloc_pages_mpol+0x135/0x3e0
>> >> > alloc_slab_page+0x82/0xe0
>> >> > new_slab+0x212/0x240
>> >> > ___slab_alloc+0x82a/0xe00
>> >> > </TASK>
>> >> >
>> >> >As David Wang points out, this issue became easier to trigger after commit
>> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> >> >
>> >> >Before the commit, the issue occurred only when it failed to allocate
>> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
>> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
>> >> >when memory profiling is compiled but disabled at boot.
>> >> >
>> >> >To properly determine whether alloc_tag_init() has been called and
>> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
>> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
>> >> >value, it has not been properly initialized. In such a case, just skip
>> >> >and do not attempt to acquire the semaphore.
>> >> >
>> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
>> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>> >> >Cc: stable@vger.kernel.org
>> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> >> >---
>> >> >
>> >> >@Suren: I did not add another pr_warn() because every error path in
>> >> >alloc_tag_init() already has pr_err().
>> >> >
>> >> >v2 -> v3:
>> >> >- Added another Closes: tag (David)
>> >> >- Moved the condition into a standalone if block for better readability
>> >> > (Suren)
>> >> >- Typo fix (Suren)
>> >> >
>> >> > lib/alloc_tag.c | 3 +++
>> >> > 1 file changed, 3 insertions(+)
>> >> >
>> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >> >index 41ccfb035b7b..e9b33848700a 100644
>> >> >--- a/lib/alloc_tag.c
>> >> >+++ b/lib/alloc_tag.c
>> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> >> > struct codetag_bytes n;
>> >> > unsigned int i, nr = 0;
>> >> >
>> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>> >> >+ return 0;
>> >>
>> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
>> >> alloc_tag_cttype != NULL && mem_profiling_support==0
>> >>
>> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
>> >
>> >After commit 780138b12381, alloc_tag_cttype is not allocated if
>> >!mem_profiling_support. (And that's why this bug showed up)
>>
>> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
>
>Ok. Maybe it shouldn't report memory allocation information that is
>collected before mem profiling was disabled. (I'm not sure why it disabling
>at runtime is allowed, though)
>
>That's a good thing to have, but I think that's a behavioral change in
>mem profiling, irrelevant to this bug and not a -stable thing.
>
>Maybe as a follow-up patch?
Only a little more changes needed, I was suggesting:
@@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
struct codetag_bytes n;
unsigned int i, nr = 0;
+ if (!mem_profiling_support)
+ return 0;
+
+ if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
+ pr_warn("alloctag module is not ready yet.\n");
+ return 0;
+ }
+
if (can_sleep)
David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 14:24 ` Harry Yoo
@ 2025-06-24 14:47 ` David Wang
2025-06-24 15:04 ` Suren Baghdasaryan
0 siblings, 1 reply; 22+ messages in thread
From: David Wang @ 2025-06-24 14:47 UTC (permalink / raw)
To: Harry Yoo
Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 22:24:33, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>On Tue, Jun 24, 2025 at 09:15:55PM +0800, David Wang wrote:
>>
>> At 2025-06-24 19:28:18, "David Wang" <00107082@163.com> wrote:
>> >
>> >At 2025-06-24 18:59:52, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >>On Tue, Jun 24, 2025 at 05:30:02PM +0800, David Wang wrote:
>> >>>
>> >>> At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >>> >On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
>> >>> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >>> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>> >>> >> >even when the alloc_tag_cttype is not allocated because:
>> >>> >> >
>> >>> >> > 1) alloc tagging is disabled because mem profiling is disabled
>> >>> >> > (!alloc_tag_cttype)
>> >>> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>> >>> >> > 3) alloc tagging is enabled, but failed initialization
>> >>> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>> >>> >> >
>> >>> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
>> >>> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>> >>> >> >
>> >>> >> >This leads to a crash on memory allocation failure by attempting to
>> >>> >> >acquire a non-existent semaphore:
>> >>> >> >
>> >>> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> >>> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> >>> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> >>> >> > Tainted: [D]=DIE
>> >>> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> >>> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> >>> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> >>> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> >>> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> >>> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> >>> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> >>> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> >>> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> >>> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> >>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >>> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> >>> >> > Call Trace:
>> >>> >> > <TASK>
>> >>> >> > codetag_trylock_module_list+0xd/0x20
>> >>> >> > alloc_tag_top_users+0x369/0x4b0
>> >>> >> > __show_mem+0x1cd/0x6e0
>> >>> >> > warn_alloc+0x2b1/0x390
>> >>> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> >>> >> > alloc_pages_mpol+0x135/0x3e0
>> >>> >> > alloc_slab_page+0x82/0xe0
>> >>> >> > new_slab+0x212/0x240
>> >>> >> > ___slab_alloc+0x82a/0xe00
>> >>> >> > </TASK>
>> >>> >> >
>> >>> >> >As David Wang points out, this issue became easier to trigger after commit
>> >>> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> >>> >> >
>> >>> >> >Before the commit, the issue occurred only when it failed to allocate
>> >>> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
>> >>> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
>> >>> >> >when memory profiling is compiled but disabled at boot.
>> >>> >> >
>> >>> >> >To properly determine whether alloc_tag_init() has been called and
>> >>> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
>> >>> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
>> >>> >> >value, it has not been properly initialized. In such a case, just skip
>> >>> >> >and do not attempt to acquire the semaphore.
>> >>> >> >
>> >>> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
>> >>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
>> >>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
>> >>> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> >>> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>> >>> >> >Cc: stable@vger.kernel.org
>> >>> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> >>> >> >---
>> >>> >> >
>> >>> >> >@Suren: I did not add another pr_warn() because every error path in
>> >>> >> >alloc_tag_init() already has pr_err().
>> >>> >> >
>> >>> >> >v2 -> v3:
>> >>> >> >- Added another Closes: tag (David)
>> >>> >> >- Moved the condition into a standalone if block for better readability
>> >>> >> > (Suren)
>> >>> >> >- Typo fix (Suren)
>> >>> >> >
>> >>> >> > lib/alloc_tag.c | 3 +++
>> >>> >> > 1 file changed, 3 insertions(+)
>> >>> >> >
>> >>> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >>> >> >index 41ccfb035b7b..e9b33848700a 100644
>> >>> >> >--- a/lib/alloc_tag.c
>> >>> >> >+++ b/lib/alloc_tag.c
>> >>> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> >>> >> > struct codetag_bytes n;
>> >>> >> > unsigned int i, nr = 0;
>> >>> >> >
>> >>> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>> >>> >>
>> >>> >> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
>> >>> >> if (mem_profiling_support) pr_warn("...
>> >>> >
>> >>> >I think you're saying we need to print a warning when alloc tagging
>> >>> >can't provide "top users".
>> >>>
>> >>> I just meant printing a warning when show_mem is needed before codetag module initialized,
>> >>> as reported in https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/__;!!ACWV5N9M2RV99hQ!J2waTUro8owaYlpAZJ6fnrHZvcGMbY6qAO5QvvIGZzUv-ryWjCjhO-maTOolfpPvPSr6CpqOgkRalCwJow$
>> >>> where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
>> >>> This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
>> >>
>> >>Ok. You didn't mean that.
>> >>
>> >>But still I think it's better to handle all cases and print distinct
>> >>warnings, rather than handling only the specific case where memory profiling
>> >>is enabled but not yet initialized.
>> >>
>> >>Users will want to know why allocation information is not available,
>> >>and there can be multiple reasons including the one you mentioned.
>> >>
>> >>What do you think?
>> >
>> >I am not sure....
>> >I think most cases you mentioned is just a pr_info, those are expected behavior or designed that way.
>> >But when mem_profiling_support==1 && alloc_tag_cttype==NULL, this is an unexpected behavior, which is a pr_warn.
>>
>> Put it in a clearer way, so far we have identified two "error" conditions:
>> 1. mem_profiling_support=1 but initialization for alloc_tag_cttype failed, "alloc_tag_init() already has pr_err()", as you mentioned.
>
>Yes, and this is helpful because it is not expected to fail.
>
>> 2. mem_profiling_support=1 , but codetag module have not been init yet. I suggested adding a pr_warn here.
>
>But in this case, I'm not sure what's the point of the pr_warn() is.
>"Memory allocations are not expected fail before alloc_tag_init()"?
>That's a weird assumption to write as code. I'd rather handle it
>silently without informing the user.
>
>Yes, we've identified the error condition, but it’s not an error anymore
>because this patch fixes it. If it's not an error, users don't need to
>be aware of the case.
>
>I don't understand what makes this case special that the user needs to
>be specifically informed about it, while they aren't informed when
>memory allocation info is unavailable for other reasons.
>As a user, I only care why there is no memory allocation info available.
My point is just that we are not expecting anyone calls alloc_tag_top_users() before
alloc_tag_init(), when that happened, measures, such as late_initcall if possible, can be taken
to fix it. and a warning message is easier to catch.
(This is not just for explaining why no memory profiling information shows up)
>
>--
>Cheers,
>Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 14:28 ` David Wang
@ 2025-06-24 14:57 ` Suren Baghdasaryan
2025-06-24 15:14 ` Harry Yoo
2025-06-24 15:23 ` David Wang
0 siblings, 2 replies; 22+ messages in thread
From: Suren Baghdasaryan @ 2025-06-24 14:57 UTC (permalink / raw)
To: David Wang
Cc: Harry Yoo, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 7:28 AM David Wang <00107082@163.com> wrote:
>
>
> At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
> >>
> >> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
> >> >>
> >> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >> >> >even when the alloc_tag_cttype is not allocated because:
> >> >> >
> >> >> > 1) alloc tagging is disabled because mem profiling is disabled
> >> >> > (!alloc_tag_cttype)
> >> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> >> >> > 3) alloc tagging is enabled, but failed initialization
> >> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >> >> >
> >> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >> >> >
> >> >> >This leads to a crash on memory allocation failure by attempting to
> >> >> >acquire a non-existent semaphore:
> >> >> >
> >> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> >> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> >> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> >> >> > Tainted: [D]=DIE
> >> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> >> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> >> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> >> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> >> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> >> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> >> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> >> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> >> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> >> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> >> >> > Call Trace:
> >> >> > <TASK>
> >> >> > codetag_trylock_module_list+0xd/0x20
> >> >> > alloc_tag_top_users+0x369/0x4b0
> >> >> > __show_mem+0x1cd/0x6e0
> >> >> > warn_alloc+0x2b1/0x390
> >> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> >> >> > alloc_pages_mpol+0x135/0x3e0
> >> >> > alloc_slab_page+0x82/0xe0
> >> >> > new_slab+0x212/0x240
> >> >> > ___slab_alloc+0x82a/0xe00
> >> >> > </TASK>
> >> >> >
> >> >> >As David Wang points out, this issue became easier to trigger after commit
> >> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >> >> >
> >> >> >Before the commit, the issue occurred only when it failed to allocate
> >> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >> >> >when memory profiling is compiled but disabled at boot.
> >> >> >
> >> >> >To properly determine whether alloc_tag_init() has been called and
> >> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >> >> >value, it has not been properly initialized. In such a case, just skip
> >> >> >and do not attempt to acquire the semaphore.
> >> >> >
> >> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> >> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >> >> >Cc: stable@vger.kernel.org
> >> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >> >> >---
> >> >> >
> >> >> >@Suren: I did not add another pr_warn() because every error path in
> >> >> >alloc_tag_init() already has pr_err().
> >> >> >
> >> >> >v2 -> v3:
> >> >> >- Added another Closes: tag (David)
> >> >> >- Moved the condition into a standalone if block for better readability
> >> >> > (Suren)
> >> >> >- Typo fix (Suren)
> >> >> >
> >> >> > lib/alloc_tag.c | 3 +++
> >> >> > 1 file changed, 3 insertions(+)
> >> >> >
> >> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> >> >index 41ccfb035b7b..e9b33848700a 100644
> >> >> >--- a/lib/alloc_tag.c
> >> >> >+++ b/lib/alloc_tag.c
> >> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >> >> > struct codetag_bytes n;
> >> >> > unsigned int i, nr = 0;
> >> >> >
> >> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >> >> >+ return 0;
> >> >>
> >> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> >> >> alloc_tag_cttype != NULL && mem_profiling_support==0
> >> >>
> >> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
> >> >
> >> >After commit 780138b12381, alloc_tag_cttype is not allocated if
> >> >!mem_profiling_support. (And that's why this bug showed up)
> >>
> >> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
Wait, /proc/sys/vm/mem_profiling is changing mem_alloc_profiling_key,
not mem_profiling_support. Am I missing something?
> >
> >Ok. Maybe it shouldn't report memory allocation information that is
> >collected before mem profiling was disabled. (I'm not sure why it disabling
> >at runtime is allowed, though)
> >
> >That's a good thing to have, but I think that's a behavioral change in
> >mem profiling, irrelevant to this bug and not a -stable thing.
> >
> >Maybe as a follow-up patch?
>
> Only a little more changes needed, I was suggesting:
>
> @@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
> + if (!mem_profiling_support)
> + return 0;
David is right that with /proc/sys/vm/mem_profiling memory profiling
can be turned off at runtime but the above condition should be:
if (!mem_alloc_profiling_enabled())
return 0;
> +
> + if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
> + pr_warn("alloctag module is not ready yet.\n");
I don't think spitting out this warning on every show_mem() is useful.
If alloc_tag_cttype is invalid because codetag_register_type() failed
then we already print an error here:
https://elixir.bootlin.com/linux/v6.16-rc3/source/lib/alloc_tag.c#L829,
so user has the logs to track this down.
If show_mem() is called so early that alloc_tag_init() hasn't been
called yet then missing allocation tag information would not be
surprising I think, considering it's early boot. I don't think it's
worth detecting and reporting such a state.
> + return 0;
> + }
> +
> if (can_sleep)
>
>
>
> David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y
2025-06-24 14:47 ` David Wang
@ 2025-06-24 15:04 ` Suren Baghdasaryan
0 siblings, 0 replies; 22+ messages in thread
From: Suren Baghdasaryan @ 2025-06-24 15:04 UTC (permalink / raw)
To: David Wang
Cc: Harry Yoo, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 7:47 AM David Wang <00107082@163.com> wrote:
>
>
> At 2025-06-24 22:24:33, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >On Tue, Jun 24, 2025 at 09:15:55PM +0800, David Wang wrote:
> >>
> >> At 2025-06-24 19:28:18, "David Wang" <00107082@163.com> wrote:
> >> >
> >> >At 2025-06-24 18:59:52, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >>On Tue, Jun 24, 2025 at 05:30:02PM +0800, David Wang wrote:
> >> >>>
> >> >>> At 2025-06-24 17:09:54, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >>> >On Tue, Jun 24, 2025 at 04:21:23PM +0800, David Wang wrote:
> >> >>> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >> >>> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >> >>> >> >even when the alloc_tag_cttype is not allocated because:
> >> >>> >> >
> >> >>> >> > 1) alloc tagging is disabled because mem profiling is disabled
> >> >>> >> > (!alloc_tag_cttype)
> >> >>> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> >> >>> >> > 3) alloc tagging is enabled, but failed initialization
> >> >>> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >> >>> >> >
> >> >>> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >> >>> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >> >>> >> >
> >> >>> >> >This leads to a crash on memory allocation failure by attempting to
> >> >>> >> >acquire a non-existent semaphore:
> >> >>> >> >
> >> >>> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> >> >>> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> >> >>> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> >> >>> >> > Tainted: [D]=DIE
> >> >>> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >> >>> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> >> >>> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> >> >>> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> >> >>> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> >> >>> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> >> >>> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> >> >>> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> >> >>> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> >> >>> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> >> >>> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> >>> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> >> >>> >> > Call Trace:
> >> >>> >> > <TASK>
> >> >>> >> > codetag_trylock_module_list+0xd/0x20
> >> >>> >> > alloc_tag_top_users+0x369/0x4b0
> >> >>> >> > __show_mem+0x1cd/0x6e0
> >> >>> >> > warn_alloc+0x2b1/0x390
> >> >>> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> >> >>> >> > alloc_pages_mpol+0x135/0x3e0
> >> >>> >> > alloc_slab_page+0x82/0xe0
> >> >>> >> > new_slab+0x212/0x240
> >> >>> >> > ___slab_alloc+0x82a/0xe00
> >> >>> >> > </TASK>
> >> >>> >> >
> >> >>> >> >As David Wang points out, this issue became easier to trigger after commit
> >> >>> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >> >>> >> >
> >> >>> >> >Before the commit, the issue occurred only when it failed to allocate
> >> >>> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> >> >>> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> >> >>> >> >when memory profiling is compiled but disabled at boot.
> >> >>> >> >
> >> >>> >> >To properly determine whether alloc_tag_init() has been called and
> >> >>> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >> >>> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >> >>> >> >value, it has not been properly initialized. In such a case, just skip
> >> >>> >> >and do not attempt to acquire the semaphore.
> >> >>> >> >
> >> >>> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >> >>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdBwKgJ9_Q$
> >> >>> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!PxJNKp4Bj6h0XIWpRXcmFeIz51jORtRRAo1j23ZnRgvTm0E0Mp5l6UrLNCkiHww6AVWOSfbDDdC-7OiUsg$
> >> >>> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >> >>> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >> >>> >> >Cc: stable@vger.kernel.org
> >> >>> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >> >>> >> >---
> >> >>> >> >
> >> >>> >> >@Suren: I did not add another pr_warn() because every error path in
> >> >>> >> >alloc_tag_init() already has pr_err().
> >> >>> >> >
> >> >>> >> >v2 -> v3:
> >> >>> >> >- Added another Closes: tag (David)
> >> >>> >> >- Moved the condition into a standalone if block for better readability
> >> >>> >> > (Suren)
> >> >>> >> >- Typo fix (Suren)
> >> >>> >> >
> >> >>> >> > lib/alloc_tag.c | 3 +++
> >> >>> >> > 1 file changed, 3 insertions(+)
> >> >>> >> >
> >> >>> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >> >>> >> >index 41ccfb035b7b..e9b33848700a 100644
> >> >>> >> >--- a/lib/alloc_tag.c
> >> >>> >> >+++ b/lib/alloc_tag.c
> >> >>> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >> >>> >> > struct codetag_bytes n;
> >> >>> >> > unsigned int i, nr = 0;
> >> >>> >> >
> >> >>> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >> >>> >>
> >> >>> >> Should a warning added here? indicating codetag module not ready yet and the memory failure happened during boot:
> >> >>> >> if (mem_profiling_support) pr_warn("...
> >> >>> >
> >> >>> >I think you're saying we need to print a warning when alloc tagging
> >> >>> >can't provide "top users".
> >> >>>
> >> >>> I just meant printing a warning when show_mem is needed before codetag module initialized,
> >> >>> as reported in https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com/__;!!ACWV5N9M2RV99hQ!J2waTUro8owaYlpAZJ6fnrHZvcGMbY6qAO5QvvIGZzUv-ryWjCjhO-maTOolfpPvPSr6CpqOgkRalCwJow$
> >> >>> where mem_profiling_support is 1, but alloc_tag_cttype is still NULL.
> >> >>> This can tell we do have a memory failure during boot before codetag_init, even with memory profiling activated.
> >> >>
> >> >>Ok. You didn't mean that.
> >> >>
> >> >>But still I think it's better to handle all cases and print distinct
> >> >>warnings, rather than handling only the specific case where memory profiling
> >> >>is enabled but not yet initialized.
> >> >>
> >> >>Users will want to know why allocation information is not available,
> >> >>and there can be multiple reasons including the one you mentioned.
> >> >>
> >> >>What do you think?
> >> >
> >> >I am not sure....
> >> >I think most cases you mentioned is just a pr_info, those are expected behavior or designed that way.
> >> >But when mem_profiling_support==1 && alloc_tag_cttype==NULL, this is an unexpected behavior, which is a pr_warn.
> >>
> >> Put it in a clearer way, so far we have identified two "error" conditions:
> >> 1. mem_profiling_support=1 but initialization for alloc_tag_cttype failed, "alloc_tag_init() already has pr_err()", as you mentioned.
> >
> >Yes, and this is helpful because it is not expected to fail.
> >
> >> 2. mem_profiling_support=1 , but codetag module have not been init yet. I suggested adding a pr_warn here.
> >
> >But in this case, I'm not sure what's the point of the pr_warn() is.
> >"Memory allocations are not expected fail before alloc_tag_init()"?
> >That's a weird assumption to write as code. I'd rather handle it
> >silently without informing the user.
> >
> >Yes, we've identified the error condition, but it’s not an error anymore
> >because this patch fixes it. If it's not an error, users don't need to
> >be aware of the case.
> >
> >I don't understand what makes this case special that the user needs to
> >be specifically informed about it, while they aren't informed when
> >memory allocation info is unavailable for other reasons.
> >As a user, I only care why there is no memory allocation info available.
>
> My point is just that we are not expecting anyone calls alloc_tag_top_users() before
> alloc_tag_init(), when that happened, measures, such as late_initcall if possible, can be taken
> to fix it. and a warning message is easier to catch.
> (This is not just for explaining why no memory profiling information shows up)
I wouldn't rule out the possibility of
show_mem()->alloc_tag_top_users() being called during early boot and
before alloc_tag_init(). Such a thing can happen during early boot,
however I would not call that an error and if that happens and
allocinfo data is missing from the report I don't think that would
surprise the user, so not sure it's worth detecting and reporting this
case.
>
> >
> >--
> >Cheers,
> >Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 14:57 ` Suren Baghdasaryan
@ 2025-06-24 15:14 ` Harry Yoo
2025-06-24 15:38 ` Suren Baghdasaryan
2025-06-24 15:23 ` David Wang
1 sibling, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 15:14 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Wang, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 07:57:40AM -0700, Suren Baghdasaryan wrote:
> On Tue, Jun 24, 2025 at 7:28 AM David Wang <00107082@163.com> wrote:
> >
> >
> > At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
> > >>
> > >> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
> > >> >>
> > >> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> > >> >> >even when the alloc_tag_cttype is not allocated because:
> > >> >> >
> > >> >> > 1) alloc tagging is disabled because mem profiling is disabled
> > >> >> > (!alloc_tag_cttype)
> > >> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > >> >> > 3) alloc tagging is enabled, but failed initialization
> > >> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> > >> >> >
> > >> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> > >> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> > >> >> >
> > >> >> >This leads to a crash on memory allocation failure by attempting to
> > >> >> >acquire a non-existent semaphore:
> > >> >> >
> > >> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > >> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > >> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > >> >> > Tainted: [D]=DIE
> > >> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > >> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > >> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > >> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > >> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > >> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > >> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > >> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > >> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > >> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > >> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > >> >> > Call Trace:
> > >> >> > <TASK>
> > >> >> > codetag_trylock_module_list+0xd/0x20
> > >> >> > alloc_tag_top_users+0x369/0x4b0
> > >> >> > __show_mem+0x1cd/0x6e0
> > >> >> > warn_alloc+0x2b1/0x390
> > >> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > >> >> > alloc_pages_mpol+0x135/0x3e0
> > >> >> > alloc_slab_page+0x82/0xe0
> > >> >> > new_slab+0x212/0x240
> > >> >> > ___slab_alloc+0x82a/0xe00
> > >> >> > </TASK>
> > >> >> >
> > >> >> >As David Wang points out, this issue became easier to trigger after commit
> > >> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > >> >> >
> > >> >> >Before the commit, the issue occurred only when it failed to allocate
> > >> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> > >> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> > >> >> >when memory profiling is compiled but disabled at boot.
> > >> >> >
> > >> >> >To properly determine whether alloc_tag_init() has been called and
> > >> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> > >> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> > >> >> >value, it has not been properly initialized. In such a case, just skip
> > >> >> >and do not attempt to acquire the semaphore.
> > >> >> >
> > >> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> > >> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > >> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> > >> >> >Cc: stable@vger.kernel.org
> > >> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > >> >> >---
> > >> >> >
> > >> >> >@Suren: I did not add another pr_warn() because every error path in
> > >> >> >alloc_tag_init() already has pr_err().
> > >> >> >
> > >> >> >v2 -> v3:
> > >> >> >- Added another Closes: tag (David)
> > >> >> >- Moved the condition into a standalone if block for better readability
> > >> >> > (Suren)
> > >> >> >- Typo fix (Suren)
> > >> >> >
> > >> >> > lib/alloc_tag.c | 3 +++
> > >> >> > 1 file changed, 3 insertions(+)
> > >> >> >
> > >> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > >> >> >index 41ccfb035b7b..e9b33848700a 100644
> > >> >> >--- a/lib/alloc_tag.c
> > >> >> >+++ b/lib/alloc_tag.c
> > >> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > >> >> > struct codetag_bytes n;
> > >> >> > unsigned int i, nr = 0;
> > >> >> >
> > >> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> > >> >> >+ return 0;
> > >> >>
> > >> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> > >> >> alloc_tag_cttype != NULL && mem_profiling_support==0
> > >> >>
> > >> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
> > >> >
> > >> >After commit 780138b12381, alloc_tag_cttype is not allocated if
> > >> >!mem_profiling_support. (And that's why this bug showed up)
> > >>
> > >> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
>
> Wait, /proc/sys/vm/mem_profiling is changing mem_alloc_profiling_key,
> not mem_profiling_support. Am I missing something?
Feels like it should call shutdown_mem_profiling() instead of
proc_do_static_key() (and also remove /proc/allocinfo)?
> > >
> > >Ok. Maybe it shouldn't report memory allocation information that is
> > >collected before mem profiling was disabled. (I'm not sure why it disabling
> > >at runtime is allowed, though)
> > >
> > >That's a good thing to have, but I think that's a behavioral change in
> > >mem profiling, irrelevant to this bug and not a -stable thing.
> > >
> > >Maybe as a follow-up patch?
> >
> > Only a little more changes needed, I was suggesting:
> >
> > @@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > struct codetag_bytes n;
> > unsigned int i, nr = 0;
> >
> > + if (!mem_profiling_support)
> > + return 0;
>
> David is right that with /proc/sys/vm/mem_profiling memory profiling
> can be turned off at runtime but the above condition should be:
> if (!mem_alloc_profiling_enabled())
> return 0;
I agree that this change is a useful addition, but adding it to the patch
doesn't look right. It's doing two different things.
> > +
> > + if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
> > + pr_warn("alloctag module is not ready yet.\n");
>
> I don't think spitting out this warning on every show_mem() is useful.
> If alloc_tag_cttype is invalid because codetag_register_type() failed
> then we already print an error here:
> https://elixir.bootlin.com/linux/v6.16-rc3/source/lib/alloc_tag.c#L829,
> so user has the logs to track this down.
> If show_mem() is called so early that alloc_tag_init() hasn't been
> called yet then missing allocation tag information would not be
> surprising I think, considering it's early boot. I don't think it's
> worth detecting and reporting such a state.
>
> > + return 0;
> > + }
> > +
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 14:57 ` Suren Baghdasaryan
2025-06-24 15:14 ` Harry Yoo
@ 2025-06-24 15:23 ` David Wang
1 sibling, 0 replies; 22+ messages in thread
From: David Wang @ 2025-06-24 15:23 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Harry Yoo, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
At 2025-06-24 22:57:40, "Suren Baghdasaryan" <surenb@google.com> wrote:
>On Tue, Jun 24, 2025 at 7:28 AM David Wang <00107082@163.com> wrote:
>>
>>
>> At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
>> >>
>> >> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
>> >> >>
>> >> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>> >> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>> >> >> >even when the alloc_tag_cttype is not allocated because:
>> >> >> >
>> >> >> > 1) alloc tagging is disabled because mem profiling is disabled
>> >> >> > (!alloc_tag_cttype)
>> >> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>> >> >> > 3) alloc tagging is enabled, but failed initialization
>> >> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>> >> >> >
>> >> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
>> >> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
>> >> >> >
>> >> >> >This leads to a crash on memory allocation failure by attempting to
>> >> >> >acquire a non-existent semaphore:
>> >> >> >
>> >> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>> >> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>> >> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
>> >> >> > Tainted: [D]=DIE
>> >> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>> >> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
>> >> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>> >> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>> >> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>> >> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>> >> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>> >> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>> >> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>> >> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>> >> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>> >> >> > Call Trace:
>> >> >> > <TASK>
>> >> >> > codetag_trylock_module_list+0xd/0x20
>> >> >> > alloc_tag_top_users+0x369/0x4b0
>> >> >> > __show_mem+0x1cd/0x6e0
>> >> >> > warn_alloc+0x2b1/0x390
>> >> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
>> >> >> > alloc_pages_mpol+0x135/0x3e0
>> >> >> > alloc_slab_page+0x82/0xe0
>> >> >> > new_slab+0x212/0x240
>> >> >> > ___slab_alloc+0x82a/0xe00
>> >> >> > </TASK>
>> >> >> >
>> >> >> >As David Wang points out, this issue became easier to trigger after commit
>> >> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>> >> >> >
>> >> >> >Before the commit, the issue occurred only when it failed to allocate
>> >> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
>> >> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
>> >> >> >when memory profiling is compiled but disabled at boot.
>> >> >> >
>> >> >> >To properly determine whether alloc_tag_init() has been called and
>> >> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
>> >> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
>> >> >> >value, it has not been properly initialized. In such a case, just skip
>> >> >> >and do not attempt to acquire the semaphore.
>> >> >> >
>> >> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
>> >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
>> >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
>> >> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>> >> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>> >> >> >Cc: stable@vger.kernel.org
>> >> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>> >> >> >---
>> >> >> >
>> >> >> >@Suren: I did not add another pr_warn() because every error path in
>> >> >> >alloc_tag_init() already has pr_err().
>> >> >> >
>> >> >> >v2 -> v3:
>> >> >> >- Added another Closes: tag (David)
>> >> >> >- Moved the condition into a standalone if block for better readability
>> >> >> > (Suren)
>> >> >> >- Typo fix (Suren)
>> >> >> >
>> >> >> > lib/alloc_tag.c | 3 +++
>> >> >> > 1 file changed, 3 insertions(+)
>> >> >> >
>> >> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
>> >> >> >index 41ccfb035b7b..e9b33848700a 100644
>> >> >> >--- a/lib/alloc_tag.c
>> >> >> >+++ b/lib/alloc_tag.c
>> >> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> >> >> > struct codetag_bytes n;
>> >> >> > unsigned int i, nr = 0;
>> >> >> >
>> >> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
>> >> >> >+ return 0;
>> >> >>
>> >> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
>> >> >> alloc_tag_cttype != NULL && mem_profiling_support==0
>> >> >>
>> >> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
>> >> >
>> >> >After commit 780138b12381, alloc_tag_cttype is not allocated if
>> >> >!mem_profiling_support. (And that's why this bug showed up)
>> >>
>> >> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
>
>Wait, /proc/sys/vm/mem_profiling is changing mem_alloc_profiling_key,
>not mem_profiling_support. Am I missing something?
Oh... I was wrong again, I was reading a init func setup_early_mem_profiling......
>
>> >
>> >Ok. Maybe it shouldn't report memory allocation information that is
>> >collected before mem profiling was disabled. (I'm not sure why it disabling
>> >at runtime is allowed, though)
>> >
>> >That's a good thing to have, but I think that's a behavioral change in
>> >mem profiling, irrelevant to this bug and not a -stable thing.
>> >
>> >Maybe as a follow-up patch?
>>
>> Only a little more changes needed, I was suggesting:
>>
>> @@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
>> struct codetag_bytes n;
>> unsigned int i, nr = 0;
>>
>> + if (!mem_profiling_support)
>> + return 0;
>
>David is right that with /proc/sys/vm/mem_profiling memory profiling
>can be turned off at runtime but the above condition should be:
>
>if (!mem_alloc_profiling_enabled())
> return 0;
>
>
>> +
>> + if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
>> + pr_warn("alloctag module is not ready yet.\n");
>
>I don't think spitting out this warning on every show_mem() is useful.
>If alloc_tag_cttype is invalid because codetag_register_type() failed
>then we already print an error here:
>https://elixir.bootlin.com/linux/v6.16-rc3/source/lib/alloc_tag.c#L829,
>so user has the logs to track this down.
>If show_mem() is called so early that alloc_tag_init() hasn't been
>called yet then missing allocation tag information would not be
>surprising I think, considering it's early boot. I don't think it's
>worth detecting and reporting such a state.
>
>> + return 0;
>> + }
>> +
>> if (can_sleep)
>>
>>
>>
>> David
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 15:14 ` Harry Yoo
@ 2025-06-24 15:38 ` Suren Baghdasaryan
2025-06-24 18:00 ` Harry Yoo
0 siblings, 1 reply; 22+ messages in thread
From: Suren Baghdasaryan @ 2025-06-24 15:38 UTC (permalink / raw)
To: Harry Yoo
Cc: David Wang, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 8:14 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Tue, Jun 24, 2025 at 07:57:40AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Jun 24, 2025 at 7:28 AM David Wang <00107082@163.com> wrote:
> > >
> > >
> > > At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > >On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
> > > >>
> > > >> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > >> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
> > > >> >>
> > > >> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > >> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> > > >> >> >even when the alloc_tag_cttype is not allocated because:
> > > >> >> >
> > > >> >> > 1) alloc tagging is disabled because mem profiling is disabled
> > > >> >> > (!alloc_tag_cttype)
> > > >> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > > >> >> > 3) alloc tagging is enabled, but failed initialization
> > > >> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> > > >> >> >
> > > >> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> > > >> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> > > >> >> >
> > > >> >> >This leads to a crash on memory allocation failure by attempting to
> > > >> >> >acquire a non-existent semaphore:
> > > >> >> >
> > > >> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > > >> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > > >> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > > >> >> > Tainted: [D]=DIE
> > > >> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > >> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > > >> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > > >> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > > >> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > > >> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > > >> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > > >> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > > >> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > > >> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > > >> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > >> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > > >> >> > Call Trace:
> > > >> >> > <TASK>
> > > >> >> > codetag_trylock_module_list+0xd/0x20
> > > >> >> > alloc_tag_top_users+0x369/0x4b0
> > > >> >> > __show_mem+0x1cd/0x6e0
> > > >> >> > warn_alloc+0x2b1/0x390
> > > >> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > > >> >> > alloc_pages_mpol+0x135/0x3e0
> > > >> >> > alloc_slab_page+0x82/0xe0
> > > >> >> > new_slab+0x212/0x240
> > > >> >> > ___slab_alloc+0x82a/0xe00
> > > >> >> > </TASK>
> > > >> >> >
> > > >> >> >As David Wang points out, this issue became easier to trigger after commit
> > > >> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > > >> >> >
> > > >> >> >Before the commit, the issue occurred only when it failed to allocate
> > > >> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> > > >> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> > > >> >> >when memory profiling is compiled but disabled at boot.
> > > >> >> >
> > > >> >> >To properly determine whether alloc_tag_init() has been called and
> > > >> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> > > >> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> > > >> >> >value, it has not been properly initialized. In such a case, just skip
> > > >> >> >and do not attempt to acquire the semaphore.
> > > >> >> >
> > > >> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> > > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> > > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> > > >> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > > >> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> > > >> >> >Cc: stable@vger.kernel.org
> > > >> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > >> >> >---
> > > >> >> >
> > > >> >> >@Suren: I did not add another pr_warn() because every error path in
> > > >> >> >alloc_tag_init() already has pr_err().
> > > >> >> >
> > > >> >> >v2 -> v3:
> > > >> >> >- Added another Closes: tag (David)
> > > >> >> >- Moved the condition into a standalone if block for better readability
> > > >> >> > (Suren)
> > > >> >> >- Typo fix (Suren)
> > > >> >> >
> > > >> >> > lib/alloc_tag.c | 3 +++
> > > >> >> > 1 file changed, 3 insertions(+)
> > > >> >> >
> > > >> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > >> >> >index 41ccfb035b7b..e9b33848700a 100644
> > > >> >> >--- a/lib/alloc_tag.c
> > > >> >> >+++ b/lib/alloc_tag.c
> > > >> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > >> >> > struct codetag_bytes n;
> > > >> >> > unsigned int i, nr = 0;
> > > >> >> >
> > > >> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> > > >> >> >+ return 0;
> > > >> >>
> > > >> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> > > >> >> alloc_tag_cttype != NULL && mem_profiling_support==0
> > > >> >>
> > > >> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
> > > >> >
> > > >> >After commit 780138b12381, alloc_tag_cttype is not allocated if
> > > >> >!mem_profiling_support. (And that's why this bug showed up)
> > > >>
> > > >> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
> >
> > Wait, /proc/sys/vm/mem_profiling is changing mem_alloc_profiling_key,
> > not mem_profiling_support. Am I missing something?
>
> Feels like it should call shutdown_mem_profiling() instead of
> proc_do_static_key() (and also remove /proc/allocinfo)?
No, we should be able to re-enable it later on. You can't do that if
you call shutdown_mem_profiling().
mem_profiling_support is very different from mem_alloc_profiling_key.
mem_profiling_support means memory profiling is not supported while
mem_alloc_profiling_key means it's enabled or disabled and can be
changed later.
>
> > > >
> > > >Ok. Maybe it shouldn't report memory allocation information that is
> > > >collected before mem profiling was disabled. (I'm not sure why it disabling
> > > >at runtime is allowed, though)
> > > >
> > > >That's a good thing to have, but I think that's a behavioral change in
> > > >mem profiling, irrelevant to this bug and not a -stable thing.
> > > >
> > > >Maybe as a follow-up patch?
> > >
> > > Only a little more changes needed, I was suggesting:
> > >
> > > @@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > struct codetag_bytes n;
> > > unsigned int i, nr = 0;
> > >
> > > + if (!mem_profiling_support)
> > > + return 0;
> >
> > David is right that with /proc/sys/vm/mem_profiling memory profiling
> > can be turned off at runtime but the above condition should be:
> > if (!mem_alloc_profiling_enabled())
> > return 0;
>
> I agree that this change is a useful addition, but adding it to the patch
> doesn't look right. It's doing two different things.
You might be right, calling alloc_tag_top_users() while
!mem_alloc_profiling_enabled() will print older data but it won't lead
to UAF.
>
> > > +
> > > + if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
> > > + pr_warn("alloctag module is not ready yet.\n");
> >
> > I don't think spitting out this warning on every show_mem() is useful.
> > If alloc_tag_cttype is invalid because codetag_register_type() failed
> > then we already print an error here:
> > https://elixir.bootlin.com/linux/v6.16-rc3/source/lib/alloc_tag.c#L829,
> > so user has the logs to track this down.
> > If show_mem() is called so early that alloc_tag_init() hasn't been
> > called yet then missing allocation tag information would not be
> > surprising I think, considering it's early boot. I don't think it's
> > worth detecting and reporting such a state.
> >
> > > + return 0;
> > > + }
> > > +
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 15:38 ` Suren Baghdasaryan
@ 2025-06-24 18:00 ` Harry Yoo
2025-06-25 0:52 ` Suren Baghdasaryan
0 siblings, 1 reply; 22+ messages in thread
From: Harry Yoo @ 2025-06-24 18:00 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: David Wang, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 08:38:05AM -0700, Suren Baghdasaryan wrote:
> On Tue, Jun 24, 2025 at 8:14 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> >
> > On Tue, Jun 24, 2025 at 07:57:40AM -0700, Suren Baghdasaryan wrote:
> > > On Tue, Jun 24, 2025 at 7:28 AM David Wang <00107082@163.com> wrote:
> > > >
> > > >
> > > > At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > > >On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
> > > > >>
> > > > >> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > > >> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
> > > > >> >>
> > > > >> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > > >> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> > > > >> >> >even when the alloc_tag_cttype is not allocated because:
> > > > >> >> >
> > > > >> >> > 1) alloc tagging is disabled because mem profiling is disabled
> > > > >> >> > (!alloc_tag_cttype)
> > > > >> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > > > >> >> > 3) alloc tagging is enabled, but failed initialization
> > > > >> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> > > > >> >> >
> > > > >> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> > > > >> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> > > > >> >> >
> > > > >> >> >This leads to a crash on memory allocation failure by attempting to
> > > > >> >> >acquire a non-existent semaphore:
> > > > >> >> >
> > > > >> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > > > >> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > > > >> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > > > >> >> > Tainted: [D]=DIE
> > > > >> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > >> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > > > >> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > > > >> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > > > >> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > > > >> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > > > >> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > > > >> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > > > >> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > > > >> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > > > >> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > > > >> >> > Call Trace:
> > > > >> >> > <TASK>
> > > > >> >> > codetag_trylock_module_list+0xd/0x20
> > > > >> >> > alloc_tag_top_users+0x369/0x4b0
> > > > >> >> > __show_mem+0x1cd/0x6e0
> > > > >> >> > warn_alloc+0x2b1/0x390
> > > > >> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > > > >> >> > alloc_pages_mpol+0x135/0x3e0
> > > > >> >> > alloc_slab_page+0x82/0xe0
> > > > >> >> > new_slab+0x212/0x240
> > > > >> >> > ___slab_alloc+0x82a/0xe00
> > > > >> >> > </TASK>
> > > > >> >> >
> > > > >> >> >As David Wang points out, this issue became easier to trigger after commit
> > > > >> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > > > >> >> >
> > > > >> >> >Before the commit, the issue occurred only when it failed to allocate
> > > > >> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> > > > >> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> > > > >> >> >when memory profiling is compiled but disabled at boot.
> > > > >> >> >
> > > > >> >> >To properly determine whether alloc_tag_init() has been called and
> > > > >> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> > > > >> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> > > > >> >> >value, it has not been properly initialized. In such a case, just skip
> > > > >> >> >and do not attempt to acquire the semaphore.
> > > > >> >> >
> > > > >> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> > > > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> > > > >> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > > > >> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> > > > >> >> >Cc: stable@vger.kernel.org
> > > > >> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > >> >> >---
> > > > >> >> >
> > > > >> >> >@Suren: I did not add another pr_warn() because every error path in
> > > > >> >> >alloc_tag_init() already has pr_err().
> > > > >> >> >
> > > > >> >> >v2 -> v3:
> > > > >> >> >- Added another Closes: tag (David)
> > > > >> >> >- Moved the condition into a standalone if block for better readability
> > > > >> >> > (Suren)
> > > > >> >> >- Typo fix (Suren)
> > > > >> >> >
> > > > >> >> > lib/alloc_tag.c | 3 +++
> > > > >> >> > 1 file changed, 3 insertions(+)
> > > > >> >> >
> > > > >> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > > >> >> >index 41ccfb035b7b..e9b33848700a 100644
> > > > >> >> >--- a/lib/alloc_tag.c
> > > > >> >> >+++ b/lib/alloc_tag.c
> > > > >> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > > >> >> > struct codetag_bytes n;
> > > > >> >> > unsigned int i, nr = 0;
> > > > >> >> >
> > > > >> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> > > > >> >> >+ return 0;
> > > > >> >>
> > > > >> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> > > > >> >> alloc_tag_cttype != NULL && mem_profiling_support==0
> > > > >> >>
> > > > >> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
> > > > >> >
> > > > >> >After commit 780138b12381, alloc_tag_cttype is not allocated if
> > > > >> >!mem_profiling_support. (And that's why this bug showed up)
> > > > >>
> > > > >> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
> > >
> > > Wait, /proc/sys/vm/mem_profiling is changing mem_alloc_profiling_key,
> > > not mem_profiling_support. Am I missing something?
> >
> > Feels like it should call shutdown_mem_profiling() instead of
> > proc_do_static_key() (and also remove /proc/allocinfo)?
>
> No, we should be able to re-enable it later on.
A few questions that came up while reading this,
please feel free to ignore :)
- What is the expected output of /proc/allocinfo when it's disabled?
Should it print old data or nothing? I think it should be consistent
with the behavior of alloc_tag_top_users().
- When it's disabled and re-enabled again, can we see inconsistent
data if some memory has been freed in the meantime?
> You can't do that if you call shutdown_mem_profiling().
Because setting mem_profiling_support = false mean it's not supported.
And you can't re-enable if it's not supported. Gotcha!
> mem_profiling_support is very different from mem_alloc_profiling_key.
> mem_profiling_support means memory profiling is not supported while
> mem_alloc_profiling_key means it's enabled or disabled and can be
> changed later.
Okay. Now I see why I was confused. Perhaps I should have guessed that
from the name, but I was not 100% sure about the meaning.
> > > > >
> > > > >Ok. Maybe it shouldn't report memory allocation information that is
> > > > >collected before mem profiling was disabled. (I'm not sure why it disabling
> > > > >at runtime is allowed, though)
> > > > >
> > > > >That's a good thing to have, but I think that's a behavioral change in
> > > > >mem profiling, irrelevant to this bug and not a -stable thing.
> > > > >
> > > > >Maybe as a follow-up patch?
> > > >
> > > > Only a little more changes needed, I was suggesting:
> > > >
> > > > @@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > > struct codetag_bytes n;
> > > > unsigned int i, nr = 0;
> > > >
> > > > + if (!mem_profiling_support)
> > > > + return 0;
> > >
> > > David is right that with /proc/sys/vm/mem_profiling memory profiling
> > > can be turned off at runtime but the above condition should be:
> > > if (!mem_alloc_profiling_enabled())
> > > return 0;
> >
> > I agree that this change is a useful addition, but adding it to the patch
> > doesn't look right. It's doing two different things.
>
> You might be right, calling alloc_tag_top_users() while
> !mem_alloc_profiling_enabled() will print older data but it won't lead
> to UAF.
Yes and I think it'll be great if David could post it after the fix lands
maineline.
Aside from that, any feedback on the v3 of the patch?
If yes, I'll adjust it.
If not, please consider an ack ;)
> > > > +
> > > > + if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
> > > > + pr_warn("alloctag module is not ready yet.\n");
> > >
> > > I don't think spitting out this warning on every show_mem() is useful.
> > > If alloc_tag_cttype is invalid because codetag_register_type() failed
> > > then we already print an error here:
> > > https://elixir.bootlin.com/linux/v6.16-rc3/source/lib/alloc_tag.c#L829,
> > > so user has the logs to track this down.
> > > If show_mem() is called so early that alloc_tag_init() hasn't been
> > > called yet then missing allocation tag information would not be
> > > surprising I think, considering it's early boot. I don't think it's
> > > worth detecting and reporting such a state.
> > >
> > > > + return 0;
> > > > + }
> > > > +
> >
> > --
> > Cheers,
> > Harry / Hyeonggon
>
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 18:00 ` Harry Yoo
@ 2025-06-25 0:52 ` Suren Baghdasaryan
0 siblings, 0 replies; 22+ messages in thread
From: Suren Baghdasaryan @ 2025-06-25 0:52 UTC (permalink / raw)
To: Harry Yoo
Cc: David Wang, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
oe-lkp, stable
On Tue, Jun 24, 2025 at 11:01 AM Harry Yoo <harry.yoo@oracle.com> wrote:
>
> On Tue, Jun 24, 2025 at 08:38:05AM -0700, Suren Baghdasaryan wrote:
> > On Tue, Jun 24, 2025 at 8:14 AM Harry Yoo <harry.yoo@oracle.com> wrote:
> > >
> > > On Tue, Jun 24, 2025 at 07:57:40AM -0700, Suren Baghdasaryan wrote:
> > > > On Tue, Jun 24, 2025 at 7:28 AM David Wang <00107082@163.com> wrote:
> > > > >
> > > > >
> > > > > At 2025-06-24 22:13:49, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > > > >On Tue, Jun 24, 2025 at 10:00:48PM +0800, David Wang wrote:
> > > > > >>
> > > > > >> At 2025-06-24 21:50:02, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > > > >> >On Tue, Jun 24, 2025 at 09:25:58PM +0800, David Wang wrote:
> > > > > >> >>
> > > > > >> >> At 2025-06-24 15:25:13, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > > > > >> >> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> > > > > >> >> >even when the alloc_tag_cttype is not allocated because:
> > > > > >> >> >
> > > > > >> >> > 1) alloc tagging is disabled because mem profiling is disabled
> > > > > >> >> > (!alloc_tag_cttype)
> > > > > >> >> > 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > > > > >> >> > 3) alloc tagging is enabled, but failed initialization
> > > > > >> >> > (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> > > > > >> >> >
> > > > > >> >> >In all cases, alloc_tag_cttype is not allocated, and therefore
> > > > > >> >> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> > > > > >> >> >
> > > > > >> >> >This leads to a crash on memory allocation failure by attempting to
> > > > > >> >> >acquire a non-existent semaphore:
> > > > > >> >> >
> > > > > >> >> > Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > > > > >> >> > KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > > > > >> >> > CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> > > > > >> >> > Tainted: [D]=DIE
> > > > > >> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > >> >> > RIP: 0010:down_read_trylock+0xaa/0x3b0
> > > > > >> >> > Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > > > > >> >> > RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > > > > >> >> > RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > > > > >> >> > RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > > > > >> >> > RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > > > > >> >> > R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > > > > >> >> > R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > > > > >> >> > FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > > > > >> >> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > >> >> > CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > > > > >> >> > Call Trace:
> > > > > >> >> > <TASK>
> > > > > >> >> > codetag_trylock_module_list+0xd/0x20
> > > > > >> >> > alloc_tag_top_users+0x369/0x4b0
> > > > > >> >> > __show_mem+0x1cd/0x6e0
> > > > > >> >> > warn_alloc+0x2b1/0x390
> > > > > >> >> > __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > > > > >> >> > alloc_pages_mpol+0x135/0x3e0
> > > > > >> >> > alloc_slab_page+0x82/0xe0
> > > > > >> >> > new_slab+0x212/0x240
> > > > > >> >> > ___slab_alloc+0x82a/0xe00
> > > > > >> >> > </TASK>
> > > > > >> >> >
> > > > > >> >> >As David Wang points out, this issue became easier to trigger after commit
> > > > > >> >> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > > > > >> >> >
> > > > > >> >> >Before the commit, the issue occurred only when it failed to allocate
> > > > > >> >> >and initialize alloc_tag_cttype or if a memory allocation fails before
> > > > > >> >> >alloc_tag_init() is called. After the commit, it can be easily triggered
> > > > > >> >> >when memory profiling is compiled but disabled at boot.
> > > > > >> >> >
> > > > > >> >> >To properly determine whether alloc_tag_init() has been called and
> > > > > >> >> >its data structures initialized, verify that alloc_tag_cttype is a valid
> > > > > >> >> >pointer before acquiring the semaphore. If the variable is NULL or an error
> > > > > >> >> >value, it has not been properly initialized. In such a case, just skip
> > > > > >> >> >and do not attempt to acquire the semaphore.
> > > > > >> >> >
> > > > > >> >> >Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi1MlgtiSA$
> > > > > >> >> >Closes: https://urldefense.com/v3/__https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com__;!!ACWV5N9M2RV99hQ!MADvGKtvTvlLXNxlrJ4BdOSnbsJlyrSroPUGJ3JQHs_IF-gxxqfQ89OTZ21aN96DbmjG9qH3Wi0o2OoynA$
> > > > > >> >> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > > > > >> >> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> > > > > >> >> >Cc: stable@vger.kernel.org
> > > > > >> >> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > > > >> >> >---
> > > > > >> >> >
> > > > > >> >> >@Suren: I did not add another pr_warn() because every error path in
> > > > > >> >> >alloc_tag_init() already has pr_err().
> > > > > >> >> >
> > > > > >> >> >v2 -> v3:
> > > > > >> >> >- Added another Closes: tag (David)
> > > > > >> >> >- Moved the condition into a standalone if block for better readability
> > > > > >> >> > (Suren)
> > > > > >> >> >- Typo fix (Suren)
> > > > > >> >> >
> > > > > >> >> > lib/alloc_tag.c | 3 +++
> > > > > >> >> > 1 file changed, 3 insertions(+)
> > > > > >> >> >
> > > > > >> >> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > > > > >> >> >index 41ccfb035b7b..e9b33848700a 100644
> > > > > >> >> >--- a/lib/alloc_tag.c
> > > > > >> >> >+++ b/lib/alloc_tag.c
> > > > > >> >> >@@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > > > >> >> > struct codetag_bytes n;
> > > > > >> >> > unsigned int i, nr = 0;
> > > > > >> >> >
> > > > > >> >> >+ if (IS_ERR_OR_NULL(alloc_tag_cttype))
> > > > > >> >> >+ return 0;
> > > > > >> >>
> > > > > >> >> What about mem_profiling_support set to 0 after alloc_tag_init, in this case:
> > > > > >> >> alloc_tag_cttype != NULL && mem_profiling_support==0
> > > > > >> >>
> > > > > >> >> I kind of think alloc_tag_top_users should return 0 in this case....and both mem_profiling_support and alloc_tag_cttype should be checked....
> > > > > >> >
> > > > > >> >After commit 780138b12381, alloc_tag_cttype is not allocated if
> > > > > >> >!mem_profiling_support. (And that's why this bug showed up)
> > > > > >>
> > > > > >> There is a sysctl(/proc/sys/vm/mem_profiling) which can override mem_profiling_support and set it to 0, after alloc_tag_init with mem_profiling_support=1
> > > >
> > > > Wait, /proc/sys/vm/mem_profiling is changing mem_alloc_profiling_key,
> > > > not mem_profiling_support. Am I missing something?
> > >
> > > Feels like it should call shutdown_mem_profiling() instead of
> > > proc_do_static_key() (and also remove /proc/allocinfo)?
> >
> > No, we should be able to re-enable it later on.
>
> A few questions that came up while reading this,
> please feel free to ignore :)
>
> - What is the expected output of /proc/allocinfo when it's disabled?
> Should it print old data or nothing? I think it should be consistent
> with the behavior of alloc_tag_top_users().
>
> - When it's disabled and re-enabled again, can we see inconsistent
> data if some memory has been freed in the meantime?
>
> > You can't do that if you call shutdown_mem_profiling().
>
> Because setting mem_profiling_support = false mean it's not supported.
> And you can't re-enable if it's not supported. Gotcha!
>
> > mem_profiling_support is very different from mem_alloc_profiling_key.
> > mem_profiling_support means memory profiling is not supported while
> > mem_alloc_profiling_key means it's enabled or disabled and can be
> > changed later.
>
> Okay. Now I see why I was confused. Perhaps I should have guessed that
> from the name, but I was not 100% sure about the meaning.
>
> > > > > >
> > > > > >Ok. Maybe it shouldn't report memory allocation information that is
> > > > > >collected before mem profiling was disabled. (I'm not sure why it disabling
> > > > > >at runtime is allowed, though)
> > > > > >
> > > > > >That's a good thing to have, but I think that's a behavioral change in
> > > > > >mem profiling, irrelevant to this bug and not a -stable thing.
> > > > > >
> > > > > >Maybe as a follow-up patch?
> > > > >
> > > > > Only a little more changes needed, I was suggesting:
> > > > >
> > > > > @@ -134,6 +122,14 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > > > > struct codetag_bytes n;
> > > > > unsigned int i, nr = 0;
> > > > >
> > > > > + if (!mem_profiling_support)
> > > > > + return 0;
> > > >
> > > > David is right that with /proc/sys/vm/mem_profiling memory profiling
> > > > can be turned off at runtime but the above condition should be:
> > > > if (!mem_alloc_profiling_enabled())
> > > > return 0;
> > >
> > > I agree that this change is a useful addition, but adding it to the patch
> > > doesn't look right. It's doing two different things.
> >
> > You might be right, calling alloc_tag_top_users() while
> > !mem_alloc_profiling_enabled() will print older data but it won't lead
> > to UAF.
>
> Yes and I think it'll be great if David could post it after the fix lands
> maineline.
>
> Aside from that, any feedback on the v3 of the patch?
>
> If yes, I'll adjust it.
> If not, please consider an ack ;)
LGTM.
Acked-by: Suren Baghdasaryan <surenb@google.com>
>
> > > > > +
> > > > > + if (IS_ERR_OR_NULL(alloc_tag_cttype)) {
> > > > > + pr_warn("alloctag module is not ready yet.\n");
> > > >
> > > > I don't think spitting out this warning on every show_mem() is useful.
> > > > If alloc_tag_cttype is invalid because codetag_register_type() failed
> > > > then we already print an error here:
> > > > https://elixir.bootlin.com/linux/v6.16-rc3/source/lib/alloc_tag.c#L829,
> > > > so user has the logs to track this down.
> > > > If show_mem() is called so early that alloc_tag_init() hasn't been
> > > > called yet then missing allocation tag information would not be
> > > > surprising I think, considering it's early boot. I don't think it's
> > > > worth detecting and reporting such a state.
> > > >
> > > > > + return 0;
> > > > > + }
> > > > > +
> > >
> > > --
> > > Cheers,
> > > Harry / Hyeonggon
> >
>
> --
> Cheers,
> Harry / Hyeonggon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
2025-06-24 7:25 [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
2025-06-24 8:21 ` David Wang
2025-06-24 13:25 ` Re:[PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() David Wang
@ 2025-06-30 6:25 ` Raghavendra K T
2 siblings, 0 replies; 22+ messages in thread
From: Raghavendra K T @ 2025-06-30 6:25 UTC (permalink / raw)
To: Harry Yoo, akpm, surenb, kent.overstreet
Cc: oliver.sang, 00107082, cachen, linux-mm, oe-lkp, stable
On 6/24/2025 12:55 PM, Harry Yoo wrote:
> alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> even when the alloc_tag_cttype is not allocated because:
>
> 1) alloc tagging is disabled because mem profiling is disabled
> (!alloc_tag_cttype)
> 2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> 3) alloc tagging is enabled, but failed initialization
> (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>
> In all cases, alloc_tag_cttype is not allocated, and therefore
> alloc_tag_top_users() should not attempt to acquire the semaphore.
>
> This leads to a crash on memory allocation failure by attempting to
> acquire a non-existent semaphore:
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G D 6.16.0-rc2 #1 VOLUNTARY
> Tainted: [D]=DIE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:down_read_trylock+0xaa/0x3b0
> Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> FS: 00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> Call Trace:
> <TASK>
> codetag_trylock_module_list+0xd/0x20
> alloc_tag_top_users+0x369/0x4b0
> __show_mem+0x1cd/0x6e0
> warn_alloc+0x2b1/0x390
> __alloc_frozen_pages_noprof+0x12b9/0x21a0
> alloc_pages_mpol+0x135/0x3e0
> alloc_slab_page+0x82/0xe0
> new_slab+0x212/0x240
> ___slab_alloc+0x82a/0xe00
> </TASK>
>
> As David Wang points out, this issue became easier to trigger after commit
> 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>
> Before the commit, the issue occurred only when it failed to allocate
> and initialize alloc_tag_cttype or if a memory allocation fails before
> alloc_tag_init() is called. After the commit, it can be easily triggered
> when memory profiling is compiled but disabled at boot.
>
> To properly determine whether alloc_tag_init() has been called and
> its data structures initialized, verify that alloc_tag_cttype is a valid
> pointer before acquiring the semaphore. If the variable is NULL or an error
> value, it has not been properly initialized. In such a case, just skip
> and do not attempt to acquire the semaphore.
>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> Closes: https://lore.kernel.org/oe-lkp/202506131711.5b41931c-lkp@intel.com
> Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
>
> @Suren: I did not add another pr_warn() because every error path in
> alloc_tag_init() already has pr_err().
>
> v2 -> v3:
> - Added another Closes: tag (David)
> - Moved the condition into a standalone if block for better readability
> (Suren)
> - Typo fix (Suren)
>
> lib/alloc_tag.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> index 41ccfb035b7b..e9b33848700a 100644
> --- a/lib/alloc_tag.c
> +++ b/lib/alloc_tag.c
> @@ -127,6 +127,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> struct codetag_bytes n;
> unsigned int i, nr = 0;
>
> + if (IS_ERR_OR_NULL(alloc_tag_cttype))
> + return 0;
> +
> if (can_sleep)
> codetag_lock_module_list(alloc_tag_cttype, true);
> else if (!codetag_trylock_module_list(alloc_tag_cttype))
Hello,
I have hit this while digging one of the TEST_VMALLOC test.
(specifically align_shift_alloc_test which is expected to fail and
give similar splat)
I can confirm that with
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_TEST_VMALLOC=y
system does not boot (for v6.16+).
with this patch system boots fine.
and thus seems to be a mandatory fix. (for the above CONFIG combination)
Fell free to add:
Tested-by: Raghavendra K T <raghavendra.kt@amd.com>
Thanks and Regards
- Raghu
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-06-30 6:25 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-24 7:25 [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
2025-06-24 8:21 ` David Wang
2025-06-24 9:09 ` [PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()y Harry Yoo
2025-06-24 9:30 ` David Wang
2025-06-24 10:59 ` Harry Yoo
2025-06-24 11:28 ` David Wang
2025-06-24 13:15 ` David Wang
2025-06-24 14:24 ` Harry Yoo
2025-06-24 14:47 ` David Wang
2025-06-24 15:04 ` Suren Baghdasaryan
2025-06-24 13:25 ` Re:[PATCH v3] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() David Wang
2025-06-24 13:50 ` [PATCH " Harry Yoo
2025-06-24 14:00 ` David Wang
2025-06-24 14:13 ` Harry Yoo
2025-06-24 14:28 ` David Wang
2025-06-24 14:57 ` Suren Baghdasaryan
2025-06-24 15:14 ` Harry Yoo
2025-06-24 15:38 ` Suren Baghdasaryan
2025-06-24 18:00 ` Harry Yoo
2025-06-25 0:52 ` Suren Baghdasaryan
2025-06-24 15:23 ` David Wang
2025-06-30 6:25 ` Raghavendra K T
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).