* [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
       [not found] <202506181351.bba867dd-lkp@intel.com>
@ 2025-06-20 19:53 ` Harry Yoo
  2025-06-21  3:43   ` David Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Harry Yoo @ 2025-06-20 19:53 UTC
  To: akpm, surenb, kent.overstreet
  Cc: oliver.sang, 00107082, cachen, linux-mm, oe-lkp, Harry Yoo,
	stable

alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
even when the alloc_tag_cttype is not allocated because:

  1) alloc tagging is disabled because mem profiling is disabled
     (!alloc_tag_cttype)
  2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
  3) alloc tagging is enabled, but failed initialization
     (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))

In all cases, alloc_tag_cttype is not allocated, and therefore
alloc_tag_top_users() should not attempt to acquire the semaphore.

This leads to a crash on memory allocation failure by attempting to
acquire a non-existent semaphore:

  Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
  KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G      D             6.16.0-rc2 #1 VOLUNTARY
  Tainted: [D]=DIE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  RIP: 0010:down_read_trylock+0xaa/0x3b0
  Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
  RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
  RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
  RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
  RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
  R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
  R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
  FS:  00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
  Call Trace:
   <TASK>
   codetag_trylock_module_list+0xd/0x20
   alloc_tag_top_users+0x369/0x4b0
   __show_mem+0x1cd/0x6e0
   warn_alloc+0x2b1/0x390
   __alloc_frozen_pages_noprof+0x12b9/0x21a0
   alloc_pages_mpol+0x135/0x3e0
   alloc_slab_page+0x82/0xe0
   new_slab+0x212/0x240
   ___slab_alloc+0x82a/0xe00
   </TASK>
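
For readers decoding the report, the faulting address follows from the
KASAN shadow mapping on x86-64, shadow = (addr >> 3) + 0xdffffc0000000000.
The userspace sketch below reproduces the numbers in the oops; the helper
names are made up for illustration and are not kernel APIs. RBX = 0x70
appears to be the pointer handed to down_read_trylock(), which is
consistent with &cttype->mod_lock computed from a NULL cttype, and the
checked field at RBX + 0x68 = 0xd8 maps to the non-canonical shadow
address 0xdffffc000000001b.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Generic KASAN on x86-64: one shadow byte covers an 8-byte granule. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Shadow byte address that KASAN inspects before an access to addr. */
uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* First address of the 8-byte granule described by a shadow byte. */
uint64_t kasan_granule_start(uint64_t shadow)
{
	return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}

/* Hypothetical helper: redo the arithmetic using the registers in the oops. */
void decode_oops(void)
{
	uint64_t rbx = 0x70;		/* RBX/RDI in the register dump */
	uint64_t field = rbx + 0x68;	/* "48 8d 6b 68": lea 0x68(%rbx),%rbp */
	uint64_t shadow = kasan_shadow_addr(field);
	uint64_t start = kasan_granule_start(shadow);

	printf("field  = 0x%" PRIx64 "\n", field);
	printf("shadow = 0x%" PRIx64 "\n", shadow);
	printf("range  = [0x%" PRIx64 "-0x%" PRIx64 "]\n", start, start + 7);
}
```

Calling decode_oops() prints the field address, the shadow address and
the 8-byte range, which can be compared against RBP, the reported fault
address and the KASAN range line in the report.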

As David Wang points out, this issue became easier to trigger after commit
780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").

Before the commit, the issue occurred only when alloc_tag_init() failed to
allocate and initialize alloc_tag_cttype, or when a memory allocation failed
before alloc_tag_init() was called. After the commit, it can be easily
triggered when memory profiling is compiled in but disabled at boot.

To properly determine whether alloc_tag_init() has been called and
its data structures initialized, verify that alloc_tag_cttype is a valid
pointer before acquiring the semaphore. If the variable is NULL or an error
value, it has not been properly initialized. In such a case, just skip
and do not attempt acquire the semaphore.
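
The guard can be modeled in userspace as follows. This is only a sketch
under stated assumptions: err_ptr() and is_err_or_null() are stand-ins
for the kernel's ERR_PTR() and IS_ERR_OR_NULL(), and struct cttype_model
and top_users_model() are hypothetical names that do not exist in the
kernel. It shows that all three uninitialized states (NULL, error
pointer, never registered) return before the lock is touched.

```c
#include <errno.h>	/* ENOMEM, for callers building error pointers */
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* same bound the kernel's err.h uses */

/* Userspace stand-in for ERR_PTR(): encode a negative errno as a pointer. */
void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Userspace stand-in for IS_ERR_OR_NULL(). */
int is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical model of a registered type: only the lock state matters. */
struct cttype_model {
	int locked;
	size_t nr_tags;
};

/* Mirrors the guarded entry of alloc_tag_top_users(): bail out first. */
size_t top_users_model(struct cttype_model *cttype)
{
	size_t nr;

	if (is_err_or_null(cttype))
		return 0;	/* nothing registered, so no lock exists */

	cttype->locked = 1;	/* stands in for codetag_lock_module_list() */
	nr = cttype->nr_tags;
	cttype->locked = 0;	/* stands in for the matching unlock */
	return nr;
}
```

With this model, top_users_model(NULL) and
top_users_model(err_ptr(-ENOMEM)) both return 0 without dereferencing
the pointer, while a valid object takes and releases the lock.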

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
Cc: stable@vger.kernel.org
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
---

v1 -> v2:

- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
  
  v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
  I didn't expect alloc_tag_cttype to be NULL when
  mem_profiling_support is true, but as David points out (Thanks David!)
  if a memory allocation fails before alloc_tag_init(), it can be NULL.

  So instead of indirectly checking mem_profiling_support, just directly
  check if alloc_tag_cttype is allocated.

- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
  tag was removed because it was not a crash and not relevant to this
  patch.

- Added Cc: stable because, if an allocation fails before
  alloc_tag_init(), it can be triggered even prior to 780138b12381.
  I verified that the bug can be triggered in v6.12 and fixed by this
  patch.

  It should be quite difficult to trigger in practice, though.
  Maybe I'm a bit paranoid?

 lib/alloc_tag.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 66a4628185f7..d8ec4c03b7d2 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
 	struct codetag_bytes n;
 	unsigned int i, nr = 0;
 
-	if (can_sleep)
+	if (IS_ERR_OR_NULL(alloc_tag_cttype))
+		return 0;
+	else if (can_sleep)
 		codetag_lock_module_list(alloc_tag_cttype, true);
 	else if (!codetag_trylock_module_list(alloc_tag_cttype))
 		return 0;
-- 
2.43.0



* Re: [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
  2025-06-20 19:53 ` [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users() Harry Yoo
@ 2025-06-21  3:43   ` David Wang
  2025-06-22 22:24     ` [PATCH " Suren Baghdasaryan
  0 siblings, 1 reply; 4+ messages in thread
From: David Wang @ 2025-06-21  3:43 UTC
  To: Harry Yoo
  Cc: akpm, surenb, kent.overstreet, oliver.sang, cachen, linux-mm,
	oe-lkp, stable


At 2025-06-21 03:53:05, "Harry Yoo" <harry.yoo@oracle.com> wrote:
>alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
>even when the alloc_tag_cttype is not allocated because:
>
>  1) alloc tagging is disabled because mem profiling is disabled
>     (!alloc_tag_cttype)
>  2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
>  3) alloc tagging is enabled, but failed initialization
>     (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
>
>In all cases, alloc_tag_cttype is not allocated, and therefore
>alloc_tag_top_users() should not attempt to acquire the semaphore.
>
>This leads to a crash on memory allocation failure by attempting to
>acquire a non-existent semaphore:
>
>  Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
>  KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
>  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G      D             6.16.0-rc2 #1 VOLUNTARY
>  Tainted: [D]=DIE
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>  RIP: 0010:down_read_trylock+0xaa/0x3b0
>  Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
>  RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
>  RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
>  RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
>  RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
>  R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
>  R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
>  FS:  00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
>  Call Trace:
>   <TASK>
>   codetag_trylock_module_list+0xd/0x20
>   alloc_tag_top_users+0x369/0x4b0
>   __show_mem+0x1cd/0x6e0
>   warn_alloc+0x2b1/0x390
>   __alloc_frozen_pages_noprof+0x12b9/0x21a0
>   alloc_pages_mpol+0x135/0x3e0
>   alloc_slab_page+0x82/0xe0
>   new_slab+0x212/0x240
>   ___slab_alloc+0x82a/0xe00
>   </TASK>
>
>As David Wang points out, this issue became easier to trigger after commit
>780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
>
>Before the commit, the issue occurred only when alloc_tag_init() failed to
>allocate and initialize alloc_tag_cttype, or when a memory allocation failed
>before alloc_tag_init() was called. After the commit, it can be easily
>triggered when memory profiling is compiled in but disabled at boot.
>
>To properly determine whether alloc_tag_init() has been called and
>its data structures initialized, verify that alloc_tag_cttype is a valid
>pointer before acquiring the semaphore. If the variable is NULL or an error
>value, it has not been properly initialized. In such a case, just skip
>and do not attempt acquire the semaphore.
>
>Reported-by: kernel test robot <oliver.sang@intel.com>
>Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
>Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
>Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
>Cc: stable@vger.kernel.org
>Signed-off-by: Harry Yoo <harry.yoo@oracle.com>

Just noticed that another thread can be closed as well:
https://lore.kernel.org/all/202506131711.5b41931c-lkp@intel.com/
This coincides with scenario #1, where an OOM happened with
CONFIG_MEM_ALLOC_PROFILING=y
# CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
# CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set



* Re: [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
  2025-06-21  3:43   ` David Wang
@ 2025-06-22 22:24     ` Suren Baghdasaryan
  2025-06-23  2:01       ` Harry Yoo
  0 siblings, 1 reply; 4+ messages in thread
From: Suren Baghdasaryan @ 2025-06-22 22:24 UTC
  To: David Wang
  Cc: Harry Yoo, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
	oe-lkp, stable

On Fri, Jun 20, 2025 at 8:43 PM David Wang <00107082@163.com> wrote:
>
>
> At 2025-06-21 03:53:05, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> >even when the alloc_tag_cttype is not allocated because:
> >
> >  1) alloc tagging is disabled because mem profiling is disabled
> >     (!alloc_tag_cttype)
> >  2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> >  3) alloc tagging is enabled, but failed initialization
> >     (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> >
> >In all cases, alloc_tag_cttype is not allocated, and therefore
> >alloc_tag_top_users() should not attempt to acquire the semaphore.
> >
> >This leads to a crash on memory allocation failure by attempting to
> >acquire a non-existent semaphore:
> >
> >  Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> >  KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> >  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G      D             6.16.0-rc2 #1 VOLUNTARY
> >  Tainted: [D]=DIE
> >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> >  RIP: 0010:down_read_trylock+0xaa/0x3b0
> >  Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> >  RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> >  RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> >  RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> >  RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> >  R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> >  R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> >  FS:  00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >  CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> >  Call Trace:
> >   <TASK>
> >   codetag_trylock_module_list+0xd/0x20
> >   alloc_tag_top_users+0x369/0x4b0
> >   __show_mem+0x1cd/0x6e0
> >   warn_alloc+0x2b1/0x390
> >   __alloc_frozen_pages_noprof+0x12b9/0x21a0
> >   alloc_pages_mpol+0x135/0x3e0
> >   alloc_slab_page+0x82/0xe0
> >   new_slab+0x212/0x240
> >   ___slab_alloc+0x82a/0xe00
> >   </TASK>
> >
> >As David Wang points out, this issue became easier to trigger after commit
> >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> >
> >Before the commit, the issue occurred only when alloc_tag_init() failed to
> >allocate and initialize alloc_tag_cttype, or when a memory allocation failed
> >before alloc_tag_init() was called. After the commit, it can be easily
> >triggered when memory profiling is compiled in but disabled at boot.

Thanks for the fix and sorry about the delay with reviewing it.

> >
> >To properly determine whether alloc_tag_init() has been called and
> >its data structures initialized, verify that alloc_tag_cttype is a valid
> >pointer before acquiring the semaphore. If the variable is NULL or an error
> >value, it has not been properly initialized. In such a case, just skip
> >and do not attempt acquire the semaphore.

nit: s/attempt acquire/attempt to acquire

> >
> >Reported-by: kernel test robot <oliver.sang@intel.com>
> >Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> >Cc: stable@vger.kernel.org
> >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
>
> Just noticed that another thread can be closed as well:
> https://lore.kernel.org/all/202506131711.5b41931c-lkp@intel.com/
> This coincides with scenario #1, where an OOM happened with
> CONFIG_MEM_ALLOC_PROFILING=y
> # CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
> # CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set
>
> >---
> >
> >v1 -> v2:
> >
> >- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
> >
> >  v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
> >  I didn't expect alloc_tag_cttype to be NULL when
> >  mem_profiling_support is true, but as David points out (Thanks David!)
> >  if a memory allocation fails before alloc_tag_init(), it can be NULL.
> >
> >  So instead of indirectly checking mem_profiling_support, just directly
> >  check if alloc_tag_cttype is allocated.
> >
> >- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> >  tag was removed because it was not a crash and not relevant to this
> >  patch.
> >
> >- Added Cc: stable because, if an allocation fails before
> >  alloc_tag_init(), it can be triggered even prior to 780138b12381.
> >  I verified that the bug can be triggered in v6.12 and fixed by this
> >  patch.
> >
> >  It should be quite difficult to trigger in practice, though.
> >  Maybe I'm a bit paranoid?
> >
> > lib/alloc_tag.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> >index 66a4628185f7..d8ec4c03b7d2 100644
> >--- a/lib/alloc_tag.c
> >+++ b/lib/alloc_tag.c
> >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> >       struct codetag_bytes n;
> >       unsigned int i, nr = 0;
> >
> >-      if (can_sleep)
> >+      if (IS_ERR_OR_NULL(alloc_tag_cttype))
> >+              return 0;

So, AFAICT alloc_tag_cttype will be NULL when memory profiling is
disabled, and it will be ERR_PTR(-ENOMEM) if codetag_register_type()
fails. I think it would be good to add a pr_warn() in alloc_tag_init()
when codetag_register_type() fails so that the user can determine why
the show_mem() report is missing allocation tag information.
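
A rough userspace sketch of that suggestion follows, to show the shape
of the check rather than the actual change. Everything here is
hypothetical: fprintf() stands in for pr_warn(), err_ptr(), is_err() and
ptr_err() stand in for the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(),
and register_type_model() stands in for codetag_register_type() with a
flag that forces the failure.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095	/* same bound the kernel's err.h uses */

/* Userspace stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
void *err_ptr(long error) { return (void *)(intptr_t)error; }
int is_err(const void *ptr) { return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO; }
long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

static int dummy_type;	/* placeholder object returned on success */
void *cttype_model;	/* stands in for alloc_tag_cttype; NULL until init */

/* Hypothetical registration that can be forced to fail with -ENOMEM. */
void *register_type_model(int fail)
{
	return fail ? err_ptr(-ENOMEM) : (void *)&dummy_type;
}

/* Init path that reports a failed registration instead of staying silent. */
int alloc_tag_init_model(int fail)
{
	cttype_model = register_type_model(fail);
	if (is_err(cttype_model)) {
		/* fprintf() stands in for pr_warn() */
		fprintf(stderr, "alloc_tag: type registration failed: %ld\n",
			ptr_err(cttype_model));
		return (int)ptr_err(cttype_model);
	}
	return 0;
}
```

With the warning in place, a missing allocation tag section in the
show_mem() output can be matched against an earlier log line.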

> >+      else if (can_sleep)

nit: the above extra "else" is not really needed. The following should
work just fine, is more readable and produces less churn:

+       if (IS_ERR_OR_NULL(alloc_tag_cttype))
+               return 0;
+
        if (can_sleep)
                codetag_lock_module_list(alloc_tag_cttype, true);
        else if (!codetag_trylock_module_list(alloc_tag_cttype))
                return 0;

> >               codetag_lock_module_list(alloc_tag_cttype, true);
> >       else if (!codetag_trylock_module_list(alloc_tag_cttype))
> >               return 0;
> >--
> >2.43.0


* Re: [PATCH v2] lib/alloc_tag: do not acquire non-existent lock in alloc_tag_top_users()
  2025-06-22 22:24     ` [PATCH " Suren Baghdasaryan
@ 2025-06-23  2:01       ` Harry Yoo
  0 siblings, 0 replies; 4+ messages in thread
From: Harry Yoo @ 2025-06-23  2:01 UTC
  To: Suren Baghdasaryan
  Cc: David Wang, akpm, kent.overstreet, oliver.sang, cachen, linux-mm,
	oe-lkp, stable

On Sun, Jun 22, 2025 at 03:24:08PM -0700, Suren Baghdasaryan wrote:
> On Fri, Jun 20, 2025 at 8:43 PM David Wang <00107082@163.com> wrote:
> >
> >
> > At 2025-06-21 03:53:05, "Harry Yoo" <harry.yoo@oracle.com> wrote:
> > >alloc_tag_top_users() attempts to lock alloc_tag_cttype->mod_lock
> > >even when the alloc_tag_cttype is not allocated because:
> > >
> > >  1) alloc tagging is disabled because mem profiling is disabled
> > >     (!alloc_tag_cttype)
> > >  2) alloc tagging is enabled, but not yet initialized (!alloc_tag_cttype)
> > >  3) alloc tagging is enabled, but failed initialization
> > >     (!alloc_tag_cttype or IS_ERR(alloc_tag_cttype))
> > >
> > >In all cases, alloc_tag_cttype is not allocated, and therefore
> > >alloc_tag_top_users() should not attempt to acquire the semaphore.
> > >
> > >This leads to a crash on memory allocation failure by attempting to
> > >acquire a non-existent semaphore:
> > >
> > >  Oops: general protection fault, probably for non-canonical address 0xdffffc000000001b: 0000 [#3] SMP KASAN NOPTI
> > >  KASAN: null-ptr-deref in range [0x00000000000000d8-0x00000000000000df]
> > >  CPU: 2 UID: 0 PID: 1 Comm: systemd Tainted: G      D             6.16.0-rc2 #1 VOLUNTARY
> > >  Tainted: [D]=DIE
> > >  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > >  RIP: 0010:down_read_trylock+0xaa/0x3b0
> > >  Code: d0 7c 08 84 d2 0f 85 a0 02 00 00 8b 0d df 31 dd 04 85 c9 75 29 48 b8 00 00 00 00 00 fc ff df 48 8d 6b 68 48 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 88 02 00 00 48 3b 5b 68 0f 85 53 01 00 00 65 ff
> > >  RSP: 0000:ffff8881002ce9b8 EFLAGS: 00010016
> > >  RAX: dffffc0000000000 RBX: 0000000000000070 RCX: 0000000000000000
> > >  RDX: 000000000000001b RSI: 000000000000000a RDI: 0000000000000070
> > >  RBP: 00000000000000d8 R08: 0000000000000001 R09: ffffed107dde49d1
> > >  R10: ffff8883eef24e8b R11: ffff8881002cec20 R12: 1ffff11020059d37
> > >  R13: 00000000003fff7b R14: ffff8881002cec20 R15: dffffc0000000000
> > >  FS:  00007f963f21d940(0000) GS:ffff888458ca6000(0000) knlGS:0000000000000000
> > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >  CR2: 00007f963f5edf71 CR3: 000000010672c000 CR4: 0000000000350ef0
> > >  Call Trace:
> > >   <TASK>
> > >   codetag_trylock_module_list+0xd/0x20
> > >   alloc_tag_top_users+0x369/0x4b0
> > >   __show_mem+0x1cd/0x6e0
> > >   warn_alloc+0x2b1/0x390
> > >   __alloc_frozen_pages_noprof+0x12b9/0x21a0
> > >   alloc_pages_mpol+0x135/0x3e0
> > >   alloc_slab_page+0x82/0xe0
> > >   new_slab+0x212/0x240
> > >   ___slab_alloc+0x82a/0xe00
> > >   </TASK>
> > >
> > >As David Wang points out, this issue became easier to trigger after commit
> > >780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init").
> > >
> > >Before the commit, the issue occurred only when alloc_tag_init() failed to
> > >allocate and initialize alloc_tag_cttype, or when a memory allocation failed
> > >before alloc_tag_init() was called. After the commit, it can be easily
> > >triggered when memory profiling is compiled in but disabled at boot.
> 
> Thanks for the fix and sorry about the delay with reviewing it.

No problem ;)

> > >
> > >To properly determine whether alloc_tag_init() has been called and
> > >its data structures initialized, verify that alloc_tag_cttype is a valid
> > >pointer before acquiring the semaphore. If the variable is NULL or an error
> > >value, it has not been properly initialized. In such a case, just skip
> > >and do not attempt acquire the semaphore.
> 
> nit: s/attempt acquire/attempt to acquire

Will fix the typo.

> > >
> > >Reported-by: kernel test robot <oliver.sang@intel.com>
> > >Closes: https://lore.kernel.org/oe-lkp/202506181351.bba867dd-lkp@intel.com
> > >Fixes: 780138b12381 ("alloc_tag: check mem_profiling_support in alloc_tag_init")
> > >Fixes: 1438d349d16b ("lib: add memory allocations report in show_mem()")
> > >Cc: stable@vger.kernel.org
> > >Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> >
> > Just noticed that another thread can be closed as well:
> > https://lore.kernel.org/all/202506131711.5b41931c-lkp@intel.com/
> > This coincides with scenario #1, where an OOM happened with
> > CONFIG_MEM_ALLOC_PROFILING=y
> > # CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT is not set
> > # CONFIG_MEM_ALLOC_PROFILING_DEBUG is not set
> >
> > >---
> > >
> > >v1 -> v2:
> > >
> > >- v1 fixed the bug only when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
> > >
> > >  v2 now fixes the bug even when MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y.
> > >  I didn't expect alloc_tag_cttype to be NULL when
> > >  mem_profiling_support is true, but as David points out (Thanks David!)
> > >  if a memory allocation fails before alloc_tag_init(), it can be NULL.
> > >
> > >  So instead of indirectly checking mem_profiling_support, just directly
> > >  check if alloc_tag_cttype is allocated.
> > >
> > >- Closes: https://lore.kernel.org/oe-lkp/202505071555.e757f1e0-lkp@intel.com
> > >  tag was removed because it was not a crash and not relevant to this
> > >  patch.
> > >
> > >- Added Cc: stable because, if an allocation fails before
> > >  alloc_tag_init(), it can be triggered even prior to 780138b12381.
> > >  I verified that the bug can be triggered in v6.12 and fixed by this
> > >  patch.
> > >
> > >  It should be quite difficult to trigger in practice, though.
> > >  Maybe I'm a bit paranoid?
> > >
> > > lib/alloc_tag.c | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > >diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
> > >index 66a4628185f7..d8ec4c03b7d2 100644
> > >--- a/lib/alloc_tag.c
> > >+++ b/lib/alloc_tag.c
> > >@@ -124,7 +124,9 @@ size_t alloc_tag_top_users(struct codetag_bytes *tags, size_t count, bool can_sl
> > >       struct codetag_bytes n;
> > >       unsigned int i, nr = 0;
> > >
> > >-      if (can_sleep)
> > >+      if (IS_ERR_OR_NULL(alloc_tag_cttype))
> > >+              return 0;
> 
> So, AFAICT alloc_tag_cttype will be NULL when memory profiling is
> disabled, and it will be ERR_PTR(-ENOMEM) if codetag_register_type() fails.

Yes.

Or when memory profiling is enabled, but a memory allocation fails
before alloc_tag_init().

> I think it would be good to add a pr_warn() in alloc_tag_init() when
> codetag_register_type() fails so that the user can determine why the
> show_mem() report is missing allocation tag information.

Will do.

> > >+      else if (can_sleep)
> 
> nit: the above extra "else" is not really needed. The following should
> work just fine, is more readable and produces less churn:
> 
> +       if (IS_ERR_OR_NULL(alloc_tag_cttype))
> +               return 0;
> +
>         if (can_sleep)
>                 codetag_lock_module_list(alloc_tag_cttype, true);
>         else if (!codetag_trylock_module_list(alloc_tag_cttype))
>                 return 0;

Will do, thanks!

> 
> > >               codetag_lock_module_list(alloc_tag_cttype, true);
> > >       else if (!codetag_trylock_module_list(alloc_tag_cttype))
> > >               return 0;
> > >--
> > >2.43.0

-- 
Cheers,
Harry / Hyeonggon

