* [PATCH v1 0/2] mm/show_mem: Bug fix for print mem alloc info
From: Yueyang Pan @ 2025-08-27 18:34 UTC
To: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif
Cc: linux-mm, kernel-team, linux-kernel

This patch set fixes two issues we saw in production rollout.

The first issue is that we saw all zero output of memory allocation
profiling information from show_mem() if CONFIG_MEM_ALLOC_PROFILING
is set and sysctl.vm.mem_profiling=0. In this case, the behaviour
should be the same as when CONFIG_MEM_ALLOC_PROFILING is unset,
where show_mem prints nothing about the information. This will make
further parse easier as we don't have to differentiate what a all
zero line actually means (Does it mean 0 bytes are allocated
or simply memory allocation profiling is disabled).

The second issue is that multiple entities can call show_mem()
which messed up the allocation info in dmesg. We saw outputs like this:
```
327 MiB 83635 mm/compaction.c:1880 func:compaction_alloc
48.4 GiB 12684937 mm/memory.c:1061 func:folio_prealloc
7.48 GiB 10899 mm/huge_memory.c:1159 func:vma_alloc_anon_folio_pmd
298 MiB 95216 kernel/fork.c:318 func:alloc_thread_stack_node
250 MiB 63901 mm/zsmalloc.c:987 func:alloc_zspage
1.42 GiB 372527 mm/memory.c:1063 func:folio_prealloc
1.17 GiB 95693 mm/slub.c:2424 func:alloc_slab_page
651 MiB 166732 mm/readahead.c:270 func:page_cache_ra_unbounded
419 MiB 107261 net/core/page_pool.c:572 func:__page_pool_alloc_pages_slow
404 MiB 103425 arch/x86/mm/pgtable.c:25 func:pte_alloc_one
```
The above example is because one kthread invokes show_mem()
from __alloc_pages_slowpath while kernel itself calls
oom_kill_process()

Yueyang Pan (2):
  mm/show_mem: No print when not mem_alloc_profiling_enabled()
  mm/show_mem: Add trylock while printing alloc info

 mm/show_mem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

-- 
2.47.3

^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v1 1/2] mm/show_mem: No print when not mem_alloc_profiling_enabled()
From: Yueyang Pan @ 2025-08-27 18:34 UTC
To: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif
Cc: linux-mm, kernel-team, linux-kernel

This patch makes print kernel memory allocation information
controlled by mem_alloc_profiling_enabled() so that we won't see
all zero numbers in production.

Signed-off-by: Yueyang Pan <pyyjason@gmail.com>
---
 mm/show_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/show_mem.c b/mm/show_mem.c
index 41999e94a56d..b71e222fde86 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -419,7 +419,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
 #ifdef CONFIG_MEM_ALLOC_PROFILING
-	{
+	if (mem_alloc_profiling_enabled()) {
 		struct codetag_bytes tags[10];
 		size_t i, nr;
 
-- 
2.47.3
* [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info
From: Yueyang Pan @ 2025-08-27 18:34 UTC
To: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif
Cc: linux-mm, kernel-team, linux-kernel

In production, show_mem() can be called concurrently from two
different entities, for example one from oom_kill_process()
another from __alloc_pages_slowpath from another kthread. This
patch adds a mutex and invokes trylock before printing out the
kernel alloc info in show_mem(). This way two alloc info won't
interleave with each other, which then makes parsing easier.

Signed-off-by: Yueyang Pan <pyyjason@gmail.com>
---
 mm/show_mem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/show_mem.c b/mm/show_mem.c
index b71e222fde86..8814b5f8a7dc 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
 
+static DEFINE_MUTEX(mem_alloc_profiling_mutex);
+
 static inline void show_node(struct zone *zone)
 {
 	if (IS_ENABLED(CONFIG_NUMA))
@@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
 #ifdef CONFIG_MEM_ALLOC_PROFILING
-	if (mem_alloc_profiling_enabled()) {
+	if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) {
 		struct codetag_bytes tags[10];
 		size_t i, nr;
 
@@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 					ct->lineno, ct->function);
 			}
 		}
+		mutex_unlock(&mem_alloc_profiling_mutex);
 	}
 #endif
 }
-- 
2.47.3
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info
From: Andrew Morton @ 2025-08-27 22:06 UTC
To: Yueyang Pan
Cc: Suren Baghdasaryan, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel

On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote:

> In production, show_mem() can be called concurrently from two
> different entities, for example one from oom_kill_process()
> another from __alloc_pages_slowpath from another kthread. This
> patch adds a mutex and invokes trylock before printing out the
> kernel alloc info in show_mem(). This way two alloc info won't
> interleave with each other, which then makes parsing easier.
>

Fair enough, I guess.

> --- a/mm/show_mem.c
> +++ b/mm/show_mem.c
> @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages);
>  unsigned long totalreserve_pages __read_mostly;
>  unsigned long totalcma_pages __read_mostly;
>
> +static DEFINE_MUTEX(mem_alloc_profiling_mutex);

It would be a bit neater to make this local to __show_mem() - it didn't
need file scope.

Also, mutex_unlock() isn't to be used from interrupt context, so
problem.

Something like atomic cmpxchg or test_and_set_bit could be used and
wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
here.

>  static inline void show_node(struct zone *zone)
>  {
>  	if (IS_ENABLED(CONFIG_NUMA))
> @@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
>  #endif
>  #ifdef CONFIG_MEM_ALLOC_PROFILING
> -	if (mem_alloc_profiling_enabled()) {
> +	if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) {
>  		struct codetag_bytes tags[10];
>  		size_t i, nr;
>
> @@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  					ct->lineno, ct->function);
>  			}
>  		}
> +		mutex_unlock(&mem_alloc_profiling_mutex);
>  	}

If we're going to suppress the usual output then how about we let
people know this happened, rather than silently dropping it?

pr_notice("memory allocation output suppressed due to show_mem() contention\n")

or something like that?
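For readers following the test-and-set suggestion above, here is a minimal userspace sketch of the idea, not kernel code and not what the series ends up doing. It uses C11 `atomic_flag` as a stand-in for the kernel's `test_and_set_bit()`; the names `alloc_info_busy` and `print_alloc_info_once()` are hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical guard flag: clear means nobody is printing right now. */
static atomic_flag alloc_info_busy = ATOMIC_FLAG_INIT;

/*
 * Print the report unless another caller is already printing it.
 * Returns true if this caller printed, false if it backed off.
 * There is no sleeping and no wakeup of waiters, which is the property
 * the review asks for.
 */
static bool print_alloc_info_once(const char *who)
{
	/* test-and-set returns the previous value: true means "already taken". */
	if (atomic_flag_test_and_set_explicit(&alloc_info_busy,
					      memory_order_acquire))
		return false;

	printf("%s: memory allocation info would be printed here\n", who);

	/* Release the flag so the next caller can print. */
	atomic_flag_clear_explicit(&alloc_info_busy, memory_order_release);
	return true;
}
```

A caller that gets `false` back simply skips the report; whether it should also log a notice is the open question in this thread.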
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-27 22:06 ` Andrew Morton @ 2025-08-27 22:28 ` Shakeel Butt 2025-08-28 8:36 ` Yueyang Pan 2025-08-28 8:34 ` Yueyang Pan 1 sibling, 1 reply; 19+ messages in thread From: Shakeel Butt @ 2025-08-27 22:28 UTC (permalink / raw) To: Andrew Morton Cc: Yueyang Pan, Suren Baghdasaryan, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: > On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: > > > In production, show_mem() can be called concurrently from two > > different entities, for example one from oom_kill_process() > > another from __alloc_pages_slowpath from another kthread. This > > patch adds a mutex and invokes trylock before printing out the > > kernel alloc info in show_mem(). This way two alloc info won't > > interleave with each other, which then makes parsing easier. > > > > Fair enough, I guess. > > > --- a/mm/show_mem.c > > +++ b/mm/show_mem.c > > @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages); > > unsigned long totalreserve_pages __read_mostly; > > unsigned long totalcma_pages __read_mostly; > > > > +static DEFINE_MUTEX(mem_alloc_profiling_mutex); > > It would be a bit neater to make this local to __show_mem() - it didn't > need file scope. +1, something static to __show_mem(). > > Also, mutex_unlock() isn't to be used from interrupt context, so > problem. > > Something like atomic cmpxchg or test_and_set_bit could be used and > wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed > here. 
+1 > > > static inline void show_node(struct zone *zone) > > { > > if (IS_ENABLED(CONFIG_NUMA)) > > @@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) > > printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages)); > > #endif > > #ifdef CONFIG_MEM_ALLOC_PROFILING > > - if (mem_alloc_profiling_enabled()) { > > + if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) { > > struct codetag_bytes tags[10]; > > size_t i, nr; > > > > @@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) > > ct->lineno, ct->function); > > } > > } > > + mutex_unlock(&mem_alloc_profiling_mutex); > > } > > If we're going to suppress the usual output then how about we let > people know this happened, rather than silently dropping it? > > pr_notice("memory allocation output suppressed due to show_mem() contention\n") > > or something like that? Personally I think this is not needed as this patch is suppressing only the memory allocation profiling output which is global, will be same for all the consumers and context does not matter. All consumers will get the memory allocation profiling data eventually. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-27 22:28 ` Shakeel Butt @ 2025-08-28 8:36 ` Yueyang Pan 0 siblings, 0 replies; 19+ messages in thread From: Yueyang Pan @ 2025-08-28 8:36 UTC (permalink / raw) To: Shakeel Butt Cc: Andrew Morton, Suren Baghdasaryan, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On Wed, Aug 27, 2025 at 03:28:41PM -0700, Shakeel Butt wrote: > On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: > > On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: > > > > > In production, show_mem() can be called concurrently from two > > > different entities, for example one from oom_kill_process() > > > another from __alloc_pages_slowpath from another kthread. This > > > patch adds a mutex and invokes trylock before printing out the > > > kernel alloc info in show_mem(). This way two alloc info won't > > > interleave with each other, which then makes parsing easier. > > > > > > > Fair enough, I guess. > > > > > --- a/mm/show_mem.c > > > +++ b/mm/show_mem.c > > > @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages); > > > unsigned long totalreserve_pages __read_mostly; > > > unsigned long totalcma_pages __read_mostly; > > > > > > +static DEFINE_MUTEX(mem_alloc_profiling_mutex); > > > > It would be a bit neater to make this local to __show_mem() - it didn't > > need file scope. > > +1, something static to __show_mem(). Thanks for your feedback, Shakeel. See my reply to Andrew for this. > > > > > Also, mutex_unlock() isn't to be used from interrupt context, so > > problem. > > > > Something like atomic cmpxchg or test_and_set_bit could be used and > > wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed > > here. > > +1 Again, see my reply to Andrew. 
> > > > > > static inline void show_node(struct zone *zone) > > > { > > > if (IS_ENABLED(CONFIG_NUMA)) > > > @@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) > > > printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages)); > > > #endif > > > #ifdef CONFIG_MEM_ALLOC_PROFILING > > > - if (mem_alloc_profiling_enabled()) { > > > + if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) { > > > struct codetag_bytes tags[10]; > > > size_t i, nr; > > > > > > @@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) > > > ct->lineno, ct->function); > > > } > > > } > > > + mutex_unlock(&mem_alloc_profiling_mutex); > > > } > > > > If we're going to suppress the usual output then how about we let > > people know this happened, rather than silently dropping it? > > > > pr_notice("memory allocation output suppressed due to show_mem() contention\n") > > > > or something like that? > > Personally I think this is not needed as this patch is suppressing only > the memory allocation profiling output which is global, will be same > for all the consumers and context does not matter. All consumers will > get the memory allocation profiling data eventually. For this point, I sort of agree with you. Wait for others' opinions? Thanks Pan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-27 22:06 ` Andrew Morton 2025-08-27 22:28 ` Shakeel Butt @ 2025-08-28 8:34 ` Yueyang Pan 2025-08-28 8:41 ` Vlastimil Babka 1 sibling, 1 reply; 19+ messages in thread From: Yueyang Pan @ 2025-08-28 8:34 UTC (permalink / raw) To: Andrew Morton Cc: Suren Baghdasaryan, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: > On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: > > > In production, show_mem() can be called concurrently from two > > different entities, for example one from oom_kill_process() > > another from __alloc_pages_slowpath from another kthread. This > > patch adds a mutex and invokes trylock before printing out the > > kernel alloc info in show_mem(). This way two alloc info won't > > interleave with each other, which then makes parsing easier. > > > > Fair enough, I guess. > > > --- a/mm/show_mem.c > > +++ b/mm/show_mem.c > > @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages); > > unsigned long totalreserve_pages __read_mostly; > > unsigned long totalcma_pages __read_mostly; > > > > +static DEFINE_MUTEX(mem_alloc_profiling_mutex); > > It would be a bit neater to make this local to __show_mem() - it didn't > need file scope. Thanks for your feedback, Andrew. I will move it the next version. > > Also, mutex_unlock() isn't to be used from interrupt context, so > problem. > > Something like atomic cmpxchg or test_and_set_bit could be used and > wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed > here. I was not aware of interrupt context before. I will change to test-and-set lock in the next version. 
> > > static inline void show_node(struct zone *zone) > > { > > if (IS_ENABLED(CONFIG_NUMA)) > > @@ -419,7 +421,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) > > printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages)); > > #endif > > #ifdef CONFIG_MEM_ALLOC_PROFILING > > - if (mem_alloc_profiling_enabled()) { > > + if (mem_alloc_profiling_enabled() && mutex_trylock(&mem_alloc_profiling_mutex)) { > > struct codetag_bytes tags[10]; > > size_t i, nr; > > > > @@ -445,6 +447,7 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx) > > ct->lineno, ct->function); > > } > > } > > + mutex_unlock(&mem_alloc_profiling_mutex); > > } > > If we're going to suppress the usual output then how about we let > people know this happened, rather than silently dropping it? > > pr_notice("memory allocation output suppressed due to show_mem() contention\n") > > or something like that? For this point, I am sort of on Shakeel's side. Probably I won't call it suppressed as two concurrent printers is actually sharing this global information. Thanks, Pan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info
From: Vlastimil Babka @ 2025-08-28 8:41 UTC
To: Yueyang Pan, Andrew Morton
Cc: Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel

On 8/28/25 10:34, Yueyang Pan wrote:
> On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote:
>> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote:
>>
>> > In production, show_mem() can be called concurrently from two
>> > different entities, for example one from oom_kill_process()
>> > another from __alloc_pages_slowpath from another kthread. This
>> > patch adds a mutex and invokes trylock before printing out the
>> > kernel alloc info in show_mem(). This way two alloc info won't
>> > interleave with each other, which then makes parsing easier.

What about the rest of the information printed by show_mem() being interleaved?

>> >
>>
>> Fair enough, I guess.
>>
>> > --- a/mm/show_mem.c
>> > +++ b/mm/show_mem.c
>> > @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages);
>> >  unsigned long totalreserve_pages __read_mostly;
>> >  unsigned long totalcma_pages __read_mostly;
>> >
>> > +static DEFINE_MUTEX(mem_alloc_profiling_mutex);
>>
>> It would be a bit neater to make this local to __show_mem() - it didn't
>> need file scope.
>
> Thanks for your feedback, Andrew. I will move it the next version.
>
>>
>> Also, mutex_unlock() isn't to be used from interrupt context, so
>> problem.
>>
>> Something like atomic cmpxchg or test_and_set_bit could be used and
>> wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed
>> here.
>
> I was not aware of interrupt context before. I will change to test-and-set
> lock in the next version.

Perhaps simply spinlock_t with spin_trylock()?
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-28 8:41 ` Vlastimil Babka @ 2025-08-28 8:47 ` Yueyang Pan 2025-08-28 8:53 ` Vlastimil Babka 2025-08-28 16:35 ` Shakeel Butt 1 sibling, 1 reply; 19+ messages in thread From: Yueyang Pan @ 2025-08-28 8:47 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote: > On 8/28/25 10:34, Yueyang Pan wrote: > > On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: > >> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: > >> > >> > In production, show_mem() can be called concurrently from two > >> > different entities, for example one from oom_kill_process() > >> > another from __alloc_pages_slowpath from another kthread. This > >> > patch adds a mutex and invokes trylock before printing out the > >> > kernel alloc info in show_mem(). This way two alloc info won't > >> > interleave with each other, which then makes parsing easier. > > What about the rest of the information printed by show_mem() being interleaved? Thanks for your feedback, Vlastimil. We cannot use trylock for the rest part as node filter can be different. Do you think we need a lock to prevent the whole show_mem() from being interleaved and to acquire it at the very beginning? Will it be too heavy? > > >> > > >> > >> Fair enough, I guess. > >> > >> > --- a/mm/show_mem.c > >> > +++ b/mm/show_mem.c > >> > @@ -23,6 +23,8 @@ EXPORT_SYMBOL(_totalram_pages); > >> > unsigned long totalreserve_pages __read_mostly; > >> > unsigned long totalcma_pages __read_mostly; > >> > > >> > +static DEFINE_MUTEX(mem_alloc_profiling_mutex); > >> > >> It would be a bit neater to make this local to __show_mem() - it didn't > >> need file scope. > > > > Thanks for your feedback, Andrew. I will move it the next version. 
> > > >> > >> Also, mutex_unlock() isn't to be used from interrupt context, so > >> problem. > >> > >> Something like atomic cmpxchg or test_and_set_bit could be used and > >> wouldn't involve mutex_unlock()'s wakeup logic, which isn't needed > >> here. > > > > I was not aware of interrupt context before. I will change to test-and-set > > lock in the next version. > > Perhaps simply spinlock_t with spin_trylock()? > Agreed. Thanks Pan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info
From: Vlastimil Babka @ 2025-08-28 8:53 UTC
To: Yueyang Pan
Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel

On 8/28/25 10:47, Yueyang Pan wrote:
> On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
>> On 8/28/25 10:34, Yueyang Pan wrote:
>> > On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote:
>> >> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote:
>> >>
>> >> > In production, show_mem() can be called concurrently from two
>> >> > different entities, for example one from oom_kill_process()
>> >> > another from __alloc_pages_slowpath from another kthread. This
>> >> > patch adds a mutex and invokes trylock before printing out the
>> >> > kernel alloc info in show_mem(). This way two alloc info won't
>> >> > interleave with each other, which then makes parsing easier.
>>
>> What about the rest of the information printed by show_mem() being interleaved?
>
> Thanks for your feedback, Vlastimil. We cannot use trylock for the rest
> part as node filter can be different.

Right.

> Do you think we need a lock to prevent the whole show_mem() from being
> interleaved and to acquire it at the very beginning? Will it be too
> heavy?

It might be risky so perhaps let's not. Guess we can disentangle by dmesg
showing the thread id prefix.
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-28 8:53 ` Vlastimil Babka @ 2025-08-28 9:51 ` Yueyang Pan 2025-08-28 9:54 ` Vlastimil Babka 0 siblings, 1 reply; 19+ messages in thread From: Yueyang Pan @ 2025-08-28 9:51 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On Thu, Aug 28, 2025 at 10:53:01AM +0200, Vlastimil Babka wrote: > On 8/28/25 10:47, Yueyang Pan wrote: > > On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote: > >> On 8/28/25 10:34, Yueyang Pan wrote: > >> > On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: > >> >> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: > >> >> > >> >> > In production, show_mem() can be called concurrently from two > >> >> > different entities, for example one from oom_kill_process() > >> >> > another from __alloc_pages_slowpath from another kthread. This > >> >> > patch adds a mutex and invokes trylock before printing out the > >> >> > kernel alloc info in show_mem(). This way two alloc info won't > >> >> > interleave with each other, which then makes parsing easier. > >> > >> What about the rest of the information printed by show_mem() being interleaved? > > > > Thanks for your feedback, Vlastimil. We cannot use trylock for the rest > > part as node filter can be different. > > Right. > > > Do you think we need a lock to prevent the whole show_mem() from being > > interleaved and to acquire it at the very beginning? Will it be too > > heavy? > > It might be risky so perhaps let's not. Guess we can disentangle by dmesg > showing the thread id prefix. I have thought about this. Since each line can interleave with another, we would end up adding tid to each line. Not sure if this is acceptable. Thanks Pan ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-28 9:51 ` Yueyang Pan @ 2025-08-28 9:54 ` Vlastimil Babka 2025-08-28 22:10 ` Yueyang Pan 0 siblings, 1 reply; 19+ messages in thread From: Vlastimil Babka @ 2025-08-28 9:54 UTC (permalink / raw) To: Yueyang Pan Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On 8/28/25 11:51, Yueyang Pan wrote: > On Thu, Aug 28, 2025 at 10:53:01AM +0200, Vlastimil Babka wrote: >> On 8/28/25 10:47, Yueyang Pan wrote: >> > On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote: >> >> On 8/28/25 10:34, Yueyang Pan wrote: >> >> > On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: >> >> >> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: >> >> >> >> >> >> > In production, show_mem() can be called concurrently from two >> >> >> > different entities, for example one from oom_kill_process() >> >> >> > another from __alloc_pages_slowpath from another kthread. This >> >> >> > patch adds a mutex and invokes trylock before printing out the >> >> >> > kernel alloc info in show_mem(). This way two alloc info won't >> >> >> > interleave with each other, which then makes parsing easier. >> >> >> >> What about the rest of the information printed by show_mem() being interleaved? >> > >> > Thanks for your feedback, Vlastimil. We cannot use trylock for the rest >> > part as node filter can be different. >> >> Right. >> >> > Do you think we need a lock to prevent the whole show_mem() from being >> > interleaved and to acquire it at the very beginning? Will it be too >> > heavy? >> >> It might be risky so perhaps let's not. Guess we can disentangle by dmesg >> showing the thread id prefix. > > I have thought about this. Since each line can interleave with another, we > would end up adding tid to each line. Not sure if this is acceptable. 
I meant that printk/dmesg already does that so it's fine. > Thanks > Pan ^ permalink raw reply [flat|nested] 19+ messages in thread
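As a rough illustration of the "disentangle by thread id prefix" idea, the sketch below parses a caller field out of a log line so that a post-processing tool could group interleaved lines per caller. It assumes lines shaped like `[  123.456789][ T4321] text`, which is what I understand CONFIG_PRINTK_CALLER to produce (`T` for a task id, `C` for a CPU in non-task context); that format and the helper name `parse_caller()` are assumptions here, not something stated in this thread.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the caller field from a line such as
 *   "[  123.456789][ T4321] 327 MiB 83635 mm/compaction.c:1880 ..."
 * On success, *kind is 'T' or 'C' and *id is the number that follows.
 * Returns false if the line carries no caller field in the assumed format.
 */
static bool parse_caller(const char *line, char *kind, unsigned long *id)
{
	const char *p = strchr(line, ']');	/* end of the timestamp field */
	char *end;

	if (!p || p[1] != '[')
		return false;
	p += 2;
	while (*p == ' ')			/* the field is space padded */
		p++;
	if (*p != 'T' && *p != 'C')
		return false;
	*kind = *p++;
	if (!isdigit((unsigned char)*p))
		return false;
	*id = strtoul(p, &end, 10);
	return *end == ']';
}
```

Lines that share the same `kind` and `id` came from the same caller, so two interleaved show_mem() reports can be separated after the fact without any extra locking in the kernel.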
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info 2025-08-28 9:54 ` Vlastimil Babka @ 2025-08-28 22:10 ` Yueyang Pan 0 siblings, 0 replies; 19+ messages in thread From: Yueyang Pan @ 2025-08-28 22:10 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel On Thu, Aug 28, 2025 at 11:54:58AM +0200, Vlastimil Babka wrote: > On 8/28/25 11:51, Yueyang Pan wrote: > > On Thu, Aug 28, 2025 at 10:53:01AM +0200, Vlastimil Babka wrote: > >> On 8/28/25 10:47, Yueyang Pan wrote: > >> > On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote: > >> >> On 8/28/25 10:34, Yueyang Pan wrote: > >> >> > On Wed, Aug 27, 2025 at 03:06:19PM -0700, Andrew Morton wrote: > >> >> >> On Wed, 27 Aug 2025 11:34:23 -0700 Yueyang Pan <pyyjason@gmail.com> wrote: > >> >> >> > >> >> >> > In production, show_mem() can be called concurrently from two > >> >> >> > different entities, for example one from oom_kill_process() > >> >> >> > another from __alloc_pages_slowpath from another kthread. This > >> >> >> > patch adds a mutex and invokes trylock before printing out the > >> >> >> > kernel alloc info in show_mem(). This way two alloc info won't > >> >> >> > interleave with each other, which then makes parsing easier. > >> >> > >> >> What about the rest of the information printed by show_mem() being interleaved? > >> > > >> > Thanks for your feedback, Vlastimil. We cannot use trylock for the rest > >> > part as node filter can be different. > >> > >> Right. > >> > >> > Do you think we need a lock to prevent the whole show_mem() from being > >> > interleaved and to acquire it at the very beginning? Will it be too > >> > heavy? > >> > >> It might be risky so perhaps let's not. Guess we can disentangle by dmesg > >> showing the thread id prefix. > > > > I have thought about this. 
Since each line can interleave with another, we > > would end up adding tid to each line. Not sure if this is acceptable. > > I meant that printk/dmesg already does that so it's fine. Cool. Then I will do this for the previous part before memory allocation info. > > > Thanks > > Pan > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info
From: Shakeel Butt @ 2025-08-28 16:35 UTC
To: Vlastimil Babka
Cc: Yueyang Pan, Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel

On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
> >
> > I was not aware of interrupt context before. I will change to test-and-set
> > lock in the next version.
>
> Perhaps simply spinlock_t with spin_trylock()?
>

Will lockdep complain that this spinlock is taken in non-irq and irq
context without disabling irqs?
* Re: [PATCH v1 2/2] mm/show_mem: Add trylock while printing alloc info
From: Vlastimil Babka @ 2025-08-28 17:21 UTC
To: Shakeel Butt
Cc: Yueyang Pan, Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel

On 8/28/25 18:35, Shakeel Butt wrote:
> On Thu, Aug 28, 2025 at 10:41:23AM +0200, Vlastimil Babka wrote:
>> >
>> > I was not aware of interrupt context before. I will change to test-and-set
>> > lock in the next version.
>>
>> Perhaps simply spinlock_t with spin_trylock()?
>>
>
> Will lockdep complain that this spinlock is taken in non-irq and irq
> context without disabling irqs?

If we only use spin_trylock then it won't.
* Re: [PATCH v1 0/2] mm/show_mem: Bug fix for print mem alloc info
From: Vishal Moola (Oracle) @ 2025-08-27 19:51 UTC
To: Yueyang Pan
Cc: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko, Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm, kernel-team, linux-kernel

On Wed, Aug 27, 2025 at 11:34:21AM -0700, Yueyang Pan wrote:
> This patch set fixes two issues we saw in production rollout.
>
> The first issue is that we saw all zero output of memory allocation
> profiling information from show_mem() if CONFIG_MEM_ALLOC_PROFILING
> is set and sysctl.vm.mem_profiling=0. In this case, the behaviour
> should be the same as when CONFIG_MEM_ALLOC_PROFILING is unset,

Did you mean to say when sysctl.vm.mem_profiling=never?

My understanding is that setting the sysctl=0 Pauses memory allocation
profiling, while 1 Resumes it. When the sysctl=never should be the same
as when the config is unset, but I suspect we might still want the info
when set to 0.

> where show_mem prints nothing about the information. This will make
> further parse easier as we don't have to differentiate what a all
> zero line actually means (Does it mean 0 bytes are allocated
> or simply memory allocation profiling is disabled).
>
> The second issue is that multiple entities can call show_mem()
> which messed up the allocation info in dmesg. We saw outputs like this:
> ```
> 327 MiB 83635 mm/compaction.c:1880 func:compaction_alloc
> 48.4 GiB 12684937 mm/memory.c:1061 func:folio_prealloc
> 7.48 GiB 10899 mm/huge_memory.c:1159 func:vma_alloc_anon_folio_pmd
> 298 MiB 95216 kernel/fork.c:318 func:alloc_thread_stack_node
> 250 MiB 63901 mm/zsmalloc.c:987 func:alloc_zspage
> 1.42 GiB 372527 mm/memory.c:1063 func:folio_prealloc
> 1.17 GiB 95693 mm/slub.c:2424 func:alloc_slab_page
> 651 MiB 166732 mm/readahead.c:270 func:page_cache_ra_unbounded
> 419 MiB 107261 net/core/page_pool.c:572 func:__page_pool_alloc_pages_slow
> 404 MiB 103425 arch/x86/mm/pgtable.c:25 func:pte_alloc_one
> ```
> The above example is because one kthread invokes show_mem()
> from __alloc_pages_slowpath while kernel itself calls
> oom_kill_process()

I'm not familiar with show_mem(). Could you spell out what's wrong with
the output above?

> Yueyang Pan (2):
>   mm/show_mem: No print when not mem_alloc_profiling_enabled()
>   mm/show_mem: Add trylock while printing alloc info
>
>  mm/show_mem.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> --
> 2.47.3
>
* Re: [PATCH v1 0/2] mm/show_mem: Bug fix for print mem alloc info
  2025-08-27 19:51 ` [PATCH v1 0/2] mm/show_mem: Bug fix for print mem " Vishal Moola (Oracle)
@ 2025-08-28  8:29 ` Yueyang Pan
  2025-08-28 17:05 ` Vishal Moola (Oracle)
From: Yueyang Pan @ 2025-08-28 8:29 UTC
To: Vishal Moola (Oracle)
Cc: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko,
    Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm,
    kernel-team, linux-kernel

On Wed, Aug 27, 2025 at 12:51:17PM -0700, Vishal Moola (Oracle) wrote:
> On Wed, Aug 27, 2025 at 11:34:21AM -0700, Yueyang Pan wrote:
> > This patch set fixes two issues we saw in production rollout.
> >
> > The first issue is that we saw all zero output of memory allocation
> > profiling information from show_mem() if CONFIG_MEM_ALLOC_PROFILING
> > is set and sysctl.vm.mem_profiling=0. In this case, the behaviour
> > should be the same as when CONFIG_MEM_ALLOC_PROFILING is unset,
>
> Did you mean to say when sysctl.vm.mem_profiling=never?
>
> My understanding is that setting the sysctl=0 Pauses memory allocation
> profiling, while 1 Resumes it. When the sysctl=never should be the same
> as when the config is unset, but I suspect we might still want the info
> when set to 0.

Thanks for your feedback Vishal. Here I mean for both =0 and =never.
In both cases, now __show_mem() will print all 0s, which both is redundant
and also makes differentiate hard. IMO when __show_mem() prints something
the output should be useful at least.

>
> > where show_mem prints nothing about the information. This will make
> > further parse easier as we don't have to differentiate what a all
> > zero line actually means (Does it mean 0 bytes are allocated
> > or simply memory allocation profiling is disabled).
> >
> > The second issue is that multiple entities can call show_mem()
> > which messed up the allocation info in dmesg. We saw outputs like this:
> > ```
> > 327 MiB 83635 mm/compaction.c:1880 func:compaction_alloc
> > 48.4 GiB 12684937 mm/memory.c:1061 func:folio_prealloc
> > 7.48 GiB 10899 mm/huge_memory.c:1159 func:vma_alloc_anon_folio_pmd
> > 298 MiB 95216 kernel/fork.c:318 func:alloc_thread_stack_node
> > 250 MiB 63901 mm/zsmalloc.c:987 func:alloc_zspage
> > 1.42 GiB 372527 mm/memory.c:1063 func:folio_prealloc
> > 1.17 GiB 95693 mm/slub.c:2424 func:alloc_slab_page
> > 651 MiB 166732 mm/readahead.c:270 func:page_cache_ra_unbounded
> > 419 MiB 107261 net/core/page_pool.c:572 func:__page_pool_alloc_pages_slow
> > 404 MiB 103425 arch/x86/mm/pgtable.c:25 func:pte_alloc_one
> > ```
> > The above example is because one kthread invokes show_mem()
> > from __alloc_pages_slowpath while kernel itself calls
> > oom_kill_process()
>
> I'm not familiar with show_mem(). Could you spell out what's wrong with
> the output above?

So here in the normal case, the output should be sorted by size. Here
two print happen at the same time so they interleave with each other,
making further parse harder (need to sort again and dedup).

>
> > Yueyang Pan (2):
> >   mm/show_mem: No print when not mem_alloc_profiling_enabled()
> >   mm/show_mem: Add trylock while printing alloc info
> >
> >  mm/show_mem.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > --
> > 2.47.3
> >

Thanks,
Pan
* Re: [PATCH v1 0/2] mm/show_mem: Bug fix for print mem alloc info
  2025-08-28  8:29 ` Yueyang Pan
@ 2025-08-28 17:05 ` Vishal Moola (Oracle)
  2025-08-28 22:07 ` Yueyang Pan
From: Vishal Moola (Oracle) @ 2025-08-28 17:05 UTC
To: Yueyang Pan
Cc: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko,
    Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm,
    kernel-team, linux-kernel

On Thu, Aug 28, 2025 at 01:29:08AM -0700, Yueyang Pan wrote:
> On Wed, Aug 27, 2025 at 12:51:17PM -0700, Vishal Moola (Oracle) wrote:
> > On Wed, Aug 27, 2025 at 11:34:21AM -0700, Yueyang Pan wrote:
> > > This patch set fixes two issues we saw in production rollout.
> > >
> > > The first issue is that we saw all zero output of memory allocation
> > > profiling information from show_mem() if CONFIG_MEM_ALLOC_PROFILING
> > > is set and sysctl.vm.mem_profiling=0. In this case, the behaviour
> > > should be the same as when CONFIG_MEM_ALLOC_PROFILING is unset,
> >
> > Did you mean to say when sysctl.vm.mem_profiling=never?
> >
> > My understanding is that setting the sysctl=0 Pauses memory allocation
> > profiling, while 1 Resumes it. When the sysctl=never should be the same
> > as when the config is unset, but I suspect we might still want the info
> > when set to 0.
>
> Thanks for your feedback Vishal. Here I mean for both =0 and =never.
> In both cases, now __show_mem() will print all 0s, which both is redundant
> and also makes differentiate hard. IMO when __show_mem() prints something
> the output should be useful at least.

If differentiating between 0 allocations vs disabled is the primary
concern, I think prefacing the dump with the status of the tool is
better than treating =0 and =never as the same.

The way I see it, the {0,1,never} tristate offers a level of versatility
that I'm not sure we need to eliminate.

I'm thinking about cases where we may temporarily set =1 to track some
allocations, then back to =0 'pause' on that exact period of time. Memory
allocation profiling still has those allocations tracked while set to =0
(we can still see them in /proc/allocinfo at least). If a user decided to
do that just before an oom, could they see something useful from
show_mem() even when =0?

> >
> > > where show_mem prints nothing about the information. This will make
> > > further parse easier as we don't have to differentiate what a all
> > > zero line actually means (Does it mean 0 bytes are allocated
> > > or simply memory allocation profiling is disabled).
> > >
> > > The second issue is that multiple entities can call show_mem()
> > > which messed up the allocation info in dmesg. We saw outputs like this:
> > > ```
> > > 327 MiB 83635 mm/compaction.c:1880 func:compaction_alloc
> > > 48.4 GiB 12684937 mm/memory.c:1061 func:folio_prealloc
> > > 7.48 GiB 10899 mm/huge_memory.c:1159 func:vma_alloc_anon_folio_pmd
> > > 298 MiB 95216 kernel/fork.c:318 func:alloc_thread_stack_node
> > > 250 MiB 63901 mm/zsmalloc.c:987 func:alloc_zspage
> > > 1.42 GiB 372527 mm/memory.c:1063 func:folio_prealloc
> > > 1.17 GiB 95693 mm/slub.c:2424 func:alloc_slab_page
> > > 651 MiB 166732 mm/readahead.c:270 func:page_cache_ra_unbounded
> > > 419 MiB 107261 net/core/page_pool.c:572 func:__page_pool_alloc_pages_slow
> > > 404 MiB 103425 arch/x86/mm/pgtable.c:25 func:pte_alloc_one
> > > ```
> > > The above example is because one kthread invokes show_mem()
> > > from __alloc_pages_slowpath while kernel itself calls
> > > oom_kill_process()
> >
> > I'm not familiar with show_mem(). Could you spell out what's wrong with
> > the output above?
>
> So here in the normal case, the output should be sorted by size. Here
> two print happen at the same time so they interleave with each other,
> making further parse harder (need to sort again and dedup).

Gotcha.

> >
> > > Yueyang Pan (2):
> > >   mm/show_mem: No print when not mem_alloc_profiling_enabled()
> > >   mm/show_mem: Add trylock while printing alloc info
> > >
> > >  mm/show_mem.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 2.47.3
> > >
>
> Thanks,
> Pan
* Re: [PATCH v1 0/2] mm/show_mem: Bug fix for print mem alloc info
  2025-08-28 17:05 ` Vishal Moola (Oracle)
@ 2025-08-28 22:07 ` Yueyang Pan
From: Yueyang Pan @ 2025-08-28 22:07 UTC
To: Vishal Moola (Oracle)
Cc: Suren Baghdasaryan, Andrew Morton, Vlastimil Babka, Michal Hocko,
    Brendan Jackman, Johannes Weiner, Zi Yan, Usama Arif, linux-mm,
    kernel-team, linux-kernel

On Thu, Aug 28, 2025 at 10:05:18AM -0700, Vishal Moola (Oracle) wrote:
> On Thu, Aug 28, 2025 at 01:29:08AM -0700, Yueyang Pan wrote:
> > On Wed, Aug 27, 2025 at 12:51:17PM -0700, Vishal Moola (Oracle) wrote:
> > > On Wed, Aug 27, 2025 at 11:34:21AM -0700, Yueyang Pan wrote:
> > > > This patch set fixes two issues we saw in production rollout.
> > > >
> > > > The first issue is that we saw all zero output of memory allocation
> > > > profiling information from show_mem() if CONFIG_MEM_ALLOC_PROFILING
> > > > is set and sysctl.vm.mem_profiling=0. In this case, the behaviour
> > > > should be the same as when CONFIG_MEM_ALLOC_PROFILING is unset,
> > >
> > > Did you mean to say when sysctl.vm.mem_profiling=never?
> > >
> > > My understanding is that setting the sysctl=0 Pauses memory allocation
> > > profiling, while 1 Resumes it. When the sysctl=never should be the same
> > > as when the config is unset, but I suspect we might still want the info
> > > when set to 0.
> >
> > Thanks for your feedback Vishal. Here I mean for both =0 and =never.
> > In both cases, now __show_mem() will print all 0s, which both is redundant
> > and also makes differentiate hard. IMO when __show_mem() prints something
> > the output should be useful at least.
>
> If differentiating between 0 allocations vs disabled is the primary
> concern, I think prefacing the dump with the status of the tool is
> better than treating =0 and =never as the same.
>
> The way I see it, the {0,1,never} tristate offers a level of versatility
> that I'm not sure we need to eliminate.
>
> I'm thinking about cases where we may temporarily set =1 to track some
> allocations, then back to =0 'pause' on that exact period of time. Memory
> allocation profiling still has those allocations tracked while set to =0
> (we can still see them in /proc/allocinfo at least). If a user decided to
> do that just before an oom, could they see something useful from
> show_mem() even when =0?

This is a good point. I agree with your suggestion about adding the state
to print. I am still unsure about if we want to print it when =0. The first
reason is that memory allocation profiler does not support runtime enabling
now. We have to set it via boot cmdline. It will make more sense if we have
this feature. Second is because memory allocation profiling is quite
light-weighted, I would assume user really don't need this feature when
they set =0.

The original reason why I tried to disable this is because in our
production table we see a lot of 0Bs coming from the machines where the
changes in boot cmdline have not been pushed to. If we have state info, we
could possibly filter this info out before sending it to the table. So I
agree upon adding the state to print. Maybe others also have thoughts about
this?

>
> > >
> > > > where show_mem prints nothing about the information. This will make
> > > > further parse easier as we don't have to differentiate what a all
> > > > zero line actually means (Does it mean 0 bytes are allocated
> > > > or simply memory allocation profiling is disabled).
> > > >
> > > > The second issue is that multiple entities can call show_mem()
> > > > which messed up the allocation info in dmesg. We saw outputs like this:
> > > > ```
> > > > 327 MiB 83635 mm/compaction.c:1880 func:compaction_alloc
> > > > 48.4 GiB 12684937 mm/memory.c:1061 func:folio_prealloc
> > > > 7.48 GiB 10899 mm/huge_memory.c:1159 func:vma_alloc_anon_folio_pmd
> > > > 298 MiB 95216 kernel/fork.c:318 func:alloc_thread_stack_node
> > > > 250 MiB 63901 mm/zsmalloc.c:987 func:alloc_zspage
> > > > 1.42 GiB 372527 mm/memory.c:1063 func:folio_prealloc
> > > > 1.17 GiB 95693 mm/slub.c:2424 func:alloc_slab_page
> > > > 651 MiB 166732 mm/readahead.c:270 func:page_cache_ra_unbounded
> > > > 419 MiB 107261 net/core/page_pool.c:572 func:__page_pool_alloc_pages_slow
> > > > 404 MiB 103425 arch/x86/mm/pgtable.c:25 func:pte_alloc_one
> > > > ```
> > > > The above example is because one kthread invokes show_mem()
> > > > from __alloc_pages_slowpath while kernel itself calls
> > > > oom_kill_process()
> > >
> > > I'm not familiar with show_mem(). Could you spell out what's wrong with
> > > the output above?
> >
> > So here in the normal case, the output should be sorted by size. Here
> > two print happen at the same time so they interleave with each other,
> > making further parse harder (need to sort again and dedup).
>
> Gotcha.
>
> > >
> > > > Yueyang Pan (2):
> > > >   mm/show_mem: No print when not mem_alloc_profiling_enabled()
> > > >   mm/show_mem: Add trylock while printing alloc info
> > > >
> > > >  mm/show_mem.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > --
> > > > 2.47.3
> > > >
> >
> > Thanks,
> > Pan
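Editorial sketch (not part of the thread): the suggestion of prefacing the dump with the profiling state, so that a consumer can tell "disabled" apart from "zero bytes allocated", might look roughly like the following. Everything here is an assumption made for illustration: the header text, the `(profiling=...)` tag, and the names `profiling_state`, `format_alloc_header` and `header_has_state` are hypothetical and do not come from the kernel or from the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the {0,1,never} tristate discussed in the thread. */
enum profiling_state { PROFILING_NEVER, PROFILING_PAUSED, PROFILING_ENABLED };

static const char *profiling_state_name(enum profiling_state s)
{
    switch (s) {
    case PROFILING_NEVER:   return "never";
    case PROFILING_PAUSED:  return "0";
    case PROFILING_ENABLED: return "1";
    }
    return "unknown";
}

/* Producer side: build a header line that carries the state (assumed format). */
static int format_alloc_header(char *buf, size_t len, enum profiling_state s)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Memory allocations (profiling=%s):",
                    profiling_state_name(s));
}

/* Consumer side: check which state a logged header line reports. */
static bool header_has_state(const char *header, enum profiling_state s)
{
    char tag[32];

    snprintf(tag, sizeof(tag), "(profiling=%s)", profiling_state_name(s));
    return strstr(header, tag) != NULL;
}
```

With a header like this, a log pipeline could drop dumps tagged `never` before they reach the production table, while still keeping dumps taken in the paused (`=0`) state if those turn out to be useful.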