From: "David Wang" <00107082@163.com>
To: "Suren Baghdasaryan" <surenb@google.com>, kent.overstreet@linux.dev
Cc: "Hao Ge" <hao.ge@linux.dev>,
	akpm@linux-foundation.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,  "Hao Ge" <gehao@kylinos.cn>,
	"Alessio Balsini" <balsini@google.com>,
	 "Pasha Tatashin" <tatashin@google.com>,
	 "Sourav Panda" <souravpanda@google.com>
Subject: memory alloc profiling seems not work properly during bootup?
Date: Mon, 13 Jan 2025 16:03:50 +0800 (CST)
Message-ID: <213ff7d2.7c6c.1945eb0c2ff.Coremail.00107082@163.com>
In-Reply-To: <254a4857.b2b.19458d0dbc2.Coremail.00107082@163.com>

Hi,

A further update:

When I boot my system, none of the alloc_percpu() calls in kernel/sched/topology.c are accounted:

         996       14 kernel/sched/topology.c:2275 func:__sdt_alloc 80 
         996       14 kernel/sched/topology.c:2266 func:__sdt_alloc 80 
          96        6 kernel/sched/topology.c:2259 func:__sdt_alloc 80 
       12388       24 kernel/sched/topology.c:2252 func:__sdt_alloc 80 
         612        1 kernel/sched/topology.c:1961 func:sched_init_numa 1

After a suspend/resume cycle, those alloc_percpu() call sites show up:

         996       14 kernel/sched/topology.c:2275 func:__sdt_alloc 395 
         996       14 kernel/sched/topology.c:2266 func:__sdt_alloc 395 
          96        6 kernel/sched/topology.c:2259 func:__sdt_alloc 395 
       12388       24 kernel/sched/topology.c:2252 func:__sdt_alloc 395 
           0        0 kernel/sched/topology.c:2242 func:__sdt_alloc 70    <---
           0        0 kernel/sched/topology.c:2238 func:__sdt_alloc 70    <---
           0        0 kernel/sched/topology.c:2234 func:__sdt_alloc 70    <---
           0        0 kernel/sched/topology.c:2230 func:__sdt_alloc 70    <---
         612        1 kernel/sched/topology.c:1961 func:sched_init_numa 1

I run with my accumulative counter patch applied (the last column above) and filter out entries whose
accumulative counter is 0. I am fairly sure the patch does not cause this accounting issue, but not 100%.


It seems to me that during bootup some alloc_percpu() calls are not registered.
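
Comparing two dumps by eye is error-prone, so here is a minimal userspace sketch of a snapshot diff. It is not part of any existing tool, and the function names (next_line, parse_site, has_site, report_new_sites) are hypothetical. It assumes both snapshots were captured with the same filtering, keys on the third column (file:line), and skips any line that does not start with two numbers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the next '\n'-terminated line of *p into `line` (truncating if
 * needed) and advance *p past it. Returns 0 once the input is exhausted. */
static int next_line(const char **p, char *line, size_t len)
{
    const char *start = *p;
    const char *nl;
    size_t n;

    assert(len > 0);
    if (*start == '\0')
        return 0;
    nl = strchr(start, '\n');
    n = nl ? (size_t)(nl - start) : strlen(start);
    *p = nl ? nl + 1 : start + n;
    if (n >= len)
        n = len - 1;
    memcpy(line, start, n);
    line[n] = '\0';
    return 1;
}

/* Extract the "file:line" call site (third column) of an allocinfo-style
 * line into `site`, which must hold at least 256 bytes. */
static int parse_site(const char *line, char *site)
{
    long long bytes;
    unsigned long long calls;

    return sscanf(line, "%lld %llu %255s", &bytes, &calls, site) == 3;
}

/* Return 1 if `site` is the call site of any entry in `snap`. */
static int has_site(const char *snap, const char *site)
{
    char line[512], cur[256];

    while (next_line(&snap, line, sizeof(line)))
        if (parse_site(line, cur) && strcmp(cur, site) == 0)
            return 1;
    return 0;
}

/* Print every call site listed in `after` but absent from `before`,
 * and return how many there were. */
static int report_new_sites(const char *before, const char *after)
{
    char line[512], site[256];
    int found = 0;

    while (next_line(&after, line, sizeof(line))) {
        if (parse_site(line, site) && !has_site(before, site)) {
            printf("only in later snapshot: %s\n", site);
            found++;
        }
    }
    return found;
}
```

Fed the boot-time and post-resume dumps above, it would report the four kernel/sched/topology.c:2230 to 2242 call sites.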


FYI
David



At 2025-01-12 12:41:10, "David Wang" <00107082@163.com> wrote:
>
>
>At 2025-01-11 22:31:36, "David Wang" <00107082@163.com> wrote:
>>Hi, 
>>
>>I have been using this feature for a long while, and I believe memory allocation profiling
>>is quite powerful.
>>
>>But I have been wondering how to make use of this data, specifically:
>>how could anomalies be detected, and what patterns should be defined as anomalous?
>>
>>So far, I have tools that collect this data (via Prometheus) and do basic analysis, e.g. top-k, group-by, or rate.
>>This analysis helps me understand my system, but I cannot tell whether the system is behaving abnormally or not.
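
For reference, the top-k step can be sketched in plain C. This is a hypothetical helper, not the Prometheus pipeline mentioned above: it reads only the first three columns (bytes, calls, file:line) of each line, sorts the parsed entries by bytes held, and leaves the top-k call sites in the first k elements of the output array. The struct and function names are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed allocinfo-style entry: bytes currently held, current call
 * count, and the "file:line" call site (third column). */
struct alloc_entry {
    long long bytes;
    unsigned long long calls;
    char site[256];
};

/* qsort comparator: larger byte counts sort first. */
static int by_bytes_desc(const void *a, const void *b)
{
    const struct alloc_entry *x = a, *y = b;

    return (x->bytes < y->bytes) - (x->bytes > y->bytes);
}

/* Parse up to `max` entries from a newline-separated snapshot, sort them
 * by bytes (descending) and return how many were parsed. Lines that do
 * not start with two numbers are skipped. */
static size_t top_by_bytes(const char *snap, struct alloc_entry *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (*snap && n < max) {
        const char *nl = strchr(snap, '\n');
        size_t len = nl ? (size_t)(nl - snap) : strlen(snap);
        size_t copy = len < 511 ? len : 511;
        char line[512];

        memcpy(line, snap, copy);
        line[copy] = '\0';
        if (sscanf(line, "%lld %llu %255s", &out[n].bytes, &out[n].calls,
                   out[n].site) == 3)
            n++;
        snap = nl ? nl + 1 : snap + len;
    }
    qsort(out, n, sizeof(*out), by_bytes_desc);
    return n;
}
```

Applied to the __sdt_alloc snapshot quoted below, it would put kernel/sched/topology.c:2252 (12288 bytes) first.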
>>
>>And sometimes I just read through /proc/allocinfo, trying to pick something out.
>>(Sometimes I get lucky; actually only once, when I found the underflow problem weeks ago.)
>>
>>A tool would be more helpful if it could identify anomalies, and we could add more patterns as development goes along.
>>
>>A pattern may be hard to define, especially when it involves context. For example,
>>I happened to notice the following strange thing recently:
>>
>>         896       14 kernel/sched/topology.c:2275 func:__sdt_alloc 1025 
>>         896       14 kernel/sched/topology.c:2266 func:__sdt_alloc 1025 
>>          96        6 kernel/sched/topology.c:2259 func:__sdt_alloc 1025 
>>       12288       24 kernel/sched/topology.c:2252 func:__sdt_alloc 1025    <----- B
>>           0        0 kernel/sched/topology.c:2242 func:__sdt_alloc 210     
>>           0        0 kernel/sched/topology.c:2238 func:__sdt_alloc 210 
>>           0        0 kernel/sched/topology.c:2234 func:__sdt_alloc 210 
>>           0        0 kernel/sched/topology.c:2230 func:__sdt_alloc 210     <----- A
>>Code A
>>2230                 sdd->sd = alloc_percpu(struct sched_domain *);
>>2231                 if (!sdd->sd)
>>2232                         return -ENOMEM;
>>2233 
>>
>>Code B
>>2246                 for_each_cpu(j, cpu_map) {
>>                             ...
>>
>>2251 
>>2252                         sd = kzalloc_node(sizeof(struct sched_domain) + cpumask_size(),
>>2253                                         GFP_KERNEL, cpu_to_node(j));
>>2254                         if (!sd)
>>2255                                 return -ENOMEM;
>>2256 
>>2257                         *per_cpu_ptr(sdd->sd, j) = sd;
>>
>>
>>The address of the memory allocated by 'Code B' is stored in the memory allocated by 'Code A', yet the allocation counter for 'Code A'
>>is *0* while that of 'Code B' is not *0*. Something odd is happening here: either it is expected and some ownership change happened somewhere,
>>or it is a leak, or it is an accounting problem.
>>
>>If a tool could help identify this kind of pattern, that would be great!
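
One possible heuristic for this particular pattern is sketched below, under loud assumptions. It expects the locally patched layout with a trailing accumulative call counter (bytes calls file:line func:NAME accum), which is not the stock /proc/allocinfo format, and it cannot see that one allocation stores pointers to another. It only flags a call site that has been called at least once (accum > 0) but currently holds 0 bytes, while another site in the same function still holds memory. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64

/* ASSUMPTION: line layout "bytes calls file:line func:NAME accum",
 * where accum is the accumulative call counter from a local patch. */
struct entry {
    long long bytes;
    unsigned long long calls;
    char site[256];
    char func[128];
    unsigned long long accum;
};

/* Parse up to `max` entries; lines not matching the layout are skipped. */
static size_t parse_snapshot(const char *snap, struct entry *e, size_t max)
{
    size_t n = 0;

    assert(e != NULL);
    while (*snap && n < max) {
        const char *nl = strchr(snap, '\n');
        size_t len = nl ? (size_t)(nl - snap) : strlen(snap);
        size_t copy = len < 511 ? len : 511;
        char line[512];

        memcpy(line, snap, copy);
        line[copy] = '\0';
        if (sscanf(line, "%lld %llu %255s func:%127s %llu", &e[n].bytes,
                   &e[n].calls, e[n].site, e[n].func, &e[n].accum) == 5)
            n++;
        snap = nl ? nl + 1 : snap + len;
    }
    return n;
}

/* Flag sites that were called (accum > 0) but hold 0 bytes while a
 * sibling site in the same function still holds memory. Prints each
 * flagged site and returns the count. Heuristic only: it cannot tell a
 * leak from an intentional ownership hand-over. */
static size_t flag_zero_siblings(const char *snap)
{
    struct entry e[MAX_ENTRIES];
    size_t n = parse_snapshot(snap, e, MAX_ENTRIES);
    size_t flagged = 0;

    for (size_t i = 0; i < n; i++) {
        if (e[i].bytes != 0 || e[i].accum == 0)
            continue;
        for (size_t j = 0; j < n; j++) {
            if (j != i && e[j].bytes > 0 && strcmp(e[i].func, e[j].func) == 0) {
                printf("suspicious: %s holds 0 bytes, sibling %s holds %lld\n",
                       e[i].site, e[j].site, e[j].bytes);
                flagged++;
                break;
            }
        }
    }
    return flagged;
}
```

On the snapshot above it would flag the four zero-byte __sdt_alloc sites (2230 to 2242). As the follow-up below shows, such a flag can still be a false alarm, so it is a prompt for a closer look rather than a verdict.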
>>
>>
>>Any suggestions on how to proceed with the kernel/sched/topology.c memory problem mentioned
>>above? Or is it a problem at all?
>>
>
>Update: 
>
>It seems the memory allocated by 'Code B' may be handed over via claim_allocations():
>1530 /*
>1531  * NULL the sd_data elements we've used to build the sched_domain and
>1532  * sched_group structure so that the subsequent __free_domain_allocs()
>1533  * will not free the data we're using.
>1534  */
>1535 static void claim_allocations(int cpu, struct sched_domain *sd)
>
>So most likely, this is neither a leak nor an accounting issue. False alarm, sorry.
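
The hand-over described by that comment can be illustrated with a small userspace simulation. This is a sketch under assumptions, not kernel code: calloc()/free() stand in for alloc_percpu()/kzalloc_node() and their frees, and struct domain, claim() and free_unclaimed() are hypothetical stand-ins for struct sched_domain, claim_allocations() and __free_domain_allocs(). If the hand-over works as the comment describes, it shows why the pointer array ('Code A') can drop back to 0 while the objects it pointed to ('Code B') stay accounted: they are claimed by a new owner before the array is freed.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS_DEMO 4 /* hypothetical CPU count for the demo */

/* Hypothetical stand-in for struct sched_domain. */
struct domain {
    int cpu;
};

/* Hypothetical stand-in for struct sd_data: `sd` is the pointer array
 * from "Code A"; each slot points to an object from "Code B". */
struct sd_data_demo {
    struct domain **sd;
};

/* Crude live-object counters playing the role of the two allocinfo
 * entries (Code A site and Code B site). */
static long live_a, live_b;

/* Allocate the pointer array ("Code A") and one object per CPU ("Code B"). */
static int build(struct sd_data_demo *sdd)
{
    sdd->sd = calloc(NR_CPUS_DEMO, sizeof(*sdd->sd));
    if (!sdd->sd)
        return -1;
    live_a++;
    for (int j = 0; j < NR_CPUS_DEMO; j++) {
        struct domain *d = calloc(1, sizeof(*d));

        if (!d)
            return -1; /* partial state; free_unclaimed() cleans it up */
        live_b++;
        d->cpu = j;
        sdd->sd[j] = d;
    }
    return 0;
}

/* Take ownership of one object and NULL its slot, so the later cleanup
 * does not free data that is now in use elsewhere. */
static struct domain *claim(struct sd_data_demo *sdd, int cpu)
{
    struct domain *d;

    assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
    d = sdd->sd[cpu];
    sdd->sd[cpu] = NULL;
    return d;
}

/* Free whatever was not claimed, then the pointer array itself. */
static void free_unclaimed(struct sd_data_demo *sdd)
{
    if (!sdd->sd)
        return;
    for (int j = 0; j < NR_CPUS_DEMO; j++) {
        if (sdd->sd[j]) {
            free(sdd->sd[j]);
            live_b--;
        }
    }
    free(sdd->sd);
    sdd->sd = NULL;
    live_a--;
}
```

If only some slots were claimed, free_unclaimed() would release the rest and live_b would drop accordingly.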
>
>The reason I brought this up is that the profiling data is rich, but a user who is not familiar
>with the code details cannot make much of it. A tool that could tell whether the system is drifting somewhere,
>like a health check based on the profiling data, would be quite helpful.
>
>Thanks
>David 
> 
>
